stephenbrooks.org Forum » Muon1 » General » Separate 443c vs 443d results?
2007-09-09 02:30:03
So - as in the other thread, it's been pretty much established that 443c is "looser" and provides a dandy top-notch score that no one can improve upon if using 443d.

Ergo - since 443d is the version selected by those interested in the science - why not segregate it for those trying to move the yield forward from those interested in occupying a top spot regardless of quality?

If 443d is the client du jour - I for one would like to see 443c results moved to a (*) status - or some "second tier" page.  It's a little frustrating attempting to conjure up a "winning" 443d design - only to see a 443c result sitting in the #1 spot - just because it's a 443c. There could be a whole new page set up for the category of " (*) lame... but we'll acknowledge it nonetheless since someone can't or won't upgrade."

I suspect that only about 5 people would argue with this logic.  I like scores as well as the next guy - but if I've produced the #1 443d result (as I have many times) - I'd like to see the score unsullied by 443c's that won't be included in the optimization (since we already know the 443c is "lame" in that regard).  If the score doesn't reflect "real" data that will move the project forward - then it's garbage... and shouldn't be afforded a "top billing" spot IMHO.

OK - I'm better now... I don't have vast Mpts production - so all I have left is quality.  Tired of being #5 when I earned #1.

Stephen Brooks
2007-09-09 14:31:21
Guess I could change the "filter" that calculates the max.  score to only accept v4.43d results, though that would only apply to new results being submitted because otherwise I'd have to recount the entire 30GB or something database, taking the stats off for probably the best part of a day!
2007-09-09 17:54:57
I'd have to think that'd be a suitable solution, Stephen.  There's certainly no cause for shutting down the stats.  The progression of the optimization would eliminate the "c" scores from the top results soon enough.  I know it's nit-picky and whiny, but seems a better solution than going back to a "c" client on a machine just to satisfy my need for being in the top 10 results. 

I'm guessing there's a couple of other manual optimizers that'd agree... and some "c" users that won't.
Stephen Brooks
2007-09-10 12:10:49
Well, I made a change and we're trying it.  There may be complications in earlier optimisations for which there were never any v4.43d results, though.
2007-09-10 15:12:43
Do you think it is scientifically and logically correct to favour cosmetic stats manipulation instead of looking for the real source of the problem in the first place?
Stephen Brooks
2007-09-10 15:55:46
Well I'm not going to be able to retrospectively change v4.43c!
2007-09-10 19:30:30
Much more of a problem, of late, has been erroneous results appearing at the head of the sample files.  At the present time, I am unable to reproduce the top three sample results, all generated by [TA]amd.borg.  The following two results of LAURENU2[] are reproducible, so the problem is probably not with my PC.
2007-09-10 21:21:26
Hmmm... and I thought I was the only one having difficulty optimizing some of those designs.  Not only do I not get improvement, the yield is VASTLY lower than anticipated.  Everything seems to stagnate for a bit - and then... off we go again for no apparent reason.
2007-09-11 00:47:23
Nor is the current high score of [DPC]ZMT~Mmikie reproducible.  Guess I shall have to run without the benefit of sample files.
2007-09-11 05:48:10
Thanks for the years of tweaking.
Stephen Brooks
2007-09-11 12:02:31
Well it looks like the small deviations in score are from the random numbers used in Muon1 not quite being deterministic for a given seed yet.  If you remember before v4.43, all simulations were with a random seed and if you re-tried the same one you would get a range.  Right now the results in the sample file look like they're still within the statistical range of a "correct" result.  This may also be the difference between the c and d versions: seed and seedpitch aren't specifying the random sequence completely deterministically.

Anyway that means we should check two things (at least) -

1. If you retry the same result off the sample files on _your_ machine, does it give the same result over and over?  And is this affected by whether it goes into recheck or not?

2. Does threading have an effect, i.e. is there some race condition in the code I've not managed to eliminate yet, meaning results from multiple threads aren't consistent?

Also I assume when DanC says "vastly lower" he means quite a bit lower, still within the random range, and not "totally wrong". You ought to be able to reproduce the old-style random range by making a copy of the lattice file and deleting the "seed" and "seedpitch" parameters, then running that.

I'm going to have another look at the way the seeds are set up now.  If it's still correct, then we have a problem with "alpha particles" or overclocked machines causing bit errors, which means the sample files ought to be buffered and sent back for verification by a couple other people before being used.  So you'd have two "levels" of them, one verified and one needing verification.  But I won't do that unless it's actually necessary.
2007-09-11 12:46:27
With the current top sample result the difference is approx.  0.75%

1.151338 (612.7 Mpts) [v4.43d] <DecayRotB> #time=76901; by [TA]amd.borg

1.160353,1.090150,1.127877,1.140341,1.165784 (612.1 Mpts) [v4.43d] <DecayRotB> {01236D2932548A48FECAD654}

1.151338 vs. 1.142857.

Regarding the determinism for a given seed, from my current results.txt:
0.761746 (102.9 Mpts) [v4.43d] <DecayRotB> {E395BEED2832EF050CFB0313}
-0.373532 (71.4 Mpts) [v4.43d] <DecayRotB> {C045866F05DDE3892A9B4D30}
1.148358 (612.3 Mpts) [v4.43d] <DecayRotB> {8F0FB7159054F18035CCD58B} *
1.144381 (611.8 Mpts) [v4.43d] <DecayRotB> {213DFFA1B3923D70E1B55C05}
1.142712 (611.7 Mpts) [v4.43d] <DecayRotB> {65A0B01F8756015ACEF59A3A}
1.147544 (611.9 Mpts) [v4.43d] <DecayRotB> {BA7B7F91F9145DB1BDD0406A}
1.142670 (122.3 Mpts) [v4.43d] <DecayRotB> {E91F2B24F386E9F080621C55}
1.144585 (611.9 Mpts) [v4.43d] <DecayRotB> {246847A0FA03FE3EED0B2A24}
1.148358 (612.3 Mpts) [v4.43d] <DecayRotB> {8F0FB7159054F18035CCD58B} *
-1.478999 (23.3 Mpts) [v4.43d] <DecayRotB> {20F287F6400EEFF0B815911F}
1.144974 (611.8 Mpts) [v4.43d] <DecayRotB> {9E8BF8C139FCD9C982F6A9D8}
1.145642 (122.6 Mpts) [v4.43d] <DecayRotB> {7CBAC4AB32ACCD326A26E3F2}
1.140512 (611.3 Mpts) [v4.43d] <DecayRotB> {01085EE48D6A0B3397C8F603}

There are two identical results of 1.148358, one of which will be lost on send.

I shall endeavour to retry both this file producing 1.148358 and that of amd.borg run single threaded. 
2007-09-11 13:57:06
Results from re-run (single-threaded) of amd.borgs sample file:
1.160353,1.090150,1.127877,1.140341,1.165784 (612.1 Mpts) [v4.43d] <DecayRotB> {01236D2932548A48FECAD654}

No difference in individual scores.
Stephen Brooks
2007-09-11 18:50:00
OK, thanks a lot for that.  [I'm glad the 0.75% difference was a RELATIVE error!]

We can also estimate the standard deviation: with the current number of particles it is about 0.030% (absolute; 2.7% relative), and 0.014% (1.2% relative) for a rechecked result.

Remembering that Muon1 averages the central 3 values out of the 5 rechecks, your correct 1.142857 comes from runs #1,3,4. The value 1.151338 from AMD.borg can be reproduced by taking the average of runs #1,3,5, which can be explained if run#4's value for him was replaced by an erroneous larger value, leaving those as the "central" three.

So we see evidence of "alpha particles" or bit errors - but I carefully chose the "average of the central three" technique to guard against 1 erroneous simulation in 5, so all it does is move the result a small way well inside the statistical range.  This could be improved by a double/triple-checked sample file system, which will probably require a future update of the client.

The business of getting two "clone" results with the same checksum indicates there is some crowding going on, with the program checking the same genome twice, and because Muon1 is deterministic (seeded random) now, it really does give you exactly the same result.  That is one I will see if I can rectify immediately (though I'm trying to just release a catch-up/maintenance version soon).
2007-09-12 14:57:43
1.152491 (612.1 Mpts) [v4.43d] <DecayRotB> #time=76958; by [TA]nefariouscaine

1.164966,1.086145,1.134550,1.145838,1.166066 (612.3 Mpts) [v4.43d] <DecayRotB> {68016699AD4A315988247460}

1.152491 vs. 1.148451

Again today's highest sample result cannot be replicated.  Perhaps it would be instructive, and beneficial to future versions, if an analysis of the optimizer's sensitivity to these erroneous results could be carried out.
Stephen Brooks
2007-09-12 16:33:07
Well it's hopefully not too sensitive to a relative difference of 0.4% like the above, because to achieve those sorts of accuracies statistically we'd need to run millions of particles per simulation.  So it is going to have to cope with some noise whatever we do, and I wrote the whole optimiser with noise in mind, so I don't try to do "unstable" things that rely on the results being accurate to 6 decimal places.  I have done a few tests before, but whether the noise actually hampers the optimiser depends on a lot of factors in the problem.

Right now I'm running one of the PhaseRotDD sample results on versions 4.43c and d, and "v4.43e", which is actually the development version soon to become 4.44. I'm looking at repeatability issues.  The results ought to be repeatable now, except for the ones like you have above, which look like they're caused by a bit-flip error on, for example, an overclocked system.
Stephen Brooks
2007-09-13 14:37:52
I've found that in the current (beta) codebase, results are repeatable until you change the number of threads Muon1 is configured to use, at which point they seem to use the random seed differently.  I'll try and fix this, hopefully it'll clear up some other issues.
2007-09-15 00:51:07
Stephen - is there any way to identify a "bit-flip error" in the results?  (I'm guessing not)

Just watching the behavior of my clients - it seems like these may in part be helping us to achieve that "dandy" stair-stepping approach to this optimization.  The client tries repeatedly to optimize a design that had yielded a higher muon percentage than it should have... and gets "stuck." Had it not been for the error, that path may have been abandoned by the client sooner - or... am I missing something?

I took one client offline, and deleted the samplefiles.  It's almost tempting to delete results.dat as well - and let it develop its own path.  I certainly wish I had a few Teraflops to toss at that approach - but alas - only 3 machines... so I must get creative.

Stephen Brooks
2007-09-18 20:27:57
I have just found something a bit odd (and annoying) while testing the code.  Does anyone know how the "rounding mode" is set on an x87 FPU when there are multiple threads running?  I'm worried that for certain functions such as taking the floor() of a value, that function temporarily puts the FPU into a different rounding mode than usual, but if another thread takes control during that time, the calculation in that other thread also inherits the different rounding mode!

There ought to be some way to stop this... Right now my repeatability tests find the last few significant bits of the particle coordinates have been twiddled between 1 and 4 threads.
2007-09-18 21:31:53
*waves to Dan*

Can someone explain what is meant by 'random seed' ?

>>> results are repeatable until you change the number of threads Muon1 is configured to use, at which point they seem to use the random seed differently<<<

What's the consequence of this?
Stephen Brooks
2007-09-18 23:38:23
Not sure at present whether this floating point rounding error is the same as the "using the random seed differently" error.  One produces an error that is equivalent to running a valid simulation, but with a different implementation of the random number generator; the other produces an error that is possibly even smaller still.

As for more annoyances, starting tomorrow I'm on jury service, which probably lasts 2 weeks.  I can check in to fix the stats thing, but probably won't be able to release a new version until after I'm back.
2007-10-12 12:19:42
Stephen, are you any closer to releasing the proposed maintenance upgrade? 
Stephen Brooks
2007-10-16 19:49:17
Yes, I need to get back to that.  In the interim I'd done a bit of work on improved muon magnets (how to bend muons without the beam spraying around) that could go into Muon1 later.  It looks like tomorrow is going to be dedicated to rather murky debugging.
Stephen Brooks
2007-11-09 16:50:03
OK just a quick update.  I emulated the "floor" function beautifully without inducing FPU flag changes.  Annoyingly, using floor_safe instead of floor did absolutely nothing to the discrepant behaviour in my run between 1 and 4 threads, so it's back to the drawing board, and most likely ripping the guts out of Muon1 until I see what particular duodenum is causing the difference between the two.
2007-11-09 23:50:39
thanks for the update Stephen!  /happy hunting
Stephen Brooks
2007-11-12 10:10:55
The end result of last Friday was that I could vary the number and location of the discrepant particles (between 1 and 4 threads) by commenting out parts of one of the innermost routines in Muon1, and today will concentrate on tracking exactly which bits of it are compiling to inconsistently-running code.
Stephen Brooks
2007-11-27 12:47:52
Fixed it!

It turns out Windows decides that when it creates a new thread it will set the FPU precision to 53 bits (64/double) even though the default behaviour for the main thread of programs is to use 64 bits (80/long double).  This has to be one of the stupidest bits of design I've seen for a while, sadly only beaten by the behaviour of Direct3D that changes the FPU to 23 bits (32/float) when you initialise the library!

This is fixed by starting each newly-created thread with the call
_controlfp(_PC_64,_MCW_PC); // #include <float.h> for this

Anyway I'm now getting consistent results when I vary the numbers of threads.  Testing of the new version is in progress but I expect it will differ slightly from the old 4.43c and 4.43d version just because of logical changes to the algorithm.
2007-11-28 06:16:37
Bravo Stephen!  Nicely done. 

Told you so.  Told you you'd find it.  LOL.