stephenbrooks.org : Newest High Muon1 Yield

Jwb52z 2002-12-05 15:12:22	Stephen, can you have this newest high checked so we can know if it is real this time?
Stephen Brooks 2002-12-05 15:34:14	If it's above 3.2%, it's probably not real. I'm thinking about making the next version of Muon re-check its results, so we'll actually start from scratch again (or at least a new sub-column on the stats table). There will also be a way that people can put the old designs into a queue for the new version to check out, so we don't have to evolve all the way from 0% again. "As every 11-year-old kid knows, if you concentrate enough Van-der-Graff generators and expensive special effects in one place, you create a spiral space-time whirly thing, AND an interesting plotline"
Zonar 2002-12-06 04:33:53	quote: Originally posted by Stephen Brooks: I'm thinking about making the next version of Muon re-check its results, so we'll actually start from scratch again (or at least a new sub-column on the stats table). There will also be a way that people can put the old designs into a queue for the new version to check out, so we don't have to evolve all the way from 0% again. How do you want to do that checking? Doing every run twice, or only when it's above a certain value (probably when it's among the best results in the results.dat file)? Or do I misunderstand you?
DukeBox 2002-12-06 08:49:18	How are the 'false' high muons handled? Do people still get the higher credit for them (and the multiplier too)? If so, I'll quit Muon, because the stats and contest element is the only reason I'm participating.
David Bass 2002-12-06 12:10:22	If you read the discussions concerning the statistical analysis of the results so far, you will see that some progress is being made to resolve the issue of abnormally high readings. I would say that, at the moment, there are only a very few readings that appear to be so far out of band that they are likely to be real errors. The remaining high values are likely to be artefacts of the random nature of the simulation - when many systems are converging on the same solution, then the throw of the dice will ensure that the same factory configuration will produce results that will occasionally differ by significant margins. That is merely the "luck of the draw". As for better statistics, the current method for scoring simulations is, with all due respect to Stephen, a poor one produced in a hurry to reduce the impact of the non-linear effort requirements for high-efficiency factories, although it is far better than the original, raw count of particles simulated. A similar method to that adopted by UD of measuring CPU time and weighting it by a semi-arbitrary hardware power estimate to produce "points" might be appropriate, given that the nature of the simulation precludes an accurate estimation of the effort expended based on information currently returned. The problem in implementing it is likely to be one of the programming effort involved. Phew, what a long one.
AySz88 2002-12-06 12:30:24	Last I heard, there are fewer than 20 of these abnormal results, so they really don't affect the statistics and ranking much at all.

Herb[Romulus2]
2002-12-06 13:09:04

Even if it's a false positive result, the credit given isn't that far wrong. It may be inaccurate, but the inaccuracy is distributed evenly.

I think it's still a fair contest, and all participants are treated in the same manner.

For a one-man project of this kind, I think it's simply great.

Your mileage may vary.

-------------------------------
I'd say more, but I can't reach the keyboard from the floor.

DukeBox
2002-12-06 13:24:42

So, why not manually edit your results.dat and change all your 3.0x% to 3.5%, which has a higher multiplier, so you get more points?

It's just too easy to cheat!

Sorry, but I'm afraid a lot of people will 'use' this bug to get higher in the stats.

Stephen Brooks
2002-12-06 15:02:43

quote:
Originally posted by Zonar:
How do you want to do that checking? Doing every run twice, or when it's above a certain value (probably when it's among the best results in the results.dat file)?

Yes, that's the idea. If it's already one of the lower results then no need to do anything: it'll be low regardless of any fluctuation. If it's near the top then I would make it be re-tested another 3 or 4 times (although users could specify an even higher number if they want), and then the final result that gets sent through would count for 4 or 5 times as many particles to give credit for that extra work, and would have as a percentage probably some sort of average of the results of the individual runs. I say "some sort of" average because I'd not want to average in the odd run that was an extreme outlier caused by a computer error. The median of 5 would do, or perhaps the mean of the middle three.

Probably a good heuristic here would be to re-test if a design scores within 0.1% of the best result in results.dat on its first run (it may lose a few by chance, but much lower and we'd be re-checking everything, which is inefficient if all I'm interested in is the best). Note that the highest result in a file such as this will _always_ have been re-tested.

So with that in place, it'd be fairly simple to find the highest _and_ most-reliable (as in most-rechecked) results and use those as the final recommendation.
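As a rough illustration of the re-check rule and the "some sort of average" described above, here is a minimal Python sketch. It is not Muon's code: the function names, the default of 5 total runs, and the sample yields are assumptions made for the example.

```python
from statistics import median

# Assumed threshold: re-test if the first run lands within 0.1
# percentage points of the best yield currently in results.dat.
RECHECK_MARGIN = 0.1


def needs_recheck(first_yield, best_yield, margin=RECHECK_MARGIN):
    """Return True if a first-run yield is close enough to the best to re-test."""
    return first_yield >= best_yield - margin


def mean_of_middle_three(yields):
    """Drop the lowest and highest of five runs and average the rest."""
    if len(yields) != 5:
        raise ValueError("expected exactly five runs")
    middle = sorted(yields)[1:4]
    return sum(middle) / len(middle)


def combine_runs(yields, particles_per_run):
    """Combine repeated runs into one result.

    The yield is the median, so a single glitched run cannot move it,
    and the particle credit covers every run that was actually done.
    """
    return median(yields), particles_per_run * len(yields)


# Hypothetical example: four plausible runs and one glitched outlier.
runs = [3.01, 3.02, 3.56, 3.00, 3.03]
final_yield, credited_particles = combine_runs(runs, particles_per_run=10000)
```

With either the median or the mean of the middle three, the 3.56 outlier in the example has no influence on the combined yield, which is the property wanted here.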

~~~~~

In v5 I was planning to count the number of particle-timesteps done in a simulation, which will be fairly close to proportional to raw FLOP count. This will be a huge number (I'd guess roughly 1 billion per simulation) so I'll scale it down by 10^6 and report those instead of the "number of particles" field. Anyway that will make the scoring pretty much as fair as it gets for a complex DC project.
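The particle-timestep idea can be sketched in a few lines of Python. This only illustrates the bookkeeping; the per-step survivor counts are made-up numbers and the function names are hypothetical, not part of any Muon version.

```python
def particle_timesteps(alive_per_step):
    """Sum the number of particles still being tracked at each timestep.

    Each tracked particle costs roughly the same arithmetic per step,
    so this total is a proxy for the work the simulation performed.
    """
    return sum(alive_per_step)


def reported_work(timesteps, scale=10**6):
    """Scale the raw count down by 10^6 so the reported number stays manageable."""
    return timesteps / scale


# Hypothetical run: particles are lost over time, so later steps cost less.
alive = [1000, 800, 500]
raw = particle_timesteps(alive)
```

A simulation that loses most of its particles early therefore earns less credit than one that tracks them to the end, which is the effect a plain "number of particles" count misses.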

I've cheated and gained about 10000 points on UD by running Muon at a slightly higher priority, because UD counts "effort" as wall-clock time multiplied by some factor.

Mostly because I couldn't be bothered to uninstall the UD agent so it just ticked away, thinking it was getting 400MHz!

John Kitchen was worried about the current ad hoc "bonus multipliers" system so plotted a graph of CPU time per point against yield. I think it turned out that the higher percentage runs are actually looked on slightly more favourably than the lower ones, encouraging use of the best1000 etc. The fast/low random results were scoring a bit more than they should, but unless people do some rather complex cheating, they'll get a fixed 25% of those in the current client.

~~~~~

--[So, why not manually edit your results.dat, and change all your 3.0x% to 3.5% which has a higher multiplier, so you get more points ?]--

You could do this at any stage. I haven't yet installed any anti-cheat measures (apart from repeat removal), although from the look of the results it doesn't seem like anyone is cheating. I've already written the checksumming code, as it happens, and what remains is to finish off the remaining v4.22 features, debug and compile it.
But to be honest, since this project has already found something very near the top design, I haven't seen a lot of point in doing this. However, my supervisor has said that if I count the hours I spend on coding these versions this holiday (even though I'm not formally employed by them until Easter), I'll get paid consultancy rates for it, so you see where I'm headed...

But I would hazard a guess that there is little or no deliberate cheating on this project and all the high-ranking users really are genuine. I haven't seen any "obvious" cheats like large numbers of results with particles or % muons in great excess of what they should be: the only times they are unnaturally high are so rare and distributed amongst users in such a way that they appear to be glitches and don't really affect the stats anyway. If you don't like it as it is, come back mid-2003? I'll have been able to get my act together by then.

[DPC] Jiriki 2002-12-07 08:18:04	I'm one of the people who probably had a glitch, bug or whatever. Should I delete the result (3,560326) from my results.dat file? BTW, if it helps, I got this result using a self-compiled Linux client. All other results from this client looked normal, but I've stopped using it for the moment and switched back to the supplied one.
DukeBox 2002-12-07 11:47:52	I was using the original client for all my doubtful results. With my own client, which has no prediction and save implementation, I never get strange results. Maybe the bug is in the '.sav' file, which can become corrupted when shutting down while saving? The only thing I would like is for all 'high' results to be recalculated so we can see who has the real highest muon.
Stephen Brooks 2002-12-07 16:30:46	If you have a suspicious result, one way to tell if it's fake is to get another Muon installation, create a results.dat file containing ONLY that result, and then let that one run a few times. If in 10-20 runs it never gets anything near your original muon percentage, it's probably wrong. I'd suggest deleting any of these strays that you notice, although I'm not saying you _have_ to go and search your result-files for them. The SAV file issue might be causing this, and the current source I have here uses an atomic "rename" operation (does it? - checks - yes it does), which I think is going to reduce the likelihood of it ever reading in a corrupted file.
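
For readers unfamiliar with the atomic "rename" trick mentioned above, here is a minimal Python sketch of the general technique. It is an illustration only, not the Muon source: the function name and the file name muon.sav are assumptions.

```python
import os
import tempfile


def save_atomically(path, data):
    """Write data to a temporary file, then rename it over the target.

    A reader sees either the complete old file or the complete new one,
    never a half-written save, even if the program is killed mid-write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target
    # for the rename to be atomic, so create it in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomically replaces any existing file
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


# Hypothetical usage: overwrite a save file twice in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    sav = os.path.join(scratch, "muon.sav")
    save_atomically(sav, b"first state")
    save_atomically(sav, b"second state")
    with open(sav, "rb") as f:
        final_contents = f.read()
    leftover_files = os.listdir(scratch)
```

If the client is shut down during the write, only the temporary file is incomplete, and the previous save is left untouched.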

David
2002-12-10 03:24:51

quote:
Originally posted by Stephen Brooks:
[snip]
In v5 I was planning to count the number of particle-timesteps done in a simulation, which will be fairly close to proportional to raw FLOP count. This will be a huge number (I'd guess roughly 1 billion per simulation) so I'll scale it down by 10^6 and report those instead of the "number of particles" field. Anyway that will make the scoring pretty much as fair as it gets for a complex DC project.

This is hugely better - ISTR that you contemplated doing this back when the issue first came up, but hacked in the current system to get by.

quote:

I've cheated and gained about 10000 points on UD by running Muon at a slightly higher priority because UD counts "effort" as wall-clock time multiplied by some factor. Mostly because I couldn't be bothered to uninstall the UD agent so it just ticked away, thinking it was getting 400MHz!

Well, no method is perfect - I suspect that the one you have indicated is better than most.

Looking forward to getting going with a new client version.

David