Top results database!
[DPC]Stephan202
2003-05-20 13:52:28
The DPC have made a collection of their best results.  This database is as of now open to the public!

Visit http://stephan202.qik.nl/ to download the results and add your own.

Note!  Some people on this forum believe that this method is not good for the scientific side of the project!  Please read about the pros and cons and make your own decision!

---
Dutch Power Cow.
MOOH!
Herb[Romulus2]
2003-05-20 21:57:07
That's a kind move of DPC, very much appreciated Smile

Saves me from messing around with our own "250 best of" manually every day Wink

-------------------------------
I'd say more, but I can't reach the keyboard from the floor.
MaFi
2003-05-21 07:24:06
hi
i can't reach "stephan202.qik.nl" via http.  i can ping it (137.224.222.232, is this right?), so i think it's not a typing error.

markus
Herb[Romulus2]
2003-05-21 08:08:00
F***** up this afternoon, but now working again.

Impressive progress in the database over the last 8 hours Smile

-------------------------------
I'd say more, but I can't reach the keyboard from the floor.
Pollock[Romulus2]
2003-05-21 08:41:38
Thanks Stephan, great idea!  It is nice to see someone not self-serving and doing something to better the project.  I am sure that most of us running the muon project are doing so because it accomplishes a REAL purpose rather than just a silly contest to put money in somebody's pocket by finding worthless keys, etc.
Thanks, again.

Suggestion: It may be wise to remove the 13.525438 result from the top of the list.  It appears that it is likely a 'rogue' result and may be contaminating many others.  If people leave that result in their results.dat file, none of the other results below it are re-checked.  It could spawn thousands of unknown 'rogues' that get past the verification check. 

In my case, I just cut out the 50 best results over 1000 Mpts and used that as a .dat file.  In the future, I will send you everything that can be verified.  We all want to reach a solution, but I hope we can do it without a massive re-check to make sure it is a good solution.

[This message was edited by Pollock[Romulus2] on 2003-May-21 at 17:05.]
[DPC]Stephan202
2003-05-21 11:06:31
At times the server may be down.  The server is also used for online games and the script sometimes asks for a 100% CPU load Wink

As for removing some results from the database: I will need to extend the update script for that purpose, which will make it even slower.  I will think about it and consider some filter methods.

What do you think about:
- only accepting V4.31c results?
- only accepting rechecked results?
- only accepting results with a reasonable yield*runs:mpts ratio?

---
Dutch Power Cow.
MOOH!
Pollock[Romulus2]
2003-05-21 12:35:23
- only accepting results with a reasonable yield*runs:mpts ratio?
That would be the closest solution, but it would likely not work at this point.  Most of the fresh results are not being checked, though, so that would not really be feasible.  I think the only real solution would be a change in the software to allow broader re-checks.  It would slow things down, but it would also be more accurate. 
Maybe others would have some better ideas?

It looks as if some 13.7-13.9% are already showing up, so maybe that will open things up.
[DPC]WAU-Spons
2003-05-21 14:00:31
Don't forget: accepting only 4.31c will mean that all unix results are also rejected.  I think the other 2 options will rule out most rogues.
Bill[Romulus2]
2003-05-21 15:26:26
If unix results are to be ignored, please notify us in advance.  No point in running a box whose results are worthless.  I'm currently using the Solaris port myself and liking it.
Pollock[Romulus2]
2003-05-21 16:12:24
There are some Pentium 4 users that have been having problems with v4.31c and have switched back to the "b" version, too.  Killing those results is not a good option.  Many of the "b" results have been verified as good.  I would think that a bad result from the b version would have already spawned into the c version, anyway.  My concern is the lack of re-checks.  At this point, it is hard to tell what to trust, because very few of the new results are being verified.  Unless somebody can think of a better solution, we will have to run it and hope for the best. 

I am running only a few of the verified results, with some 12.00% results from our (now) outdated team file mixed in and hoping for some fresh results.  It is running between 12.50-13.35 for now.  I'll send the good ones.
Stephen Brooks
2003-05-22 04:29:59
quote:
Originally posted by Pollock[Romulus2]:
- only accepting results with a reasonable yield*runs:mpts ratio?
That would be the closest solution, but it would likely not work at this point.  Most of the fresh results are not being checked, though, so that would not really be feasible.

No, it's entirely possible with the v4.31b results too.  Just count non-rechecked results as ones that have been run 1 time.  The only difference the lack of rechecking will make to that ratio is to increase the variation of it a bit, but it will still eliminate any dubious results.  I'd only cull results where the amount of calculation (Mpts) seems to be much _lower_ than it should have been, though.
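A minimal sketch of the ratio check described above, assuming each result carries a yield in percent, a total Mpts figure and an optional recheck count.  The function names and the `min_mpts_per_percent` argument are hypothetical, and no threshold value is implied here.

```python
from typing import Optional


def mpts_per_percent(mpts: float, yield_percent: float,
                     runs: Optional[int] = None) -> float:
    """Mpts of calculation per percent of yield, per run.

    A result that was never rechecked (runs is None or 0) is counted
    as having been run one time.
    """
    if yield_percent <= 0:
        raise ValueError("yield must be positive")
    effective_runs = runs if runs else 1
    return mpts / (yield_percent * effective_runs)


def looks_undercomputed(mpts: float, yield_percent: float,
                        runs: Optional[int],
                        min_mpts_per_percent: float) -> bool:
    """Flag only results with much LESS calculation than the yield implies.

    Results with an unusually high ratio are deliberately not flagged.
    """
    return mpts_per_percent(mpts, yield_percent, runs) < min_mpts_per_percent
```

Only the lower side is tested, so a result that took more calculation than expected is never culled by this check.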
quote:
There are some Pentium 4 users that have been having problems with v4.31c and have switched back to the "b" version, too.  Killing those results is not a good option.  Many of the "b" results have been verified as good.  I would think that a bad result from the b version would have already spawned into the c version, anyway.

For now, there isn't much from the new version so far, so killing the old results might not be an idea just yet.  As for bad results "spawning" more bad results into a new version, that is complete fantasy.  The design may be picked up by the newer version, but it will be simulated correctly like the others, so will be given a low score if it deserves a low score.
If you want more rechecking to go on, it's best to start again with a reduced results.dat that only contains re-checked results from v4.31c, just in case there were some abnormally-high v4.31b results that were convincing the program higher yields had already been reached.

Today's weather in %region is Sunny/(null), max.  temperature #NAN°C
MaFi
2003-05-22 06:12:56
hello
i composed a file as stephen suggested.  you can find it here:
http://de.geocities.com/ma_fi_78/muon/engl.html
happy simulating Big Grin
markus
Herb[Romulus2]
2003-05-22 08:04:36
Perhaps Stephan202 can add a filter to his script as an additional option??

I've compiled 163 so far here

-------------------------------
I'd say more, but I can't reach the keyboard from the floor.
Pollock[Romulus2]
2003-05-22 09:13:18
As for bad results "spawning" more bad results into a new version, that is complete fantasy.

Glad to hear that.  Maybe I have seen too many bad horror movies!  It was never meant to start the pot boiling, just to point it out, so people can decide what to keep or remove.  Apparently, it depends on personal preference.  Maybe with different people running different revisions of the .dat file, we can spin off a wider variety of results. 

Stephan202 and DPC, MaFi & Herb, Thanks guys, great work.
[DPC]Stephan202
2003-05-22 09:33:31
I will take a look at my code tonight to see if I can easily implement Stephen's suggestions without too much slowdown to the update script.

As can be seen, simply only publishing rechecked results won't work - they can be erroneous too.

The only way to properly filter results will thus be by 'yield*runs:mpts ratio'.
I will use Stephen's graphs to decide proper upper- and lower boundaries for this ratio.  Or should I only filter results with too few mpts?

---
Dutch Power Cow.
MOOH!
[DPC]Stephan202
2003-05-22 09:58:25
All right.  I just took a good look at the graphs, made some calculations, and I think it would be fair to require at least 18.5 Mpts for each %.
This would mean that:

Mpts / (yield * runs) > 18.5

Stephen, can you agree on this?

---
Dutch Power Cow.
MOOH!
MaFi
2003-05-22 10:34:17
and maybe an upper bound of around 25 Mpts/% ... ?
markus
Pollock[Romulus2]
2003-05-22 10:38:54
Stephan, it appears to me that there are only 2 obviously odd results in that list.  The results being generated are climbing gradually and may soon reach that 13.52 mark, then the rechecks will begin, anyway.  Unless there are some newer 'rogues' that pop up at the top of the list, none of this will matter.  It may be best to just leave it alone, unless more of them sprout up.  You have my apology for stirring the pot.
[DPC]Stephan202
2003-05-22 10:52:54
Lol, don't feel sorry for this, I don't mind doing a little scripting.  The only problem I'm afraid will show up is an enormous run time.  I can't be sure about that at this time though.

Btw, I would personally go for an upper limit of 21.5 or 22 Mpts/%

Also, why shouldn't I just create another type of top result file next to the six that are currently available?  This special file would only contain rechecked results which match the criterion of 18.5 to 22 Mpts/%.

If this would mean an increased run time then I can just set the update script to update this file less frequently.

Oh well, I'll see what I can do.  This is the theory, now on to the code.

---
Dutch Power Cow.
MOOH!
Herb[Romulus2]
2003-05-22 11:49:14
From my experience so far (also with the earlier version), it just doesn't make much sense to look over more than the top 250 results.  My continuous increases in quality always came purely from replacing the results.dat with a newer/better one (around 250 results or fewer). 
You can check that easily by comparing both of my versions 4.3 and 4.2 results of particles/results ratio.  I think I'm relatively the leader of the pack Big Grin

Today I've switched strategy Smile
I've set up a spreadsheet to see, from only the top 5 results (DPC), which individual parameters promise the best acceleration in quality.  4 boxes are continuously rechecking them (see my link above concerning the rechecked top results).

I have another 4 boxes just free-floating against the standard top 250 from DPC, just so as not to miss something better than what has already been achieved Big Grin

Hope all this crap makes any sense to you Cool opening the next can Wink

-------------------------------
I'd say more, but I can't reach the keyboard from the floor.
Pollock[Romulus2]
2003-05-22 12:43:53
Good point, Herb.
Anything over 250 results is really a waste of storage space.  It doesn't seem that the client even reads all of a 250 file.  Anything less than that might narrow things down too much if everyone used the same one. 

Stephan, if you want to try the filter, 21.5 would work for now, but it seems that 22 would weed out anything abnormal and allow for growth.  Cutting off the 500 and 1000 files may speed things up and make up the difference.
[DPC]Stephan202
2003-05-22 13:37:23
I just finished the script.  The slowdown seems to be minimal, so no other changes had to be made.  The script filters out all results that do not satisfy the 18.5-22 rule.  It does not create separate top files for this method: all results downloaded from the up- and download page will satisfy the 18.5-22 rule.

Not yet though.  I'm not hosting the script myself and I can't reach the person who does at this moment.  So you'll have to wait till tomorrow Razz
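For anyone who wants to apply the same rule to their own file, here is a rough sketch of the 18.5-22 filter.  It is not the actual update script: the `Result` fields and function names are made up for illustration, and treating the bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Result:
    yield_percent: float  # muon yield, in percent
    mpts: float           # total calculation, in Mpts
    runs: int             # number of runs (1 if never rechecked)


def within_band(r: Result, low: float = 18.5, high: float = 22.0) -> bool:
    """True if Mpts / (yield * runs) lies inside [low, high]."""
    ratio = r.mpts / (r.yield_percent * r.runs)
    return low <= ratio <= high


def filter_results(results: Iterable[Result],
                   low: float = 18.5, high: float = 22.0) -> List[Result]:
    """Keep only the results whose ratio falls inside the band."""
    return [r for r in results if within_band(r, low, high)]


# Illustrative values only, not real results.
sample = [
    Result(13.0, 1300.0, 5),  # ratio 20.0 -> kept
    Result(13.5, 900.0, 5),   # ratio about 13.3 -> dropped (too few Mpts)
    Result(12.0, 1500.0, 5),  # ratio 25.0 -> dropped (too many Mpts)
]
kept = filter_results(sample)
```

Passing different `low` and `high` values lets the same function be reused if the range is adjusted later.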

---
Dutch Power Cow.
MOOH!
[DPC]Stephan202
2003-05-22 13:57:34
I just contacted [DPC]Ingmar (the hoster of the script) and the script is now updated.
As you can see it already did its work.  Some of the highest yields are removed.  This explains why the highest result in the file hasn't been rechecked 5 times.

http://stephan202.qik.nl/

---
Dutch Power Cow.
MOOH!
Pollock[Romulus2]
2003-05-22 14:06:22
Nice job, if anybody complains about that, throw something at them.  You have done enough work.
Stephen Brooks
2003-05-22 16:00:35
quote:
Originally posted by Pollock[Romulus2]:
As for bad results "spawning" more bad results into a new version, that is complete fantasy.

Glad to hear that.  Maybe I have seen too many bad horror movies!  It was never meant to start the pot boiling, just to point it out, so people can decide what to keep or remove.

Sorry for sounding annoyed in my post back there... I just want to make sure we don't get "muon myths" developing Smile

Seems like everyone has been very busy since I last looked at this thread!  What you're doing seems to be fine.  In v4.31c there appear to be very few rogue results anyway, and none with dangerously-high yields, so you should be pretty safe, especially with the Mpts-ratio filtering thing implemented.

Incidentally I've been impressed with the amount of amateur results-mongering that has gone on with Muon, and now with the checksum system it's pretty safe too!  - the worst thing that would normally happen is you'd lose some results if you corrupted them accidentally.  I suppose this is the benefit of using an "open" or human-readable result format.  I could have used XML but that would have expanded the files to about 5x their current size, or going in the other direction, there's the possibility for smaller binary format files to reduce network load and for fast indexing here.  Maybe some time I'll make some converters to change results files between those possible formats.
Right now I've got some people complaining about a crash in v4.31c though... and then when I fix that I can release a newer commandline porting code.  I want to get the Muon stats on this site a little more operational too, but I have exams coming up on June 3/5 and then a (Neutrino-Factory-related) trip to New York for a week.  However they will come later this summer.

Today's weather in %region is Sunny/(null), max.  temperature #NAN°C
Pollock[Romulus2]
2003-05-22 17:48:12
quote:
Sorry for sounding annoyed in my post back there... I just want to make sure we don't get "muon myths" developing


Understandable, I was actually trying to do the same thing.  If a large number of 'rogues' suddenly appeared, there would be accusations of cheating and Stephan202 & Co.  would have taken a beating for trying to help.  It was just ignorance on my part.  Thanks for clearing things up. 

Most of the tweaking we do on Rom2 is just friendly competition among us.  We have no chance of competing with DPC and others in the total points, so we have fun with the highest result when we can.  It makes things more interesting.
[DPC]Stephan202
2003-05-23 10:50:20
This is a graph of the distribution of Mpts per yield-%:

[Graph: distribution of Mpts per yield-%]

As you can see I changed the range from 18.5-22 to 19-22.5.  It seems we had accidentally thrown away some good results within the 22-22.5 range.  There was nothing below 19, so that's why I raised the lower boundary.
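A distribution like this can be tallied with a few lines of code.  The sketch below is only an illustration, assuming the Mpts-per-% ratios have already been computed; the function name and the 0.5 bin width are my own choices, not what the site uses.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def ratio_histogram(ratios: Iterable[float],
                    bin_width: float = 0.5) -> Dict[float, int]:
    """Count ratios per bin; each key is the lower edge of its bin."""
    counts: Counter = Counter()
    for ratio in ratios:
        lower_edge = math.floor(ratio / bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


def print_histogram(hist: Dict[float, int]) -> None:
    """Print one text bar per bin."""
    for lower_edge, count in hist.items():
        print(f"{lower_edge:5.1f} | {'#' * count}")
```

Empty bins at the low end and occupied bins just above the old upper limit are the two things to look for when choosing the boundaries.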

I will make a nice page for this graph, perhaps reorganize things a little, and then it will be published on the up-/download site and be updated every hour.

---
Dutch Power Cow.
MOOH!

[This message was edited by [DPC]Stephan202 on 2003-May-23 at 20:07.]
Pollock[Romulus2]
2003-05-23 15:30:39
By just glancing through the list, I didn't see anything that close to x22, but you seem to have a good handle on it. Smile  The 22.5 limit will leave more growth room, anyway.  The numbers seem to be growing a few points per day.  Both of mine are running 13.40-13.51% regularly and slowly climbing.
Herb[Romulus2]
2003-05-24 05:35:14
I watched that slow increase already as we were still in the 12% region Wink Indeed I'm missing a few of these very good ratio results, however they will come back I think.

Very good work everybody Smile

-------------------------------
I'd say more, but I can't reach the keyboard from the floor.
Ulan
2003-06-16 23:37:40
Hmph. 

I downloaded Herb's top results file (http://webwi.de/data/short.zip) and unzipped it to a fresh install of linux muon.  I manualsent my results a day later.  Problem is that the stats server thinks that i've made a huge 300,000 point update!  I think it's reading Herb's results combined with mine

What to do to fix it?  It looks like i'm a cheat, i only usually output a maximum 60,000 points a day.  Thanks
[DPC]Stephan202
2003-06-16 23:57:52
Did you save Herb's file as results.dat, not results.txt?  If not, then that's the problem.  I don't know if Stephen can easily fix this.  Perhaps with a dupe check, but I don't think he does cross-user checking.

---
Dutch Power Cow.
MOOH!
Ulan
2003-06-17 00:11:14
Ahhh, my bad

Pity for me that there isn't any protection against stupidity like this
[DPC]Stephan202
2003-06-17 01:16:34
In essence the .dat and .txt file are formatted the same.  So there's no telling whether something is .dat- or .txt-data.

---
Dutch Power Cow.
MOOH!
Herb[Romulus2]
2003-06-17 07:40:00
If you've just unzipped the file, it has a dat extension and there is no harm.  If you saved it as txt, I think it will be ignored, as all the results have an already known checksum and that's the only thing that is compared, I think.

-------------------------------
I'd say more, but I can't reach the keyboard from the floor.
Ulan
2003-06-18 17:07:36
Hmm, maybe you are right Herb.
9 Hours ago i totally deleted my results.dat and results.txt due to segmentation faults, but in that time my new results.txt has grown to 220K, mostly due to low particle rates (and the fact that the linux port is just sooo much faster).

I also checked how many results were accepted when I got that 400,000 point boost - 400 results, which is around normal anyway, but i only ever got 40,000 to 50,000 points for previous flushes. 

Why now, after using your results.dat, did I get 10 times more points for the same number of results?
Mike Malis
2003-06-18 19:42:48
Ulan, if you are running simulations that produce a higher yield of muons then there are more points of calculation required per simulation for the greater number of muons that are not "lost".

Mike Malis
Pollock[Romulus2]
2003-06-18 20:21:57
That cannot possibly be right.  If they were 400 junk results, the point total would be <10,000 Mpts.  If you were running Herb's 'best 250' as a .dat file, it is highly unlikely that you would generate 400 results in that time-frame.  I run 2 Athlons (1.2 GHz & 1.7 GHz) and would be lucky to generate 100 total results in 24 hours with Herb's file.
To generate 400,000 Mpts, you would need to run at least 300 high-yield re-checked results.  Physically impossible in that amount of time without a gigantic herd of computing power. 
Herb's file contains 250 rechecked results totalling about 340,000 Mpts; that would account almost exactly for the difference.

No doubt that Herb's file was mistakenly sent in as a results.txt file.  Apparently, there was no dupe check done on it.
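The arithmetic can be written out as a quick check.  The figures are the approximate ones quoted in this thread (a 400,000-point update, usual flushes of up to about 50,000, and roughly 340,000 Mpts for Herb's 250 results); the variable names are only for illustration.

```python
reported_update = 400_000    # Mpts credited in the suspicious update
usual_flush_max = 50_000     # Mpts, upper end of Ulan's usual flushes
herbs_file_total = 340_000   # Mpts, approximate total of Herb's file
herbs_file_count = 250       # results in Herb's file

# Points that Ulan's own machines cannot plausibly account for.
unexplained = reported_update - usual_flush_max

# How far Herb's file is from explaining that excess.
gap = unexplained - herbs_file_total

# Average size of one result in Herb's file.
average_mpts = herbs_file_total / herbs_file_count

print(unexplained, gap, average_mpts)
```

The excess is 350,000 Mpts, within about 10,000 of the total for Herb's file, and the average works out to 1,360 Mpts per result, which is the right size for a high-yield rechecked result.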
[DPC]Stephan202
2003-06-18 22:41:07
Most likely because, as I said before, there is no cross-user dupe-check.

We'll wait for Stephen's explanation...

---
Dutch Power Cow.
MOOH!
Ulan
2003-06-18 23:36:52
Mike Malis, thanks for clearing up how the points system works.

It's looking quite certain that Herb's high-yield results were sent along with about 100 of my own, giving me that 400,000 point update.

I'm hoping that Stephen will possibly know of a way to correct my mistake Roll Eyes
Stephen Brooks
2003-06-19 09:19:36
quote:
Originally posted by Ulan:
Pity for me that there isn't any protection against stupidity like this

Unfortunately there isn't. If you can tell me the checksums of the first and last result in your "false" block, I might be able to go through and remove that chunk manually though.  Smile

If you think about it, dupe-checking the whole database is not a trivial assignment given that it's 1.24GB just for this current version.  There are ways to do it quickly but these do require quite a large amount of clever indexes that need to be built.  Perhaps I will do that at some later stage.  For now, just a bit of dupe-checking within each user seems just about enough to stop blatant cheating, but not some other sorts of errors like yours.
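To illustrate the idea only (this is not how the stats server works), a cross-user dupe check amounts to remembering which user first submitted each checksum.  The sketch below keeps that index in memory, which would not be practical for a database of this size without the on-disk indexes mentioned above; the function name and the sample checksum `0000AAAA` are made up.

```python
from typing import Dict, Iterable, List, Tuple


def find_cross_user_dupes(
    submissions: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Find results submitted by a user other than the first submitter.

    submissions: (user, checksum) pairs in arrival order.
    Returns (user, checksum, first_owner) for every cross-user repeat.
    Repeats by the same user are left to the per-user check.
    """
    first_owner: Dict[str, str] = {}
    dupes: List[Tuple[str, str, str]] = []
    for user, checksum in submissions:
        owner = first_owner.setdefault(checksum, user)
        if owner != user:
            dupes.append((user, checksum, owner))
    return dupes


# Illustrative submissions; only E29708F3 appears in this thread.
example = [
    ("Herb", "E29708F3"),
    ("Ulan", "0000AAAA"),
    ("Ulan", "E29708F3"),   # cross-user repeat
    ("Herb", "E29708F3"),   # same-user repeat, ignored here
]
flagged = find_cross_user_dupes(example)
```

A dictionary lookup per result is cheap; the hard part in practice is building and storing the checksum index for the whole database.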

Today's weather in %region is Sunny/(null), max.  temperature #NAN°C
Pollock[Romulus2]
2003-06-19 10:29:42
Sorry, guys.  Didn't intend to be rude, it has been a rough week.  Ulan, glad to see that you are honest.  Smile

Stephen, if the results sent in are kept in the same order that they are sent, it would be easy to remove Herb's numbers from that list.  Ulan's top result 13.759152 is an exact match for the result that was at the top of Herb's list at that time, it had been on the list for a few days before this happened.  The rest of the list would be in order with the lowest near the 13.62 range. 

The top result: 13.759152 (1346.4 Mpts) [v4.31c] {E29708F3}
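If it helps with locating the block, a summary line in the form shown above can be picked apart with a regular expression.  This is a sketch that assumes the layout "yield (Mpts) [version] {checksum}" exactly as quoted in this post; the real results file may be laid out differently, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional, Union

RESULT_LINE = re.compile(
    r"^\s*(?P<yield>\d+\.\d+)"            # yield in percent
    r"\s+\((?P<mpts>\d+(?:\.\d+)?)\s*Mpts\)"  # calculation in Mpts
    r"\s+\[(?P<version>[^\]]+)\]"         # client version
    r"\s+\{(?P<checksum>[0-9A-Fa-f]+)\}\s*$"  # hex checksum
)


def parse_result_line(line: str) -> Optional[Dict[str, Union[float, str]]]:
    """Parse one summary line; return None if it does not match."""
    match = RESULT_LINE.match(line)
    if match is None:
        return None
    return {
        "yield_percent": float(match.group("yield")),
        "mpts": float(match.group("mpts")),
        "version": match.group("version"),
        "checksum": match.group("checksum").upper(),
    }


top = parse_result_line("13.759152 (1346.4 Mpts) [v4.31c] {E29708F3}")
```

If that result had been run 5 times (an assumption, the run count is not given here), its ratio would be about 19.57 Mpts/%, inside the 19-22.5 range used by the DPC filter.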
Ulan
2003-06-20 21:12:58
I totally wiped the results.txt and results.dat and extracted a fresh top250 results.DAT to my muon directory two days ago... Eek
Ulan
2003-06-20 21:19:16
Thanks for the help Pollock.

Maybe it'll be a good idea to advise potential top results file downloaders not to make the same mistake I did - which was to extract the downloaded top-file as results.txt AND results.dat.  Wink