Herb[Romulus2] 2003-10-09 02:41:17 | I have a client with a queue that doesn't progress: the client produces normal results and ignores the queue. The queue was generated by the client and stopped yesterday evening at this point: ...;#runs=2;#queued=1; 9.180134,9.100697 (682.4 Mpts) [v4.32b] {18DA0521} What shall I do with that file? Somehow I have the strange feeling that I missed something. ------------------------------- I'd say more, but I can't reach the keyboard from the floor. |
[DPC]TeamNWW - Huub 2003-10-09 03:10:12 | This can't be a coincidence.. i have the exact same thing here.. .......alf=999;#runs=2;#queued=1; 9.265328,9.174261 (681.5 Mpts) [v4.32b] {40D9CE1E} After these two runs of the queue the client produces normal results and ignores the queue.. I already posted in: http://www.stephenbrooks.org/groupee/forums?a=tpc&s=724606111&f=144606111&m=1426069934 but this is probably the better place for this.. btw.. this queue was not client generated but a 'tweaked' one.. Second edit: I moved the queue.txt to another machine.. it is still ignored.. i will now try the commandline client and see if it gives some kind of error.. Third edit: this is what i get when i start the commandline: Muon1 started Hello, [DPC]TeamNWW New simulation Searching for auto-saved file... No file present Record rejected due to bad checksum: was {40D9CE1E} Reading genome scores... 1113 scores loaded Making new genome, TrialType=Random Done Interpreting lat............ [This message was edited by [DPC]TeamNWW - Huub on 2003-Oct-09 at 11:21.] Fourth edit: I stripped the "#runs=2;#queued=1;" and put a new "TEST [This message was edited by [DPC]TeamNWW - Huub on 2003-Oct-09 at 11:42.] |
Herb[Romulus2] 2003-10-09 04:38:26 | Hmm, if that happens more often, it's a waste of good effort then. Having a bad CRC in the middle of the normal process looks like a bug somewhere. |
[DPC]TeamNWW - Huub 2003-10-09 07:16:05 | Instead of doing an edit on the post above.. a new reply.. this is getting weird.. after stripping the "#runs=2;#queued=1;" and putting a new "TEST" under it, it didn't run even once.. I now put it on a third machine (was running 4.13 - sorry Stephen) and started commandline.. it started at least.. let's see if it will finish too.. |
Pollock[Romulus2] 2003-10-09 09:43:58 | Were those queues run with the 'b' or 'c' version? Having a similar problem here. When I start a test queue, it runs with no problem. The results are NOT reported to the queue.txt file, though. They are written directly to the results.txt file, but NOT the results.dat file. This has happened twice on different machines. Both have been caught and stopped after four runs, so no idea how many times they would actually run. It has created at least six dupes in the results.txt files, three from each test. The strange part of it is that I have run at least 25 other queues with no problem before this. The problem queues were just slightly modified versions of others that ran perfectly. Both were run on 4.32b, so I'm trying the 'c' version now to see if it continues. Only the command-line client has been used for all runs to this point, nothing has changed there. EDIT: The 4.32c version did the same thing but it also wrote the score to the results.dat file this time. I only hope that at least one of the runs will be credited. It may read all of them as bad checksums. [This message was edited by Pollock[Romulus2] on 2003-Oct-09 at 17:53.] |
Herb[Romulus2] 2003-10-09 21:10:55 | It was with the background / C version published here first in the forum. I need to check the results.dat for that, when I'm at work. |
[DPC]TeamNWW - Huub 2003-10-10 00:50:34 | Yes!! it worked this time! #runs=5; 9.191620 (1704.5 Mpts) [v4.32b] {1121486F} It did take an amazing 17 hours to run on a PIII-866 though.. It died on a 32c background/commandline (first machine), then died on a 32b background (second machine), but then did the complete 5 runs on a 32c commandline!! (third machine) first and second machine are P4 - third is a PIII.. |
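For anyone comparing queue lines by hand, here is a minimal sketch that pulls apart the visible tail of the lines quoted in this thread: run count, score list, Mpts, client version and checksum. The pattern is inferred only from those quoted lines; the full queue.txt format is not shown here, so the regular expressions and the names `parse_result_tail` and `ResultTail` are assumptions, not anything taken from the Muon1 client.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Pattern inferred from lines quoted in this thread (an assumption, not a spec):
#   <score>[,<score>...] (<Mpts> Mpts) [v<version>] {<8 hex digits>}
TAIL = re.compile(
    r"(?P<scores>\d+\.\d+(?:,\d+\.\d+)*)\s+"
    r"\((?P<mpts>\d+(?:\.\d+)?) Mpts\)\s+"
    r"\[v(?P<version>[^\]]+)\]\s+"
    r"\{(?P<checksum>[0-9A-Fa-f]{8})\}"
)
RUNS = re.compile(r"#runs=(\d+);")


@dataclass
class ResultTail:
    runs: Optional[int]      # value of #runs=, if present
    scores: List[float]      # one score per listed run
    mpts: float
    version: str
    checksum: str


def parse_result_tail(line: str) -> Optional[ResultTail]:
    """Return the parsed tail of a result line, or None if it does not match."""
    tail = TAIL.search(line)
    if tail is None:
        return None
    runs = RUNS.search(line)
    return ResultTail(
        runs=int(runs.group(1)) if runs else None,
        scores=[float(s) for s in tail.group("scores").split(",")],
        mpts=float(tail.group("mpts")),
        version=tail.group("version"),
        checksum=tail.group("checksum").upper(),
    )


if __name__ == "__main__":
    print(parse_result_tail("#runs=5; 9.191620 (1704.5 Mpts) [v4.32b] {1121486F}"))
```

This only reads the fields; it does not recompute or verify the checksum, since the thread does not say how the client derives it.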
Herb[Romulus2] 2003-10-10 02:04:00 | This is still a strange thing. My client wrote continuously to results.dat. Overnight it overwrote the queue with a new one: 9.291731,9.194906,9.227436,9.190381 (1293.1 Mpts) [v4.32b] {8E487410} This was done on a P3/866 Compost fileserver. |
[AMD Users] Michal Hajicek 2003-10-10 13:47:58 | Same error several times, random (after 2nd run checksum error), 4.32b commandline, modified queue not needed. 4.32c cmdline OK up to now. (W2k, Athlon XP) edit: same error with 4.32c cmdline, no manualseed, queue blocks all further rechecking [This message was edited by [AMD Users] Michal Hajicek on 2003-Oct-11 at 22:12.] |
Pollock[Romulus2] 2003-10-10 22:59:35 | The failed queues were run on WinME and Win2000. The last one that I tried on the WinME drive started to do the same thing. I shut it down and switched to the XP partition. The exact same queue was copied to the Muon folder on XP and ran fine. Four others have also worked on XP. All were run with command-line. Both machines here are Athlons. One is an XP2000+ and the other a T-Bird 1.2gig. |
Stephen Brooks 2003-10-14 09:45:58 | I've just been rewriting parts of my result-handling code, with two upshots: 1. The optimiser is now becoming more generic, so I can apply it to other things as well as Muon (I have even considered using the optimiser to optimise its own genetic algorithm parameters). 2. Now that I'm testing this, I'm observing the same kind of strange queue behaviour as you are. Hopefully when I debug this, it'll have fixed your bug too (or they were the same, or something). It doesn't make any sense: that's why they call it "virtual" |
Pollock[Romulus2] 2003-10-15 12:04:21 | I gave the 'd' version a test run. After two test queues, same result. Both of them reported results to results.dat and results.txt, but not to the queue.txt. There was one difference. Both queues just stopped running after the first run. The client just went back to reading the .dat file and generated a new run. I thought maybe it was because the test result was lower than the top one in results.dat, but I hid results.dat for the second one and it did the same thing. They just ran once and then ignored the queue.txt file. I like it that way. It saves a lot of re-checking bad queue runs. I did see one having problems with the 'c' version in command-line. It ran the first two times perfectly. The third run reported a checksum error at 60ns into the run. It tried to run three more times and reported a checksum error at various times in the simulation. It left the queue file behind with two finished results and then just went back to random generations. |
Stephen Brooks 2003-10-16 03:55:37 | quote:That's how it's meant to work: if the test result isn't the highest yet found, it is not re-checked, which is the same reason why not all runs generated from results.dat are rechecked. Doing the repeated runs takes 5x the time while only reducing the error to about 40% of what it was, so it is mostly useful just for distinguishing between the highest results. |
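A rough check of the figures in the reply above, as a sketch rather than Muon1's actual method. It assumes the repeated runs are independent and equally noisy, and that their scores are simply averaged (the thread does not say how the client combines them). Under those assumptions the statistical error of the average shrinks as 1/sqrt(n), which for n = 5 is roughly 45% of the single-run error, in the same ballpark as the "about 40%" quoted. The function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def relative_error_after_averaging(n_runs: int) -> float:
    """Error of the mean of n independent, equally noisy runs,
    expressed as a fraction of the single-run error: 1/sqrt(n)."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    return 1.0 / math.sqrt(n_runs)


def mean_and_standard_error(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean of the scores and the estimated standard error of that mean.
    Needs at least two scores to estimate the spread."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, sem


if __name__ == "__main__":
    # Cost/benefit of five runs versus one, under the assumptions above.
    print(f"5 runs: 5x the time, error x {relative_error_after_averaging(5):.3f}")

    # The four scores from the queue line Herb posted earlier in the thread.
    scores = [9.291731, 9.194906, 9.227436, 9.190381]
    mean, sem = mean_and_standard_error(scores)
    print(f"mean = {mean:.6f}, standard error = {sem:.6f}")
```

Because the gain goes only as the square root of the number of runs, the repeats pay off mainly where two candidate results are close together, which fits the point about using them just to separate the highest results.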
[DPC]TeamNWW - Huub 2003-10-20 01:06:57 | d-version - more queue problems.. Stephen, it looks like you forgot something when rewriting your code.. situation 1: normal running. When i let muon run normally it will not create a queue.txt if it produces a score higher than what is in the .dat - that score will just be added to the .txt file with no rechecking done and a new run will start. (9.62 was highest in .dat; 9.63 was new score with only about 330Mpts) situation 2: manual seed. If i create a queue.txt, version d now just keeps rechecking that queue, putting every run in results.txt. queue.txt is not updated (no 'runs=x' at end of line 1, 'TEST I'm no programmer, but it seems you forgot to tell muon to write/update the queue.txt file |