Pollock[Romulus2] 2004-05-28 08:28:15 | Stephen, I just noticed some odd results while running a TEST queue with v4.41 this morning. The client was shut down in the middle of the second run in this queue. When it was restarted, that run produced a negative value which, by appearance, should have been positive.

The result after four runs:
0.250348,-0.275105,0.286971,0.288988 (1657.4 Mpts) [v4.41] {DBC5EDBF79C46B4C60C216AE} <ChicaneLinacB>

All four were run in the command-line client and appeared to be nearly identical except for the result. This is the continuation of run #2 and the entire run #3:

Muon1 started
Loading bending magnet fields... Done
Loading genomes... 76, done.
Searching for auto-saved file...
Building proximity grid 6x2x50 (600 cells)... Done
Restored simulation at 23.2ns
- Starting -
t = 42.63ns (24073/82545 particles) 83.8 Mpts Auto-saving...
t = 60.95ns (25253/85471 particles) 129.1 Mpts Auto-saving...
t = 78.58ns (25696/87132 particles) 174.1 Mpts Auto-saving...
t = 95.73ns (25857/88252 particles) 218.4 Mpts Auto-saving...
t = 112.88ns (25453/88993 particles) 262.7 Mpts Auto-saving...
t = 134.51ns (11148/89269 particles) 299.9 Mpts Auto-saving...
t = 183.48ns (5417/89334 particles) 340.1 Mpts Auto-saving...
t = 325.94ns (2191/89338 particles) 383.2 Mpts Auto-saving...
t = 500.00ns (783/89334 particles) 412.7 Mpts
Quarantined result has now been run 2 of 5 times
New simulation
Rechecking quarantined result
Interpreting lattice file 'ChicaneLinacB'... Done
Beamline consists of 408 units
Adding components to simulation space
Tantalum rod source data loaded.
Building proximity grid 6x2x50 (600 cells)... Done
Tracking central particle to synchronise RF phases... Done
Done adding components
Determining nearby components... Done
- Starting -
t = 22.61ns (21498/76826 particles) 38.0 Mpts Auto-saving...
t = 41.55ns (23878/82295 particles) 81.2 Mpts Auto-saving...
t = 59.27ns (25031/85255 particles) 124.7 Mpts Auto-saving...
t = 76.26ns (25602/87059 particles) 167.8 Mpts Auto-saving...
t = 93.04ns (25870/88203 particles) 211.0 Mpts Auto-saving...
t = 110.01ns (25850/88983 particles) 254.9 Mpts Auto-saving...
t = 128.30ns (13495/89361 particles) 292.0 Mpts Auto-saving...
t = 169.69ns (7153/89431 particles) 331.3 Mpts Auto-saving...
t = 277.52ns (2582/89437 particles) 372.5 Mpts Auto-saving...
t = 497.40ns (796/89431 particles) 412.6 Mpts Auto-saving...
t = 500.00ns (788/89431 particles) 412.8 Mpts
Quarantined result has now been run 3 of 5 times

The stats results of this run are not important. I have already far surpassed 0.28 in the chicane. The only point in running this queue was to introduce a new and radically different breed to the sample file. Hopefully, this is not a bug, but it did seem extremely odd. |
Stephen Brooks 2004-05-28 11:19:23 | Looks like something is not being saved properly... but what? I will look into this further - v4.41a is out already and fixes some other stuff. I'll try to do some checking of the autosaves at work next week. |
[TA]z 2004-05-28 12:55:22 | Some weirdness here too... I noticed a queue.txt that contained two different sets of parameters, and the second set in the queue.txt is being ignored:
-1.#IND00 (10.8 Mpts) [v4.41] {F3BA3FE0DCC9FED5A778AC98} <ChicaneLinacB>
A fresh launch of the client yields:
[WARNING] Result version (0.000000)<MINVER(4.300000) Continue (Y/N/These/All)?
I've noticed a significant drop in production today, so I believe this is happening to quite a few of my clients... I just haven't had the time to check them yet. More later... |
Stephen Brooks 2004-05-28 12:58:19 | The '-1.#IND' error was fixed in v4.41a (which appeared on the website about 6 hours ago). The only known-but-unresolved bug is now the autosave, and so long as you don't restart much, it shouldn't affect you a lot. |
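For anyone wanting to scan their own results for the two symptoms reported above (a negative score and a '-1.#IND' score), here is a minimal sketch in Python. It is not part of Muon1: the line layout is inferred only from the result lines quoted in this thread, the field names (`scores`, `mpts`, `version`, `id`, `lattice`) are made up for illustration, and treating every negative score as suspect is an assumption based on the report that the value "should have been positive".

```python
import math
import re

# Layout inferred from the result lines quoted in this thread (an assumption):
#   <comma-separated scores> (<n> Mpts) [v<version>] {<hex id>} <<lattice>>
RESULT_RE = re.compile(
    r"^(?P<scores>\S+)\s+\((?P<mpts>[\d.]+) Mpts\)\s+"
    r"\[v(?P<version>[^\]]+)\]\s+\{(?P<id>[0-9A-Fa-f]+)\}\s+<(?P<lattice>[^>]+)>$"
)


def parse_score(token):
    """Convert one score token to a float; unparseable tokens become NaN."""
    try:
        return float(token)
    except ValueError:
        # '-1.#IND00' is how older Microsoft C runtimes print an
        # indeterminate NaN; Python's float() does not accept it.
        return math.nan


def parse_result(line):
    """Split one result line into a dict with hypothetical field names."""
    match = RESULT_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised result line: %r" % line)
    return {
        "scores": [parse_score(t) for t in match.group("scores").split(",")],
        "mpts": float(match.group("mpts")),
        "version": match.group("version"),
        "id": match.group("id"),
        "lattice": match.group("lattice"),
    }


def suspicious_runs(scores):
    """Return indices of scores that are NaN/infinite or negative."""
    return [i for i, s in enumerate(scores) if not math.isfinite(s) or s < 0]


if __name__ == "__main__":
    samples = [
        "0.250348,-0.275105,0.286971,0.288988 (1657.4 Mpts) [v4.41] "
        "{DBC5EDBF79C46B4C60C216AE} <ChicaneLinacB>",
        "-1.#IND00 (10.8 Mpts) [v4.41] {F3BA3FE0DCC9FED5A778AC98} <ChicaneLinacB>",
    ]
    for sample in samples:
        result = parse_result(sample)
        print(result["id"], "suspect run indices:", suspicious_runs(result["scores"]))
```

Indices are zero-based, so the second run of the first sample is reported as index 1.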
Pollock[Romulus2] 2004-05-28 16:32:28 | Update: The final result of the simulation above seems to have read the negative run as a positive. 0.274375 (2070.0 Mpts) [v4.41] {3517F2322C29C1EA46330386} <ChicaneLinacB> It looks like the negative score was just a glitch. Several other TEST queue runs have been stopped and restarted with no sign of a problem. Apparently this was just a false alarm. |
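The inference in the update above can be checked with a little arithmetic. The sketch below assumes, without confirmation anywhere in this thread, that the final quarantined score is the plain mean of all five runs; under that assumption it works out what the unseen fifth run would have had to score if run #2 counted as logged (negative) versus with its sign flipped. The function name `implied_last_run` is hypothetical.

```python
# Scores of the first four runs and the final score, as quoted in the thread.
runs_so_far = [0.250348, -0.275105, 0.286971, 0.288988]
final_score = 0.274375
total_runs = 5  # "Quarantined result has now been run 3 of 5 times"


def implied_last_run(previous, final, n):
    """Score the remaining run needs if `final` is the mean of n runs.

    ASSUMPTION: the final score is a simple arithmetic mean of all runs.
    """
    return final * n - sum(previous)


# Case 1: the negative value was counted exactly as logged.
as_logged = implied_last_run(runs_so_far, final_score, total_runs)

# Case 2: the negative value was counted as a positive.
sign_flipped = implied_last_run(
    [abs(x) for x in runs_so_far], final_score, total_runs
)

print("fifth run needed if -0.275105 counted as logged:   %.6f" % as_logged)
print("fifth run needed if it counted as +0.275105:       %.6f" % sign_flipped)
```

If the mean assumption holds, only the sign-flipped case needs a fifth run in the same range as the other four, which would fit the reading that the negative run was treated as a positive.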
kitsura 2004-05-28 17:37:28 | I was just going to ask about the division by 0 error, since it was not mentioned in the version history, but it seems that it's fixed now. |