stephenbrooks.org Forum » Muon1 » General » Muon1 v4.46 alpha 1
Zerberus
2013-03-19 21:19:45
"I now wonder what Zerberus meant by the "show highest Mpts" - did you mean in graphics mode or commandline mode or both?  Because I've only done graphics mode at the moment."

Graphics mode.  So you don't have to fire up viewresults.exe each time to see what to beat currently.

Btw, is there any quote support in this forum?

Edit: It is very confusing if your post ends up being the first on page two but you are still shown page 1 after posting.  It looks like your post didn't make it at all.
[Edited by Zerberus at 2013-03-19 21:23:24]
Stephen Brooks
2013-03-19 22:26:28
You're not supposed to be beating Mpts though: that's the amount of calculation.  You're meant to be beating muon yield % (the eventual number of muons getting to the exit).  What it will tell you is some sort of estimate of how long your simulation could take.

I sometimes do quotes using the HTML <blockquote> tag.
K`Tetch
2013-03-20 02:08:43
ok, no problems with it so far, and it is a tad faster: maybe 10% quicker than 4.45 on my Q6600.

I'll check the span stuff in the AM (remind me, Stephen: send me an IM)
K`Tetch
2013-03-20 17:39:57
ok, the visual span mode seems to work fine.  The 3-bar cursor helps with aiming too
K`Tetch
2013-03-20 17:49:32
The only thing I've noticed is the occasional particle escape problem happening again, like here
[Edited by K`Tetch at 2013-03-20 17:50:20]
Stephen Brooks
2013-03-20 18:04:19
Hmm, particle escapes might be worth me doing some debugging on, thought I'd fixed that years ago!
Zerberus
2013-03-20 18:09:03
That's exactly what I mean by ''leaking''. Those greatly distract the auto-focus and auto-zoom.  If they are harmless, could you at least tweak the focus/zoom to ignore those?
K`Tetch
2013-03-20 21:15:29
I ran a bunch, it only showed up once or twice, but it still happens now and then.  But yeah, I don't think we've had leakers since 4.43 (the chicanes were REALLY bad for it)

Zerberus, it might be an idea to pause things when you get leakers and zoom to the early part; I think there's an option for 'bounding boxes', so see if there's a gap there.  That might be it: it could be a missing (0'd-out) component whose box is thus missing.  Or something.
Stephen Brooks
2013-03-20 22:07:36
I have a mechanism that's supposed to kill the leaky ones but it might fail at the very beginning of the beamline because particles "before the beginning" don't have a relevant component to read a limiting radius from (if that makes sense?)
Zerberus
2013-03-20 23:42:31
Do the leaked ones maybe have a negative offset that could be detected?  The simulator does track each particle, correct?
[OCAU] badger
2013-03-21 00:19:53
--[Stephen Brooks: So those last 3 on the E8400 are comparable to each other?  (all same version but different particles)]--

Those particular results are 2 from v446 and one from v445.

I'm quite interested in that graph:
1. Most of the results seem linear(ish), but there is a secondary set of results with much higher time per timestep - is this near the end, where there are fewer particles?
2. Results are much more varied in the region of 100-150 particles - more results are plotted here, but especially 4 threads has 2 distinct distributions besides the smaller one mentioned in 1.
3. It appears for this machine that it is more efficient to do more particles per thread - if I have the right idea that total particles simulated per second per thread = particles / time per step?
I did a quick calculation reading numbers off the graph (using the lower line):

Numbers are in kpts/s

Particles/thread   1 thread   2 threads   3 threads   4 threads
      10              300        120         100          75
      20              300        200         171         141
      40              343        300         300         240
      60              360        400         400         360
      80              343        436         480         436
     100              353        462         600         500
     200              375        600         750         800

This would give the best result for 200 particles with 4 instances, each with 1 thread (assuming it scales across cores), at 1500 kpts/s.  This is ignoring the effect of the end of each simulation when particle numbers fall, but with multiple instances running it might work out ok.
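A back-of-envelope check of that estimate (my own sketch, not from the client; the 375 kpts/s figure is read off the 1-thread column at 200 particles/thread, and linear scaling across cores is an assumption):

```python
# Hypothetical check of the "4 instances x 1 thread" estimate.
per_instance_kpts = 375  # 1-thread, 200-particles/thread reading from the table
instances = 4            # assume one single-threaded instance per core

total_kpts = per_instance_kpts * instances
assert total_kpts == 1500  # matches the 1500 kpts/s quoted above
```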

I might do a few experiments with different particle and thread numbers

[Edited by [OCAU] badger at 2013-03-21 00:20:50]
Stephen Brooks
2013-03-21 10:59:05
--[It appears for this machine that it is more efficient to do more particles per thread]--

That's always true because of the thread overhead.  What I'm trying to find is particles per *extra* thread, i.e. the point at which it's worth having another thread, where the associated reduction of particles per thread (efficiency) is more than compensated by having another core (power).  I can't just choose how many particles I have, because threads * particles/thread = however many particles are in the simulation at the time.

If OO is the overhead and ............ are the particles, it looks like this:

1 thread: OO............

2 threads: OOOO::::::

3 threads: OOOOOO3333

4 threads: OOOOOOOO444

Where ':', '3' and '4' indicate 2, 3 and 4 particles being calculated in parallel.

The actual timestep times will vary even on a per-particle basis because different locations in the accelerator require different complexities of magnetic fields etc. to be calculated.
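A toy model of that picture (my own sketch, not Muon1's actual scheduler; the `overhead` and `per_particle` constants are illustrative, not measured):

```python
# Toy model of the diagram above: each thread adds fixed startup overhead
# (the OO blocks) while the particle work divides across the threads.
def step_time(particles, threads, overhead=5.0, per_particle=1.0):
    """Wall-clock time for one timestep under the toy model."""
    return overhead * threads + (particles / threads) * per_particle

# Few particles: an extra thread's overhead outweighs the parallel gain.
assert step_time(10, 1) < step_time(10, 4)    # 15.0 < 22.5
# Many particles: splitting the work wins despite the overhead.
assert step_time(200, 4) < step_time(200, 1)  # 70.0 < 205.0
```

The crossover point is what the "particles per extra thread" setting is probing for.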
Stephen Brooks
2013-03-21 11:41:45
Re-download the alpha 5 archive and there is an updated muon1_debug.exe in it.  With this version, particle picker mode (pressing I or clicking both mouse buttons at once) can show extended information if you press SHIFT once it's active.  This shows ALL the attributes of a given particle.  When you next get a leaky particle, try activating this mode and send me a screenshot (Ctrl-S) of its properties.
[OCAU] badger
2013-03-21 23:47:42
Hmm, trying to get my head around the particles per thread setting, what it actually does.  In my mind it limits the thread to a max of that many particles, so if there are too many particles, then more threads are created?  So for 10 000 particles with a setting of 50, there will be 200 threads? 
I note on my i7 that if I set #threads to 4 (4 cores IIRC) I get 50% cpu use only.  Does the auto setting just set the number of threads at the start of the simulation, or does it dynamically change thread numbers throughout based on the particles per thread setting?  (I'm guessing the latter).

I'm watching my 2 instances, set to 150 particles, in Resource Monitor in Win7.  One is averaging 74.3% of CPU and the other 19.1% (low and BG priority respectively).  They are set to 8 threads, although Resource Monitor says they are using between 1 and 8 threads each (not sure if they are the same threads).
[Edited by [OCAU] badger at 2013-03-22 00:02:27]
Stephen Brooks
2013-03-22 21:26:17
--[So for 10 000 particles with a setting of 50, there will be 200 threads?]--

LOL.  No, the number of threads created is the minimum of the number of logical processors in your system (e.g. 4 for quad core or dual core with hyperthreading) and the number dictated by the particles per extra thread setting.  So if it's set to 50 and you have 4 logical processors, up to 50 particles is single-threaded, 51-100 uses 2, 101-150 uses 3 and 151 and above uses 4 threads.
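That rule can be written as a one-liner (my sketch of the description above; the parameter names `per_extra` and `logical_cpus` are just illustrative):

```python
import math

# Sketch of the thread-count rule: one thread per "particles per extra
# thread" worth of particles, capped at the logical processor count.
def thread_count(particles, per_extra=50, logical_cpus=4):
    return min(logical_cpus, max(1, math.ceil(particles / per_extra)))

assert thread_count(50) == 1     # up to 50 particles: single-threaded
assert thread_count(51) == 2     # 51-100: two threads
assert thread_count(150) == 3    # 101-150: three
assert thread_count(151) == 4    # 151 and above: four
assert thread_count(10000) == 4  # never 200 threads!
```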

Can you catch me some leaky particles?
Zerberus
2013-03-23 00:30:43
Does the lattice matter?  I used the axial3 one (more particles, more chance to leak).



How many should we hunt for?
[Edited by Zerberus at 2013-03-23 00:32:12]
Zerberus
2013-03-23 03:30:12
Other things I noted:
- For the phase space graphs, the axis description and scale overlap.
- If progressing a queue, could a simple ''run x of x'' be shown in the graph mode?
- When pausing the simulation, Autoview is switched off, but won't switch on again when unpausing.
Stephen Brooks
2013-03-23 13:39:00
Thanks, that tells me what I need to know.

- It's surprisingly hard to get computer-generated graph labels not to overlap, may or may not fix.
- Queue/quarantine status in graphical mode is an interesting idea, I could put it under the "Size of results database" line.
- Switching autoview off when paused is intentional: otherwise there's not much point in pausing, as you couldn't look around.  I left it off afterwards so that you can easily retain your view when the simulation resumes (just press V to switch it back on).
Zerberus
2013-03-23 18:53:39
IMHO Autoview should be restored if it was ON at the time of pausing.
GP500
2013-03-27 11:02:28
on alpha 4,

Particles per extra thread (limits threading overhead): 25
Gives me a 50-100 kpts more.

Still a great deal of idle time, but it's more frequently in the 90% usage range; on average the idle time is still more than 10%.

I can test this now on a new Win7 install and see if that will make a difference.


@Stephen, re the compiler: I had a little Google search.
A more up-to-date compiler from the same base as the one you use now:
http://sourceforge.net/projects/orwelldevcpp/


CERN also has a free-to-use compiler for C++:
http://root.cern.ch/drupal/content/cint


PS: I'm not a programmer; I hope this is of use.
Stephen Brooks
2013-03-27 12:02:34
I've just found a bug of my own: when the user closes the window using the close button, it doesn't save anything (no autogfx.sav or display.txt) as it does if you were to press Q or Escape.

I've got a busy few weeks coming up (seminars to give and trips to Brookhaven National Laboratory and CERN) so further alphas and the final 4.46 release might have to wait until late April.  Continue to report any bugs, crashes etc. you find in this thread, though, and I will pick it up later.

[edit] OK, in the meantime there's a new _alpha6 available from the website.  This has fixed pause unsetting autoview (pause is now a temporary override) and also adds a display of queue info in graphical mode when queue.txt is active.

Not fixed yet in this version is the leaky particles problem (and this messing up autoview), Zerberus's phase space graph labels problem and the not-saving-when-using-the-window-close-button problem.
Zerberus
2013-03-27 23:30:23
No problem, I think I'll use NoClose for now.
Stephen Brooks
2013-03-28 16:59:27
There ought to be a way for me to fix this in my software libraries but it's an issue of getting the window close signal to be interpreted in the same way as pressing Q or Escape.  Maybe I can fix it on a plane journey
Stephen Brooks
2013-04-10 11:14:22
I just got back to the UK this morning, and I did indeed fix the issue on the outbound plane journey; I've just got to carefully merge my files from the laptop back into the main workstation now.
[OCAU] badger
2013-04-11 05:47:17
Still running happily, but I've been playing with the particles setting and analysing the results to try to work out what is best for output.

I ran one instance, auto cpus, with autosend turned off, overnight or for a few days with no other usage of the machine.
So far I have tried 25, 50 and 75 particles, with muonbench set to 30 s intervals (to remove a lot of the timing variability).
Prelim results (linear regression of the data) showed 25 at 2606 ± 0.5 Mpts/s, 50 at 2609 ± 0.6 and 75 at 2645 ± 2.  However, these were done sequentially and I think the quality of results improved over that time.  I checked the proportion of results over 320 Mpts, and it was 48.7% for 25, 49.7% for 50 and 61.5% for 75 particles.  Thus the 75 speed is higher due to a much higher proportion of high-Mpts-yielding simulations, not necessarily because 75 is faster. 
So I haven't been able to isolate the effect of particles alone.  There is also a huge variation in speed from sim to sim, and when I plotted speed (Mpts/s) against size (Mpts) the scatter plot is split into distinct rows of events.  I'll have to try to find a way of posting my graph.


[Edited by [OCAU] badger at 2013-04-11 05:53:34]
Stephen Brooks
2013-04-11 16:13:48
Maybe you should start the test cases with an empty results.dat so the sims are comparable.  Did you have samplefiles being downloaded?  That graph would be interesting - you just need to find a linkable image bin on the web (or just e-mail it to me and I'll post it).
[OCAU] badger
2013-04-15 03:09:13
I've emailed you the graph.  I'm sure the scatterplot "rows" feature is due to the 30s sampling interval of muonbench.  I've saved the results.dat from the last run, and will restart with the older results.dat and no samplefiles downloads to try to get a comparable result.  The problem with starting with a blank results.dat is that it doesn't represent "real" output, as it will be much slower with many shorter runs, as well as rapidly advancing in time. 

Zerberus
2013-04-15 14:37:40
Question: We know how to debug crashes, but how to debug deadlocks/hangs?  Alpha6 constantly hangs on some of the older lattices (ChicaneLinac), but as it doesn't crash, there's no debug information.

Windows reports only
"The program muon1.exe version 0.0.0.0 stopped interacting with Windows and was closed."
Stephen Brooks
2013-04-15 21:08:24
When does it hang?  That is, what was the last thing it printed in commandline mode (or alternatively what did the graphics display look like, if it got that far)?  Can you tell me exactly which lattices fail?
Zerberus
2013-04-16 05:48:05
I specifically tested ChicaneLinacB90, but from a quick check all ChicaneLinac lattices behave the same.

In nearly all cases it stopped at the end of the initialization phase, just before starting the actual simulation.  The last text line was ''Starting...''. Only once it hung at another lattice, in the middle of a simulation.

Good question about the command line, so far I haven't tested that one, only graphics.

I don't know if it has something to do with the hang, but it usually hung in a recheck phase, and I could read a ''central particle lost'' message in all cases (or at least nearly all).

When it hangs next time, I'll take a screenshot (could be useful, as the init text is still readable).
Stephen Brooks
2013-04-16 08:24:10
So [OCAU]Badger's graph (I just posted it) is Mpts vs. Mpts/s right?
K`Tetch
2013-04-16 22:25:38
Did you modify the old lattices to add a backstop, zerberus?

The old ones had the backstop (the thing that surrounds the rod at the start) hardcoded, and it wasn't until 4.44 (I think) that it became part of the lattice (decayrotB was the last with it hardcoded, IIRC).  That could be a factor, I think.

I've not run any old lattices, but I've been running my current version on cmdline for a while and had no hangs.
Zerberus
2013-04-17 05:21:29
"Did you modify the old lattices to add a backstop, zerberus?"

Yep, I had to.  Or is there another way?  And why did it never hang with 4.45?
[OCAU] badger
2013-04-19 02:11:02
Stephen, Yes the graph is Mpts vs Mpts/s.

I've started running with the same starting results.dat, no samplefile downloads, no autosend, no other PC use, and muonbench with 10 s intervals (this may create an overhead, but it should be the same for all runs).  For the first run (16 hours) I also checked the create/modified dates on a fresh results.txt to confirm.  The statistical variation between simulations is very large, though, so I might need a lot more than 16 hours (about 60-70 simulations) to get decent results.
Nekto
2013-04-20 22:01:11
You can use newer versions of GCC for x64.  I think most people on x64 have lots of RAM, so optimizing processor use is more important.  http://mingw-w64.sourceforge.net/
Stephen Brooks
2013-04-22 15:06:32
Aha!  MinGW-w64 looks very hopeful that it's the sort of thing I want (C++ on Windows without the nasty Microsoft "Visual C" crud).  DevC++ by comparison uses an old version of MinGW for Win32. Thanks for that.
[OCAU] badger
2013-04-24 02:54:51
More on speed vs particles.

The biggest problem I've had is the random distribution of sims between high Mpts yielding sims that give good speeds and shorter but lower rate sims.  Even running for 16 hours there was so much variation due to this I couldn't get any useful data.

However I hit on the idea of splitting them up into groups of simulations:
0-200 Mpts, 200-300, 300+ (about <1m, 1-2.5m, and >2.5m of sim time).  About 1% of sim time is 0-200 so I ignored them (plus even with muonbench at 10s the variation is way too big to get anything useful).
200-300 is between 5 and 10% of simtime
300+ is 88 - 95% of sim time so is most important.

results so far (from 455 sims, starting from fixed results.dat for each particle value, no samplefile download, no autosend):

particles   muonbench   200-300   300+
   40        2.5867      2.04     2.670
   50        2.6427      2.04     2.688
   60        2.6268      2.06     2.687
   70        2.6731      2.08     2.714

The 40 and 60 particle results from muonbench were low because the proportion of 200-300 was 10% and 7.5% respectively, rather than the usual 5%.

80, 90 and so on I'll do next. 
Stephen Brooks
2013-04-25 12:21:42
I tried to repeat Zerberus's problem with the ChicaneLinac* lattices hanging (running commandline mode overnight) but it appears to work fine.  Guess I'll try windowed mode for a while
Stephen Brooks
2013-04-25 14:41:36
I've just uploaded v4.46 alpha 7, which should be the last alpha release provided no-one finds any more bugs.

Changes:

- Quits properly when window close button is used (and saves display.txt and autogfx.sav)
- Global limiting bore of 1 metre added by default to stop escaping particles
- Axis descriptions and scales no longer overlap on phase space graphs (I think)
Zerberus
2013-04-25 15:19:45
Does it improve on the ''stopping'' problem? 
Stephen Brooks
2013-04-25 15:53:48
No, I couldn't replicate that problem.  ChicaneLinacB90 worked fine both at the commandline and in windowed mode.  Try a fresh install.
Zerberus
2013-04-25 17:43:32
Could it be a graphical problem?  I recently increased my display resolution from 1280x1024 to 1920x1080.

Will try again.
[OCAU] badger
2013-04-26 02:06:25
Now have results for 80 particles suggesting ideal (for this PC at least) is in the 70-80 range:

particles   muonbench   200-300   300+
   40        2.5867      2.041    2.670
   50        2.6427      2.043    2.688
   60        2.6268      2.055    2.687
   70        2.6731      2.076    2.714
   80        2.6486      2.064    2.708
Stephen Brooks
2013-04-30 15:29:43
I got bored of alpha versions so Muon1 v4.46 has gone gold now!
Zerberus
2013-05-01 09:53:36
I had high hopes for this build; now I can't use it fully.

Currently testing with the debug build.
Zerberus
2013-05-01 13:30:26
No joy.  The debug build stops too.  Next I'll try 4.45, tomorrow.

I checked the command line; that works.
I used the -init switch; the init phase definitely completes, and it hangs after init.

Only one thread exists at the time of the deadlock.
Stephen Brooks
2013-05-01 16:36:41
As you explained it, the bug happens only on very old "ChicaneLinac" lattices?  So it's not a major issue for the work we're doing now.  By the way, if you try running those with a modern install they will fail with
Loading bending magnet fields... y=0cm
[FATAL] Could not load B-field file datafiles\l2y0.txt

...since I no longer include the magnet data files datafiles\l2yNN.txt in the default package.
Zerberus
2013-05-02 17:30:13
But it is an issue, at least for me.  Quite possibly a race condition, because it does not appear 100% of the time on the very same simulation.
[OCAU] badger
2013-05-13 03:00:39
I've done some final speed testing with v446 (and v445) with the following results:

Particles   Weighted Mpts/s
   40           2.613
   50           2.618
   60           2.625
   70           2.656
   75           2.654
   80           2.648

v445 (same lattices, starting results.dat and conditions)
75 2.514
(so about 5.5% slower than v446)

weighted Mpts/s is a metric that tries to remove a lot of the variation due to the random size of each run.  I divide the results for each set (ie particle number) into groups by Mpts, then find the speed for each group.  Each group is then weighted by the fraction of the total of all the results that lies in that group. 
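That weighting can be sketched like so (my reading of the description above; the bin edges and sample data are made up, and `weighted_speed` is a hypothetical name):

```python
# Sketch of the weighted Mpts/s metric: bin sims by size, average speed
# within each bin, then weight each bin's mean speed by the fraction of
# all results falling in it.  Bin edges and data are illustrative only.
def weighted_speed(sims, edges=(200, 300)):
    """sims: list of (size_mpts, speed_mpts_per_s) pairs."""
    bins = {}
    for size, speed in sims:
        if size < edges[0]:
            continue  # the ~1% of tiny sims are ignored, as in the post
        key = 0 if size < edges[1] else 1
        bins.setdefault(key, []).append(speed)
    total = sum(len(v) for v in bins.values())
    return sum((len(v) / total) * (sum(v) / len(v)) for v in bins.values())

# One slow mid-sized sim, two fast large ones:
result = weighted_speed([(250, 2.0), (350, 3.0), (360, 3.0)])
assert abs(result - (1/3 * 2.0 + 2/3 * 3.0)) < 1e-9
```

The point of the weighting is that a run dominated by long, fast sims doesn't get credited as "faster" just because of the luck of the draw.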
K`Tetch
2013-05-15 17:25:24
Here's another speed comparison between the two clients.

I do the 'best designs' videos with a specific install+setup.  For the latest one (9Xc2 - should be out Friday) I used 4.46 while all the others have used 4.45.

In visual mode, it's indicating 1.60us per active particle average for 4.45.
With 4.46 it's showing 1.30us.  That's about an 18% reduction in time per particle (roughly a 23% speed-up).