Stephen Brooks 2013-03-04 15:52:52 | I'm starting to alpha test the new version: please download this package and see what you think. Changes since v4.45:
There is also a muon1_mingw32.exe compiled using a Windows version of gcc. This may or may not be faster than the LCC-compiled standard version. You can benchmark this with Muon1bench available from the main page. |
Zerberus 2013-03-04 23:39:27 | W00t! That one will go into a new directory, I guess. Just a little one to add: Would it be possible for muon1 to clean up obsolete samplefiles, in addition to the lattices? |
Stephen Brooks 2013-03-05 12:57:35 | It's possible but I'm not completely sure I want to change that behaviour, in case people have put their own custom samplefiles in there or something. |
Stephen Brooks 2013-03-05 18:54:03 | First possible bug: looks like my version isn't giving the results checksums (so they're not being accepted by the network). Is this happening to anyone else? Will investigate... [edit] New alpha version posted to the site. |
Zerberus 2013-03-05 20:59:11 | By obsolete I mean samplefiles for lattices which have been deleted. Maybe renaming them to *.old instead will do so one can run a cleanup batch regularly if desired. |
waffleironhead 2013-03-05 22:30:10 | The gcc version seems quite a bit slower. Quick benches on my athlon2 x4 @ 2.6
muon1_mingw32.exe:
17933,1107.0,234.42
18233,1154.0,229.24
18533,1248.7,234.64
18833,1294.9,229.90
19733,1524.4,233.66
muon1.exe:
29902,308.5,373.00
30202,591.8,658.67
30502,790.6,660.00
31102,1071.0,582.94 |
Stephen Brooks 2013-03-05 22:52:11 | That performance discrepancy seems very odd unless I've messed up the compiler flags for MinGW. Is the low-performing Muon1 taking 80+% CPU or mainly running single threaded? Also just double check you're not running two at once or something |
waffleironhead 2013-03-05 23:00:33 | For both tests I've run one commandline client which spawned 4 threads. Both are using all 4 cores as the task manager shows 80-90% of cpu time being used. Hopefully someone else is benching the different versions so we have more data to compare. |
Stephen Brooks 2013-03-06 15:38:19 | Found a minor mistake: _alpha2 identifies itself as _alpha1 in the results... Waiting for some of these results to show up on the versions plots, e.g. to make sure they're being sent and processed properly. |
Stephen Brooks 2013-03-06 17:26:13 | Another iteration: v4.46 alpha 3 is now available. The MinGW32 version now targets Pentium 4 architecture with SSE2, hopefully that will work on most modern processors. I think I've fixed a bug where results weren't getting uploaded properly because of the "_alpha" in the version string! Also the version identifier is correct (v4.46_alpha3) now, although when you send as .bin format it gets stripped back to [v4.46]. |
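The version-string handling mentioned above could look roughly like the sketch below. This is only an illustration, not Muon1's actual code: the function name `base_version` and the regular expression are assumptions, chosen to show how a suffix such as "_alpha3" might be dropped so that only the "v4.46" part remains.

```python
import re

def base_version(version):
    """Return the leading 'vMAJOR.MINOR' part, dropping any suffix such as '_alpha3'.

    Hypothetical helper: the real client may strip the suffix differently.
    """
    match = re.match(r"v\d+\.\d+", version)
    # Fall back to the unchanged string if it does not look like a version.
    return match.group(0) if match else version

print(f"[{base_version('v4.46_alpha3')}]")
```

Code that compares or parses version strings is a plausible place for an unexpected suffix to cause trouble, which is why isolating the numeric part first can help.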
Vvolodymyr 2013-03-07 06:54:34 | Hi, I was just wondering if it's possible to run the MinGW32 as command line - I'm running through Wine on linux. Thanks |
Stephen Brooks 2013-03-07 10:56:55 | If you rename muon1_mingw32.exe to muon1.exe (and the old muon1.exe to muon1_lcc.exe or something), you can use that executable with muon1_cmdline and similar commands. Type muon1 -? at the command line to get switches info. |
Vvolodymyr 2013-03-07 10:57:52 | OK Great - Thanks, will do that. CheerZ |
GP500 2013-03-07 21:26:54 | Stephen, isn't there more to gain from compiling for newer CPUs with SSE3, SSE4a and x64 instructions? Many math projects score high gains with builds optimized for newer CPUs. |
waffleironhead 2013-03-08 01:31:06 | alpha_3 has been chugging along all day with no problems that I can see. The mingw32 version is still a lot slower than the muon1 version though. Single core benchmarks:
mingw32:
3633,465.8,155.96
3933,531.2,162.17
4233,554.1,154.37
muon1:
13464,3562.5,275.27
13764,3628.7,273.97
14064,3666.1,270.49
14364,3791.3,273.83 |
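To put a number on "a lot slower", the single-core lines above can be averaged. The sketch below assumes that the third comma-separated field of each Muon1bench line is a throughput figure where higher means faster; the thread does not define the columns, so treat that reading as an assumption.

```python
from statistics import mean

# Single-core benchmark lines copied from the post above.
MINGW32 = """3633,465.8,155.96
3933,531.2,162.17
4233,554.1,154.37"""

LCC = """13464,3562.5,275.27
13764,3628.7,273.97
14064,3666.1,270.49
14364,3791.3,273.83"""

def rates(block):
    """Return the third field of each line as a float (assumed to be a rate)."""
    return [float(line.split(",")[2]) for line in block.splitlines()]

mingw_mean = mean(rates(MINGW32))
lcc_mean = mean(rates(LCC))
ratio = lcc_mean / mingw_mean

print(f"mingw32 {mingw_mean:.2f}, lcc {lcc_mean:.2f}, ratio {ratio:.2f}")
```

On these few samples the LCC build comes out at about 1.7 times the MinGW32 figure, though with so few lines the comparison is only indicative.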
Vvolodymyr 2013-03-08 14:27:39 | Hi. I tried running the MinGW32 build from the command line (-c) as suggested above, but on both Intel and AMD PCs (both 64-bit Linux, Wine) it drops out with an error right away: "A serious error occurred... etc etc". The error log starts with "Unhandled exception: page fault on read access to 0x00000000 in 32-bit code (0xf758e98e)". Running it in visual mode works fine, but I just don't want to run it like that, not all the time. P.S. The regular Muon1.exe (v4.46 alpha 3) has been running on both machines for more than a day perfectly fine, with a performance increase of ~9-10% on Intel (i5-2500) and ~5% on AMD (FX-8150). The AMD CPU utilization was always capped at ~60% with settings of 8 threads or "auto", and still is (both v4.45 and this alpha test one). [Edited by Vvolodymyr at 2013-03-08 14:32:42] [Edited by Vvolodymyr at 2013-03-08 14:33:54] |
Stephen Brooks 2013-03-08 15:48:40 | @GP500 - For the alternative build I'm using Dev-C++ 4.9.9.2 (a free compiler system) but it's quite old and only has 32bit support right now. It uses MinGW32 - I think based on gcc 3.4.2. If you know of a free C++ compiler system that is more capable (e.g. 64bit) for Windows I'd be interested to know about it. As for performance, most of the relevant stuff for Muon1 is in SSE2, as far as I know. x64 may even make it slower, since pointers then take up twice the amount of RAM and the program tends to be bottlenecked either by floating-point performance or RAM bandwidth. Its commandline to the compiler looks like this at the moment:
@Vvolodymyr - So the LCC compiled version works OK under Wine in any mode? I haven't yet worked out how to get line-number trace information out of MinGW32 but given its poor performance I might not bother with it at all! Is your AMD CPU bulldozer-based? I heard they have shared floating-point units, one per pair of cores, which might explain the usage only showing just over half. |
Vvolodymyr 2013-03-08 16:57:22 | Hi, sorry for the lack of clarity, a bit muddled today. Yes, the v4.46a3 Muon1.exe (lcc) works perfectly, with yield improvements compared to v4.45 - I always run in command line mode. The Muon1_Mingw32.exe works fine in visual mode, but not as command line (through muon1_cmdline.bat) - it drops out in error right away (after loading the results file). The error log starts with "Unhandled exception: page fault on read access to 0x00000000 in 32-bit code (0xf758e98e)". Do you want me to email the whole error log? As for AMD - you're right, the FX-8150 is Bulldozer. That explains it, thanks. And bummer. [Edited by Vvolodymyr at 2013-03-08 17:01:35] |
Stephen Brooks 2013-03-08 17:30:30 | No need to send the full log, they're usually impossible to use unless I actually compile debugging information into the file to begin with. It looks like it's dereferencing a NULL pointer though. And you say it's doing it before any text output at the commandline AND only when Muon1 is given commandline arguments (i.e. "muon1 -c" not just "muon1" ) so it could be something to do with how Wine/Linux is sending it the command strings. Does "muon1 -?" work? |
Vvolodymyr 2013-03-08 18:18:50 | Oh, OK. It's doing that after a single first line of text output, about loading results. That's the gcc (mingw32) build, and yes, with the command line arguments "muon1 -c". "muon1 -?" just starts up the Wine cmd window and immediately quits, very fast, without any errors or messages. I could fiddle with the Wine settings, but I know nothing of programming or where to look and what to try. I wonder if the same happens on a native Windows OS. |
Zerberus 2013-03-09 19:58:37 | For an alpha version this is already really good, and it seems to be fast already. From a first look at the graphical parts I noticed a small graphical bug: Framerate and AutoView overlap if you did not select small text. Also, how can I start up windowed mode without setting the framerate to 1/1 in the process? |
Stephen Brooks 2013-03-09 21:21:47 | I guess when I say "alpha" I actually mean "beta" testing (i.e. public or semi-public), though I suppose I decided to call them "alpha" releases to imply there might be something very wrong with them: for instance not producing checksums or not being able to send. Once the completely stupid "network" bugs are ironed out, the remainder should be as stable as what I've been testing locally. I think I made windowed mode automatically select 1/1 frame rate to make it "pretty" and animate smoothly, but I think you're right I shouldn't have combined these two features. The -g switch selects 1/1 independently. |
Zerberus 2013-03-09 22:06:06 | At the very end of the first graphical fullscreen run with 4.46alpha3:
Faulting application name: muon1.exe, version: 0.0.0.0, time stamp: 0x51377b14
Faulting module name: muon1.exe, version: 0.0.0.0, time stamp: 0x51377b14
Exception code: 0xc0000005
Fault offset: 0x000169a0
Faulting process id: 0x1c38
Faulting application start time: 0x01ce1d0eff9b53d8
Faulting application path: D:\Program Files (x86)\muon_.next\muon1.exe
Faulting module path: D:\Program Files (x86)\muon_.next\muon1.exe
Report Id: e130cf12-8904-11e2-abff-0015172156a0
[Edited by Zerberus at 2013-03-09 22:06:57] |
Stephen Brooks 2013-03-09 23:22:44 | Next week I'll add a debug build (with the LCC-Win32 compiler) to the alpha3 distribution so you can find out what that is. |
Zerberus 2013-03-10 00:20:35 | It reproducibly crashes when it wants to write a queuegfx.txt file. The MinGW32 version doesn't crash, but it is a lot slower. Edit: MinGW32 seems to crash too, at least when loaded from autogfx.sav. [Edited by Zerberus at 2013-03-10 00:32:50] |
Zerberus 2013-03-10 10:08:14 | Well, let's say it wanted to write files. As it was the first run ever, it could be queuegfx.txt, results.txt, results.dat... |
shauge 2013-03-10 14:23:55 | Visual Studio Express should be a modern free C++ compiler: http://en.wikipedia.org/wiki/Microsoft_Visual_Studio_Express I have not used the Express version myself though. SSE2 instructions are fairly old; it would be very interesting to see how much newer instructions, e.g. AVX, could improve the project throughput. They have given a significant improvement in other projects. |
Stephen Brooks 2013-03-11 16:32:16 | If you download the alpha 3 archive again it now contains a "muon1_debug.exe", which is the LCC version with stack trace error box enabled. In graphical mode, seeing this error box may require some careful use of Alt-Tab to minimise the fullscreen graphics. If the error is to do with creating new files, have you checked the folder you're using allows everyone write privileges? (All users not just you) SHauge - thanks. I'll bear Visual Studio Express in mind for a later version, though I've not had terribly good experiences with Microsoft compilers in the past (their projects always seem to want to include lots of MS-specific features rather than just compiling code). I've got a few long plane flights coming up from April, so I might play with different compilers on my laptop! |
Zerberus 2013-03-12 10:45:37 | Tested it on a different PC with the muon1_debug.exe First, if I try to switch on the plots (pressing H) I get this: Even if I set it in display.txt directly, it just crashes again. The non-debug version works. [Edited by Zerberus at 2013-03-12 11:00:16] |
Zerberus 2013-03-12 11:15:29 | A new graphical problem I witnessed. Or is it intended? The debug version seems to avoid the crash at the end somehow, at least on this system. [Edited by Zerberus at 2013-03-12 11:24:33] |
Stephen Brooks 2013-03-12 12:39:08 | Think I've managed to squash Zerberus's bug (I'd defined an array wrongly in a sorting algorithm that the phase space graphs call) and muon1_debug.exe has been updated in the alpha 3 archives. See if you can now catch the end-of-simulation bug in the debug build (assuming it wasn't a consequence of the phase space buffer overrun). The graphics "bug" looks like you've got XYZ reference axes turned on(?) Is there a display option somewhere you can toggle (runtime or display.txt)? |
Zerberus 2013-03-12 15:28:51 | Yes, the second problem was mine. I had activated the "show axes" option without noticing. Stupid me. I'll now hunt for the other bug/issue. Question: Is it possible to display the Mpts "highscore" for the lattice the simulation is currently running on? |
Zerberus 2013-03-12 15:45:40 | Crash at the end of the simulation/beginning of the next (since I see the text "Hello, Zerberus" on the screen, though not always). Other times it just hangs. On a side note, that directory is on Everyone Full Control. [Edited by Zerberus at 2013-03-12 15:47:10] |
Stephen Brooks 2013-03-12 16:20:44 | So that's happening at the simulation end. Interesting: same bit of code. I looked and found a couple of things that might be causing problems - possibly trying to create/sort an array of zero elements. Can you download the archive again (muon1_debug.exe should now be timestamped 2013-03-12 16:18) and see if this bug goes? When you say Mpts "highscore", you mean as some way of estimating the time to complete the result? |
Zerberus 2013-03-12 21:15:28 | This will be the first thing I'll try tomorrow (have to sleep now, sorry). Highscore = the highest Mpts for the lattice currently running, as known by your client. |
Zerberus 2013-03-13 14:58:41 | Couldn't get an error at the end of a simulation anymore. But to effectively continue testing, I'd need a non-debug build based on the current one. The debug build is slow as hell. Oh, by the way: Is there a way to stop the leaking of particles out of the accelerator? Particles flying in backward diagonal directions seem to be especially vulnerable to leaking out. [Edited by Zerberus at 2013-03-13 15:00:15] |
Stephen Brooks 2013-03-13 15:27:38 | Good, there'll be an _alpha4 build out. Although I've greatly reduced the number of leaky particles, I think there's still a way for them to get out near the beginning, especially in early runs where the solenoid is in a strange position. It's fairly harmless but I'll look into it for a later version. |
Stephen Brooks 2013-03-13 16:25:19 | [update] There's a new alpha 4 build available from the main page. Hopefully it's fixed
A side effect is that you can now combine the -g "smooth graphics" switch with the -scr screensaver mode too if you want. Specifying -w and -g gives the old -w behaviour. |
Stephen Brooks 2013-03-14 18:25:39 | Oops, -g switch still didn't work properly in _alpha4 because it was overridden by display.txt! Fixed in the development source. |
Zerberus 2013-03-15 20:38:21 | Since it's looking good so far, I'll try to update my main workhorses to 4.46alpha4. Does the server take 4.46alpha results already? [Edited by Zerberus at 2013-03-15 20:39:03] |
GP500 2013-03-16 10:46:26 | I have a higher Mpts rate on alpha3 than on 445. I was looking at the CPU time, and 10-15% seems to stay as idle time. 46 hrs lost so far, quad BE 955. That seems a lot. |
Zerberus 2013-03-17 00:00:14 | Minor issue: When displaying size of results database and size of results since last send in graphical modes, fractions following the dot are not correctly shown. Example: The results database is 84.70 MB, but muon1 only displays 84.00 MB. |
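One common way to get exactly this symptom is to do the byte-to-megabyte conversion with integer division and only then format the result with two decimals. The sketch below illustrates that general pattern; it is a guess at the kind of bug involved, not a description of Muon1's actual code, and the function names are made up for the example.

```python
BYTES_PER_MB = 1024 * 1024

def shown_buggy(size_bytes):
    """Integer division discards the fraction before formatting."""
    whole_mb = size_bytes // BYTES_PER_MB
    return f"{whole_mb:.2f} MB"

def shown_fixed(size_bytes):
    """Floating-point division keeps the fraction for the formatter."""
    return f"{size_bytes / BYTES_PER_MB:.2f} MB"

# A results database of roughly 84.70 MB, as in the report above.
size = int(84.70 * BYTES_PER_MB)

print(shown_buggy(size))
print(shown_fixed(size))
```

The first call prints a value ending in ".00" regardless of the true fraction, which matches the behaviour described in the report.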
[OCAU] badger 2013-03-18 00:50:22 | Have been running alpha 3 for five days now, seems to be a bit faster than v445 despite running slower (shorter) simulations due to only 2 lattices (one no sample). Muon bench on i7 2600 @ 3401 MHz:
v445: 66709,4015316.0,2340.59
v445 (threads set to 50): 512114,19741474.6,2607.51
v446: 776674,219999.6,2502.86
v446 (threads set to 50): 878428,483593.6,2621.03
It might be a bit faster now I have some results in results.dat - will rebench with alpha 4. No crashes or problems so far, but running in commandline mode only. |
[OCAU] badger 2013-03-18 01:36:56 | Installed alpha4 by copying the new muon1.exe into the directory, when I ran it, it deleted the save file from alpha 3 |
Stephen Brooks 2013-03-18 12:56:19 | GP500 wrote: I was looking at the CPU time, and 10-15% seems to stay as idle time. 46 hrs lost so far, quad BE 955. That seems a lot.
Try reducing the "particles per extra thread" to 50, 40 or 25. Does that help? I'm thinking of changing the default because modern CPUs and OSes seem better at handling short-lived threads.
badger wrote: Installed alpha4 by copying the new muon1.exe into the directory, when I ran it, it deleted the save file from alpha 3
That's by design. I might change the autosave format between versions, so it will delete the temporary save file. Better than a crash or an erroneous result. |
Stephen Brooks 2013-03-18 16:52:52 | Zerberus: "Does the server take 4.46alpha results already?" Yes, it's taken them since _alpha3 I think. The first two alphas didn't put checksums on the results, but that wasn't intentional! Think I've fixed the decimal place bug in the development source. Badger's benchmarks suggest I should set the default particles/thread lower limit to 50 or so. Any more comparisons people can get on this would be good (though benchmarking takes time). |
GP500 2013-03-18 18:42:47 | I will try that, with a new start and benchmarking, overnight i think. |
[OCAU] badger 2013-03-19 00:46:22 | Benched for a bit longer with alpha4 and particles set to 40:
1195977,1319881.9,2660.31
1196577,1322003.3,2667.15
Also on e8400 @ 3GHz:
particles 40: 336233,423182.8,605.35
particles 70: 406387,1582141.5,605.45 (not much difference for particles setting)
v445 results, particles 25: 1813891,12316044.3,603.51
Have moved my other 2 boxes onto alpha4, see how they go |
Stephen Brooks 2013-03-19 11:31:41 | So those last 3 on the E8400 are comparable to each other? (all same version but different particles) Kind of implies the optimum is around 50, as 25 is getting worse, though that's probably within the bounds of error. I've done some debugging on a Linac900Ext8Xc2 simulation. This graph plots number of particles vs. time to complete one step. The simulation cycled between using 1,2,3,4 threads on this 4-core Xeon machine. (Muon1 detected 4 logical cores although the web says the Xeon E5620 has hyperthreading, that's a bit odd). Anyway, the graph seems to say that for 0-40 particles I want to pick 1 thread. 2 threads is optimal for 40-60 particles. 3 threads for 60-100 particles and then from 100-150 particles it's not entirely clear whether 3 or 4 are better, but from 150 upwards 4 threads is the clear winner. Remember the 4 thread mode might be slightly compromised by other things running on my system. That's consistent with a particles-per-thread setting of somewhere between 33 and 50. |
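The step from those crossover points to a setting "between 33 and 50" can be sketched as follows. The rule used here, one base thread plus one extra thread for every N particles, is an assumption suggested by the name "particles per extra thread"; the thread does not spell out the exact formula, and both function names are hypothetical.

```python
def threads_for(particles, per_extra_thread, cores=4):
    """Assumed rule: one base thread plus one extra thread per
    `per_extra_thread` particles, capped at the core count."""
    return min(cores, 1 + particles // per_extra_thread)

def implied_setting(particles, threads):
    """Setting at which `threads` threads would first be used at `particles`."""
    return particles / (threads - 1)

# Crossover points read off the post: 2 threads from about 40 particles,
# 3 threads from about 60, 4 threads somewhere between 100 and 150.
for threads, particles in [(2, 40), (3, 60), (4, 100), (4, 150)]:
    print(threads, particles, round(implied_setting(particles, threads), 1))
```

Under this assumed rule, the 4-thread crossover at 100 to 150 particles corresponds to a setting of roughly 33 to 50, and the earlier crossovers give 40 and 30, which sit in or near the same range.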
Stephen Brooks 2013-03-19 12:43:28 | OK, v4.46 alpha 5 is out, you can download it from the main page. Changes since alpha 4:
I now wonder what Zerberus meant by the "show highest Mpts" - did you mean in graphics mode or commandline mode or both? Because I've only done graphics mode at the moment. |