|Up until a few days ago, every PC I have ever run DPAD on has behaved the same way. By that I mean, the Task Manager Performance chart would always be maxed out at 100% with a flat line display. It never varied. This was on AMD and Pentium, WinNT, Win2000 and WinXP. If you watched the Muon CPU percentage in the Processes list, you could see the CPU percentage vary, but the graph stayed maxed.|
But, I recently put a Socket 939 Opteron 165 online and it looks different. It is XP SP2, fully updated and very stable. But, the Performance Chart graph seems to be tracking just Muon1, not total CPU usage. When a new optimization starts, it is a very high 95 to 99%. As it progresses, the graph invariably trends down, sometimes getting near 50% right at the time the optimization ends, then jumps back up to 99% when the new one starts.
It is moving right through the processing like a dual core Opteron should, but the different behavior puzzles me.
|I should add that it is also the only system using an ALI chipset. The motherboard is an Asrock Dual SATA2 and appears fairly normal in all other respects.|
|This morning I noticed on my AMD64 3800+ that my Task Manager was showing 0% CPU usage. I maximized it to look at my running processes, thinking that maybe I had accidentally ended that process instead of something else the night before, but muon was still "running"; the chart for the last 20 minutes or so of CPU usage had been flat at 0%. I checked the directory to see if maybe it was having trouble uploading results or something, but noticed instead that it hadn't written anything to the results.txt file since yesterday (file size was around 15k or so). After I ended it and restarted it, it seems to be running semi-normally, but it does have its minor dips in CPU usage a few times a minute. Not sure if this is because I'm running 64-bit Windows, but I don't recall similar behavior in the past. Maybe it's just a one-time 'bug'.|
|I should also add that I am running the latest 4.43c version with no modifications to the config file except to run as Background instead of Low. (If I leave it as Low, Nero acts oddly and print jobs take FOREVER).|
|Yes, I had to go over to my brother's house and reconfigure his to Background from Low priority because some of his programs (Symantec AV 10 & his Java-based web games) were close to freezing up on him.|
|curious, are you running the X2/dual core driver that can be downloaded from AMD's website?|
|I am running the X2/dual core driver from Asrock, not the AMD website.|
|I tried the X2/dual core driver version from AMD. No difference. Are there different versions of the Task Manager?|
|Actually I have the X2/Dual-Core Driver from AMD's website. I had no idea that there were other builds of the AMD driver out there. Is there supposed to be a special difference between them? As for different versions of the Task Manager, I have no idea about that. Task Manager is just an info tool for me. I keep the update speed set to slow and periodically open it up and check its history over the last 10 minutes or so (as far as the graph shows) to make sure it's been busy. I've recently upgraded to 4 gigs of RAM and so I'm wondering if that may be what is causing my 'glitches'. If it keeps acting up on me then I'll swap the 4 gig in this machine with the 2 gig in my other 64-bit machine and see if it still gives headaches. I just finished converting all 4 of my towers in this room into 1 nice organized 22U rackmount system, so I don't feel like tearing into it again just yet.|
|With multiple cores, Muon1's parallel algorithm isn't 100% efficient, though it's pretty good with 2-4. I just ran it on a 16-core machine and got only 55%, so I'm going to look at improving that. One good way of mopping up cycles is to run either a 2nd instance of Muon1 (in another directory) or another DC project that will use up the remainder.|
Basically, the algorithm branches out into threads to do the bulk of the processing for each timestep, then merges back together to complete the step. It divides the particles equally (or near-equally) between the processors but if the threads take different times to finish, some end up waiting for the rest, and this seems to get worse for larger numbers of processors.
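The load-imbalance effect described above can be illustrated with a small model (mine, not from the thread; the timings are made-up numbers). Each timestep forks worker threads and only completes when the slowest one finishes, so the step's parallel efficiency is the total useful work divided by (thread count × slowest thread's time), and the same per-thread spread costs more as the thread count grows:

```python
# Illustrative model (not Muon1 code): parallel efficiency of a
# fork/join timestep where every thread must wait for the slowest one.

def step_efficiency(thread_times):
    """Useful work / CPU-time reserved until the slowest thread
    finishes. 1.0 means perfectly balanced."""
    slowest = max(thread_times)
    return sum(thread_times) / (len(thread_times) * slowest)

# With 2 threads and a 10% imbalance, little is lost...
print(round(step_efficiency([1.0, 0.9]), 2))            # 0.95
# ...but with 16 threads the same spread wastes much more,
# because many threads idle while one straggler finishes.
print(round(step_efficiency([1.0] + [0.9] * 15), 2))    # 0.91
```

With a bigger per-thread spread the 16-thread figure falls further, which is consistent with the ~55% observed on the 16-core machine.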
|Stephen, running from two different folders did make the CPU usage a bit more even. It is still jagged, though, unlike the flat line I am accustomed to seeing.|
I made a Muon0 and a Muon1 (in different folders), manually set the CPU affinities to the appropriate CPUs, and set their Threads=1. I'll let them rip for a while.
|I got my new pc (although the distributor stuffed up, and I got a much lower spec one than I paid for, it will be replaced soon I hope).|
Running a P4 3.0 GHz HT CPU, I tried running the background client, but only got one (virtual) CPU (ie 50% usage) with threads set to "auto" or "2". Running 2 instances shared the 50% between the 2 processes! The command line seems to work ok (95-100%), but does drop off in CPU usage at the end of each simulation, down to roughly 20%. I tried running 2 command-line processes, but the 2nd didn't produce any results overnight, even though it supposedly was using the spare CPU cycles....
I now have firedaemon running the main process (commandline), and will try another one (commandline) with muon bench running on both...
|16 cores . . . . wow /drool|
|I actually have some pictures of it here.|
|Now that's a Task Manager worth watching! LOL That's an amazing system. One question though, how do you get 16 cores out of a 4 CPU system? I would have thought that it would be 8 Cores with that setup and Dual Core Opterons. Are there Quad Core Opterons out already or did I miss something in the pics?|
|I've noticed on my new P4 3 GHz (HT) several issues... (I'm sure I posted this yesterday too?)|
1. if I run with the background version, it only uses 1 "virtual cpu" (ie 50% cpu), with threads in config.txt set to "auto" or "2". I tried 2 instances of the background version (separate directories) and the 2 instances shared that 50%....
Running the commandline version takes up 95-100% most of the time, but tapers off toward the end of the simulation, dropping to 20% in some instances... I have it running with firedaemon now, which hides the commandline version.
I've benched it like that, getting 177kpts/s. I'm now trying 2 instances, the above, plus another instance which is a single thread commandline version to "mop up" spare cpu cycles. so far it looks like I'm getting about 130kpts/s and 26kpts/s (total of 156, ie slower than just the one instance), but I'll run it overnight to check without me doing work on the machine...
I might try running 2 instances with 1 thread each as well.
|bah, I did post yesterday. There is no edit function here?|
|That does sound a bit odd. I know that when I launch muon, there is only one processor being utilized (out of the whopping 2) for around the first 20-30 seconds or so while it's reading/analyzing the results.dat. But once it's finished reading/parsing that stuff it jumps up to 100% when the simulation actually begins. If yours isn't jumping up to the 90-100% CPU usage (for the HT portion of your CPU) after a bit then that would seem pretty odd to me. Maybe your results.dat is really huge in size? I know that on my machines I try and cut off at 100MB in size (give or take 10MB or so). I just rename them to results.dat.old etc., and then just parse the top 100 or so results to start off the new results.dat and begin the cycle all over again.|
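The rename-and-reseed routine described above could be sketched roughly as follows. This is my own hypothetical sketch, not code from the thread: it assumes each line of results.dat is one self-contained result record, and since the real record layout isn't shown here, how a line is scored is left to a caller-supplied stand-in function.

```python
# Hypothetical sketch of the results.dat trimming routine described
# above: rename the big file to .old and seed a fresh one with the
# best few records. The file layout and scoring are assumptions.
import os

def trim_results(path, keep=100, score=len):
    """Rename `path` to `path + '.old'` and reseed it with the `keep`
    best lines according to `score` (a placeholder scoring function)."""
    with open(path) as f:
        lines = f.readlines()
    best = sorted(lines, key=score, reverse=True)[:keep]
    os.replace(path, path + ".old")   # keep the full history around
    with open(path, "w") as f:        # start the new cycle
        f.writelines(best)

# e.g. trim_results("results.dat", keep=100)
```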
|GRRR, I hit F5 to refresh the 'forum' and the next thing I see is my last post reposted again. LOL Sorry for the Duplicate above.|
|In answer to your points: results.dat is about 59k, so it isn't that. It only takes about 3-5 seconds to get to 100% here, but that may be because it doesn't have to read a huge results.dat. I did do some benches with a large results.dat and a small one; it was noticeably slower with the big file, with all the time taken to read it, although it was worse for quicker simulations (since it was doing more sims per mpts). I also found it slower when auto sending was enabled, also worse for quicker sims.|
|--[GRRR, I hit F5 to refresh the 'forum' and the next thing I see is my last post reposted again. LOL Sorry for the Duplicate above.]--|
There is actually something to stop that happening, but since you refreshed after Badger had posted another post, it was no longer the same as the one at the end of the thread. An edit/typo-correct function is kind of next on my to-do list for the forum.
|--[One question though, how do you get 16 cores out of a 4 CPU system? I would have thought that it would be 8 Cores with that setup and Dual Core Opterons. Are there Quad Core Opterons out already or did I miss something in the pics?]--|
The pics don't show that there's ANOTHER board behind the daughterboard shown with 4 sockets, which has CPUs 4,5,6 and 7 on it.
--[I also found it slower when auto sending was enabled, also worse for quicker sims.]--
Auto-send or auto-save?
|[TA]JonB - I've been running this on a dual-CPU system since forever (it seems) and this behaviour has always been there. One thing I noticed years ago is that the program's CPU affinity handling isn't the world's best. If you want to maximise CPU usage, set both instances to use both CPUs, with auto threads. (Especially if you have auto-threads set, setting affinities will not work: the client will detect 2 processors and run dual-threaded, but run both threads on the same CPU, so you get the same problem, only worse.)|
To soak up the 7-8% on mine that goes unused, I'm now running a different project (the player-based '13th Labour' client for the Perplex City ARG): Muon1 at BelowNormal, 13L at Low. Works nicely.
|So I ran it overnight, with a 2 threaded main instance, and a single threaded 2nd instance on background priority. The main thread benched at 147kpts/s (cf 177 running alone), but the 2nd instance benched at 42 kpts/s, giving a total output of 189kpts/s, an improvement of about 6%. I find it interesting that the low priority 2nd process steals cycles from the 1st one. looking at cpu time listed in task manager (both started at the same time) the main process has 27hr07min, the other is 8hr24min. I'm guessing it counts each virtual cpu (strange, as it only gives CPU use as a total - 100% = both cpus), since they have only been running for about 20 hours total. |
I'm going to have a go at running 2 single threaded instances...
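The guess above about per-virtual-CPU accounting can be checked with quick arithmetic on the quoted figures: the two processes' accumulated CPU times sum to well over the ~20 hours of wall-clock time, which is only possible if Task Manager accumulates time on each logical (HT) CPU separately.

```python
# Sanity check of the CPU-time figures quoted above (all in minutes).
main_cpu   = 27 * 60 + 7    # 27h07m for the main process
second_cpu = 8 * 60 + 24    # 8h24m for the 2nd instance
wall       = 20 * 60        # ~20h wall clock (poster's estimate)

total = main_cpu + second_cpu          # 2131 min, about 35.5 hours
print(total, total > wall)             # 2131 True
# CPU time exceeds wall time, so the counter must be summing
# time across both logical (HT) CPUs.
```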
|Benched 2 single threaded instances, got 102kpts/s for one, 96 kpts/s for the other, giving a total of 198kpts/s. Thus the 2 single threaded instances is the most efficient way to run muon in the current version (at least on this machine)|
|[ocau]badger - have you tried running TWO dual-threaded instances? That gives the best results for me.|
|I ran 2 dual threaded instances overnight, with instance 1 giving 155kpts/s and instance 2 giving 37kpts/s, a total of 192kpts/s. My conclusion, 2 single threaded instances is still best, at least on this machine. Remember this isn't a real dual cpu machine, but a single processor with hyper threading, YMMV|
|How can you permanently set CPU affinity?|
|I don't think you can, not with Muon1 in its current state. I might be able to add it as a feature because there is the SetThreadAffinityMask function in the Windows API that sets this sort of thing. However I think if you're concerned about dropped cycles, running N clients with 1 thread each ought to usually all go on separate processors. One of the few things the Windows kernel does seem to be pretty effective at is scheduling tasks to "use up" all the available CPU.|
|[DPC] Eclipse~Lord Alderaan|
|Yeah, multi-core setups were tested in this thread too:|
It seems that running a copy of Muon1 (configured for 1 thread) for each core you have is best. Setting affinity doesn't seem to have any influence in this case: limiting a Muon1 instance to 1 thread means it runs on a single core anyway, and Windows seems to assign each instance its own core, so they won't bother each other.
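The one-single-threaded-client-per-core setup could be scripted along these lines. This is a hypothetical launcher of my own: the directory names and the executable name ("muon1.exe") are assumptions, not taken from the thread, and each instance is expected to have Threads=1 in its own config.txt.

```python
# Hypothetical launcher for "one single-threaded client per core".
# Directory and executable names are assumptions for illustration.
import os
import subprocess

def per_core_plan(exe="muon1.exe", cores=None):
    """Build one (directory, command) pair per core; each instance
    is meant to run with Threads=1 in its own config.txt."""
    n = cores or os.cpu_count() or 1
    return [(f"muon{i}", [exe]) for i in range(n)]

def launch(plan):
    """Start each client in its own working directory. No explicit
    affinity is set: the OS scheduler spreads the single-threaded
    processes across the cores on its own."""
    return [subprocess.Popen(cmd, cwd=d) for d, cmd in plan]

# e.g. launch(per_core_plan())   # muon0/, muon1/, ... one client each
```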
|Ok thanks guys, that's what I did in the end, & it is pretty much maxing out the CPUs now|
I now have another DPAD problem to look into..... (will be in a new thread)