K`Tetch 2012-03-30 22:49:59 | Ok, Q6600 on win Vista64 with v4.45 2404357, 16577063.6, 526.15 2406510, 16578246.3, 527.77 2409582, 16579876.6, 528.03 2412655, 16581485.5, 527.66 Hmm, that's down from what it used to be. That said, that seems to have been from a long time back. More recent benchmarks have been in the 400ish area, but this was an 18 hour run with the rarity of all my browsers closed. Benching the i3-380m on win7 right now. |
K`Tetch 2012-03-31 00:02:34 | Oh yeah, the i5-2400 on win7 came to 1600 kpts/sec (it's drider1969's system, I just 'borrowed' it). It's a 3.1 GHz chip, 4 threads (4 cores), DDR3-1333 RAM I think. The Q6600 above is a quad-core, 4-thread 2.4 GHz chip with DDR2-800 RAM. |
Silverthorne 2012-04-09 02:48:26 | Here's a new benchmark for an i7 2700K @ 4.8 GHz; it only ran for about 4.5 hours. Uptime (secs),Mpts in file,Estimate kpts/sec 18209,46762608.5,0.00 18811,46765025.7,4010.78 19113,46765671.4,3388.12 22729,46778484.0,3512.15 23030,46779210.0,3443.18 23331,46781028.3,3595.62 23934,46782275.5,3434.96 24235,46784050.0,3557.68 24537,46784062.4,3390.06 24838,46785107.5,3393.61 25441,46786673.8,3327.49 26044,46789353.0,3413.49 26345,46790374.2,3412.59 26948,46792330.7,3401.10 27550,46794428.8,3406.27 27851,46794688.4,3326.79 28153,46795860.3,3343.82 28454,46797150.6,3371.41 28755,46797547.4,3312.74 32372,46810125.6,3355.02 32673,46811123.9,3354.13 If my math is correct it's hitting 3441.94 kpts/sec with HT enabled. |
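Judging from the posted numbers, the Estimate column appears to be a running average since the first row: for example (46765671.4 - 46762608.5) / (19113 - 18209) x 1000 is about 3388, matching the 3388.12 in the third row. If that reading is right, the last row already summarises the whole run, and averaging the column weights the early samples more heavily. A minimal Python sketch under that assumption (the function names are hypothetical, not part of Muon1):

```python
def parse_samples(text):
    """Split a flattened benchmark dump into (uptime_s, mpts, estimate) tuples."""
    samples = []
    for token in text.split():
        uptime, mpts, estimate = token.split(",")
        samples.append((int(uptime), float(mpts), float(estimate)))
    return samples


def overall_kpts_per_sec(samples):
    """Mpts gained between the first and last sample, divided by the elapsed
    uptime; Mpts/s times 1000 gives kpts/s."""
    (t0, m0, _), (t1, m1, _) = samples[0], samples[-1]
    return (m1 - m0) / (t1 - t0) * 1000.0


# First and last rows of the i7 2700K run posted above.
dump = "18209,46762608.5,0.00 32673,46811123.9,3354.13"
samples = parse_samples(dump)
rate = overall_kpts_per_sec(samples)
print(f"{rate:.2f} kpts/s vs last Estimate {samples[-1][2]:.2f}")
```

The small gap between the two figures is what you would expect if uptime is only logged to the nearest second.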
K`Tetch 2012-04-09 04:54:42 | Ok, I had to go away for the weekend, and left my home system running with only IRC and email going. Peak was 592 at about 24 hours, but the rate slowly dropped as more RAM was used to read the .dat (62,227 entries), so that after t+273150, Mpts was +157211.9 and the estimate 575.55. |
K`Tetch 2012-04-13 12:13:33 | laptop, i3-380m with win7 (2.53ghz, 2core+HT=4thread) 138944,388635.9,691.13 139544,389042.1,690.77 so about 690 |
K`Tetch 2012-04-13 16:42:33 | Typical, I leave it running longer... 144345,392227.8,686.14 144645,392562.3,690.66 151247,397404.3,698.70 152147,398122.7,701.18 152747,398543.9,701.19 153047,398714.1,700.10 153347,398919.3,699.97 153647,399038.7,697.55 154247,399512.5,699.00 155148,400144.7,699.07 700? |
[AMD Users] Michal Hajicek 2012-07-18 13:13:10 | desktop, Intel Core2duo E7400 - 2,8GHz, WinXP 32bit two muon1 clients (one thread each), v. 4.45 18327,10208.2,583.78 18628,10370.1,582.91 18928,10555.1,583.56 19228,10689.3,580.99 19528,10886.2,582.38 19828,11031.9,580.63 20128,11253.3,583.44 20428,11409.0,582.30 20728,11593.1,582.85 21028,11737.1,581.10 21328,11938.5,582.60 21628,12080.4,580.81 21928,12259.2,581.05 22228,12444.3,581.62 22528,12584.2,579.82 22828,12778.5,580.86 23128,12971.4,581.80 23428,13146.0,581.81 23728,13321.0,581.83 24028,13474.0,580.79 24328,13636.3,580.22 24628,13809.3,580.17 |
[OCAU] badger 2012-10-12 05:59:24 | Thought I'd add some benchmarks, since I'm playing with them atm. CPU: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3401 Mhz, 4 Core(s), 8 Logical Processor(s); 8 GB RAM, on 32-bit Win 7. 1 instance, normal: 77383,4040724.3,2426.38 77683,4041711.3,2455.11 84287,4057820.6,2448.45 66709,4015316.0,2340.59 2 instances, nosample lattice only: 7501,18680129.7,1014.04 and 7483,18679795.4,888.91, a total of about 1900 kpts/s. 1 instance, nosample lattice only: 8848,18684280.2,1561.87 9149,18684736.9,1551.52 For nosample, since each run takes on the order of 40 seconds at the moment (best result about -0.218), there is a fair bit of downtime between the runs, as well as some lost time sending results (the first 3 servers are down?), because it had a lot more results to send in a given time. There is also a noticeable drop in CPU usage toward the end of each run (according to Task Manager), I guess because the number of particles falls off. Hence the substantial advantage of running 2 instances at once (both on auto threads). [Edited by [OCAU] badger at 2012-10-12 06:02:35] [Edited by [OCAU] badger at 2012-10-12 06:08:58] |
Stephen Brooks 2012-10-12 11:21:19 | The business about fewer threads being produced when there are fewer particles can be tuned using the line "Particles per extra thread (limits threading overhead): 100" in config.txt. Making this smaller (e.g. 1, in the extreme case) will produce more threads, but you'll still have Windows OS overhead in creating and waiting for the threads. Making it larger will revert to fewer threads more often. The value of 100 came from a benchmark I did back in 2004; it errs on the side of creating more threads (i.e. 200 might be slightly better), though it's a messy and broad optimum. Newer versions of Windows and CPU architectures might be slightly different, though. |
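To illustrate why a smaller value keeps more threads busy as particles are lost toward the end of a run, here is one plausible, hypothetical model of how such a setting could map particle count to thread count: one base thread plus one extra thread per full block of particles, capped at the number of logical processors. The formula, the function name and the cap of 8 (the i7-2600's logical processor count) are assumptions for illustration, not Muon1's actual code.

```python
def thread_count(particles, particles_per_extra_thread=100, max_threads=8):
    """HYPOTHETICAL model, not Muon1's source: one base thread, plus one extra
    thread per full block of `particles_per_extra_thread` particles, capped at
    `max_threads` logical processors."""
    extra = particles // particles_per_extra_thread
    return max(1, min(max_threads, 1 + extra))


# As particles are lost, a setting of 100 drops to fewer threads sooner than 25.
for n in (50, 250, 800, 5000):
    print(n, thread_count(n), thread_count(n, particles_per_extra_thread=25))
```

Under this model the trade-off is the one described above: a smaller value keeps more threads alive late in a run, at the cost of more OS overhead per thread.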
[OCAU] badger 2012-10-15 01:21:13 | I've seen the "particles per extra thread" option, but was not sure on how it worked. I ran over the weekend with the normal lattice priority (most with samples) and two instances, one auto threads + normal priority, the other set to 4 threads and background priority, to try to "mop up" unused CPU cycles. 224216,19019460.3,1541.52 227684,18881566,940.74 total 2482.26 226321,19022742.5,1541.7 227985,18881759.6,940.32 total 2482.02 236848,19039018.4,1541.91 244826,18897997.7,942.06 total 2483.97 238953,19042338.9,1542.24 248134,18901186.5,942.37 total 2484.61 249179,19058615.5,1544.39 248735,18901312.8,940.5 total 2484.89 this is a little bit faster (about 1.4%) than just "auto" threads single instance: 74081,4032843.9,2450.66 74681,4034228.9,2436.31 76182,4037433.8,2376.12 77383,4040724.3,2426.38 77683,4041711.3,2455.11 84287,4057820.6,2448.45 There is a much larger difference when doing a lot of short simulations - 2 instances is about 25% faster although still 22% slower than running normal simulations with samples. I'll try changing "particles per extra thread" to 50 to see how that changes things. |
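The two-instance totals are just the sum of each instance's estimate (1541.52 + 940.74 = 2482.26, and so on). The post doesn't say which single-instance figure the ~1.4% is measured against; assuming it is the last single-instance reading (2448.45 kpts/s at 84287 s), a quick Python check of the arithmetic (helper names are hypothetical):

```python
def combined_rate(instance_rates):
    """Total throughput of several Muon1 instances running side by side."""
    return sum(instance_rates)


def percent_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new / baseline - 1.0) * 100.0


# First pair of readings from the two-instance run above.
total = combined_rate([1541.52, 940.74])
# Assumed baseline: last single-instance reading, 2448.45 kpts/s.
print(f"total {total:.2f} kpts/s, gain {percent_gain(total, 2448.45):.2f}%")
```

Comparing against the mean of all six single-instance readings instead would give a somewhat larger gain, so the choice of baseline matters at this level of precision.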
[OCAU] badger 2012-10-16 00:39:49 | After running with "particles per extra thread" at 50, 1 instance, auto threads, overnight, it seems to be a bit faster again than 100 (even 100 with 2 instances): 319679,19240184.7,2598.03 320581,19243215.5,2614.03 322084,19246469.2,2598.86 323286,19249653.8,2600.16 329299,19265923.2,2612.45 329599,19266116.8,2601.07 330802,19269300.9,2602.13 332004,19272764.9,2608.29 Out of interest I've also benchmarked two other boxes running Muon under Wine in Linux (by adding up the total points produced over 3 days 5 hours). 1: i5 3570k @ 3.4 GHz = 2022 kpts/s, a bit of a drop in performance compared to the i7 at the same clock speed and # of cores. Not sure if that's down to HT, more L3 cache or Wine/Linux. 2: Core 2 Duo E8400 @ 3.708 GHz = 766 kpts/s, which is a bit slower again considering it is higher clocked and has half the # of CPUs. |
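Working out a rate from a total over a long period is just Mpts divided by elapsed seconds, times 1000 for kpts/s; 3 days 5 hours is 277200 s. A minimal Python sketch (the helper name is hypothetical, and the total used below is back-calculated from the quoted 2022 kpts/s purely for illustration, since the actual totals weren't posted):

```python
def rate_from_total(total_mpts, days=0, hours=0):
    """Average kpts/s from the total Mpts produced over a wall-clock period."""
    seconds = days * 86400 + hours * 3600
    return total_mpts / seconds * 1000.0  # Mpts/s -> kpts/s


# Illustrative only: total back-calculated from 2022 kpts/s over 3 d 5 h.
print(f"{rate_from_total(560498.4, days=3, hours=5):.0f} kpts/s")
```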
[OCAU] badger 2012-10-18 00:19:54 | Just to add more, I ran it overnight again with "particles per extra thread" set to 25. 477550,19651786.2,2618.83 478753,19654977.0,2619.90 479053,19655849.9,2622.05 480255,19659084.7,2624.05 486266,19675357.1,2634.63 492277,19691304.4,2636.73 492577,19691388.6,2623.49 493779,19694572.6,2624.04 494982,19697750.8,2624.47 495582,19699371.2,2625.23 495883,19700128.4,2624.68 497085,19703495.7,2628.33 497686,19704551.8,2619.40 498888,19707736.0,2619.99 500090,19711339.6,2627.43 501292,19713712.2,2614.81 501893,19715622.6,2620.21 So it seems that (on Win 7 at least) modern CPUs/chipsets handle extra threads very well. |
Stephen Brooks 2012-10-18 11:20:55 | If I can get some confirmation of this from my system(s) and perhaps someone else on the forum as well, a reduction of the default setting shipped with Muon1 might be in order. |
Zerberus 2012-10-18 13:41:56 | Multi-threading has improved, since multi-core and multiple-processors have become common. The default could be changed, and users of older systems advised to reduce threads if it doesn't work well on their machines. |
[OCAU] badger 2012-10-19 01:45:04 | I've set my other machines to 40, as there was a large difference from 100 to 50, but not much going from 50 to 25. I'll see if there is a change in output for them with wine under linux. |
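Taking the last posted running estimate of each single-instance, auto-thread run on the i7-2600 as its summary (2448.45 at 100, 2608.29 at 50, 2620.21 at 25), the step sizes can be checked with a short Python sketch. This assumes the three overnight runs are otherwise comparable, which the posts suggest but don't guarantee; the helper name is hypothetical.

```python
def step_changes(runs):
    """Percent change in kpts/s for each successive, smaller setting."""
    settings = sorted(runs, reverse=True)  # e.g. 100, 50, 25
    return [
        (prev, cur, (runs[cur] / runs[prev] - 1.0) * 100.0)
        for prev, cur in zip(settings, settings[1:])
    ]


# Last posted running estimate (kpts/s), keyed by "particles per extra thread".
runs = {100: 2448.45, 50: 2608.29, 25: 2620.21}
for prev, cur, pct in step_changes(runs):
    print(f"{prev} -> {cur}: {pct:+.2f}%")
```

This gives roughly +6.5% going from 100 to 50 and under +0.5% going from 50 to 25, in line with the choice of 40 as a middle setting.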
[OCAU] badger 2012-10-22 00:52:30 | Ok, for the other machines running in Wine under Linux (kpts/s over 3 days of running, time taken from the file creation of results.txt, no auto upload): particles per extra thread = 100: Machine 1 2022, Machine 2 766; particles per extra thread = 40: Machine 1 2018, Machine 2 694. So the older machine (Core 2 Duo) suffers a bit with fewer particles per extra thread, while the newer machine (i5) is basically the same. |
GP500 2013-03-08 09:53:21 | Overnight bench on AMD 955 BE @ 3.72 GHz: around 1200 kpts/sec, I guess. Uptime (secs),Mpts in file,Estimate kpts/sec 196581,12620831.8,0.00 198385,12622888.9,1140.32 200190,12624990.5,1152.45 201694,12627116.0,1229.28 201994,12627142.0,1165.87 202294,12627585.3,1182.15 203798,12629134.6,1150.46 204100,12629187.2,1111.32 206207,12631324.8,1090.16 206507,12631409.2,1065.63 208009,12633578.8,1115.44 208310,12633629.7,1091.21 209811,12635757.5,1128.18 210112,12636087.5,1127.52 211614,12638243.9,1158.33 211914,12638333.5,1141.48 219723,12648830.0,1209.85 227833,12659428.2,1235.02 229335,12661501.4,1241.69 237445,12672063.0,1253.73 238946,12674142.2,1258.36 |
Stephen Brooks 2013-03-13 17:56:42 | Looks like I haven't done a graph on this thread for a while. Here's a big one, I've merged v4.44d with v4.45 results but tagged them in my spreadsheet so I can filter the old ones out later if necessary. |
RGtx 2013-06-15 23:58:04 | Haswell i7 4770k ~3.9GHz, HT enabled: Uptime (secs),Mpts in file,Estimate kpts/sec 14918,7767966.7,0.00 15219,7768578.5,2034.49 15520,7769831.7,3100.95 15821,7770512.1,2821.46 17024,7774698.4,3196.43 17325,7775365.0,3073.87 18530,7778755.7,2987.40 18831,7779741.3,3009.29 19132,7780943.8,3079.62 19433,7781622.4,3024.61 19734,7782237.8,2963.30 20939,7785629.5,2933.94 21842,7789149.5,3059.48 22143,7789828.7,3026.03 22444,7790437.7,2985.89 23348,7793834.4,3068.70 24552,7797223.9,3036.96 24853,7797901.9,3013.12 25155,7799170.3,3048.36 25456,7799802.3,3021.22 25757,7801051.6,3052.50 26058,7801715.7,3029.59 26359,7802984.6,3060.75 26660,7803641.2,3038.19 26962,7804320.4,3018.57 27865,7807712.2,3069.96 28166,7808391.2,3051.48 28467,7809532.3,3067.91 28768,7810221.4,3050.99 29069,7810934.1,3036.41 29370,7812282.4,3066.40 30574,7815665.6,3046.85 30874,7816382.8,3034.37 31175,7817666.5,3057.20 31476,7818525.1,3053.53 31776,7819135.8,3035.29 32077,7819806.7,3021.20 32378,7820556.5,3012.11 32679,7821236.2,2999.33 32980,7822594.3,3024.49 33281,7823207.6,3008.33 33582,7823886.0,2996.13 34029,7825057.1,2987.41 34330,7825733.2,2975.86 34631,7827081.2,2998.75 34933,7827772.1,2988.16 36137,7831167.2,2978.50 36438,7831846.8,2968.42 |
tomaz 2014-08-07 07:02:36 | Out of the box i7-4930k (6 core), 3.4 GHz, Windows 8.1pro, 64 bit Uptime (secs),Mpts in file,Estimate kpts/sec 61359,329420.7,0.00 61659,330564.7,3813.16 61959,332039.0,4363.63 62259,333571.8,4612.12 62559,335105.8,4737.36 63159,336672.7,4028.70 63459,338168.2,4165.28 63759,339672.7,4271.46 64059,341295.3,4397.79 64359,341605.7,4061.45 64659,343150.5,4160.30 64959,344656.2,4231.82 66759,352316.7,4239.76 67059,353852.9,4286.11 67359,355420.6,4333.06 67659,356937.2,4367.44 69459,364626.7,4346.17 71259,372293.5,4330.34 71559,373817.8,4352.41 71859,374898.8,4331.00 72159,376061.4,4318.33 72459,377592.6,4339.55 72759,379799.9,4418.96 73059,380963.3,4405.09 74860,388754.5,4394.84 75460,390545.8,4334.85 75760,391759.9,4328.85 76060,393475.9,4357.23 76360,394753.4,4355.24 76660,396263.4,4368.53 76960,397796.2,4382.77 77560,399364.7,4317.26 79060,407153.2,4391.39 79660,408695.0,4331.66 79960,410263.9,4346.13 80260,411807.1,4358.79 80560,412479.1,4325.68 80860,413993.2,4336.77 82360,421681.3,4393.08 82660,422381.2,4364.06 82960,423654.4,4362.39 83260,424978.5,4363.08 83560,425988.2,4349.60 83860,427251.9,4347.76 84160,428367.4,4339.48 84460,429897.7,4349.36 84760,431426.2,4358.91 85060,432932.3,4367.28 86860,440631.9,4360.93 88661,448335.5,4355.56 88961,449835.4,4362.56 89261,451017.4,4358.01 89561,452555.1,4366.17 89861,453865.6,4366.19 90161,455406.7,4374.21 90461,457009.9,4384.21 91061,458549.6,4347.47 91361,460104.5,4355.82 91661,461640.8,4363.40 91961,463132.8,4369.37 92261,464276.9,4363.97 92561,465792.1,4370.57 92861,467314.0,4377.25 93461,468858.3,4343.54 93761,470399.3,4350.88 94061,471945.4,4358.24 94361,473087.1,4353.22 94661,475084.5,4373.97 94961,475804.5,4356.35 95261,476967.8,4352.08 95561,477520.3,4330.06 95861,479055.4,4336.90 96162,480261.5,4334.14 96462,481798.0,4340.87 96762,483334.5,4347.48 97062,484876.8,4354.14 97362,486417.9,4360.66 97662,487940.8,4366.57 97962,487968.1,4331.53 99462,495678.2,4363.33 99762,497189.4,4368.60 
100362,498729.6,4340.88 100662,499896.4,4337.43 100962,501388.7,4342.25 101262,502950.5,4348.74 101562,504665.1,4358.93 101862,505831.1,4355.43 102162,507353.5,4360.71 102462,508024.9,4345.22 102762,509565.4,4350.94 103062,511104.6,4356.54 103362,512642.7,4362.04 103662,513783.5,4358.07 103962,515277.4,4362.45 104263,516812.5,4367.72 104563,518003.4,4364.95 105163,519445.6,4338.08 105463,520956.6,4342.83 105763,522442.5,4346.95 106063,523947.1,4351.43 106363,525434.7,4355.47 106663,526970.5,4360.53 106963,528508.7,4365.57 107263,530025.8,4370.09 107563,530767.1,4357.75 107863,532295.4,4362.50 108163,533791.5,4366.50 108763,536266.5,4363.44 109363,538660.0,4358.75 109663,540318.7,4366.02 110263,541859.5,4343.95 110563,543359.9,4347.94 110863,544904.1,4352.78 111164,545722.0,4342.98 111464,547246.1,4347.39 111764,548831.2,4352.96 112064,550362.5,4357.40 112364,551880.6,4361.53 112664,552983.6,4357.53 112964,554487.0,4361.32 113264,555281.9,4351.43 113564,556846.8,4356.39 113864,558399.1,4361.07 114464,559960.0,4341.18 114764,561479.9,4345.25 115064,562534.7,4340.62 115364,564074.1,4345.01 115664,565712.8,4351.18 115964,566864.0,4348.35 116264,568330.6,4351.30 116564,569857.6,4355.31 117164,571413.8,4336.36 117464,572949.4,4340.55 117764,574473.0,4344.47 118064,575902.5,4346.69 118364,577440.3,4350.79 118664,578584.4,4347.97 118964,579713.3,4344.92 119265,581242.8,4348.82 119565,582776.1,4352.75 119865,584312.4,4356.68 120165,585851.3,4360.62 120465,585933.3,4339.88 120765,587473.3,4343.88 121065,588975.8,4347.21 121365,590589.1,4352.36 121965,592212.0,4336.05 122265,593843.0,4341.47 122565,595769.8,4351.66 123165,597307.1,4334.29 123465,598888.5,4338.81 123765,600417.4,4342.45 124065,601008.2,4331.09 124365,602587.0,4335.53 124665,603754.5,4333.42 124965,605297.0,4337.23 125265,606807.6,4340.50 125565,608356.6,4344.35 125865,609020.9,4334.44 126165,610532.8,4337.70 126465,612195.7,4343.25 127066,613762.4,4327.43 127366,615290.8,4330.91 127666,616790.0,4333.92 
127966,617399.1,4323.55 129466,625158.9,4342.25 130066,626737.1,4327.29 130366,629028.1,4341.68 130666,630566.0,4345.07 131266,632151.1,4330.45 131566,633694.7,4333.93 133366,641404.1,4332.64 133666,642948.2,4336.02 133966,644491.6,4339.35 134267,646009.5,4342.32 If you crop the first 20 results, the average is 4352 kpts/s (be calm, no need for chart resizing yet). Since that nice electronic device has 6 cores and runs mostly at 3.54 GHz (BIOS at defaults: HT on, Turbo Boost on), it gives 205 pts/Mclock/core! Well, it is record-breaking performance. CPU usage was ~98%; the rest is system etc. If you draw a graph of kpts/s you can see a slightly falling trend in the last, let's say, 5 hours (the whole run is 20 h), probably due to heating? If I let it run for a shorter time, the average would be ~4356. I might try the benchmark again in a week or so to compare. |
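The pts/Mclock/core figure appears to be the rate in pts/s divided by (clock in MHz x number of cores), since MHz is millions of cycles per second. A minimal Python sketch (hypothetical helper name) reproduces both the 205 quoted here and, with ArgusMonitor's 3602 MHz reading, the 201 mentioned in tomaz's follow-up:

```python
def pts_per_mclock_per_core(kpts_per_sec, clock_mhz, cores):
    """Points per million clock cycles per core."""
    pts_per_sec = kpts_per_sec * 1000.0
    mclocks_per_sec = clock_mhz * cores  # MHz = million cycles per second
    return pts_per_sec / mclocks_per_sec


print(round(pts_per_mclock_per_core(4352, 3540, 6)))  # 205 (Task Manager clock)
print(round(pts_per_mclock_per_core(4352, 3602, 6)))  # 201 (ArgusMonitor clock)
```

Only the 6 physical cores are counted; the HT logical processors don't enter the denominator.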
Stephen Brooks 2014-08-07 17:10:24 | Nice! I think your efficiency is very high, though it's going to be tough to figure out the true average clock speed of all those cores now that Intel dynamically changes them. Is there a program like CPU-Z that prints a log of core frequencies? The next highest Core i7x6 is Maniacken's at 3310 kpts/s, but that was from three generations ago and running a bit slower. |
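The tools in this thread (CPU-Z, ArgusMonitor, Task Manager) are Windows ones. For the Linux boxes running Muon1 under Wine, a rough log can be had by polling /proc/cpuinfo, whose "cpu MHz" lines on x86 report each logical CPU's current clock. A minimal Python sketch; the function names are hypothetical, and each line is an instantaneous snapshot rather than a true average, so brief boosts between samples can be missed.

```python
import re
import time

# Matches lines such as "cpu MHz\t\t: 3602.000" in /proc/cpuinfo.
MHZ_LINE = re.compile(r"^cpu MHz\s*:\s*([0-9.]+)", re.MULTILINE)


def core_mhz(cpuinfo_text):
    """Return the per-CPU 'cpu MHz' values found in /proc/cpuinfo text."""
    return [float(m) for m in MHZ_LINE.findall(cpuinfo_text)]


def log_frequencies(samples=5, interval_s=1.0, path="/proc/cpuinfo"):
    """Print a timestamped line of per-CPU MHz every interval_s seconds."""
    for _ in range(samples):
        with open(path) as f:
            mhz = core_mhz(f.read())
        print(time.strftime("%H:%M:%S"), " ".join(f"{x:.0f}" for x in mhz))
        time.sleep(interval_s)
```

On such a machine, calling log_frequencies(samples=60) prints one timestamped line per second.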
tomaz 2014-08-08 09:35:19 | Good point, Stephen. At first I thought that 3.54 GHz was the maximum "boost", but it is not; the maximum is 3.9 GHz. Every time I looked at Task Manager, the CPU speed was at 3.54 GHz. I then ran a program called ArgusMonitor. It showed frequencies of 3602 MHz (100.06 x 36). The default multiplier is 34; Turbo Boost adds 2. I don't know why there is a slight difference between the Task Manager and ArgusMonitor frequency estimates. According to Argus, which shows all 6 cores, boosting is applied evenly to all cores, so they are ticking at the same frequency (which simplifies the problem). I also observed temperatures; they are quite different between cores, from 60 to 64 °C. I found that quite warm, so I opened the front door of my silent case and temperatures dropped to 56-60 °C. Unfortunately, that doesn't affect Turbo Boost, but I think it is strongly related to core temperatures anyway. I watched Task Manager and Argus for a longer time. When computation of a genome is finished and it writes results, CPU usage drops to 10%. At the same time, CPU frequencies rise to 3.84 GHz (in Task Manager; Argus doesn't register it, due to the very short period) and temperatures fall to ~55 °C for a fraction of a second. Maybe, if one could keep the temperatures of all cores below ~55 °C, the CPU would run happily at 3.8-3.9 GHz? Let's wait for the winter. Anyway, the conclusion is that the above estimate of efficiency is fairly realistic. If we use Argus's frequency estimates, it gives 201 instead of 205 pts/Mclock/core. (How can I add pictures in a post? I made a screenshot of TM and Argus.) [Edited by tomaz at 2014-08-08 09:37:48] |
Stephen Brooks 2014-08-08 17:19:24 | You add pictures by writing the HTML img tag and then ticking "Use HTML?" at the bottom of the post. The reason the clock speed increases in the genome/initialisation part between simulations is that that part is single-thread whereas the main simulation is multi-threaded. So the CPU runs a single core much faster during that period. The old highest efficiencies of around 180 pts/Mclock/core were on older versions of the Core i7 (at least 2 generations old) so I can believe the efficiency has improved to ~200 since then. Especially since Intel have emphasised efficiency over clock speed increases recently. Some CPUs have "performance counters" that actually count the cumulative number of cycles - polling those would give a correct average GHz. |
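Polling a cumulative cycle counter twice gives the average clock directly: cycles elapsed divided by seconds elapsed. A sketch, assuming you already have some way to read a per-core cycle count (how to do that is CPU- and OS-specific and not shown); the function names and counter values are hypothetical. One caveat: on many recent x86 CPUs the time-stamp counter (RDTSC) ticks at a fixed nominal rate regardless of Turbo Boost, so it would only report the base clock. A counter that counts actual core cycles is needed.

```python
def average_ghz(cycles_start, cycles_end, t_start_s, t_end_s):
    """Average clock over an interval, from two readings of a cumulative
    cycle counter taken at wall-clock times t_start_s and t_end_s."""
    return (cycles_end - cycles_start) / (t_end_s - t_start_s) / 1e9


def average_ghz_all_cores(readings_start, readings_end, t_start_s, t_end_s):
    """Mean of the per-core averages, given one counter reading per core."""
    per_core = [
        average_ghz(c0, c1, t_start_s, t_end_s)
        for c0, c1 in zip(readings_start, readings_end)
    ]
    return sum(per_core) / len(per_core)


# Made-up counter values for two cores sampled 10 s apart.
print(average_ghz_all_cores([0, 0], [36_020_000_000, 35_400_000_000], 0.0, 10.0))
```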