bench1 tests the classic (1986) Barnes and Hut tree code in C, without quadrupole corrections, for 10240 particles and 64 timesteps. bench2 tests a more modern tree code, the O(N) version from Dehnen (2002?), again for 64 timesteps. bench0 has been added to compare a tree code with a direct N-body code, also for 64 timesteps, but with fewer particles to prevent this benchmark from dominating the whole series. bench3 benchmarks the I/O and creation of a large (>2GB) snapshot. bench4 benchmarks an image operation.
bench0:  time directcode nbody=10240
bench1:  time hackcode1 nbody=10240
bench2:  time mkplummer p1 10240; time gyrfalcON p1 . kmax=6 tstop=2 eps=0.05
bench3:  time mkspiral s000 1000000 nmodel=40
bench4:  ccdmath "" - 'ranu(0,1)' size=128 | ccdpot - . help=c
and the data:

bench0 (directcode):
    I5/7200@2.5            47.7
    I7/3630QM@3.6         174.2   (nemo2: I7/3770@3.5 152.6)
    P4/3.0                342.7
    G5/2.0                472.8
    AMD-opt64/2.0         300.3
    AMD-ath64/2.0         888.7
    P4/2.8 w/cygwin       365.9
    x86_64/3.0            516.4
    x86_64/3.2            480.6
    sparcv9+vis/0.36     3048.2

bench1 (hackcode1):
    I5/2.5                  2.2
    I7/3.6                 14.0   (nemo2: 1.2 - 3.6) 5.1
    I7/3.5                  4.4
    P4/3.0                 13.4
    G5/2.0                 13.1
    AMD-opt64/2.0           9.36
    AMD-ath64/2.0          17.45
    P4/2.8 w/cygwin        15.1
    x86_64/3.0             11.7
    x86_64/3.2             10.7
    sparcv9+vis/0.36       81.5

bench2 (gyrfalcON):
    I5/2.5                  2.5
    I7/3.6                  8.1   (nemo2) 2.948 3.19 5.09
    I7/3.6                 31.2   (?)
    P4/3.0                 10.8
    G5/2.0                 21.0
    AMD-opt64/2.0           8.1
    AMD-ath64/2.0           8.9
    P4/2.8 w/cygwin        45.5
    x86_64/3.2              8.6
    sparcv9+vis/0.36       85.1

bench3 (mkspiral):
    I5/2.5                  5.4u 1.6s
    I7/3.6                 23.657u 3.856s 0:28.04 98.0%   (nemo2) 10.760u 0.832s 0:20.33 57.0%
    I7/3.5                  7.112u 1.520s 0:09.37 98.5%
    P4/3.0                 22.890u 5.980s 1:45.63 27.3%
    G5/2.0                 28.400u 24.660s 1:05.41 81.1%
    AMD-opt64/2.0          18.540u 10.921s 0:56.93 51.7%
    AMD-ath64/2.0          29.311u 10.353s 0:59.88 66.2%  (SATA)
    P4/2.8                 25.541u 8.081s 0:59.98 56.0%   (S/ATA)
    P4/2.8 w/cygwin       276.56u 26.35s 6:34.75 76.7%    (using mkplummer V2.8)
    x86_64/3.0             21.651u 8.897s 0:48.05 63.5%   0+0k 0+0io 0pf+0w
    x86_64/3.2             21.950u 9.997s 0:39.37 81.1%   (SATA)
    i7/2.93                 7.892u 3.170s 0:12.92 85.6%   (HDD)
    i7/2.93                 7.662u 1.467s 0:09.13 99.8%   (SHMEM)

bench4 (ccdpot):
    I5/2.5                 11.9
    I7/3.6                 23.657u 3.856s 0:28.04 98.0%   (nemo2)
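The u/s/wall fields above are csh-style time output (user seconds, system seconds, elapsed min:sec, CPU percentage). When comparing machines it can help to reduce the elapsed field to plain seconds; a small sketch (the helper name wall_to_sec is made up here for illustration):

```shell
#!/bin/bash
# Convert a csh-style elapsed-time field like "1:45.63" (min:sec) to seconds.
wall_to_sec() {
    min=${1%%:*}           # part before the colon: minutes
    sec=${1#*:}            # part after the colon: seconds (may have a fraction)
    # use awk for the floating-point arithmetic
    awk -v m="$min" -v s="$sec" 'BEGIN { printf "%.2f\n", m*60 + s }'
}

wall_to_sec "1:45.63"   # 105.63
wall_to_sec "0:09.13"   # 9.13
```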
% time ls
0.012u 0.068s 0:00.77 9.0%      0+0k 8376+0io 0pf+0w
2.324u 1.080s 0:09.25 36.7%     0+0k 1049384+2097160io 2pf+0w
1.876u 0.788s 0:03.63 73.0%     0+0k 0+2097160io 0pf+0w

On Linux the command

% echo 1 > /proc/sys/vm/drop_caches

(run as root) will clear the disk cache in memory, so your program will be forced to read from disk again, with all possible interference from other programs.
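The cold- versus warm-cache difference can be demonstrated without NEMO by timing a large read twice; a minimal sketch using an arbitrary scratch file (the cache drop itself, shown in a comment, requires root):

```shell
#!/bin/bash
# Demonstrate cold- vs warm-cache reads on a scratch file.
# To force a truly cold first read, clear the cache beforehand (root only):
#   echo 1 | sudo tee /proc/sys/vm/drop_caches
f=/tmp/bench_scratch.dat

# create a ~64 MB test file
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null

# first read: from disk, if the cache was just dropped
time cat "$f" > /dev/null
# second read: served from the page cache, typically much faster
time cat "$f" > /dev/null

rm -f "$f"
```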
In NEMO another useful addition to a benchmark is that the output can easily be turned off by using out=., viz.

% sudo $NEMO/src/scripts/clearcache
% time ccdsmooth n1 . dir=x
0.852u 1.068s 0:12.41 15.3%     0+0k 2098312+0io 6pf+0w
0.812u 0.400s 0:01.21 100.0%    0+0k 0+0io 0pf+0w
0.820u 0.380s 0:01.20 100.0%    0+0k 0+0io 0pf+0w

where the last two instances simply re-ran the same command, now clearly showing the effect of reading the file from memory instead of disk. By repeating the whole series a few times, a lower bound on the wall-clock time is more likely to properly account for the I/O overhead.
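Taking the minimum over repeated runs can be scripted; a sketch, assuming bash, with a placeholder command standing in for a NEMO task such as ccdsmooth:

```shell
#!/bin/bash
# Run a command several times and keep the best (minimum) wall-clock time;
# after the first run the file cache is warm, so the minimum approximates
# the pure CPU cost without the cold-cache I/O penalty.
cmd="sleep 1"          # placeholder for e.g. "ccdsmooth n1 . dir=x"
best=""

for i in 1 2 3; do
    t0=$(date +%s)
    $cmd
    t1=$(date +%s)
    dt=$((t1 - t0))
    if [ -z "$best" ] || [ "$dt" -lt "$best" ]; then
        best=$dt
    fi
done
echo "best of 3: ${best}s"
```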
CGS(1NEMO), scfm(1NEMO)
~/data standard repository area for data files.
12-may-97     created                                                      PJT
26-nov-03     finally added some data                                      PJT
17-feb-04     added bench0 comparison                                      PJT
31-mar-05     added some cygwin numbers, fixed input                       PJT
6-may-11      added i7 and SHMEM/HDD comparison                            PJT
27-sep-13     added caveats                                                PJT
6-jan-2018    updated for V4, more balanced benchmarks                     PJT
27-dec-2019   nemo.bench; updated with potcode and orbint; now 10 tasks    PJT