ethtool -A eth3 autoneg on rx on tx on
ethtool -A eth4 autoneg on rx on tx on
ethtool -A eth5 autoneg on rx on tx on
ethtool -A eth6 autoneg on rx on tx on
ethtool -C eth3 rx-usecs 12
ethtool -C eth4 rx-usecs 12
ethtool -C eth5 rx-usecs 12
ethtool -C eth6 rx-usecs 12

With this rx-usecs setting, latency is about 12-13 usecs.
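The eight ethtool calls above apply the same two settings to each port, so they can be written as a loop. The following is only a sketch: the variable names and the DRY_RUN switch are assumptions introduced here (they are not part of ethtool), and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: apply the same pause and interrupt-coalescing settings to every port.
# IFACES, RX_USECS and DRY_RUN are hypothetical names chosen for this example.
IFACES="eth3 eth4 eth5 eth6"
RX_USECS=12
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them (needs root)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

tune_ifaces() {
    for ifc in $IFACES; do
        run ethtool -A "$ifc" autoneg on rx on tx on
        run ethtool -C "$ifc" rx-usecs "$RX_USECS"
    done
}

tune_ifaces
```

Running it with DRY_RUN=0 as root would execute the commands instead of printing them.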
./configure --prefix=/opt/open-mx-1.2.0 --disable-mx-wire --disable-endian --disable-FMA

omx_perf result (single port p2p connection)
===========
length        0:     13.834 us    0.00 MB/s    0.00 MiB/s
length        1:     12.902 us    0.08 MB/s    0.07 MiB/s
length        2:     13.061 us    0.15 MB/s    0.15 MiB/s
length        4:     13.097 us    0.31 MB/s    0.29 MiB/s
length        8:     12.903 us    0.62 MB/s    0.59 MiB/s
length       16:     13.091 us    1.22 MB/s    1.17 MiB/s
length       32:     13.392 us    2.39 MB/s    2.28 MiB/s
length       64:     13.934 us    4.59 MB/s    4.38 MiB/s
length      128:     15.848 us    8.08 MB/s    7.70 MiB/s
length      256:     18.922 us   13.53 MB/s   12.90 MiB/s
length      512:     24.026 us   21.31 MB/s   20.32 MiB/s
length     1024:     33.584 us   30.49 MB/s   29.08 MiB/s
length     2048:     53.032 us   38.62 MB/s   36.83 MiB/s
length     4096:    101.085 us   40.52 MB/s   38.64 MiB/s
length     8192:    174.576 us   46.93 MB/s   44.75 MiB/s
length    16384:    240.273 us   68.19 MB/s   65.03 MiB/s
length    32768:    372.839 us   87.89 MB/s   83.82 MiB/s
length    65536:    667.963 us   98.11 MB/s   93.57 MiB/s
length   131072:   1207.452 us  108.55 MB/s  103.52 MiB/s
length   262144:   2258.683 us  116.06 MB/s  110.68 MiB/s
length   524288:   4377.124 us  119.78 MB/s  114.23 MiB/s
length  1048576:   8607.746 us  121.82 MB/s  116.17 MiB/s
length  2097152:  17065.792 us  122.89 MB/s  117.19 MiB/s
length  4194304:  33994.376 us  123.38 MB/s  117.67 MiB/s
===========
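The two bandwidth columns in the omx_perf output are consistent with message length divided by the reported time: bytes per microsecond is numerically the same as MB/s (10^6 bytes per second), and MiB/s rescales that by 10^6 / 2^20. A small sketch to recheck any row follows; the helper name bw is made up for this example and is not part of Open-MX.

```shell
#!/bin/sh
# Sketch: recompute the omx_perf bandwidth columns from length and time.
# bw LENGTH_BYTES TIME_US  (hypothetical helper, not an Open-MX tool)
bw() {
    awk -v len="$1" -v us="$2" 'BEGIN {
        mb  = len / us                  # bytes per microsecond == 10^6 bytes per second
        mib = mb * 1000000 / 1048576    # same rate expressed in 2^20 bytes per second
        printf "%.2f MB/s %.2f MiB/s\n", mb, mib
    }'
}

bw 4194304 33994.376   # prints: 123.38 MB/s 117.67 MiB/s
bw 1024 33.584         # prints: 30.49 MB/s 29.08 MiB/s
```

Both results match the corresponding rows in the table above.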
===========
./configure --prefix=/opt/openmpi-1.4-mx \
    --with-memory-manager=none \
    --disable-shared \
    --disable-mpi-cxx \
    --enable-static \
    --enable-mpi-threads \
    --with-threads=posix \
    --with-mx=/opt/open-mx \
    --with-mx-libdir=/opt/open-mx/lib64 \
    CC=icc CXX=icpc F77=ifort FC=ifort \
    CFLAGS="-O3 -xT -static -g -traceback -gcc -m64" \
    FFLAGS="-O3 -xT -static -g -traceback -m64" \
    FCFLAGS="-O3 -xT -static -g -traceback -m64" \
    LD=ld
===========
The option "--with-memory-manager=none" is important when building with the Intel compiler. Without it, the build stops in the opal/.../ptmalloc2 directory.
===========
btl_mx_bandwidth = 500
btl_mx_latency = 25
btl_mx_bonding = 1
btl_base_warn_component_unused = 1
===========
All 4 ports (eth3, eth4, eth5, eth6) are attached to the Open-MX driver. With these settings, Open MPI over Open-MX performs very well: in the Intel MPI Benchmark results below, PingPong reaches 458.34 Mbytes/sec for 4194304-byte messages.
=============
#---------------------------------------------------
# Intel (R) MPI Benchmark Suite V3.2, MPI-1 part
#---------------------------------------------------
# Date                  : Mon Dec 21 11:52:09 2009
# Machine               : x86_64
# System                : Linux
# Release               : 2.6.18-164.6.1.el5
# Version               : #1 SMP Tue Nov 3 16:12:36 EST 2009
# MPI Version           : 2.1
# MPI Thread Environment: MPI_THREAD_MULTIPLE
# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# "SECS_PER_SAMPLE" (=> IMB_settings.h)
# or through the flag => -time
# Calling sequence was:
# ./IMB-MPI1 sendrecv
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype                : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op                      : MPI_SUM
#
#
# List of Benchmarks to run:
# Sendrecv
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
        0         1000        13.78        13.80        13.79         0.00
        1         1000        13.54        13.56        13.55         0.14
        2         1000        13.47        13.47        13.47         0.28
        4         1000        13.33        13.33        13.33         0.57
        8         1000        13.42        13.43        13.42         1.14
       16         1000        13.74        13.74        13.74         2.22
       32         1000        14.08        14.08        14.08         4.34
       64         1000        14.77        14.77        14.77         8.26
      128         1000        17.08        17.08        17.08        14.29
      256         1000        20.11        20.12        20.12        24.27
      512         1000        25.45        25.46        25.46        38.36
     1024         1000        35.53        35.57        35.55        54.91
     2048         1000        56.03        56.07        56.05        69.67
     4096         1000       132.30       132.30       132.30        59.05
     8192         1000       139.91       139.91       139.91       111.68
    16384         1000       170.61       170.63       170.62       183.15
    32768         1000       262.18       262.21       262.20       238.36
    65536          640       329.08       329.10       329.09       379.82
   131072          320       474.95       474.98       474.97       526.33
   262144          160       801.22       801.29       801.25       624.00
   524288           80      1546.08      1546.25      1546.16       646.73
  1048576           40      3130.63      3130.82      3130.73       638.81
  2097152           20      6503.36      6504.00      6503.68       615.01
  4194304           10     15194.99     15195.80     15195.39       526.46
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
#-----------------------------------------------------------------------------
   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
        0         1000        13.99        14.04        14.02         0.00
        1         1000        13.71        13.72        13.71         0.14
        2         1000        14.28        14.32        14.31         0.27
        4         1000        13.69        13.72        13.70         0.56
        8         1000        14.03        14.07        14.05         1.08
       16         1000        14.20        14.22        14.21         2.15
       32         1000        15.17        15.22        15.20         4.01
       64         1000        15.09        15.10        15.10         8.08
      128         1000        17.98        18.02        18.00        13.55
      256         1000        20.65        20.70        20.68        23.59
      512         1000        26.64        26.72        26.69        36.55
     1024         1000        35.88        35.97        35.92        54.30
     2048         1000        56.24        56.40        56.32        69.26
     4096         1000       137.73       137.77       137.75        56.71
     8192         1000       146.22       146.26       146.24       106.83
    16384         1000       162.60       162.71       162.67       192.06
    32768         1000       276.37       276.46       276.43       226.07
    65536          640       409.15       409.52       409.32       305.24
   131072          320       600.41       601.50       600.91       415.63
   262144          160      1470.24      1470.55      1470.40       340.01
   524288           80      2449.69      2449.81      2449.75       408.19
  1048576           40      4504.12      4557.88      4543.65       438.80
  2097152           20      8583.20      8798.05      8742.06       454.65
  4194304           10     17032.12     17245.01     17187.83       463.90
# All processes entering MPI_Finalize
----
# Exchange
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
        0         1000        16.34        16.34        16.34         0.00
        1         1000        15.32        15.32        15.32         0.25
        2         1000        16.00        16.02        16.01         0.48
        4         1000        15.26        15.28        15.27         1.00
        8         1000        16.24        16.24        16.24         1.88
       16         1000        15.76        15.76        15.76         3.87
       32         1000        16.99        17.02        17.00         7.17
       64         1000        16.97        16.99        16.98        14.37
      128         1000        20.32        20.33        20.33        24.02
      256         1000        22.57        22.58        22.57        43.25
      512         1000        28.52        28.55        28.54        68.40
     1024         1000        38.23        38.25        38.24       102.13
     2048         1000        60.81        60.85        60.83       128.39
     4096         1000       250.88       250.90       250.89        62.28
     8192         1000       267.87       267.87       267.87       116.66
    16384         1000       289.17       289.18       289.18       216.13
    32768         1000       492.19       492.20       492.19       253.96
    65536          640       638.92       638.93       638.92       391.28
   131072          320       936.69       936.71       936.70       533.78
   262144          160      1545.04      1545.13      1545.08       647.20
   524288           80      2596.69      2596.80      2596.74       770.18
  1048576           40      4702.63      4702.85      4702.74       850.55
  2097152           20      8945.35      8946.30      8945.82       894.22
  4194304           10     17617.20     17618.39     17617.80       908.14
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 4
#-----------------------------------------------------------------------------
   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
        0         1000        14.87        14.88        14.87         0.00
        1         1000        16.98        16.99        16.98         0.22
        2         1000        14.98        15.00        14.99         0.51
        4         1000        15.24        15.25        15.25         1.00
        8         1000        15.27        15.29        15.28         2.00
       16         1000        17.22        17.24        17.23         3.54
       32         1000        16.04        16.06        16.05         7.60
       64         1000        17.08        17.09        17.09        14.28
      128         1000        20.87        20.90        20.89        23.36
      256         1000        26.96        27.00        26.98        36.17
      512         1000        28.14        28.17        28.15        69.34
     1024         1000        44.22        44.27        44.25        88.23
     2048         1000        60.92        60.97        60.94       128.15
     4096         1000       291.34       291.45       291.40        53.61
     8192         1000       281.90       281.94       281.91       110.84
    16384         1000       334.74       334.84       334.78       186.66
    32768         1000       569.37       569.58       569.49       219.46
    65536          640       814.59       814.91       814.68       306.78
   131072          320      1297.89      1299.04      1298.46       384.90
   262144          160      2915.94      2922.26      2920.04       342.20
   524288           80      5002.27      5016.69      5009.51       398.67
  1048576           40      9265.12      9313.92      9289.56       429.46
  2097152           20     18330.41     18523.85     18427.12       431.88
  4194304           10     35378.00     36082.60     35730.58       443.43
----
# PingPong
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
   #bytes #repetitions      t[usec]   Mbytes/sec
        0         1000        12.89         0.00
        1         1000        13.10         0.07
        2         1000        13.06         0.15
        4         1000        13.08         0.29
        8         1000        13.18         0.58
       16         1000        13.41         1.14
       32         1000        13.83         2.21
       64         1000        14.53         4.20
      128         1000        17.00         7.18
      256         1000        19.73        12.37
      512         1000        25.04        19.50
     1024         1000        35.25        27.70
     2048         1000        54.85        35.61
     4096         1000       131.15        29.78
     8192         1000       140.83        55.47
    16384         1000       152.95       102.16
    32768         1000       252.08       123.97
    65536          640       327.77       190.68
   131072          320       475.06       263.12
   262144          160       769.79       324.76
   524288           80      1301.54       384.16
  1048576           40      2354.76       424.67
  2097152           20      4473.95       447.03
  4194304           10      8727.20       458.34
----
# Barrier
#---------------------------------------------------
# Benchmarking Barrier
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000        14.23        14.24        14.24
#---------------------------------------------------
# Benchmarking Barrier
# #processes = 4
#---------------------------------------------------
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000        26.22        26.23        26.23
----
# Allgather
#----------------------------------------------------------------
# Benchmarking Allgather
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
        0         1000         0.02         0.02         0.02
        1         1000        15.17        15.17        15.17
        2         1000        14.34        14.34        14.34
        4         1000        14.22        14.22        14.22
        8         1000        14.30        14.30        14.30
       16         1000        14.59        14.59        14.59
       32         1000        15.07        15.07        15.07
       64         1000        15.77        15.78        15.78
      128         1000        17.85        17.85        17.85
      256         1000        20.75        20.78        20.76
      512         1000        25.93        25.96        25.95
     1024         1000        36.01        36.05        36.03
     2048         1000        56.28        56.32        56.30
     4096         1000       134.17       134.19       134.18
     8192         1000       144.00       144.02       144.01
    16384         1000       158.85       158.86       158.85
    32768         1000       262.29       262.32       262.30
    65536          640       336.30       336.31       336.31
   131072          320       491.06       491.10       491.08
   262144          160       869.71       869.72       869.71
   524288           80      1504.08      1504.20      1504.14
  1048576           40      3322.63      3322.73      3322.68
  2097152           20      8086.30      8086.50      8086.40
  4194304           10     17404.29     17404.29     17404.29
#----------------------------------------------------------------
# Benchmarking Allgather
# #processes = 4
#----------------------------------------------------------------
   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
        0         1000         0.02         0.02         0.02
        1         1000        27.08        27.09        27.08
        2         1000        28.05        28.06        28.06
        4         1000        27.72        27.75        27.74
        8         1000        26.80        26.82        26.81
       16         1000        28.25        28.27        28.26
       32         1000        30.26        30.27        30.27
       64         1000        33.26        33.29        33.28
      128         1000        37.50        37.51        37.51
      256         1000        46.38        46.41        46.39
      512         1000        61.92        61.98        61.95
     1024         1000        91.41        91.45        91.43
     2048         1000       188.71       188.75       188.73
     4096         1000       316.29       316.35       316.32
     8192         1000       425.82       425.98       425.90
    16384         1000       666.94       667.23       667.09
    32768         1000      1019.19      1019.59      1019.40
    65536          640      1452.81      1453.70      1453.24
   131072          320      2148.04      2148.46      2148.23
   262144          160      3162.96      3163.08      3163.00
   524288           80      6284.14      6284.39      6284.26
  1048576           40     12090.43     12097.53     12094.01
  2097152           20     25040.75     25041.10     25040.91
  4194304           10     43153.29     43158.01     43155.04
=============
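To put the bonded result next to the single-port one, note that the IMB "Mbytes/sec" figures here work out as 2^20 bytes per second (4194304 bytes in 8727.20 usec gives 458.34), so they compare with the MiB/s column of omx_perf rather than the MB/s column. The sketch below redoes that arithmetic. The helper names are made up for this example, and the comparison is only rough because omx_perf and IMB PingPong are different benchmarks.

```shell
#!/bin/sh
# Sketch: compare the 4-port IMB PingPong peak with the single-port omx_perf peak.
# imb_rate and speedup are hypothetical helper names used only in this example.

# imb_rate LENGTH_BYTES TIME_US -> throughput in 2^20 bytes per second
imb_rate() {
    awk -v len="$1" -v us="$2" 'BEGIN { printf "%.2f\n", (len / 1048576) / (us / 1000000) }'
}

# speedup BONDED_RATE SINGLE_RATE -> ratio of the two throughputs
speedup() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b / s }'
}

imb_rate 4194304 8727.20   # prints 458.34, the PingPong figure for 4194304 bytes
speedup 458.34 117.67      # prints 3.90 (4 bonded ports vs. 1 port at 117.67 MiB/s)
```

By this measure, four bonded ports deliver roughly 3.9 times the single-port large-message bandwidth.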