[MITgcm-support] Scalability on a new Sgi node
Stefano Querin
squerin at ogs.trieste.it
Tue Jun 14 08:21:35 EDT 2011
Dear MITgcmers,
I'm experiencing a problem with a new SGI node (AMD Opteron, 24 cores)
recently added to our HPC cluster (called COBRA).
The problem is that the model (checkpoint61t) does not scale well,
especially when going from 12 to 24 processes: the run actually takes
slightly longer on 24 cores than on 12!
Of course, this is not a MITgcm issue: I ran the SAME simulation, with
the SAME configuration (namelists, I/O, ...), on another, Intel-based
HPC cluster (called PLX), and there the results, even up to 48 cores,
are reasonable.
I'm reporting below some details about the two clusters and the
numerical experiments that we carried out.
My questions are:
- could this be due to the old version of the compiler on the COBRA
cluster (see below)?
- could there be something wrong with the compilation/optimization flags?
- it seems that the two 12-core CPUs are not "talking to each other"
efficiently. Could this be a hardware problem? (See the NUMA check
sketched just below.)
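One thing worth checking for the last point is the NUMA layout of the
node and how sensitive a run is to memory placement. Just a sketch (the
node numbering below is an assumption; it depends on what numactl
actually reports on cobra-0-5):

   # show NUMA layout: memory nodes, which cores belong to each, node distances
   numactl --hardware

   # hypothetical single-process test: once with memory local to the cores,
   # once with memory forced onto the other socket, to gauge the cross-socket penalty
   numactl --cpunodebind=0 --membind=0 ./mitgcmuv
   numactl --cpunodebind=0 --membind=1 ./mitgcmuv

If the second run were much slower, that would point to memory locality
and inter-socket traffic rather than to the compiler.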
Thanks for any hint/suggestion,
cheers!
Stefano
************************* SGI node (COBRA) *************************
node name: cobra-0-5
SGI H2106-G7 server, quad-socket; chipset: AMD SR5690 + SR5670 + SP5100
CPUs: 24 x 2.05 GHz (2 x Opteron 6172, 12 cores each, 2.1 GHz, 12 MB L3 cache)
Memory (RAM): 15.68 GB
Disk: 1 x 600 GB SAS 15k, RAID 1; network: 2 x 10/100/1000 Ethernet; 6 PCIe slots.
(http://www.sgi.com/products/servers/four_way/)
PGI compiler:
FFLAGS='-r8 -Mnodclchk -Mextend -Ktrap=fp'
FOPTIM='-tp k8-64 -pc=64 -fastsse -O3 -Msmart -Mvect=cachesize:1048576,transform'
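A side note on these flags: -tp k8-64 targets the original (K8) Opteron
core, and PGI 6.1 predates the Opteron 6100 series entirely. If a newer
PGI release were installed, a more CPU-specific target might be worth
trying; only a sketch, since the exact -tp name (e.g. istanbul-64) is an
assumption that depends on the PGI version available:

   FFLAGS='-r8 -Mnodclchk -Mextend -Ktrap=fp'
   FOPTIM='-tp istanbul-64 -pc=64 -fastsse -O3 -Msmart -Mvect=cachesize:1048576,transform'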
Operating system:
Rocks 5.4 (Maverick) x86_64
Compiler, debugger, profiler (our PGI version is almost 4 years old:
could this be the issue?):
PGDBG 6.1-2 x86-64
PGPROF 6.1-2
/share/apps/pgi/linux86-64/6.1
Job scheduler:
/opt/gridengine/bin/lx26-amd64/qsub
Mpirun:
/opt/openmpi/bin/mpirun
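It may also be worth checking how mpirun places and binds the 24 ranks
across the two sockets. A sketch, assuming an Open MPI release that
supports these options (to be verified with mpirun --help):

   # report where each rank is bound (any executable will do)
   mpirun -np 24 --report-bindings hostname

   # bind each rank to a core, distributing ranks across the sockets
   mpirun -np 24 --bind-to-core --bysocket ./mitgcmuv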
TESTS (same simulation, using 4, 12 and 24 cores; all runs on node cobra-0-5):

  cores (tiling)   user time [s]   system time [s]   wall clock time [s]
   4  (2x2)             3319.0             98.8             4033.1
  12  (6x2)             1735.5             77.2             2233.6
  24  (6x4)             1704.4            124.4             2270.6
************************* control cluster (PLX) used for comparison *************************
PLX DataPlex Cluster @ CINECA - RedHat EL 5.6
( http://www.cineca.it/en/hardware/ibm-plx2290-0 - WARNING: website not up to date... )
Qlogic QDR (40Gb/s) Infiniband high-performance network
274 compute nodes
2 six-core Intel(R) Xeon(R) E5645 CPUs @ 2.40 GHz per compute node
48 GB RAM per compute node
2 Nvidia Tesla M2070 GPUs per compute node
8 fat nodes
2 Intel(R) Xeon(R) X5570 CPUs @ 2.93 GHz per fat node
128 GB RAM per fat node
3352 total cores
6 remote visualization login nodes
2 Nvidia QuadroPlex 2200 S4
PBSpro 10.1 batch scheduler
Intel compiler:
FFLAGS="$FFLAGS -WB -fno-alias -assume byterecl"
FOPTIM='-O3 -xSSE4.2 -unroll=4 -axSSE4.2 -ipo -align -fno-alias -assume byterecl'
TESTS (same simulation, using 4, 12, 24 and 48 cores):

  cores (tiling)   user time [s]   system time [s]   wall clock time [s]
   4  (2x2)             1174.7              0.0             1215.2
  12  (6x2)              642.7              0.0              692.2
  24  (6x4)              328.6              0.0              360.0
  48  (8x6)              179.8              0.0              233.1