NCSA: National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Leading Edge Site

Benchmark of COMMAS 3.9 on SGI CRAY Origin2000 at NCSA

The COllaborative Model for Multiscale Atmospheric Simulation (COMMAS) is a non-hydrostatic, three-dimensional simulation model which is capable of representing atmospheric motions ranging from several tens of meters to several hundred kilometers. The model uses the Reynolds-Averaged Navier-Stokes equations, and several state-of-the-art finite difference methods are used to generate the numerical solution.

The time integration scheme is second-order Runge-Kutta for the velocity and pressure equations, and forward-in-time for the scalar fields (such as temperature and moisture). The model uses a split integration approach to increase computational efficiency when both slow and fast modes are present in the system, i.e., sound waves are integrated using a small time step, while slower mode processes such as advection and mixing are integrated using a more economical larger time step.
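The split approach described above can be sketched as a nested loop: the slow terms are evaluated once per large step, and the fast terms are re-evaluated on every small step. This is only an illustrative toy, not COMMAS code. It assumes forward Euler on both time scales (the model itself uses second-order Runge-Kutta for velocity and pressure), that the 6.0 s step listed under the model parameters is the large step, and that it is divided evenly into small steps. The names `split_integrate`, `slow_tendency` and `fast_tendency` are hypothetical.

```python
# Toy sketch of a time-split integrator; not COMMAS code.
# Assumptions: forward Euler on both time scales (the model itself uses
# second-order Runge-Kutta), and a large step split evenly into small steps.

def split_integrate(state, dt_large, n_small, n_large, slow_tendency, fast_tendency):
    """Advance `state` by n_large large steps, sub-stepping the fast terms."""
    dt_small = dt_large / n_small
    for _ in range(n_large):
        # Slow terms (e.g. advection, mixing): evaluated once per large step.
        slow = slow_tendency(state)
        for _ in range(n_small):
            # Fast terms (e.g. sound waves): re-evaluated every small step
            # while the slow tendency is held fixed.
            state = state + dt_small * (slow + fast_tendency(state))
    return state


calls = {"slow": 0, "fast": 0}

def slow_tendency(s):  # hypothetical placeholder for the expensive slow physics
    calls["slow"] += 1
    return 0.0

def fast_tendency(s):  # hypothetical placeholder: simple linear damping
    calls["fast"] += 1
    return -0.1 * s

final = split_integrate(1.0, dt_large=6.0, n_small=4, n_large=50,
                        slow_tendency=slow_tendency, fast_tendency=fast_tendency)
print(calls)  # {'slow': 50, 'fast': 200}
```

With 4 small steps per large step and 50 large steps, the slow terms are evaluated 50 times rather than 200, which is where the saving comes from when the slow terms are the expensive ones.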

The spatial advection scheme for momentum and pressure is third-order upwind. Scalar variables are transported using a second-order, fully three-dimensional Eulerian monotonic advection scheme.
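To illustrate what "third-order upwind" means, here is a minimal one-dimensional sketch. The page does not give the stencil COMMAS uses, so this shows one standard third-order upwind-biased finite difference on a uniform grid; the function name `upwind3_derivative` and the test data are hypothetical.

```python
# One standard third-order upwind-biased difference in 1-D (uniform grid).
# Assumption: this is an illustration only; the COMMAS stencil is not given.

def upwind3_derivative(f, i, dx, u):
    """Estimate df/dx at index i, biasing the stencil toward the upwind side.

    u is the advecting velocity; its sign selects which side is upwind.
    """
    if u >= 0:
        # Flow from the left: use points i-2, i-1, i, i+1.
        return (2 * f[i + 1] + 3 * f[i] - 6 * f[i - 1] + f[i - 2]) / (6 * dx)
    # Flow from the right: mirror image, points i-1, i, i+1, i+2.
    return (-f[i + 2] + 6 * f[i + 1] - 3 * f[i] - 2 * f[i - 1]) / (6 * dx)


# A cubic is differentiated exactly by a third-order formula.
xs = [0.5 * n for n in range(6)]
f = [x ** 3 for x in xs]
print(upwind3_derivative(f, 3, 0.5, u=+1.0))  # 6.75
print(upwind3_derivative(f, 3, 0.5, u=-1.0))  # 6.75
```

Both branches return 3x^2 = 6.75 at x = 1.5, as expected for a scheme whose leading error term involves the fourth derivative.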

There can be up to 7 scalar variables, depending on the representation of the microphysical species. Both liquid and ice microphysical variables can be included in the model solution.

Origin 2000 Hardware information:

FPU: MIPS R10010 Floating Point Chip Revision: 0.0
CPU: MIPS R10000 Processor Chip Revision: 2.6
32 195 MHZ IP27 Processors
Main memory size: 8192 Mbytes
Instruction cache size: 32 Kbytes
Data cache size: 32 Kbytes
Secondary unified instruction/data cache size: 4 Mbytes

Operating system:

IRIX64 6.4

Compiler version :

MIPSpro 7.1 compiler

Compiler flags :

-Ofast=IP27 -mips4 -64 -pfa -keep -WK,-mc=500,-p=0,-o=5,-so=3,-ro=3 -OPT:roundoff=3 -OPT:IEEE_arithmetic=3 -OPT:fold_arith_limit=1601 -static

Loader flags :

-mips4 -64 -pfa -IPA

Libraries linked:

fastm

Experimental design:

The purpose of this benchmark is to compare the performance of two coding styles across different domain sizes.

3-D arrays are declared either as (nx,ny,nz), the "ijk" style, or as (nz,ny,nx), the "kji" style. All runs were made in dedicated mode. Timings were obtained from the Fortran routine "etime", which measures the sum of elapsed user and system time since the last etime call.
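The two styles differ in which index is contiguous in memory. Fortran stores arrays in column-major order, so the first index varies fastest. The sketch below computes the element strides for both declarations, using the smallest domain as an example. The helper name `column_major_strides` is hypothetical, and the page itself does not state why one layout outperforms the other; the code only shows how the layouts differ.

```python
# Element strides of a column-major (Fortran-order) array.
# Illustration only; the helper name is not from the benchmark code.

def column_major_strides(dims):
    """Return the stride (in elements) of each index, first index fastest."""
    strides, step = [], 1
    for extent in dims:
        strides.append(step)
        step *= extent
    return strides


nx, ny, nz = 125, 125, 35
ijk = dict(zip("ijk", column_major_strides((nx, ny, nz))))
kji = dict(zip("kji", column_major_strides((nz, ny, nx))))
print(ijk)  # {'i': 1, 'j': 125, 'k': 15625}
print(kji)  # {'k': 1, 'j': 35, 'i': 4375}
```

In the ijk style, neighbouring points in x are adjacent in memory; in the kji style, neighbouring points in the vertical are adjacent instead.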

Compiler flags were used to automatically parallelize the code.

Model parameters: time step: 6.0 s, # of small time steps: 4, # of large time steps: 50

Domain sizes:

  • nx=125,ny=125,nz=35
  • nx=250,ny=250,nz=70
  • nx=480,ny=480,nz=70
  • nx=500,ny=500,nz=70
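For a sense of scale, the number of grid points in each domain is simply nx*ny*nz. The short sketch below tabulates it for the four domain sizes listed above; the helper name `grid_points` is hypothetical.

```python
# Grid-point counts for the four benchmark domains (plain arithmetic).

domains = [(125, 125, 35), (250, 250, 70), (480, 480, 70), (500, 500, 70)]

def grid_points(nx, ny, nz):
    """Total number of grid points in an nx x ny x nz domain."""
    return nx * ny * nz


smallest = grid_points(*domains[0])
for nx, ny, nz in domains:
    n = grid_points(nx, ny, nz)
    print(f"nx={nx}, ny={ny}, nz={nz}: {n:>10,d} points, "
          f"{n / smallest:5.2f}x the smallest")
```

The largest domain has 17,500,000 points, 32 times as many as the smallest (546,875).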


    IJK Version (nx=125,ny=125,nz=35)
    threads  total cpu time (sec)  total speedup  solver cpu time (sec)  solver speedup
          1             962.08600         1.0000              943.44537          1.0000
          2             519.78723         1.8509              506.98990          1.8609
          4             298.65497         3.2214              289.60947          3.2576
          8             192.94649         4.9863              185.50102          5.0859
         16             128.81763         7.4686              123.13338          7.6620
         32             177.89604         5.4081              169.95786          5.5511

    KJI Version (nx=125,ny=125,nz=35)
    threads  total cpu time (sec)  total speedup  solver cpu time (sec)  solver speedup
          1            1009.45972         1.0000             1018.06079          1.0000
          2             538.53094         1.8745              542.05200          1.8782
          4             269.71603         3.7427              269.42114          3.7787
          8             143.06299         7.0561              141.38454          7.2006
         16              81.22086        12.4286               78.86033         12.9097
         32              59.14380        17.0679               56.31403         18.0783


    IJK Version (nx=250,ny=250,nz=70)
    threads  total cpu time (sec)  total speedup  solver cpu time (sec)  solver speedup
          1            2624.50635         1.0000             2552.86133          1.0000
          2            1425.517941        1.8411             1374.04224          1.8579
          4             834.33459         3.1456              793.83734          3.2158
          8             482.77496         5.4363              453.13104          5.6338
         16             338.14569         7.7615              313.10388          8.1534
         32             277.27332         9.4654              253.26633         10.0798

    KJI Version (nx=250,ny=250,nz=70)
    threads  total cpu time (sec)  total speedup  solver cpu time (sec)  solver speedup
          1            2635.51025         1.0000             2638.30884          1.0000
          2            1425.13977         1.8493             1420.25928          1.8576
          4             733.49231         3.5931              724.62524          3.6409
          8             408.04532         6.4589              396.43494          6.6551
         16             220.17799        11.9699              205.93039         12.8116
         32             141.09566        18.6789              123.79723         21.3115


    IJK Version (nx=480,ny=480,nz=70)
    threads  total cpu time (sec)  total speedup  solver cpu time (sec)  solver speedup
          1            7324.24805         1.0000             7097.02930          1.0000
          2            4034.58472         1.8154             3875.48022          1.8313
          4            2168.64941         3.3773             2044.30798          3.4716
          8            1245.31323         5.8815             1150.43250          6.1690
         16             841.72296         8.7015              746.78992          9.5034
         32             645.82166        11.3410              566.58411         12.5260

    KJI Version (nx=480,ny=480,nz=70)
    threads  total cpu time (sec)  total speedup  solver cpu time (sec)  solver speedup
          1            7422.19287         1.0000             7406.74805          1.0000
          2            3936.79468         1.8853             3899.90625          1.8992
          4            2076.00610         3.5752             2028.36841          3.6516
          8            1235.76123         6.0062             1180.80627          6.2726
         16             629.10284        11.7981              565.89575         13.0885
         32             423.07086        17.5436              353.76389         20.9370


    IJK Version (nx=500,ny=500,nz=70)
    threads  total cpu time (sec)  total speedup  solver cpu time (sec)  solver speedup
          1            8190.19873         1.0000             7950.73193          1.0000
          2            4344.58740         1.8851             4174.16455          1.9047
          4            2512.65356         3.2596             2379.09399          3.3419
          8            1458.82080         5.6143             1328.69141          5.9839
         16             879.32568         9.3142              782.48157         10.1609
         32             685.76770        11.9431              599.96045         13.2521

    KJI Version (nx=500,ny=500,nz=70)
    threads  total cpu time (sec)  total speedup  solver cpu time (sec)  solver speedup
          1            8092.06250         1.0000             8081.73486          1.0000
          2            4431.94580         1.8258             4394.77490          1.8389
          4            2270.64404         3.5638             2220.60474          3.6394
          8            1321.80176         6.1220             1265.08594          6.3883
         16             673.09802        12.0221              613.44598         13.1743
         31             412.3430         19.6246              351.9050          22.9656
         32             406.72824        19.8955              347.37543         23.2651
         48             319.05573        25.3625              256.69330         31.4840
         62             283.65887        28.5274              217.04993         37.2344
         63             280.34961        28.8642              212.94124         37.9529
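The speedup columns in the tables are the 1-thread time divided by the N-thread time. A related quantity, parallel efficiency (speedup divided by thread count), is not tabulated on this page but follows directly. The sketch below recomputes both from the total CPU times of the smallest domain; the function name `speedup_table` is hypothetical.

```python
# Recompute speedup and parallel efficiency from the tabulated total CPU times
# for nx=125, ny=125, nz=35. Efficiency is not reported in the original tables.

def speedup_table(times):
    """times: {threads: seconds}. Returns {threads: (speedup, efficiency)}."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in times.items()}


ijk_125 = {1: 962.08600, 2: 519.78723, 4: 298.65497,
           8: 192.94649, 16: 128.81763, 32: 177.89604}
kji_125 = {1: 1009.45972, 2: 538.53094, 4: 269.71603,
           8: 143.06299, 16: 81.22086, 32: 59.14380}

for name, times in (("IJK", ijk_125), ("KJI", kji_125)):
    for n, (s, e) in speedup_table(times).items():
        print(f"{name} {n:2d} threads: speedup {s:7.4f}, efficiency {e:5.3f}")
```

For this domain the IJK version is slower on 32 threads than on 16, while the KJI version continues to improve through 32 threads.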

