UK-MAC/CloverLeaf_OpenMP

Language: Fortran

git: https://github.com/UK-MAC/CloverLeaf_OpenMP

A version of CloverLeaf using OpenMP pragmas

README.md

CloverLeaf_OpenMP

This is the OpenMP only version of CloverLeaf version 1.3. All MPI function calls have been removed.

Release Notes

Version 1.3

CloverLeaf 1.3 contains a number of optimisations over previous releases.
These include a number of loop fusion optimisations and the use of scalar variables over work arrays.
Overall this improves cache efficiency.

This version also contains some support for explicit tiling.
This is activated through the two input deck parameters:

  • tiles_per_chunk specifies how many tiles there are per MPI rank.
  • tiles_per_problem specifies how many global tiles there are; this is rounded down to give an even number of tiles per MPI rank.
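As a rough sketch, such a parameter might appear in the input deck alongside the other problem settings. The block structure, the name=value syntax, and all values below are illustrative assumptions, not taken from the repository:

```
*clover
 x_cells=960
 y_cells=960
 tiles_per_chunk=4
*endclover
```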

For compilation we now use the native Fortran and C compilers rather than the MPI wrappers used previously.
However, a specific compiler can still be selected by overriding the compiler variables:

make COMPILER=GNU MPI_COMPILER=gfortran C_MPI_COMPILER=gcc

Performance

Expected performance is given below.

If you do not see this performance, or you see run-to-run variability, it is recommended that you check MPI task placement and OpenMP thread affinities: it is essential that these are pinned and placed optimally to obtain the best performance.
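As a hedged example, one common way to pin OpenMP threads is via the standard OpenMP environment variables. The core count of 16 is an assumption for a single E5-2670 node; adjust the values for your system:

```shell
# Pin one thread per core and keep threads close together.
# 16 is an assumed core count, not a repository setting.
export OMP_NUM_THREADS=16
export OMP_PROC_BIND=close
export OMP_PLACES=cores
```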

Note that performance can depend on the compiler (brand and release), memory speed, system settings (e.g. turbo, huge pages), system load, etc.

Performance Table

Test Problem   Time                         Time                         Time
Hardware       E5-2670 0 @ 2.60GHz (Core)   E5-2670 0 @ 2.60GHz (Node)   E5-2698 v3 @ 2.30GHz (Node)
Options        make COMPILER=INTEL          make COMPILER=INTEL          make COMPILER=CRAY
Options        mpirun -np 1                 mpirun -np 16                aprun -n4 -N4 -d8
2              20.0                         2.5                          0.9
3              960.0                        100.0                        -
4              460.0                        40.0                         23.44
5              13000.0                      1700.0                       -

Weak Scaling - Test 4

Node Count   Time
1            40.0
2            -
4            -
8            -
16           -

Strong Scaling - Test 5

Node Count   Time     Speed Up
1            1700     1.0
2            -        -
4            -        -
8            -        -
16           -        -
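Speed-up here is relative to the single-node time (speed-up on N nodes = T_1 / T_N). A minimal sketch of the calculation, using the 1700 s Test 5 single-node baseline from the table and a hypothetical 2-node time chosen purely for illustration:

```shell
# Speed-up = time on 1 node / time on N nodes.
t1=1700          # Test 5, 1 node (from the table above)
tn=850           # hypothetical 2-node time, illustration only
speedup=$(awk -v a="$t1" -v b="$tn" 'BEGIN { printf "%.2f", a/b }')
echo "speedup = $speedup"
```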