Compilation
You will notice major speed improvements if you compile your code inside your project folder (under /projects/) instead of your home folder, because the two file systems differ significantly in disk read/write speeds.
Deucalion has a heterogeneous architecture, with nodes featuring both x86 and ARM microprocessors.
Both environments are covered below: the ARM environment first, then the x86 environment, including GPU-specific examples.
ARM Environment
Available compilers (Native)
The login nodes use x86 microprocessors. To compile a program for the ARM architecture, the fastest way is to allocate a node from the queue via salloc:
salloc -N1 -p dev-arm -t 4:00:00
The salloc output tells you which node you were given, so you can SSH into it via
ssh cna[xxxx]
If you forget which node you allocated, you can look it up with
squeue --me
The name of the job should be "interact".
The native compilers can only be used on the compute nodes.
The dev-arm partition should only be used for compilation tasks.
Non-MPI
Creator | Language | Compile command |
---|---|---|
Fujitsu | Fortran | frt |
Fujitsu | C | fcc |
Fujitsu | C++ | FCC |
GNU | Fortran | gfortran |
GNU | C | gcc |
GNU | C++ | g++ |
MPI
Creator | Language | Compile command |
---|---|---|
Fujitsu | Fortran | mpifrt |
Fujitsu | C | mpifcc |
Fujitsu | C++ | mpiFCC |
GNU | Fortran | mpifort |
GNU | C | mpicc |
GNU | C++ | mpiCC |
To access all Fujitsu compilers:
ml FJSVstclanga
(run on an ARM compute node)
To access all GNU compilers:
ml OpenMPI
(run on an ARM compute node)
Recommended options for the GNU and Fujitsu compilers are listed in the tables below.
Available compilers (Cross)
To avoid having to compile on the compute nodes, Fujitsu provides cross-compilers (they run on the x86 architecture but produce machine code for ARM processors). Because the cross-compilers are x86 binaries, they can only be used on the login nodes.
Non-MPI
Creator | Language | Compile command |
---|---|---|
Fujitsu | Fortran | frtpx |
Fujitsu | C | fccpx |
Fujitsu | C++ | FCCpx |
MPI
Creator | Language | Compile command |
---|---|---|
Fujitsu | Fortran | mpifrtpx |
Fujitsu | C | mpifccpx |
Fujitsu | C++ | mpiFCCpx |
To load all FUJITSU compilers:
ml FJSVstclanga
(run on a login node)
Compile Information
- Fortran compiler
- Creates object programs from Fortran source programs.
- C/C++ compiler
- Creates object programs from C/C++ source programs.
- The C/C++ compiler has two modes with different user interfaces; the mode to use is selected by an option on the compile command.
- The default mode is Trad Mode.
Mode | Description |
---|---|
Trad Mode (default) | This mode uses an enhanced compiler based on compilers for K computer and PRIMEHPC FX100 or earlier system. This mode is suitable for maintaining compatibility with the past Fujitsu compiler. [C] Supported specifications are C89/C99/C11 and OpenMP 3.1/Part of OpenMP 4.5. [C++] Supported specifications are C++03/C++11/C++14/Part of C++17 and OpenMP 3.1/Part of OpenMP 4.5. |
Clang Mode (-Nclang) | This mode uses an enhanced compiler based on Clang/LLVM. This mode is suitable for compiling programs using the latest language specification and open source software. [C] Supported specifications are C89/C99/C11 and OpenMP 4.5/Part of OpenMP 5.0. [C++] Supported specifications are C++03/C++11/C++14/C++17 and OpenMP 4.5/Part of OpenMP 5.0. |
Recommended Options for Fujitsu Compilers
Language/Mode | Focus | Recommended options | Induced options |
---|---|---|---|
Fortran | Performance | -Kfast,openmp[,parallel] | -O3 -Keval,fp_contract,fp_relaxed,fz,ilfunc,mfunc,omitfp,simd_packed_promotion |
Fortran | Precision | -Kfast,openmp[,parallel],fp_precision | -Knoeval,nofp_contract,nofp_relaxed,nofz,noilfunc,nomfunc,parallel_fp_precision |
C/C++ Trad Mode | Performance | -Kfast,openmp[,parallel] | -O3 -Keval,fast_matmul,fp_contract,fp_relaxed,fz,ilfunc,mfunc,omitfp,simd_packed_promotion |
C/C++ Trad Mode | Precision | -Kfast,openmp[,parallel],fp_precision | -Knoeval,nofast_matmul,nofp_contract,nofp_relaxed,nofz,noilfunc,nomfunc,parallel_fp_precision |
C/C++ Clang Mode | Performance | -Nclang -Ofast | -O3 -ffj-fast-matmul -ffast-math -ffp-contract=fast -ffj-fp-relaxed -ffj-ilfunc -fbuiltin -fomit-frame-pointer -finline-functions |
Recommended options for GNU compilers
Language/Mode | Compiler | Recommended options |
---|---|---|
C/C++/Fortran | gcc / g++ / gfortran, mpicc / mpiCC / mpifort | -O2 -ftree-vectorize -march=native -fno-math-errno |
Compile Examples ARM
Fortran
- Sequential program
Fujitsu Cross-compilation: (ln0x)$ frtpx -Kfast sample.f #for cross-compilation using login nodes (FUJITSU)
Fujitsu Native: (cnaxxxx)$ frt -Kfast sample.f #for compilation inside compute node (FUJITSU)
GNU Native: (cnaxxxx)$ gfortran -O2 -ftree-vectorize -march=native -fno-math-errno sample.f #for compilation inside compute node (GNU)
- Thread-parallel program (using automatic parallelization)
Fujitsu Cross-compilation: (ln0x)$ frtpx -Kfast,parallel sample.f #for cross-compilation using login nodes
Fujitsu Native: (cnaxxxx)$ frt -Kfast,parallel sample.f #for compilation inside compute node
GNU Native: (cnaxxxx)$ gfortran -O2 -ftree-parallelize-loops=n -ftree-vectorize -march=native -fno-math-errno sample.f #for compilation inside compute node (GNU) with n threads
- Thread-parallel program (OpenMP)
(ln0x)$ frtpx -Kfast,openmp sample.f
(cnaxxxx)$ frt -Kfast,openmp sample.f
(cnaxxxx)$ gfortran -O2 -fopenmp -ftree-vectorize -march=native -fno-math-errno sample.f
- Thread-parallel program (OpenMP + auto-parallel)
(ln0x)$ frtpx -Kfast,openmp,parallel sample.f
(cnaxxxx)$ frt -Kfast,openmp,parallel sample.f
- MPI program
(ln0x)$ mpifrtpx -Kfast sample.f
(cnaxxxx)$ mpifrt -Kfast sample.f
(cnaxxxx)$ mpifort -O2 -ftree-vectorize -march=native -fno-math-errno sample.f
- Hybrid program (OpenMP + auto-parallel + MPI)
(ln0x)$ mpifrtpx -Kfast,openmp,parallel sample.f
(cnaxxxx)$ mpifrt -Kfast,openmp,parallel sample.f
(cnaxxxx)$ mpifort -O2 -fopenmp -ftree-vectorize -march=native -fno-math-errno sample.f
C
- Sequential program
(ln0x)$ fccpx -Kfast sample.c #for cross-compilation using login nodes (FUJITSU)
(cnaxxxx)$ fcc -Kfast sample.c #for compilation inside compute node (FUJITSU)
(cnaxxxx)$ gcc -O2 -ftree-vectorize -march=native -fno-math-errno sample.c #for compilation inside compute node (GNU)
- Thread-parallel program (using automatic parallelization)
(ln0x)$ fccpx -Kfast,parallel sample.c
(cnaxxxx)$ fcc -Kfast,parallel sample.c
(cnaxxxx)$ gcc -O2 -ftree-parallelize-loops=n -ftree-vectorize -march=native -fno-math-errno sample.c #for compilation inside compute node (GNU) with n threads
- Thread-parallel program (OpenMP)
(ln0x)$ fccpx -Kfast,openmp sample.c
(cnaxxxx)$ fcc -Kfast,openmp sample.c
(cnaxxxx)$ gcc -O2 -fopenmp -ftree-vectorize -march=native -fno-math-errno sample.c
- Thread-parallel program (OpenMP + auto-parallel)
(ln0x)$ fccpx -Kfast,openmp,parallel sample.c
(cnaxxxx)$ fcc -Kfast,openmp,parallel sample.c
- MPI program
(ln0x)$ mpifccpx -Kfast sample.c
(cnaxxxx)$ mpifcc -Kfast sample.c
(cnaxxxx)$ mpicc -O2 -ftree-vectorize -march=native -fno-math-errno sample.c
- Hybrid program (OpenMP + auto-parallel + MPI)
(ln0x)$ mpifccpx -Kfast,openmp,parallel sample.c
(cnaxxxx)$ mpifcc -Kfast,openmp,parallel sample.c
(cnaxxxx)$ mpicc -O2 -fopenmp -ftree-vectorize -march=native -fno-math-errno sample.c
C++
- Sequential program
(ln0x)$ FCCpx -Kfast sample.cpp
(cnaxxxx)$ FCC -Kfast sample.cpp
(cnaxxxx)$ g++ -O2 -ftree-vectorize -march=native -fno-math-errno sample.cpp
- Thread-parallel program (using automatic parallelization)
(ln0x)$ FCCpx -Kfast,parallel sample.cpp
- Thread-parallel program (OpenMP)
(ln0x)$ FCCpx -Kfast,openmp sample.cpp
- Thread-parallel program (OpenMP + auto-parallel)
(ln0x)$ FCCpx -Kfast,openmp,parallel sample.cpp
- MPI program
(ln0x)$ mpiFCCpx -Kfast sample.cpp
(cnaxxxx)$ mpiFCC -Kfast sample.cpp
(cnaxxxx)$ mpiCC -O2 -ftree-vectorize -march=native -fno-math-errno sample.cpp
- Hybrid program (OpenMP + auto-parallel + MPI)
(ln0x)$ mpiFCCpx -Kfast,openmp,parallel sample.cpp
(cnaxxxx)$ mpiFCC -Kfast,openmp,parallel sample.cpp
(cnaxxxx)$ mpiCC -O2 -fopenmp -ftree-vectorize -march=native -fno-math-errno sample.cpp
x86 Environment
Basic Information
Item | Value |
---|---|
Compilers CPU | GCC 12.3.0, Intel oneAPI HPC Toolkit 2023.1.0 |
Compilers GPU | CUDA 11.8: GCC 11.3.0, NVIDIA HPC SDK 22.9 |
MPI Interconnect CPU | InfiniBand (HDR100) |
MPI Interconnect GPU | InfiniBand (HDR200) |
Job Scheduler Name | Slurm |
Job Scheduler Version | 23.11.4 |
Available compilers
The native x86 compilers run on both the login nodes and the compute nodes, so you may compile directly on the login nodes.
Non-MPI
Creator | Language | Compile command |
---|---|---|
GNU | Fortran | gfortran |
GNU | C | gcc |
GNU | C++ | g++ |
Intel | Fortran | ifort |
Intel | C | icc / icx |
Intel | C++ | icpc / icpx |
To load GNU compilers:
ml GCCcore/11.3.0
To load INTEL compilers:
ml intel
MPI
Creator | Language | Compile command |
---|---|---|
GNU | Fortran | mpifort |
GNU | C | mpicc |
GNU | C++ | mpic++ |
Intel | Fortran | mpiifort |
Intel | C | mpiicc / mpiicx |
Intel | C++ | mpiicpc / mpiicpx |
To load GNU compilers:
ml GCCcore/11.3.0
To load INTEL compilers:
ml intel
Compile Information
- Fortran compiler
- Creates object programs from Fortran source programs.
- C/C++ compiler
- Creates object programs from C/C++ source programs.
Compiler | Description |
---|---|
GCC | https://gcc.gnu.org/gcc-12 |
Intel oneAPI HPC Toolkit | https://www.intel.com/content/www/us/en/developer/tools/oneapi/hpc-toolkit.html |
CPU: Recommended Options
Language/Mode | Focus | Recommended options |
---|---|---|
C/C++/Fortran | GCC | -O2 -ftree-vectorize -march=native -fno-math-errno |
C/C++/Fortran | Intel oneAPI | -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise |
Compile Examples x86
Fortran
- Sequential program
Intel oneAPI: (ln0x)$ ifort -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.f
GCC: (ln0x)$ gfortran -O2 -ftree-vectorize -march=native -fno-math-errno sample.f
- Thread-parallel program (OpenMP)
Intel oneAPI: (ln0x)$ ifort -qopenmp -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.f
GCC: (ln0x)$ gfortran -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno sample.f
- MPI program
Intel oneAPI: (ln0x)$ mpiifort -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.f
GCC: (ln0x)$ mpif90 -O2 -ftree-vectorize -march=native -fno-math-errno sample.f
- Hybrid program (OpenMP + MPI)
Intel oneAPI: (ln0x)$ mpiifort -qopenmp -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.f
GCC: (ln0x)$ mpif90 -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno sample.f
C
- Sequential program
Intel oneAPI: (ln0x)$ icx -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.c
GCC: (ln0x)$ gcc -O2 -ftree-vectorize -march=native -fno-math-errno sample.c
- Thread-parallel program (OpenMP)
Intel oneAPI: (ln0x)$ icx -qopenmp -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.c
GCC: (ln0x)$ gcc -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno sample.c
- MPI program
Intel oneAPI: (ln0x)$ mpiicx -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.c
GCC: (ln0x)$ mpicc -O2 -ftree-vectorize -march=native -fno-math-errno sample.c
- Hybrid program (OpenMP + MPI)
Intel oneAPI: (ln0x)$ mpiicx -qopenmp -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.c
GCC: (ln0x)$ mpicc -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno sample.c
C++
- Sequential program
Intel oneAPI: (ln0x)$ icpx -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.cpp
GCC: (ln0x)$ g++ -O2 -ftree-vectorize -march=native -fno-math-errno sample.cpp
- Thread-parallel program (OpenMP)
Intel oneAPI: (ln0x)$ icpx -qopenmp -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.cpp
GCC: (ln0x)$ g++ -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno sample.cpp
- MPI program
Intel oneAPI: (ln0x)$ mpiicpx -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.cpp
GCC: (ln0x)$ mpic++ -O2 -ftree-vectorize -march=native -fno-math-errno sample.cpp
- Hybrid program (OpenMP + MPI)
Intel oneAPI: (ln0x)$ mpiicpx -qopenmp -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.cpp
GCC: (ln0x)$ mpic++ -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno sample.cpp
GPU: Recommended Options
- NVIDIA recommendations:
- https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html
- Resources on:
- https://developer.nvidia.com/hpc-sdk
Compiler | Description |
---|---|
CUDA/GCC | https://docs.nvidia.com/cuda/archive/11.8.0/index.html |
NVIDIA HPC SDK | https://docs.nvidia.com/hpc-sdk/archive/22.9/index.html |
Compile Examples GPU
Fortran
- GPU support for Fortran is only available through OpenACC directives or CUDA Fortran
OpenACC:
(ln0x)$ ml NVHPC/22.9-CUDA-11.8.0
(ln0x)$ nvfortran -acc -gpu=cc80 -Minfo=accel -Mpreprocess -o sample_acc sample_acc.f90
CUDA Fortran:
(ln0x)$ ml NVHPC/22.9-CUDA-11.8.0
(ln0x)$ nvfortran -gpu=cc80 -Minfo=accel -Mpreprocess -o sample sample.cuf
C
- CUDA
nvcc:
(ln0x)$ ml CUDA/11.8.0 GCC/11.3.0
(ln0x)$ nvcc --generate-code arch=compute_80,code=sm_80 -o sample sample.cu
- OpenACC
nvhpc:
(ln0x)$ ml NVHPC/22.9-CUDA-11.8.0
(ln0x)$ nvc -acc -gpu=cc80 -Minfo=accel -Mpreprocess -o sample_acc sample_acc.c
C++
- CUDA
nvcc:
(ln0x)$ ml CUDA/11.8.0 GCC/11.3.0
(ln0x)$ nvcc --generate-code arch=compute_80,code=sm_80 -o sample sample.cu
- OpenACC
nvhpc:
(ln0x)$ ml NVHPC/22.9-CUDA-11.8.0
(ln0x)$ nvc++ -acc -gpu=cc80 -Minfo=accel -Mpreprocess -o sample_acc sample_acc.cpp