
Compilation

You will notice major speed improvements if you compile your code in your project folder (inside /projects/) instead of your home folder. This is due to differences in disk reading/writing speeds.

Deucalion has a heterogeneous architecture, with both x86 and ARM compute nodes.

ARM compilation and examples are covered under ARM Environment below.

x86 compilation and examples, including GPU-specific examples, are covered under x86 Environment below.

ARM Environment

Available compilers (Native)

The login nodes use x86 processors. To compile a program natively for the ARM architecture, the fastest way is to allocate a compute node from the queue via salloc:

salloc -N1 -p dev-arm -t 4:00:00

The salloc output tells you which node you were given, so you can ssh into that node via

ssh cna[xxxx]

If you forget which node you allocated, you can check with

squeue --me

The name of the job should be "interact".

The native compilers can only be used on the compute nodes.

The dev-arm partition should only be used for compilation tasks.
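
For example, a full session might look like this (the node name cna0123 and the project path are illustrative placeholders):

(ln0x)$ salloc -N1 -p dev-arm -t 4:00:00   #request one ARM node for 4 hours
(ln0x)$ squeue --me                        #the "interact" job line shows the allocated node, e.g. cna0123
(ln0x)$ ssh cna0123                        #connect to the allocated node
(cna0123)$ cd /projects/<your_project>     #compile inside your project folder, not your home folder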

Non-MPI

Creator Language Compile command
Fujitsu Fortran frt
Fujitsu C fcc
Fujitsu C++ FCC
GNU Fortran gfortran
GNU C gcc
GNU C++ g++

MPI

Creator Language Compile command
Fujitsu Fortran mpifrt
Fujitsu C mpifcc
Fujitsu C++ mpiFCC
GNU Fortran mpifort
GNU C mpicc
GNU C++ mpiCC

To access all Fujitsu compilers:

ml FJSVstclanga (run on an ARM compute node)

To access all GNU compilers:

ml OpenMPI (run on an ARM compute node)
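
For example, once on an allocated ARM node, a minimal sketch of a native build (reusing the sample.f file name from the examples below) is:

(cnaxxxx)$ ml FJSVstclanga                 #load the Fujitsu compilers
(cnaxxxx)$ mpifrt -Kfast,openmp sample.f   #build an MPI + OpenMP Fortran program

or, with the GNU toolchain:

(cnaxxxx)$ ml OpenMPI
(cnaxxxx)$ mpifort -O2 -fopenmp -ftree-vectorize -march=native -fno-math-errno sample.f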

Recommended options for the GNU and Fujitsu compilers are listed under Compile Information below.

Available compilers (Cross)

To avoid having to compile on the compute nodes, Fujitsu provides cross-compilers: they run on the x86 architecture but produce machine code for ARM processors. Because the cross-compilers are x86 binaries, they can only be used on the login nodes.

Non-MPI

Creator Language Compile command
Fujitsu Fortran frtpx
Fujitsu C fccpx
Fujitsu C++ FCCpx

MPI

Creator Language Compile command
Fujitsu Fortran mpifrtpx
Fujitsu C mpifccpx
Fujitsu C++ mpiFCCpx

To load all FUJITSU compilers:

ml FJSVstclanga (run on a login node)
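
For example, a minimal cross-compilation sketch on a login node (project path is an illustrative placeholder) is:

(ln0x)$ ml FJSVstclanga                    #load the Fujitsu cross-compilers
(ln0x)$ cd /projects/<your_project>
(ln0x)$ mpifccpx -Kfast,openmp sample.c    #the resulting binary runs only on the ARM compute nodes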

Compile Information
  • Fortran compiler: creates object programs from Fortran source programs.
  • C/C++ compiler: creates object programs from C/C++ source programs.
  • The C/C++ compiler has two modes with different user interfaces; the mode is selected by an option of the compile command, and the default is Trad Mode.
Mode Description
Trad Mode (default) This mode uses an enhanced compiler based on the compilers for the K computer and PRIMEHPC FX100 or earlier systems. It is suitable for maintaining compatibility with past Fujitsu compilers. [C] Supported specifications are C89/C99/C11 and OpenMP 3.1/part of OpenMP 4.5. [C++] Supported specifications are C++03/C++11/C++14/part of C++17 and OpenMP 3.1/part of OpenMP 4.5.
Clang Mode (-Nclang) This mode uses an enhanced compiler based on Clang/LLVM. It is suitable for compiling programs that use the latest language specifications and open-source software. [C] Supported specifications are C89/C99/C11 and OpenMP 4.5/part of OpenMP 5.0. [C++] Supported specifications are C++03/C++11/C++14/C++17 and OpenMP 4.5/part of OpenMP 5.0.
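
For example, to compile a C program in Clang Mode instead of the default Trad Mode, add -Nclang; a minimal sketch on a compute node (file name illustrative) is:

(cnaxxxx)$ fcc -Nclang -Ofast sample.c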

Recommended options (Fujitsu compilers):

Language/Mode | Focus | Recommended options | Induced options
Fortran | Performance | -Kfast,openmp[,parallel] | -O3 -Keval,fp_contract,fp_relaxed,fz,ilfunc,mfunc,omitfp,simd_packed_promotion
Fortran | Precision | -Kfast,openmp[,parallel],fp_precision | -Knoeval,nofp_contract,nofp_relaxed,nofz,noilfunc,nomfunc,parallel_fp_precision
C/C++ Trad Mode | Performance | -Kfast,openmp[,parallel] | -O3 -Keval,fast_matmul,fp_contract,fp_relaxed,fz,ilfunc,mfunc,omitfp,simd_packed_promotion
C/C++ Trad Mode | Precision | -Kfast,openmp[,parallel],fp_precision | -Knoeval,nofast_matmul,nofp_contract,nofp_relaxed,nofz,noilfunc,nomfunc,parallel_fp_precision
C/C++ Clang Mode | Performance | -Nclang -Ofast | -O3 -ffj-fast-matmul -ffast-math -ffp-contract=fast -ffj-fp-relaxed -ffj-ilfunc -fbuiltin -fomit-frame-pointer -finline-functions

Recommended options (GNU compilers):

Language | Compiler | Recommended options
C/C++/Fortran | gcc, g++, gfortran, mpicc, mpiCC, mpifort | -O2 -ftree-vectorize -march=native -fno-math-errno

Compile Examples ARM

Fortran

  • Sequential program
Fujitsu Cross-compilation: (ln0x)$ frtpx -Kfast sample.f #for cross-compilation using login nodes (FUJITSU)
Fujitsu Native: (cnaxxxx)$ frt -Kfast sample.f #for compilation inside compute node (FUJITSU)
GNU Native: (cnaxxxx)$ gfortran -O2 -ftree-vectorize -march=native -fno-math-errno sample.f #for compilation inside compute node (GNU)
  • Thread-parallel program (using automatic parallelization)
Fujitsu Cross-compilation: (ln0x)$ frtpx -Kfast,parallel sample.f #for cross-compilation using login nodes
Fujitsu Native: (cnaxxxx)$ frt -Kfast,parallel sample.f #for compilation inside compute node
GNU Native: (cnaxxxx)$ gfortran -O2 -ftree-parallelize-loops=n -ftree-vectorize -march=native -fno-math-errno sample.f #for compilation inside compute node (GNU) with n threads
  • Thread-parallel program (OpenMP)
(ln0x)$ frtpx -Kfast,openmp sample.f
(cnaxxxx)$ frt -Kfast,openmp sample.f
(cnaxxxx)$ gfortran -O2 -fopenmp -ftree-vectorize -march=native -fno-math-errno sample.f 
  • Thread-parallel program (OpenMP + auto-parallel)
(ln0x)$ frtpx -Kfast,openmp,parallel sample.f
(cnaxxxx)$ frt -Kfast,openmp,parallel sample.f
  • MPI program
(ln0x)$ mpifrtpx -Kfast sample.f
(cnaxxxx)$ mpifrt -Kfast sample.f
(cnaxxxx)$ mpifort -O2 -ftree-vectorize -march=native -fno-math-errno sample.f
  • Hybrid program (OpenMP + auto-parallel + MPI)
(ln0x)$ mpifrtpx -Kfast,openmp,parallel sample.f
(cnaxxxx)$ mpifrt -Kfast,openmp,parallel sample.f
(cnaxxxx)$ mpifort -O2 -fopenmp -ftree-vectorize -march=native -fno-math-errno sample.f

C

  • Sequential program
(ln0x)$ fccpx -Kfast sample.c #for cross-compilation using login nodes (FUJITSU)
(cnaxxxx)$ fcc -Kfast sample.c #for compilation inside compute node (FUJITSU)
(cnaxxxx)$ gcc -O2 -ftree-vectorize -march=native -fno-math-errno sample.c #for compilation inside compute node (GNU)
  • Thread-parallel program (using automatic parallelization)
(ln0x)$ fccpx -Kfast,parallel sample.c
(cnaxxxx)$ fcc -Kfast,parallel sample.c
(cnaxxxx)$ gcc -O2 -ftree-parallelize-loops=n -ftree-vectorize -march=native -fno-math-errno sample.c #for compilation inside compute node (GNU) with n threads
  • Thread-parallel program (OpenMP)
(ln0x)$ fccpx -Kfast,openmp sample.c
(cnaxxxx)$ fcc -Kfast,openmp sample.c
(cnaxxxx)$ gcc -O2 -fopenmp -ftree-vectorize -march=native -fno-math-errno sample.c 
  • Thread-parallel program (OpenMP + auto-parallel)
(ln0x)$ fccpx -Kfast,openmp,parallel sample.c
(cnaxxxx)$ fcc -Kfast,openmp,parallel sample.c
  • MPI program
(ln0x)$ mpifccpx -Kfast sample.c
(cnaxxxx)$ mpifcc -Kfast sample.c
(cnaxxxx)$ mpicc -O2 -ftree-vectorize -march=native -fno-math-errno sample.c 
  • Hybrid program (OpenMP + auto-parallel + MPI)
(ln0x)$ mpifccpx -Kfast,openmp,parallel sample.c
(cnaxxxx)$ mpifcc -Kfast,openmp,parallel sample.c
(cnaxxxx)$ mpicc -O2 -fopenmp -ftree-vectorize -march=native -fno-math-errno sample.c 

C++

  • Sequential program
(ln0x)$ FCCpx -Kfast sample.cpp
(cnaxxxx)$ FCC -Kfast sample.cpp
(cnaxxxx)$ g++ -O2 -ftree-vectorize -march=native -fno-math-errno sample.cpp
  • Thread-parallel program (using automatic parallelization)
(ln0x)$ FCCpx -Kfast,parallel sample.cpp
  • Thread-parallel program (OpenMP)
(ln0x)$ FCCpx -Kfast,openmp sample.cpp
  • Thread-parallel program (OpenMP + auto-parallel)
(ln0x)$ FCCpx -Kfast,openmp,parallel sample.cpp
  • MPI program
(ln0x)$ mpiFCCpx -Kfast sample.cpp
(cnaxxxx)$ mpiFCC -Kfast sample.cpp
(cnaxxxx)$ mpiCC -O2 -ftree-vectorize -march=native -fno-math-errno sample.cpp
  • Hybrid program (OpenMP + auto-parallel + MPI)
(ln0x)$ mpiFCCpx -Kfast,openmp,parallel sample.cpp
(cnaxxxx)$ mpiFCC -Kfast,openmp,parallel sample.cpp
(cnaxxxx)$ mpiCC -O2 -fopenmp -ftree-vectorize -march=native -fno-math-errno sample.cpp

x86 Environment

Basic Information

Compilers (CPU nodes): GCC 12.3.0, Intel oneAPI HPC Toolkit 2023.1.0
Compilers (GPU nodes): CUDA 11.8 with GCC 11.3.0, NVIDIA HPC SDK 22.9
MPI interconnect (CPU nodes): InfiniBand HDR100
MPI interconnect (GPU nodes): InfiniBand HDR200
Job scheduler: Slurm 23.11.4

Available compilers

The native x86 compilers can be used on both the login nodes and the compute nodes, so you can compile directly on the login nodes.

Non-MPI

Creator Language Compile command
GNU Fortran gfortran
GNU C gcc
GNU C++ g++
Intel Fortran ifort
Intel C icc
Intel C++ icpc

To load GNU compilers:

ml GCCcore/11.3.0

To load INTEL compilers:

ml intel

MPI

Creator Language Compile command
GNU Fortran mpifort
GNU C mpicc
GNU C++ mpic++
Intel Fortran mpiifort
Intel C mpiicc
Intel C++ mpiicpc

To load GNU compilers:

ml GCCcore/11.3.0

To load INTEL compilers:

ml intel
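
For example, a minimal sketch of building an MPI C program with the Intel toolchain directly on a login node (using the recommended options and the sample.c file name from the examples below) is:

(ln0x)$ ml intel
(ln0x)$ mpiicx -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.c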

Compile Information

  • Fortran compiler
  • Creates object programs from Fortran source programs.
  • C/C++ compiler
  • Creates object programs from C/C++ source programs.
Compiler Description
GCC https://gcc.gnu.org/gcc-12
Intel oneAPI HPC Toolkit https://www.intel.com/content/www/us/en/developer/tools/oneapi/hpc-toolkit.html

CPU: Recommended Options

Language/Mode Focus Recommended options
C/C++/Fortran GCC -O2 -ftree-vectorize -march=native -fno-math-errno
C/C++/Fortran Intel oneAPI -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise
Compile Examples x86

Fortran

  • Sequential program
Intel oneAPI: (ln0x)$ ifort -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.f
GCC: (ln0x)$ gfortran -O2 -ftree-vectorize -march=native -fno-math-errno sample.f
  • Thread-parallel program (OpenMP)
Intel oneAPI: (ln0x)$ ifort -qopenmp -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.f
GCC: (ln0x)$ gfortran -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno sample.f
  • MPI program
Intel oneAPI: (ln0x)$ mpiifort -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.f
GCC: (ln0x)$ mpif90 -O2 -ftree-vectorize -march=native -fno-math-errno sample.f
  • Hybrid program (OpenMP + MPI)
Intel oneAPI: (ln0x)$ mpiifort -qopenmp -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.f
GCC: (ln0x)$ mpif90 -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno sample.f

C

  • Sequential program
Intel oneAPI: (ln0x)$ icx -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.c
GCC: (ln0x)$ gcc -O2 -ftree-vectorize -march=native -fno-math-errno sample.c
  • Thread-parallel program (OpenMP)
Intel oneAPI: (ln0x)$ icx -qopenmp -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.c
GCC: (ln0x)$ gcc -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno sample.c
  • MPI program
Intel oneAPI: (ln0x)$ mpiicx -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.c
GCC: (ln0x)$ mpicc -O2 -ftree-vectorize -march=native -fno-math-errno sample.c
  • Hybrid program (OpenMP + MPI)
Intel oneAPI: (ln0x)$ mpiicx -qopenmp -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.c
GCC: (ln0x)$ mpicc -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno sample.c

C++

  • Sequential program
Intel oneAPI: (ln0x)$ icpx -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.cpp
GCC: (ln0x)$ g++ -O2 -ftree-vectorize -march=native -fno-math-errno sample.cpp
  • Thread-parallel program (OpenMP)
Intel oneAPI: (ln0x)$ icpx -qopenmp -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.cpp
GCC: (ln0x)$ g++ -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno sample.cpp
  • MPI program
Intel oneAPI: (ln0x)$ mpiicpx -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.cpp
GCC: (ln0x)$ mpic++ -O2 -ftree-vectorize -march=native -fno-math-errno sample.cpp
  • Hybrid program (OpenMP + MPI)
Intel oneAPI: (ln0x)$ mpiicpx -qopenmp -O2 -march=core-avx2 -ftz -fp-speculation=safe -fp-model precise sample.cpp
GCC: (ln0x)$ mpic++ -fopenmp -O2 -ftree-vectorize -march=native -fno-math-errno sample.cpp

GPU: Recommended Options

Compiler Description
CUDA/GCC https://docs.nvidia.com/cuda/archive/11.8.0/index.html
NVIDIA HPC SDK https://docs.nvidia.com/hpc-sdk/archive/22.9/index.html
Compile Examples GPU

Fortran

  • GPU support for Fortran is only available through OpenACC directives or CUDA Fortran
OpenACC:
(ln0x)$ ml NVHPC/22.9-CUDA-11.8.0
(ln0x)$ nvfortran -acc -gpu=cc80 -Minfo=accel -Mpreprocess -o sample_acc sample_acc.f90
CUDA Fortran:
(ln0x)$ ml NVHPC/22.9-CUDA-11.8.0
(ln0x)$ nvfortran -gpu=cc80 -Minfo=accel -Mpreprocess -o sample sample.cuf

C

  • CUDA
nvcc:
(ln0x)$ ml CUDA/11.8.0 GCC/11.3.0
(ln0x)$ nvcc --generate-code arch=compute_80,code=sm_80 -o sample sample.c #add -x cu (or use a .cu extension) if the file contains CUDA kernel code
  • OpenACC
nvhpc:
(ln0x)$ ml NVHPC/22.9-CUDA-11.8.0
(ln0x)$ nvc -acc -gpu=cc80 -Minfo=accel -Mpreprocess -o sample_acc sample_acc.c

C++

  • CUDA
nvcc:
(ln0x)$ ml CUDA/11.8.0 GCC/11.3.0
(ln0x)$ nvcc --generate-code arch=compute_80,code=sm_80 -o sample sample.cpp #add -x cu (or use a .cu extension) if the file contains CUDA kernel code
  • OpenACC
nvhpc:
(ln0x)$ ml NVHPC/22.9-CUDA-11.8.0
(ln0x)$ nvc++ -acc -gpu=cc80 -Minfo=accel -Mpreprocess -o sample_acc sample_acc.cpp