The OpenMP offloading with the Clang compiler
Let's get into our build directory:
cd builddir
We'll use a little symlink-powered hack to ease Clang's job of finding the OpenMP library:
cd lib
ln -s ../runtimes/runtimes-bins/openmp/runtime/src/libomp.so
cd ..
Compiling basic OpenMP programs
We'll use the examples from the OpenMP Application Programming Interface Examples book, Version 5.2.2 (April 2024). Since copying code from PDF tends to be a suboptimal experience, we'll use the examples from GitHub repository, specifically devices/sources/target.1.c
:
/*
* @@name: target.1
* @@type: C
* @@operation: compile
* @@expect: success
* @@version: omp_4.0
*/
extern void init(float*, float*, int);
extern void output(float*, int);
void vec_mult(int N)
{
int i;
float p[N], v1[N], v2[N];
init(v1, v2, N);
#pragma omp target
#pragma omp parallel for private(i)
for (i=0; i<N; i++)
p[i] = v1[i] * v2[i];
output(p, N);
}
While Clang doesn't support all possible OpenMP 5 features, it supports enough to be able to compile this example.
Naively trying to compile this code will result in linking errors due to missing symbols:
./bin/clang target.1.c
/usr/bin/ld: /lib/x86_64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x17): undefined reference to `main'
/usr/bin/ld: /tmp/target-7a0854.o: in function `vec_mult':
target.1.c:(.text+0x76): undefined reference to `init'
/usr/bin/ld: target.1.c:(.text+0xc3): undefined reference to `output'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Since we want to inspect the object code (and the generated assembly inside it) and don't care about linking at this point, we will use the -c
flag:
./bin/clang -c target.1.c
The symbols in the generated target.1.o
file can be inspected using llvm-objdump with the --syms
parameter:
./bin/llvm-objdump --syms target.1.o
target.1.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 target.1.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 g F .text 00000000000000d3 vec_mult
0000000000000000 *UND* 0000000000000000 init
0000000000000000 *UND* 0000000000000000 output
There is a notable lack of __omp_
- and __kmpc_
-prefixed symbols, which would be expected in OpenMP-enabled build. This is because OpenMP support is not enabled automatically, but has to be done via a -openmp
parameter (documentation):
./bin/clang -c target.1.c -fopenmp
./bin/llvm-objdump --syms target.1.o
target.1.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 target.1.c
0000000000000000 l d .text 0000000000000000 .text
00000000000000e0 l F .text 0000000000000075 __omp_offloading_809_d4b5d_vec_mult_l15
0000000000000160 l F .text 0000000000000153 __omp_offloading_809_d4b5d_vec_mult_l15.omp_outlined
0000000000000000 l d .rodata.str1.1 0000000000000000 .rodata.str1.1
0000000000000000 l d .data.rel.ro 0000000000000000 .data.rel.ro
0000000000000000 g F .text 00000000000000d1 vec_mult
0000000000000000 *UND* 0000000000000000 init
0000000000000000 *UND* 0000000000000000 output
0000000000000000 *UND* 0000000000000000 __kmpc_fork_call
0000000000000000 *UND* 0000000000000000 __kmpc_for_static_init_4
0000000000000000 *UND* 0000000000000000 __kmpc_for_static_fini
Compiling OpenMP programs with offloading
While offloading symbols are present, no images for a target architectures are present:
./bin/llvm-objdump --offloading target.1.o
target.1.o: file format elf64-x86-64
To enable generation of images, offloading requires specification of target architecture with -fopenmp-targets
and --offload-arch
parameters (documentation):
./bin/clang -c target.1.c -fopenmp -fopenmp-targets=amdgcn-amd-hsa --offload-arch=gfx942 -nogpulib
GFX942, which we requested with the gfx942
value for --offload-arch
parameter, is the architecture for AMD Instinct MI300A. Additionally, observe the presence of -nogpulib
parameter, which is helpful to avoid the requirement for AMD ROCm installation on the system.
After printing the list of offloading images, we can see that the one for GFX942 is present:
./bin/llvm-objdump --offloading target.1.o
target.1.o: file format elf64-x86-64
OFFLOADING IMAGE [0]:
kind llvm ir
arch gfx942
triple amdgcn-amd-amdhsa
producer openmp
It is possible to generate code for multiple offload architectures, which results in multiple images. Let's, as an example, use Kaveri (kaveri
or gfx700
value for --offload-arch
parameter), which happened to be the first APU to support unified memory:
./bin/clang -c target.1.c -fopenmp -fopenmp-targets=amdgcn-amd-hsa --offload-arch=gfx942,kaveri -nogpulib
./bin/llvm-objdump --offloading target.1.o
target.1.o: file format elf64-x86-64
OFFLOADING IMAGE [0]:
kind llvm ir
arch gfx700
triple amdgcn-amd-amdhsa
producer openmp
OFFLOADING IMAGE [1]:
kind llvm ir
arch gfx942
triple amdgcn-amd-amdhsa
producer openmp
Assignment
Find another OpenMP target offloading example and compile it for Radeon VII GPU and Instinct MI250X Accelerator. Refer to the User Guide for AMDGPU Backend to figure out the architecture names.
Tip
The related question in the LLVM/OpenMP FAQ explains this in more details and covers non-AMD architectures.
Target architecture feature specification for OpenMP offloading
Enabling a feature, such as HSA XNACK, can be specified after the architecture, separated by a colon:
./bin/clang -c target.1.c -fopenmp -fopenmp-targets=amdgcn-amd-hsa --offload-arch=gfx942:xnack+ -nogpulib
vmiletic@atlas:~/workspace/llvm-project/builddir$ ./bin/llvm-objdump --offloading target.1.o
target.1.o: file format elf64-x86-64
OFFLOADING IMAGE [0]:
kind llvm ir
arch gfx942:xnack+
triple amdgcn-amd-amdhsa
producer openmp
Using -
instead of +
would, of course, disable a feature. Notable limitation is that, if any target architecture specifies a feature, it has to be specified (as enabled or disabled) in all target architectures.
Assignment
Learn about other supported features of GFX942 in the User Guide for AMDGPU Backend and check if you can enable them using the above approach.
Author: Vedran Miletić