Developing with the LLVM compiler infrastructure

During the course we will use LLVM, which is a well-known open-source compiler and toolchain. It is distributed under the Apache License 2.0 with LLVM Exceptions. Due to its popularity, various LLVM programs and libraries are packaged for many operating systems, including Fedora, Debian GNU/Linux, Arch Linux, and FreeBSD. Therefore we could install LLVM from the operating system repository, although this would later prevent us from modifying its source code.

Setting up the integrated development environment

Furthermore, we will use Visual Studio Code as the integrated development environment. However, any development environment for C++ is acceptable, including Qt Creator, CLion, CodeLite, NetBeans, and Eclipse.

First, install the C/C++ Extension Pack, which will install C/C++ and CMake extensions. More details about these extensions can be found in the C/C++ for Visual Studio Code guide.

Installing packages required for building LLVM

Tip

The commands below assume a Unix-like operating system, which includes Linux, FreeBSD, macOS, illumos, and many others, but not Windows.

To get a Unix-like environment on Windows 10 and newer, it is recommended to use the Windows Subsystem for Linux (WSL) together with Windows Terminal and the Visual Studio Code Remote - WSL extension. While almost any distribution supported by WSL will support building LLVM, we recommend using the RHEL-compatible AlmaLinux OS 9, which is available in the Microsoft Store, or, if you are feeling adventurous, CentOS Stream 9, which requires a manual image download from one of its official mirrors.

🎩 Fedora/CentOS/RHEL

Assuming Red Hat Enterprise Linux 9, CentOS Stream 9, Fedora 34, or newer:

dnf builddep llvm

🏹 Arch Linux

curl -O https://gitlab.archlinux.org/archlinux/packaging/packages/llvm/-/raw/main/PKGBUILD
makepkg -s PKGBUILD

🦎 openSUSE/SLES

zypper source-install llvm19

🍥 Debian GNU/Linux Mint

apt build-dep llvm-toolchain-19
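After installing the build dependencies, it can be useful to confirm that the basic tools are actually on the PATH. The following is a minimal sketch; the function name missing_tools and the list of tools passed to it are assumptions chosen for illustration, not an official list of LLVM requirements.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
# (missing_tools is a hypothetical helper name used only in this sketch.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# The tool list below is an assumption; extend it as needed.
missing=$(missing_tools cmake make c++ python3 git)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names are joined on one line.
    echo "Missing tools:" $missing
else
    echo "All listed tools were found."
fi
```

If a tool is reported as missing, rerun the package installation command for your distribution before continuing.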

Building the LLVM compiler infrastructure from source

Hereafter we will more or less follow the directions of Getting started with the LLVM System from the Getting Started/Tutorials section.

It is possible to download the LLVM source code from its releases page. We'll be using the latest patch release from the latest available series. At the start of the course, this is release 19.1.7. We'll download the source code from the LLVM 19.1.7 release on GitHub:

curl -OL https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-19.1.7.tar.gz

This is the complete source code archive for all tools and libraries. The same page also provides the binaries as well as the separate source code archives for the tools and libraries produced by the LLVM sub-projects:

LLVM core libraries,
Clang compiler and its tools,
compiler-rt runtime library,
Flang compiler,
libclc OpenCL library,
libcxx C++ standard library and its application binary interface,
libc C standard library,
LLD linker,
LLDB debugger,
MLIR language,
OpenMP library for Clang and Flang,
Polly high-level loop and data-locality optimizations infrastructure, and
test suite.

Although all of these tools are interesting in their own way, most of them will not be used here. In particular, we will be using Clang and several libraries to demonstrate the compilation of code written in C, C++, OpenCL C, and C with OpenMP.

We'll be following Building LLVM with CMake from the User Guides section of the LLVM documentation. Now it's time to unpack the source code tarball and enter the source directory.

tar xzf llvmorg-19.1.7.tar.gz
cd llvm-project-llvmorg-19.1.7

If Visual Studio Code is used for development, this is the project directory that should be opened in it. Afterwards, the integrated terminal can be used for running the commands.

LLVM, Clang, and related projects use CMake for building. Most notably, LLVM does not support building inside the source tree, so it's necessary to start by creating a separate build directory:

mkdir builddir

CMake is invoked using the cmake command (documentation). The required parameters are:

-S with path to source directory,
-B with path to build directory.

There are many CMake and LLVM-related variables that can be specified at build time. We'll use only four of them, two LLVM-specific and two CMake-generic, namely:

-D CMAKE_BUILD_TYPE=Release (documentation) sets the build type to Release (instead of Debug), which enables optimizations and results in smaller built binaries,
-D LLVM_ENABLE_PROJECTS=clang enables building of Clang alongside LLVM,
-D LLVM_ENABLE_RUNTIMES='openmp;offload' enables building of OpenMP runtime with offloading,
-D BUILD_SHARED_LIBS=ON enables dynamic linking of libraries, which significantly reduces memory requirements for building (though this is only recommended for use when developing LLVM, which we are).

Optionally, one might also want to specify:

-D CMAKE_CXX_COMPILER_LAUNCHER=ccache (documentation), which enables the ccache compiler cache and results in faster rebuilds,
-G Ninja, which enables the Ninja build system instead of GNU Make and results in faster builds.

cmake -S llvm -B builddir -D CMAKE_BUILD_TYPE=Release -D LLVM_ENABLE_PROJECTS=clang -D LLVM_ENABLE_RUNTIMES='openmp;offload' -D BUILD_SHARED_LIBS=ON
cmake --build builddir --parallel 2
cmake --build builddir --parallel 2 --target check
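The optional ccache and Ninja flags described above only make sense when those tools are installed. As a minimal sketch, the script below prints a configure command that includes each optional flag only if the corresponding program is found on the PATH; the function name configure_command is an assumption made for this example, and the flags are exactly the ones listed above.

```shell
#!/bin/sh
# Print a cmake configure command, adding the optional flags only when the
# matching tool is available. (configure_command is a hypothetical helper.)
configure_command() {
    cmd="cmake -S llvm -B builddir -D CMAKE_BUILD_TYPE=Release"
    cmd="$cmd -D LLVM_ENABLE_PROJECTS=clang"
    cmd="$cmd -D LLVM_ENABLE_RUNTIMES='openmp;offload'"
    cmd="$cmd -D BUILD_SHARED_LIBS=ON"
    # Use the ccache compiler cache if it is installed.
    if command -v ccache >/dev/null 2>&1; then
        cmd="$cmd -D CMAKE_CXX_COMPILER_LAUNCHER=ccache"
    fi
    # Use the Ninja generator if it is installed.
    if command -v ninja >/dev/null 2>&1; then
        cmd="$cmd -G Ninja"
    fi
    printf '%s\n' "$cmd"
}

# Print the command so it can be reviewed before running it.
configure_command
```

The script only prints the command; copy it into the terminal to run it. Note that CMake does not allow changing the generator of an already configured build directory, so switching between Make and Ninja requires a fresh builddir.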

Assignment

Find out what is the latest released version of LLVM, download it instead of the one used above, and build it.

If you have many CPU cores, you can increase the number of parallel compile jobs by setting the --parallel (-j for short) parameter of the cmake command to a number larger than 2, for example the number of cores. The build tool launched by cmake (Make or Ninja) will then compile more files at the same time, which ideally makes the build several times faster.

Assignment

Find out how many CPU cores you have and check if increasing the number of jobs speeds up the build process.
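One possible way to approach this assignment is sketched below. The helper name cpu_count is an assumption for illustration; it tries nproc (common on Linux), then getconf, then sysctl (common on macOS and FreeBSD), and falls back to 1 if none of them works.

```shell
#!/bin/sh
# Print the number of CPU cores, trying a few common commands in turn.
# (cpu_count is a hypothetical helper name used only in this sketch.)
cpu_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
        getconf _NPROCESSORS_ONLN
    elif sysctl -n hw.ncpu >/dev/null 2>&1; then
        sysctl -n hw.ncpu
    else
        # Conservative fallback when no detection method is available.
        echo 1
    fi
}

cores=$(cpu_count)
echo "Detected $cores CPU core(s)"
echo "Try: time cmake --build builddir --parallel $cores"
```

Comparing the time reported for different --parallel values on a clean build directory shows whether the extra jobs actually help; memory can become the limiting factor before the core count does.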

Alternatively, LLVM can also be obtained from GitHub using Git. In that case, the branch release/19.x should be used. The rest of the process is pretty similar:

git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout release/19.x
mkdir builddir
cmake -S llvm -B builddir -D CMAKE_BUILD_TYPE=Release -D LLVM_ENABLE_PROJECTS=clang -D LLVM_ENABLE_RUNTIMES='openmp;offload' -D BUILD_SHARED_LIBS=ON
cmake --build builddir --parallel 2
cmake --build builddir --parallel 2 --target check

Assignment

Find out what is the latest release branch of LLVM, check out that branch instead of the one used above, and build LLVM.
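As a starting point for this assignment, the sketch below picks the release/N.x branch with the highest N from the output of git ls-remote. The function name latest_release_branch is an assumption made for this example, and the hashes in the offline demonstration are made up.

```shell
#!/bin/sh
# Read `git ls-remote --heads` output on stdin and print the release/N.x
# branch with the highest major version N.
# (latest_release_branch is a hypothetical helper name.)
latest_release_branch() {
    sed -n 's|.*refs/heads/\(release/[0-9][0-9]*\.x\)$|\1|p' |
        sort -t / -k 2,2n |
        tail -n 1
}

# Usage against the real repository (needs network access):
#   git ls-remote --heads https://github.com/llvm/llvm-project.git 'release/*' \
#       | latest_release_branch

# Offline demonstration with made-up hashes:
printf '%s\trefs/heads/%s\n' \
    1111111 main \
    2222222 release/9.x \
    3333333 release/19.x \
    4444444 release/18.x |
    latest_release_branch
```

The offline demonstration prints release/19.x. The numeric sort on the part after the slash is what keeps 9.x from being ranked above 19.x, which a plain alphabetical sort would do.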

The overview of the LLVM architecture

While LLVM is building, let's take a look at the LLVM architecture. Chris Lattner, the main author of LLVM, wrote the LLVM chapter of The Architecture of Open Source Applications book. To follow the code described in the chapter, open the following files in the llvm-project-llvmorg-19.1.7/llvm directory:

include/llvm/Analysis/InstructionSimplify.h
lib/Analysis/InstructionSimplify.cpp
include/llvm/Pass.h
lib/Transforms/Hello/Hello.cpp
include/llvm/ADT/Triple.h
lib/Target/X86/X86InstrArithmetic.td
lib/Target/AMDGPU/AMDGPUInstrInfo.td
test/CodeGen/X86/add.ll
test/CodeGen/AMDGPU/llvm.log10.ll
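File locations sometimes change between LLVM releases, so it can help to check which of the listed files are present in the tree you downloaded. The following is a minimal sketch; the helper name missing_files is an assumption for illustration, and only two of the paths from the list are passed as an example.

```shell
#!/bin/sh
# Print each relative path that is not a regular file under the given root.
# Usage: missing_files ROOT PATH...
# (missing_files is a hypothetical helper name used only in this sketch.)
missing_files() {
    root=$1
    shift
    for path in "$@"; do
        [ -f "$root/$path" ] || printf '%s\n' "$path"
    done
}

# Example: check two of the files listed above. Adjust the root directory to
# wherever the source tree was unpacked and add the remaining paths as needed.
missing_files llvm-project-llvmorg-19.1.7/llvm \
    include/llvm/Pass.h \
    lib/Analysis/InstructionSimplify.cpp
```

No output means every listed file was found. For a path that is reported as missing, searching the tree by file name, for example with find llvm-project-llvmorg-19.1.7 -name Triple.h, may show where it lives in that release.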

Author: Vedran Miletić