For reproducible scaling experiments, it is good practice to explicitly set
OMP_DYNAMIC=false, especially when you want to guarantee the number of
threads spawned for each parallel region. In this short article, I explain why
this is important and how one can sift through the GCC codebase (140k+ files
and more than 5 million LOC) plus the OpenMP standard to find this information.
Controlling the exact thread count is important for reproducibility. If the runtime is allowed to dynamically adjust the number of threads, the same program may exhibit different performance characteristics across compilers, systems, or execution environments. Explicitly disabling dynamic thread adjustment helps ensure that scaling measurements and performance comparisons are consistent and interpretable.
Interestingly, this is not strictly necessary when using GCC, because GCC’s
OpenMP implementation (gcc/libgomp) already defaults to OMP_DYNAMIC=false
(well, sort of… more on that later). However, relying on the default behavior
is not portable. Per section 6.3 (OMP_DYNAMIC) of the OpenMP standard:
The behavior of the program is implementation defined if the value of OMP_DYNAMIC is neither true nor false.
Moreover, per section 2.5 (Internal Control Variables) and section 2.5.2
(Internal Control Variable Initialization),
the variable dyn_var determines whether dynamic adjustment of the number of
threads is enabled, and its value is implementation defined. This means
compiler developers are free to choose the default behavior of their OpenMP
implementation. In principle, this means that a compiler implementer may choose
to enable dynamic adjustment of the number of threads by default.
Note that the internal control variable dyn_var simply reads from
OMP_DYNAMIC if it is defined. However, explicitly disabling dynamic adjustment
ensures more predictable and reproducible behavior across systems and compilers.
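Concretely, a scaling run might pin down every thread-related knob before launching the benchmark. The thread count below is an illustrative placeholder:

```shell
# Explicitly fix all thread-related settings before a scaling run.
# OMP_DYNAMIC=false guarantees the runtime will not shrink the team size.
export OMP_DYNAMIC=false
export OMP_NUM_THREADS=8
echo "OMP_DYNAMIC=$OMP_DYNAMIC OMP_NUM_THREADS=$OMP_NUM_THREADS"
# ./my_scaling_benchmark   # hypothetical OpenMP executable
```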
How do we know, though, that GCC defaults to OMP_DYNAMIC=false? Well, GCC
does not actually “set” OMP_DYNAMIC, so that’s why I wrote “sort of” earlier.
Really, GCC defines dyn_var in a struct of configuration variables that can
be updated by environment variables. If the environment variable doesn’t
exist, then the default value of the configuration variable is used. If
you want to verify GCC’s behavior yourself, you can trace it directly through
the gcc/libgomp source code:
$ git clone --depth 1 https://github.com/gcc-mirror/gcc.git
$ cd gcc
$ cd libgomp
$ rg dyn_var
parallel.c
73: if (icv->dyn_var)
env.c
78: .dyn_var = false
From ripgrep’s output, we can see where dyn_var gets used in
parallel.c#L73.
We can also see in
env.c#L78
the default value of dyn_var, and thus know its value for certain without
having to guess or treat GCC as a black box.
// @file env.c
const struct gomp_default_icv gomp_default_icv_values = {
.nthreads_var = 1,
.thread_limit_var = UINT_MAX,
.run_sched_var = GFS_DYNAMIC,
.run_sched_chunk_size = 1,
.default_device_var = INT_MIN,
.max_active_levels_var = 1,
.bind_var = omp_proc_bind_false,
.nteams_var = 0,
.teams_thread_limit_var = 0,
.dyn_var = false // <----------------- DEFAULT VALUE!!!!
};
Overall, OMP_DYNAMIC is a small but important example of a broader issue in
HPC: default behavior is not guaranteed to be portable, even when it is
consistent within a specific compiler like GCC. For reproducible performance
experiments, especially scaling studies, explicitly setting OpenMP
configuration variables removes ambiguity and ensures that results are
comparable across compilers, systems, and environments. As shown in GCC’s
libgomp implementation, even seemingly “fixed” defaults are ultimately
implementation details and relying on them implicitly can introduce hidden
variability into performance measurements. When in doubt: always set OpenMP
environment variables explicitly for reproducible performance work.
I recently ran into a frustrating problem: I wanted to print a PDF directly from a USB stick, but the printer kept showing “memory device is not working properly.” The drive had previously been flashed with Debian, so I assumed deleting the old files and copying my PDF would be enough. This was, of course, not the solution. In this short article, I cover what I did to fix the problem.
When I plugged the drive into my Linux laptop and called fdisk -l, the output
showed a Linux partition still lingering on the drive:
Disk /dev/sda: 7.46 GiB
Device     Boot   Start      End  Sectors Size Id Type
/dev/sda2        1064960 15646719 14581760   7G 83 Linux
Even after copying my PDF, the printer refused to read it. It turns out
printers are picky and cannot read Linux filesystems like ext4. They expect a
FAT32 (or sometimes exFAT) partition starting at the beginning of the drive.
This makes sense, since FAT32 is one of the oldest and most widely supported
file systems (see Kingston Tech: Understanding file systems and drive
formatting).
Changing permissions (chmod 777) on Linux didn’t help either. In retrospect,
it makes sense that this didn’t work, since the printer cannot understand Linux
file permissions.
However, the fix was straightforward on Linux using fdisk and mkfs.vfat:
sudo umount /dev/sda*
sudo fdisk /dev/sda
# in fdisk use: o > n > p > 1 > Enter > Enter > t > b (W95 FAT32) > w
sudo mkfs.vfat -F 32 /dev/sda1
After that, I copied my PDF to the root of the USB drive, plugged it into the printer, and it worked immediately.
This little episode was a great reminder: USB drives that were once used for Linux installations often leave behind partitions that devices like printers cannot read. It taught me to always make sure your filesystem matches the device’s expectations.
Recently my Ubuntu installation ended up in a broken state after a reboot:
external displays stopped working, nvidia-smi failed inside the desktop
environment (DE), and the system would hang while stopping gdm during
shutdown. Interestingly, the NVIDIA driver still worked in recovery mode, which
suggested the kernel modules themselves were not completely broken. I decided
to reset both the NVIDIA stack and the DE entirely since I
have had issues in the past with GPU weirdness.
One thing that helped diagnose root issues was disabling the graphical splash screen during boot. By default Ubuntu hides most boot messages behind the splash screen, which makes it hard to see where the system is hanging.
To see the real boot output, while in the Ubuntu DE I edited the GRUB configuration:
sudo nano /etc/default/grub
In the line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
I removed quiet splash so it became:
GRUB_CMDLINE_LINUX_DEFAULT=""
Then I updated GRUB:
sudo update-grub
Now Ubuntu displayed the full boot log instead of the splash screen. This made
it obvious that the system was hanging while rebooting from the DE due to
gdm, which pointed toward a joint problem with the DE and GPU stack rather
than the kernel itself.
At this point, I performed a hard reset and powered back on, pressing ESC to
reveal the GRUB boot menu. From there, I entered recovery mode and fixed the
issue as follows:
First, I updated the system to ensure all packages and kernel components were consistent:
apt update && apt upgrade
Next, I completely removed all NVIDIA and CUDA packages:
apt --purge remove "*nvidia*" "*cuda*"
To make sure no stale desktop configuration remained, I also removed the GNOME desktop and the display manager:
apt --purge remove ubuntu-desktop gdm3
Then I cleaned up any remaining dependencies:
apt autoremove
After that, I reinstalled the desktop environment:
apt install ubuntu-desktop gdm3
Finally, I let Ubuntu reinstall the recommended GPU drivers automatically:
ubuntu-drivers install
After rebooting, the system came up normally again. NVIDIA worked inside the
desktop environment, external displays were detected, and the gdm shutdown
hang was gone. While perhaps not the most elegant solution, completely
resetting both the GPU drivers and the desktop environment turned out to be the
fastest path to a clean state.
Update (2026-03-10)
Interestingly, this solution worked only temporarily. I did not dig into
system logs; however, it turns out that by default ubuntu-drivers will try
to install the most recent stable driver available. In this case, on my Ubuntu
24.04 system with a v6.17 kernel, the automatically installed driver was
nvidia-driver-590. Oddly, this is not the driver version that is
recommended when you list the devices for which drivers are available
with
ubuntu-drivers devices
The output was
driver : nvidia-driver-580-open - distro non-free recommended
So, in order to properly fix my problem, I did the following:
# 1. Purge all NVIDIA leftovers
sudo apt purge '*nvidia*' '*cuda*'
sudo apt autoremove
# 2. Update package lists
sudo apt update
# 3. Install NVIDIA driver 580
sudo apt install nvidia-driver-580 nvidia-settings nvidia-prime
# 4. Reboot
sudo reboot
I ended up doing this for the v6.14 kernel, which I set as the default kernel
loaded during boot by modifying /etc/default/grub with
GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 6.14.0-37-generic"
and then sudo update-grub.
I have since had no issues with my drivers or display manager :))
Often I have to perform substitution of variables or code blocks with something
new as part of refactoring efforts. Since I am using nvim, I wanted a way to
do this without having to close nvim, but also taking advantage of highly
optimized regex search utilities like ripgrep. In this short how-to, I
provide the commands one can use to do this without ever having to close nvim.
Recently, I ran into a case where a target string I wanted to replace across a
codebase contained lots of special characters. Special characters required
painful manual escaping if I used sed.
Instead, I wanted to:
- use ripgrep to intelligently find matching files, and
- use %s with \V (“very nomagic”) to avoid regex escaping headaches.

The target workflow looked like this:
:argdo %s,\V<C-r>0,newstring,g
Where:
- \V disables magic pattern interpretation.
- argdo applies the substitution to all files in the argument list.

The question was: how do I cleanly pipe ripgrep results into :args?
Neovim allows things like:
:args **/*.py
:args `ls src/*.py`
But I wanted to use ripgrep because it’s tailored specifically for fast
pattern matching across files in a git repository.
As an example, I was trying to replace the below string across many files:
"{environ['HOME']}/.cartopy_backgrounds/BlueMarble_3600x1800.png"
After some experimentation, I landed on a minimal solution that requires no
nvim configuration changes and works entirely inside nvim.

Step 1: Start Neovim
1
$ nvim
Step 2: Use ripgrep to find matching files
Write matching filenames to a temporary file:
:!rg -l -F "\"{environ['HOME']}/.cartopy_backgrounds/BlueMarble_3600x1800.png\"" . --glob "*.py" > /tmp/out.txt
Explanation:
- -l -> list matching files only
- -F -> fixed string search (no regex interpretation)
- --glob "*.py" -> restrict to Python files
- \" -> the double quotes inside the pattern are escaped for the shell

Step 3: Populate the argument list
:args `cat /tmp/out.txt`
Now all matching files are in the argument list.
Step 4: Perform the replacement
Assuming the search string is in the yank register 0 (e.g., because you yanked
it with y):
:argdo %s,\V<C-r>0,"new_text",g | update
This:
- uses \V to avoid regex escaping, and
- saves each modified file (update).

This solution works with ripgrep, find, grep, or any other tool that can
list files. It’s simple, explicit, and flexible.
There are likely more elegant solutions, but this approach is portable, minimal, and easy to remember.
Sometimes the most boring solution is the best one.
If you choose to host a self-managed (i.e. private) GitLab instance on on-premises infrastructure, you do not get access to GitLab’s shared (public) runners for CI/CD. Unlike GitLab.com or GitLab Dedicated, a private GitLab instance puts you fully in control, but that control comes with responsibility: you have to bring your own runners. In this article, I document my experience setting up a GitLab runner on a Raspberry Pi 3B+, why I chose this route, and how I did it. Spoiler: while the setup itself is fairly straightforward, there are some Raspberry Pi–specific hardware quirks that are worth handling correctly if you care about long-term stability and recoverability.
Before diving into hardware, it’s worth briefly setting the stage.
Continuous Integration (CI) and Continuous Deployment/Delivery (CD) are practices that automate the process of building, testing, and deploying code changes.
The goal is to catch issues early, reduce manual steps, and make software delivery repeatable and reliable.
Static analysis is of particular interest since it is essentially the automated scanning of code in order to find and report vulnerabilities and errors. For example, consider the following C code snippet:
// @file main.c
// @brief Illustrates usefulness of static code analysis.
#include <stdlib.h>
void foo(int x) {
int buf[10];
buf[x] = 0;
}
int main() {
int *p;
free(p);
foo(10);
return 0;
}
Take a moment to scan the above program and see what might be problematic about
it. Once you’re ready, you could go ahead and compile it with your favorite
compiler and have the compiler emit warnings. For example, we can use gcc
with strict warnings set (see man gcc):
$ gcc -Wpedantic -Wextra -Wall main.c
main.c: In function ‘foo’:
main.c:3:7: warning: variable ‘buf’ set but not used [-Wunused-but-set-variable]
3 | int buf[10];
| ^~~
main.c: In function ‘main’:
main.c:9:3: warning: ‘p’ is used uninitialized [-Wuninitialized]
9 | free(p);
| ^~~~~~~
Great! With just a few flags (note, only the -Wall flag is actually relevant
here, but to be very strict I included -Wpedantic and -Wextra as well), we
get some quite clear and very useful warnings! The first warning tells us we
have an unused variable named buf, while the second tells us we have an
uninitialized variable p that we are calling free on. Calling free on an uninitialized
pointer is undefined behavior (see man 3 free), so we can already infer that
this very simple program is likely to crash. Note that the unused variable is not a
particularly harmful aspect of the program since it will get optimized out by
the compiler anyways. See section Verifying Compiler Optimizations in
Assembly of this article for
details. However, as a good practice you should not declare variables you
don’t use, and gcc warnings can alert you to such artifacts in your code.
While we can easily use gcc to emit warnings, this requires that we compile
our program. We can halt gcc after the compilation proper stage so that the
assembler doesn’t convert the assembly to object code; however, as you scale
the number of files up, you end up wasting computational resources to emit
assembly code when that is not actually a necessary stage to detect many
potential errors in the code.
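In fact, gcc can be stopped even earlier than the assembly-emission stage: the -fsyntax-only flag runs only the frontend (preprocessing, parsing, semantic checks) and produces no output file at all. A minimal sketch, recreating a trimmed copy of the example program:

```shell
# Write a trimmed copy of the example program, then run only gcc's
# frontend checks: -fsyntax-only produces no assembly or object file.
cat > main.c <<'EOF'
#include <stdlib.h>
void foo(int x) { int buf[10]; buf[x] = 0; }
int main(void) { int *p; free(p); return 0; }
EOF
gcc -fsyntax-only -Wall -Wextra main.c  # emits warnings only; exit status 0
```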
Instead, we can use a tool like cppcheck that, per its design
docs,
performs only a modified subset of the phases a compiler would normally
perform, such as preprocessing, tokenizing, and abstract syntax tree
construction. A major difference is that cppcheck will neither assemble nor link code.
Moreover, cppcheck can catch things that a compiler would not. In general,
this is the approach of static code analysis: don’t compile or run your code,
but use parsing techniques to catch errors or produce warnings instead. For
example, if we call cppcheck on the same program,
$ cppcheck main.c
Checking main.c ...
main.c:6:6: error: Array 'buf[10]' accessed at index 10, which is out of bounds. [arrayIndexOutOfBounds]
buf[x] = 0;
^
main.c:12:7: note: Calling function 'foo', 1st argument '10' value is 10
foo(10);
^
main.c:6:6: note: Array index out of bounds
buf[x] = 0;
^
main.c:11:8: error: Uninitialized variable: p [uninitvar]
free(p);
^
we get the crucial error that our array access is out of bounds! We did not see
this error when using gcc. You can also see that the uninitialized variable
is caught as well, so in fact there is overlap in what a compiler can catch and
what a static code analyzer can catch.
You can hopefully understand now why static code analysis in particular is so valuable and why you would want to include it in a CI/CD pipeline.
GitLab is a web-based DevOps platform that combines (non-exhaustively) source
code management, CI/CD, issue tracking, and package/container registries
into a single application. One of GitLab’s strengths is how tightly integrated
CI/CD is with the project (aka repository) itself. That is, pipelines are
defined declaratively via a .gitlab-ci.yml file in a project’s root
directory. The feature of GitLab that executes CI/CD pipelines is known as
GitLab runners (analogous to GitHub Actions).
GitLab runners are daemons (i.e., background processes, see man daemon) that
run on either public (aka shared) servers or are configured to run on on-prem
infrastructure. The runner execution
flow is highly
intuitive, and involves essentially two steps: (1) registering the runner with
a GitLab instance, and (2) the runner polling that instance for jobs and
executing them.
Only step (1) requires active administration since step (2) is handled by the runner itself.
Per the GitLab runner documentation, shared runners are not available to self-managed instances. The following table summarizes the different GitLab tiers:
| Tier | Shared Runner Available | On-prem | Cloud |
|---|---|---|---|
| GitLab.com | yes | no | yes |
| Self-managed | no | yes | no |
| GitLab Dedicated | yes | no | yes |
What does this mean? It means that for a self-managed GitLab instance, the administrators maintain full control of their data and infrastructure. So, if you want to use CI/CD, you need to set up any GitLab runners yourself.
GitLab runners can be set up on virtually any hardware with networking capabilities; however, since I was primarily interested in a lightweight runner, I chose to deploy a Raspberry Pi 3B+.
The Raspberry Pi 3B+ checked several boxes for me:
The last point, full control, is particularly important because I am not an administrator of any of the servers we have at work. By getting my own hardware, I can experiment with the GitLab runner setup without bothering IT.
For CI workloads like static analysis, linting, and lightweight builds, the Pi is perfectly sufficient. Where it could struggle is with multiple developers submitting CI jobs concurrently, since I have disabled concurrent jobs by default. However, it can certainly tolerate use by fewer than five developers, which is the current number of users of the self-managed GitLab instance. More than this would require a runner on more powerful hardware, but that is a next step for later.
I knew that I would be heavily resource constrained, so I opted for a Linux operating system tailored for minimal resource consumption on Raspberry Pis. DietPi fits exactly this requirement and was a breeze to install. Since DietPi is Debian-based, it was also extremely comfortable to administer. But most importantly, DietPi consumed about half of the RAM that the headless Raspberry Pi OS does. Obviously, I want to dedicate as much RAM as possible to the GitLab runner and its operations, so minimizing extraneous RAM utilization was a high priority.
To initially set up the Pi, I flashed DietPi OS onto a microSD, plugged that into the Pi, connected an HDMI cable to a monitor, connected a keyboard, and plugged in the power supply. I then went through the setup stage to get language settings and networking working. Straightforward. Boom, done.
However, as an aside, I will voluntarily and shamefully admit that while selecting my regulatory domain for WiFi (see Wiki: WLAN Channels and Linux wireless regulatory domains), I accidentally selected GE. I live in Germany. This was a silly oversight since GE corresponds to Georgia, and when I was scanning for WiFi connections on my local network nothing was coming up. After switching to DE, of course, my problem was solved. In short: know basic geography and nomenclature to avoid spending half an hour troubleshooting network connectivity issues.
With the Pi initially set up, I can now focus on bolstering recoverability. SD cards fail. Filesystems get corrupted. Eventually, you will brick something, so might as well put measures in place to facilitate easy recovery.
The Raspberry Pi supports USB boot via a one-time programmable (OTP) bit that enables booting from USB mass storage devices. According to recent documentation, this should already be enabled on newer Pi models.
However, when I checked:
$ vcgencmd get_config program_usb_boot_mode
on the Pi, I got
program_usb_boot_mode=0
This means USB boot was, in fact, not enabled.
To fix this, I had to explicitly update /boot/firmware/config.txt to enable USB
boot, then reboot once to permanently set the OTP bit. This update is very
simple: just append the following line to the end of /boot/firmware/config.txt:
program_usb_boot_mode=1
This is a one-way operation, but one that significantly improves recovery options if the system becomes unbootable. Why? Because now I can drop any Linux distribution on a USB stick, then plug it into the Pi to triage any issues without having to pull out the Pi microSD to make changes to it with another computer.
MicroSD cards have limited write endurance (see here). While it is perhaps a bit of overengineering, I wanted to move the root filesystem, where most write operations would occur, to a more durable SSD and leave the largely read-only bootloader on the microSD.
Moving the root filesystem was a multistep process. Ultimately, the goal was two-fold:
1. Update /boot/firmware/cmdline.txt to point the root filesystem to the SSD.
2. Update /etc/fstab on the SSD to reflect the new filesystem information.

First, I plugged the microSD and SSD into a Linux machine. I then needed to create mount points for the filesystems on those storage devices. By listing block devices,
$ lsblk
NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda             8:0    0 476.9G  0 disk
|_sda1          8:1    0   512M  0 part
|_sda2          8:2    0   2.3G  0 part
mmcblk0       179:0    0  58.3G  0 disk
|_mmcblk0p1   179:1    0   128M  0 part
|_mmcblk0p2   179:2    0  58.2G  0 part
nvme0n1       259:0    0 476.9G  0 disk
|_nvme0n1p1   259:1    0   260M  0 part /boot/efi
|_nvme0n1p2   259:2    0 358.4G  0 part /
|_...
I could look to see the disk and partition sizes. The nvme0n1 disk is the
SSD on my laptop, the sda disk is the SSD I will use for the Pi, and
mmcblk0 is the microSD. I then created mount points for the SSD and microSD:
$ sudo mkdir -p /media/pi/ssd
$ sudo mkdir -p /media/pi/msd/boot
$ sudo mkdir -p /media/pi/msd/root
$ sudo mount /dev/sda2 /media/pi/ssd/
$ sudo mount /dev/mmcblk0p1 /media/pi/msd/boot
$ sudo mount /dev/mmcblk0p2 /media/pi/msd/root
I could then copy the contents of the microSD root directory to its new location on the SSD:
$ sudo cp -a /media/pi/msd/root/* /media/pi/ssd/
Now I need to set the kernel boot parameter that tells the kernel where the root filesystem is. I need the partition unique ID (PARTUUID) of the root partition on the SSD, so I listed this with
$ lsblk -o +PARTUUID
and then set the parameter in /boot/firmware/cmdline.txt on the microSD boot
partition I mounted earlier:
root=PARTUUID=<PARTUUID of the SSD root partition>
I also updated /etc/fstab on the SSD so that the root filesystem mounts from
the SSD and not the microSD. Lastly, I booted the Pi to verify everything
works.
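For concreteness, after these edits the two files looked roughly like the sketch below. The PARTUUID values and the other cmdline.txt parameters are placeholders; use what lsblk -o +PARTUUID actually reports:

```
# /boot/firmware/cmdline.txt on the microSD boot partition (single line):
console=tty1 root=PARTUUID=aaaaaaaa-02 rootfstype=ext4 rootwait

# /etc/fstab on the SSD -- root now mounts from the SSD, boot from the microSD:
PARTUUID=aaaaaaaa-02  /               ext4  defaults,noatime  0  1
PARTUUID=bbbbbbbb-01  /boot/firmware  vfat  defaults          0  2
```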
If you looked closely when I listed the block devices, you may have noticed
that while the disk size of the SSD is 476.9G, the size of the root partition
is only 2.3G. The reason there are two partitions in the first place is because
at one point I tried just booting the Pi directly with the SSD, so this required
that I flash DietPi onto the SSD. Flashing the SSD created a boot and root
partition that were obviously significantly smaller than the total disk space
available. I could have used a tool like gparted; however, DietPi already
comes with a simple tool to expand the partition. So after booting up the Pi,
I simply called
$ dietpi-drive_manager
and voila, expanding the root partition to maximize usage of the SSD is done.
At this point, the system boots normally and the heavy I/O happens on the SSD. Exactly what we want for CI workloads.
With the system stable, the next step was validating whether the Pi is actually useful as a CI runner.
Installing the GitLab runner was surprisingly easy once the Pi was stable. You
can follow the GitLab runner installation
docs, but in short all I needed to do
was curl the setup script and then execute it. I chose to use the Docker
executor for my GitLab
runner, meaning CI jobs are run in Docker containers managed by the runner. To
that end, I built a custom Docker image that comes preloaded with several
static analysis tools:
This image is used directly by GitLab CI jobs running on the Pi. If you’re interested, the image is available at Docker Hub: jfdev001/iap-gitlab-ci. Naturally, the target architecture is ARM since that’s the Pi’s architecture; however, one could easily adapt the Dockerfile for x86-64 architecture. See section Static Code Analysis Dockerfile for the full Dockerfile at the time of this writing.
It’s also worth noting that during the GitLab runner setup on the Pi, I provided some advanced configuration to enforce only a subset of valid Docker images that could be used, a RAM limit for Docker jobs, and a limit on the total memory of jobs when SWAP space is also used. These restrictions were necessary to keep the intention of the runner for lightweight jobs codified and not just socially contracted.
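These restrictions live in the runner's config.toml under the Docker executor section; a sketch with illustrative (not my exact) values:

```toml
# /etc/gitlab-runner/config.toml (sketch; limits are illustrative)
concurrent = 1                  # one CI job at a time on the Pi

[[runners]]
  name     = "raspberry-pi-3bplus"
  executor = "docker"
  [runners.docker]
    image          = "jfdev001/iap-gitlab-ci:static-analysis-base"
    allowed_images = ["jfdev001/iap-gitlab-ci:*"]  # whitelist of usable images
    memory         = "512m"                        # RAM limit per job
    memory_swap    = "768m"                        # RAM + swap limit per job
```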
Naturally, I wanted to test how my new GitLab runner could handle static
analysis of a representative codebase. To that end, I submitted a job to
perform static analysis of the ICON weather/climate
model codebase using fortitude. For
context, in the src directory of this codebase, there are roughly 500k lines
of code spread over 900 files. I forked the ICON codebase to our self-managed
GitLab instance, and provided a job file like the one below:
# @file .gitlab-ci.yml
# @brief Static analysis of ICON's 500k+ line, 900 file Fortran codebase
icon-src-static-analysis:
  stage: test
  image: jfdev001/iap-gitlab-ci:static-analysis-base
  tags:
    - lightweight
  script: |
    # Only emit warnings for syntax errors detected, omitting specific details
    # NOTE: || true included to prevent the job from automatically failing
    find codebases/icon/ -iname "*.f90" -type f \
      -exec fortitude check {} + > /dev/null || true
Fortitude completes a full analysis of the codebase in about 10 seconds. For a low-power, fanless ARM board, that’s more than acceptable, and perfectly adequate for pre-merge CI checks.
Running a GitLab runner on a Raspberry Pi 3B+ turned out to be both practical and educational. In summary, I:
The result is a small, quiet, low-power CI runner that does exactly what it needs to do, and does it reliably. If you’re running a self-managed GitLab instance and want full control over your CI infrastructure without breaking the bank, a Raspberry Pi is a surprisingly solid option, as long as you respect its hardware quirks.
Extra details for anything that I felt didn’t belong in the main article are included below.
The following section is a deliberate detour into compiler behavior. While tangential to the CI setup itself, it helps clarify what truly matters when interpreting static analysis and compiler warnings.
In the example program in section The Importance of Static Code
Analysis, you can verify pretty
easily that the unused variable buf is harmless by comparing the assembly
generated for main.c when optimization flags are turned off and on. In the
following paragraphs, I walk you through the assembly code for main.c,
the lowest-level code that is still semantically meaningful.
With gcc, by default the optimization flags are turned off, and we can
explicitly halt the compiler after the compilation proper stage to emit
assembly code, thus preventing the assembler stage (which converts the assembly
code to object files that are not readable with text editors). If we call
$ gcc -O0 -S -fverbose-asm -o main_debug.s main.c
and look at the foo section of main_debug.s, we see
; @file main_debug.s
; @brief Snippet of foo section of debug mode gcc assembly output.
; @note For highlighting reasons, I use ';' to denote comments. However,
; this is GNU assembly, so '#' would be the syntactically correct choice.
foo:
.LFB6:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $64, %rsp
movl %edi, -52(%rbp) ; x, x
; main.c:4: void foo(int x) {
movq %fs:40, %rax ; MEM[(<address-space-1> long unsigned int *)40B], tmp84
movq %rax, -8(%rbp) ; tmp84, D.2541
xorl %eax, %eax ; tmp84
; main.c:6: buf[x] = 0;
movl -52(%rbp), %eax ; x, tmp83
cltq
movl $0, -48(%rbp,%rax,4) ;, buf[x_2(D)]
; main.c:7: }
nop
movq -8(%rbp), %rax ; D.2541, tmp85
subq %fs:40, %rax ; MEM[(<address-space-1> long unsigned int *)40B], tmp85
je .L2
call __stack_chk_fail@PLT
.L2:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
If you have never seen assembly before, the syntax above can be pretty
overwhelming. Fortunately, with the -fverbose-asm flag, gcc annotated the
important parts so that we can more easily decipher what’s going on. Even if
you were unable to read assembly, you could infer from the comment
`; main.c:6: buf[x] = 0;` that somewhere before it the memory for the buf
array gets allocated. Why? Because that comment indicates that the assembly
following it corresponds to buf[x] = 0. Therefore, at some earlier point in
the assembly code, there must be some allocation of memory for the array.
Okay, but if we don’t want to play an inference game, how do we know how much
memory for buf actually gets allocated? From main.c we see that buf is
an array where each element is of type int and the array can store 10
elements. While the exact number of bytes for C data types can vary from one
hardware architecture to the next, on x86-64 the System V ABI fixes int at 4
bytes. Now we know that buf requires 40 bytes of memory.
If we know how much memory is needed for buf, where in the assembly can
we look to confirm that memory is allocated? To understand that, you must
understand that when a program executes, every function call requires memory
to be reserved for the function’s local variables, arguments, and return
value. The region of memory that gets reserved for a particular function call
is called a stack frame. The stack frame is part of the program (aka process)
stack itself, which has a total size that is much smaller than the total
amount of memory that can be allocated on the heap (see RLIMIT_STACK and
RLIMIT_DATA in man getrlimit). The pushq %rbp and movq %rsp, %rbp
instructions tell us that we are setting up a stack frame for the foo
function, but the key giveaway that we are allocating memory for buf on the
stack is subq $64, %rsp. The $64 says that we need 64 bytes of memory for this
stack frame!
So there you have it: we allocate 64 bytes, which is more than the array
actually needs. Why are we allocating more memory than we need? Well, we know
that memory must also be allocated for the local variable x, which is
another 4 bytes. Great, so we need at least 44 bytes of memory for our stack
frame, and yet we allocate 64 bytes. It turns out that by default we also
allocate 8 bytes for a stack
canary
as a way of securing our code (see -fstack-protector in man gcc, and note
that since Ubuntu 14.10, -fstack-protector-strong is enabled by default). Now
we need at least 52 bytes of memory for our stack frame! Yet we are still
allocating 64 bytes…
It turns out that this is the result of the System V Application Binary
Interface (ABI), which mandates that for
x86-64 architectures the stack is 16-byte aligned, so therefore additional
bytes are used to pad the stack frame until the memory allocated for the stack
frame is a multiple of 16. 64 is a multiple of 16, therefore, while we only
need 52 bytes for the foo function call, we must allocate 64 bytes to
comply with the ABI. The ABI ensures many things, but with respect to
alignment, it guarantees that a program can run without modification on
multiple operating systems and that function calls occur in a standardized way.
That was a huge digression in order to say “hey, we know that when
optimizations are turned off, memory for buf in the stack frame is
allocated.” But what about when we turn on optimizations? Oh no… we’re
going to have to look at more scary assembly to see how the compiler treats
unused variables, which is the warning we saw that gcc emitted way back in
section The Importance of Static Code
Analysis. If we call,
$ gcc -O3 -S -fverbose-asm -o main_opt_verbose.s main.c
and look at the foo section of main_opt_verbose.s, we see
; @file main_opt_verbose.s
; @brief Snippet of foo section of optimized gcc assembly output.
foo:
.LFB16:
.cfi_startproc
endbr64
# main.c:7: }
ret
.cfi_endproc
This assembly is much easier to interpret than the unoptimized assembly.
You can clearly see that no pushq %rbp operation occurs and no allocation
of memory for the stack frame with a subq $LITERAL, %rsp operation occurs.
The compiler intelligently optimized the stack frame allocation out because
no work occurs in the function call.
All of this just to say that the warning emitted in section The Importance of
Static Code Analysis regarding the
unused variable buf can actually be safely ignored since it has no effect on
an optimized object file anyway.
Here is the Dockerfile used to build the static analysis oriented image for the Docker executor on the GitLab runner.
# @file Dockerfile
# @brief Static analysis base image for AARCH64 (e.g., Raspberry Pi 3B+)
FROM python:3.12-slim
# Install system dependencies and static analysis tools
ARG SHELLCHECK_ARCH=aarch64
# https://github.com/koalaman/shellcheck/releases/tag/v0.11.0
ARG SHELLCHECK_VERSION=0.11.0
ARG CHECKMAKE_ARCH=arm64
# https://github.com/checkmake/checkmake/releases/tag/v0.3.2
ARG CHECKMAKE_VERSION=0.3.2
# https://github.com/astral-sh/ruff/releases/tag/0.14.14
ARG RUFF_VERSION=0.14.14
# https://github.com/PlasmaFAIR/fortitude/releases/tag/v0.7.5
ARG FORTITUDE_VERSION=0.7.5
# https://github.com/cheshirekow/cmake_format/releases/tag/v0.6.13
ARG CMAKELANG_VERSION=0.6.13
RUN \
# install system dependencies
apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
curl \
tar \
unzip && \
apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false && \
rm -rf /var/lib/apt/lists/* && \
# install shellcheck
curl -L https://github.com/koalaman/shellcheck/releases/download/v${SHELLCHECK_VERSION}/shellcheck-v${SHELLCHECK_VERSION}.linux.${SHELLCHECK_ARCH}.tar.gz -o shellcheck.tar.gz && \
tar -xf shellcheck.tar.gz && \
rm shellcheck.tar.gz && \
mv shellcheck-v${SHELLCHECK_VERSION}/shellcheck /usr/bin/ && \
rm -rf shellcheck-v${SHELLCHECK_VERSION} && \
# install checkmake
curl -L https://github.com/checkmake/checkmake/releases/download/v${CHECKMAKE_VERSION}/checkmake-v${CHECKMAKE_VERSION}.linux.${CHECKMAKE_ARCH} -o /usr/local/bin/checkmake && \
chmod +x /usr/local/bin/checkmake && \
# install python based tools
pip install --no-cache-dir \
ruff==${RUFF_VERSION} \
fortitude-lint==${FORTITUDE_VERSION} \
cmakelang==${CMAKELANG_VERSION}
CMD ["bash"]
I am constantly using ssh to remotely login to different compute clusters
(e.g., DKRZ’s Levante, ECMWF’s Atos, local clusters at work, etc.). I find it
much more convenient to use a terminal-based text editor like
neovim coupled with the standard terminal
multiplexer tmux in order to switch easily
between several programs in one terminal. However, I quickly encountered an
issue where my neovim color scheme was not displayed when using it within tmux.
Fortunately, there’s an easy fix to this issue that I cover in this article. I
first show the problem and the expected behavior, diagnose the error, and then
correct it accordingly. I document the solution here, including its use with
legacy systems.
When launching neovim with tokyo-moon colorscheme in a regular terminal (I’m
using Lubuntu 22.04 LTS with default terminal qterminal), you can clearly
see the expected color scheme:
After launching tmux and inspecting the same file, however, I was only seeing the default neovim color scheme:
Luckily, neovim makes diagnosing problems very easy. Launching neovim and
calling :checkhealth revealed the following:
tmux ~
- ✅ OK escape-time: 0
- ✅ OK focus-events: on
- $TERM: screen-256color
- ⚠️ WARNING True color support could not be detected. |'termguicolors'| won't work properly.
- ADVICE:
- Add the following to your tmux configuration file, replacing XXX by the value of $TERM outside of tmux:
set-option -a terminal-features 'XXX:RGB'
- For older tmux versions use this instead:
set-option -a terminal-overrides 'XXX:Tc'
Look at that! We are given an exact warning regarding colors in the terminal.
More importantly, we are given advice on how to fix it. I then updated
~/.tmux.conf accordingly:
echo "set-option -a terminal-features 'xterm-256color:RGB'" >> ~/.tmux.conf
The above solution worked locally, but when I ssh’ed into Levante, I found I
had the same issue still. When working with compute clusters, it’s not uncommon
to encounter very old versions of software. This is precisely the case for the
version of tmux on DKRZ’s Levante, which is the dated version 2.7.
Needless to say, the latest features of tmux aren’t supported there.
Fortunately, :checkhealth also provides the solution in such cases! On
Levante, the following did the trick:
echo "set-option -a terminal-overrides 'xterm-256color:Tc'" >> ~/.tmux.conf
You may also be thinking: “Why didn’t you just install a more recent version of tmux on Levante and use that?” Well, I did, in fact, try this. For whatever reason, however, the tmux server would periodically crash. Rather than trying to figure out the cause, I lazily downgraded back to the original tmux version that I knew was stable. Sometimes it’s better to just use what works: fundamentally, I am using tmux as a productivity tool, and I did not want to invest time figuring out the root of the periodic crashes of the latest version.
Maybe in a future article I will document the debugging journey, but for now, I will conclude :)
I often download and read scientific journal articles on my Kobo eReader.
Unfortunately, the metadata in the PDFs for such journal articles may lack title
information. This means that the searchable title that appears for the article
once loaded onto the eReader is something very ugly like an uninterpretable
string of digits. An easy way to fix this is using the open-source tool pdftk.
In this short article, I show how to use pdftk for this purpose.
A look at the man pages for pdftk via man pdftk tells us everything we
need to know:
A deeper look at the man pages reveals that we need to dump the PDF metadata, modify it in a text editor, and then update the original PDF with the new metadata.
As an example, take Tsunami Propagation from a Finite Source (Carrier 2005). Once you’ve downloaded it, you can inspect the metadata contents with:
pdftk ~/Downloads/cmes.2005.010.113-2.pdf dump_data_utf8 output ~/Downloads/cmes.2005.utf
Opening ~/Downloads/cmes.2005.utf, you’ll see a number of fields, one of which
looks like the following:
InfoBegin
InfoKey: Title
InfoValue: main.dvi
If you change the InfoValue here in ~/Downloads/cmes.2005.utf to your
desired name, e.g.,
InfoValue: Carrier 2005: Tsunami Propagation from a Finite Source
and then call
pdftk ~/Downloads/cmes.2005.010.113-2.pdf update_info_utf8 ~/Downloads/cmes.2005.utf output ~/Downloads/cmes_updated.pdf
then your PDF now has the correct metadata and is ready for reading on an eReader!
Here is a permalink to a script that automates this update process for you. Here is a link to the same script on the main branch (in case I ever update it). The code for the script is pasted below in case you don’t want to check out my GitHub :)
#!/usr/bin/bash
trap 'echo "Error at ${LINENO}: ${BASH_COMMAND}" >&2' ERR
usage=$(cat << EOF
usage: $0 [-h] [-i] PDF NEW_TITLE
Update a PDF's metadata with a NEW_TITLE and then output the updated PDF to a
default path 'updated_\${PDF}'. Pass -i to update the PDF inplace.
positional arguments:
PDF Path to PDF to update.
NEW_TITLE The new metadata title for the PDF.
optional arguments:
-h Print a help message and exit.
-i Update the PDF inplace, thus removing any intermediate outputs.
EOF
)
inplace=0
while getopts "hi" flag
do
case "${flag}" in
h) echo "${usage}"
exit 0
;;
i) inplace=1;;
*) echo "ERROR: unrecognized flags"
echo "try $0 -h"
exit 1
;;
esac
done
# shift past options to get positionals
shift $(($OPTIND - 1))
n_positional_args=2
PDF=$1
NEW_TITLE=$2
# Script exits immediately on error and errors on uninitialized variables
# NOTE: when using grep, grep exits 1 on failure to pattern match and therefore
# '|| true' is needed when saving grep output to use in test expressions
set -eu
# Check if all positional arguments were provided
if [ $# -lt $n_positional_args ]
then
echo "ERROR: Missing positional arguments!"
echo "Try '$0 -h' for more information."
exit 1
fi
# Verify positionals
if [[ ! -f "${PDF}" ]]
then
echo "ERROR: PDF does not exist"
echo "Got ${PDF}"
exit 1
fi
# update title metadata
cur_metadata=$(pdftk "${PDF}" dump_data_utf8)
has_title_metadata=$(grep -n "InfoKey: Title" <<< "${cur_metadata}" || true)
if [[ -n "${has_title_metadata}" ]]
then
# remove existing metadata
lineno_infokey_title=$(grep --only-matching --perl-regexp \
"\d*(?=:)" <<< "${has_title_metadata}")
lineno_infobegin=$((lineno_infokey_title - 1))
lineno_infovalue=$((lineno_infokey_title + 1))
cur_metadata="$(printf '%s\n' "${cur_metadata}" | \
sed "${lineno_infobegin}d;${lineno_infokey_title}d;${lineno_infovalue}d")"
fi
# Define the new metadata
title_metadata=$(cat << EOF
InfoBegin
InfoKey: Title
InfoValue: ${NEW_TITLE}
EOF
)
title_metadata="${title_metadata}"$'\n'
new_metadata="${title_metadata}${cur_metadata}"
output_pdf_path="$(dirname "$(realpath "${PDF}")")/updated_$(basename "${PDF}")"
tmp_new_metadata="$(mktemp)"
echo "${new_metadata}" > "${tmp_new_metadata}"
# update the pdf
pdftk "${PDF}" update_info_utf8 "${tmp_new_metadata}" output "${output_pdf_path}"
if [[ ${inplace} -eq 1 ]]
then
rm "${PDF}"
mv "${output_pdf_path}" "${PDF}"
output_pdf_path="${PDF}"
fi
echo "output written to ${output_pdf_path}"
rm "${tmp_new_metadata}"
In this article, some extra mathematical details related to the solution of the steady-state reaction-diffusion equation using PETSc are discussed. First, the simple nonlinear governing equation of interest is shown. Then, Newton’s method is presented at the partial differential equation (PDE) level for generality rather than being presented at the algebraic level. Subsequently, the spatial discretization via the finite difference method is shown for completeness. Finally, a commented PETSc implementation of the discretized reaction-diffusion equation is shown to concretely illustrate how the mathematical notation maps to code.
The one dimensional, time evolving heat equation can be generalized with reaction \(R(u)\) and source \(f(\cdot)\) processes in addition to diffusion (i.e., \(\frac{\partial^2 u}{\partial x^2}\) in 1D or \(\nabla^2 u\) in N dimensions) for a temperature (or substance more generally) with concentration \(u(x, t)\), and is given by
\[\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + R(u) + f. \\ \tag{1}\]In this post, we focus on the time independent (aka steady-state) form of equation (1)
\[0 = \frac{\partial^2 u}{\partial x^2} + R(u) + f. \\ \tag{2}\]We define also \(R(u) = -\rho \sqrt u\) and \(f(x) = 0\) as well as the Dirichlet boundary conditions \(u(0) = \alpha\) and \(u(1) = \beta\) for our domain \(\Omega = [0, 1] \).
Clearly, the square root function in \(R(u)\) introduces a nonlinear term into equation (2). Since identifying whether an equation has nonlinear terms determines whether we can use a linear or a nonlinear solver, it is useful to recall that an operator \(L\) is said to be linear if, for every pair of functions \(f\) and \(g\) and any scalar \(\theta\),
\[L(f + g) = L(f) + L(g),\]and
\[L(\theta f) = \theta L(f).\]In this case, the square root term is obviously nonlinear. Given this nonlinearity, a nonlinear solver must be used to solve equation (2).
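A one-line counterexample makes the nonlinearity of the reaction term explicit: taking \(f = g = 1\),

\[\sqrt{f + g} = \sqrt{2} \neq 2 = \sqrt{f} + \sqrt{g},\]

so the first condition for linearity fails.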
We present here Newton’s method for solving nonlinear equations at the PDE level to make it as generally applicable as possible. Newton’s method is often presented at the algebraic level since its presentation is paired with the discretization method of the PDEs (e.g., finite difference methods, finite element methods, etc.). Since there are a large number of techniques to discretize PDEs, it is more general to formulate Newton’s method in the context of the strong (“original”) form of the PDEs. You can then select and apply the appropriate steps for your desired discretization (e.g., for the finite element method, this involves deriving a weak formulation and substituting the finite element solution using specified basis functions) followed by an application of your solver to the discretized PDEs.
Now, Newton’s method is for solving general nonlinear equations of the form
\[F(u) = 0. \\ \tag{3}\]Substituting equation (2) into equation (3) and then expanding terms
\[F(u) = \frac{\partial^2 u}{\partial x^2} - \rho \sqrt u = 0, \\ \tag{4}\]we now have our governing equation in the desired general form. A cornerstone of computational science is converting problems into forms that we know how to solve effectively. We know and have numerous methods to solve linear systems of equations. Newton’s method iteratively approximates solutions to \(u\) in equation (4) by linearizing (i.e., converting a nonlinear problem to a linear one) around an iterate, solving the new linear system for a step in the direction of the solution, and then adding that step to the current iterate. That step in the direction of the solution is called a “Newton step” or perturbation. Mathematically, this solution process can be written as
\[\begin{aligned} F(u^k + \delta u) &= F(u^k) + F'(u^k)(\delta u), && \text{(5.1)}\\ F'(u^k)(\delta u) &= -F(u^k), && \text{(5.2)} \\ u^{k+1} &= u^{k} + \delta u. && \text{(5.3)} \end{aligned}\]The “linearization” is equation (5.1): it is a truncated Taylor series that is a linear function of \(\delta u\) that approximates \(F\) near \(u^k\) at iteration \(k\), therefore we have replaced our nonlinear function with a linear one. We seek the zero of this function, that is \(F(u^k + \delta u) = 0\) and rearrange to get equation (5.2). The term \(F’(u)(\delta u)\) may look a little suspicious, but consider for a moment what \(F\) actually is. If \(F\) were a scalar function, naturally \(F’\) is the ordinary derivative. Similarly, if \(F\) were a vector function, \(F’\) would be the Jacobian written as \(J_F(u^k)\). However, \(F\) is neither a scalar nor a vector function, but rather it is an operator: a map of one function space to another function space. That is, \(F\) maps the function space of \(u\) to another function space. Thus, we must define the derivative of an operator. This is given by the Gateaux derivative
\[dF(u; \delta u) = \lim_{\epsilon \rightarrow 0} \frac{F(u + \epsilon \delta u) - F(u)}{\epsilon} = \left . \frac{d}{d\epsilon} F(u + \epsilon \delta u) \right\vert_{\epsilon = 0}.\]It turns out that by definition, \(F’\) in \(dF(u; \delta u) = F’(u)(\delta u)\) is just the continuous linear operator represented by the Jacobian matrix (see Rall 1971). Note, there are technically some subtleties here for the mathematically rigorous reader—I am no formal mathematician myself, of course. The Gateaux derivative, by definition, is not necessarily linear or continuous, but if we put the linear and continuous constraints on the Gateaux derivative, then we define the Frechet derivative. In the context of PDEs for which one is likely to perform numerical simulation, it is reasonable to assume the linearity and continuity requirement. Experts will still call the derivative for operators the Gateaux derivative (see Bangerth: Lecture 31.55) even when the term Frechet derivative might be more appropriate. I am only mentioning this distinction since you might encounter the Gateaux or Frechet derivative in your own study of numerical PDEs. Thus, to solve nonlinear PDE problems via Newton’s method, we must do one of the following:
(a) Provide the Jacobian explicitly after deriving and discretizing the Gateaux derivative.
(b) Allow a library (or write it yourself if you so desire) to compute the Jacobian by relying on automatic differentiation (e.g., NonlinearSolver.jl: Solvers) and/or sparsity detection (e.g., NonlinearSolve.jl: Declaring a Sparse Jacobian with Automatic Sparsity Detection).
(c) Allow a library to symbolically compute the Jacobian (e.g., Mathematica: Unconstrained Optimization – Methods for Solving Nonlinear Equations, FENICS: Solving the Nonlinear Variational Problem Directly, etc.).
It is worth noting that there are Jacobian-free methods (see Knoll 2004), but these come with their own trade-offs.
To be as explicit as possible, we now compute the Gateaux derivative of \(F\)
\[\begin{aligned} \left . \frac{d}{d\epsilon} F(u + \epsilon \delta u) \right\vert_{\epsilon = 0} &= \left . \frac{d}{d\epsilon}\left[ \frac{\partial^2}{\partial x^2}(u + \epsilon \delta u) - \rho \sqrt{(u + \epsilon \delta u)} \right] \right\vert_{\epsilon = 0} \\ &= \left . \frac{d}{d \epsilon}\left[\frac{\partial^2}{\partial x^2}u\right] + \frac{d}{d \epsilon}\left[\frac{\partial^2}{\partial x^2}\epsilon \delta u \right] - \frac{d}{d \epsilon} \left[ \rho \sqrt{(u + \epsilon \delta u)} \right] \right\vert_{\epsilon = 0} \\ &= \left . \frac{\partial^2}{\partial x^2} \delta u - \frac{d}{d \epsilon} \left[ \rho \sqrt{(u + \epsilon \delta u)} \right] \right\vert_{\epsilon = 0} \\ &= \left . \frac{\partial^2}{\partial x^2} \delta u - \frac{\rho}{2}(u + \epsilon \delta u)^{-\frac{1}{2}} \delta u \right\vert_{\epsilon = 0} \\ &= \frac{\partial^2}{\partial x^2} \delta u - \frac{\rho}{2 \sqrt u} \delta u. \end{aligned} \\ \tag{6}\]Once we have the Gateaux derivative, equation (5.3) is simply an update of the solution space in the direction “pointing” toward the zero of our nonlinear function.
We now have the continuous forms of the reaction-diffusion equation suitable for solution via Newton’s method. To provide implementations, these continuous forms have to be discretized as shown in the next sections. We therefore define a structured, 1D grid with spacing \(h\) between grid points and indices \(i \in \{0, \ldots, 5\}\) where the full domain is \(x \in [0, 1]\). Recall also that \(u(0) = \alpha\) and \(u(1) = \beta\). Note that the grid spacing can be calculated by \(h = \frac{1}{m_x - 1}\) where \(m_x\) is the number of grid points. Naturally, we subtract 1 as there are \(m_x - 1\) “spacings” between the \(1^{st}\) and \(m_x^{th}\) grid point. The grid is depicted and annotated below for reference.
h
-------
| |
V V
u: α ? ? ? ? β
▆-----▆-----▆-----▆-----▆-----▆
x: 0 0.2 0.4 0.6 0.8 1
i: 0 1 2 3 4 5
Here, we discretize equation (4). The focus of this article is not on the derivation of the finite difference method, so, if you’re unfamiliar with it, a clear explanation and derivation can be found at CFD University’s: The Finite Difference Method. Ignore the bits in that article related to the Navier-Stokes equations, of course. In any case, the discrete form of the second derivative operator using a centered finite difference is given by
\[\begin{aligned} F(u) \approx F(u_i) = F_i &= \frac{u_{i-1} - 2 u_i + u_{i+1}}{h^2} + R(u_i) \\ &= \frac{u_{i-1} - 2 u_i + u_{i+1}}{h^2} - \rho \sqrt{u_i}, \end{aligned} \tag{7}\]where \(u_i\) is the solution at grid point \(i\). That’s it!
Since equation (6) defines a linear operator on \(\delta u\), we only care about discretizing the coefficients of \(\delta u\). That is, we want the discrete form of the continuous linear operator
\[F'(u) = \frac{\partial^2}{\partial x^2} + \frac{dR}{du} = \frac{\partial^2}{\partial x^2} - \frac{\rho}{2 \sqrt u}, \tag{8}\]since \(F'(u)\) acts on \(\delta u\) as represented by \(F'(u)(\delta u)\) in equation (5.2). If we apply a centered finite difference scheme to discretize the second derivative operator on \(\delta u\), we have
\[\frac{\partial^2}{\partial x^2} \delta u \approx \frac{\delta u_{i-1} - 2 \delta u_i + \delta u_{i+1}}{h^2}. \\ \tag{9}\]For the derivative of the reaction function given by \(\frac{dR}{du}\), \(u\) in \(\frac{\rho}{2 \sqrt u}\) corresponds simply to \(u_i^{k}\) since this is the discrete solution at the grid point \(i\) at Newton step \(k\).
If you’re paying attention, you’ll notice that the discrete form of the second derivative operator—a linear operator—in equation (9) acting on \(\delta u\) is the same as the discrete form of the second derivative operator when it acts on \(u\) in equation (7). This may seem like a rather silly or obvious thing to note; however, it’s surprisingly important for the efficient solution of nonlinear equations since the discrete form of the linear operator is completely independent of the values of \(u_i^k\)—which change at every iteration as defined by equation (5.3). This means that a performance-conscious programmer can not only initialize the sparsity pattern of the Jacobian but can also cache those values of the discrete linear operator in the Jacobian—meaning you do not have to assign the coefficients of the discrete linear operator to the Jacobian more than once. This doesn’t apply to the PETSc implementation of the present article, but it’s information that’s certainly worth knowing. You can find an example of this sort of caching in a Ferrite.jl example that solves the Navier-Stokes equations, if you like.
With the discrete operations described above, we now frame the action of the Jacobian on the perturbation \(\delta u\) in the context of a Newton iteration \(k\). The Jacobian can therefore be written—emphasizing the coefficients of the Jacobian by wrapping them in brackets—in the form
\[\begin{aligned} F'(u^k)(\delta u) &= J_F(u^k) \delta u \\ &= \left[ \frac{1}{h^2} \right] \delta u_{i-1} + \left[ \frac{-2}{h^2} \right] \delta u_i + \left[ \frac{1}{h^2} \right] \delta u_{i+1} + \left[\frac{-\rho}{2 \sqrt{u^k_i}}\right]\delta u_i \\ &= \left[ \frac{1}{h^2} \right] \delta u_{i-1} + \left[ \frac{-2}{h^2} - \frac{\rho}{2 \sqrt{u^k_i}} \right] \delta u_i + \left[ \frac{1}{h^2} \right] \delta u_{i+1}. \end{aligned} \\ \tag{10}\]With the components of the Jacobian explicitly specified, we can now propose functions for implementing the solution of the reaction-diffusion equation using PETSc.
PETSc provides and supports a suite of (non)linear solvers, preconditioners,
data types for linear algebra, massive scalability through automatic support for
distributed and shared memory parallelism, and much more.
The user need only provide a comparatively simple—at least for the
problem we consider in this article—set of functions that specify their
particular problem. Per figure (1), the user needs to implement
FormFunctionLocal—which is simply \(F(u)\)—as well as
FormJacobianLocal—which is simply \(J_F(u^k)\). First the implementation
will be shown, then relevant parts of the code will be mapped to their
mathematical equivalent.
In this section we implement the necessary PETSc functions for the solution of the reaction-diffusion equation. These implementations correspond directly to the maths introduced in the present article.
The PETSc function below for \(F(u)\) is adapted from p4pdes/reaction.c:92-115.
// Compute F(u) for reaction-diffusion equation
// Reference: Equation (7)
PetscErrorCode FormFunctionLocal(DMDALocalInfo *info, PetscReal *u,
PetscReal *FF, AppCtx *user) {
PetscInt i;
PetscReal h = 1.0 / (info->mx-1), x, R;
// iterate through grid indices
for (i=info->xs; i<info->xs+info->xm; i++) {
// point on left boundary
if (i == 0) {
FF[i] = u[i] - user->alpha;
// point on right boundary
} else if (i == info->mx-1) {
FF[i] = u[i] - user->beta;
// interior
} else {
// stencil includes left boundary
if (i == 1) {
FF[i] = user->alpha - 2.0 * u[i] + u[i+1];
// stencil includes right boundary
} else if (i == info->mx-2) {
FF[i] = u[i-1] - 2.0 * u[i] + user->beta;
// stencil is purely in interior
} else {
FF[i] = u[i-1] - 2.0 * u[i] + u[i+1];
}
R = -user->rho * PetscSqrtReal(u[i]);
FF[i] = FF[i] / (h*h) + R;
}
}
return 0;
}
In line 6, the grid spacing h is computed as expected. In line 8, we iterate
through the locally owned part of a distributed vector: the loop starts at the
local start index info->xs and runs up to (but excluding) info->xs plus the
number of locally owned points info->xm.
There are, of course, two special cases that occur while iterating over the grid points: the left boundary (i.e., \(u(0) \equiv u_0 = \alpha\)) and the right boundary (i.e., \(u(1) \equiv u_{m_x - 1} = \beta\)) conditions. From equation (4) and equation (7), we know that \(F_i = 0\), and if we recognize the fact that the boundary conditions demand that \(u_0 = \alpha \), then we can infer that \(F_0 = u_0 - \alpha = 0\), which is exactly the residual form we need. Line 11 follows from this reasoning. The same logic applies to line 14 but for the right boundary condition.
Lines 16 to 29 handle the interior points. Line 19 is the discrete second
derivative for the second—index i=1—grid point in the 1D grid
where on the left boundary \(u_{i-1} = u_{0} = \alpha\). Line 22 is analogous
but for the right boundary where \(u_{i+1} = u_{m_x - 1} = \beta \). Line
27 computes the reaction function using its definition. Finally,
line 28 divides the numerator of equation (7), computed in one of the
branches of lines 18 to 26, by the square of the grid spacing to complete
the discrete second derivative of \(u\); the reaction
function evaluated in line 27 is then added, also per equation (7).
Next, the PETSc function for the Jacobian is adapted from p4pdes/reaction.c:117-114.
// Compute J_F(u^k) for reaction-diffusion equation
// Reference: Equation (10)
PetscErrorCode FormJacobianLocal(DMDALocalInfo *info, PetscReal *u,
Mat J, Mat P, AppCtx *user) {
PetscInt i, col[3];
PetscReal h = 1.0 / (info->mx-1), dRdu, v[3];
for (i=info->xs; i<info->xs+info->xm; i++) {
// boundary conditions
if ((i == 0) || (i == info->mx-1)) {
v[0] = 1.0;
PetscCall(MatSetValues(P,1,&i,1,&i,v,INSERT_VALUES));
// interior
} else {
col[0] = i;
dRdu = -user->rho / (2.0 * PetscSqrtReal(u[i]));
v[0] = -2.0 / (h*h) + dRdu;
col[1] = i-1;
v[1] = (i > 1) ? 1.0 / (h*h) : 0.0;
col[2] = i+1;
v[2] = (i < info->mx-2) ? 1.0 / (h*h) : 0.0;
PetscCall(MatSetValues(P,1,&i,3,col,v,INSERT_VALUES));
}
}
PetscCall(MatAssemblyBegin(P,MAT_FINAL_ASSEMBLY));
PetscCall(MatAssemblyEnd(P,MAT_FINAL_ASSEMBLY));
if (J != P) {
PetscCall(MatAssemblyBegin(J,MAT_FINAL_ASSEMBLY));
PetscCall(MatAssemblyEnd(J,MAT_FINAL_ASSEMBLY));
}
return 0;
}
Lines 6 and 7 are essentially the same as in FormFunctionLocal.
Once again, we handle the boundary conditions in lines 9 to 13. But why assign a value of
1.0 to the first element of a 3-element vector v and then use v to set only
a single element of the matrix P?
As in FormFunctionLocal, at the boundaries we have
\[F_0 = u_0 - \alpha,\]and
\[F_{m_x - 1} = u_{m_x - 1} - \beta,\]in the discrete form. Since we’re concerned with the continuous notation, as this is the form you are most likely to encounter, recall that the boundary condition demands \(u(0) = \alpha\). Given that \(F(u) = 0\) from equation (4), it follows that
\[F(u(0)) = u(0) - \alpha = 0.\]In contrast to equation (4), there is obviously no second derivative operator or reaction operator acting on \(u\). What this means is that for \(u(0)\) and \(u(1)\), we can simply compute the derivative with respect to the unknown at the given boundary points such that
\[F'(u(0)) = \frac{\partial F(u(0))}{\partial u(0)} = \frac{\partial}{\partial u(0)} u(0) - \frac{\partial}{\partial u(0)} \alpha = 1,\]and similarly
\[F'(u(1)) = \frac{\partial F(u(1))}{\partial u(1)} = \frac{\partial }{\partial u(1)} u(1) - \frac{\partial }{\partial u(1)} \beta = 1.\]Returning to the discrete form, if \(F(u) \approx F(u_i) \) from equation (7), it follows
that \(F'(u) \approx F'(u_i)\). Therefore, when i = 0, we have
\(F'(u_0) = 1\) and when i = mx-1, we also have \(F'(u_{m_x - 1}) = 1\).
This means we set only a single element of P corresponding to index i,
that is P[i][i] = 1.0, since no further elements need to be set to satisfy the
discretization of the boundary conditions. Note that &i in line 11 acts as a
1-element array; MatSetValues expects arrays because one usually assigns
values corresponding to arrays of row and column indices. Since we are only
assigning one value, a 1-element array suffices. See PETSc: MatSetValues
and note that the array parameters idxm[] and idxn[] decay to pointers per
the C language.
This concludes the discussion of enforcing boundary conditions in the Jacobian.
In lines 13 to 25 we handle the interior points. Lines 14 to 16 define the
coefficient of \(\delta u_i\) in equation (10). Note that in the
FormJacobianLocal function, u corresponds to the vector of values of the
solution u at Newton iteration k. Therefore, u[i] is equivalent to
\(u_i^k\). Lines 18 and 19 define the coefficient of \(\delta u_{i-1}\)
using equation (10) only if the neighboring point i-1 does not lie on the
boundary; that is, they enforce the PDE on the interior points and eliminate
any coupling with boundary points. Lines 21 and 22 apply the same principle
to the coefficient of \(\delta u_{i+1}\).
Lastly, lines 29 to 32 assemble J when it is not the same matrix as P
(see SNES: Jacobian Evaluation); J is the Jacobian and P is the matrix from
which the preconditioner \(M\) is built. While the Jacobian is usually the
same as the matrix from which the preconditioner is built, in principle you
could set a different matrix P that has more desirable properties (e.g.,
better conditioning) than J. In practice, P is used in conjunction with the
specified preconditioner (e.g., Jacobi, ILU, etc.) \(M\) to solve a left
preconditioned (the default for Krylov solvers in PETSc) system of equations
arising from equations (5.2) and (10) such that
\[M^{-1} J_F(u^k) \delta u = -M^{-1} F(u^k).\]
To make the preconditioner discussion concrete, if we were to tell PETSc to use a Jacobi preconditioner, then \(M = \text{diag}(J) = \text{diag}(P)\).
With this, we conclude the discussion of the relationship between the mathematics in the first part of the article and the adapted implementation for \(F(u)\) and \(J_F(u^k)\).
The adapted implementation in the previous section is based on maths that we derived in the present article with the intention of making the derivation clearer as well as generally applicable to nonlinear PDE problems. However, the basis of the implementation is Bueler 2021, and for completeness we briefly explain how the original implementation maps to slightly different rearrangements of the maths covered throughout the rest of the article.
In Bueler 2021, the governing equation is formulated as
\[F(u) = -\frac{\partial^2 u}{\partial x^2} + \rho \sqrt u,\]and the corresponding implemented discretization by finite differences is
\[F(u) \approx F(u_i) = F_i = -u_{i-1} + 2 u_i - u_{i+1} + h^2 (\rho \sqrt{u_i} - f_i),\]though \(f = 0\) in this problem, so that term may be ignored. In the original code, there is also a flag for including the derivative of the reaction function in the Jacobian. If the derivative of the reaction function is excluded from the Jacobian, this simplifies the diagonal of the Jacobian, and the resulting matrix \(K \approx J\) is an approximation of the Jacobian with similar spectral characteristics. For simplicity, we omit this flag in the adapted implementation.
Other than these small changes, the adapted implementation differs very little from the original implementation.
In this article, we bridge the gap between continuous nonlinear PDE theory and practical implementation. We cover Newton’s method, Gateaux derivatives, and an application of the finite difference discretization, while providing concrete PETSc code to solve the nonlinear steady-state reaction-diffusion problem. With mathematical and implementation details explicitly treated, hopefully the reader can now more confidently reason about low-level numerical codes that they read and write. Future articles will show similar step-by-step mathematical and implementation details for systems of PDEs as well as time-dependent PDEs.
[1] : Bueler, E. Chapter 4: Nonlinear equations by Newton’s Method in PETSc for Partial Differential Equations: Numerical Solutions in C and Python. SIAM 2021.
[2] : Logg, A. et al. Chapter 1.2: Nonlinear problems in Automated Solution of Differential Equations by the Finite Element Method – The FEniCS Book. Springer 2012.
[3] : Heath, M. Chapter 5: Nonlinear equations in Scientific Computing: An Introductory Survey, 2ed. SIAM 2018.
[4] : Bangerth, W. MATH 676 Lecture 31.55 from Colorado State University: Nonlinear problems, part 2 – Newton’s method for PDEs.
[5] : Dokken, J. The FEniCS Tutorial: Custom Newton Solver.