I run Debian on all my servers. It’s a great, stable OS and I love it. Proxmox, which I run on my homelab server, is also based on Debian.
However, on my desktop I run Arch Linux. It’s a great distro to tinker with. It comes with a lot of up-to-date packages, and it also has the AUR (Arch User Repository). So for almost any app you can find, there is probably an easy way to install it.
Slllooooowwww…
Lately I noticed that boot times on my system were getting longer, which is strange, because I run some pretty okay hardware.
As it turns out, cold booting this box takes 1min 7.538s, according to my logs.
Luckily, the Arch Wiki offers a nice guide on how to troubleshoot boot performance.
There’s systemd-analyze blame, which shows the time each service takes to start up. I’ve copied the top 10 here, which incidentally are also all the units that take more than 1 second to start.
❯ systemd-analyze blame
20.771s docker.service
3.514s dev-sdb3.device
2.459s systemd-journal-flush.service
1.880s upower.service
1.806s ldconfig.service
1.687s systemd-tmpfiles-setup.service
1.587s containerd.service
1.287s systemd-modules-load.service
1.032s systemd-fsck@dev-disk-by\x2duuid-96EB\x2d4C82.service
1.028s cups.service
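On a longer list it can help to filter the blame output down to the units above a threshold. The following is a minimal sketch, not part of systemd: slow_units is a hypothetical helper name, and it assumes each line looks like the output above (one or more time parts such as 1min, 20.771s or 622ms, followed by the unit name). Other time units, such as hours, are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print units from `systemd-analyze blame` output that
# took at least THRESHOLD seconds (default: 1). Reads blame output on stdin.
slow_units() {
    awk -v threshold="${1:-1}" '
    {
        total = 0
        # The last field is the unit name; every field before it is a time part.
        for (i = 1; i < NF; i++) {
            part = $i
            # Check "ms" before "s", since both end in "s".
            if (part ~ /ms$/)       { sub(/ms$/, "", part);  total += part / 1000 }
            else if (part ~ /min$/) { sub(/min$/, "", part); total += part * 60 }
            else if (part ~ /s$/)   { sub(/s$/, "", part);   total += part }
        }
        if (total >= threshold) printf "%.3f %s\n", total, $NF
    }'
}
```

It would be used as systemd-analyze blame | slow_units 1, which prints the time in seconds followed by the unit name.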
Docker is a clear offender here. dev-sdb3 is also quite slow, it seems.
Another command recommended in the wiki is systemd-analyze critical-chain. This will show you the critical chain to boot your system. Again, docker is clearly a big offender here.
❯ systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @33.660s
└─multi-user.target @33.660s
  └─docker.service @12.888s +20.771s
    └─containerd.service @11.264s +1.587s
      └─network.target @11.236s
        └─wpa_supplicant.service @27.465s +268ms
          └─basic.target @10.366s
            └─dbus-broker.service @9.822s +541ms
              └─dbus.socket @9.793s
                └─sysinit.target @9.759s
                  └─systemd-update-done.service @9.722s +36ms
                    └─systemd-journal-catalog-update.service @9.375s +326ms
                      └─systemd-tmpfiles-setup.service @7.657s +1.687s
                        └─local-fs.target @7.587s
                          └─boot.mount @7.458s +128ms
                            └─systemd-fsck@dev-disk-by\x2duuid-96EB\x2d4C82.service @6.398s +1.032s
                              └─dev-disk-by\x2duuid-96EB\x2d4C82.device @6.397s
But wait, there’s more. systemd-analyze plot > plot.svg will generate an SVG image showing the entire boot process over time. It’s big, but there are some clear red markers that indicate issues.
At the bottom right you’ll find graphical.target, where we want to end up as quickly as possible. And it’s clear docker is in the way.
Open the SVG in a new window to see more detail.
Fixed it!
So, with docker as a clear offender in slowing down the boot process, let’s fix that.
There are two systemd units: docker.service and docker.socket.
docker.service is there to start docker and make sure it is up and running.
docker.socket listens on /run/docker.sock (or /var/run/docker.sock through a symlink) and will start docker.service when needed.
I think you know where this is going. docker.socket is disabled by default and docker.service is enabled. That makes sense, because when you boot your machine you usually want docker up and running as well, especially on servers.
For my desktop, not so much. I use docker, but not always, and I prefer to log in and check my email while docker is starting in the background anyway.
The trick, then, is to stop docker.service from starting automatically and make sure docker.socket is enabled. That takes docker out of the critical chain when booting and starts docker when I’m logged in and ready to use it.
$ sudo systemctl disable docker.service
$ sudo systemctl enable docker.socket
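To double-check the result of these two commands, the enabled state of both units can be queried with systemctl is-enabled. Below is a small sketch; docker_is_socket_activated is a hypothetical helper name, and it only assumes the two unit names used above.

```shell
#!/bin/sh
# Hypothetical helper: report whether docker is set up for socket activation,
# meaning docker.service is disabled and docker.socket is enabled.
docker_is_socket_activated() {
    # `systemctl is-enabled` prints the state and exits non-zero for
    # "disabled", so the exit status is ignored and the printed state is used.
    service_state=$(systemctl is-enabled docker.service 2>/dev/null || true)
    socket_state=$(systemctl is-enabled docker.socket 2>/dev/null || true)
    echo "docker.service: ${service_state:-unknown}"
    echo "docker.socket: ${socket_state:-unknown}"
    [ "$service_state" = "disabled" ] && [ "$socket_state" = "enabled" ]
}
```

The function returns success only when both states match. Note that disabling a unit only changes what happens at the next boot; a docker.service that is already running keeps running until it is stopped or the machine restarts.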
So, what does that look like in systemd-analyze?
❯ systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @3.893s
└─multi-user.target @3.893s
  └─cups.service @3.672s +220ms
    └─nss-user-lookup.target @3.763s

❯ systemd-analyze blame
2.152s systemd-modules-load.service
1.295s dev-sdb3.device
622ms boot.mount
385ms NetworkManager.service
310ms systemd-udev-trigger.service
280ms udisks2.service
258ms systemd-remount-fs.service
220ms cups.service
203ms user@1000.service
189ms systemd-tmpfiles-setup.service
❯ systemctl status docker.socket
● docker.socket - Docker Socket for the API
     Loaded: loaded (/usr/lib/systemd/system/docker.socket; enabled; preset: disabled)
     Active: active (running) since Thu 2024-02-08 10:38:47 CET; 5min ago
   Triggers: ● docker.service
     Listen: /run/docker.sock (Stream)
      Tasks: 0 (limit: 38400)
     Memory: 0B (peak: 516.0K)
        CPU: 1ms
     CGroup: /system.slice/docker.socket
and
❯ systemctl status docker.service
● docker.service - Docker Application Container Engine
     Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; preset: disabled)
     Active: active (running) since Thu 2024-02-08 10:39:33 CET; 5min ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
   Main PID: 2522 (dockerd)
      Tasks: 42
     Memory: 222.1M (peak: 235.7M)
        CPU: 797ms
     CGroup: /system.slice/docker.service
             └─2522 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Was it worth it?
Before:
Startup finished in 14.729s (firmware) + 6.386s (loader) + 12.761s (kernel) + 33.661s (userspace) = 1min 7.538s
graphical.target reached after 33.660s in userspace.
After:
Startup finished in 13.735s (firmware) + 4.074s (loader) + 6.744s (kernel) + 3.893s (userspace) = 28.448s
graphical.target reached after 3.893s in userspace.
Total boot time went down from 1m8s to 28s. I cannot explain the difference in kernel boot time, but the userspace savings are significant.
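To see where the savings come from, the before and after numbers can be compared per phase. This is just a sketch of the arithmetic: boot_savings is a hypothetical helper name and the values are copied by hand from the two Startup finished lines above.

```shell
#!/bin/sh
# Hypothetical helper: per-phase boot time savings, using the before/after
# numbers quoted in this post (seconds per phase).
boot_savings() {
    awk 'BEGIN {
        split("firmware loader kernel userspace", phase, " ")
        split("14.729 6.386 12.761 33.661", before, " ")
        split("13.735 4.074 6.744 3.893", after, " ")
        for (i = 1; i <= 4; i++) {
            saved = before[i] - after[i]
            printf "%-10s %7.3fs saved (%.0f%%)\n", phase[i], saved, 100 * saved / before[i]
            total += saved
        }
        printf "%-10s %7.3fs saved\n", "total", total
    }'
}
```

Calling boot_savings shows that userspace accounts for roughly 29.8 of the roughly 39.1 seconds saved, with the kernel phase contributing about 6 seconds.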
From here I could probably optimize more by compiling a customized kernel or using a different bootloader. Suspend to RAM would be even faster, but that feels like cheating against a hard boot.
Hopefully this gives you some pointers on how to troubleshoot slow boot times on your machine.