Deploying SwiftOS on a Hetzner Cloud ARM VM
Handoff note: how SwiftOS was brought up as the actual OS of a Hetzner Cloud
ARM (CAX) server — not as a QEMU guest under Linux. Written so a fresh session
can understand what was done, reproduce it, and continue. Done 2026-06-16; all
changes are on main (ff7c7d2, c7ff85e, b7fa6d2, ac0484e).
TL;DR — current state
SwiftOS boots on the bare Hetzner VM and is reachable over SSH:
ssh root@138.199.222.99 /bin/id → principal=1(root) (persistent, repeatable)
ssh root@138.199.222.99 /bin/netinfo → ipv4 138.199.222.99/32 gw 172.31.1.1 (dhcp) ready yes
- Target server: swiftos.tech / 138.199.222.99, Hetzner Cloud aarch64 VM, fully wipeable.
- sshd is bounded-exec only (runs /bin/<tool>: /bin/id, /bin/echo, /bin/netinfo, …). No interactive shell yet.
- Login key: the operator's ~/.ssh/id_ed25519 (its public half, id_ed25519.pub, is staged into the image as /etc/ssh/authorized_keys).
- SwiftOS host key for known_hosts (derived from the staged seed): ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEaFnBj5Su/fhH1HqYa7Ri/8HECFadpoJWBv55FW5weO
- Regression gate: make hetzner-deploy-test.
The target hardware (probed from the live VM)
Hetzner Cloud ARM VM = QEMU/KVM with EDK2 (UEFI) firmware. Key facts:
| Aspect | Hetzner VM | What stock SwiftOS assumed |
|---|---|---|
| Firmware/config | ACPI (EDK2; RSDP/MADT/MCFG/SPCR/GTDT) | device-tree only (-M virt,acpi=off, FDT) |
| virtio transport | virtio-PCI over PCIe ECAM @ 0x40_1000_0000 | virtio-mmio only |
| PCI topology | only the GPU is on bus 0 (00:01.0); NIC 01:00.0, RNG 05:00.0, SCSI 06:00.0 are behind PCIe root ports (00:02.x) on non-zero buses | everything on bus 0 |
| Boot disk | virtio-scsi (1af4:1048), GPT + ESP | virtio-blk on mmio |
| Interrupts | GICv3 (GICD 0x0800_0000, GICR 0x080a_0000) | GICv2 (MMIO GICC) |
| Console | noVNC graphical only (virtio-gpu); no serial tab, no serial input | PL011 serial (matches, but invisible on Hetzner) |
| Net (real) | DHCP gives a /32 IPv4 + off-link gateway 172.31.1.1 | tested only against QEMU slirp /24 |
| RAM / CPU | base 0x4000_0000, ~4 GiB, 2 vCPU | base 0x4000_0000 (matches) |
Apart from the RAM base, the only thing that matched out of the box was the PL011 base address, and that match is moot: the Hetzner Cloud console is graphical (virtio-gpu), so the serial log is invisible there anyway.
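The "Net (real)" row is easy to underestimate: with a /32 lease the gateway can never be inside the interface's own subnet, so a stack that was only exercised against a /24 may find no usable next hop. The sketch below shows one common way to handle this, an on-link host route to the gateway followed by the default route. It is an illustration only: `plan_routes` is a made-up helper, and nothing here describes how the SwiftOS network stack actually implements it. The slirp values (10.0.2.15/24, gateway 10.0.2.2) are QEMU's user-mode networking defaults.

```python
import ipaddress


def plan_routes(address_cidr: str, gateway: str) -> list[str]:
    """Return the routes to install, in order, for a DHCP lease."""
    iface = ipaddress.ip_interface(address_cidr)
    gw = ipaddress.ip_address(gateway)
    routes = []
    if gw not in iface.network:
        # Off-link gateway: it needs its own on-link host route first,
        # otherwise the default route has no reachable next hop.
        routes.append(f"{gw}/32 on-link")
    routes.append(f"default via {gw}")
    return routes


# Hetzner lease from the table: /32 address, gateway outside it.
print(plan_routes("138.199.222.99/32", "172.31.1.1"))
# QEMU slirp defaults: the gateway is already inside the /24.
print(plan_routes("10.0.2.15/24", "10.0.2.2"))
```

The first call yields two routes (host route, then default); the second yields only the default route, which is why a slirp-only test never exercises the off-link case.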
How SwiftOS got here: the H-series
H0–H5 (pre-this-session, already on main before today) made the QEMU model of
this VM boot end-to-end:
- H0: make hetzner-run, a local QEMU profile reproducing the VM (AAVMF, GICv3, virtio-pci, virtio-scsi). Found: no FDT in ACPI mode; GICv2 panic; etc.
- H1: GICv3 driver (dual-path v2/v3): kernel/drivers/gic.swift.
- H2: PCIe ECAM enumeration + a VirtioTransport (mmio|pci): kernel/drivers/pci.swift, kernel/drivers/virtio_transport.swift.
- H3: root FS from a RAM base image loaded by the EFI loader from the ESP (no kernel disk driver): boot/efi/loader.c, kernel/fs/ramdisk.swift.
- H4: virtio-net over PCI + bounded SSH.
- H5: platform config parsed from ACPI (RSDP→MADT/MCFG/SPCR/GTDT), not FDT: kernel/arch/aarch64/acpi.swift.
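H2 and H5 meet at the MCFG table: ACPI supplies the ECAM base, and PCI enumeration turns a bus/device/function triple into a config-space address under that base. The Python sketch below illustrates both steps using the standard ACPI table layout (36-byte header, 8 reserved bytes, 16-byte allocation entries). It is not the code in acpi.swift or pci.swift; every function name is hypothetical, and the table it parses is synthetic (the OEM strings and the end bus 0xFF are invented, only the base 0x40_1000_0000 comes from this note).

```python
import struct
from typing import NamedTuple

SDT_HEADER = struct.Struct("<4sIBB6s8sI4sI")  # 36-byte ACPI table header
MCFG_ENTRY = struct.Struct("<QHBB4x")         # 16-byte ECAM allocation entry
MCFG_RESERVED = 8                             # reserved bytes after the header


class EcamWindow(NamedTuple):
    base: int
    segment: int
    start_bus: int
    end_bus: int


def parse_mcfg(table: bytes) -> list[EcamWindow]:
    """Validate an MCFG table and return its ECAM windows."""
    signature, length, *_ = SDT_HEADER.unpack_from(table, 0)
    if signature != b"MCFG" or length > len(table):
        raise ValueError("not a complete MCFG table")
    if sum(table[:length]) & 0xFF != 0:
        raise ValueError("bad ACPI checksum")
    windows = []
    offset = SDT_HEADER.size + MCFG_RESERVED
    while offset + MCFG_ENTRY.size <= length:
        windows.append(EcamWindow(*MCFG_ENTRY.unpack_from(table, offset)))
        offset += MCFG_ENTRY.size
    return windows


def ecam_address(window: EcamWindow, bus: int, device: int, function: int,
                 register: int = 0) -> int:
    """Address of a config register: each function gets a 4 KiB page."""
    if not window.start_bus <= bus <= window.end_bus:
        raise ValueError("bus outside the ECAM window")
    if not (0 <= device < 32 and 0 <= function < 8 and 0 <= register < 4096):
        raise ValueError("device/function/register out of range")
    return (window.base + ((bus - window.start_bus) << 20)
            + (device << 15) + (function << 12) + register)


def build_demo_mcfg(base: int, start_bus: int, end_bus: int) -> bytes:
    """Synthetic single-entry MCFG for the demo (not firmware data)."""
    body = bytes(MCFG_RESERVED) + MCFG_ENTRY.pack(base, 0, start_bus, end_bus)
    header = SDT_HEADER.pack(b"MCFG", SDT_HEADER.size + len(body), 1, 0,
                             b"DEMO  ", b"DEMOTBL ", 1, b"DEMO", 1)
    raw = bytearray(header + body)
    raw[9] = (-sum(raw)) & 0xFF  # checksum byte: whole table sums to 0 mod 256
    return bytes(raw)


window = parse_mcfg(build_demo_mcfg(0x40_1000_0000, 0, 0xFF))[0]
print(hex(ecam_address(window, 0, 1, 0)))  # GPU at 00:01.0
print(hex(ecam_address(window, 1, 0, 0)))  # NIC at 01:00.0
```

Each bus occupies 1 MiB of ECAM space, so the NIC's config space sits a full megabyte above anything a bus-0-only scan would read.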
H6 (this session) = actually putting it on the metal. It surfaced four real
bugs that only appear on the true device model — these are the important part.
The four bugs found on the real VM (the gold)
The lesson running through all four: the local tests put every virtio device on bus 0 and drove the serial console, so they never reproduced the real topology or the headless/console reality. Each fix came with a test that does reproduce it.
- Per-address-space PCI MMU map (ff7c7d2, kernel/mm/vm.swift). H2 mapped the PCIe ECAM and the 64-bit virtio MMIO window only in the kernel boot page tables (vm_early.c). addressSpaceCreate() rebuilds a fresh page table per process but replicated only the low device/RAM blocks. So once /bin/sshd (an EL0 process) was the current TTBR0, the kernel touching the NIC BAR in the 64-bit window during TX/RX faulted at translation level 0 (ESR 0x96000004, FAR 0x80_0000_5000) → panic on the first incoming connection. DHCP survived only because it runs early on the boot tables. Fix: mirror both device windows (l1[256] ECAM, l0[1] → 0x80_0000_0000) into every address space.
- virtio-gpu scanout console + headless boot (c7ff85e, kernel/drivers/virtio_gpu.swift, kernel/drivers/fb.swift, kernel/main.swift). Two problems, both because Hetzner's only console is noVNC (virtio-gpu) with no serial:
  - SwiftOS is serial-first; fb.swift writes a linear framebuffer (works on ramfb/GOP where RAM is scanned directly) but virtio-gpu needs explicit TRANSFER_TO_HOST_2D + RESOURCE_FLUSH, so the screen stayed "Display output is not active". Wrote a polled virtio-gpu 2D driver over the H2 PCI transport; fb.swift flushes per line + on a throttled timer.
  - The serial boot path runs /bin/ttydemo (a blocking serial read) BEFORE swos-init starts sshd. With no serial input, ttydemo blocked forever and sshd never started. Fix: a virtio-gpu-only console (server with a display, no keyboard) takes the "boot straight into swos-init/services" path, skipping the serial demos.
- PCIe bridge recursion (b7fa6d2, kernel/drivers/pci.swift): the actual SSH blocker. virtioPciFindDevice scanned only bus 0. On the real VM only the GPU is on bus 0; the NIC/RNG/SCSI are behind PCIe root ports on non-zero buses. So the NIC was never found → boot log: net: no virtio-net device attached → sshd: socket failed. (The GPU console worked precisely because it is on bus 0.) Fix: enumerate recursively. For each PCI-to-PCI bridge (header type 1 / root port), read its secondary bus and descend, scanning all 8 functions of multifunction devices (the root ports are functions of device 2). UEFI firmware already programmed the bridge bus numbers and BAR windows; we just follow them.
- Persistent (supervised) sshd: a config, not a code change. The default /etc/swos/services token sshd is single-shot: swos-init forks it once, it serves one session, exits, and is not restarted (then swos-init execs console-login). SwiftOS's TCP sends no RST on a closed port, so the second SSH connection just times out. Fix: build with the sshd-supervised token → swos-init enters its waitpid restart loop, restarting sshd after each session (and skipping console-login, which is useless without a keyboard). See fixtures/swos/services-supervised.
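The PCIe bridge recursion fix is easiest to see against a toy model of the topology. The Python sketch below simulates config space as a dictionary and compares a bus-0-only scan with a recursive one. It is not virtioPciFindDevice: all names are hypothetical, the pairing of root-port functions to secondary buses is invented, the root-port ID is a placeholder, and the virtio IDs other than 1af4:1048 follow the virtio rule of 0x1040 plus the device type rather than anything probed from the VM.

```python
from typing import Dict, Iterator, Optional, Tuple

BDF = Tuple[int, int, int]  # (bus, device, function)


def endpoint(vendor: int, device: int) -> dict:
    return {"vendor": vendor, "device": device, "header_type": 0x00}


def root_port(secondary_bus: int) -> dict:
    # Header type 1 = PCI-to-PCI bridge; bit 7 marks a multifunction device.
    return {"vendor": 0x1B36, "device": 0x000C, "header_type": 0x81,
            "secondary_bus": secondary_bus}


TOPOLOGY: Dict[BDF, dict] = {
    (0, 1, 0): endpoint(0x1AF4, 0x1050),  # virtio-gpu, the only bus-0 device
    (0, 2, 0): root_port(1),
    (0, 2, 4): root_port(5),
    (0, 2, 5): root_port(6),
    (1, 0, 0): endpoint(0x1AF4, 0x1041),  # virtio-net
    (5, 0, 0): endpoint(0x1AF4, 0x1044),  # virtio-rng
    (6, 0, 0): endpoint(0x1AF4, 0x1048),  # virtio-scsi
}


def scan_bus(cfg: Dict[BDF, dict], bus: int,
             recurse: bool = True) -> Iterator[Tuple[BDF, int, int]]:
    """Yield (bdf, vendor, device) for every function reachable from `bus`."""
    for dev in range(32):
        fn0 = cfg.get((bus, dev, 0))
        if fn0 is None:
            continue  # real hardware would read vendor ID 0xFFFF here
        functions = 8 if fn0["header_type"] & 0x80 else 1
        for fn in range(functions):
            func = cfg.get((bus, dev, fn))
            if func is None:
                continue
            yield (bus, dev, fn), func["vendor"], func["device"]
            if recurse and (func["header_type"] & 0x7F) == 1:
                # Follow the bus number that firmware already programmed.
                yield from scan_bus(cfg, func["secondary_bus"], recurse)


def find_device(cfg: Dict[BDF, dict], vendor: int, device: int,
                recurse: bool = True) -> Optional[BDF]:
    for bdf, found_vendor, found_device in scan_bus(cfg, 0, recurse):
        if (found_vendor, found_device) == (vendor, device):
            return bdf
    return None


print(find_device(TOPOLOGY, 0x1AF4, 0x1041, recurse=False))  # bus 0 only
print(find_device(TOPOLOGY, 0x1AF4, 0x1041))                 # with descent
```

The bus-0-only scan still finds the GPU, which matches the symptom above: a working console alongside a missing NIC. A production scan would also want a guard against misprogrammed bridges that loop back to an already visited bus; this sketch trusts the firmware's numbering.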
Still open / known
- Hetzner's noVNC keyboard is USB-HID (QEMU XHCI); SwiftOS only has virtio-input, so you cannot type at the console. Not needed (SSH is the access path); a USB stack is a large future driver.
- The regression test's full /bin/id round-trip is best-effort: under loaded TCG with the NIC behind a bridge, the encrypted-auth read sometimes fails (sshd: encrypted auth read failed) for the hc5 fixture key. This does NOT happen on real KVM (the full /bin/id works there); the likely cause is RX timing / IRQ routing for a NIC behind a bridge under TCG. Worth investigating.
How to build the deploy image
Built in a worktree that has a full build/ (toolchain: qemu-system-aarch64,
aarch64-elf-gcc, host swiftc, lld-link, AAVMF at
/opt/homebrew/share/qemu/edk2-aarch64-code.fd). Stage the SSH material once:
D=build/hetzner-deploy; mkdir -p $D
cp ~/.ssh/id_ed25519.pub $D/authorized_keys # the login key
build/sshkey seed --out $D/ssh_host_ed25519_seed # stable SwiftOS host key
build/sshkey known-host --host 138.199.222.99 --seed-file $D/ssh_host_ed25519_seed # -> known_hosts line
printf 'sshd-supervised\n' > $D/services # persistent sshd
Then build the kernel and the GPT disk. make build MUST be run explicitly
before make disk — make disk does NOT rebuild the kernel from a changed
kernel/ source (it restages a cached kernel.bin):
make build
rm -f build/swift-os.img build/esp/EFI/swift-os/kernel*.bin build/esp/EFI/swift-os/kernel-boot
make disk \
SSHD_HOST_SEED_FILE=$PWD/build/hetzner-deploy/ssh_host_ed25519_seed \
SSHD_AUTHORIZED_KEYS_FILE=$PWD/build/hetzner-deploy/authorized_keys \
SWOS_SERVICES_FILE=$PWD/build/hetzner-deploy/services
# → build/swift-os.img (≈96 MiB GPT: ESP + BOOTAA64.EFI + kernelA/B + base.img)
Note: base.img is signed with this worktree's own image-signing seed, so it
must be paired with this worktree's kernel.elf. Do not mix images across
worktrees.
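Because make disk restages a cached kernel.bin, a forgotten make build silently ships an old kernel. A pre-flight check along the following lines could catch that by comparing modification times. This is only a sketch: `newer_sources` is a hypothetical helper, not part of the repository's Makefile, and the location of the cached kernel.bin is not stated in this note, so it is passed in as a parameter.

```python
from pathlib import Path


def newer_sources(artifact: Path, source_dir: Path) -> list[Path]:
    """Return source files modified after `artifact`.

    If the artifact does not exist yet, every source file counts as newer.
    """
    sources = sorted(p for p in Path(source_dir).rglob("*") if p.is_file())
    artifact = Path(artifact)
    if not artifact.exists():
        return sources
    built_at = artifact.stat().st_mtime
    return [p for p in sources if p.stat().st_mtime > built_at]
```

Calling it with the cached kernel binary and the kernel/ directory before make disk, and refusing to continue when the result is non-empty, turns the "MUST be run explicitly" rule into a mechanical check. Modification times are a heuristic and do not replace the explicit make build.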
Verify locally before flashing
build/hetzner-deploy/verify-topo.sh (and verify-gpu.sh, verify.sh) boot the
exact GPT image under the real Hetzner topology — devices behind
pcie-root-port, GPU on bus 0, GICv3, -smp 2, serial OUTPUT-ONLY — and run
OpenSSH against it. Always boot a copy of the image: a RW boot mutates the
A/B boot-attempt counters in the ESP. The committed gate is make hetzner-deploy-test.
How to flash a server (live-dd)
The VM has no out-of-band disk access for us; we flash from a running Linux on the same box. Cycle:
- Operator restores Linux access via the Hetzner Cloud Console: reinstall a fresh Ubuntu with their id_ed25519 selected, or enable the Rescue System with their SSH key selected (rescue without a key only offers a generated root password). This is required every time, because once SwiftOS is on the disk the box is no longer Linux-reachable (and its console keyboard doesn't work: USB-HID).
- Copy + checksum:
    scp -P 22 build/swift-os.img root@138.199.222.99:/root/swift-os.img
    ssh -p 22 root@138.199.222.99 sha256sum /root/swift-os.img   # compare to local shasum -a 256
- Flash + reboot (overwrites /dev/sda live, then forces an immediate reboot before the running system writes more; the disk is fully replaced so old-FS corruption is irrelevant):
    ssh -p 22 -o ServerAliveInterval=5 -o ServerAliveCountMax=2 root@138.199.222.99 \
      'dd if=/root/swift-os.img of=/dev/sda bs=4M conv=fsync status=none && sync && \
       echo DD-OK-REBOOTING && (reboot -nf || echo b > /proc/sysrq-trigger)'
  The SSH connection drops on reboot (expected).
- Watch the Hetzner noVNC console: the SwiftOS boot log now renders there (virtio-gpu). Look for net-dhcp OK: lease <ip>.
- Verify SSH (the SwiftOS host key differs from the Linux one, so pin it):
    KH=$(mktemp); echo '138.199.222.99 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEaFnBj5Su/fhH1HqYa7Ri/8HECFadpoJWBv55FW5weO' >$KH
    ssh -p 22 -o StrictHostKeyChecking=yes -o UserKnownHostsFile=$KH -o IdentitiesOnly=yes \
      -i ~/.ssh/id_ed25519 root@138.199.222.99 /bin/id   # → principal=1(root)
    ssh ... root@138.199.222.99 /bin/netinfo   # confirm DHCP IP/gateway
Safety: the server is wipeable, but dd is irreversible and the only recovery
channel is the Hetzner Console — confirm the operator is at the console before
flashing.
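Since dd is irreversible, the checksum comparison in the copy step is worth making mechanical rather than eyeballing two hex strings. The Python sketch below hashes the local image in chunks and compares it with the text printed by the remote sha256sum. The helper names are hypothetical and this is not an existing script in the repository; how the remote output is captured (for example from the ssh command in the copy step) is left to the caller.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so a ~96 MiB image stays cheap."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remote_matches(local_image: str, sha256sum_output: str) -> bool:
    """True if the first field of `sha256sum` output equals the local digest."""
    fields = sha256sum_output.split()
    return bool(fields) and fields[0].lower() == sha256_file(local_image)
```

A wrapper would run the flash step only when `remote_matches("build/swift-os.img", output)` returns True, so a truncated or corrupted upload never reaches /dev/sda.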
Debugging tips
- The Hetzner noVNC console is the diagnostic channel now (it shows the virtio-gpu boot log). Prefer reading it (screenshot) over hammering the server with SSH probe loops.
- Boot log markers worth knowing: M9 OK: hardware discovered from ACPI; M2 GIC: GICv3; net-dhcp OK: lease …; swos-init: supervision active; sshd: listening on 22; sshd: authorized key matched. Bad signs: net: no virtio-net device attached; sshd: socket failed; panic: unexpected EL1 exception … FAR_EL1=0x80_… (a PCI MMIO map gap).
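When a panic line does show up, the ESR and FAR values say a lot on their own. The Python sketch below decodes the pair reported for the per-address-space map bug (ESR 0x96000004, FAR 0x80_0000_5000) using the architectural ESR field layout, and splits an address into page-table indices. It assumes a 4 KiB granule with four table levels, which is consistent with the l0[1] and l1[256] entries named in the fix but is not confirmed elsewhere in this note. The function names are hypothetical, and only translation faults are decoded.

```python
EC_NAMES = {
    0x24: "data abort from a lower EL",
    0x25: "data abort at the current EL",
}


def decode_esr(esr: int) -> dict:
    """Decode the exception class and, for data aborts, the fault status."""
    ec = (esr >> 26) & 0x3F   # ESR bits [31:26]
    dfsc = esr & 0x3F         # ISS bits [5:0] for data aborts
    if ec in EC_NAMES and 0x04 <= dfsc <= 0x07:
        fault = f"translation fault, level {dfsc & 0x3}"
    else:
        fault = "not decoded by this sketch"
    return {"ec": ec, "class": EC_NAMES.get(ec, "other"),
            "dfsc": dfsc, "fault": fault}


def table_indices(va: int) -> dict:
    """Page-table indices for a 4 KiB granule, 9 bits per level (assumed)."""
    return {
        "l0": (va >> 39) & 0x1FF,
        "l1": (va >> 30) & 0x1FF,
        "l2": (va >> 21) & 0x1FF,
        "l3": (va >> 12) & 0x1FF,
    }


print(decode_esr(0x96000004))
print(table_indices(0x80_0000_5000))   # faulting NIC BAR access
print(table_indices(0x40_1000_0000))   # PCIe ECAM base
```

The decoded values line up with the fix: the faulting address selects l0 index 1 (the 64-bit MMIO window at 0x80_0000_0000), and the ECAM base selects l1 index 256 under l0 index 0. A FAR beginning with 0x80_ in a fresh panic therefore points at the same class of missing mapping.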
Follow-ups
- Interactive SSH shell / PTY (sshd is bounded-exec only).
- USB-HID keyboard so the noVNC console is usable.
- Investigate the RX timing of a NIC behind a bridge under TCG (best-effort in the gate).
- Then: host the website (nginx/Node) on the live server; see the Hetzner section of docs/DEPLOYMENT_GUIDE.md.