NOTES
Engineering log: accepted decisions, hardware constants, exact build/run commands, and tool versions. Newest notes at the top of each section.
USB1 xHCI controller bring-up + device detection (2026-06-17)
- First step toward a real USB keyboard (today's keyboard is virtio-input). USB needs an xHCI controller on PCIe (-device qemu-xhci, same controller class on the Hetzner VM / real hardware). Staged: this milestone is controller bring-up + port detection only, with no enumeration, no transfers, and no HID yet.
- New kernel/drivers/usb_xhci.swift is platform-agnostic: xhciInit(bar0) resets the controller, programs CONFIG.MaxSlotsEn, sets up the minimum DMA structures the spec requires to legally run it (DCBAA + scratchpad if the controller demands it, command ring with a Toggle-Cycle Link TRB, and a one-segment event ring on interrupter 0), starts it (USBCMD.R/S, waits HCH=0), then powers each root-hub port and reports any with CCS set. DMA structures are PMM pages, which are identity-mapped, so PA==VA==the bus address QEMU's PCIe host forwards 1:1, exactly as the virtio-pci drivers rely on. Cache maintenance mirrors virtio_input.swift (no-op under TCG, real on hardware).
- Reused the existing H2 PCIe layer rather than adding a second one: kernel/drivers/pci.swift gained a generic pciFindByClass(class, subclass, progIf) (alongside virtioPciFindDevice) that scans bus 0 + bridges, calls the existing pciAssignBars (which assigns BAR0 in the low 32-bit MMIO window already device-mapped, and enables MEM + Bus Master), and returns BAR0 + INTx pin. No MMU change was needed: the high ECAM (0x40_1000_0000) is already mapped by the H2 work (40-bit IPS) and BAR0 lands in the low window.
- Quirk: QEMU's xHCI capability MMIO region does not honor 16-bit reads: a 16-bit read of HCIVERSION (offset 0x02) returns 0. Read it from the upper half of the CAPLENGTH dword (32-bit access) instead. Byte reads of CAPLENGTH do work.
- usbProbe() runs in kernel_main just before ttyInit; it is a logged no-op when no controller is present, so every existing boot/test path is unaffected.
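A minimal sketch of the HCIVERSION workaround, written in C for illustration (the driver itself is Swift). The names xhci_cap0, xhci_decode_cap0, and mmio_read32 are hypothetical; only the field positions (CAPLENGTH in bits 7:0 and HCIVERSION in bits 31:16 of the dword at offset 0x00) are taken from the xHCI capability register layout.

```c
#include <stdint.h>

/* Fields packed into the first xHCI capability dword (offset 0x00). */
struct xhci_cap0 {
    uint8_t  caplength;   /* bits 7:0, length of the capability block */
    uint16_t hciversion;  /* bits 31:16, BCD interface version        */
};

/* One aligned 32-bit load; the only access width this sketch relies on. */
uint32_t mmio_read32(const volatile void *base, uint32_t off) {
    return *(const volatile uint32_t *)((const volatile uint8_t *)base + off);
}

/* Split the dword instead of issuing a 16-bit read at offset 0x02. */
struct xhci_cap0 xhci_decode_cap0(uint32_t dword) {
    struct xhci_cap0 c;
    c.caplength  = (uint8_t)(dword & 0xffu);
    c.hciversion = (uint16_t)(dword >> 16);
    return c;
}
```

A caller would do xhci_decode_cap0(mmio_read32(bar0, 0)); a dword of 0x01000040 decodes to version 0x0100 with a 0x40-byte capability block.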
Acceptance. make usb-xhci-test boots with -device qemu-xhci -device usb-kbd and asserts the guest logs USB1: xHCI 0x0100 at 0x10008000 slots 64 ports 8, USB1: device connected on xHCI port 5 speed 3 (high-speed), and
USB1 OK: xHCI up, 1 device(s) connected — proving real xHCI registers were
driven, not a kernel literal. Next milestones: enable-slot / address-device USB
enumeration, then the HID boot-protocol interrupt endpoint feeding ttyOnInput.
HC36 PTY job-control SIGINT (2026-06-15)
- Made Ctrl-C work for PTY sessions by giving signals a per-process target. Until now the signal subsystem tracked a single global pendingMask for "the console foreground process"; that cannot address a shell behind a PTY. HC36 moves pending signals into per-process state (pPendingSignals[maxProc] in kernel/user/process.swift, reset on slot alloc / fork / exec / thread / reap) and reworks kernel/signal/signal.swift around it: signalRaise(sig) now targets the running process (still correct for the UART-IRQ console path, where the foreground reader is current), and new signalRaiseSlot(slot, sig) marks a signal pending for an arbitrary slot. Ignored signals are dropped at raise time, so any pending bit is deliverable. Dispositions/restorers stay process-global for now, which is correct while only one interactive process installs handlers at a time; per-process dispositions (with fork-inherit / exec-reset semantics) are the obvious follow-up.
- PTYs gained a foreground target. PtyState carries fgPid (0 = none), set via the new pty_set_foreground(fd, pid) syscall (SYS_PTY_SET_FOREGROUND = 85, swiftos_pty_set_foreground bridge) on either end of the pair. It is TIOCSPGRP-shaped but pid-scoped, as we do not yet model process groups. ptyInput now honors ISIG: on Ctrl-C (0x03) it echoes ^C\r\n (when ECHO), flushes the partial canonical line, and raises SIGINT to fgPid instead of pushing the byte as data. The default PTY lflag now includes ISIG. Ctrl-\ (SIGQUIT) and Ctrl-Z (SIGTSTP, which needs stop semantics) are still unhandled.
- Delivery to a blocked reader: PTY reads block by busy-yielding (processYieldForIO) rather than truly parking, so the master/slave read loops in kernel/vfs/vfs.swift now also break with EINTR (-4) when a signal is pending for the current process. The reader returns to the syscall dispatcher, where signalDeliverToCurrentFrame either terminates it (default SIGINT -> status 130) or installs the handler frame (handler runs, read() returns EINTR). processKill's remote hard-teardown is unchanged and still used by the generic kill(2) path; the PTY path deliberately uses signalRaiseSlot so the target delivers in its own context and can run a custom handler.
- Test: /bin/ptysigprobe (C/newlib; it needs newlib's working sigaction + sigreturn trampoline, which the Swift userland bridge lacks) with ./tests/ptysig_test.sh and make ptysig-test. It allocates a PTY, forks a child that adopts the slave as stdin and is named the PTY foreground, writes Ctrl-C to the master, and asserts both the default-terminate case (WTERMSIG == SIGINT, plus the ^C echo on the master) and the installed-handler case (handler ran, child exits 42). Added pPendingSignals to docs/SMP_STATE_AUDIT.md and removed the now-gone pendingMask.
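The per-slot pending model can be sketched in a few lines of C. This is an illustration under assumptions, not the kernel code: MAX_PROC, the 32-bit mask width, and every function name here are hypothetical stand-ins for pPendingSignals and signalRaiseSlot. It shows the two properties the notes rely on: an ignored signal never becomes pending, and a default-terminated SIGINT maps to status 128 + 2 = 130.

```c
#include <stdint.h>

#define MAX_PROC 64   /* hypothetical process-table size */
#define SIG_INT  2    /* conventional SIGINT number      */

static uint32_t pending[MAX_PROC];  /* one pending bitmask per process slot */
static uint32_t ignored;            /* process-global dispositions, as in HC36 */

void signal_set_ignored(int sig, int on) {
    if (sig < 1 || sig > 31) return;
    if (on) ignored |= UINT32_C(1) << sig;
    else    ignored &= ~(UINT32_C(1) << sig);
}

/* Mark sig pending for an arbitrary slot. Returns 1 if queued, 0 if dropped. */
int signal_raise_slot(int slot, int sig) {
    if (slot < 0 || slot >= MAX_PROC || sig < 1 || sig > 31) return 0;
    uint32_t bit = UINT32_C(1) << sig;
    if (ignored & bit) return 0;     /* dropped at raise time */
    pending[slot] |= bit;
    return 1;
}

/* What a blocked read loop would poll before returning EINTR. */
int signal_has_pending(int slot) {
    return slot >= 0 && slot < MAX_PROC && pending[slot] != 0;
}

/* Exit status reported for a default-terminate disposition. */
int signal_default_status(int sig) {
    return 128 + sig;
}
```

Because ignored signals are filtered on the way in, the reader side only needs the cheap nonzero test in signal_has_pending.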
Acceptance. make ptysig-test boots, logs in, runs /bin/ptysigprobe, and
asserts ptysigprobe: default terminate OK, ptysigprobe: handler delivered OK,
and PTYSIGPROBE-OK. The kernel mechanism is in place; the natural follow-up is
wiring /bin/sshd (HC35) to call pty_set_foreground for the shell it spawns so
a Ctrl-C typed in the remote session interrupts the running command end to end.
HC35 sshd interactive PTY session (2026-06-15)
- /bin/sshd now serves a real interactive login shell. It accepts the pty-req and shell channel requests, allocates a PTY pair via openpty() (HC34), forks the shell onto the slave end (/bin/busybox as sh, via the new swiftos_pty_spawn_shell helper, which dup2's the slave to stdio and drops all other fds), and relays bytes between the SSH channel and the PTY master in a single poll loop. Keeping the relay in one thread means the chacha20 send/sequence state stays single-owner (no locking). The loop honors the client's channel window for shell output and replenishes our receive window (SSH_MSG_CHANNEL_WINDOW_ADJUST) for keystrokes. On master EOF (shell exit) or channel close it reaps the shell with waitpid and sends exit-status + SSH_MSG_CHANNEL_CLOSE. No new kernel surface: built entirely on existing fork/execve/waitpid plus HC34's PTY.
- busybox is built without FEATURE_EDITING, so ash reads the tty in plain canonical mode, a perfect match for the PTY's canonical+echo line discipline. Caveat (carried from HC34): Ctrl-C is a data byte, not a SIGINT to the shell; job control is a separate milestone. tcsetattr still targets the console tty, not the PTY, and window size is fixed; neither matters for a canonical shell.
- Added swiftos_pty_spawn_shell and swiftos_waitpid bridges. New gate ./tests/sshd_interactive_test.sh / make sshd-interactive-test drives a real OpenSSH ssh -tt session: requests a pty, runs echo hc35''ok (the contiguous marker only appears in the command's output, not the echoed command line), asserts it round-trips through the PTY, and that the guest logs a clean interactive shell completed status 0.
Acceptance. make sshd-interactive-test proves a host OpenSSH client gets an
interactive shell over SSH — pty allocation, line-discipline echo, command
execution, output relay, and a clean exit.
HC34 PTY kernel object (2026-06-15)
- Added pseudo-terminals as a first-class kernel object (kernel/tty/pty.swift) plus the openpty(master*, slave*) syscall (SYS_OPENPTY = 84). A PTY is a bidirectional conduit with a per-instance line discipline on the master->slave path: master writes are cooked (canonical line assembly, echo, backspace) into a slave-readable ring; the editor's echo and the slave's output (with ONLCR, LF->CRLF) flow into a master-readable ring. Modeled on the existing pipe/socketpair objects: two HandleKind cases (.ptyMaster / .ptySlave), a per-end reference count released at description teardown, and blocking read/write with processYieldForIO. Both ends fstat as S_IFCHR and participate in poll. The whole object runs under the VFS lock from syscall context (never IRQ), so unlike the console line discipline in tty.swift it needs no IRQ-reentrancy care; it keeps its own buffers and editor state.
- Out of scope here (kept for a later milestone): job control. The kernel's signal machinery targets only the single console foreground process, so a PTY carries Ctrl-C as a data byte rather than raising SIGINT to a foreground process group. Per-fd termios (tcsetattr still drives the console tty) and TIOCSWINSZ are likewise deferred; the ioctl stub reports a fixed 24x80.
- Added the swiftos_openpty bridge and /bin/ptyprobe, a native Swift self-test, with ./tests/pty_test.sh and make pty-test. The probe allocates a pair and asserts canonical line assembly (master write -> slave read), echo back to the master, ONLCR on slave output, backspace editing, and master-close EOF. Documented the ptys table in docs/SMP_STATE_AUDIT.md.
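For reference, the ONLCR step on the slave-output path amounts to the following transform. This is a hedged C sketch rather than the pty.swift code: onlcr_expand and its flat-buffer interface are assumptions made for illustration (the kernel writes into a ring).

```c
#include <stddef.h>

/* Copy n bytes from in to out, expanding each LF to CR LF (ONLCR).
 * Returns the number of bytes written, or (size_t)-1 if out is too small. */
size_t onlcr_expand(const char *in, size_t n, char *out, size_t cap) {
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\n') {
            if (w + 2 > cap) return (size_t)-1;
            out[w++] = '\r';
            out[w++] = '\n';
        } else {
            if (w + 1 > cap) return (size_t)-1;
            out[w++] = in[i];
        }
    }
    return w;
}
```

The worst case doubles the input length, so a caller that cannot fail should size the destination at 2 * n.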
Acceptance. make pty-test boots, logs into the root shell, runs
/bin/ptyprobe, and asserts every line-discipline marker plus PTYPROBE-OK.
The PTY object is the substrate for the HC35 interactive SSH session.
HC33 SFTP write path (2026-06-15)
- The SFTP subsystem now implements the write surface against the RAM tmpfs: OPEN with write intent (SSH_FXF_WRITE/APPEND/CREAT/TRUNC mapped to the kernel O_* ABI), WRITE (handle + absolute offset), MKDIR, RMDIR, REMOVE, RENAME, and SETSTAT/FSETSTAT (permissions via chmod, size via the new swiftos_ftruncate bridge). The read-only base is honored by the kernel: write attempts outside tmpfs return EROFS/EACCES/EPERM, which map to SSH_FX_PERMISSION_DENIED and surface to the client as "Permission denied". Only symlink/readlink remain SSH_FX_OP_UNSUPPORTED (no symlinks in swift-os).
- Transport sizing: the SSH transport reads at most maxPacketLen (8192) bytes per packet, so a 32 KiB SFTP write would overflow the reassembly buffer. The session channel now advertises a 4096-byte max packet, and the SFTP server advertises the limits@openssh.com extension (max-write 3072, max-read 4096) so the OpenSSH client bounds each WRITE to a single channel frame. Uploads larger than one write therefore arrive as a sequence of bounded, offset-keyed WRITEs that we stream straight to the file.
- Added the swiftos_ftruncate syscall bridge (SYS_FTRUNCATE).
- Added ./tests/sshd_sftp_write_test.sh and make sshd-sftp-write-test. The gate drives a real OpenSSH sftp batch: mkdir /tmp/hc33, put a 10 000-byte payload (several bounded WRITE frames), byte-exact round-trip get, rename, re-get the renamed file, rm, and rmdir. A second session proves a put onto the read-only /readme.txt is denied.
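The OPEN flag translation can be sketched as below. The SSH_FXF_* values are the SFTP v3 pflags constants; the O_* names come from the host's fcntl.h and merely stand in for the kernel O_* ABI, whose numeric values are not recorded in these notes. The function name sftp_pflags_to_oflags is hypothetical.

```c
#include <fcntl.h>
#include <stdint.h>

/* SFTP v3 OPEN pflags bits. */
#define SSH_FXF_READ   0x00000001u
#define SSH_FXF_WRITE  0x00000002u
#define SSH_FXF_APPEND 0x00000004u
#define SSH_FXF_CREAT  0x00000008u
#define SSH_FXF_TRUNC  0x00000010u

/* Translate SFTP pflags into open(2)-style flags (host O_* as a stand-in). */
int sftp_pflags_to_oflags(uint32_t pflags) {
    int o;
    if ((pflags & SSH_FXF_READ) && (pflags & SSH_FXF_WRITE)) o = O_RDWR;
    else if (pflags & SSH_FXF_WRITE)                         o = O_WRONLY;
    else                                                     o = O_RDONLY;
    if (pflags & SSH_FXF_APPEND) o |= O_APPEND;
    if (pflags & SSH_FXF_CREAT)  o |= O_CREAT;
    if (pflags & SSH_FXF_TRUNC)  o |= O_TRUNC;
    return o;
}
```

An upload typically opens with WRITE|CREAT|TRUNC, which becomes O_WRONLY|O_CREAT|O_TRUNC here.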
Acceptance. make sshd-sftp-write-test proves a real host sftp client can
create directories, upload multi-chunk files with byte-exact integrity, rename
and delete them on the SwiftOS tmpfs, while the read-only base rejects writes.
HC32 SFTP subsystem read-only browse (2026-06-15)
- /bin/sshd now answers the subsystem sftp channel request and speaks SFTP protocol v3 (the OpenSSH baseline) over the existing chacha20-poly1305 session channel. This first stage covers the read-only browse surface needed by the host sftp / scp -s clients (scp in SFTP mode; scp -O selects the legacy scp protocol, which is not served here): SSH_FXP_INIT/VERSION, REALPATH (with a byte-level ./../slash path canonicalizer rooted at the server cwd), STAT/LSTAT/FSTAT, OPENDIR/READDIR/CLOSE, and OPEN(read)/READ/CLOSE. Write operations (WRITE, MKDIR, RMDIR, REMOVE, RENAME, SETSTAT, symlinks) answer SSH_FX_OP_UNSUPPORTED; the write path is HC33.
- Handles are a fixed 8-slot table keyed by a 4-byte index; per-READ absolute offsets are honored via the new swiftos_lseek bridge (SYS_LSEEK). DATA replies are bounded to 4096 bytes per READ and READDIR drains the directory in bounded getdents batches, matching the exec path's bounded-output philosophy. The OpenSSH client re-requests short reads, so large files reassemble correctly across many chunks. Inbound channel data is reassembled into complete SFTP packets and our receive window is replenished with SSH_MSG_CHANNEL_WINDOW_ADJUST. Very large downloads that would outrun the client's send window to us are still out of scope.
- Added SSHWriter.u64 / SSHReader.u64 for SFTP 64-bit sizes/offsets.
- Added ./tests/sshd_sftp_test.sh and make sshd-sftp-test. The gate boots the autostarted SSHD, pins the host key through a derived known_hosts entry, authenticates with the staged Ed25519 key, requests the sftp subsystem, and drives a real OpenSSH sftp batch: pwd (REALPATH -> /), ls /, and three downloads byte-compared on the host: /readme.txt, /etc/passwd, and the ~118 KiB /bin/sshd (byte-identical to build/sshd.elf, proving multi-chunk reassembly). The guest logs sshd: sftp subsystem started and sshd: sftp subsystem completed.
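A byte-level canonicalizer of the kind REALPATH needs might look like this C sketch. It is an illustration only: sftp_realpath and canon_append are hypothetical names, and the exact rules of the real canonicalizer (for example, how it treats a trailing slash) are not recorded here. The sketch collapses slash runs, drops "." segments, lets ".." pop one component without ever climbing above the root, and resolves relative paths against an absolute cwd.

```c
#include <stddef.h>
#include <string.h>

/* Append the segments of src to the canonical path in out[0..*len).
 * An empty buffer (*len == 0) represents the root. Returns 0, or -1 on overflow. */
int canon_append(char *out, size_t cap, size_t *len, const char *src) {
    const char *p = src;
    while (*p) {
        while (*p == '/') p++;                      /* collapse slash runs */
        const char *seg = p;
        while (*p && *p != '/') p++;
        size_t n = (size_t)(p - seg);
        if (n == 0 || (n == 1 && seg[0] == '.'))    /* empty or "." */
            continue;
        if (n == 2 && seg[0] == '.' && seg[1] == '.') {
            while (*len > 0 && out[*len - 1] != '/') (*len)--;  /* drop last name */
            if (*len > 0) (*len)--;                             /* and its slash  */
            continue;
        }
        if (*len + 1 + n + 1 > cap) return -1;      /* '/' + name + NUL */
        out[(*len)++] = '/';
        memcpy(out + *len, seg, n);
        *len += n;
    }
    return 0;
}

/* Canonicalize path against an absolute cwd. Returns 0 on success, -1 on overflow. */
int sftp_realpath(const char *cwd, const char *path, char *out, size_t cap) {
    size_t len = 0;
    if (cap < 2) return -1;
    if (path[0] != '/' && canon_append(out, cap, &len, cwd) != 0) return -1;
    if (canon_append(out, cap, &len, path) != 0) return -1;
    if (len == 0) out[len++] = '/';                 /* everything cancelled: root */
    out[len] = '\0';
    return 0;
}
```

With cwd "/", the client's initial REALPATH of "." yields "/", which matches the pwd step in the gate above.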
Acceptance. make sshd-sftp-test proves a real host sftp client can
browse and download files from SwiftOS over the host-key-pinned, key-authed
channel, with byte-exact transfers including a multi-chunk binary.
HC31 Hetzner deploy evidence bundle preflight (2026-06-12)
- Extended tests/sshd_deploy_preflight_test.sh with optional SSHD_DEPLOY_EVIDENCE_DIR=PATH evidence capture. A passing run can now emit a handoff bundle with the manifest, git state, artifact hashes/sizes, serial log, static IPv6 config, service manifest, public authorized_keys, and public known_hosts material when host OpenSSH verification was driven.
- The bundle deliberately omits private deploy material: the SSHD host-key seed, KEX seed, and deploy login private key. secrets-omitted.txt records that boundary so the preflight can be shared for review without copying per-instance secrets.
- Added tests/hetzner_deploy_bundle_test.sh and make hetzner-deploy-bundle-test. The focused gate runs the real static-IPv6 SSHD deploy preflight with evidence capture enabled, then asserts the bundle contains the reproducible public deploy records and the guest netinfo --check --require-static6 success marker.
Acceptance. make hetzner-deploy-bundle-test boots the temporary
Hetzner-style SSHD/static-IPv6 candidate under QEMU, proves the deploy preflight
passes, and verifies the generated evidence bundle is complete enough for
handoff while excluding private seeds.
HC30 netinfo deploy check mode (2026-06-12)
- Added /bin/netinfo --check, which keeps the normal status transcript and exits nonzero when the guest network is not deploy-ready: link not ready, missing IPv4, missing IPv4 prefix, missing gateway, or missing DNS.
- Added /bin/netinfo --require-static6, which implies --check and also fails unless the guest has staged static IPv6, prefix /64, and an IPv6 gateway. This turns the HC23/HC28 Hetzner-style IPv6 image state into a target-side deploy gate rather than only a readable transcript.
- Hardened tests/netinfo_test.sh to run /bin/netinfo --check after the normal status print, and hardened tests/sshd_deploy_preflight_test.sh to run /bin/netinfo --check --require-static6 in the static-IPv6 deploy image and over SSHD when IPv6 hostfwd is available.
Acceptance. make netinfo-test proves the default slirp profile passes
--check; make sshd-deploy-preflight-test proves a Hetzner-style static IPv6
candidate passes --check --require-static6.
HC29 SSHD IPv6 supervision preflight (2026-06-12)
- Added sshd6-supervised and sshd6-once /bin/swos-init service tokens. sshd6-supervised keeps /bin/sshd -6 under the existing restart loop for operator-style manifests; sshd6-once combines IPv6 listener mode with the one-session test marker so the restart path is deterministic.
- Added ./tests/sshd_ipv6_supervision_test.sh and make sshd-ipv6-supervision-test. The test builds a temporary base image with sshd6-once, boots QEMU with ipv6=on, and requires swos-init to restart the AF_INET6 listener. On hosts where QEMU IPv6 hostfwd works, it also uses host OpenSSH through ::1 to run /bin/id, force the first daemon exit, require the restart, then run /bin/echo HC29-V6-RESTART.
- The serial log must show two AF_INET6 listener cycles: sshd: listening on 22 (IPv6 session exec preflight), two once-mode cycles, and the swos-init: service sshd6-once ...; restarting marker.
Acceptance. make sshd-ipv6-supervision-test proves that the service
manifest can keep the IPv6 SSHD listener restartable. On hosts with IPv6
hostfwd, it also proves OpenSSH pins the SwiftOS host key and completes commands
before and after restart.
HC28 SSHD static-IPv6 deploy preflight (2026-06-12)
- Added ./tests/sshd_deploy_preflight_test.sh and make sshd-deploy-preflight-test. The gate builds a temporary signed base image with a Hetzner-style /etc/swos/net-ipv6, deploy-specific /etc/ssh/ssh_host_ed25519_seed, /etc/ssh/ssh_kex_seed, and /etc/ssh/authorized_keys, plus an /etc/swos/services manifest that starts sshd6.
- The test verifies the staged image files before boot, then boots QEMU with ipv6=on, virtio-net, and virtio-rng. The guest must apply the static 2001:db8:0:3df1::1/64 config with gateway fe80::1, report runtime entropy readiness, autostart /bin/sshd -6, reach the serial login, and print the same static IPv6/gateway state through /bin/netinfo.
- On hosts where QEMU IPv6 host forwarding works, the same gate also drives a real OpenSSH IPv6 remote exec through ::1 and requires /bin/sshd to load the deploy host-key seed, KEX seed, runtime entropy, and authorized key while running remote /bin/netinfo.
- This is a local deploy-candidate gate. A real provider-routed Hetzner IPv6 SSH run remains the cloud acceptance step.
Acceptance. make sshd-deploy-preflight-test proves that a single deploy
candidate image can carry static cloud IPv6 config, SSHD host/KEX/login
material, sshd6 autostart, virtio-rng runtime entropy, and guest-visible
/bin/netinfo status without crashing.
HC27 network status deploy preflight (2026-06-11)
- Added SYS_NETINFO (83), a fixed 56-byte read-only network status snapshot gated by capNet. It reports virtio-net readiness, IPv4 address, gateway, DNS, mask, DHCP/fallback source, IPv6 address, prefix, static/link-local source, and IPv6 gateway status.
- Added native bridge accessors in swift_user.{h,c} and /bin/netinfo, which prints a stable deploy-preflight transcript from inside the guest.
- Added ./tests/netinfo_test.sh and make netinfo-test; the focused gate boots QEMU with virtio-net/slirp, logs in as root, runs /bin/netinfo, and asserts the in-guest network status lines.
- This is observability for deploy readiness, not a routing/firewall/config control plane.
Acceptance. make netinfo-test proves that the base image contains
/bin/netinfo and that the guest reports ready virtio-net state, QEMU slirp
IPv4 10.0.2.15/24, gateway 10.0.2.2, DNS 10.0.2.3, IPv6 prefix status,
and the netinfo: HC27 OK marker.
HC26 SSH client runtime entropy preflight (2026-06-11)
- /bin/ssh now uses SYS_RANDOM for its SSH_MSG_KEXINIT cookie and Curve25519 client ephemeral scalar when the VM exposes virtio-rng. Without virtio-rng it keeps the existing development fallback so non-rng QEMU profiles remain reproducible.
- Hardened ./tests/ssh_transport_test.sh with optional QEMU extra arguments and runtime-entropy assertions, then added ./tests/ssh_runtime_entropy_test.sh and make ssh-runtime-entropy-test. The new focused gate reuses the full outbound OpenSSH transport proof with a QEMU virtio-rng-device.
Acceptance. make ssh-runtime-entropy-test proves that the guest brings up
virtio-rng, /bin/ssh consumes SYS_RANDOM runtime entropy for KEX, rejects
an untrusted host key, pins the trusted OpenSSH host key through
/etc/ssh/known_hosts, completes strict-KEX
curve25519-sha256/ssh-ed25519/chacha20-poly1305@openssh.com, and finishes
the encrypted ssh-userauth service-request preauth exchange.
HC25 SSHD runtime entropy preflight (2026-06-11)
- Added a minimal modern virtio-rng MMIO driver for QEMU/cloud VM entropy devices. It scans the device-tree-discovered virtio-mmio window for device id 4, negotiates VIRTIO_F_VERSION_1, and serves small synchronous reads from a polled request virtqueue.
- Added syscall SYS_RANDOM (80) and the swiftos_random userland bridge. The bridge feeds arc4random_buf when a runtime source is attached, while keeping the deterministic fallback for test profiles without virtio-rng.
- /bin/sshd now uses SYS_RANDOM for its SSH_MSG_KEXINIT cookie and Curve25519 server ephemeral scalar, mixing the optional image-time /etc/ssh/ssh_kex_seed when present. With runtime entropy it logs sshd: loaded runtime entropy from SYS_RANDOM and marks the KEX context seeded runtime; without virtio-rng it keeps the existing development fallback.
- Added ./tests/sshd_runtime_entropy_test.sh and make sshd-runtime-entropy-test. The test reuses the full host OpenSSH remote-command acceptance path with a QEMU virtio-rng-device.
Acceptance. make sshd-runtime-entropy-test proves that the guest brings up
virtio-rng, /bin/sshd consumes SYS_RANDOM runtime entropy for KEX, host
OpenSSH still pins the SwiftOS host key through known_hosts, and authenticated
remote /bin/id, /bin/echo, quoted argv, stdin-fed /bin/cat, and bounded
long-output exec all complete.
HC24 SSHD IPv6 listener preflight (2026-06-11)
- Added -6 / --ipv6 mode to /bin/sshd, selecting the existing AF_INET6 passive TCP socket path while keeping the default IPv4 listener unchanged.
- Added the sshd6 /bin/swos-init service token so custom base images can autostart /bin/sshd -6 from /etc/swos/services without adding argument parsing to the tiny service manifest format.
- Added ./tests/sshd_ipv6_listener_test.sh and make sshd-ipv6-listener-test. The test builds a temporary signed base image with sshd6, boots QEMU with ipv6=on, and requires the IPv6 listener marker. On QEMU builds with IPv6 hostfwd, it also drives a host OpenSSH remote exec through ::1.
- This is an AF_INET6 listener deploy preflight. Provider-routed SSHD-over-IPv6 on a real cloud network remains a separate acceptance run.
Acceptance. make sshd-ipv6-listener-test proves that swos-init starts
sshd6, that /bin/sshd -6 binds TCP/22 as an IPv6 listener under QEMU
ipv6=on, and that the boot continues to the serial login without a crash.
HC23 Hetzner static IPv6 config preflight (2026-06-11)
- Added a boot-time /etc/swos/net-ipv6 parser for static cloud IPv6 configuration. The accepted format is intentionally narrow: address=<ipv6>/64 plus gateway=<link-local-ipv6>, with comments and whitespace allowed.
- Added NET_IPV6_CONFIG_FILE=PATH base-image staging so deploy candidates can bake provider-assigned Primary IPv6 material into the signed image.
- Added an IPv6 text/CIDR parser and /64 route-target helper. Outbound IPv6 UDP now resolves the configured gateway via NDP for off-/64 destinations while preserving direct resolution for same-/64, link-local, and multicast targets.
- This is a Hetzner deploy preflight, not cloud metadata ingestion and not yet SSHD-over-IPv6 acceptance. Missing config keeps the existing link-local behavior; invalid config logs a serial warning and fails closed to link-local.
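The /64 route-target decision described above reduces to three prefix tests. The C sketch below is illustrative only; ipv6_route_target and the route6 enum are hypothetical names, and addresses are assumed to be 16 bytes in network order.

```c
#include <stdint.h>
#include <string.h>

enum route6 { ROUTE6_DIRECT, ROUTE6_GATEWAY };

/* Decide whether dst is resolved directly via NDP or through the gateway. */
enum route6 ipv6_route_target(const uint8_t self[16], const uint8_t dst[16]) {
    if (dst[0] == 0xff)                             /* multicast, ff00::/8   */
        return ROUTE6_DIRECT;
    if (dst[0] == 0xfe && (dst[1] & 0xc0) == 0x80)  /* link-local, fe80::/10 */
        return ROUTE6_DIRECT;
    if (memcmp(self, dst, 8) == 0)                  /* same /64 prefix       */
        return ROUTE6_DIRECT;
    return ROUTE6_GATEWAY;                          /* off-/64               */
}
```

With the HC28 example address 2001:db8:0:3df1::1/64, a peer at 2001:db8:0:3df1::2 is resolved directly, while 2001:4860:4860::8888 goes to the gateway.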
Acceptance. make net-static-ipv6-test builds a temporary signed base image
with Hetzner-style static IPv6 config, boots it under QEMU virtio-net with IPv6
enabled, and requires the net-hc23 OK serial marker proving the kernel applied
the staged /64 address and link-local gateway.
HC22 SSHD KEX seed preflight (2026-06-11)
- Added a daemon-local SSHD KEX session counter and mixed it into the SSH_MSG_KEXINIT cookie plus the Curve25519 server ephemeral scalar so consecutive connections to the same daemon no longer reuse the same time/PID/stack-derived context.
- Added optional /etc/ssh/ssh_kex_seed loading. make base-image stages a deploy-specific hex-encoded 32-byte seed when SSHD_KEX_SEED_FILE=PATH is supplied. Invalid seed files fail closed; missing files keep the development image behavior.
- This is deploy-image hardening, not a full entropy subsystem. A real runtime entropy source remains required before treating SSHD KEX randomness as production complete.
- Added ./tests/sshd_kex_seed_test.sh and make sshd-kex-seed-test, which build a temporary base image with a generated KEX seed and reuse the host OpenSSH session/exec acceptance path.
Acceptance. make sshd-kex-seed-test proves that /bin/sshd loads
/etc/ssh/ssh_kex_seed, marks the KEX context as seeded, completes pinned
OpenSSH transport/auth/session setup, and executes the bounded remote commands.
./tests/sshd_transport_test.sh also now requires distinct logged KEX session
contexts across multiple host OpenSSH connections.
HC21 SSHD authorized_keys options preflight (2026-06-11)
- Hardened /bin/sshd authorized-key matching so key options are no longer silently ignored. A line whose first field is not ssh-ed25519 must now carry only the supported safe restriction options before the key: restrict, no-pty, no-port-forwarding, no-agent-forwarding, and no-X11-forwarding.
- Unsupported or not-yet-enforced options such as command=, from=, environment=, permitopen=, and unknown options fail closed for that line. This prevents deploy images from accidentally granting broader access than an operator intended while shell/PTY/forwarding policy is still incomplete.
- Extended ./tests/sshd_authorized_keys_test.sh so the custom deploy key is staged with safe restriction options and the denied fixture key is staged with an unsupported forced-command option that must not authenticate.
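The fail-closed option check can be sketched as an allowlist walk over the comma-separated options field. This C version is an assumption-laden illustration: options_all_safe is a hypothetical name, and it compares case-sensitively against the spellings listed above, which may be stricter than what /bin/sshd actually accepts.

```c
#include <stddef.h>
#include <string.h>

static const char *const k_safe_options[] = {
    "restrict", "no-pty", "no-port-forwarding",
    "no-agent-forwarding", "no-X11-forwarding",
};

/* Returns 1 only if every comma-separated option is on the allowlist.
 * Anything else (command=, from=, unknown names, empty fields) returns 0. */
int options_all_safe(const char *opts) {
    const char *p = opts;
    for (;;) {
        size_t n = strcspn(p, ",");     /* length of the current option */
        int ok = 0;
        for (size_t i = 0; i < sizeof k_safe_options / sizeof k_safe_options[0]; i++) {
            if (strlen(k_safe_options[i]) == n && memcmp(p, k_safe_options[i], n) == 0) {
                ok = 1;
                break;
            }
        }
        if (!ok) return 0;              /* fail closed for the whole line */
        p += n;
        if (*p == '\0') return 1;
        p++;                            /* skip the comma */
    }
}
```

Because an option with a value never matches an allowlisted name exactly, forced commands and source restrictions are rejected without the sketch needing to parse quoted values.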
Acceptance. make sshd-authorized-keys-test proves that the safe restricted
deploy key authenticates through host OpenSSH, while the unsupported
forced-command fixture key is rejected.
HC20 SSHD package-tool exec preflight (2026-06-11)
- Extended /bin/sshd's bounded direct remote-exec allowlist from single-component /bin/<tool> paths to single-component /bin/<tool> and /usr/bin/<tool> paths. Nested paths, NUL bytes, shell syntax, redirects, globbing, PTY, scp, and sftp remain outside this preflight.
- This lets deploy candidates run package-installed operational tools from the read-only package overlay over an authenticated SSHD session without widening the boundary to a shell.
- Added ./tests/sshd_usr_bin_exec_test.sh and make sshd-usr-bin-exec-test, which boot QEMU with the base image plus the pkghello payload overlay, pin the SwiftOS host key with host OpenSSH, and run /usr/bin/pkghello through /bin/sshd.
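The single-component path rule is small enough to show in full. The following C sketch is a guess at the shape of the check, not the sshd source: exec_path_allowed and tool_name_ok are hypothetical names, and rejecting "." and ".." as tool names is an added assumption. NUL bytes cannot appear inside a C string, so that part of the boundary has to be enforced on the length-prefixed SSH string before a check like this runs.

```c
#include <string.h>

/* A tool name must be non-empty, contain no '/', and not be "." or "..". */
int tool_name_ok(const char *name) {
    if (*name == '\0') return 0;
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0) return 0;
    return strchr(name, '/') == NULL;
}

/* Accept only /bin/<tool> and /usr/bin/<tool> with a single-component tool. */
int exec_path_allowed(const char *path) {
    static const char *const prefixes[] = { "/usr/bin/", "/bin/" };
    for (size_t i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++) {
        size_t n = strlen(prefixes[i]);
        if (strncmp(path, prefixes[i], n) == 0)
            return tool_name_ok(path + n);
    }
    return 0;
}
```

Under this rule /usr/bin/pkghello and /bin/id pass, while /bin/../etc/passwd and /usr/bin/a/b are rejected as nested.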
Acceptance. make sshd-usr-bin-exec-test proves that boot-autostarted SSHD
can authenticate the staged root key, execute a package-overlay /usr/bin
tool, and return its stdout over the pinned OpenSSH remote-exec path.
HC19 IPv4 route-target preflight (2026-06-11)
- Added a pure ipv4RouteTarget helper for outbound IPv4 next-hop selection. Same-subnet destinations now resolve the destination MAC directly; off-link destinations resolve the configured gateway MAC.
- Wired UDP and TCP active-open socket paths through that helper instead of probing the destination cache first and otherwise always ARPing the gateway. This keeps the existing QEMU/slirp behavior while making direct-on-subnet cloud peers reachable when the DHCP subnet says they are on-link.
- Kept /32 cloud addressing explicit: a non-self destination under a 255.255.255.255 mask routes via the gateway, matching Hetzner-style static examples with a point-to-point gateway.
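As a rough C rendering of the next-hop rule (the real helper is the Swift ipv4RouteTarget; the signature and the IP4 macro below are assumptions for illustration), the decision is a single masked comparison. The /32 case needs no special branch: under a 255.255.255.255 mask only the host's own address compares equal, so every other destination falls through to the gateway.

```c
#include <stdint.h>

/* Build a host-order IPv4 address from dotted-quad parts. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Return the address whose MAC should be resolved for a packet to dst. */
uint32_t ipv4_route_target(uint32_t self, uint32_t mask, uint32_t gateway, uint32_t dst) {
    if (((dst ^ self) & mask) == 0)
        return dst;        /* on-link: resolve the destination directly */
    return gateway;        /* off-link: resolve the configured gateway  */
}
```

With the slirp defaults (10.0.2.15/24, gateway 10.0.2.2), a packet to 10.0.2.3 targets 10.0.2.3 itself, while a packet to 8.8.8.8 targets the gateway.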
Acceptance. tests/net_test.swift covers same-subnet, off-link, and /32
route-target decisions, while the live virtio-net and TCP connect smokes prove
the QEMU/slirp gateway path still works.
HC18 SSHD quoted argv preflight (2026-06-11)
- Replaced /bin/sshd's raw ASCII-whitespace remote-exec splitter with a small direct-exec argv parser. It removes single and double quotes, supports backslash escaping, preserves empty quoted arguments, and still requires the executable path to be a single-component /bin/<tool>.
- Kept the boundary deliberately below shell semantics: no expansion, globbing, redirects, pipelines, environment assignment, PTY, or shell startup. Those bytes are either ordinary argv bytes or remain unsupported future login work.
- Hardened ./tests/sshd_transport_test.sh with a host OpenSSH command that sends quoted words, a single-quoted phrase, a backslash escape, and an empty argument through /bin/echo, requiring exact stdout.
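A direct-exec argv parser with these properties can be sketched as follows. This is not the sshd implementation: split_argv is a hypothetical name, only space and tab are treated as separators here, and letting a backslash escape any byte inside double quotes is a simplification. It does show the listed behaviors: quotes are removed, backslash escapes work, empty quoted arguments survive, and nothing is expanded.

```c
#include <stddef.h>

/* Split cmd into words stored back to back in buf, with pointers in argv.
 * Returns argc, or -1 on an unterminated quote, trailing backslash, or overflow. */
int split_argv(const char *cmd, char *buf, size_t cap, char **argv, int max_args) {
    size_t w = 0;
    int argc = 0;
    const char *p = cmd;
    for (;;) {
        while (*p == ' ' || *p == '\t') p++;         /* skip separators */
        if (*p == '\0') return argc;
        if (argc >= max_args) return -1;
        argv[argc++] = buf + w;                      /* a new word starts here */
        while (*p != '\0' && *p != ' ' && *p != '\t') {
            char c = *p++;
            if (c == '\'' || c == '"') {
                char q = c;
                while (*p != q) {
                    if (*p == '\0') return -1;       /* unterminated quote */
                    if (q == '"' && *p == '\\' && p[1] != '\0') p++;
                    if (w + 1 >= cap) return -1;
                    buf[w++] = *p++;
                }
                p++;                                 /* closing quote */
            } else {
                if (c == '\\') {
                    if (*p == '\0') return -1;       /* trailing backslash */
                    c = *p++;
                }
                if (w + 1 >= cap) return -1;
                buf[w++] = c;
            }
        }
        if (w >= cap) return -1;
        buf[w++] = '\0';                             /* also terminates an empty '' word */
    }
}
```

Given the input /bin/echo "two words" 'single quoted' back\ slash '' the sketch produces five arguments, the last of which is the empty string.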
Acceptance. make sshd-transport-test proves that authenticated host
OpenSSH remote exec now preserves quoted argv grouping while retaining SSHD
host-key pinning, denied-key rejection, stdin forwarding, bounded long output,
and exit-status reporting.
HC17 TCP write backpressure preflight (2026-06-11)
- Added TCP send-space readiness helpers so socket poll/write paths can observe whether an established or CLOSE_WAIT connection can accept more queued bytes. socketPollWritable now reports TCP writable only when the connection has free send-buffer space instead of treating all socket fds as unconditionally writable.
- Updated VFS TCP socket writes to pump the network, queue as much as TCP will accept, and block until ACKs reopen send space for blocking fds. Nonblocking TCP writes now return EAGAIN only when no bytes were queued and the connection is still open; a closed write side reports EPIPE when nothing was written.
- Raised /bin/sshd bounded exec output from the HC16 temporary 1536-byte cap to 4096 bytes. The OpenSSH transport acceptance still runs /bin/cat /models/tok512.bin, now requiring the full 4096-byte bounded reply plus the serial truncation marker.
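The nonblocking return-value rule from the second bullet, written out as a C sketch. The function name and the negative-errno return convention are assumptions (the latter is suggested by the EINTR (-4) note in HC36); the precedence is what matters: a partial write always wins over an error.

```c
#include <errno.h>
#include <stddef.h>

/* Result of a nonblocking TCP write after queueing what the send buffer accepts. */
long tcp_nonblocking_write_result(size_t queued, int write_side_open) {
    if (queued > 0)
        return (long)queued;                    /* partial or full write succeeds */
    return write_side_open ? -EAGAIN : -EPIPE;  /* nothing queued: retry or fail  */
}
```

A caller that gets -EAGAIN can wait for poll to report the socket writable, which after HC17 only happens once ACKs have reopened send-buffer space.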
Acceptance. make sshd-transport-test proves that SSHD can return a
4096-byte bounded remote-exec response over a normal OpenSSH session, with TCP
write backpressure handling ACK-driven send-buffer refill instead of requiring a
single send-buffer-sized channel-data packet.
HC16 SSHD bounded output capture preflight (2026-06-11)
- Moved /bin/sshd remote exec stdout/stderr capture off the old pipe and into a temporary tmpfs file at /tmp/swos-sshd-output. The daemon now runs the child synchronously, then reads back at most 1536 bytes for the SSH channel response. This avoids child-side pipe backpressure while keeping the current bounded preflight behavior and stays within the current TCP send-buffer limits for one SSH channel-data response plus close/status control packets.
- Added deterministic serial markers sshd: exec output bytes N and sshd: exec output truncated so deploy runs can distinguish short output from capped output.
- Hardened ./tests/sshd_transport_test.sh with a long-output remote command: host OpenSSH runs /bin/cat /models/tok512.bin, expects exactly 1536 bytes back, and requires the truncation marker. The test still covers denied keys, host-key pinning, /bin/id, /bin/echo, and stdin-fed /bin/cat.
Acceptance. make sshd-transport-test proves that boot-autostarted SSHD
can execute a command that writes more than the old pipe capacity, return a
bounded 1536-byte SSH channel response, and log truncation without wedging the
session.
EL0 FP/SIMD trap-frame hardening (2026-06-11)
- Extended the lower-EL trap frame in kernel/arch/aarch64/exceptions.S from GPR-only state to include q0..q31 plus FPCR/FPSR. Preemptive scheduling can now switch away from an FP-heavy EL0 process and run another Swift process without corrupting the interrupted process's floating-point temporaries.
- Updated fork() trap-frame cloning to copy the full 800-byte frame. The existing x0/SP_EL0/ELR_EL1/SPSR_EL1 word offsets are unchanged; the FP/SIMD payload is appended after the original return state.
- Added a saturating Q8 activation conversion guard in userland/lib/llama2.swift so rare numerical edge values cannot lower to an EL0 BRK #1. The host llm_q8_engine_test now checks the edge conversion while still pinning both q8 model goldens byte-for-byte to runq.c.
Acceptance. The default base image, which autostarts /bin/sshd, now passes
./tests/llm_serve_test.sh with the pinned stories15M-q8 reference output.
./tests/cow_test.sh, ./tests/spawn_self_exec_test.sh, and
./tests/boot_test.sh cover fork-return and baseline boot behavior with the
larger trap frame.
HC15 SSHD bounded stdin exec preflight (2026-06-11)
- Extended `/bin/sshd` session exec handling to read post-exec `SSH_MSG_CHANNEL_DATA` packets until channel EOF and forward up to 512 bytes into the spawned command's fd 0 through a pipe. The server now wires fd 0, fd 1, and fd 2 explicitly through `spawn_handles`, preserving the current capability-scoped direct `/bin/<tool>` launcher.
- Added the deterministic marker `sshd: exec stdin bytes N` when remote stdin is forwarded. Oversized stdin fails closed with `sshd: exec stdin too large`.
- Hardened `./tests/sshd_transport_test.sh` so the default SSHD acceptance now feeds `HC15-STDIN\nline-two\n` through host OpenSSH into remote `/bin/cat` and requires exact stdout round-trip, in addition to the existing denied-key, host-key pinning, `/bin/id`, and `/bin/echo` checks.
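The bounded, fail-closed stdin behavior can be sketched as a small accumulator. The daemon itself is Swift; `exec_stdin_t` and `exec_stdin_append` are hypothetical names used only for this illustration, and only the 512-byte cap is taken from the notes.

```c
#include <stddef.h>
#include <string.h>

#define EXEC_STDIN_MAX 512  /* cap stated in the notes */

typedef struct {
    unsigned char buf[EXEC_STDIN_MAX];
    size_t len;
    int overflow;  /* sticky: once set, the whole exec request is rejected */
} exec_stdin_t;

/* Append one channel-data payload. Returns 0 on success, or -1 when the
 * payload would exceed the cap. Nothing is partially copied on failure. */
static int exec_stdin_append(exec_stdin_t *s, const void *data, size_t n) {
    if (s->overflow || n > EXEC_STDIN_MAX - s->len) {
        s->overflow = 1;
        return -1;
    }
    memcpy(s->buf + s->len, data, n);
    s->len += n;
    return 0;
}
```

A caller would append each payload until channel EOF, then either write `buf[0..len)` into the child's stdin pipe or, if `overflow` is set, refuse to spawn and log the oversized-stdin marker.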
Acceptance. make sshd-transport-test proves that boot-autostarted SSHD
still pins its host key, rejects a stale key, authenticates the staged key, runs
remote /bin/id and /bin/echo, and forwards bounded remote stdin into
/bin/cat with exact output over OpenSSH.
HC14 SSHD opt-in restart supervision preflight (2026-06-11)
- Extended `/bin/swos-init` with opt-in supervised service tokens while keeping the default `/etc/swos/services` token `sshd` behavior unchanged. Plain `sshd` still starts the daemon and then hands the serial console to `/bin/console-login`; `sshd-supervised` and `sshd-once` keep `swos-init` alive as a tiny `waitpid()` restart loop for deploy preflights.
- Added deterministic supervisor markers: `swos-init: supervision active` and `swos-init: service sshd-once pid ... exited status ...; restarting`. `/bin/sshd` also supports one-shot mode through `--once`/`-1` and through the `/tmp/swos-sshd-once` scratch marker created by the `sshd-once` service token.
- Added `SWOS_SERVICES_FILE=PATH` for custom base-image staging of `/etc/swos/services`, so tests and deploy candidates can select a service manifest without editing the checked-in default base tree.
- Added `./tests/sshd_supervision_test.sh` and `make sshd-supervision-test`. The test builds a temporary base image with `sshd-once`, boots QEMU with TCP/22 forwarded, runs a host OpenSSH `/bin/id` command, requires `swos-init` to observe and restart the exited daemon, then runs a second host OpenSSH `/bin/echo HC14-RESTART` command through the restarted SSHD.
Acceptance. make sshd-supervision-test proves that opt-in swos-init
supervision restarts sshd-once, that the restarted daemon listens again on
TCP/22, and that strict host OpenSSH pinning plus publickey auth still complete
before and after the restart.
HC13 SSHD deploy authorized_keys provisioning preflight (2026-06-11)
- Added the `SSHD_AUTHORIZED_KEYS_FILE=PATH` base-image staging override. A deploy build can now replace the checked-in SSHD development `/etc/ssh/authorized_keys` with operator-provided public keys at image build time.
- Parameterized `./tests/sshd_transport_test.sh` with `SSHD_ALLOW_KEY_SRC` and `SSHD_DENY_KEY_SRC` so the same OpenSSH session proof can validate custom deploy key material instead of only the HC5 fixture.
- Added `./tests/sshd_authorized_keys_test.sh` and `make sshd-authorized-keys-test`. The test generates an ephemeral host Ed25519 keypair with OpenSSH, stages only its `.pub` file into a temporary signed base image, and proves the private key authenticates while the default HC5 fixture key is rejected.
Acceptance. make sshd-authorized-keys-test generates a non-default SSHD
authorized key, builds a custom BASE_IMG with SSHD_AUTHORIZED_KEYS_FILE,
boots the image under QEMU with TCP/22 forwarded, pins the SwiftOS SSHD host key
through known_hosts, rejects the default HC5 fixture key, and runs /bin/id
plus /bin/echo using the generated deploy key.
HC12 SSHD deploy host-key rotation preflight (2026-06-11)
- Added `sshkey seed --out PATH [--force]`, which creates a fresh hex-encoded 32-byte Ed25519 seed in the same format loaded by `/bin/sshd`.
- Added the `SSHD_HOST_SEED_FILE=PATH` base-image staging override. A deploy build can now generate a per-artifact SSHD host-key seed, stage it as `/etc/ssh/ssh_host_ed25519_seed`, and publish the matching OpenSSH public key or known_hosts line with `build/sshkey`.
- Added `./tests/sshd_host_key_rotation_test.sh` and `make sshd-host-key-rotation-test`. The test builds a temporary signed base image with a generated seed and reuses the OpenSSH strict-pinning SSHD session proof against the rotated host key.
Acceptance. make sshd-host-key-rotation-test generates a non-default SSHD
host-key seed, builds a custom BASE_IMG with SSHD_HOST_SEED_FILE, boots the
image under QEMU with TCP/22 forwarded, derives a temporary known_hosts entry
from the rotated seed, and requires host OpenSSH to authenticate the rotated
SwiftOS SSHD host key before running /bin/id and /bin/echo.
HC11 SSHD host-key pinning preflight (2026-06-11)
- Added `build/sshkey`, a host-side helper that derives an OpenSSH `ssh-ed25519` public key or known_hosts line from the same hex-encoded `/etc/ssh/ssh_host_ed25519_seed` material that `/bin/sshd` loads in the guest. This gives operators a reproducible way to publish or pin the SwiftOS SSHD host key for a specific base image.
- Hardened `./tests/sshd_transport_test.sh` so the host OpenSSH client now uses `StrictHostKeyChecking=yes` and a temporary known_hosts file generated by `build/sshkey`, instead of disabling host-key checking.
Acceptance. make sshd-transport-test builds build/sshkey, derives a
[127.0.0.1]:<port> known_hosts entry from
base/etc/ssh/ssh_host_ed25519_seed, and requires OpenSSH debug output to show
the SwiftOS Ed25519 host key is known and matched before publickey auth and
remote /bin/id plus /bin/echo execute.
HC10 SSH client known_hosts preflight (2026-06-11)
- Added a minimal file-backed trust store for `/bin/ssh` at `/etc/ssh/known_hosts`. The client now verifies the server's Ed25519 signature over the exchange hash and then requires the received host-key blob to match a trusted `ssh-ed25519` entry for the target IP before proceeding to NEWKEYS.
- The current parser supports simple known_hosts lines with a bare IPv4 host or `[IPv4]:port` pattern, optional comma-separated host patterns, the `ssh-ed25519` key type, and a base64 OpenSSH public-key blob. Missing files, oversized files, malformed matching entries, or host-key mismatches fail closed.
- Added a dedicated host OpenSSH fixture key at `fixtures/ssh/ssh_client_host_ed25519(.pub)` and staged its public key in the base image's `/etc/ssh/known_hosts` for the QEMU slirp host alias `10.0.2.2`.
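The line shape the parser accepts can be illustrated with a small matcher. This is a sketch under stated assumptions, not the Swift client's code: `known_hosts_match` and `pattern_matches` are hypothetical names, and treating a bare IPv4 pattern as matching the IP on any port is a guess about the policy. The caller is expected to treat `NULL` as "not trusted" and then compare the returned base64 blob with the key the server offered.

```c
#include <stdio.h>
#include <string.h>

/* Does one host pattern name ip (bare form) or [ip]:port (bracketed form)?
 * Assumption: a bare IPv4 pattern matches regardless of port. */
static int pattern_matches(const char *pat, size_t len,
                           const char *ip, unsigned port) {
    char bracketed[64];
    if (len == strlen(ip) && memcmp(pat, ip, len) == 0)
        return 1;
    int n = snprintf(bracketed, sizeof bracketed, "[%s]:%u", ip, port);
    if (n < 0 || (size_t)n >= sizeof bracketed)
        return 0;
    return len == (size_t)n && memcmp(pat, bracketed, len) == 0;
}

/* Scan one known_hosts line. On a match, returns a pointer to the base64
 * key blob inside `line` and stores its length; otherwise returns NULL. */
static const char *known_hosts_match(const char *line, const char *ip,
                                     unsigned port, size_t *blob_len) {
    static const char key_type[] = "ssh-ed25519";
    size_t hosts_len = strcspn(line, " \t");
    if (hosts_len == 0 || line[0] == '#')
        return NULL;

    /* Walk the comma-separated host patterns in the first field. */
    int matched = 0;
    for (size_t start = 0; start < hosts_len && !matched;) {
        size_t len = strcspn(line + start, ", \t");
        matched = pattern_matches(line + start, len, ip, port);
        start += len + 1;
    }
    if (!matched)
        return NULL;

    /* Second field must be exactly the supported key type. */
    const char *p = line + hosts_len;
    p += strspn(p, " \t");
    size_t type_len = strcspn(p, " \t");
    if (type_len != sizeof key_type - 1 || memcmp(p, key_type, type_len) != 0)
        return NULL;

    /* Third field is the base64 blob; an empty blob is rejected. */
    p += type_len;
    p += strspn(p, " \t");
    *blob_len = strcspn(p, " \t\r\n");
    return *blob_len ? p : NULL;
}
```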
Acceptance. ./tests/ssh_transport_test.sh first starts host OpenSSH with
an untrusted Ed25519 host key and requires
ssh: known_hosts host key mismatch, then restarts host OpenSSH with the
trusted fixture key and requires both ssh: host key signature verified and
ssh: host key matched /etc/ssh/known_hosts before completing the encrypted
ssh-userauth service request/accept.
HC9 SSHD file-backed host key seed preflight (2026-06-11)
- Moved the SSHD Ed25519 host-key seed out of `/bin/sshd` and into the signed base image at `/etc/ssh/ssh_host_ed25519_seed`. The daemon now loads exactly 32 bytes from a hex-encoded seed file before deriving the server host key.
- The loader skips ASCII whitespace and `#` comments, rejects malformed or wrong-length input, and fails closed if the seed file is missing or invalid. This keeps the current proof deterministic while making deploy artifacts carry explicit host-key material.
- The checked-in seed remains development material for the QEMU preflight. A real cloud deployment still needs per-instance host-key provisioning or rotation plus real entropy. HC12 later added image-time host-key seed provisioning; runtime rotation and real entropy remain follow-up work.
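The loader rules above (skip whitespace and `#` comments, require exactly 32 decoded bytes, reject anything else) can be sketched as follows. `parse_seed` is a hypothetical name, and accepting uppercase hex digits and whitespace between any two digits are assumptions of this sketch rather than documented behavior.

```c
#include <stddef.h>

#define SEED_LEN 32  /* Ed25519 seed size stated in the notes */

static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a seed file image. Returns 0 and fills out[] on success,
 * or -1 on malformed, short, or over-long input (fail closed). */
static int parse_seed(const char *text, size_t len,
                      unsigned char out[SEED_LEN]) {
    size_t nibbles = 0;
    for (size_t i = 0; i < len; i++) {
        char c = text[i];
        if (c == '#') {  /* comment runs to end of line */
            while (i < len && text[i] != '\n')
                i++;
            continue;
        }
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
            continue;
        int v = hex_value(c);
        if (v < 0 || nibbles >= 2 * SEED_LEN)
            return -1;
        if (nibbles % 2 == 0)
            out[nibbles / 2] = (unsigned char)(v << 4);
        else
            out[nibbles / 2] |= (unsigned char)v;
        nibbles++;
    }
    return nibbles == 2 * SEED_LEN ? 0 : -1;
}
```

On failure the contents of `out` are unspecified, so a caller should discard the buffer rather than derive a key from it.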
Acceptance. ./tests/sshd_transport_test.sh now requires the guest log to
include sshd: loaded host key seed /etc/ssh/ssh_host_ed25519_seed before it
accepts the HC5 key and executes /bin/id plus /bin/echo through OpenSSH.
HC8 SSHD boot autostart preflight (2026-06-11)
- Added `/bin/swos-init` as the first user process when present in the base image. It reads immutable `/etc/swos/services`, starts allowlisted services with `fork`/`execve`, and then replaces itself with `/bin/console-login`.
- The default base image now includes `/etc/swos/services` with `sshd`, so the SSHD session/exec preflight binds TCP/22 during boot before the serial login prompt.
- Hardened process entry-stack construction so `execve` never enters EL0 with `SP_EL0` at the unmapped one-past-stack address when argv packing yields an empty or malformed argument vector.
Acceptance. ./tests/sshd_transport_test.sh boots QEMU, waits for
swos-init: started sshd pid and sshd: listening on 22 (session exec preflight), then drives OpenSSH publickey auth and remote /bin/id plus
/bin/echo commands without manually launching /bin/sshd.
HC7 SSH client transport preflight (2026-06-11)
- Added `/bin/ssh` as a native Swift SSH client transport preflight. It opens an outbound TCP stream, sends `SSH-2.0-swift-os_ssh-transport`, reads a normal OpenSSH server banner, sends client KEXINIT, completes `curve25519-sha256`, verifies the server's `ssh-ed25519` host-key signature over the exchange hash, handles OpenSSH strict-KEX sequence reset, derives `chacha20-poly1305@openssh.com` keys, and performs one encrypted `ssh-userauth` service request/accept exchange.
- This is intentionally pre-auth only. It does not yet implement known_hosts trust policy, user publickey authentication, session/exec channels, PTY, scp/sftp, or interactive shell behavior. Randomness is still the development pseudo-random helper, so the client is not production-secure yet.
- The base image now stages `/bin/ssh`, and `make ssh-transport-test` starts a temporary host OpenSSH `sshd` with a generated Ed25519 host key and restricted modern algorithms, boots QEMU with a slirp NIC, logs in as `root`, and runs `/bin/ssh 10.0.2.2 <port>` from the guest.
Acceptance. ./tests/ssh_transport_test.sh requires guest /bin/ssh to
connect to the host OpenSSH server, report an OpenSSH server banner, verify the
Ed25519 host-key signature, detect strict KEX, negotiate
curve25519-sha256 / ssh-ed25519 / chacha20-poly1305@openssh.com, complete
the encrypted ssh-userauth service request/accept, and print
ssh: transport ready (preauth).
HC6 SSHD generic direct exec preflight (2026-06-11)
- Generalized SSHD `exec` handling from a special `/bin/echo ...` path to a bounded direct `/bin/<tool>` launcher. The command string is split on simple ASCII whitespace into `argv`, requires an absolute single-component `/bin/` executable path, and is run through `spawn_handles` with stdout/stderr connected to the SSH channel pipe.
- This is intentionally still not shell semantics: no quoting, globbing, redirects, environment assignment, pipelines, PTY, stdin forwarding, or long-output streaming beyond the current bounded pipe read. It is enough to support remote checks such as `/bin/id` and simple argument passing such as `/bin/echo HC6-OK`.
- HC18 later added quote removal and backslash escaping for direct-exec argv while still intentionally avoiding shell semantics.
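The HC6 split-and-validate step can be sketched as below. The function names and the `EXEC_MAX_ARGS` bound are hypothetical; the notes only say the launcher is "bounded", splits on simple ASCII whitespace, and requires an absolute single-component `/bin/` path. The sketch deliberately does no quoting or escaping, matching the HC6 slice rather than the later HC18 behavior.

```c
#include <stddef.h>
#include <string.h>

#define EXEC_MAX_ARGS 8  /* hypothetical bound, not taken from the notes */

/* Accept only "/bin/<name>" with a non-empty name and no further '/'. */
static int is_direct_bin_path(const char *path) {
    if (strncmp(path, "/bin/", 5) != 0)
        return 0;
    const char *name = path + 5;
    return name[0] != '\0' && strchr(name, '/') == NULL;
}

/* Split `cmd` in place on ASCII whitespace. Returns argc, or -1 when the
 * command is empty, has too many arguments, or names a disallowed path. */
static int split_direct_exec(char *cmd, char *argv[EXEC_MAX_ARGS + 1]) {
    int argc = 0;
    char *p = cmd;
    for (;;) {
        p += strspn(p, " \t\r\n");      /* skip separators */
        if (*p == '\0')
            break;
        if (argc == EXEC_MAX_ARGS)
            return -1;
        argv[argc++] = p;
        p += strcspn(p, " \t\r\n");     /* find end of this word */
        if (*p != '\0')
            *p++ = '\0';
    }
    argv[argc] = NULL;
    if (argc == 0 || !is_direct_bin_path(argv[0]))
        return -1;
    return argc;
}
```

Rejecting any `/` after the `/bin/` prefix is what keeps a request like `/bin/../sbin/x` from escaping the directory without needing path normalization.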
Acceptance. ./tests/sshd_transport_test.sh now keeps the HC5 negative-key
check, then authenticates with the HC5 key and executes both /bin/id and
/bin/echo HC6-OK over separate OpenSSH session channels. The host must see
principal=1(root) from /bin/id, HC6-OK from /bin/echo, and exit status
0 for both accepted commands.
HC5 SSHD authorized_keys loading preflight (2026-06-11)
- Replaced the hardcoded HC4 authorized public key in `/bin/sshd` with a small OpenSSH `authorized_keys` loader. The daemon now opens `/etc/ssh/authorized_keys`, parses `ssh-ed25519` public-key lines, base64 decodes the SSH public-key blob, and compares it with the client's offered key before accepting publickey authentication for `root`.
- Userauth signature verification now uses the public key from the client's authorized key blob instead of an embedded raw key. This keeps the signature check tied to the exact key material that matched `/etc/ssh/authorized_keys`.
- Added a new HC5 fixture key at `fixtures/ssh/sshd_hc5_ed25519(.pub)` and staged only its public key in the base image's `/etc/ssh/authorized_keys`. The older HC4 key remains as a negative test fixture.
Acceptance. ./tests/sshd_transport_test.sh now performs two OpenSSH
attempts against the same QEMU guest: the old HC4 key must fail with
Permission denied (publickey), then the HC5 key from
/etc/ssh/authorized_keys must authenticate, run
/bin/echo HC5-OK, print HC5-OK, and exit 0. The guest log must include
sshd: authorized key matched /etc/ssh/authorized_keys.
HC4 SSHD publickey session/exec preflight (2026-06-11)
- Extended `/bin/sshd` past transport-only KEX into a minimal authenticated SSH session path. It now reads encrypted client packets, accepts the dev `ssh-ed25519` public key for `root`, verifies the RFC 4252 publickey signature over the SSH session identifier and userauth request, opens an RFC 4254 `session` channel, handles `exec`, and sends channel stdout plus `exit-status`.
- The only supported command for this slice is direct `/bin/echo ...`. The daemon runs it through `spawn_handles` with stdout/stderr connected to a pipe, then returns the child's output as SSH channel data. This proves the SSH protocol path and the guest process/FD path without introducing shell parsing, PTY allocation, or scp/sftp yet.
- Added the HC4 OpenSSH fixture key at `fixtures/ssh/sshd_hc4_ed25519(.pub)` and staged the matching development public key in `/etc/ssh/authorized_keys` in the base image. At this slice, the daemon still compared the embedded raw dev key; persisted host keys, real entropy, and real authorized-key loading remained follow-up work.
- Added Swift userland bridges for `pipe` and raw `spawn_handles` so native tools can inherit explicit file handles when launching children.
Acceptance. ./tests/sshd_transport_test.sh boots QEMU with host TCP
forwarding to guest TCP/22, starts /bin/sshd, and drives it with host
OpenSSH using the fixture key. The host command
ssh ... root@127.0.0.1 /bin/echo HC4-OK must exit 0 and print HC4-OK; the
guest log must show publickey auth, session channel open, and
sshd: session exec completed status 0.
HC3 SSHD KEX transport preflight (2026-06-11)
- Extended `/bin/sshd` from an identification-only probe into a real SSH transport KEX preflight. It now negotiates `curve25519-sha256`, `ssh-ed25519`, OpenSSH strict KEX, and `chacha20-poly1305@openssh.com` with a normal OpenSSH client, signs the exchange hash with a development Ed25519 host key, sends NEWKEYS, and returns an encrypted SSH_MSG_DISCONNECT with the current auth/session limitation reason.
- This is still intentionally not a remote-login-capable SSH daemon. The host key seed and server KEX entropy are development-only, there is no persisted host-key store, and user authentication, PTY allocation, shell/session channels, scp/sftp, service supervision, and target-side SSH client support remain follow-up work.
- The SSHD Makefile rule now links the pure Swift SHA-256, SHA-512, Ed25519, X25519, and ChaCha20-Poly1305 sources into `/bin/sshd`.
Acceptance. ./tests/sshd_transport_test.sh boots QEMU with host TCP
forwarding to guest TCP/22, starts /bin/sshd, and drives it with host
OpenSSH. The transcript must show swift-os_sshd-kex,
curve25519-sha256, ssh-ed25519, chacha20-poly1305@openssh.com, strict KEX
sequence reset, and the encrypted kex preflight disconnect reason. The SSH
command still exits non-zero because auth/session are not implemented.
HC2 SSHD transport preflight (2026-06-11)
- Added `/bin/sshd` as a native Swift SSH server transport preflight. It opens a stream socket, binds guest TCP/22 by default, listens, accepts normal SSH clients, sends `SSH-2.0-swift-os_sshd-preauth`, reads the client's identification string, and sends a valid unencrypted SSH_MSG_DISCONNECT with an explicit pre-auth limitation reason.
- This is intentionally not a remote-login-capable SSH daemon. KEX, host keys, user authentication, PTY allocation, shell/session channels, scp/sftp, service supervision, and target-side SSH client support remain follow-up work. The next remote-login milestone should prove an authenticated host-to-guest command through the SSH session, likely by growing this first-party path or by landing a static Dropbear server port.
- The base image now stages `/bin/sshd`, and the focused QEMU test forwards a host loopback port to guest TCP/22, runs `/bin/sshd` from the root shell, and drives it with the host OpenSSH client.
Acceptance. ./tests/sshd_transport_test.sh requires the guest to log
sshd: listening on 22 (transport preflight), receive an SSH-2.0-... client
banner, and send the pre-auth disconnect. The host OpenSSH transcript must show
the swift-os_sshd-preauth remote software version and the transport preflight reason, while the SSH command still exits non-zero.
HC1 DHCPv4 cloud network preflight (2026-06-11)
- Added a minimal sans-IO DHCPv4 client codec in `kernel/net/dhcp.swift`. It builds DISCOVER/REQUEST Ethernet frames and parses BOOTP/DHCP replies for message type, `yiaddr`, server identifier, router, DNS, subnet mask, and lease time. The parser validates transaction ID when requested and always validates the client MAC in `chaddr`.
- The IPv4 `NetStack` path now accepts DHCP server replies on UDP 67 -> 68 by DHCP `chaddr`, including broadcast replies and unicast replies to a not-yet-configured lease address. Ordinary UDP/TCP/ICMP still require packets addressed to the current local IPv4.
- `netInit()` keeps the old QEMU/slirp constants as a fallback, then attempts DHCPv4 after virtio-net is live. On ACK it adopts the lease address, gateway, DNS, and subnet mask before the existing net-a ARP/ICMP probe. The boot log reports `net-dhcp OK: lease ... gateway ... dns ...` on success, otherwise it reports the static fallback.
- Hetzner Cloud preparation note: Hetzner documents Primary IPv4 as DHCP by default, with static `/32` examples using gateway `172.31.1.1`; Arm64 custom ISO/snapshot paths must match Arm64 servers. This slice is network readiness only. Remote login still needs an `sshd` milestone, likely Dropbear server-first, with SSH client support after or alongside the port if it stays small.
Acceptance. tests/net_test.swift now covers DHCP discover/request
construction, broadcast offer parsing, wrong-MAC rejection, and unicast ACKs to
a not-yet-configured address. make build verifies the DHCP codec under
Embedded Swift. The focused runtime gate is ./tests/virtio_net_test.sh, which
now observes DHCP before the existing ARP/ICMP proof under QEMU/slirp.
P19 OpenSSL seed package (2026-06-11)
- Added `ports/security/openssl/Port.json` for OpenSSL 3.5.7 LTS as the first checked TLS provider package. It packages the static `openssl` CLI and a marker file; static `libssl`/`libcrypto` development artifacts are deferred to an `openssl-dev` split package so the runtime package stays small enough for the current tmpfs-backed bootstrap installer.
- Added `scripts/build-openssl.sh`. The script cross-builds with the local newlib sysroot and SwiftOS compat headers, verifies the AArch64 ELF has no unresolved symbols, then publishes both the `.swpkg` and signed local repository fixture.
- The first static build disables shared libraries, DSO/modules, threads, async, engines, tests, docs, assembly, secure memory, and Linux/devcrypto engines. The QEMU package smoke uses `openssl version` and a deterministic `openssl dgst -sha256` check; entropy-heavy `rand`, certificate-chain, and live TLS client tests remain follow-up work.
- The ports seed repository now publishes Lua, zlib, bzip2, zstd, xz, libarchive, ca-certificates, OpenSSL, pcre2, tzdata, nginx, and sqlite. Package seed, static-host, hosted URL, catalog, recipe, and documentation tests were extended to search, install, and run OpenSSL inside QEMU.
P18 libarchive seed package (2026-06-11)
- Added `ports/archivers/libarchive/Port.json` for upstream libarchive 3.8.7 as the next checked archive tooling package. It packages static `bsdtar`, `libarchive.a`, public headers, pkgconf metadata, and a marker file.
- Added `scripts/build-libarchive.sh`. The script cross-builds against the local newlib sysroot and the checked zlib, bzip2, zstd, and xz package roots, then verifies the AArch64 ELF and publishes both the `.swpkg` and signed local repository fixture.
- The first static build disables external program filters and supplies a small SwiftOS compat shim for metadata calls that are not kernel-backed yet. Built-in gzip, bzip2, xz, and zstd filters are available through the packaged libraries.
- The ports seed repository now publishes Lua, zlib, bzip2, zstd, xz, libarchive, ca-certificates, pcre2, tzdata, nginx, and sqlite. Package seed, static-host, hosted URL, catalog, recipe, and documentation tests were extended to install libarchive, run `bsdtar --version`, and create/list a tiny tar archive inside QEMU.
P17 xz seed package (2026-06-10)
- Added `ports/archivers/xz/Port.json` for upstream XZ Utils 5.8.3 as the next checked archive-format package. It packages static `xz`/`unxz`/`xzcat`, `liblzma.a`, public headers, pkgconf metadata, and a marker file.
- Added `scripts/build-xz.sh`. The script cross-builds against the local newlib sysroot with scripts, NLS, docs, sandboxing, threading, assembler, dynamic-library paths, and CPU-specific CRC helpers disabled, then verifies the AArch64 ELF and publishes both the `.swpkg` and signed local repository fixture.
- The ports seed repository now publishes Lua, zlib, bzip2, zstd, xz, ca-certificates, pcre2, tzdata, nginx, and sqlite. Package seed, static-host, hosted URL, catalog, recipe, and documentation tests were extended to install xz and run a compression round trip.
P16 zstd seed package (2026-06-10)
- Added `ports/archivers/zstd/Port.json` for upstream zstd 1.5.7 as the next checked archive-format package. It packages single-threaded static `zstd`/`unzstd`/`zstdcat`, `libzstd.a`, public headers, pkgconf metadata, and a marker file.
- Added `scripts/build-zstd.sh`. The script cross-builds against the local newlib sysroot with threading, gzip, lzma, lz4, assembly, and backtrace integrations disabled, then verifies the AArch64 ELF and publishes both the `.swpkg` and signed local repository fixture.
- The ports seed repository now publishes Lua, zlib, bzip2, zstd, ca-certificates, pcre2, tzdata, nginx, and sqlite. Package seed, static-host, hosted URL, catalog, recipe, and documentation tests were extended to install zstd and run a compression round trip.
P15 bzip2 seed package (2026-06-10)
- Added `ports/archivers/bzip2/Port.json` for Sourceware bzip2 1.0.8 as the next checked archive-format package after zlib. It packages static `bzip2`/`bunzip2`/`bzcat`/`bzip2recover`, `libbz2.a`, `bzlib.h`, pkgconf metadata, and a marker file.
- Added `scripts/build-bzip2.sh`. The script performs a manual static AArch64 object build against the local newlib sysroot because the upstream makefile's link ordering is not suitable for the current freestanding runtime shape. A tiny generated compat shim supplies metadata calls bzip2 expects but SwiftOS does not implement yet.
- The ports seed repository now publishes Lua, zlib, bzip2, ca-certificates, pcre2, tzdata, nginx, and sqlite. Package seed, static-host, hosted URL, and recipe tests were extended to install bzip2 and run a compression round trip.
nginx compile probe (2026-06-08)
- Added `scripts/build-nginx.sh` as an out-of-band compile probe. It fetches official nginx source, defaults to stable `NGINX_VERSION=1.30.2`, allows env override, extracts under `userland/nginx`, logs to `build/nginx-build.log`, and configures a minimal static HTTP build with poll events while disabling PCRE/rewrite, gzip/zlib, OpenSSL, cache, proxy/upstream-heavy modules, mail/stream, and dynamic-module paths where upstream options allow it. The script builds a local compiler wrapper so nginx links with `crt0_newlib.o`, `newlib_syscalls.o`, `compat_stubs.o`, newlib, libm, and libgcc.
- The nginx-local overlay in `userland/nginx/swiftos/` keeps the scaffold out of the shared compat ABI: a tiny patch preserves `aarch64` in nginx `--crossbuild=SwiftOS:0:aarch64`, and local headers describe source-level shapes for `glob.h`, `sys/uio.h`, and `netinet/tcp.h` so future probes reach link/syscall gaps instead of first failing on missing headers.
- Local run result in this worktree: after `make newlib`, `NGINX_CLEAN=1 ./scripts/build-nginx.sh` downloads/extracts/configures/builds nginx and emits `build/nginx.elf` (ELF64 AArch64 EXEC, entry `0x80000000`, no undefined symbols in `aarch64-elf-nm -u`). The probe forces `NGX_HAVE_MAP_ANON` after configure because swift-os has anonymous `SYS_MMAP`, but nginx cannot run its mmap feature test while cross-building.
- API gaps closed for the compile probe: vectored I/O (`readv`, `writev`, `pwritev`), IPv4 socket and DNS wrappers, minimal IPv6 header helpers, TCP options including `TCP_NODELAY`, low-water socket options, `O_NONBLOCK` on TCP accept/read via `fcntl(F_SETFL)`, UTC-only time aliases and `_gettimeofday`, `getrlimit`/`setrlimit`, process/signal shape expected by nginx, anonymous `mmap`/`munmap`, `chown`, `utimes`, `setitimer`, `gethostname`, `initgroups`, and nginx control-message header shapes.
- Runtime caveats: `sendmsg`/`recvmsg` fd passing still returns `ENOSYS`, `setitimer` is a no-op, and nginx has not been added to the boot image or exercised under QEMU. `sleep`/`usleep`/`nanosleep` now use the timer-backed `SYS_NANOSLEEP` path from main. The expected first runtime configuration should still be single-process (`daemon off; master_process off;`) until master/worker channel fd passing is real.
Environment (host) — captured 2026-06-04
Host: macOS (Darwin 25.5.0), Apple Silicon (arm64, T6050).
| Tool | Status | Version / notes |
|---|---|---|
| `swift` / `swiftc` | present | Apple Swift 6.3.2 (Command Line Tools only) |
| Embedded Swift | missing | CLT does not ship the embedded stdlib; arm64-apple-none-elf fails to load |
| `clang` | present | Apple clang 21 (Darwin target only; no ELF cross out of the box) |
| `qemu-system-aarch64` | missing | available via Homebrew |
| `lld` / `llvm-objcopy` | missing | available via Homebrew (llvm, lld) |
| `aarch64-elf-binutils` | missing | available via Homebrew |
| `aarch64-elf-gdb` | missing | available via Homebrew; lldb is present and can do remote aarch64 |
| `make`, `git` | present | - |
| Network | up | Homebrew (/opt/workbrew/bin/brew) usable |
Resolution (installed 2026-06-04)
- Brew tools installed: `qemu` 11.0.1, `llvm` 22.1.6 (clang + `llvm-objcopy`), `aarch64-elf-binutils` (`aarch64-elf-ld`), `aarch64-elf-gdb`.
- Embedded Swift toolchain: swift.org 6.3.2-RELEASE, extracted user-locally (no sudo) to `~/Library/Developer/Toolchains/swift-6.3.2-RELEASE.xctoolchain` via `pkgutil --expand-full`. It ships `usr/lib/swift/embedded/` including the `aarch64-none-none-elf` target, which is exactly what we build for.
- Pinned target triple: `aarch64-none-none-elf`.
- Pinned Embedded Swift flags: `-target aarch64-none-none-elf -enable-experimental-feature Embedded -wmo -parse-as-library -Osize -Xllvm -mattr=+strict-align,-neon -Xfrontend -function-sections -import-objc-header kernel/arch/aarch64/io.h`
  - `+strict-align,-neon` is an early-boot guardrail: with the MMU off, QEMU can fault on unaligned SIMD accesses that Swift may otherwise generate for ordinary value copies.
- Linker: `ld.lld` (`/opt/homebrew/opt/lld/bin/ld.lld`, `--gc-sections -nostdlib -T kernel.ld`). Switched from GNU `aarch64-elf-ld` at M4.5: as soon as kernel code uses a Swift `Array`/`String`, the compiler emits references to protected-visibility runtime singletons (`$es23_swiftEmptyArrayStorage...`). GNU ld rejects these with "copy relocation against non-copyable protected symbol"; `ld.lld` resolves them directly. lld is the linker the Embedded Swift toolchain expects, so this also removes the spurious RWX-segment warning.
- MMIO: volatile access via C inlines in `kernel/arch/aarch64/io.h` (bridging header). The toolchain also ships a `_Volatile` embedded module, a possible modern refinement later.
Toolchain gap analysis (historical — resolved above)
- Embedded Swift stdlib is the blocker. The Command Line Tools toolchain does not include the embedded stdlib for bare-metal ELF targets. Options:
  - Install a swift.org open-source toolchain (`.pkg`) that ships `usr/lib/swift/embedded/`, used via `xcrun --toolchain`/`TOOLCHAINS=`.
  - Install full Xcode (ships embedded resources). Decision pending; see "Open decisions."
- C cross-compiler + linker: use Homebrew `llvm` (clang can target `aarch64-none-elf` with `-ffreestanding`) plus `lld` (`ld.lld`) and `llvm-objcopy`. `aarch64-elf-binutils` is a fallback linker/objcopy.
- Emulator: Homebrew `qemu` (`qemu-system-aarch64`).
- Debugger: Homebrew `aarch64-elf-gdb`, or host `lldb` over the QEMU gdbstub.
Planned install (pending confirmation)
brew install qemu llvm lld aarch64-elf-binutils aarch64-elf-gdb
# Swift toolchain with Embedded Swift: install a swift.org toolchain (.pkg) — see ARCHITECTURE/decision.
Hardware constants (QEMU virt, aarch64) — verify against QEMU source per version
- RAM base: `0x4000_0000`.
- UART: PL011 @ `0x0900_0000` (MMIO).
- Interrupt controller: GICv2 (`arm,cortex-a15-gic`) verified from QEMU 11.0.1 DTB: distributor @ `0x0800_0000`, CPU interface @ `0x0801_0000`.
- ARM generic timer: DTB `arm,armv8-timer`; physical timer PPI is interrupt ID 30 (`interrupts = <0x01 0x0e ...>`).
- Block/etc devices: virtio-mmio.
- Boot: `-kernel <image>`, entry at EL1.

Re-confirm with `qemu-system-aarch64 -M virt,dumpdtb=...` + `dtc`, or the QEMU `hw/arm/virt.c` memory map, when QEMU or machine options change.
Early virtual memory (M3)
- Translation regime: EL1 stage-1, TTBR0 only, 4 KiB granule, 48-bit VA (`T0SZ=16`), 36-bit PA (`IPS=1`), TTBR1 walks disabled for now.
- MAIR slots:
  - AttrIdx 0: normal write-back/write-allocate cacheable memory (`0xff`).
  - AttrIdx 1: Device-nGnRnE (`0x00`).
- Initial mappings are identity mappings:
  - `0x0000_0000..0x3fff_ffff` as device memory for early MMIO.
  - `0x4000_0000..0x7fff_ffff` as normal memory for RAM/kernel.
- A scratch L3 table under VA `0x8000_0000` is reserved for M3 page map/unmap tests.
- RAM identity mapping is executable during bring-up; device and scratch pages are XN.
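The register values implied by the M3 settings can be worked out with a few lines of C. This is an arithmetic sketch, not the kernel's setup code: the function names are hypothetical, and `m3_tcr` encodes only the four fields named above (T0SZ, TG0, EPD1, IPS); the cacheability and shareability fields a real `TCR_EL1` value also needs are left out.

```c
#include <stdint.h>

/* MAIR_EL1 packs eight 8-bit attribute slots; slot n is bits [8n+7:8n]. */
static uint64_t mair_slot(unsigned idx, uint8_t attr) {
    return (uint64_t)attr << (8 * idx);
}

static uint64_t m3_mair(void) {
    return mair_slot(0, 0xff)   /* AttrIdx 0: normal WB/WA cacheable */
         | mair_slot(1, 0x00);  /* AttrIdx 1: Device-nGnRnE */
}

/* Only the TCR_EL1 fields the notes mention; other fields are omitted. */
static uint64_t m3_tcr(void) {
    uint64_t t0sz = 16;  /* 64 - 48 VA bits, TCR_EL1[5:0] */
    uint64_t tg0  = 0;   /* 0b00 = 4 KiB granule, TCR_EL1[15:14] */
    uint64_t epd1 = 1;   /* disable TTBR1 walks, TCR_EL1[23] */
    uint64_t ips  = 1;   /* 0b001 = 36-bit PA, TCR_EL1[34:32] */
    return t0sz | (tg0 << 14) | (epd1 << 23) | (ips << 32);
}

/* Table index for a VA at level 0..3 with a 4 KiB granule and 48-bit VA:
 * each level consumes 9 bits, starting at bit 39 for level 0. */
static unsigned va_index(uint64_t va, unsigned level) {
    return (unsigned)((va >> (39 - 9 * level)) & 0x1ff);
}
```

With these definitions the RAM base `0x4000_0000` lands in L1 slot 1 and the scratch VA `0x8000_0000` in L1 slot 2, which is consistent with the device range occupying L1 slot 0 as one 1 GiB region.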
Syscall ABI (M5)
- EL0 syscall entry is `svc #0`.
- `x8` holds the syscall number.
- `x0`...`x2` hold the first three arguments.
- Return value is written back to `x0`.
- Implemented bring-up calls:
  - `1 open(path, flags)`: supports `/hello.txt`, read-only.
  - `2 read(fd, buffer, count)`: reads from fd 3.
  - `3 write(fd, buffer, count)`: writes fd 1/2 to UART.
  - `4 close(fd)`: closes fd 3.
  - `5 exit(status)`: records M5 success.
  - `6 lseek(fd, offset, whence)`: implemented for fd 3.
- M7 additions: `2 read` from fd 0 is served by the tty; `5 exit` unwinds an active process to the kernel; `7 tcgetattr`/`8 tcsetattr`; `9 sigaction`; `10 kill`; `11 getpid`.
- M8d additions include process control plus `22 psinfo(buffer, capacity)`: copies fixed 32-byte process records (`pid`, `ppid`, state, short command name) for userland tools such as `/bin/ps`.
- busybox vi addition: `33 ftruncate(fd, length)`: resize a writable tmpfs file (busybox vi writes with `O_CREAT` without `O_TRUNC`, then `ftruncate`s to the exact length). Growth zero-fills up to the node's capacity; shrink updates the length. Read-only/base files and directories are rejected.
- `/bin/top` additions: `46 sysinfo(buffer)` copies a 64-byte system-stats blob (uptime ticks, idle ticks, total/free RAM bytes, kernel image/heap bytes, tick rate, process counts); `47 procstat(buffer, capacity)` copies richer 56-byte per-process records (`pid`, `ppid`, state, principal, CPU ticks, start tick, resident bytes, name[16]). The 32-byte `22 psinfo` record is left unchanged so `/bin/ps` is unaffected.
Build / run commands (verified at M9)
- `make build`: assemble `boot.S`, compile Swift (WMO) to one object, link with the script, emit `build/kernel.elf` (+ `kernel.bin`).
- `make run`: `qemu-system-aarch64 -M virt -cpu cortex-a72 -m 256M -nographic -kernel build/kernel.elf`. Exit QEMU serial with `Ctrl-A X`.
- `make debug`: same + `-s -S` (paused, gdbstub on `:1234`). Then `make gdb` (or lldb) in another shell.
- `make test`: host page-allocator unit test, userland ELF sanity check, then QEMU boot asserts (M6: `hello from ELF userland` + exit code 7) and a scripted interactive tty test (M7: echo + Ctrl-C/SIGINT interruption).
- `make clean`: remove build artifacts.
Track B — mmap/munmap/mprotect + W^X
The last "common denominator" in the long-horizon table (docs/ARCHITECTURE.md): anonymous
mmap with W^X-enforced executable mappings, the substrate JIT runtimes (V8, the JVM) and
large Swift apps need. Built on the kernel/mm/vm.swift seams (walkToL3, linkPage,
memAttrs/protPageDesc).
B1 — anonymous mmap/munmap (DONE, 2026-06-07)
- protPageDesc(pa, prot) in vm.swift builds a 4 KiB leaf from a PROT bitmask (READ=1/WRITE=2/EXEC=4) via memAttrs(userAccess: true, executable: prot&EXEC, userReadOnly: !(prot&WRITE)). W^X (WRITE|EXEC) and PROT_NONE both return an invalid descriptor (0). Since NPM8, the process-layer mmap path handles PROT_NONE VA-only reservation above this leaf layer; protPageDesc still never creates a present-but-inaccessible page.
- mmap VA arena: chosen base 0x9800_0000, growing DOWN (floor 0x9000_0000). The valid user window is [0x8000_0000, 0xB000_0000) (user_access.swift). Within it: the ELF image sits at 0x8000_0000 growing up (busybox ~1.1 MiB, far short of 0x8800_0000); the 16-page user stack is at the top of [0x8FFF_0000, 0x9000_0000); the sbrk heap is at 0xA000_0000 growing up. That leaves a 256 MiB hole between the stack top (0x9000_0000) and the heap base (0xA000_0000). The mmap arena is parked at the midpoint (0x9800_0000) and grows down, so it keeps 128 MiB of clearance above the stack top and 128 MiB below the heap base; it cannot collide with code, data, stack, or heap. The cursor (pMmapTop) is per-process: reset on exec, copied on fork (the eager clone duplicates mmap'd pages too), seeded from the creator for a thread.
- address_space_mmap/munmap (vm.swift) do the frame work given an aligned base VA + page count from process.swift: pmm_alloc_page each, zero the frame (anonymous memory reads as 0), linkPage(protPageDesc(...)), one bulk dsb; tlbi. A mid-map failure rolls back every frame already linked, so a failed mmap leaves no partial region. munmap clears leaves + frees frames (page tables kept; reclaimed at process exit). The kernel policy/accounting half is processMmap/processMunmap (cursor, pResPages, validation).
- Syscalls: mmap = 54 (returns base VA, or a small negative errno in [-4095,-1] encoded in the result; the bridge maps that to MAP_FAILED), munmap = 55. Bridges swiftos_mmap/swiftos_munmap in swift_user.{h,c}; POSIX-shaped mmap/munmap inlines + PROT_*/MAP_* in syscall.h.
- Test: userland/mmapdemo.swift (/bin/mmapdemo) maps anonymous RAM, asserts it reads as 0, round-trips a write/read pattern across a page boundary, munmaps. tests/mmap_test.sh (in make test). NOTE: syscall numbers 54/55 were next-free at implementation time; other concurrent sessions may also be adding syscalls to main, so renumber at merge if they clash.
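Sketch of the grow-down arena cursor described in B1, for reference only. The base and floor values come from the notes; struct mmap_arena, mmap_arena_reset and mmap_arena_reserve are hypothetical names, and the real policy code lives in processMmap (Swift), which may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Arena constants as stated in the B1 notes. */
#define MMAP_ARENA_BASE  0x98000000ull  /* cursor start, grows down */
#define MMAP_ARENA_FLOOR 0x90000000ull  /* user stack top; hard lower bound */
#define PAGE_SIZE_4K     4096ull

/* Hypothetical stand-in for the per-process cursor (pMmapTop in the notes). */
struct mmap_arena {
    uint64_t top;
};

/* exec resets the cursor; fork would copy the struct by value. */
void mmap_arena_reset(struct mmap_arena *a)
{
    assert(a != 0);
    a->top = MMAP_ARENA_BASE;
}

/* Reserve `pages` contiguous pages just below the cursor.
 * Returns the base VA of the new region, or 0 if the request is empty
 * or would cross the floor. The cursor only moves on success. */
uint64_t mmap_arena_reserve(struct mmap_arena *a, uint64_t pages)
{
    assert(a != 0);
    uint64_t room = (a->top - MMAP_ARENA_FLOOR) / PAGE_SIZE_4K;
    if (pages == 0 || pages > room)
        return 0;
    a->top -= pages * PAGE_SIZE_4K;
    return a->top;
}
```

Checking the request against the remaining room before moving the cursor keeps a failed reservation side-effect free, which matches the "no partial region" rule above.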
B2 — mprotect + W^X (DONE, 2026-06-07)
- address_space_mprotect (vm.swift) changes the PROT bits over a range, preserving each page's backing frame: walkToL3(allocate: false), rebuild the leaf from the same PA via protPageDesc, rewrite it, dsb; tlbi. It pre-validates the whole range (every page must be mapped) before touching any leaf, so a hole is rejected (ENOMEM) without leaving a partially-changed region. processMprotect adds the cursor/arena bounds + alignment checks.
- W^X is enforced at BOTH ends: at the syscall boundary (processMmap/processMprotect reject PROT_WRITE|PROT_EXEC → EINVAL) and defensively inside protPageDesc (a W^X or PROT_NONE bitmask yields an invalid descriptor, so even a direct address_space_* caller can never install a writable+executable or present-inaccessible leaf). So a page is never simultaneously W and X.
- Syscall mprotect = 56; mprotect inline in syscall.h, bridge swiftos_mprotect.
- Test, the JIT pattern (/bin/mmapdemo, tests/mmap_test.sh): mmap a page RW, write mov w0,#42; ret (bytes 40 05 80 52 c0 03 5f d6), mprotect RW→RX (must succeed), call it through a @convention(c) function pointer → returns 42. Then assert both W^X breaches are rejected: mmap RWX fails, and mprotect →RWX on a live mapping fails. Verified in QEMU:
    mmapdemo: B1-OK anon mmap zero+write+read+munmap
    mmapdemo: B2-OK jit RW->RX call returned 42
    mmapdemo: WX-OK mprotect ->RWX rejected
    mmapdemo: WX-OK mmap RWX rejected
    mmapdemo: ALL-OK
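Sketch of the defensive W^X rule at the leaf layer, assuming only what the notes state (READ=1/WRITE=2/EXEC=4, W|X and PROT_NONE are invalid). struct leaf_attrs and prot_to_leaf_attrs are hypothetical; the real protPageDesc builds an AArch64 descriptor via memAttrs rather than a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PROT bit values as given in the notes (READ=1, WRITE=2, EXEC=4). */
enum { SK_PROT_READ = 1, SK_PROT_WRITE = 2, SK_PROT_EXEC = 4 };

/* Hypothetical, simplified view of the attributes behind a leaf descriptor. */
struct leaf_attrs {
    bool valid;           /* false: the caller must not install a leaf */
    bool executable;
    bool user_read_only;
};

struct leaf_attrs prot_to_leaf_attrs(uint32_t prot)
{
    struct leaf_attrs a = { false, false, false };
    bool w = (prot & SK_PROT_WRITE) != 0;
    bool x = (prot & SK_PROT_EXEC) != 0;

    /* PROT_NONE and WRITE|EXEC never produce a present leaf. */
    if (prot == 0 || (w && x))
        return a;

    a.valid = true;
    a.executable = x;
    a.user_read_only = !w;
    return a;
}

/* Usage check covering the cases called out in the notes. */
void prot_self_check(void)
{
    assert(!prot_to_leaf_attrs(0).valid);                               /* PROT_NONE */
    assert(!prot_to_leaf_attrs(SK_PROT_WRITE | SK_PROT_EXEC).valid);    /* W^X */
    assert(prot_to_leaf_attrs(SK_PROT_READ | SK_PROT_EXEC).executable); /* RX is fine */
}
```

Keeping the check in the lowest layer means a caller that bypasses the syscall-boundary validation still cannot obtain a writable and executable leaf.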
Milestone log
L0 (2026-06): kernel log facade. Introduced kernel/log/log.swift with LogLevel, klog(level, source, message) and klogInfo. Renders as [tick] [L] source: message to UART (and fb mirror). timerGetTicks() published from the timer. The facade is additive: all existing "Mxx OK:" / "panic:" banners were left untouched so every test expectation continues to match. One demo line (L0 kernel logger active) was added after timerInit and asserted in boot_test.sh. make build + real QEMU boot verified the line appears on serial. See the full plan, rationale (future central AI log collector), and design in docs/LOGGING.md. This is the first slice of the observability work called for in PHILOSOPHY.md and RISK_REMEDIATION_ROADMAP.md.

L1 (2026-06): log ring buffer + dump. Added a fixed 256-entry ring of LogEntry (tick + level + source + StaticString message) with circular overwrite. logDumpRecent(n) replays the most recent entries (oldest of the window first). kpanic now stores + dumps the tail (~24 entries) after the panic banner. A logDumpRecent(5) call was placed late in the kernel demo sequence so the ring is exercised on every test boot; the dump header is asserted in boot_test.sh. Ring and dump are allocation-free and safe on panic/IRQ-masked paths. Pre-existing banners unchanged. See docs/LOGGING.md.

L2 (2026-06): runtime min-level filtering. Added global minLogLevel (defaults to .info). klog drops sub-minimum messages (both UART and ring storage); .panic is never dropped. New klogSetMinLevel/klogGetMinLevel. Early boot now emits "level filtering active (min INFO)" (asserted in boot_test) plus a .debug example that is suppressed by default. This gives a runtime knob for quieter production images while keeping the ability to open the logs for diagnostics or the future central collector. The filtering decision is made before ringStore. See docs/LOGGING.md.

L3 (2026-06): structured records foundation. Extended LogEntry with detail: UInt64 (0=none). Updated ring initialiser, ringStore, klog (now accepts optional trailing detail: UInt64 = 0 so 3-arg calls are unaffected) and logDumpRecent dump formatting (appends " detail=NNN" when nonzero). Added real example uses (post-heap safe): klog(..., "timer", "tick rate (Hz)", 100) after timerInit, klog(..., "pmm", "free frames", UInt64(count)) in main.swift reclaim demo, and scheduler capacity detail in schedulerInit. boot_test.sh now asserts representative detail=100 and detail=4 payloads. See docs/LOGGING.md L3 entry and phased plan.

L3 adoption (2026-06): klog population for ring value. Moved or mirrored key boot events into klog(.info, "sched"/"platform"/"boot"/"disk"/"vfs", msg) while keeping message text recognizable. The platform discovery marker remains an early UART line in platformInit and is mirrored with klog after timerInit, preserving the logger's safe post-runtime startup point. Scheduler online/context-switch, reclaim start/OK, Swift ps launch, M11b disk OK, and M11c VFS base mount now populate the L1 ring (useful for logDumpRecent panic tails and future AI correlation) without touching panics or userland. Updated affected ASSERT strings in tests/boot_test.sh EXPECTS to stable prefixed substrings (e.g. "[I] boot: reclaim OK...") that match the new [tick] [I] source output. See docs/LOGGING.md (L-plan).

L4a (2026-06): ring context enrichment. Extended LogEntry with process/security context captured at emit time: pid: Int32 (0 = kernel/no current process) and principal: UInt32 (1 = boot/root principal). klog now records this context in the ring via the existing processCurrentPid()/processCurrentPrincipal() accessors after L2 filtering; live UART output stays in the L0 format. logDumpRecent appends pid=N principal=M only for non-kernel contexts, while preserving L3 detail=... payloads. Added a ring-only psinfo syscall event via klogRing, kept the demo dump window compact while preserving early details, and updated boot_test.sh to assert a real EL0 context suffix. See docs/LOGGING.md.

L4b (2026-06): per-source runtime filtering. Added a tiny fixed override table in kernel/log/log.swift for exact source-tag minimum levels. klogSetSourceMinLevel(source, level) sets/replaces an override, klogClearSourceMinLevels() clears all overrides, and filtering now prefers the source override before falling back to the global minLogLevel; .panic still bypasses filtering. The shared acceptance path is used by both live klog and ring-only klogRing, so suppressed records do not reach UART or the ring. Boot now demonstrates this on a dedicated log_filter source without affecting scheduler/detail acceptance: the .info demo is forbidden in boot_test.sh, while the .error demo must appear. See docs/LOGGING.md.

L4c (2026-06): wire-format serialization. Added allocation-free ring serialization in kernel/log/log.swift: logFormatRecentTail(maxCount, into:capacity:) writes recent records into a caller-provided byte buffer as newline-separated key=value entries (tick=N level=I source=tag msg="text" plus optional detail=N and pid=N principal=N). The formatter includes the L3 detail and L4a context fields, shares the ring's oldest-first tail semantics, and has no UART side effects. Boot now records a ring-only log_export marker, emits a small LOG-EXPORT-BEGIN/LOG-EXPORT-END sample after logDumpRecent, and boot_test.sh asserts both a context-rich psinfo serialized line and the export marker line. This remains an internal formatter, not a user-visible device or remote protocol. See docs/LOGGING.md.

L4d (2026-06): log sink indirection + capability hook. Live klog output now routes through a tiny current-sink dispatch in kernel/log/log.swift; the default and only implemented sink remains UART, but klog no longer embeds the UART renderer inline. Added reserved capLogExport in kernel/security/security.swift (not granted to the boot/root context by default) plus klogCanInstallSink(capabilities:)/klogCanExportRing(capabilities:) hook helpers for the future userland log service/export path. Boot asserts both sink indirection active and sink capability hook active, while preserving the existing live line spelling and L4c wire-format sample. See docs/LOGGING.md.

L5a (2026-06): capability-gated userland log tail export. Added SYS_LOG_READ (77), which copies the allocation-free logFormatRecentTail output into a user buffer only when the caller holds capLogExport; callers without the bit receive EPERM. The native Swift bridge now exposes swiftos_log_read, /bin/logtail [max-records] prints the local key=value ring tail, and /bin/logtail-probe is an acceptance helper that proves denial under the seeded root mask (0x3f) and success after an explicit admin-context SYS_LOGIN grant of capLogExport. make log-export-test boots QEMU, verifies the denial, verifies exported tick=/level=/source=/msg= records after the grant, and confirms the shell survives.

L5b (2026-06): capability-gated log ring stats export. Added SYS_LOG_STATS (82), which copies a fixed 32-byte stats record (capacity, available, total_written, overwritten) only when the caller holds capLogExport. The kernel ring now tracks total accepted records since boot, /bin/logtail --stats prints the local ring counters, and /bin/logtail-probe validates both denial before the grant and stats shape after the explicit capLogExport grant. make log-export-test now covers tail export and stats export together; docs/SMP_STATE_AUDIT.md records the new ringTotalWritten mutable global.

FP1 (2026-06): lower-EL FP/SIMD trap-frame preservation. Expanded the lower-EL trap frame in kernel/arch/aarch64/exceptions.S from the integer register/return-state frame to a full frame that also saves and restores q0..q31, FPCR, and FPSR. fork() now copies the full frame so children inherit the interrupted FP state correctly. This fixes nondeterministic Q8 inference when /bin/llmd is preempted while the default sshd service is also running. Acceptance: make build, host llm_engine_test/llm_q8_engine_test, and ./tests/llm_serve_test.sh with the default base image all pass; the diagnostic no-service image is no longer needed.

M9 (2026-06-04): DONE. HAL + runtime hardware discovery from a flattened device tree. Added a pure Swift FDT reader with host coverage, a global Platform populated at boot, and driver/PMM use of discovered UART/GIC/RAM values. make run/make test now dump QEMU's actual virt DTB and load it into the direct-boot fallback address (0x4FF0_0000 for -m 256M); boot asserts M9 OK: hardware discovered from device tree. The parser avoids large unaligned value-copy layouts in the early boot path because strict alignment checks are active.

M8 (2026-06-04): DONE, toward busybox. Staged sub-milestones; libc strategy = cross-build newlib.
- Swift /bin/ps utility: DONE. Added SYS_PSINFO (22), short process names captured from argv[0], and an Embedded Swift EL0 utility (userland/ps.swift) linked through a tiny C syscall/runtime bridge. /bin/ps is embedded in the kernel image and asserted in boot_test.sh. Supported syntax with today's process data: ps, ps -e, ps -A, ps -ef, ps ax, ps aux, ps -aux, ps -p pid[,pid...], and ps -o pid,ppid,state,stat,user,uid,cmd (plus aliases comm/command/args for cmd). CPU, memory, tty, and time columns need more kernel accounting.
- (a1) Full trap frame: DONE. exceptions.S now saves/restores a complete frame (x0..x30 + SP_EL0/ELR_EL1/SPSR_EL1 plus FP/SIMD q0..q31 and FPCR/FPSR) on every lower-EL entry, making exceptions nestable. This resolves the M7 constraint: read(0) is back to a clean enable_irq + wfi block (validated: it panicked before the frame, passes now), and it unblocks preemptive EL0 scheduling. No regressions: M5/M6/M7 green.
- (a2-argv) Process arguments: DONE. ustack.c builds the SysV AArch64 entry stack (argc/argv/envp/auxv) at the top of the process's user stack; crt0.S reads argc from [sp], argv from sp+8, and computes envp. processRunElf takes packed NUL-separated args; packArgs builds them in Swift. New argvdemo prints its argv (argv[0]=argvdemo argv[1]=alpha argv[2]=beta, exits argc=3). boot_test.sh generalized to assert a list of lines (M6 + M8a argv).
- (a2-spawn) Nested process launch: DONE. Process runs are now a depth stack: process.swift tracks per-level return context, child address space, and exit status, and unwinds the innermost level to its launcher on SYS_exit/signal, restoring the parent's TTBR0. New spawn(path, argv) syscall (12) resolves an embedded program (exec.swift built-in table) and runs it synchronously (= fork+exec+wait, since we have no COW), returning the child's exit status; waitpid (13) is a stub (ECHILD) because spawn is synchronous. Demo: spawndemo (EL0) spawns /bin/argvdemo (own address space), gets status 2, continues, proving the shell-launches-command model.
- (b) Real VFS: DONE. vfs.swift rewritten as a fixed vnode table (parent/child/sibling inode tree) with a read-only base (/, /bin, /etc/{motd,hostname}, /readme.txt, /hello.txt) and a writable tmpfs at /tmp. Implements open (incl. O_CREAT in tmpfs), read, write (tmpfs + stdout/stderr), close, lseek, stat/fstat (14/15), getdents (16), chdir (17), getcwd (18); path resolution handles absolute/relative paths plus the . and .. components. Userland lib/fs.h mirrors the stat/dirent layouts. Demo fsdemo lists /, cats /etc/motd, stats, chdir /etc + getcwd, and round-trips a /tmp/note file; all asserted in boot_test.sh.
- (c1) User heap via sbrk: DONE. Per-process heap region at 0xA000_0000; sbrk(incr) syscall (19) grows it on demand, mapping pages from the PMM into the process address space (tracked per nesting level in process.swift). brkdemo writes/reads across a page boundary → OK. This is the foundation newlib's malloc/_sbrk will use.
- (c2) newlib port: DONE. Cross-built newlib 4.6.0.20260123 for aarch64-elf with the Homebrew aarch64-elf-gcc 16.1.0 toolchain (--disable-newlib-supplied-syscalls), installed under ./sysroot (gitignored; reproducible via scripts/build-newlib.sh / make newlib). libgloss is not used. Our bottom end: userland/lib/newlib_syscalls.c implements _read/_write/_open/_close/_lseek/_fstat/_stat/_isatty/_sbrk/_exit/_kill/_getpid + environ over the svc ABI; crt0_newlib.S passes argv and calls newlib exit() (flushes stdio); user_newlib.ld uses PHDRS for separate RX/RW segments (no RWX) so newlib's writable globals work. newlibtest (built with aarch64-elf-gcc) runs printf, malloc/free, and fopen/fgets of /etc/motd on the OS; all pass. Prerequisite: run make newlib once before make build (kernel embeds the newlib program).
- Remaining: (d) process subsystem + cross-build busybox; (e) run sh. Decisions (locked): eager-copy fork (no COW) + execve + real waitpid + preemptive EL0 multitasking; busybox config minimal (ash + ls/cat/echo only).
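Sketch of the (a2-argv) entry-stack layout, assuming the usual SysV shape: argc at [sp], then the argv pointers, a NULL, the envp pointers, a NULL, then auxv pairs ending in AT_NULL. This is not ustack.c. build_entry_stack and put64 are hypothetical, envp is left empty, and auxv holds only the AT_NULL terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store one 64-bit word at byte offset `off` without assuming alignment. */
static void put64(uint8_t *mem, size_t off, uint64_t v)
{
    memcpy(mem + off, &v, sizeof v);
}

/* Build the entry stack inside `mem`, a kernel-side view of the top
 * `mem_size` bytes of the user stack (mem[mem_size] corresponds to `top_va`).
 * `packed` holds argc NUL-terminated strings back to back.
 * Returns the initial SP as a user VA, or 0 if the block does not fit. */
uint64_t build_entry_stack(uint8_t *mem, size_t mem_size, uint64_t top_va,
                           const char *packed, size_t packed_len)
{
    assert(mem_size % 16 == 0 && top_va % 16 == 0);
    if (packed_len == 0 || packed_len > mem_size ||
        packed[packed_len - 1] != '\0')
        return 0;

    uint64_t argc = 0;
    for (size_t i = 0; i < packed_len; i++)
        if (packed[i] == '\0')
            argc++;

    /* argc, argv[argc], NULL, envp NULL, auxv {AT_NULL, 0} */
    size_t words = 1 + (size_t)argc + 1 + 1 + 2;
    if (words * 8 + packed_len + 16 > mem_size)
        return 0;

    uint64_t base_va = top_va - mem_size;       /* VA of mem[0] */
    size_t str_off = mem_size - packed_len;     /* strings at the very top */
    memcpy(mem + str_off, packed, packed_len);

    size_t sp_off = (str_off - words * 8) & ~(size_t)15;  /* 16-byte aligned SP */
    size_t off = sp_off;
    put64(mem, off, argc); off += 8;
    for (size_t s = 0; s < packed_len; s += strlen(packed + s) + 1) {
        put64(mem, off, base_va + str_off + s); off += 8;  /* argv[i] */
    }
    put64(mem, off, 0); off += 8;   /* argv terminator */
    put64(mem, off, 0); off += 8;   /* empty envp */
    put64(mem, off, 0); off += 8;   /* AT_NULL type */
    put64(mem, off, 0);             /* AT_NULL value */
    return base_va + sp_off;
}
```

With this layout the startup code can find envp at sp + 8 + (argc + 1) * 8, which is the computation the crt0.S note refers to.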
M8d plan — process subsystem for fork/exec/wait + busybox (the finale)
This is the largest single step: it replaces the current synchronous nested process model (all demos
call processRunElf and get a return value) with a real process table + preemptive EL0 scheduler,
because fork needs parent and child alive at once. Staged:
d1: Unified preemptive process model (DONE). process.swift rewritten as a real process table {state, ppid, ttbr0, kernel stack, CPUContext, exit status, wait target, brk}. A dedicated scheduler context (the kernel_main stack) switches into a runnable process and regains control when it yields, blocks, is preempted, or exits. The timer preempts the current EL0 process (processOnTick → yieldToScheduler, safe thanks to the M8a1 trap frame); tick rate raised to 100 Hz and per-tick logging silenced. processRunElf launches a top process and runs the scheduler until it exits; spawn blocks the parent and the same loop runs the child then wakes the parent (foundation for fork/waitpid). A new coprocdemo runs two EL0 processes that interleave under preemption (coproc A/B iter 0..2 in alternation) → real preemptive multitasking proven. All prior demos (M5–M8c) and the interactive tty/Ctrl-C test still pass.

NOTE: process teardown now reclaims frames (address space + page tables + kernel stack) on exit/exec/reap; see "Process teardown reclaims frames" below. (Originally a documented follow-up.)

NOTE: per-process fd table/cwd still global in the VFS; fine while one EL0 process uses fds at a time; will move into the process struct when fork needs fd inheritance (d2/d4).

Security test hardening (DONE). Added an embedded securitydemo EL0 program to the boot test. It sends invalid-but-non-faulting syscall arguments (bad fds, NULL buffers/paths/statbuf, read-only writes, too-small getcwd, below-base sbrk, waitpid with no children) and asserts errno-ish returns. The first run exposed a real EL1 trap: signed syscall args such as fd -1 were decoded with trapping Int(UInt.max) conversions. syscallDispatch now decodes signed fd/offset/whence fields with Int(bitPattern:). Host PMM tests now also cover reserve idempotence, fragmentation, exhaustion, and double-free behavior.

User pointer hardening (DONE). Added kernel/user/user_access.swift and moved VFS, TTY, termios, and spawn argv/path handling away from direct EL0 pointer dereferences. Syscalls now reject kernel/device addresses, unmapped user pages, integer-overflowed ranges, and huge lengths before copying or scanning user buffers. securitydemo now exercises faulting-class inputs (0x4000_0000 kernel identity map and unmapped user VAs) without panicking the kernel.

User pointer hardening follow-up (2026-06-09, DONE). Tightened the range checks so wraparound user pointers such as (char *)-1 are rejected before forming va + count on both readable and writable copy paths. packUserArgv now validates each argv pointer slot before reading it, so an argv array without a NULL terminator before an unmapped page cannot fault EL1. Added /bin/selfexecdemo plus tests/spawn_self_exec_test.sh, which opens and spawns the same disk-backed file and feeds malformed argv shapes, then proves the shell survives. securitydemo also covers wraparound open/stat/getcwd/read/write.

d2: fork() + first real waitpid (DONE). SYS_fork (20) eager-copies the current process address space, preserving user page permissions, and clones the saved trap frame onto a fresh child kernel stack with child x0=0; the parent gets the child pid. waitpid can now block on a direct child and reap its zombie, writing a minimal status word. forkdemo proves parent/child split, private copied data (marker stays 7 in parent while child writes 42), child exit status 42, and parent wake/reap.

Per-process VFS state (DONE). cwd and fd tables are now keyed by process slot instead of global kernel state. New processes start from / with empty user fds; forked children inherit a snapshot of parent cwd and open fds. forkdemo now verifies inherited cwd (/etc) and inherited open fd (hostname) in the child.

d3: execve(path, argv, envp) (DONE). SYS_execve (21) resolves an embedded executable path, packs argv from the old address space, builds a fresh address space + stack, rewrites the current trap frame (SP_EL0/ELR_EL1/SPSR_EL1), switches TTBR0, and returns from the syscall directly into the new image. execdemo replaces itself with /bin/argvdemo exec-alpha exec-beta, proving argv survives and the old image does not resume.

d4: waitpid/exit/SIGCHLD, mostly DONE with d2: waitpid(pid|-1, *status) blocks, reaps a matching zombie, returns the pid (ECHILD with no children). Remaining: SIGCHLD delivery and per-process fd/cwd inheritance across fork (needed once busybox keeps fds open across fork/exec).

d5: busybox (DONE). Fetch busybox.net release, minimal .config (ash + ls/cat/echo), cross-build with aarch64-elf-gcc against ./sysroot + our stubs; add whatever syscalls it needs (dup, pipe, ioctl/TCGETS, wait4, getuid, …); run sh and execute ls/cat/echo → M8 acceptance.

M7 (2026-06-04): DONE. TTY line discipline, termios, signals:
- UART RX + IRQ. PL011 receive path added (uartRxInit/uartHandleRx/uartTryReadByte); routed through the GIC as SPI 1 → INTID 33. gicEnableInterrupt now programs GICD_ITARGETSR for SPIs (PPIs are banked, SPIs are not); without it the line is never delivered.
- TTY line discipline (kernel/tty/tty.swift): canonical mode (line buffering, echo, backspace editing) and raw mode, selected by termios c_lflag (ICANON/ECHO/ISIG). Backing for read(0).
- termios syscalls tcgetattr/tcsetattr (7/8); userland lib/termios.h mirrors the ABI.
- Signals (kernel/signal/signal.swift): pending mask + dispositions for the foreground process. Ctrl-C (ETX, with ISIG) raises SIGINT; delivered from the IRQ handler after the GIC EOI. Default action terminates the process (status 128+signo); SIG_IGN honored. sigaction/kill/getpid (9/10/11) present. Current state: NPM10 added current-process custom handler delivery via user signal frames and sigreturn; masks, process groups, blocked-syscall interruption, and remote async custom-handler delivery remain future work.
- Important constraint discovered: a blocking syscall must NOT unmask IRQs, because an interrupt taken at EL1 overwrites ELR_EL1/SPSR_EL1 (no save/restore in the sync vector yet), corrupting the pending return to EL0. read(0) therefore polls the UART with IRQs masked; the UART IRQ still drives Ctrl-C while the program runs at EL0. A full trap-frame (save/restore ELR/SPSR/SP_EL0) is the proper fix and is the prerequisite for preemptively scheduling EL0 processes; deferred.
- Acceptance: typed input is echoed and returned by read(0); Ctrl-C interrupts the running command (M7 OK: foreground interrupted by Ctrl-C (SIGINT), status 130). make test adds tty_test.sh (scripted serial input) and passes.
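Sketch of the canonical-mode input path and the 128+signo status rule from the M7 notes. Everything here is hypothetical (struct tty_line, tty_canon_input, the 128-byte buffer); echo and raw mode are left out, and the real discipline is kernel/tty/tty.swift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum tty_event { TTY_NONE, TTY_LINE_READY, TTY_SIGINT };

/* Hypothetical line buffer; the size is an arbitrary choice for the sketch. */
struct tty_line {
    char   buf[128];
    size_t len;
};

/* Feed one received byte through a canonical-mode line discipline.
 * `isig` mirrors the termios ISIG flag. After TTY_LINE_READY the caller
 * reads buf[0..len) and resets len to 0. */
enum tty_event tty_canon_input(struct tty_line *t, bool isig, unsigned char c)
{
    assert(t != 0);
    if (isig && c == 0x03)                 /* ETX, Ctrl-C */
        return TTY_SIGINT;
    if (c == 0x7f || c == 0x08) {          /* DEL or BS: erase one character */
        if (t->len > 0)
            t->len--;
        return TTY_NONE;
    }
    if (c == '\n') {                       /* one slot is reserved for this */
        if (t->len < sizeof t->buf)
            t->buf[t->len++] = '\n';
        return TTY_LINE_READY;
    }
    if (t->len < sizeof t->buf - 1)        /* drop input once the line is full */
        t->buf[t->len++] = (char)c;
    return TTY_NONE;
}

/* Default-action exit status for a fatal signal, per the notes: 128 + signo. */
int signal_exit_status(int signo)
{
    return 128 + signo;
}
```

SIGINT is signal 2, so the default action yields status 130, matching the acceptance line above.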
M6 (2026-06-04) — DONE. libc subset, ELF64 loader, process spawn:
- Userland toolchain. Hand-written minimal libc (userland/lib/): crt0.S, syscall wrappers (syscall.h), strlen/puts_raw (libc.c). userland/hello.c cross-built static and linked at 0x8000_0000 (user.ld); our userland ABI lives high so it never collides with the kernel/device identity blocks. Built with ld.lld -z max-page-size=4096.
- ELF64 loader (kernel/user/elf.c): validates an ET_EXEC/AArch64 image and maps PT_LOAD segments page-by-page (two segments may share a page; ours pack text+rodata into one), allocating frames from the PMM, per-page perms = "executable wins". Returns e_entry.
- Spawn primitive (kernel/user/process.swift): posix_spawn-style (fresh address space + load + enter EL0), chosen over fork because we have no COW and build fresh spaces anyway. Runs synchronously: SYS_exit switches back (via cpu_switch_context) to the kernel context that launched it (user_entry.S trampoline installs TTBR0/SP_EL0/ELR/SPSR and erets). The exit code round-trips to the kernel. Nests naturally for a future shell.
- The ELF is embedded in the kernel image (kernel/user/user_blob.S .incbin) until M8's packed FS.
- Acceptance: a static C hello loads, prints hello from ELF userland via our libc/syscalls, and exits with code 7; kernel logs M6 OK: ELF process exited, code 7. make test passes (host PMM unit + userland ELF sanity + QEMU asserts).
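Sketch of the header validation step an ET_EXEC/AArch64 loader needs before mapping PT_LOAD segments. This is not kernel/user/elf.c: elf64_check and the SK_* constants are hypothetical names, the field layout and constant values follow the ELF specification, and a little-endian host is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF64 file header, laid out as in the ELF specification. */
struct elf64_ehdr {
    unsigned char e_ident[16];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint64_t e_entry;
    uint64_t e_phoff;
    uint64_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

enum { SK_ELFCLASS64 = 2, SK_ELFDATA2LSB = 1, SK_ET_EXEC = 2,
       SK_EM_AARCH64 = 183, SK_PHDR_SIZE = 56 };

/* Returns true and stores e_entry if `img` looks like a static AArch64
 * executable whose program header table lies inside the image. */
bool elf64_check(const void *img, size_t len, uint64_t *entry)
{
    struct elf64_ehdr h;
    assert(entry != 0);
    if (img == 0 || len < sizeof h)
        return false;
    memcpy(&h, img, sizeof h);                 /* avoid unaligned access */

    if (memcmp(h.e_ident, "\177ELF", 4) != 0)
        return false;
    if (h.e_ident[4] != SK_ELFCLASS64 || h.e_ident[5] != SK_ELFDATA2LSB)
        return false;
    if (h.e_type != SK_ET_EXEC || h.e_machine != SK_EM_AARCH64)
        return false;
    if (h.e_phentsize != SK_PHDR_SIZE)
        return false;

    /* The program header table must fit; phnum * 56 cannot overflow 64 bits. */
    uint64_t table = (uint64_t)h.e_phnum * SK_PHDR_SIZE;
    if (h.e_phoff > len || table > len - h.e_phoff)
        return false;

    *entry = h.e_entry;
    return true;
}
```

The bounds check subtracts before comparing so that a hostile e_phoff cannot wrap the sum.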
M4.5 (2026-06-04) — DONE. Foundation hardening before the libc/ELF work of M6:
- PMM wired in. The host-tested PageAllocator bitmap now manages all RAM past the kernel image (__image_end .. 0x5000_0000, ~65k frames) via kernel/mm/pmm.swift, exposed to C as pmm_alloc_page/pmm_alloc_pages/pmm_free_page/pmm_free_count. Page tables, process stacks, and user pages now come from the PMM; the bump heap (heap.c) is only for small Swift objects. Added kernel/runtime/string.c (mem* with -fno-builtin).
- Per-process address spaces. vm.c gained a general 4-level page-table walker (address_space_create/map/switch/translate) that allocates intermediate tables from the PMM and identity-maps the kernel/device 1 GiB blocks into every space. Probe maps one VA to two distinct frames in two spaces and reads back distinct values after switching TTBR0_EL1 → isolation proven.
- Real context switch. kernel/arch/aarch64/switch.S cpu_switch_context (callee-saved + sp + lr, xv6-style) + thread_trampoline. scheduler.swift rewritten with real TCBs, per-thread kernel stacks, cooperative schedYield and timer-driven preemption (schedulerTick after the GIC EOI). Two kernel threads interleave through genuine switches (thread 1/2 iter 0..2) and finish.
- Linker switched to ld.lld (see above) to support Embedded Swift Array/String.
- make test passes (host PMM unit test + QEMU asserts the context-switch and M5 lines).
M5 (2026-06-04) — DONE. Syscall entry and VFS skeleton:
- Lower-EL SVC handling now receives a saved register frame, dispatches by x8, and writes syscall return values back to saved x0.
- Minimal VFS/file table added with one read-only base file, /hello.txt, plus stdout/stderr writes to UART.
- EL0 test program now performs open/read/write/close/exit through syscalls; the file content is copied into an EL0 buffer and written back out through write(1, ...).
- lseek is present for the read-only file. Wider VFS calls (stat, getdents, cwd handling) remain to be expanded before busybox.
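Sketch of dispatch-by-x8 with the result written back to the saved x0. The table, the syscall number 1 and demo_close are hypothetical and do not reflect the kernel's real numbering. The handler also shows the signed reinterpretation of a register argument, the C counterpart of the Int(bitPattern:) fix recorded in the security hardening note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saved general-purpose registers of the interrupted EL0 context. */
struct trap_frame {
    uint64_t x[31];
};

typedef int64_t (*syscall_fn)(uint64_t a0, uint64_t a1, uint64_t a2);

enum { SK_EBADF = 9, SK_ENOSYS = 38 };  /* Linux-style errno values */

/* Hypothetical handler: fds arrive as raw 64-bit registers and must be
 * reinterpreted as signed, so that -1 is a bad fd rather than a huge one. */
static int64_t demo_close(uint64_t a0, uint64_t a1, uint64_t a2)
{
    int64_t fd = (int64_t)a0;
    (void)a1;
    (void)a2;
    return fd < 0 ? -SK_EBADF : 0;
}

/* Hypothetical table; index 1 is not the kernel's real numbering. */
static const syscall_fn syscall_table[] = { NULL, demo_close };

/* Dispatch on x8 and write the result back to the saved x0. */
void syscall_dispatch(struct trap_frame *f)
{
    assert(f != NULL);
    uint64_t nr = f->x[8];
    size_t count = sizeof syscall_table / sizeof syscall_table[0];
    int64_t ret = -SK_ENOSYS;
    if (nr < count && syscall_table[nr] != NULL)
        ret = syscall_table[nr](f->x[0], f->x[1], f->x[2]);
    f->x[0] = (uint64_t)ret;
}
```

Unknown or unpopulated numbers fall through to -ENOSYS, so a stray x8 value never indexes outside the table.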
M4 (2026-06-04) — DONE. Minimal processes/scheduler:
- Timer IRQs now drive a tiny round-robin scheduler model that runs two kernel-thread slots and proves A/B interleaving on serial.
- Lower-EL AArch64 synchronous exceptions dispatch through a separate vector entry; SVC traps from EL0 are handled in the kernel.
- A tiny EL0 program page is installed at 0x8010_0000, mapped read-only executable for EL0, entered via eret, executes mov x0, #42; svc #0, and traps back into the kernel. Its EL0 stack page is mapped read/write and XN.
- Full saved-context thread switching and per-process TTBR switching remain future M4/M5 refinements; this milestone establishes the tested EL0 trap path.
M3 (2026-06-04) — DONE. Virtual memory and MMU:
- Early AArch64 stage-1 translation tables added in kernel/mm/vm.c.
- Kernel/devices are identity-mapped, MAIR_EL1/TCR_EL1/TTBR0_EL1 are configured, and SCTLR_EL1.M is enabled.
- A scratch VA page at 0x8000_0000 maps to a page-aligned heap page; the kernel writes through the mapped VA, verifies the physical page contents, unmaps it, and checks software translation returns unmapped.
- Timer interrupts still run after MMU enable; make test passes.
M2 (2026-06-04) — DONE. Interrupt and timer bring-up:
- EL1 vector table now dispatches IRQ entries through an assembly save/restore path and returns with eret.
- Minimal GICv2 driver enables the physical timer PPI (ID 30).
- ARM generic physical timer is configured from CNTFRQ_EL0; the kernel logs periodic ticks.
- make test passes and asserts tick 3 on the QEMU serial console.
M1 (2026-06-04) — DONE. Runtime/memory bring-up:
- EL1 vector table installed in boot.S; unexpected exceptions dump ESR_EL1, ELR_EL1, FAR_EL1, SCTLR_EL1, and CPACR_EL1.
- Early linker-reserved bump heap, Swift raw allocation hook (swift_slowAlloc/swift_slowDealloc), class allocation support (posix_memalign/free), and stack protector stubs.
- Physical page allocator added as a Swift bitmap allocator with host unit coverage.
- Boot probe instantiates and retains a Swift class; make test passes.
M0 (2026-06-04): DONE. Boot skeleton boots on QEMU virt; serial prints Hello from Swift kernel. make test passes. Files: kernel/arch/aarch64/{boot.S,kernel.ld,io.h}, kernel/drivers/uart.swift, kernel/main.swift, Makefile, tests/boot_test.sh.
Risk remediation arc (post-M13) — planning started 2026-06
A dedicated plan now exists in docs/RISK_REMEDIATION_ROADMAP.md. It addresses the structural risks
that became visible once the M8–M13 + N goals were complete:
- SMP (single-core was an explicit hard constraint through M13; it is now required for the server/AI-hosting profile and for credible scaling).
- Completion of the capability model (the "flag + ambient" version shipped for M12/M13; the handle-based model with spawn-with-handles and IPC is designed in CAPABILITIES.md but not yet implemented beyond syscall number reservations and the CellId tag).
- Moving privileged in-kernel drivers and the network stack toward the documented restartable userland service model (once IPC exists).
- Making global mutable state (scheduler, PMM, VFS pools, net engine) safe for concurrent execution.
- Other gaps noted in the plan (signal frames, observability, A/B updates, etc.).
The arc follows the project rules exactly: one (sub)milestone at a time, each must build + boot (including
on -smp N) + pass tests (with new concurrency stress where relevant) + be committed + reviewed before
the next. C-arc work (explicit handles + IPC) is recommended early because it is both a risk mitigation
in its own right and a prerequisite for a sane multi-core driver/service model.
See the new document for the detailed S0–S5 SMP phases, recommended sequencing, decision forks that require explicit review ("ask, don't guess"), and acceptance criteria style.
C-arc checkpoints (post-M13)
S0a — current CPU id + parked-SMP smoke harness (DONE, 2026-06-08)
- Current CPU primitive. Added an AArch64 read_mpidr_el1() bridge and a small currentCpuId() Swift helper that returns MPIDR_EL1 Aff0. For the first QEMU virt SMP release this records the assumption that Aff0 is the CPU index; secondary CPUs still park in boot.S and do not execute Swift/kernel work yet.
- Boot marker. Early boot now logs [I] smp: S0 OK: foundations ready on the primary CPU after the timer/logger are initialized. The log call carries currentCpuId() as its structured detail; the current formatter omits zero-valued detail on CPU0, but the call site is ready to become visible once nonzero secondary CPU paths exist.
- SMP smoke harness. Added tests/smp_boot_test.sh plus make smp-test / make s0-test. The harness boots the existing kernel with -smp ${SMP_CPUS:-4} and the normal DTB/base-image virtio arguments, then asserts stable boot markers. Pre-S1, this proves extra QEMU CPUs remain safely parked and do not perturb the single-CPU path.
- Non-goals. No secondary CPU release, no per-CPU scheduler state, no timer PPIs on secondaries, no IPIs, no atomics/locking policy, no TLB shootdown. Those remain S0b/S1+ work after review.
S0b — barrier and atomic primitive shims (DONE, 2026-06-09)
- C bridge primitives. Added Swift-callable dmb ish/ishld/ishst wrappers and a minimal u64 atomic vocabulary (load, store, fetch_add, compare_exchange) in io.h, backed by LLVM/C11 __atomic builtins with acquire/release or acquire-release ordering. These are the primitives future PMM bitmap operations, VFS refcounts, and scheduler cross-CPU state will build on; no subsystem is migrated to them in this checkpoint.
- Swift facade + early self-test. Added kernel/smp/atomic.swift with small Embedded Swift wrappers and smpAtomicSelfTest(). The boot path runs the self-test after timer/log startup and logs [I] smp: S0b OK: atomics and barriers ready only after load/store, fetch-add, successful CAS, failed CAS, and barrier calls all complete.
- Tests / acceptance. make smp-test asserts the S0b marker while booting QEMU with -smp 4 and parked secondaries. The normal 1-CPU boot path also runs the self-test; failures panic before userland.
- Non-goals. No locks, no PMM/VFS conversion, no scheduler changes, no secondary CPU release, and no performance policy choice for a UP fast path.
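A rough C sketch of the atomic vocabulary and self-test order described above. The function names are hypothetical (the real io.h declarations are not shown in these notes), the orderings are one plausible reading of "acquire/release or acquire-release", and assert stands in for the kernel panic path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the operation set comes from the notes. */
static inline uint64_t smp_atomic_load_u64(uint64_t *p)
{
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

static inline void smp_atomic_store_u64(uint64_t *p, uint64_t v)
{
    __atomic_store_n(p, v, __ATOMIC_RELEASE);
}

/* Returns the value held before the addition. */
static inline uint64_t smp_atomic_fetch_add_u64(uint64_t *p, uint64_t delta)
{
    return __atomic_fetch_add(p, delta, __ATOMIC_ACQ_REL);
}

/* On failure, *expected is overwritten with the value actually observed. */
static inline bool smp_atomic_cas_u64(uint64_t *p, uint64_t *expected,
                                      uint64_t desired)
{
    return __atomic_compare_exchange_n(p, expected, desired, false,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}

/* dmb ish on AArch64; a full fence elsewhere so the sketch builds on a host. */
static inline void smp_dmb_ish(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dmb ish" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Same order as the checks listed for the boot self-test. */
static void smp_atomic_self_test(void)
{
    uint64_t v = 0;
    smp_atomic_store_u64(&v, 5);
    assert(smp_atomic_load_u64(&v) == 5);
    assert(smp_atomic_fetch_add_u64(&v, 2) == 5);
    uint64_t expected = 7;
    assert(smp_atomic_cas_u64(&v, &expected, 9));   /* successful CAS */
    expected = 1;
    assert(!smp_atomic_cas_u64(&v, &expected, 3));  /* failed CAS */
    assert(expected == 9);                          /* observed value reported back */
    smp_dmb_ish();
}
```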
S0c — executable SMP mutable-state audit (DONE, 2026-06-09)
- Audit manifest. Added docs/SMP_STATE_AUDIT.md, recording the top-level mutable kernel storage that must become per-CPU, protected, IRQ/boot-only, or driver/service-owned before S1/S2 can safely run kernel work on secondary CPUs. This is intentionally a review artifact, not a behavior change.
- Executable coverage check. Added scripts/smp-global-audit.py and tests/smp_state_audit_test.sh. The scanner lists top-level Swift stored globals plus top-level C mutable definitions; the test fails if the audit doc does not cover a scanned path:symbol entry. Current coverage is 160 entries, including systemTicks, process/scheduler globals, VFS tables, virtio state, network socket/TCP globals, PMM/heap state, and early MMU tables.
- Test integration. make test now runs the audit check with the host checks, and make s0-test runs smp-state-audit before the parked SMP smoke.
- Non-goals. No locks, no per-CPU conversion, no C4/VFS/process behavior changes, no secondary CPU release, and no resolution of the S0 uniprocessor fast-path decision.
S0d — fixed per-CPU state scaffold (DONE, 2026-06-09)
- Heap-free per-CPU storage. Added kernel/smp/percpu.swift with an InlineArray<8, SMPPerCpuState> so the first per-CPU state is fixed storage, not a Swift heap array. The scaffold records initialization, logical CPU id, per-CPU timer ticks, the mirrored kernel-thread id, and a reserved process id slot for later S2 work.
- Primary-CPU init + self-test. CPU0 now runs smpEarlyInitCurrentCpu() and smpPerCpuSelfTest() during boot, then logs [I] smp: S0d OK: per-CPU state ready. The self-test validates CPU indexing, timer-tick recording, current-thread mirroring, current-process mirroring, and barrier calls before interrupts are enabled.
- Toward S2 without behavior change. The generic timer mirrors ticks into the current CPU's per-CPU slot, and the kernel-thread scheduler mirrors currentThread after initialization and context-switch selection. The old single-CPU scheduler/process tables remain authoritative in S0.
- Tests / acceptance. make s0-test asserts the S0d marker under -smp 4; the normal 1-CPU boot path also runs the self-test and panics before userland on failure.
- Non-goals. No secondary CPU release, no per-CPU run queues, no process scheduler conversion, no locking protocol, no VFS/C4 work, and no uniprocessor fast-path decision.
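The fixed-storage idea can be illustrated with a C analogue of the scaffold. The real code is the Swift InlineArray described above; the struct layout, field names, and function names below are assumptions made only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SMP_MAX_CPUS 8u

/* Hypothetical field names; the notes only list what the scaffold records. */
struct smp_per_cpu_state {
    uint32_t initialized;
    uint32_t cpu_id;
    uint64_t timer_ticks;
    uint32_t current_thread_id;
    uint32_t reserved_process_id;
};

/* Fixed storage: one slot per supported CPU, no heap allocation involved. */
static struct smp_per_cpu_state smp_per_cpu[SMP_MAX_CPUS];

static void smp_early_init_cpu(uint32_t cpu)
{
    assert(cpu < SMP_MAX_CPUS);
    smp_per_cpu[cpu] = (struct smp_per_cpu_state){ .initialized = 1, .cpu_id = cpu };
}

/* Called from the timer path to mirror a tick into the current CPU's slot. */
static void smp_record_timer_tick(uint32_t cpu)
{
    assert(cpu < SMP_MAX_CPUS && smp_per_cpu[cpu].initialized);
    smp_per_cpu[cpu].timer_ticks += 1;
}
```

Because each CPU only writes its own slot in this sketch, no locking is shown; that matches the S0d non-goals, not a finished SMP design.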
S0e — secondary park mailbox scaffold (DONE, 2026-06-09)
- Mailbox-aware park loop. Secondary CPUs now branch to a dedicated boot.S park loop instead of the generic hang loop. The loop bounds-checks Aff0, selects a fixed 64-byte per-CPU mailbox slot, acquire-loads a release flag, and waits with wfe, making it safe for a later S1 release path to wake CPUs with a mailbox write plus sev.
- No secondary release yet. The S0e path deliberately stays parked even if a non-zero entry appears: secondary stacks, allocator policy, and shared-state locks are still S1/S2 work. The self-test asserts both mailbox words are zero and emits [I] smp: S0e OK: secondary park mailbox ready.
- Audit visibility. The mailbox table lives in kernel/smp/secondary.c and is forced into .data.smp_mailbox, not .bss, because secondary CPUs may reach the park loop before CPU0 clears BSS. The S0c mutable-state audit now records the table.
- Tests / acceptance. make s0-test asserts the S0e marker under -smp 4 and checks the mailbox table is linked into .data with the expected 8-slot size and 64-byte alignment. The normal 1-CPU boot path also runs the self-test and panics before userland on failure.
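The mailbox layout and the park-side check can be sketched in C as follows. The two-word slot layout and all names except the .data.smp_mailbox section are assumptions for illustration; the real table in kernel/smp/secondary.c is not reproduced in these notes, and the real park loop is assembly in boot.S.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMP_MAX_CPUS 8u

/* Hypothetical two-word slot, padded so each CPU owns one 64-byte slot. */
struct smp_mailbox {
    uint64_t entry;
    uint64_t release_flag;
    uint8_t  pad[48];
} __attribute__((aligned(64)));

_Static_assert(sizeof(struct smp_mailbox) == 64, "slot must be 64 bytes");

/* Initialized data rather than .bss: a parked CPU may read the table
 * before the boot CPU has cleared BSS. */
#if defined(__ELF__)
__attribute__((section(".data.smp_mailbox")))
#endif
struct smp_mailbox smp_mailboxes[SMP_MAX_CPUS] = { { 0 } };

/* Bounds-check Aff0 before selecting a slot; NULL means "stay parked". */
static struct smp_mailbox *smp_mailbox_for_mpidr(uint64_t mpidr)
{
    uint64_t aff0 = mpidr & 0xffu;
    return aff0 < SMP_MAX_CPUS ? &smp_mailboxes[aff0] : NULL;
}

/* Acquire-load pairs with a later release-store by the releasing CPU. */
static int smp_mailbox_released(struct smp_mailbox *mb)
{
    assert(mb != NULL);
    return __atomic_load_n(&mb->release_flag, __ATOMIC_ACQUIRE) != 0;
}
```

Eight 64-byte slots give a 512-byte table, which is the size a layout check on the linked object would expect under these assumptions.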
S0f — DTB CPU topology scaffold (DONE, 2026-06-09)
- CPU topology parsing. The pure FDT reader now records /cpus/cpu@N topology from QEMU virt DTBs into fixed 8-slot Aff0 storage. No heap-backed arrays are introduced, and secondary CPUs still remain parked below kernel Swift code.
- Platform handoff + self-test. After the MMU is enabled, the platform layer compares the bootloader-provided DTB with the direct-boot injected DTB address (0x4FF0_0000) and copies the richest discovered CPU count/Aff0 map into the global platform record. The boot path validates the count against the S0 per-CPU/mailbox limit and logs [I] smp: S0f OK: CPU topology ready with the discovered CPU count.
- DTB-consistent SMP smoke. tests/smp_boot_test.sh now dumps a QEMU DTB with the same -smp ${SMP_CPUS} value it boots, then asserts the S0f count marker. The host FDT test covers both 1-CPU and -smp 4 DTBs.
- Non-goals. No PSCI/spin-table choice, no secondary release, no scheduler conversion, no GIC/timer work on secondaries, no C4/VFS/process changes.
S0g — PSCI discovery scaffold (DONE, 2026-06-09)
- QEMU DTB evidence. Verified locally with QEMU 11.0.1 by dumping/decompiling virt DTBs for -smp 1, -smp 4, and -smp 8. QEMU advertises a /psci node with compatible = "arm,psci-1.0", "arm,psci-0.2", "arm,psci", method = "hvc", and cpu_on = <0xc4000003>. Per-CPU enable-method = "psci" appears when there are secondary CPUs (-smp > 1); the single-CPU DTB omits it because there is no secondary to release.
- Fixed discovery fields. The pure FDT reader now records PSCI presence, call method, CPU_ON function ID, and a fixed 8-bit Aff0 mask of CPU nodes that advertise enable-method = "psci". This is heap-free and uses the existing post-MMU platform handoff so early Device-typed RAM parsing still avoids wide struct copies.
- Boot self-test + tests. The S0 boot path validates PSCI discovery and logs [I] smp: S0g OK: PSCI discovery ready with the PSCI enable mask. The host FDT test checks the PSCI node and CPU enable-method behavior for both 1-CPU and -smp 4 DTBs; the SMP smoke test checks the S0g marker for arbitrary SMP_CPUS up to the current 8-slot scaffold.
- Non-goals. No S1 release mechanism is chosen here, no HVC/SMC call is issued, no secondary stacks are allocated, no secondary enters Swift/kernel code, and no scheduler/GIC/timer/C4/VFS/process state changes are made.
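For orientation, here is a small C sketch of a fixed discovery record and of how the cpu_on value above decomposes under the SMC Calling Convention function ID layout (bit 31 fast call, bit 30 64-bit convention, bits 29:24 owning entity, bits 15:0 function number). The struct and function names are hypothetical; the kernel's FDT reader is not shown in these notes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum psci_method { PSCI_METHOD_NONE, PSCI_METHOD_HVC, PSCI_METHOD_SMC };

/* Hypothetical heap-free record mirroring the fields listed above. */
struct psci_discovery {
    bool             present;
    enum psci_method method;
    uint32_t         cpu_on_id;
    uint8_t          enable_mask;  /* bit n: CPU with Aff0 == n uses enable-method "psci" */
};

static void psci_mark_cpu_enabled(struct psci_discovery *d, uint32_t aff0)
{
    assert(aff0 < 8u);             /* the mask only has room for 8 CPUs */
    d->enable_mask |= (uint8_t)(1u << aff0);
}

/* Function ID field accessors (SMC Calling Convention layout). */
static bool     psci_id_is_fast_call(uint32_t id) { return ((id >> 31) & 1u) != 0; }
static bool     psci_id_is_64bit(uint32_t id)     { return ((id >> 30) & 1u) != 0; }
static uint32_t psci_id_owner(uint32_t id)        { return (id >> 24) & 0x3fu; }
static uint32_t psci_id_function(uint32_t id)     { return id & 0xffffu; }
```

Applied to 0xc4000003 this yields a fast, 64-bit call owned by entity 4 (standard secure services) with function number 3, which is the 64-bit CPU_ON identifier.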
S0h — full-test parked SMP gate (DONE, 2026-06-09)
- Default gate coverage. The normal make test suite now runs SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb tests/smp_boot_test.sh after the classic single-core boot smoke. This makes the roadmap's S-series rule executable: the default gate covers both the 1-CPU path and QEMU virt -smp 4 parked-SMP path.
- Reuse existing smoke. The integrated smoke is the same S0 harness used by make smp-test/make s0-test: it asserts the S0/S0b/S0d/S0e/S0f/S0g markers and fails on missing/incomplete DTB discovery. The full suite reuses the already-dumped -smp 4 DTB; explicit SMP_CPUS=1 and SMP_CPUS=8 remain useful focused checks.
- Non-goals. No secondary release, no new scheduler/GIC/timer behavior, and no C4/VFS/process changes. This is test-gate hardening only.
S0i — pre-S1 release guard and SMP headroom (DONE, 2026-06-09)
- Executable no-release guard. tests/smp_release_guard_test.sh now disassembles the built kernel/boot object and fails if pre-S1 code contains hvc/smc PSCI calls, an indirect secondary-entry branch from boot.S, or writes to the parked secondary mailbox release fields. The full make test suite runs this cheap guard before the mutable-state audit.
- Parked headroom smoke. tests/smp_headroom_test.sh reuses the existing parked-SMP boot harness for -smp 1 and -smp 8; make s0-test now covers the audit, mailbox layout, release guard, default -smp 4, and headroom boots. This keeps the S0/S1 handoff executable without lengthening the full product gate beyond the S0h -smp 4 check.
- Verifier stability. During S0i validation, the network smoke drivers were moved toward fail-fast FIFO marker waits and dynamic host ports, and the Swift ls smoke now drives the serial console by markers instead of fixed sleeps. This keeps full-gate failures diagnosable and avoids parallel-worktree port conflicts without changing kernel or network behavior.
- Non-goals. No S1 protocol choice, no CPU_ON/HVC/SMC call, no secondary stacks, no GIC/timer initialization on secondaries, and no C4/VFS/process changes. This milestone preserves the review boundary before S1.
S0j — S1 preflight gates (DONE, 2026-06-09)
- Fresh QEMU topology evidence. tests/smp_s1_preflight_test.sh dumps current QEMU virt DTBs for -smp 1, 2, 4, and 8, then runs the same host fdt_test parser the kernel shares. This validates the DTB-visible S1 inputs: CPU Aff0 slots, enable-method = "psci" for secondary-capable topologies, PSCI method/function ID, the existing GICv2/UART/virtio map, and the ARM generic timer's non-secure physical PPI (INTID 30) with the expected per-CPU PPI target mask.
- UEFI parked-SMP smoke. tests/uefi_boot_test.sh now accepts SMP_CPUS, boots the real GPT disk through AAVMF with -smp 4 in the new smp-uefi-test target, and asserts the S0 markers, CPU topology count, and PSCI enable mask before reaching busybox. This covers the S0 parked-SMP path for both direct -kernel and UEFI/disk boot.
- Gate integration. make test runs the preflight next to the existing FDT checks and adds the UEFI -smp 4 smoke after the single-CPU UEFI boot. make s0-test includes the preflight and UEFI SMP smoke around the direct parked boot smokes. This keeps the S0/S1 handoff executable.
- Non-goals. No C4/VFS/process behavior changes.
S0l — full-gate mailbox ABI guard (DONE, 2026-06-09)
- Full gate coverage. The normal make test suite now runs tests/smp_mailbox_layout_test.sh before the release guard, so the secondary mailbox ABI (.data, 512 bytes, 64-byte alignment) is checked in the same overnight/product gate that would catch accidental release-path regressions.
- Verifier hardening. The mailbox layout script now fails clearly when the expected llvm-objdump tool is unavailable.
- Non-goals. No mailbox layout change, no kernel/C4/VFS/process behavior changes.
S0m — legacy QEMU smoke harness hardening (DONE, 2026-06-09)
- Prompt-driven legacy drivers. Older QEMU smoke tests that still drove serial input with fixed sleeps now use FIFO stdin plus bounded waits for the relevant prompts or acceptance markers. This covers the tty, disk exec, console-login, cap enforcement, and throwaway disk VFS tests.
- Early-probe waits. The virtio-blk and virtio-net smoke tests now wait for their boot-time success markers before cleanup instead of killing QEMU after a fixed delay.
- Non-goals. No kernel, filesystem, process, capability, or userland behavior is changed.
S0n — native Swift file-tool harness hardening (DONE, 2026-06-09)
- Prompt-driven Swift tool tests. The native Swift coreutils, fileops, and chmod/chown smoke tests now drive QEMU through FIFO stdin and wait for login prompts plus existing output markers instead of relying on fixed serial sleeps.
- Assertions unchanged. The tests still verify the same /bin/echo, /bin/cat, /bin/pwd, tmpfs mutation, chmod, and chown behavior.
- Non-goals. No Swift userland behavior, kernel behavior, C4/VFS/process behavior, or S1 release behavior is changed.
S0o — timed Swift tool harness hardening (DONE, 2026-06-09)
- Prompt-driven timed tool tests. The native Swift recursive rm, head/wc/touch, date, and sleep smoke tests now drive QEMU through FIFO stdin and wait for login prompts plus existing output markers instead of relying on fixed serial sleeps.
- Assertions unchanged. The tests still verify recursive removal semantics, head/wc/touch output, RTC-backed /bin/date, and timer-backed nanosleep/busybox sleep behavior.
- Non-goals. No Swift userland behavior, kernel behavior, C4/VFS/process behavior, or S1 release behavior is changed.
S0p — runtime demo harness hardening (DONE, 2026-06-09)
- Prompt-driven runtime demos. The userland threads/futex and mmap/munmap/mprotect/W^X smoke tests now drive QEMU through FIFO stdin and wait for login prompts plus their existing threadsdemo/mmapdemo success markers instead of relying on fixed serial sleeps.
- Assertions unchanged. The tests still verify counter=4000 for the futex thread demo and the same B1/B2/W^X mmap markers.
- Non-goals. No runtime behavior, kernel behavior, C4/VFS/process behavior, or S1 release behavior is changed.
S0q — calc REPL harness hardening (DONE, 2026-06-09)
- Prompt-driven calc REPL. tests/calc_test.sh now drives QEMU through FIFO stdin, waits for the login shell and /bin/calc banner, then feeds the same REPL session without fixed serial sleeps.
- Assertions unchanged. The test still verifies precedence, parentheses, assignment, lookup, modulo, unary minus, division-by-zero reporting, :sum, and bounded heap break across churn.
- Non-goals. No calc behavior, Swift runtime behavior, kernel behavior, C4/VFS/process behavior, or S1 release behavior is changed.
S0r — HTTP server harness hardening (DONE, 2026-06-09)
- Prompt-driven httpd launch. tests/httpd_test.sh now drives QEMU through FIFO stdin, waits for the tty demo, login shell, and httpd: listening on 8080 marker, then runs the existing curl acceptance checks without fixed serial input sleeps.
- Assertions unchanged. The test still verifies concurrent index requests, /hello.txt serving plus text/plain, generated /sub/ directory listings, 404 on missing paths, and multiple httpd: 200 serial markers.
- Non-goals. No HTTP server behavior, networking behavior, filesystem behavior, kernel behavior, C4/VFS/process behavior, or S1 release behavior is changed.
S0s — serial vi harness hardening (DONE, 2026-06-09)
- Prompt-driven vi session. tests/vi_test.sh now drives QEMU through FIFO stdin, waits for the tty demo, login shell, vi alternate-screen entry, inserted text echo, saved-file readback, and trailing shell marker instead of relying on fixed serial input sleeps.
- Assertions unchanged. The test still verifies busybox vi enters the alternate screen, saves /tmp/vitest, returns a clean hello-from-vi line via cat, keeps the shell alive, and avoids kernel panics.
- Non-goals. No vi behavior, terminal behavior, filesystem behavior, kernel behavior, C4/VFS/process behavior, or S1 release behavior is changed.
S0t — UDP echo harness hardening (DONE, 2026-06-09)
- Prompt-driven UDP smoke. tests/udp_echo_test.sh now writes serial input immediately after awaited tty/login markers, waits for udpecho: listening on 5555, sends the host datagram, and waits for the guest receive marker instead of relying on short fixed guard sleeps.
- Assertions unchanged. The test still verifies that /bin/udpecho binds, receives eight bytes from the slirp host, and echoes swos-udp back to host nc.
- Non-goals. No UDP behavior, socket behavior, networking behavior, kernel behavior, C4/VFS/process behavior, or S1 release behavior is changed.
S0u — TCP connect harness hardening (DONE, 2026-06-09)
- Prompt-driven TCP client smoke. tests/tcp_connect_test.sh now writes serial input immediately after awaited tty/login markers and waits for the srv-reply client output instead of sleeping after launching /bin/tcpget.
- Assertions unchanged. The test still verifies that /bin/tcpget connects to the slirp host, receives srv-reply, and transmits GET swos on the captured pcap path.
- Non-goals. No TCP behavior, socket behavior, networking behavior, kernel behavior, C4/VFS/process behavior, or S1 release behavior is changed.
S0v — Swift ls harness hardening (DONE, 2026-06-09)
- Prompt-driven native ls smoke. tests/swift_ls_test.sh now writes serial input immediately after awaited tty/login and command-output markers instead of using short fixed guard sleeps before /bin/ls invocations.
- Assertions unchanged. The test still verifies plain /etc listing, long-format /etc/motd, /etc/swos, and single-file /bin/busybox owner, group, mode, size, and timestamp formatting.
- Non-goals. No /bin/ls behavior, VFS behavior, filesystem behavior, kernel behavior, C4/VFS/process behavior, or S1 release behavior is changed.
S0w — TCP echo harness hardening (DONE, 2026-06-09)
- Prompt-driven TCP server smoke. tests/tcp_echo_test.sh now writes serial input immediately after awaited tty/login markers, waits for tcpecho: listening on 5555, and uses a bounded regex wait for the guest's receive marker instead of serial guard sleeps and hand-written polling loops.
- Assertions unchanged. The test still preserves the one-shot TCP retry model, verifies guest receive logging, and verifies that host nc receives the echoed swos-tcp payload.
- Non-goals. No TCP behavior, socket behavior, networking behavior, kernel behavior, C4/VFS/process behavior, or S1 release behavior is changed.
S0x — SMP audit freshness guard (DONE, 2026-06-09)
- Bidirectional manifest check. tests/smp_state_audit_test.sh now records the scanner output once, verifies every scanned mutable global is documented, and also rejects stale backticked kernel/...:symbol entries that no longer appear in scripts/smp-global-audit.py output.
- Audit contract clarified. docs/SMP_STATE_AUDIT.md now states that the executable check covers both missing and stale manifest entries.
- Non-goals. No SMP release behavior, locking policy, kernel behavior, C4/VFS/process behavior, or S1 design decision is changed.
S0y — hermetic S1 preflight target (DONE, 2026-06-09)
- Direct target hygiene. make smp-s1-preflight now has an order-only dependency on $(BUILD)/.dir before writing build/fdt_test, so the focused preflight target is hermetic from a clean checkout/build directory.
- Assertions unchanged. The target still builds the host FDT parser and runs the same QEMU virt DTB PSCI/GIC/timer/topology preflight.
- Non-goals. No preflight semantics, SMP release behavior, kernel behavior, C4/VFS/process behavior, or S1 design decision is changed.
S1 — secondary CPU bring-up and per-CPU early init (DONE, 2026-06-09)
- Policy decision. S1 records the S0 decision point as "always use the general SMP path." There is no compile-time or boot-time uniprocessor fast path; the simpler single path is preferred until measured cost justifies a later optimization.
- Release protocol. CPU0 now publishes each secondary's mailbox slot (entry, stack_top, argument, then a release-store flag), sends sev, and issues the DTB-selected PSCI CPU_ON call (hvc or smc, cpu_on = 0xc4000003 on the QEMU 11.0.1 virt DTB). This deliberately supports both powered-off PSCI secondaries and eager -kernel secondaries that reached the mailbox park loop first.
- Secondary entry. smp_secondary_entry derives the real CPU id from MPIDR_EL1, loads the fixed per-CPU stack from the mailbox table, installs the EL1 vector table, enables FP/SIMD and the kernel identity MMU regime, then enters smp_secondary_main. Secondary stacks are static, 32 KiB each, and covered by docs/SMP_STATE_AUDIT.md.
- Early online + timer PPI only. A secondary CPU initializes only its per-CPU state, its GIC CPU interface, and its banked physical timer PPI. Its IRQ path records per-CPU timer ticks and EOIs the interrupt; scheduler, process, VFS, PMM allocation, drivers, and EL0 work remain CPU0/S2+ concerns.
- Tests / acceptance. tests/smp_release_guard_test.sh is now an S1 release contract guard instead of an S0 no-release guard. tests/smp_boot_test.sh asserts S1 CPU online markers for every discovered CPU plus [I] smp: S1 OK: secondary CPUs online detail=N; make s1-test covers the mutable-state audit, mailbox layout, release contract, -smp 4, and headroom -smp 1/-smp 8 boots.
- Non-goals. No EL0 process runs on a secondary, no per-CPU run queues, no PMM/VFS/driver locking policy, no IPIs, and no cross-CPU TLB work. Those are S2/S3+ work.
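The publish ordering in the release protocol can be sketched in C. The slot layout and function names are hypothetical, the PSCI CPU_ON call is deliberately left out, and sev is only emitted when building for AArch64 so the sketch also runs on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical S1-era slot: payload words first, release flag last. */
struct smp_mailbox {
    uint64_t entry;
    uint64_t stack_top;
    uint64_t argument;
    uint64_t release_flag;
    uint8_t  pad[32];
} __attribute__((aligned(64)));

/* Releasing side: plain stores for the payload, then a release-store of
 * the flag so a CPU that observes the flag also observes the payload. */
static void smp_mailbox_publish(struct smp_mailbox *mb, uint64_t entry,
                                uint64_t stack_top, uint64_t argument)
{
    assert(mb != NULL);
    mb->entry = entry;
    mb->stack_top = stack_top;
    mb->argument = argument;
    __atomic_store_n(&mb->release_flag, 1u, __ATOMIC_RELEASE);
#if defined(__aarch64__)
    __asm__ volatile("sev" ::: "memory");  /* wake CPUs waiting in wfe */
#endif
}

/* Parked side: acquire-load the flag before touching the payload. */
static bool smp_mailbox_try_take(struct smp_mailbox *mb, uint64_t *entry,
                                 uint64_t *stack_top, uint64_t *argument)
{
    if (__atomic_load_n(&mb->release_flag, __ATOMIC_ACQUIRE) == 0)
        return false;                      /* not released: keep waiting */
    *entry = mb->entry;
    *stack_top = mb->stack_top;
    *argument = mb->argument;
    return true;
}
```

Writing the flag last is what lets the same slot serve both cases named above: a CPU already spinning in the park loop sees a complete payload, and a CPU started later by CPU_ON can read the same fields from its entry path.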
S2a — secondary timer / scheduler-boundary readiness gate (DONE, 2026-06-09)
- Timer evidence is now explicit. S1 already required each discovered CPU to record at least one banked physical-timer tick before declaring bring-up complete. S2a logs one S2a OK: per-CPU timer heartbeat ready marker per CPU after that condition is true, using detail = cpu_id + 1 so CPU0 also has an explicit payload.
- Scheduler boundary guard. Before logging S1 OK, the bring-up path now verifies that every secondary per-CPU scheduler slot still has no current thread, no current process, no run queue, and no scheduler context pointer. This preserves the S1/S2 boundary: secondary CPUs can take timer PPIs, but scheduler, process, VFS, PMM, drivers, and EL0 work remain CPU0-only until S2 deliberately changes that contract.
- CPU0 ownership seam. After schedulerInit and processInit, boot runs an S2a self-test that requires the primary CPU to be online, to own a scheduler thread slot in the per-CPU scaffold, and to have no active EL0 process yet. The EL0 scheduler loop now mirrors currentProc into the current CPU's per-CPU state while a process is switched in, then clears it on return to the scheduler. Today that only records CPU0 state; S2 will use the same seam when scheduler ownership becomes per-CPU.
- Tests / acceptance. tests/smp_boot_test.sh now asserts the S2a heartbeat markers for every -smp N CPU plus S2a OK: scheduler boundary held and S2a OK: scheduler owner ready. make s1-test exercises those checks for the default -smp 4, headroom -smp 1/-smp 8, and UEFI -smp 4 paths through the existing S1 gate.
- Harness / guard hardening. A follow-up tightened the cheap release guard so it also verifies the S2a accessors, boot ordering (schedulerInit -> processInit -> smpS2ReadinessSelfTest), and EL0 currentProc mirroring into per-CPU state. tests/smp_boot_test.sh now escalates QEMU cleanup from TERM to KILL after a bounded grace period, so a failed expectation cannot strand the expensive make s1-test gate in wait.
- Non-goals. No EL0 work moves to secondary CPUs, no run queues are added, and no scheduler/process/VFS/PMM locking policy changes in this checkpoint.
S2b — per-CPU EL0 scheduler-context scaffold (DONE, 2026-06-09)
- Process scheduler context storage. The EL0 process scheduler context is no longer a singleton schedCtx[1]. processInit now allocates one fixed CPUContext per supported SMP CPU (smpMaxCpuCount()), initializes every slot, and all process scheduler switches select the slot for currentCpuId(). Today only CPU0 reaches those switch paths, so this is a storage/readiness scaffold rather than a scheduling policy change.
- Runtime readiness marker. Boot runs processSchedulerContextSelfTest immediately after the S2a scheduler-owner check. The self-test verifies the context array size, CPUContext stride, alignment, primary CPU index validity, CPU0's published per-CPU process scheduler context, and zeroed initial contexts before the first EL0 process switch, then logs S2b OK: process scheduler context scaffold ready.
- Secondary EL0 guard. Each EL0 process switch increments the current CPU's per-CPU EL0 switch counter. After the Swift ps demo has run, boot verifies that CPU0 recorded EL0 switches and every secondary CPU still has zero EL0 switches, then logs S2b OK: no secondary EL0 execution.
- Owner guard. Until S2 proper deliberately moves EL0 scheduling off CPU0, the process scheduler context helper and processOnTick panic if entered on any non-owner CPU. This keeps the per-CPU storage scaffold from hiding an accidental secondary scheduler entry.
- Static guard. tests/smp_release_guard_test.sh now rejects a return to singleton process scheduler context usage, verifies the S2b helper/self-test hooks, checks the CPU0 owner guard, verifies that irqHandler still gates processOnTick to CPU0, and checks that S2b runs after S2a in the boot order.
- Non-goals. No EL0 work moves to secondary CPUs, no per-CPU run queues are active yet, no cross-CPU wake/IPI path is added, and scheduler/process/VFS/PMM locking policy remains S2+ work.
S2c — kernel-thread scheduler ownership guard (DONE, 2026-06-09)
- Kernel scheduler owner guard. The M4.5 kernel-thread scheduler now has an explicit CPU0 owner check at its public and internal scheduler boundaries. schedulerInit, threadCreate, schedule, schedYield, schedulerTick, schedAllThreadsDone, and thread_exit panic if they are reached from any non-owner CPU, so the existing global currentThread/states scheduler cannot silently appear per-CPU-safe before S2 proper.
- Per-CPU ownership evidence. The fixed per-CPU state keeps its 64-byte stride: the former reserved 32-bit slot is now kernelSchedulerActivityCount, while the S2b EL0 switch counter remains a full 64-bit counter. CPU0 marks the kernel scheduler ready in per-CPU flags, and real kernel-thread context switches increment the current CPU's kernel scheduler activity counter.
- Runtime acceptance. Boot runs kernelSchedulerOwnershipSelfTest and smpS2cKernelSchedulerReadinessSelfTest after schedulerInit, then logs S2c OK: kernel scheduler owner ready. After the M4.5 scheduler demo and before any EL0 demos, boot verifies CPU0 recorded kernel scheduler activity and every secondary still has zero kernel scheduler activity, then logs S2c OK: no secondary kernel scheduler execution.
- Static guard. tests/smp_release_guard_test.sh now checks the kernel scheduler owner helper, per-CPU kernel scheduler ready/activity state, CPU0 timer IRQ gating for schedulerTick, and the S2c boot-order contract.
- Non-goals. No per-CPU run queues are active yet, no kernel thread can run on a secondary CPU, no cross-CPU wake/IPI path is added, and PMM/VFS/process locking policy remains S2+ work.
S2d — EL0 process run queue scaffold (DONE, 2026-06-09)
- Queue-backed EL0 scheduling. The EL0 process scheduler no longer picks runnable slots by scanning pState with a global round-robin cursor. Every pReady transition now goes through markProcessReady, which records a home CPU, links the slot into that CPU's FIFO run queue, and mirrors the queue head/tail into the fixed per-CPU state. pickReady dequeues from the current CPU's queue and verifies the slot still belongs to that CPU.
- CPU0 placement, deliberately. processHomeCpuForNewReadySlot is the new placement hook, but S2d intentionally returns CPU0 for every runnable process and panics if a process is enqueued to any secondary CPU. This makes the S2 policy boundary explicit without enabling secondary EL0 execution early.
- Runtime acceptance. Boot runs processRunQueueScaffoldSelfTest after the S2b process scheduler context check and logs S2d OK: process run queue scaffold ready. After the Swift ps userland demo, boot verifies CPU0 observed run queue enqueue/dispatch activity and every secondary CPU still has an empty process run queue, then logs S2d OK: process run queue stayed CPU0-owned.
- Static guard. tests/smp_release_guard_test.sh now checks the per-CPU process run queue mirror helpers, the process scheduler run queue arrays, rejects the old rrCursor/linear-scan scheduler path, and verifies that pReady transitions are centralized through markProcessReady.
- Non-goals. No EL0 work moves to secondary CPUs, no cross-CPU wake/IPI path is added, and PMM/VFS/process locking policy remains S2+ work.
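The queue discipline described above can be sketched in C as an intrusive FIFO over fixed slot arrays. This is an illustration only: the kernel's implementation is Swift, and every name, the slot count, and the use of assert in place of a kernel panic are assumptions.

```c
#include <assert.h>

#define MAX_PROC_SLOTS 16
#define MAX_CPUS       8
#define NO_SLOT        (-1)

struct run_queue { int head; int tail; };

static struct run_queue run_queues[MAX_CPUS];
static int next_ready[MAX_PROC_SLOTS];  /* intrusive FIFO link per process slot */
static int home_cpu[MAX_PROC_SLOTS];

static void run_queues_init(void)
{
    for (int c = 0; c < MAX_CPUS; c++)
        run_queues[c] = (struct run_queue){ NO_SLOT, NO_SLOT };
    for (int s = 0; s < MAX_PROC_SLOTS; s++) {
        next_ready[s] = NO_SLOT;
        home_cpu[s] = NO_SLOT;
    }
}

/* Placement hook: this sketch, like S2d, always answers CPU0. */
static int home_cpu_for_new_ready_slot(int slot)
{
    (void)slot;
    return 0;
}

/* Single entry point for "becomes ready": record home CPU, append to its FIFO. */
static void mark_process_ready(int slot)
{
    assert(slot >= 0 && slot < MAX_PROC_SLOTS);
    int cpu = home_cpu_for_new_ready_slot(slot);
    assert(cpu == 0);                    /* stands in for the S2d panic */
    home_cpu[slot] = cpu;
    next_ready[slot] = NO_SLOT;
    if (run_queues[cpu].tail == NO_SLOT)
        run_queues[cpu].head = slot;
    else
        next_ready[run_queues[cpu].tail] = slot;
    run_queues[cpu].tail = slot;
}

/* Dequeue from the given CPU's queue and check the slot still belongs to it. */
static int pick_ready(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    int slot = run_queues[cpu].head;
    if (slot == NO_SLOT)
        return NO_SLOT;
    assert(home_cpu[slot] == cpu);
    run_queues[cpu].head = next_ready[slot];
    if (run_queues[cpu].head == NO_SLOT)
        run_queues[cpu].tail = NO_SLOT;
    next_ready[slot] = NO_SLOT;
    return slot;
}
```

Routing every ready transition through one function is what makes the placement hook a single point of change once a later milestone allows non-CPU0 homes.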
S2e — dormant per-CPU EL0 scheduler publication (DONE, 2026-06-10)
- Dormant scheduler contexts for every CPU. processInit now publishes the exact schedCtx[cpu] address and an empty process run queue mirror into every fixed per-CPU state slot. CPU0 performs this publication during boot; it does not require secondary CPUs to enter process scheduler code.
- Idle means no execution, not no resources. smpPerCpuSchedulerIdle now treats a nonzero dormant process scheduler context as allowed idle state. The idle invariant remains strict about current thread/process ownership, run queue emptiness, kernel scheduler activity, EL0 switch count, and the kernel scheduler-ready flag.
- Runtime acceptance. Boot runs processDormantSchedulerCpusSelfTest after the S2d run queue scaffold check and logs S2e OK: dormant process scheduler CPUs published. After the Swift ps userland demo, boot verifies secondary scheduler contexts still point at their dormant slots, secondary run queues remain empty, and secondary EL0 switch counts remain zero, then logs S2e OK: secondary process scheduler contexts stayed dormant.
- Static guard. tests/smp_release_guard_test.sh now checks the addressable per-CPU scheduler context/runqueue publication helpers, rejects a CPU0-only context publication regression, and verifies the S2e boot-order contract.
- Non-goals. No secondary CPU dispatches EL0 work, no cross-CPU wake/IPI path is added, and PMM/VFS/process locking policy remains S2+ work.
S2f — EL0 process dispatch CPU telemetry (DONE, 2026-06-10)
- Actual-dispatch telemetry. The process scheduler now records the CPU that actually dispatches each EL0 process slot (pLastDispatchCpu), a per-slot dispatch count, and a small CPU bitmask of CPUs that have dispatched the slot. A per-CPU aggregate telemetry counter is incremented at the same switch-in site and is cross-checked against the existing per-CPU EL0 switch counter. This is the cheap "last CPU" and history evidence needed by the later S2 acceptance test, without changing placement policy yet.
- CPU0 owner guard remains strict. recordProcessDispatch still panics if an EL0 process is dispatched on a secondary CPU or if the process home CPU and dispatch CPU diverge. S2f is observability/readiness work, not the point where secondary EL0 execution starts.
- Runtime acceptance. Boot runs processDispatchTelemetrySelfTest after the S2e dormant scheduler publication check and logs S2f OK: process dispatch telemetry ready. After the Swift ps userland demo, boot verifies the dispatch telemetry aggregate matches CPU0's EL0 switch count and every secondary CPU remains at zero, then logs S2f OK: process dispatch telemetry stayed CPU0-owned.
- Static guard. tests/smp_release_guard_test.sh now checks the dispatch telemetry fields/helper/self-tests, verifies the telemetry write is on the actual EL0 switch path before smpRecordEl0SwitchForCurrentCpu, and enforces the S2f boot-order contract.
- Non-goals. No process migrates between CPUs, no secondary CPU dispatches EL0 work, no cross-CPU wake/IPI path is added, and no procstat/userland ABI is widened in this checkpoint.
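A minimal C sketch of the three per-slot telemetry values and the per-CPU aggregate. All names and array sizes are hypothetical, and the strict CPU0 owner guard is intentionally omitted so the mask behaviour is visible for more than one CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PROC_SLOTS 16u
#define MAX_CPUS       8u

/* Hypothetical per-slot record: last CPU, count, and history bitmask. */
struct dispatch_telemetry {
    uint32_t last_cpu;
    uint32_t count;
    uint8_t  cpu_mask;   /* bit n set: CPU n has dispatched this slot */
};

static struct dispatch_telemetry proc_dispatch[MAX_PROC_SLOTS];
static uint64_t cpu_dispatch_total[MAX_CPUS];   /* per-CPU aggregate */

/* Called at the switch-in site for the CPU that actually runs the slot. */
static void record_process_dispatch(uint32_t slot, uint32_t cpu)
{
    assert(slot < MAX_PROC_SLOTS && cpu < MAX_CPUS);
    proc_dispatch[slot].last_cpu = cpu;
    proc_dispatch[slot].count += 1;
    proc_dispatch[slot].cpu_mask |= (uint8_t)(1u << cpu);
    cpu_dispatch_total[cpu] += 1;
}

/* True when the slot ran at least once and only ever on CPU0. */
static bool dispatched_on_cpu0_only(uint32_t slot)
{
    assert(slot < MAX_PROC_SLOTS);
    return proc_dispatch[slot].count > 0 && proc_dispatch[slot].cpu_mask == 0x01;
}
```

Keeping a mask alongside the last-CPU value means a later check can distinguish "currently on CPU0" from "has only ever been on CPU0", which a single last-CPU field cannot express.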
S2g — coproc pair dispatch telemetry harness (DONE, 2026-06-10)
- Coproc pair evidence before reap. processRunPair now captures each process slot's dispatch count, last dispatch CPU, and dispatch CPU mask after the pair has exited but before either slot is reaped. This preserves the exact evidence the later S2 acceptance needs from the existing coproc demo, where the target will become "the two processes ran on different CPUs".
- Current invariant remains CPU0-only. The S2g guard requires both coproc processes to have dispatched at least once and to have CPU0-only dispatch masks. It does not migrate work, change the placement hook, add cross-CPU wakeups, or enable secondary EL0 execution.
- Runtime acceptance. After runConcurrentDemo prints M8d OK: two EL0 processes ran concurrently, boot runs processCoprocPairDispatchTelemetrySelfTest before later demos can reuse the slots. S2h now owns the runtime dispatch marker and logs either S2h OK: coproc pair dispatched across scheduler CPUs or the explicit CPU0 fallback marker.
- Static guard. tests/smp_release_guard_test.sh checks the last-pair telemetry fields, verifies processRunPair captures telemetry before reapProcess(a), and enforces that the S2g guard runs immediately after the concurrent EL0 demo.
- Non-goals. No secondary CPU dispatches EL0 work, no scheduler placement policy changes, no IPI/cross-CPU wake path is added, and no userland ABI is widened in this checkpoint.
S2h — restricted coproc multi-CPU EL0 dispatch (DONE, 2026-06-10)
- Secondary EL0 scheduler gate. The S1 secondary loop now polls a process scheduler service hook before returning to IRQ-enabled wfi. CPU0 can set a run mask for one secondary CPU, wait for that CPU to enter its per-CPU process scheduler context, and later set a stop mask so the secondary returns to its idle loop. The hook is only enabled by the processRunPair acceptance path; general secondary process scheduling remains off.
- Per-CPU current process state. The old singleton currentProc is now a per-CPU slot mirror, so syscalls, user access, VFS capability checks, logging, timer accounting, and signal paths read the process running on the current CPU. The kernel scheduler remains CPU0-owned; only the EL0 process scheduler uses the restricted secondary hook.
- Safe cross-CPU reap boundary. A process is not reapable until it has returned to its scheduler stack. After every process context switch back to a scheduler context, the scheduler switches TTBR0 back to the kernel address space and marks the slot quiesced before another CPU may free the process kernel stack or page tables.
- Runtime acceptance. On -smp 4, the existing coproc pair runs with one process on CPU0 and one on a secondary scheduler CPU, then logs S2h OK: coproc pair dispatched across scheduler CPUs, S2h OK: process scheduler quiesced after multi-CPU dispatch, and S2h OK: secondary EL0 gate closed after restricted dispatch. On -smp 1, the same path logs an explicit CPU0 fallback marker. tests/smp_boot_test.sh covers both forms; tests/uefi_boot_test.sh checks the markers on firmware boots.
- Harness hardening. tests/boot_test.sh now builds a QEMU virt DTB when a clean worktree lacks build/virt.dtb, assembles QEMU argv through a non-empty array under set -u, and uses the same escalating QEMU cleanup style as the SMP harness. Interactive Swift/userland smoke drivers send shell input with short per-character pacing and explicit completion markers to avoid FIFO overrun flakes under long full-suite runs.
- Non-goals. No process migration, reschedule IPI, TLB shootdown protocol, shared-address-space thread execution on secondary CPUs, or broad VFS/PMM concurrency is enabled here. The remaining S2 work is the general stress path: N runnable EL0 processes, cross-CPU wakeups, and no scheduler corruption under sustained timer preemption.
S3a — address-space active CPU mask preflight (DONE, 2026-06-10)
- Active-address-space evidence. The EL0 scheduler now records `pAddressSpaceCpuMask[slot]` and a per-CPU `processAddressSpaceActivationCount` after installing a process TTBR0 with `address_space_switch(pTtbr0[s])` and before accounting the EL0 switch. This is the cheap active-CPU evidence S3 needs before real TLB shootdown targeting can be implemented.
- Restricted-secondary invariant. The recorder is still protected by the S2h scheduler run mask and cross-checks against dispatch telemetry, so only CPU0 and the explicitly started S2h secondary scheduler CPU may activate a process address space in this checkpoint.
- Runtime acceptance. Boot runs `processAddressSpaceCpuMaskSelfTest` after S2h readiness and logs `S3a OK: address-space CPU mask scaffold ready`. After the userland demos, boot runs `processAddressSpaceCpuMaskPostRunSelfTest` after the S2h gate-closed guard and logs `S3a OK: address-space CPU masks matched dispatch CPUs`.
- Static guard. `tests/smp_release_guard_test.sh` requires the S3a fields, recorder, self-tests, marker ordering, and the exact switch-path order: `address_space_switch(pTtbr0[s])` -> `recordProcessAddressSpaceActivation` -> `smpRecordEl0SwitchForCurrentCpu`.
- Non-goals. No TLB invalidation behavior changes, no process migration, no shared-address-space cross-CPU execution, and no broad scheduler placement is enabled in this checkpoint.
S3b — GIC SGI / IPI substrate preflight (DONE, 2026-06-10)
- GICv2 SGI sender. `kernel/drivers/gic.swift` now reserves SGI ID 1 for SMP IPIs, enables SGIs per CPU interface, and writes GICD_SGIR at offset `0xF00` in target-list mode (`SGIINTID[3:0]`, `CPUTargetList[23:16]`, `TargetListFilter[25:24] = 0b00`). The encoding was checked against QEMU 11.0.1 `hw/intc/arm_gic.c`, whose `gic_dist_writel` handles offset `0xf00` by setting `sgi_pending[irq][target_cpu]`.
- Parked secondaries can receive IPIs. After their early timer heartbeat, secondary CPUs poll the restricted S2h scheduler hook and then sleep in an IRQ-enabled `wfi` loop. Their IRQ path still does no scheduler/VFS/driver work: timer PPIs only rearm the local timer or drive an already-active S2h process scheduler CPU, and SGI ID 1 only records atomic per-CPU IPI counters and source CPU.
- Runtime acceptance. Boot runs `smpIpiSubstrateSelfTest` after S3a readiness. On SMP boots CPU0 sends SGI ID 1 to every discovered secondary, waits for the delivered mask, verifies the source CPU, and logs `S3b OK: GIC SGI IPI substrate ready`. After userland demos, boot verifies the IPI delivery mask stayed complete and secondary kernel scheduler state stayed idle, then logs `S3b OK: IPI delivery stayed scheduler-safe`.
- Static guard. `tests/smp_release_guard_test.sh` requires the SGIR offset, SGI sender/source helpers, IPI counters, IRQ handler hook, restricted S2h service loop, and the boot-order contract (`S3a readiness -> S3b readiness -> demos`, then `S2h quiesced -> S2h gate closed -> S3a matched -> S3b scheduler-safe`).
- Non-goals. No TLB shootdown protocol is implemented yet, no reschedule IPI is consumed by the scheduler, and no PMM/VFS/process locking policy changes in this checkpoint.
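
The GICD_SGIR encoding described above can be sketched in C as follows. This is an illustration of the register layout only, not the code in `kernel/drivers/gic.swift`; the helper name `gicd_sgir_encode` and the macro names are hypothetical.

```c
#include <stdint.h>

/* GICv2 distributor Software Generated Interrupt Register offset. */
#define GICD_SGIR_OFFSET 0xF00u

/* TargetListFilter = 0b00: deliver to the CPUs named in CPUTargetList. */
#define SGIR_FILTER_TARGET_LIST 0u

/* Hypothetical helper: build a GICD_SGIR value.
 * sgi_id is 0..15; target_mask has one bit per CPU interface (0..7). */
static inline uint32_t gicd_sgir_encode(uint32_t sgi_id, uint32_t target_mask)
{
    return ((SGIR_FILTER_TARGET_LIST & 0x3u) << 24)  /* TargetListFilter[25:24] */
         | ((target_mask & 0xFFu) << 16)             /* CPUTargetList[23:16]    */
         | (sgi_id & 0xFu);                          /* SGIINTID[3:0]           */
}
```

For example, sending SGI ID 1 to CPUs 1, 2, and 3 corresponds to `gicd_sgir_encode(1, 0x0E)`, which yields `0x000E0001`.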
S3c — TLB shootdown IPI scaffold (DONE, 2026-06-10)
- Fixed request/ack protocol. `kernel/smp/percpu.swift` now has separate fixed atomic TLB shootdown generations, ack generations, received counters, and probe masks. These stay outside `SMPPerCpuState`, preserving the 64-byte scheduler slot while giving S3 a concrete per-CPU protocol to wire into future address-space active masks.
- IPI handler consumption. SGI ID 1 still records the generic S3b IPI counters, then consumes any pending TLB shootdown generation for the current CPU. The S3c path performs only a local `tlbi_all()` plus atomic ack/counter updates; it does not log, schedule, touch process state, allocate pages, or call VFS/driver code from a parked secondary CPU.
- Runtime acceptance. Boot runs `smpTlbShootdownSelfTest` after S3b readiness. On SMP boots CPU0 publishes a generation to every discovered secondary, sends the reserved SGI, waits for the ack mask, verifies each target's ack generation and received count, and logs `S3c OK: TLB shootdown IPI scaffold ready`. After userland demos, boot verifies the ack mask stayed complete and secondary scheduler state stayed idle, then logs `S3c OK: TLB shootdown path stayed scheduler-safe`.
- Static guard. `tests/smp_release_guard_test.sh` now checks the S3c request/ack globals and helpers, boot-order placement, and the narrow TLB handler contract. The generic S3b handler still cannot inline logging, scheduler/process work, VFS/driver/PMM calls, or raw TLB instructions.
- Non-goals. Existing VM page-table mutation sites still perform local invalidation because secondary EL0/address-space activation remains gated. The next S3 slice can connect this protocol to per-address-space active CPU masks once multi-CPU process execution is intentionally opened.
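
A minimal host-side sketch of a generation-based request/ack protocol of the kind described above, using C11 atomics. All names here are hypothetical, the real globals live in `kernel/smp/percpu.swift`, and `tlbi_all_stub` only counts calls in place of a real `tlbi_all()`.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAX_CPUS 8

static _Atomic uint32_t tlb_request_gen[MAX_CPUS];
static _Atomic uint32_t tlb_ack_gen[MAX_CPUS];
static _Atomic uint32_t tlb_received[MAX_CPUS];
static uint32_t local_flushes;   /* host stand-in for the effect of tlbi_all() */

static void tlbi_all_stub(void) { local_flushes++; }

/* Requester: publish generation gen to every CPU in target_mask. */
static void shootdown_publish(uint32_t target_mask, uint32_t gen)
{
    for (unsigned cpu = 0; cpu < MAX_CPUS; cpu++)
        if (target_mask & (1u << cpu))
            atomic_store_explicit(&tlb_request_gen[cpu], gen, memory_order_release);
}

/* Target: called from the SGI handler; flush locally, then acknowledge. */
static void shootdown_consume(unsigned cpu)
{
    uint32_t req = atomic_load_explicit(&tlb_request_gen[cpu], memory_order_acquire);
    if (req == atomic_load_explicit(&tlb_ack_gen[cpu], memory_order_relaxed))
        return;                                  /* nothing pending */
    tlbi_all_stub();
    atomic_fetch_add_explicit(&tlb_received[cpu], 1u, memory_order_relaxed);
    atomic_store_explicit(&tlb_ack_gen[cpu], req, memory_order_release);
}

/* Requester: which targets have acknowledged generation gen? */
static uint32_t shootdown_ack_mask(uint32_t target_mask, uint32_t gen)
{
    uint32_t acked = 0;
    for (unsigned cpu = 0; cpu < MAX_CPUS; cpu++)
        if ((target_mask & (1u << cpu)) &&
            atomic_load_explicit(&tlb_ack_gen[cpu], memory_order_acquire) == gen)
            acked |= 1u << cpu;
    return acked;
}
```

The requester would spin until `shootdown_ack_mask` equals the target mask. A repeated SGI with no new generation is a no-op for the target, which keeps the handler safe to call from the generic IPI path.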
S3d — active-mask VM TLB flush facade (DONE, 2026-06-10)
- VM facade. `kernel/mm/vm.swift` now routes TLB invalidation through `addressSpaceFlushTlbForActiveCpuMask`. The facade performs the page-table write barrier, invalidates the current CPU locally (`tlbi vae1` or `tlbi vmalle1`), and forwards any remote CPU bits to the S3c request/ack shootdown path. The exported C ABI entry points remain as current-CPU wrappers for inactive construction paths.
- Process-owned active masks. `kernel/user/process.swift` exposes `processCurrentAddressSpaceActiveCpuMask` and uses S3a's `pAddressSpaceCpuMask` for process-owned page-table mutations: heap growth rollback, anonymous mmap, demand-paged file mmap, munmap, mprotect, COW prepare/fault handling, and fork's parent COW rewrite. The current gate keeps the active mask CPU0-only, but the future multi-CPU path now has one explicit hook instead of scattered raw `tlbi_*` calls.
- Runtime acceptance. Boot runs `processAddressSpaceTlbFlushFacadeSelfTest` after S3c readiness and logs `S3d OK: address-space TLB flush facade ready`. After userland demos, boot runs `processAddressSpaceTlbFlushNoSecondarySelfTest`, verifies active masks stayed CPU0-owned, and logs `S3d OK: address-space TLB flush stayed CPU0-owned`.
- Static guard. `tests/smp_release_guard_test.sh` requires the VM facade, active-mask variants, process active-mask helpers, COW/copyout routing, and the S3d boot-order contract. Generic IPI/TLB handlers remain constrained to no logging, scheduler, process, VFS, virtio, or PMM work from secondary IRQ context.
- Non-goals. This checkpoint does not enable secondary EL0 execution, does not prove stale translation eviction across user threads on different CPUs, and does not change PMM/VFS/package-store concurrency policy.
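
The local/remote split the facade performs can be illustrated with a one-line mask computation. The helper name `remote_flush_mask` is hypothetical; the sketch assumes one bit per CPU in the active mask, as the S3a recorder suggests.

```c
#include <stdint.h>

/* Hypothetical helper: the current CPU is always invalidated locally,
 * so only the other bits of the active mask need a remote shootdown. */
static inline uint32_t remote_flush_mask(uint32_t active_mask, unsigned current_cpu)
{
    return active_mask & ~(1u << current_cpu);
}
```

With the CPU0-only active mask kept by the current gate, `remote_flush_mask(0x1, 0)` is zero, so no shootdown request would be published.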
S4a — PMM lock boundary and concurrent PageAllocator stress (DONE, 2026-06-10)
- Coarse PMM lock. `kernel/mm/pmm.swift` now wraps the shared `PageAllocator` owner in an IRQ-save spinlock built from the S0b atomic CAS primitive. The exported PMM entry points (`pmm_alloc_page`, `pmm_alloc_pages`, `pmm_free_page`, COW ref/unref/refcount, and PMM counters) all enter through the same `pmmWithAllocator` boundary, so the bitmap, hint, free-frame count, and refcount table are no longer raw global mutable state once secondary CPUs can call allocation paths.
- Atomic last-ref release. `pmm_frame_release` drops one COW reference and raw-frees the frame under the same PMM lock. VM user-frame teardown now uses this primitive instead of a split `pmm_frame_unref`/`pmm_free_page` sequence.
- Executable checks. Boot runs `pmmS4aConcurrencySelfTest()` after the S3d readiness checks, then sends a bounded PMM stress request to discovered secondary CPUs over the existing SGI/IPI path, and logs `S4a OK: PMM lock boundary ready`. After the userland demos, boot verifies the lock word is balanced, the PMM stress ack/failure masks are clean, and logs `S4a OK: PMM lock boundary stayed balanced`.
- Host stress. `tests/page_allocator_test.swift` keeps the existing unit and adversarial cases and adds an 8-worker threaded allocation/free/ref/unref stress through a synchronized wrapper over the same pure `PageAllocator` logic. It asserts no duplicate live frames and full frame-count recovery.
- Static guard. `tests/smp_release_guard_test.sh` requires the PMM lock helpers, the atomic release primitive, the bounded SGI PMM stress path, rejects direct optional PMM allocator access outside the wrapper, and checks the S4a boot-marker order. `tests/smp_boot_test.sh` and the UEFI boot smoke now require the S4a markers.
- Non-goals. No per-CPU page magazines yet, no lock-free bitmap operations, no small-object heap synchronization, and no VFS/handle/package-store pool locking in this slice.
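
A rough host-side sketch of why the release is done in one critical section: the reference drop and the free happen under the same CAS-built lock, so no other CPU can observe a zero-reference frame that is still allocated. The names are hypothetical, IRQ save/restore is omitted because this runs in user space, and the sketch assumes the frame is freed only when the count reaches zero.

```c
#include <stdatomic.h>
#include <stdint.h>

#define FRAME_COUNT 16

static _Atomic uint32_t pmm_lock_word;        /* 0 = free, 1 = held */
static uint8_t frame_refs[FRAME_COUNT];       /* toy refcount table */
static uint8_t frame_is_free[FRAME_COUNT];    /* toy allocation bitmap */

static void pmm_lock(void)
{
    uint32_t expected = 0;
    /* Spin on compare-and-swap until the lock word goes 0 -> 1. */
    while (!atomic_compare_exchange_weak_explicit(&pmm_lock_word, &expected, 1u,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;
}

static void pmm_unlock(void)
{
    atomic_store_explicit(&pmm_lock_word, 0u, memory_order_release);
}

/* Drop one reference; free the frame in the same critical section if it
 * was the last one. Returns 1 if the frame was freed, 0 otherwise. */
static int frame_release_sketch(unsigned frame)
{
    int freed = 0;
    pmm_lock();
    if (frame_refs[frame] > 0 && --frame_refs[frame] == 0) {
        frame_is_free[frame] = 1;
        freed = 1;
    }
    pmm_unlock();
    return freed;
}
```

A split unref-then-free sequence would take the lock twice, leaving a window between the two calls where another CPU could act on the frame.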
S4b — VFS lock boundary and handle accounting guard (DONE, 2026-06-10)
- Coarse VFS lock. `kernel/vfs/vfs.swift` now protects the shared VFS mutable pools (node table, per-process handle slots, shared open descriptions, pipes, endpoints, cwd nodes, and confinement roots) with an IRQ-save spinlock built from the S0b atomic CAS primitive. The lock has acquire/contention counters so boot can prove the boundary was exercised and left balanced.
- Borrowed open descriptions. Long operations borrow the open description before dropping the VFS lock. Pipe reads/writes and endpoint receives release the lock before `processYieldForIO()`, sockets run the TCP/UDP work without the VFS lock held, and disk-backed reads reserve the shared file offset before block I/O. `close`/`dup`/`fork`/`exec` handle refcount paths are serialized by the same boundary.
- Executable checks. Boot runs `vfsS4bReadinessSelfTest()` immediately after `vfsInit()` and logs `S4b OK: VFS lock boundary ready`. After userland demos it runs `vfsS4bLockBoundaryHeldSelfTest()`, which verifies the lock word is clear and fd/open-description/pipe/endpoint accounting is balanced, then logs `S4b OK: VFS lock boundary stayed balanced`.
- Static guard. `tests/smp_release_guard_test.sh` requires the VFS lock helpers, borrowed-description helpers, socket borrow helper, accounting self-test, and S4b boot-marker order. The SMP and UEFI boot smokes now require both S4b markers.
- Non-goals. S4b does not enable secondary EL0 execution, does not make the small-object kernel heap concurrent, and does not protect package-store mutation or network engine state beyond keeping VFS socket descriptors alive.
S4c — kernel bump-heap lock boundary (DONE, 2026-06-10)
- C heap lock. `kernel/runtime/heap.c` now protects `heap_cursor`, `heap_limit`, and `heap_initialized` with an IRQ-save spinlock built from the S0b C atomic bridge. `swiftos_kernel_alloc`, `swift_slowAlloc`, `posix_memalign`, and `swiftos_kernel_heap_used_bytes` all pass through that boundary.
- Idempotent init. `swiftos_heap_init()` no longer rewinds the bump cursor after the heap is already live. The lazy allocation path initializes under the same lock if an early caller reaches it first.
- Executable checks. Boot runs `swiftos_heap_s4c_self_test()` after the S4b VFS readiness check and logs `S4c OK: kernel heap lock boundary ready`. After userland demos it runs `swiftos_heap_lock_boundary_self_test()` and logs `S4c OK: kernel heap lock boundary stayed balanced`.
- Static guard. `tests/smp_release_guard_test.sh` requires the C heap lock, counter/self-test exports through `io.h`, and S4c boot-marker order. SMP and UEFI boot smokes now require both S4c markers.
- Non-goals. This keeps the minimal bump allocator design. There is still no small-object free/reclaim, no per-CPU heap cache, and no secondary EL0 execution in this checkpoint.
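
The idempotent-init rule can be illustrated with a toy bump allocator. This is a sketch, not the contents of `kernel/runtime/heap.c`: the function names and the 4 KiB arena are hypothetical, the spinlock is left out for brevity, and `align` is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

static _Alignas(16) uint8_t arena[4096];   /* toy backing store */
static size_t heap_cursor;
static size_t heap_limit;
static int heap_initialized;

/* Safe to call more than once: a live heap is never rewound. */
static void heap_init_sketch(void)
{
    if (heap_initialized)
        return;
    heap_cursor = 0;
    heap_limit = sizeof arena;
    heap_initialized = 1;
}

/* Bump allocation; initializes lazily if an early caller gets here first. */
static void *heap_alloc_sketch(size_t size, size_t align)
{
    heap_init_sketch();
    size_t start = (heap_cursor + align - 1) & ~(align - 1);
    if (start > heap_limit || size > heap_limit - start)
        return NULL;                       /* out of arena */
    heap_cursor = start + size;
    return &arena[start];
}

static size_t heap_used_bytes_sketch(void)
{
    return heap_cursor;
}
```

In the kernel version both the init check and the cursor update would sit inside the same IRQ-save lock, so a second `init` racing with an allocation cannot reset the cursor underneath it.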
S4d — package-store lock boundary (DONE, 2026-06-10)
- Package-store lock. `kernel/pkg/store.swift` now protects package-store payload/activation tables, active payload publication, record offsets, and S4d counters with a short IRQ-save spinlock.
- Writer gate. `pkgStoreInstall` uses a single-writer gate around the target-side install transaction. Hashing and virtio-blk reads/writes happen outside the spinlock; record reservation/commit and final active payload publication happen through short locked helpers.
- Reader snapshot. Active payload count/info/read paths copy the active payload index/offset/size snapshot under the S4d lock, then perform package store block I/O without holding it.
- Executable checks. Boot runs `pkgStoreS4dReadinessSelfTest()` immediately after `pkgStoreInit()` and before VFS consumes active package payloads, then logs `S4d OK: package-store lock boundary ready`. After the userland demos it runs `pkgStoreS4dLockBoundaryHeldSelfTest()` and logs `S4d OK: package-store lock boundary stayed balanced`.
- Static guard. `tests/smp_release_guard_test.sh` requires the S4d lock, writer gate, record reservation/commit helpers, unlocked payload reads, and S4d boot-marker order. SMP and UEFI boot smokes now require both S4d markers.
- Non-goals. S4d does not add a package-store journal, multi-writer transactions, or a package-management service. Install remains a serialized operation.
S4e — network/socket lock boundary (DONE, 2026-06-10)
- Network lock. `kernel/net/socket.swift` now protects `gNet`, DNS scratch state, socket tables, TCP connection state, RX datagram rings, and the virtio-net poll/TX/RX boundary with a short IRQ-save spinlock and S4e acquire/contention counters.
- Pump boundary. `netPump()` is the public locked pump entry point; `netPumpLocked()` is the internal helper that may call `virtioNetPoll(&gNet)` and deliver RX frames into sockets. Blocking recv/accept/connect paths pump or wait without holding the lock, then take short locked snapshots to inspect socket state or copy payloads.
- Boot probe boundary. The net-a boot probe no longer reads `gNet` or calls virtio-net TX/poll helpers directly from `main.swift`; it uses small locked probe helpers for MAC, ARP, and ICMP echo checks.
- Executable checks. Boot runs `netS4eReadinessSelfTest()` immediately after `runVirtioNetProbe()` and logs `S4e OK: network lock boundary ready`. After the userland demos it runs `netS4eLockBoundaryHeldSelfTest()` and logs `S4e OK: network lock boundary stayed balanced`.
- Static/runtime guard. `tests/smp_release_guard_test.sh` requires the S4e lock, counters, pump/probe helpers, and boot-marker order. SMP and UEFI boot smokes require both S4e markers. Runtime network coverage was re-run across virtio-net ARP/ICMP, UDP/TCP echo, TCP active open, DNS, HTTP, TLS, zero-copy RX refs, socket handle transfer, IPv6 link-local/NDP smokes, and signed HTTP package repo install.
- Non-goals. S4e does not service-ize the network stack, add a NIC interrupt thread, or enable broad secondary network work. It is a correctness boundary for the current in-kernel polled engine; C5/network service work still owns the architectural move out of the kernel.
S4f — restricted-SMP resource stress (DONE, 2026-06-10)
- Userland workload. `/bin/s4stress` is a small static C binary that drives the kernel resource paths hardened during S4. Each run repeats anonymous `mmap`/`munmap`, pipe create/dup/read/write/close, tmpfs write/rename/read cycles plus bounded create/unlink/mkdir/rmdir smoke paths, `fork`/`waitpid`, and `spawn`/exec of `/bin/argvdemo`, then prints `S4F-*` completion markers.
- Runtime harness. `tests/s4_resource_stress_test.sh` boots QEMU with `-smp 4`, logs in through the normal console path, runs `/bin/s4stress` from the packed base image, and requires the S2 timer heartbeat plus the S4e post-run lock-balance marker before accepting the S4f markers. `make test` runs the harness after the SMP boot smoke.
- Post-boot SMP churn harness. `tests/smp_resource_stress_test.sh` boots with `-smp 4`, logs in after the normal boot demos, and reruns resource-heavy userland paths (`forkdemo`, `fdopsdemo`, `execdemo`, `threadsdemo`, and a tmpfs create/write/pipe/move/remove loop) while all discovered CPUs remain online and ticking. It checks the S4a-S4e post-demo lock-boundary markers and its own `S4F-*` status markers, and is available as `make smp-resource-stress-test`.
- Static guard. `tests/smp_release_guard_test.sh` now requires the `/bin/s4stress` Makefile wiring, base-image install step, executable harnesses, and workload coverage markers so the S4f stress does not silently fall out of the release contract.
- Non-goals. S4f is intentionally a restricted-SMP stress pass for the current S2h gate. It does not enable broad secondary EL0 execution or make one address space execute concurrently on multiple CPUs; that remains S5.
S5a — per-CPU utilization export (DONE, 2026-06-10)
- Kernel accounting. The existing per-CPU timer scaffold now also exposes idle ticks through the S5a `SYS_sysinfo` extension. CPU0 mirrors its legacy idle accounting from `processOnTick`; parked secondary CPUs account their timer interrupts as idle in the IRQ path.
- Userland visibility. The first 64 bytes of the `/bin/top` sysinfo record remain compatible with the previous layout. Userland that passes the extended record capacity receives `cpuCount`, `cpuCapacity`, `cpuTicks[8]`, and `cpuIdleTicks[8]` starting at offset 64. `/bin/top` renders aggregate busy/idle from those deltas plus a per-CPU busy line.
- Executable checks. Boot logs `S5a OK: per-CPU utilization counters ready`. `tests/top_test.sh` is parameterized by `SMP_CPUS`; `make smp-cpu-utilization-test` runs `/bin/top -b -n 2` under `-smp 4` and requires the four per-CPU busy entries.
- Non-goals. S5a is observability only. It does not broaden process placement, enable one address space on multiple CPUs, or change the restricted S2h secondary EL0 gate.
S5b — bounded EL0 scheduler placement batch (DONE, 2026-06-10)
- Scheduler placement acceptance. `processRunS5bPlacementBatch` extends the restricted S2h gate from a pair demo to a three-process batch: the stable pair phase pins one `coproc` process to the explicitly started secondary scheduler CPU and one to CPU0, then a third CPU0 `coproc` tail runs before any of the batch slots are reaped. The default scheduler placement still remains CPU0 outside this acceptance path.
- Telemetry before reap. The batch captures per-process dispatch counts, dispatch CPU masks, and last-dispatch CPUs before the slots are reaped. The S5b guard requires the secondary-pinned process to dispatch on a non-primary online CPU under `-smp 4`, or the explicit CPU0 fallback under single-CPU boot.
- Placement correctness fixes. Requeue now preserves an existing process's `pHomeCpu` instead of recalculating the default CPU0 placement, and new slots explicitly clear dispatch/address-space telemetry before first enqueue. Live `klog` output also renders non-zero `detail=` fields, matching the SMP boot contract that already used structured details.
- Executable checks. Boot prints `S5b OK: three EL0 processes ran with scheduler placement` and logs either `S5b OK: EL0 scheduler placed batch across CPUs` or the CPU0 fallback. The SMP boot smoke checks marker ordering, the release guard checks capture-before-reap and boot-order contracts, and `make s5-scheduler-placement-test` runs the focused `-smp 4` boot acceptance.
- Non-goals. S5b does not enable arbitrary secondary EL0 scheduling, migration, work stealing, cross-CPU wakeups, or concurrent execution of multiple EL0 threads in the same address space.
S5c — repeated EL0 placement stress + run queue lock (DONE, 2026-06-10)
- Run queue locking. EL0 process run queue enqueue/dequeue now takes a small per-CPU IRQ-save spinlock. This keeps the CPU0 producer path and the secondary scheduler consumer path from racing on the `head`/`tail` pair when CPU0 publishes work to a secondary run queue. Cross-CPU enqueue also sends `sev` after the queue update so a parked secondary scheduler does not wait for the next timer interrupt before noticing new work.
- Repeated placement workload. `processRunS5cPlacementStress` runs three independent `coproc` primary/secondary rounds through the restricted S2h gate, then stops the secondary scheduler and runs two CPU0 tail processes. Slots are reaped after each round, but aggregate dispatch counts and CPU masks are captured before each reap and folded into S5c telemetry.
- Executable checks. Boot prints `S5c OK: repeated EL0 placement stress completed` and logs either `S5c OK: repeated EL0 placement stress crossed CPUs` or the CPU0 fallback. The S5c guard validates the expected process count, primary/secondary masks, nonzero dispatches, visible run queue lock activity, cleared gate masks, and idle queues. `make s5-placement-stress-test` runs the focused `-smp 4` boot acceptance.
- Non-goals. S5c still does not enable arbitrary process migration, load balancing, shared-address-space execution on multiple CPUs, or secondary scheduler access to unrelated kernel subsystems.
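
A host-side sketch of a locked `head`/`tail` ring in the spirit of the run queue locking above. The struct layout, capacity, and function names are hypothetical, and IRQ save/restore and the `sev` after a cross-CPU enqueue are not modeled.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RUNQ_CAP 8   /* power of two, so the counters may wrap safely */

struct run_queue {
    _Atomic uint32_t lock;     /* 0 = free, 1 = held */
    uint32_t head;             /* next entry to dequeue */
    uint32_t tail;             /* next free entry */
    int slots[RUNQ_CAP];       /* process slot numbers */
    uint32_t lock_acquires;    /* lock activity counter */
};

static void runq_lock(struct run_queue *q)
{
    uint32_t expected = 0;
    while (!atomic_compare_exchange_weak_explicit(&q->lock, &expected, 1u,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;
    q->lock_acquires++;        /* updated while holding the lock */
}

static void runq_unlock(struct run_queue *q)
{
    atomic_store_explicit(&q->lock, 0u, memory_order_release);
}

/* Producer side. Returns 1 on success, 0 if the queue is full. */
static int runq_enqueue(struct run_queue *q, int slot)
{
    int ok = 0;
    runq_lock(q);
    if (q->tail - q->head < RUNQ_CAP) {
        q->slots[q->tail % RUNQ_CAP] = slot;
        q->tail++;
        ok = 1;
    }
    runq_unlock(q);
    return ok;
}

/* Consumer side. Returns the slot number, or -1 if the queue is empty. */
static int runq_dequeue(struct run_queue *q)
{
    int slot = -1;
    runq_lock(q);
    if (q->head != q->tail) {
        slot = q->slots[q->head % RUNQ_CAP];
        q->head++;
    }
    runq_unlock(q);
    return slot;
}
```

Because both sides read and write `head` and `tail` only inside the lock, a producer on CPU0 and a consumer on a secondary CPU never see a half-updated pair.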
S5d — independent EL0 fanout across scheduler CPUs (DONE, 2026-06-10)
- Multi-secondary fanout. `processRunS5dFanout` starts every online secondary scheduler CPU with a live timer heartbeat, creates one independent top-level `coproc` process for CPU0 and one for each started secondary CPU, then waits for all slots to become zombie and scheduler-quiesced before stopping the secondary schedulers.
- Exact placement telemetry. The fanout captures the scheduler CPU mask, aggregate dispatch CPU mask, secondary CPU mask, total dispatch count, and a count of processes whose dispatch mask exactly matched their home CPU before any fanout slot is reaped. The S5d guard requires those masks to match, all participating CPUs to record EL0 switches, all gate masks to be clear after stop, and every run queue to be idle.
- Executable checks. Boot prints `S5d OK: EL0 fanout ran across scheduler CPUs` and logs either `S5d OK: EL0 fanout crossed scheduler CPUs` or the CPU0 fallback. `make s5-el0-fanout-test` runs the focused `-smp 4` boot acceptance, the SMP boot smoke enforces S5b -> S5c -> S5d marker ordering before Swift ps, and the release guard checks the fanout wiring.
- Non-goals. S5d still does not migrate a process after creation, run a single shared address space on multiple CPUs, enable arbitrary load balancing, or let secondary schedulers execute unrelated kernel-thread work.
S5e — shared-address-space thread fanout (DONE, 2026-06-10)
- Futex SMP boundary. `kernel/sched/futex.swift` now protects the futex wait table with an IRQ-save spinlock and exposes S5e lock/waiter self-tests. The `FUTEX_WAIT` path records the waiter and marks the caller blocked while holding the futex lock, then releases the lock before yielding so a different CPU can run `FUTEX_WAKE` without deadlocking or losing the wake.
- Gated thread placement. `processRunS5eThreadFanout` starts online secondary scheduler CPUs, runs `/bin/threadsdemo` on CPU0, and temporarily enables an S5e-only `thread_create` placement policy. Created sibling threads share the creator TTBR0 and are placed round-robin on active secondary scheduler CPUs; ordinary `thread_create` remains CPU0-placed outside this acceptance path.
- Telemetry and guard. S5e records created/exited thread counts, shared address-space count, home/dispatch CPU masks, exact home-CPU dispatch matches, futex lock activity, and a protected telemetry-lock count before the top-level demo process is reaped. The guard requires two sibling threads, a shared address space, nonzero futex lock activity, idle futex waiters/run queues, and closed secondary gate masks.
- Executable checks. Boot prints `S5e OK: shared-address-space thread fanout completed` and logs either `S5e OK: shared-address-space threads crossed CPUs` or the CPU0 fallback. `make s5-thread-fanout-test` runs the focused `-smp 4` boot acceptance.
- Non-goals. S5e does not make all shared address spaces freely migratable, add load balancing/work stealing, or protect concurrent `mmap`/`brk` mutations from multiple threads. It proves the narrow thread/futex runtime path under the restricted S2h scheduler gate.
S5f — run-any EL0 placement policy (DONE, 2026-06-10)
- Gated run-any policy. The ordinary default process placement still chooses CPU0 outside the acceptance window. `processRunS5fRunAnyPlacement` temporarily enables a run-any hook that round-robins new EL0 processes across CPU0 plus all active secondary scheduler CPUs, using the normal `homeCpu: unassignedCpu` creation path instead of explicit affinity.
- More work than CPUs. The boot demo starts all online secondary scheduler CPUs and creates more `/bin/coproc` processes than scheduler CPUs. This forces the run-any selector to wrap while each process remains pinned to the selected home CPU for the duration of the narrow test.
- Telemetry and guard. S5f captures the scheduler CPU mask, aggregate dispatch CPU mask/count, secondary CPU mask, policy selection count, process count, and exact home-CPU dispatch matches before reaping. The guard requires policy selections to match created processes, dispatch coverage to match the scheduler mask, every process to dispatch only on its selected CPU, and all run queues plus secondary gate masks to be idle after stop.
- Wake robustness. Full-gate stress exposed that secondary EL0 scheduler start waits were relying on `sev` plus timer interrupts while the secondary loop sleeps in `wfi`. `processWaitForSecondaryActive` now sends the reserved SGI/IPI only while opening the gate, and secondary timer preemption requires active+run masks while rejecting the stop mask, so S5f does not depend on timer luck without widening the stop race.
- Executable checks. Boot prints `S5f OK: run-any placement policy completed` and logs either `S5f OK: run-any placement covered scheduler CPUs` or the CPU0 fallback. `make s5-run-any-placement-test` runs the focused `-smp 4` boot acceptance.
- Non-goals. S5f does not add migration, work stealing, load balancing, or a production scheduler heuristic. It proves that the default placement path can select any active scheduler CPU under the existing restricted SMP boundary.
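
One way a round-robin selector over a scheduler CPU mask could look is sketched below. The function name, the rotating cursor, and the selection counter are hypothetical; the kernel's actual hook may choose CPUs differently.

```c
#include <stdint.h>

static unsigned next_cursor;        /* where the next search starts */
static unsigned policy_selections;  /* how many placements were made */

/* Pick the next CPU whose bit is set in scheduler_mask, starting at the
 * rotating cursor and wrapping around. Returns -1 if the mask is empty. */
static int run_any_select(uint32_t scheduler_mask, unsigned cpu_capacity)
{
    for (unsigned i = 0; i < cpu_capacity; i++) {
        unsigned cpu = (next_cursor + i) % cpu_capacity;
        if (scheduler_mask & (1u << cpu)) {
            next_cursor = (cpu + 1) % cpu_capacity;
            policy_selections++;
            return (int)cpu;
        }
    }
    return -1;
}
```

With a scheduler mask of `0xB` (CPUs 0, 1, and 3) and a capacity of 4, four successive calls return 0, 1, 3, and then 0 again, which is the wrap the "more work than CPUs" demo is meant to force.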
S5 aggregate readiness gate (DONE, 2026-06-10)
- Scope. Added `make s5-test` as the review-facing aggregate for S5 runtime readiness. It runs the existing S5a-S5f focused gates in order: `smp-cpu-utilization-test`, `s5-scheduler-placement-test`, `s5-placement-stress-test`, `s5-el0-fanout-test`, `s5-thread-fanout-test`, and `s5-run-any-placement-test`.
- Why. S0/S1 already had aggregate targets (`s0-test`, `s1-test`), but S5 required reviewers to remember the full focused-gate list. The aggregate target preserves the narrow gates and gives broader reviews a single command.
- Guard. `tests/phase1_roadmap_test.swift` checks the Makefile target and docs references so future S5 docs do not drift back to only naming the final focused gate.
C1 — handle table + fds-as-handles (DONE, 2026-06-08)
- Typed handle slots. `kernel/vfs/handle.swift` now owns the dependency-free `HandleKind`, `Rights`, `HandleInheritance`, and `HandleEntry` vocabulary. The VFS fd table stores `HandleEntry` values keyed by `(process slot, fd)`: each slot records the fd-visible object kind, the shared open-description index, per-handle rights, and the per-slot `cloexec` flag.
- Behavior-preserving fd view. POSIX-visible fd numbering is unchanged: top-level stdio is still `0/1/2`, `open()` and `socket()` allocate from fd `3`, while `dup`, `dup2`, `F_DUPFD(_CLOEXEC)`, and `pipe` preserve their existing lowest-free behavior. Shared offsets, pipe/socket lifetime, close-on-exec, fork inheritance, and exec behavior remain backed by the existing reference-counted `OpenDescription` pool.
- Rights without new policy. `read`/`write` rights stay per handle and are used by the same syscall paths as before. C1 does not add enforcement for `.duplicate`, `.transfer`, `.getattr`, socket-specific operations, or process capability checks beyond the policy that already existed before this checkpoint.
- Tests / acceptance. `tests/handle_test.swift` covers stable rights bits, attenuation, `rights(read:write:)`, distinct handle kinds, and typed `HandleEntry` initialization. The boot path prints `C1 OK: fds-as-handles preserved` only after `/bin/fdopsdemo` exits successfully, and `tests/boot_test.sh` asserts that marker.
- Non-goals left for later C milestones. No new user-visible generic handle syscalls, no spawn-with-explicit-handles default flip (C2), no object-scoped authority policy expansion (C3), no new IPC/VMO/device/cell handle semantics (C4+), and no SMP work.
C2 — spawn-with-handles / explicit handle inheritance (DONE, 2026-06-08)
- Explicit spawn inheritance. `SYS_SPAWN_HANDLES` adds a synchronous `spawn_handles(path, argv, HandleSpec[], count)` ABI. A spawned child starts with an empty handle table and receives only the named `(source fd -> target fd)` entries, with per-entry rights attenuated by the supplied mask and optional child-side close-on-exec.
- Compatibility preserved. The existing `SYS_SPAWN`/`spawn()` wrapper still inherits stdio only (`0/1/2`). `fork()` still inherits the full handle table, including fd numbers, shared open descriptions, offsets, rights, and `cloexec` flags. `execve()` still closes only close-on-exec descriptors.
- Tests / acceptance. `tests/handle_test.swift` covers the C2 inheritance selector and `HandleSpec` ABI constants. `/bin/spawndemo` now proves both that legacy spawn drops parent fd `3` and that `spawn_handles` can explicitly pass fd `3`; `tests/boot_test.sh` asserts `C2 OK: explicit handle inheritance preserved` and rejects leak/failure markers.
- Non-goals left for later C milestones. C2 does not add object-scoped filesystem authority, subtree grants, resource-limit enforcement, IPC transfer policy, service/cell launch semantics, or SMP work.
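
The explicit-inheritance rule can be sketched as building a child table from an empty start. The struct layouts, rights bit values, and function name below are hypothetical stand-ins; the real `HandleSpec` ABI is defined by the kernel and is not reproduced here.

```c
#include <stdint.h>
#include <string.h>

#define MAX_FDS 8

/* Hypothetical rights bits, for illustration only. */
enum { RIGHT_READ = 1u << 0, RIGHT_WRITE = 1u << 1 };

struct handle_entry { int in_use; uint32_t rights; int cloexec; };
struct handle_spec  { int source_fd; int target_fd; uint32_t rights_mask; int cloexec; };

/* Start the child empty, then copy only the named entries with rights
 * narrowed by each spec's mask. Returns 0 on success, -1 on a bad spec. */
static int build_child_table(const struct handle_entry *parent,
                             struct handle_entry *child,
                             const struct handle_spec *specs, int count)
{
    memset(child, 0, sizeof(struct handle_entry) * MAX_FDS);
    for (int i = 0; i < count; i++) {
        int src = specs[i].source_fd, dst = specs[i].target_fd;
        if (src < 0 || src >= MAX_FDS || dst < 0 || dst >= MAX_FDS)
            return -1;
        if (!parent[src].in_use)
            return -1;
        child[dst].in_use = 1;
        child[dst].rights = parent[src].rights & specs[i].rights_mask; /* attenuate only */
        child[dst].cloexec = specs[i].cloexec;
    }
    return 0;
}
```

Because the mask is ANDed with the parent's rights, a spec can narrow what the child receives but can never widen it, and any parent fd that is not named stays out of the child.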
C3 — per-handle VFS rights gates (DONE, 2026-06-08)
- Operation rights now live on the handle. Existing fd-backed operations now check `HandleEntry.rights` before dispatch for duplication, metadata (`fstat`, `F_GET*`), fd attributes (`F_SET*`), directory iteration, lseek, socket send/receive/control paths, and explicit spawn handle passing (`.transfer`). `read`, `write`, `ftruncate`, pipe poll, and endpoint send were already rights-aware; C3 closes the obvious fd bypasses without changing fd numbering.
- Compatibility defaults preserved. Legacy `open()`, `pipe()`, stdio, and `socket()` still mint POSIX-compatible handles with read/write plus the meta rights needed by shells, redirects, fork, and `spawn_handles`. Process-global caps remain coarse constructor gates for ambient `open()`/`socket()` and tmpfs mutation compatibility; once a handle exists, those caps do not widen it.
- Scoped filesystem authority. The existing `confine()` subtree root now gates path syscalls beyond `open()`: `stat`, `chdir`, namespace mutations, chmod/chown, and disk-backed exec image lookup. Confinement is narrow-only and keeps cwd inside the subtree.
- Tests / acceptance. `tests/handle_test.swift` covers C3 rights bit stability, `hasRights`, and attenuation to all/empty masks. `/bin/spawndemo` passes deliberately attenuated handles and `/bin/argvdemo` proves missing write, duplicate, getattr, and directory-read rights are denied. `/bin/fsdemo` proves `/etc` confinement allows inside access but denies open/stat/widen/create outside, and the boot test asserts `C3 OK: per-handle rights enforced`.
- Non-goals left for later C milestones. C3 does not add C4 IPC expansion, endpoint handle policy beyond existing fd rights, VMOs, async rings, userland drivers, Cells/resource domains, cap-stripping spawn policy, or SMP work.
C4a — minimal endpoint handle-passing IPC hardening (DONE, 2026-06-08)
- Reviewed endpoint slice. The existing endpoint_create/ipc_send/ipc_recv path is now treated as the first C4 sub-milestone: a pollable single-slot endpoint can carry bytes plus one moved handle between processes. The sender's source fd is cleared without releasing the open description, and the receiver installs the same attenuated HandleEntry rights into a fresh fd.
- Rights and lifetime hardening. ipc_send requires endpoint .write and .transfer, and the moved source handle must have .transfer. ipc_recv requires endpoint .read and, when a handle is pending, endpoint .transfer before importing it. Same-endpoint self-transfer is rejected, endpoint creation rolls back reserved fd/description/object slots on failure, and receive checks fd-space before consuming a pending moved handle.
- Poll and teardown behavior. Endpoint poll readiness now mirrors the rights ipc_send/ipc_recv enforce: send readiness needs write+transfer, receive readiness for a pending handle needs read+transfer, and peer close still reports HUP/ERR-style readiness. Closing the last endpoint references still releases an in-flight unreceived handle, and endpoint slots retain their heap-backed message buffers for reuse so repeated create/close tests do not burn the bump heap.
- Tests / acceptance. tests/handle_test.swift covers the C4a endpoint vocabulary. /bin/spawndemo passes attenuated endpoint handles to /bin/argvdemo, proving missing endpoint write/read/transfer rights are denied. /bin/forkdemo proves bytes, received-handle readback, and move-only source fd invalidation. The boot acceptance marker is C4a OK: endpoint IPC moved handles safely.
- Non-goals left for later C4 milestones. No VMOs, async rings, batched descriptors, ipc_call, badges, multi-handle vectors, service supervisor, userland drivers, Cells/resource domains, socket-transfer smoke, endpoint close-on-exec policy change, or SMP work.
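
The C4a rights rules above can be sketched as two small predicates. This is a minimal illustration, not the kernel's code: the SK_RIGHT_* bit values and the sk_* function names are hypothetical, and the real rights vocabulary lives in the Swift handle table.

```c
#include <stdint.h>

/* Hypothetical rights bits for illustration only; the kernel's real
 * values are defined in its Swift handle table and may differ. */
enum {
    SK_RIGHT_READ     = 1u << 0,
    SK_RIGHT_WRITE    = 1u << 1,
    SK_RIGHT_TRANSFER = 1u << 2
};

/* True when every bit in `needed` is present in `held`. */
static int sk_has_rights(uint32_t held, uint32_t needed)
{
    return (held & needed) == needed;
}

/* Send gate: the endpoint needs write+transfer, and a moved handle
 * (if any) must itself carry transfer. Returns 1 = allowed, 0 = denied. */
int sk_ipc_send_allowed(uint32_t ep_rights, int moving_handle,
                        uint32_t moved_rights)
{
    if (!sk_has_rights(ep_rights, SK_RIGHT_WRITE | SK_RIGHT_TRANSFER))
        return 0;
    if (moving_handle && !sk_has_rights(moved_rights, SK_RIGHT_TRANSFER))
        return 0;
    return 1;
}

/* Receive gate: read is always needed; transfer is additionally needed
 * only when a moved handle is pending on the endpoint. */
int sk_ipc_recv_allowed(uint32_t ep_rights, int handle_pending)
{
    uint32_t needed = SK_RIGHT_READ;
    if (handle_pending)
        needed |= SK_RIGHT_TRANSFER;
    return sk_has_rights(ep_rights, needed);
}
```

Poll readiness can reuse the same predicates, which is one way to keep readiness and enforcement from drifting apart.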
C4b — socket handle transfer smoke (DONE, 2026-06-09)
- Socket objects now have endpoint-transfer coverage. No kernel ABI change was needed: C4a already moves a full HandleEntry; this slice proves that the same move works for .socket descriptions and preserves the socket table object/lifetime across process ownership transfer.
- Executable smoke. Added /bin/c4b-sockxfer: the parent binds a UDP socket after fork, moves that socket handle over an endpoint to the child, verifies the source fd is invalidated (-EBADF), and waits for the child to receive and echo a host datagram through the transferred socket.
- Tests / acceptance. tests/ipc_socket_transfer_test.sh boots with virtio-net + slirp UDP hostfwd, runs /bin/c4b-sockxfer, sends a host datagram to the bound port, asserts the child received/echoed through the moved socket, and is wired into make test after the UDP smoke. A harness-hardening follow-up makes the host UDP receive window more patient and fails fast with the serial tail if QEMU exits while the script is waiting for a marker.
- Still deferred. VMOs, async rings, batched descriptors, ipc_call, badges, service supervisor, userland drivers, Cells/resource domains, endpoint close-on-exec policy change, and SMP work remain later C/S milestones.
C5a — restartable driver-service supervisor smoke (DONE, 2026-06-10)
- Supervisor/service shape. Added /bin/drvsvcdemo, a tiny userland supervisor, and /bin/drvinputd, a pseudo input-driver service. The supervisor creates two endpoint pairs, forks/execs the service with only the service-side endpoint fds left open, waits for a ready message, sends a command, receives an event, stops the service, and repeats the sequence with a fresh generation.
- Restart evidence. The service returns a generation-specific exit status after STOP, so the supervisor proves both the old service stopped and a new service instance recovered the endpoint protocol.
- Executable checks. Boot now runs the smoke and prints C5a OK: restartable driver service recovered over IPC; make c5-driver-service-test runs the focused -smp 4 direct-boot acceptance.
- Non-goals. No real device handle, MMIO mapping, IRQ endpoint, DMA window, or virtio-input ownership is moved to userland yet. This is the C5 supervisor/IPC contract that the next device-handoff slice can attach hardware authority to.
C5b — opaque device-handle handoff scaffold (DONE, 2026-06-10)
- Device handle vocabulary. HandleKind.device is now part of the typed handle table. Device grants default to getattr + transfer only: they can be inspected and moved over C4 IPC, but not duplicated, read, written, or mapped.
- Registry scaffold. VFS owns a tiny device registry with pseudo-input.0, a C5 scaffold entry marked NO_MMIO_GRANT. device_claim(name, info*) creates a unique device handle for the boot authority and returns -16 while another live handle owns the grant. device_info(fd, info*) exposes fixed metadata; the MMIO base/length and IRQ fields are zero because C5b does not grant hardware access yet.
- Lifecycle and IPC transfer. Open-description refcounts now release device ownership on final close/process exit. /bin/drvsvcdemo claims the pseudo device, transfers it to /bin/drvinputd with ipc_send(..., handle_fd), verifies the moved source fd becomes -9, observes -16 on a concurrent claim while the service owns it, stops the service, and successfully reclaims the device.
- Executable checks. Boot now requires C5b OK: opaque device handle transferred and released; make c5-device-handle-test is the focused direct-boot acceptance. The host handle_test also covers .device kind stability and default rights.
- Non-goals. Still no MMIO map syscall, IRQ endpoint, DMA window, real virtio-input device claim, manifest matching, or driver replacement. C5b only makes the ownership/transfer/release contract executable.
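
The claim, busy, release, reclaim cycle described above reduces to a reference count on the grant. The sketch below is illustrative only: sk_device and the sk_* functions are hypothetical names, and only the -16 return value is taken from these notes.

```c
#define SK_EBUSY (-16)  /* value these notes quote for a contended claim */

/* Hypothetical registry entry: one grant plus its live reference count. */
typedef struct {
    const char *name;
    int refs;  /* live open-description references to the grant */
} sk_device;

/* A claim succeeds only while no live handle owns the grant. */
int sk_device_claim(sk_device *dev)
{
    if (dev->refs > 0)
        return SK_EBUSY;
    dev->refs = 1;
    return 0;
}

/* Drop one reference; ownership is released on the final close.
 * Moving the handle over IPC does not call this, because a move
 * transfers the same open description rather than closing it. */
void sk_device_close(sk_device *dev)
{
    if (dev->refs > 0)
        dev->refs--;
}
```

Because a moved handle keeps the same open description, the grant stays busy while the service owns it and becomes claimable again only after the service's final close or exit.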
C5c — virtio-input device discovery and manifest matching (DONE, 2026-06-10)
- Discovery ABI. Added device_discover(index, info*) as syscall 64. It is read-only, requires the same boot authority capability as device_claim, and enumerates the device registry by manifest ordinal. It writes the same 64-byte swiftos_device_info record as device_info; out-of-range enumeration returns -2.
- Discovery-backed registry. The VFS device registry now probes the platform virtio-mmio window for device id 18 and registers virtio-input.0 when a QEMU virtio-input transport is present. The grant metadata records SWIFTOS_DEVICE_KIND_VIRTIO_INPUT, SWIFTOS_DEVICE_BUS_VIRTIO_MMIO, the transport MMIO base/length, and DISCOVERED | NO_MMIO_GRANT.
- Headless fallback. Direct serial boots that do not attach a keyboard device still register pseudo-input.0, so the C5 supervisor and lifecycle smoke remains part of the broad boot path.
- Supervisor/service manifest check. /bin/drvsvcdemo prefers virtio-input.0, validates the manifest fields, transfers the device handle to /bin/drvinputd, proves the grant is busy while the service owns it, and reclaims it after service exit. /bin/drvinputd validates the same manifest before acknowledging. The focused path emits C5c OK: virtio-input device grant discovered and matched.
- Executable checks. make c5-device-discovery-test attaches QEMU virtio-keyboard-device and runs the C5 gate under -smp 4; the ordinary boot_test.sh still covers the pseudo fallback and requires C5c OK: device discovery manifest matched pseudo input.
- Non-goals. C5c still grants only getattr + transfer. No userland MMIO map syscall, IRQ endpoint, DMA window, or replacement of the in-kernel virtio-input queue owner lands in this slice.
C5d — virtio-input discovery metadata (DONE, 2026-06-10)
- Metadata source. The virtio-input keyboard probe now scans platform.virtioMmioBase/Stride/Count instead of the old fixed QEMU window constants. VFS device-registry setup reuses the same read-only probe: when a virtio-keyboard-device is present, virtio-input.0 reports VIRTIO_MMIO, the transport MMIO base, and the slot length in swiftos_device_info.
- Authority boundary. The registry still sets NO_MMIO_GRANT, and IRQ is still zero. These MMIO fields are discovery manifest metadata only; no userland mapping, IRQ endpoint, DMA window, or driver ownership is handed out.
- Executable checks. /bin/drvsvcdemo and /bin/drvinputd validate both the synthetic no-device fallback and the virtio-mmio metadata case. make c5-device-metadata-test boots with a QEMU virtio keyboard and asserts C5d OK: virtio input discovery metadata surfaced.
C5e — device authority envelope preflight (DONE, 2026-06-10)
- Authority flags. userland/lib/syscall.h now reserves SWIFTOS_DEVICE_FLAG_MMIO_GRANT, SWIFTOS_DEVICE_FLAG_IRQ_GRANT, and SWIFTOS_DEVICE_FLAG_DMA_GRANT, plus the combined SWIFTOS_DEVICE_FLAG_HARDWARE_AUTHORITY mask. The current registry never sets those bits.
- Executable boundary. The supervisor and service both validate that device grants keep the future hardware-authority mask clear, keep NO_MMIO_GRANT set, and report irq == 0. This makes the current metadata-only contract a testable boundary rather than a comment.
- Acceptance. make c5-device-authority-test runs the focused C5 QEMU path with virtio-keyboard-device attached and asserts C5e OK: device authority withheld until explicit handoff.
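
A minimal sketch of the envelope check the supervisor and service perform, under stated assumptions: the SK_FLAG_* bit positions are made up for illustration (the real constants are the SWIFTOS_DEVICE_FLAG_* definitions in userland/lib/syscall.h), and sk_authority_withheld is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only so the sketch compiles. */
#define SK_FLAG_DISCOVERED    (1u << 0)
#define SK_FLAG_NO_MMIO_GRANT (1u << 1)
#define SK_FLAG_MMIO_GRANT    (1u << 8)
#define SK_FLAG_IRQ_GRANT     (1u << 9)
#define SK_FLAG_DMA_GRANT     (1u << 10)
#define SK_FLAG_HARDWARE_AUTHORITY \
    (SK_FLAG_MMIO_GRANT | SK_FLAG_IRQ_GRANT | SK_FLAG_DMA_GRANT)

/* Returns 1 when a grant is still metadata-only:
 *   - no hardware-authority bit is set,
 *   - NO_MMIO_GRANT is set,
 *   - the IRQ field is zero. */
int sk_authority_withheld(uint32_t flags, uint32_t irq)
{
    if (flags & SK_FLAG_HARDWARE_AUTHORITY)
        return 0;
    if (!(flags & SK_FLAG_NO_MMIO_GRANT))
        return 0;
    return irq == 0;
}
```

Checking all three conditions means a later slice that starts granting authority has to update the check deliberately instead of passing by accident.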
C5f — metadata-only device grant rights contract (DONE, 2026-06-10)
- Shared rights helper. kernel/vfs/handle.swift now defines deviceMetadataGrantRights() as the single metadata-only device grant shape: .getattr + .transfer. The VFS device claim path uses that helper instead of assembling device rights locally.
- No implicit hardware authority. The host handle test and static C5f guard reject accidental .read, .write, .execute, .map, .duplicate, or .setattr rights on current device grants. Runtime C5 still proves the grant can be inspected, moved over IPC, and not duplicated.
- Acceptance. make c5-device-rights-test runs the host handle vocabulary check plus tests/device_authority_guard_test.sh. The focused and broad C5 boot smokes now require C5f OK: device grant rights stayed metadata-only.
C5g — device authority capability gate (DONE, 2026-06-11)
- Negative EL0 probe. Added /bin/deviceauthdemo, a small guest-side probe that calls device_discover(0, info*) and device_claim("pseudo-input.0", info*). A restricted principal must receive -13 for both operations before it can enumerate or mint an opaque device grant.
- Acceptance. make device-authority-cap-test boots QEMU, logs in as the seeded guest principal (principal=3 session=3 caps=2), runs the probe, and requires DEVICE-AUTH-DISCOVER-DENY-OK err=-13, DEVICE-AUTH-CLAIM-DENY-OK err=-13, and C5g OK: non-console principal cannot discover or claim device grants.
- Aggregate wiring. make c5-test now includes the C5g gate after the C5a-C5f driver-service/device-authority checks, and the stability coverage guard requires the Makefile, testing guide, roadmap, and notes references.
- Non-goals. C5g does not change the registry or hand real MMIO/IRQ/DMA to userland. It freezes the existing capConsole device-authority minting boundary as an executable regression test.
C5 aggregate readiness gate (DONE, 2026-06-10)
- Scope. Added make c5-test as the review-facing aggregate for C5 readiness. It names the existing C5a-C5g gates in order: c5-driver-service-test, c5-device-handle-test, c5-device-discovery-test, c5-device-metadata-test, c5-device-authority-test, c5-device-rights-test, and device-authority-cap-test.
- Why. C5 review previously required remembering which QEMU gates cover the restartable driver-service path and which host/static guard covers metadata-only grant rights. The aggregate keeps focused gates available but gives broad reviews one command.
- Guard. tests/phase1_roadmap_test.swift checks the Makefile target and docs references so future C5 additions keep the aggregate readiness contract visible.
- Full-gate coverage hardening. make test now runs make c5-test, so the broad shipped gate includes restartable driver-service supervision, device grant transfer, virtio-input discovery metadata, authority withholding, metadata-only rights checks under -smp 4, and guest denial before device grant minting. Added tests/qemu_virt_hardware_map_test.sh / make qemu-virt-hardware-map-test to validate QEMU virt PL011, GIC, timer, PSCI, CPU topology, and virtio-mmio DTB facts for 1-CPU and 4-CPU profiles. Added tests/stability_coverage_test.swift plus make stability-coverage-test; docs-test runs this static guard so memory/resource, hardware/SMP, security/isolation, update/rollback, package, network, C5, and UEFI coverage cannot silently fall out of the full gate. Added tests/swpkg_header_integrity_test.swift / make swpkg-header-integrity-test to reject tampered .swpkg manifest hash, payload hash, and reserved signature header fields before verification or payload extraction.
Post-M8 roadmap (M9 → M13) — locked 2026-06-04
M8 is complete (busybox sh on QEMU virt). The next arc is portability + a real boot + identity.
Three forks were raised and decided with the maintainer (each touched a previously-locked decision):
- Boot/portability → keep aarch64, add UEFI boot. "Run in VirtualBox" does NOT mean an amd64 port (amd64 stays a non-goal). We make the kernel boot from a real disk via UEFI firmware and discover hardware at runtime instead of hardcoding the QEMU virt map. Reference validation is QEMU + AAVMF (edk2 aarch64 UEFI); the end target is VirtualBox ARM on Apple Silicon, treated as best-effort because that machine model is experimental and differs from QEMU virt.
- Identity → capability/principal model (as already described in ARCHITECTURE.md). Kernel authorization is capability-based, not uid==0. /etc/passwd and /etc/group are generated compat views for busybox/newlib, never the source of policy.
- Filesystem → virtio-blk + packed read-only base image. Load /bin, /etc, busybox from a disk image instead of embedding ELFs in the kernel. tmpfs stays; persistent writable storage is NOT introduced (data loss on reboot remains by design).
Milestone sequence (one at a time, each builds/boots/tests/commits, then stop for review):
- M9: HAL + runtime hardware discovery (DTB). Replace hardcoded UART/GIC/RAM constants with a Platform struct populated from a flattened device tree. Prerequisite for both UEFI and any non-QEMU host. Low risk: falls back to QEMU virt defaults if no valid DTB.
- M10: UEFI boot + bootable disk image. Build the kernel as an EFI-loadable image (or a small UEFI loader): get the memory map + ACPI/DTB config table, ExitBootServices, hand off. Produce a GPT image with an ESP. Acceptance: boots under QEMU+AAVMF from disk (no -kernel) to busybox.
- M10.5: VirtualBox ARM validation (spike + milestone). Research VBox ARM device model (UART, GICv2/v3, storage backend, ACPI), adapt the HAL/drivers, boot the M10 image in VirtualBox on Apple Silicon. If too immature, record findings and keep QEMU+AAVMF as the reference.
- M11: virtio-blk + packed base FS from disk. virtio-blk driver (discovered via HAL); host-side image packer; VFS serves the RO base from disk; drop the embedded user_blob.
- M12: capability/principal core + login. Typed Principal/Session/Capability; process security context; console-login authenticates a principal from a base-image identity store, opens a session, grants capabilities, spawns the shell. Generated /etc/passwd compat view.
- M13: permission enforcement on the VFS. File access checked against capabilities; ls -l shows ownership/mode from generated views; unprivileged session denied writes to the RO base.
Critical path M9 → M10 → M11 → M12 → M13, with M10.5 a parallel validation after M10. Highest risk is
the UEFI handoff (M10) and VBox ARM immaturity (M10.5); the -kernel path stays as a fallback until UEFI
is stable.
Hardware abstraction (M9)
- The boot stub (boot.S) preserves an optional DTB pointer from x0 and passes it to kernel_main(dtbPhys:). QEMU's direct ELF -kernel path does not reliably provide that pointer, so make run / make test dump QEMU's real virt DTB and load it into the last MiB of RAM (0x4FF0_0000 for -m 256M) with -device loader,...,force-raw=on; platformInit tries x0 first and then this direct-boot fallback address.
- kernel/arch/aarch64/fdt.swift is a small, pure, host-testable flattened-device-tree reader (no UART, no MMIO, no heap). It extracts the /memory reg (RAM base/size), the arm,pl011 UART reg + IRQ (SPI/PPI decode), and the arm,cortex-a15-gic distributor/CPU-interface regs.
- kernel/arch/aarch64/platform.swift holds a global Platform struct initialised to QEMU virt defaults, then overridden by platformInit(dtbPhys:). If neither x0 nor the direct-boot fallback address contains a valid DTB it keeps the defaults and logs a warning, so the kernel never regresses.
- Drivers read their bases/IRQs from platform: uart.swift (platform.uartBase/uartIrq), gic.swift (platform.gicDist/gicCpu), pmm.swift (RAM end = ramBase + ramSize). The EL1 physical timer PPI (INTID 30) stays an architectural constant, not board-specific.
- Tests: a host unit test (tests/fdt_test.swift) parses a real QEMU DTB (dumped via -M virt,dumpdtb=...) and asserts the extracted map; the in-QEMU boot test asserts M9 OK: hardware discovered from device tree, proving the DTB→Platform path end to end.
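
Two details above lend themselves to a short worked example: where the last MiB of RAM falls, and how a candidate pointer can be screened before it is parsed as a DTB. The sketch is not the kernel's reader (that is fdt.swift); the sk_* names are hypothetical. It relies only on the flattened-device-tree header layout, where the big-endian magic 0xd00dfeed sits at offset 0 and totalsize at offset 4.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_FDT_MAGIC 0xD00DFEEDu   /* big-endian at offset 0 of an FDT blob */
#define SK_MIB       (1ull << 20)

/* FDT header fields are big-endian regardless of CPU endianness. */
static uint32_t sk_load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Plausibility check only: magic matches and totalsize fits in the
 * bytes available. A real reader validates much more than this. */
int sk_fdt_looks_valid(const uint8_t *blob, size_t avail)
{
    if (blob == NULL || avail < 8)
        return 0;
    if (sk_load_be32(blob) != SK_FDT_MAGIC)
        return 0;
    return sk_load_be32(blob + 4) <= avail;
}

/* Start of the last MiB of RAM, used as the direct-boot fallback slot. */
uint64_t sk_dtb_fallback_addr(uint64_t ram_base, uint64_t ram_size)
{
    return ram_base + ram_size - SK_MIB;
}
```

With QEMU virt's RAM base of 0x4000_0000 and -m 256M, the helper yields 0x4000_0000 + 0x1000_0000 - 0x10_0000 = 0x4FF0_0000, which matches the address quoted above.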
UEFI boot (M10)
M10 moves the boot path off QEMU's -kernel shortcut to a real firmware booting a disk image.
M10 is DONE — the OS boots to busybox under QEMU+AAVMF from an EFI System Partition with no
-kernel. Staged as M10a (loader bring-up), M10b-prep (firmware state + load-address reservation), and
M10b (ExitBootServices + kernel handoff); details below.
M10a — UEFI loader bring-up (DONE, 2026-06-04)
- Toolchain (verified): the EFI loader is an AArch64 PE32+ application. clang targets aarch64-unknown-windows (COFF) and lld-link -subsystem:efi_application -entry:efi_main -nodefaultlib emits the EFI image. AArch64 UEFI uses ordinary AAPCS64, so firmware function pointers are called like normal C, with no special calling convention (unlike x86_64 EFIAPI). No gnu-efi or EDK2 headers: boot/efi/efi.h declares only the structures used, at spec-correct offsets.
- Firmware: QEMU's prebuilt AAVMF/edk2 at /opt/homebrew/share/qemu/edk2-aarch64-code.fd, loaded with -bios (no separate NVRAM vars store needed; AAVMF's default boot order scans removable media for \EFI\BOOT\BOOTAA64.EFI).
- ESP bring-up: M10a initially used QEMU virtual FAT from a directory (-drive file=fat:rw:build/esp,format=raw,if=virtio) so no mount/root privileges were needed. M10c adds a real GPT disk image path; virtual FAT remains available as UEFI_BOOT=fat.
- Device tree handoff: AAVMF defaults to ACPI, which does NOT publish an FDT table. Booting -M virt,acpi=off makes the firmware run in device-tree mode and install the FDT configuration table (vendor GUID b1b621d5-f19c-41a5-830b-d9152c69aae0, EDK2 gFdtTableGuid). The loader walks SystemTable->ConfigurationTable for that GUID and finds the DTB (observed at 0x47EF2000). This is the right mode for swift-os since it is a device-tree OS (M9 HAL). The loader must not return (returning hands control to the Boot Manager's setup UI), so it halts after reporting.
- Build/run: make uefi (build BOOTAA64.EFI + stage build/esp), make uefi-run (boot under AAVMF). Test: tests/uefi_boot_test.sh asserts the loader banner + device tree found at 0x… on serial.
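
The configuration-table walk in the device tree handoff bullet can be sketched as a linear GUID search. This is a standalone illustration with its own hypothetical sk_* types rather than the declarations in boot/efi/efi.h; the field layout mirrors the UEFI EFI_GUID and EFI_CONFIGURATION_TABLE shapes, and the GUID bytes are the ones quoted above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same 16-byte shape as EFI_GUID (4 + 2 + 2 + 8 bytes, no padding). */
typedef struct {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} sk_guid;

/* Same shape as one EFI_CONFIGURATION_TABLE entry. */
typedef struct {
    sk_guid vendor_guid;
    void   *vendor_table;
} sk_config_table;

/* b1b621d5-f19c-41a5-830b-d9152c69aae0: the FDT table GUID from the notes. */
const sk_guid SK_FDT_TABLE_GUID = {
    0xb1b621d5u, 0xf19c, 0x41a5,
    { 0x83, 0x0b, 0xd9, 0x15, 0x2c, 0x69, 0xaa, 0xe0 }
};

/* Return the table pointer registered under `want`, or NULL if absent. */
void *sk_find_config_table(const sk_config_table *tables, size_t count,
                           const sk_guid *want)
{
    for (size_t i = 0; i < count; i++) {
        if (memcmp(&tables[i].vendor_guid, want, sizeof *want) == 0)
            return tables[i].vendor_table;
    }
    return NULL;
}
```

A NULL result corresponds to the ACPI-default case, where the firmware publishes no FDT table and the loader has no DTB to hand to the kernel.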
M10b-prep — firmware state + load-address reservation (DONE, 2026-06-04)
- boot/efi/efi.h now types the slice of EFI_BOOT_SERVICES needed for the handoff path: AllocatePages, GetMemoryMap, and ExitBootServices, keeping unused members as placeholders at spec offsets.
- Under QEMU+AAVMF with -M virt,acpi=off, the loader observes CurrentEL == EL1 and reports sctlr_el1 (MMU currently on under firmware). This removes the immediate EL2-drop concern for the reference boot path, though other firmware can still differ.
- The loader successfully reserves the direct-boot kernel load address 0x4008_0000 using AllocatePages(AllocateAddress, EfiLoaderData, 16 pages, ...). This proves the next step can copy/load the Swift kernel at the address it is currently linked for before calling ExitBootServices.
- tests/uefi_boot_test.sh now asserts the EL1 observation, successful fixed-address reservation, and M10b-prep OK.
M10b — ExitBootServices + kernel handoff (DONE, 2026-06-04) — M10 ACCEPTANCE MET
The loader now hands off to the Swift kernel and the OS boots to busybox from disk under UEFI, with no
-kernel — the M10 acceptance.
- Embedded kernel. The loader has no filesystem driver, so it carries the flat kernel image inside its own PE: boot/efi/kernel_blob.S .incbins build/kernel.bin (byte 0 = link base 0x4008_0000) and is linked into BOOTAA64.EFI. make uefi therefore depends on the built kernel.
- Handoff sequence (efi_main): locate the DTB; AllocatePages(AllocateAddress, 0x4008_0000); copy the kernel there; dc cvac clean the region to the point of coherency (the kernel will run with the data cache off); GetMemoryMap into a static buffer (so no allocation perturbs the map key) → ExitBootServices (one retry if the key is stale); msr daifset, #0xf to mask the firmware's still-armed timer; then jump to 0x4008_0000 with the DTB pointer in x0. No firmware calls after exit.
- Kernel entry hardened (boot.S): the firmware hands us EL1 with the MMU/caches ON, so _start now force-disables MMU + D/I caches + alignment checks in SCTLR_EL1 and runs tlbi vmalle1; ic iallu; dsb; isb. This normalizes both entry paths, UEFI (MMU on) and QEMU -kernel (MMU off), to the same MMU-off bring-up, so the rest of boot is unchanged. The DTB pointer in x0 flows straight into the M9 HAL (platformInit), which parses it (no scan needed).
- Verified: under QEMU+AAVMF (-M virt,acpi=off, -bios, real GPT disk image) the kernel runs every milestone demo M1→M8 identically to -kernel, reaches the busybox shell, and tests/uefi_boot_test.sh drives echo/ls/cat (M10-UEFI-OK, dir listing, Welcome to swift-os.). Wired into make test alongside the -kernel path (both green).
M10c — real GPT disk image for UEFI boot (DONE, 2026-06-04)
- scripts/make-disk.sh creates build/swift-os.img: a sparse GPT disk with one EFI System Partition starting at sector 2048, type EF00, formatted/populated with mtools via byte-offset access (image@@offset) so no mount or root privileges are required.
- make disk builds BOOTAA64.EFI, creates the image, and copies it to \EFI\BOOT\BOOTAA64.EFI.
- make disk-run boots QEMU+AAVMF from that raw disk image (-drive file=build/swift-os.img,...), with no -kernel and no QEMU virtual FAT.
- tests/uefi_boot_test.sh defaults to UEFI_BOOT=disk and is wired into make test; UEFI_BOOT=fat remains as a quick fallback for the directory-backed ESP path.
- Remaining (deferred, not blocking M10): the loader still embeds the kernel rather than reading it from the ESP. That is fine for now; revisit if the image grows or once M11's on-disk base image exists.
M10.5 — VirtualBox ARM validation (prep DONE; needs a manual run)
VirtualBox ARM is a developer preview whose machine model differs from QEMU virt, and it is a GUI
hypervisor that cannot run in this headless dev environment — so M10.5 needs a manual run on an Apple
Silicon Mac with VirtualBox installed. Prepared for that:
- Loader diagnostics. Before handing off, loader.c now reports, via the firmware-independent UEFI console: device tree present/absent, ACPI 2.0 table present/absent, CurrentEL + MMU bit, and the largest conventional RAM region (base/size) from GetMemoryMap. These print even if the kernel cannot drive VirtualBox's UART after handoff, so the first run is informative regardless. (On QEMU+AAVMF with acpi=off: DTB found, ACPI absent, EL1, RAM region base 0x4800_0000.)
- Procedure is in docs/VIRTUALBOX.md: make disk → VBoxManage convertfromraw … --format VDI → create an EFI ARM VM (256 MB, 1 core) → attach the disk → capture serial-to-file and/or a screenshot → send the UEFI: lines back. Those lines (DTB vs ACPI, RAM base, EL) drive the HAL adaptation.
- Expected first outcome. The loader banner should appear (proving VBox launches our EFI app); the kernel may stay silent after handoff if VBox's UART base differs from QEMU's PL011 0x0900_0000. That is the signal to extend platform.swift (and, if VBox is ACPI-only with no DTB, add minimal ACPI table discovery, likely the SPCR table for the console UART, alongside the M9 device-tree path).
Disk-backed base filesystem (M11)
M11a — packed base image format + host packer (DONE, 2026-06-04)
- Added a deterministic packed read-only base image format (SWOSBASE, version 1): 64-byte header, fixed 40-byte entries, UTF-8 path string table, and concatenated file data. All integer fields are little-endian so the kernel reader can stay tiny on AArch64.
- Added base/ as the host seed tree mirroring today's in-kernel read-only VFS files: /etc/motd, /etc/hostname, /readme.txt, /hello.txt, and /bin/ps placeholder.
- Added tools/basepack.swift and make base-image, producing build/base.img.
- Added tests/base_image_test.swift, wired into make test, which parses build/base.img and verifies the expected directories, file contents, and binary layout.
- Remaining M11 work: virtio-blk discovery/driver, attach build/base.img (or a partition/file inside the GPT image) as the read-only base source, and replace the static Swift VFS literals.
M11b — virtio-blk driver (DONE, 2026-06-05)
- Extended the M9 HAL: the FDT reader now collects the virtio,mmio transport bank (lowest base, per-slot stride, slot count) and platformInit publishes it as platform.virtioMmio{Base,Stride,Count}. On QEMU virt that is 0x0A00_0000, stride 0x200, 32 slots, verified in tests/fdt_test.swift. Note: PlatformInfo's new 64-bit fields are grouped with the other pointers (32-bit fields last) so the struct stays naturally aligned, because the parser runs before the MMU, where a wide unaligned load faults.
- Added kernel/drivers/virtio_blk.c: a minimal polled virtio 1.0 (modern, MMIO) block driver. It scans the HAL window for device id 2, negotiates VIRTIO_F_VERSION_1, brings up one request virtqueue, reads the capacity from config space, and reads 512-byte sectors via a 3-descriptor chain (header / data / status), polling the used ring. Synchronous and blocking, which is fine for a read-only base. Cache clean/invalidate around every DMA region, mirroring the virtio-input driver.
- runVirtioBlkProbe (kernel/main.swift) reads sector 0 at boot and recognises the SWOSBASE magic; a no-op (just a log line) when no block device is attached, so the -kernel test paths are unaffected.
- Test: tests/virtio_blk_test.sh attaches build/base.img as a virtio-blk disk (modern transport via virtio-mmio.force-legacy=false) and asserts sector 0 is read with its magic verified. Wired into make test.
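
The window scan for device id 2 can be sketched as follows. This is an assumption-labelled illustration, not an excerpt from virtio_blk.c: sk_virtio_mmio_find is a hypothetical name, and the register offsets (MagicValue at 0x000, Version at 0x004, DeviceID at 0x008) come from the virtio-mmio transport layout. On the host the function can be pointed at ordinary memory that imitates the window.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_VIRTIO_MMIO_MAGIC 0x74726976u  /* "virt" in the MagicValue register */
#define SK_VIRTIO_ID_BLOCK   2u
#define SK_VIRTIO_ID_INPUT   18u

/* Scan `count` transport slots spaced `stride` bytes apart, starting at
 * `base`. Returns the index of the first modern (version 2) slot whose
 * DeviceID equals `want_id`, or -1 when there is none. */
int sk_virtio_mmio_find(const volatile void *base, size_t stride,
                        unsigned count, uint32_t want_id)
{
    const volatile uint8_t *bytes = base;

    for (unsigned slot = 0; slot < count; slot++) {
        const volatile uint32_t *regs =
            (const volatile uint32_t *)(bytes + (size_t)slot * stride);

        if (regs[0] != SK_VIRTIO_MMIO_MAGIC)  /* 0x000 MagicValue */
            continue;
        if (regs[1] != 2u)                    /* 0x004 Version (2 = modern) */
            continue;
        if (regs[2] == want_id)               /* 0x008 DeviceID */
            return (int)slot;
    }
    return -1;
}
```

With the QEMU virt values quoted above, the call would take base 0x0A00_0000, stride 0x200, and count 32. Under the same assumptions, the virtio-input discovery path differs only in the wanted id (18).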
M11c — serve the read-only base FS from disk (DONE, 2026-06-05)
- kernel/vfs/vfs.swift now parses the SWOSBASE header/entries off the virtio-blk disk at vfsInit and backs the read-only vnodes with extents into the disk image (a diskOffset/dataLen pair per file); vfsRead pulls the requested span via virtio_blk_read_range. Directory entries are sorted so parents precede children, and the builder resolves each path's parent against already-created nodes.
- The metadata block (entries + string table) is read once into a kept heap buffer; vnode names point straight into it, so no per-name copies. File data is read lazily from disk on each read().
- Fallback: when no disk / no SWOSBASE magic (the -kernel test paths and the UEFI GPT boot, whose disk is not a packed image), vfsInit keeps the compiled-in literals, so every existing path is unaffected. /tmp tmpfs is added in both cases.
- runVirtioBlkProbe now runs before vfsInit so the disk is up when the VFS may mount from it.
- Added virtio_blk_read_range(byte_off, buf, len) (spans sectors via the bounce buffer).
- Test: tests/vfs_disk_test.sh packs a throwaway image whose /etc/motd holds a unique marker absent from the kernel literals (plus a disk-only file), boots with it attached, and asserts busybox reads the marker and the extra file, proving the bytes came off disk, not the fallback. Wired into make test.
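
A byte-range read on top of a sector-granular device, as virtio_blk_read_range is described above, might look like the sketch below. It is not the driver's code: sk_read_range, the callback type, and the in-memory sk_mem_disk stand-in are all hypothetical, and only the 512-byte sector size and the bounce-buffer idea come from these notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_SECTOR_SIZE 512u

/* Hypothetical sector reader: fills `out` with one sector, returns 0 on success. */
typedef int (*sk_sector_read_fn)(uint64_t sector, uint8_t *out, void *ctx);

/* Copy `len` bytes starting at `byte_off`, one sector at a time, through a
 * bounce buffer so unaligned starts and partial tails are handled uniformly. */
int sk_read_range(uint64_t byte_off, void *buf, size_t len,
                  sk_sector_read_fn read_sector, void *ctx)
{
    uint8_t bounce[SK_SECTOR_SIZE];
    uint8_t *dst = buf;

    assert(buf != NULL && read_sector != NULL);
    while (len > 0) {
        uint64_t sector = byte_off / SK_SECTOR_SIZE;
        size_t skip = (size_t)(byte_off % SK_SECTOR_SIZE);
        size_t chunk = SK_SECTOR_SIZE - skip;
        if (chunk > len)
            chunk = len;

        int rc = read_sector(sector, bounce, ctx);
        if (rc != 0)
            return rc;
        memcpy(dst, bounce + skip, chunk);
        dst += chunk;
        byte_off += chunk;
        len -= chunk;
    }
    return 0;
}

/* In-memory stand-in for the block device, for host-side experiments. */
typedef struct {
    const uint8_t *data;
    uint64_t sectors;
} sk_mem_disk;

int sk_mem_disk_read(uint64_t sector, uint8_t *out, void *ctx)
{
    const sk_mem_disk *disk = ctx;
    if (sector >= disk->sectors)
        return -1;  /* arbitrary failure code for the sketch */
    memcpy(out, disk->data + sector * SK_SECTOR_SIZE, SK_SECTOR_SIZE);
    return 0;
}
```

Reading through a bounce buffer costs an extra copy, but it keeps every device transfer sector-sized and sector-aligned no matter which file extent the VFS asks for.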
M11d — disk-first executable lookup (DONE, 2026-06-05)
- make base-image now stages real ELFs into the packed base image under /bin: busybox, Swift ps, and the milestone demo programs. The static seed tree still supplies /etc and text files, while the staging tree overwrites /bin/ps with the executable.
- exec.swift now resolves known /bin/* programs through the VFS first. When a path is a disk-backed file in the mounted SWOSBASE image, the kernel reads the ELF into a reusable staging buffer and runs it from there; otherwise it falls back to the embedded blob. The fallback keeps the no-disk -kernel tests and the UEFI GPT boot path working until the boot disk also carries/attaches a base image.
- The final busybox shell launcher uses the same disk-first path, so an attached packed base image makes /bin/busybox the source of the interactive shell. Busybox applets still re-exec through busybox as before, while native /bin/ps is served from the packed base image.
- Tests: tests/base_image_test.swift verifies that /bin/busybox and /bin/ps in build/base.img are real ELF files, and tests/disk_exec_test.sh boots with build/base.img, asserts the M11d disk-load log lines, and runs ps from disk. Wired into make test.
Embedded blob removed (2026-06-05) — M11 complete
- kernel/user/user_blob.S and the *_elf_* symbols in io.h are gone; the kernel no longer carries any userland code. The image shrank from ~1.4 MiB to ~208 KiB. The packed base image on disk is the sole source of busybox, /bin/ps, and every demo (loaded into a 2 MiB physically-contiguous PMM buffer, not the small bump heap).
- virtio_blk_init now brings up each block device, reads sector 0, and selects the disk whose magic is SWOSBASE (falling back to the first block device). This lets a medium carry both a boot disk and the base image, which is needed for UEFI/gfx, where the firmware boots a GPT/ESP disk and the base image rides along as a second modern virtio-blk device.
- Every QEMU launch attaches build/base.img with -global virtio-mmio.force-legacy=false: make run, the -kernel tests (boot/tty/busybox), UEFI (disk-run, uefi_boot_test), and run-gfx. The -kernel test scripts gained a blk_args block; tty_test timings were relaxed for disk-loaded demos.
- All 11 make test suites green; BOARD=virtualbox still builds (its boot path parks before vfsInit, so it does not load programs).
Capability/principal core (M12)
M12a — process security context scaffold (DONE, 2026-06-05)
- Added kernel/security/security.swift with the first kernel-native ProcessSecurityContext: principal, session, and an explicit capability mask. The boot console context is principal 1, session 1, with initial capabilities for console, spawn, read-only FS, tmpfs writes, and process inspection. This is not Unix uid==0; it is the capability/principal model chosen in the roadmap.
- Process table entries now carry that security context. Top-level kernel-launched processes receive the boot console context; child processes inherit it through spawn/fork; execve preserves it.
- Added SYS_SECURITY_INFO (31), returning the current process security record to EL0. It is introspection only; M13 will start using capabilities for enforcement.
- Added /bin/identitydemo, packed into the base image and run during boot. It validates the boot principal/session/capability mask and forks a child to prove context inheritance. boot_test.sh asserts the M12a lines; base_image_test.swift verifies the demo is present as an ELF.
M12b — identity store + console-login (DONE, 2026-06-05)
- Added the base-image identity store /etc/swos/passwd, one principal per line as name:principal:session:caps:password:shell (caps is a decimal capability bitmask; plaintext passwords for bring-up). root gets all caps (31); user gets spawn|fsread|tmpwrite (14), no console/inspect. A compat /etc/passwd view ships alongside for tools that expect the Unix file (it is not the security source).
- Added the privileged SYS_LOGIN (32): login(principal, session, caps) replaces the calling process's security context, but only if the caller holds capConsole (the boot/login context), so an ordinary program cannot escalate. The new context is inherited across the subsequent execve into the shell.
- Added /bin/console-login: reads the store, prompts for login name + password on the console, matches a store line, calls login() to adopt that context, prints the adopted principal/session/caps (via security_info), and execve's the shell from the store's last field.
- Test: tests/console_login_test.sh boots with the base image, runs console-login, rejects a wrong password, then logs in as user and asserts the adopted context (principal=2 session=2 caps=14) and that the user shell starts. Wired into make test (12 suites green).
M12c — console-login as init (DONE, 2026-06-05)
- main.swift's shell launcher became runInit: it starts /bin/console-login (re-read from disk each iteration, since the session's shell exec overwrites the shared ELF buffer) instead of launching busybox directly. console-login authenticates, then execve's the shell with the adopted context; when a session exits, init loops back to a fresh login prompt. A raw-busybox fallback remains for a base image with no login program.
- Boot-flow tests updated: busybox_test, disk_exec_test, and uefi_boot_test log in (root/swordfish) after the M7 Ctrl-C; console_login_test logs in at the init prompt directly; boot_test TIMEOUT 20→45s because every demo now loads from disk.
M12d — SHA-256 password hashing (DONE, 2026-06-05)
- The identity store no longer holds plaintext passwords. The password field is salt$sha256hex, with a per-user salt and hash = SHA-256(salt + password) in lowercase hex (e.g. swos-root$2e03ca04…).
- console-login carries a self-contained Swift SHA-256 (FIPS 180-4; constant table + temporary-allocation buffers, no heap) and verifies by recomputing SHA-256(salt + entered password) and comparing the hex. Verified against the host shasum -a 256 reference values baked into the store.
- A stronger, iterated/memory-hard KDF (and password change tooling) is a later refinement; this milestone removes plaintext storage.
VFS capability enforcement (M13)
M13a — open-time capability checks (DONE, 2026-06-05)
- vfsOpen now consults the running process's capability mask via processCurrentCaps(): a read (O_RDONLY/O_RDWR) requires capFsRead, and a write/create (O_WRONLY/O_RDWR/O_CREAT, which only the tmpfs accepts) requires capTmpWrite. Missing the capability returns EACCES (-13). The kernel itself (no active process) is treated as fully privileged.
- Checking at open time also gates read/getdents: a file or directory cannot be opened to read or list it without capFsRead, so cat/ls fail up front for a capless principal.
- Added a guest principal to /etc/swos/passwd with only capSpawn (caps = 2). tests/cap_enforce_test.sh logs in as guest and asserts that echo (a shell builtin, no FS access) still works while cat /etc/motd and ls / are denied. root/user keep capFsRead, so the existing flows and the boot demos (which run under the fully-capable boot context) are unaffected.
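The open-time rule can be written down as a small pure function. The following C sketch is illustrative only: the kernel check lives in Swift, and while capSpawn = 2 and the O_CREAT value 0x40 come from these notes, the bit positions chosen here for capFsRead and capTmpWrite (4 and 8) are an assumption that merely agrees with user's mask of 14.

```c
#include <assert.h>

/* Access-mode values of the Linux-style kernel ABI. */
enum { K_O_RDONLY = 0, K_O_WRONLY = 1, K_O_RDWR = 2, K_O_ACCMODE = 3,
       K_O_CREAT = 0x40 };

/* CAP_SPAWN = 2 is stated in the notes; the other two positions are assumed. */
enum { CAP_SPAWN = 2, CAP_FS_READ = 4, CAP_TMP_WRITE = 8 };

#define K_EACCES 13

/* Returns 0 when the mask allows an open with these flags, else -EACCES. */
int open_cap_check(unsigned caps, int flags) {
    int acc = flags & K_O_ACCMODE;
    int wants_read  = (acc == K_O_RDONLY || acc == K_O_RDWR);
    int wants_write = (acc == K_O_WRONLY || acc == K_O_RDWR ||
                       (flags & K_O_CREAT) != 0);
    if (wants_read && !(caps & CAP_FS_READ)) return -K_EACCES;
    if (wants_write && !(caps & CAP_TMP_WRITE)) return -K_EACCES;
    return 0;
}
```

With these assumed bits, a guest mask of 2 fails every open, which matches the cat and ls denials the test asserts, while user's 14 passes both the read and the tmpfs-write branch.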
M13b — gate tmpfs namespace mutations (DONE, 2026-06-05)
- vfsUnlink/vfsMkdir/vfsRmdir/vfsRename are path-based (they don't go through vfsOpen), so they now require capTmpWrite up front (mayWriteTmp) — closing the gap where a capless principal could still mutate the tmpfs namespace. ftruncate/write were already covered: they need a writable fd, which vfsOpen only hands out with capTmpWrite.
- Positive path stays green: fdopsdemo (mkdir/rename/unlink under the fully-capable boot context) still passes in boot_test. There is no shell-level negative test because the busybox-min build ships no mkdir/touch applet; the check is the same mayWriteTmp used by the open path, which cap_enforce_test already exercises for guest.
M13c — file ownership + ls -l (DONE, 2026-06-05)
Per-vnode owner + mode. VNode (kernel/vfs/vfs.swift) gained owner: UInt32 (principal; 1 = root) and mode: UInt32 (permission bits; 0 = unset → fall back to the old heuristic, so the compiled-in literal tree is unchanged). Disk-backed nodes take owner/mode from the image; a tmpfs node is stamped with processCurrentPrincipal() at creation, so ls -l /tmp reflects who wrote the file (the live login context, not always root). New processCurrentPrincipal() in kernel/user/process.swift mirrors processCurrentCaps().
Widened kstat (the ABI was never the risk). The kernel writes a private kstat record, not newlib's struct stat; userland/lib/newlib_syscalls.c translates it, so the C compiler computes newlib's offsets from the sysroot header. writeStatMode grew from 16 to 24 bytes — u32 mode, u32 uid, u64 size, u32 gid, u32 nlink (first 16 bytes unchanged, so older readers stay valid) — and reports st_uid = st_gid = owner, st_nlink = 1 (no group model; gid mirrors the owner principal). _stat/_fstat copy uid/gid/nlink into newlib's struct; userland/lib/fs.h mirrors the 24-byte layout. The Swift userland tools (/bin/ps, /bin/id) don't call stat, so widening is safe.
SWOSBASE format v2. The 40-byte entry already reserved a mode u32 (off 32) and a spare (off 36); tools/basepack.swift now writes the real mode (dir/exec 0o755, text 0o644 — from the host execute bit) and owner = 1 (root) into off 36, and bumps the version 1 → 2. The kernel parser (buildBaseFromDisk) requires v2 and reads both fields. Base files are all root-owned; non-root ownership is demonstrated at runtime via tmpfs. (A host-side manifest for non-root base owners is recorded as future work.)
busybox ls -l shows names. scripts/build-busybox.sh enables FEATURE_LS_USERNAME (resolve uid/gid → name), FEATURE_LS_SORTFILES (alphabetical → deterministic tests), and the MKDIR applet (so a logged-in principal can create a tmpfs node without shell redirection). The compat getpwuid/getpwnam/getgrgid/getgrnam (userland/compat/stubs.c) — previously hardcoded to "root" — now parse /etc/passwd and the new base/etc/group; an unknown id returns NULL and busybox prints the number. New compat stubs: getpagesize (libbb/procps + dd reference it) and a no-op chmod/fchmod (mkdir chmod()s the new dir; the kernel already created it 0o755). (Timestamps stay off; the date column shows the 1970 epoch since we have no clock — cosmetic.)
Open-flag ABI fix (found while testing). newlib's <fcntl.h> uses BSD values (O_CREAT 0x200, O_TRUNC 0x400) but the kernel ABI is Linux-style (O_CREAT 0x40). newlib_syscalls.c::_open now translates the create/truncate/append bits into the kernel ABI (the access-mode bits already match) and sets errno on a negative return. The kernel honors O_TRUNC/O_APPEND on writable tmpfs files. This fixes a latent bug: busybox file creation via newlib open(O_CREAT) never reached the create path before — vi's :wq only appeared to work because vi_test greps the on-screen echo of the inserted text. With the fix vi genuinely saves.
Redirection limitation — RESOLVED in the next milestone (see "Shell redirection + fcntl" below). M13c shipped with echo > file non-functional (the demo used mkdir); the follow-up implements fcntl and makes redirection work.
Tests. tests/base_image_test.swift asserts version 2, owner 1 on every entry, and the expected modes (busybox/ps 0o755, motd 0o644, dirs 0o755). New tests/ls_l_test.sh (wired into make test) logs in as root and asserts ls -l shows root-owned drwxr-xr-x dirs, -rwxr-xr-x /bin/*, and -rw-r--r-- text files; then logs in as user, runs mkdir /tmp/d, and asserts ls -l /tmp shows d owned by user — proving a tmpfs node is stamped with the creating principal.
Follow-ups: enforcement on the read/write syscalls (for contexts that change while an fd is open); a host-side ownership manifest for non-root base files; real mtimes/clock; chown/chmod; and richer principals.
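The open-flag mismatch is easiest to see as a translation table. This is a hedged C sketch, not the code in newlib_syscalls.c: the function and table names are invented, and the table holds only the pairs these notes record (O_CREAT here, O_CLOEXEC from the later fcntl notes). The truncate and append entries are left out because the kernel-side values are not recorded in this section.

```c
#include <assert.h>
#include <stddef.h>

/* newlib-side values, as recorded in the notes. */
#define NL_O_CREAT   0x00200
#define NL_O_CLOEXEC 0x40000
/* Kernel-side values, as recorded in the notes. */
#define K_O_CREAT    0x00040
#define K_O_CLOEXEC  0x00200
/* The access-mode bits (O_RDONLY/O_WRONLY/O_RDWR) already match. */
#define ACCMODE_MASK 0x00003

struct flag_pair { int newlib_bit; int kernel_bit; };

/* Hypothetical table; truncate/append rows would be added the same way. */
static const struct flag_pair open_flag_pairs[] = {
    { NL_O_CREAT,   K_O_CREAT   },
    { NL_O_CLOEXEC, K_O_CLOEXEC },
};

/* Rebuild the flag word from scratch so no untranslated bit leaks through. */
int translate_open_flags(int newlib_flags) {
    int kernel_flags = newlib_flags & ACCMODE_MASK;
    size_t n = sizeof open_flag_pairs / sizeof open_flag_pairs[0];
    for (size_t i = 0; i < n; i++)
        if (newlib_flags & open_flag_pairs[i].newlib_bit)
            kernel_flags |= open_flag_pairs[i].kernel_bit;
    return kernel_flags;
}
```

Building the result from zero rather than masking in place matters here: newlib's O_CREAT (0x200) has the same numeric value as the kernel's close-on-exec bit, so a pass-through of unknown bits would silently turn one request into the other.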
Shell redirection + fcntl (DONE, 2026-06-05)
Made busybox shell I/O redirection work (echo > file, >>, pipe-into-redirect), the top M13
follow-up. ash saves/restores descriptors around every redirect with fcntl(F_DUPFD_CLOEXEC, 10);
newlib's fcntl is a hard ENOSYS stub, so it never worked.
- Root cause of the M13c revert, now fixed. F_DUPFD_CLOEXEC is a distinct command number (newlib value 14, not F_DUPFD=0). The M13c prototype's switch only handled F_DUPFD; 14 fell to default: return 0, so ash read 0 as the duplicated fd and on restore did dup2(0,1); close(0) — closing stdin → the shell read EOF and exited. The fix handles F_DUPFD_CLOEXEC, and crucially makes the default case return a negative error so an unhandled command can never be misread as "fd N".
- Kernel. SYS_FCNTL (34) → vfsFcntl (kernel/vfs/vfs.swift): F_DUPFD/F_DUPFD_CLOEXEC duplicate to the lowest free fd ≥ arg (sharing the open description, like dup); F_GETFD/F_SETFD read/write a per-fd close-on-exec flag (FDEntry.cloexec); F_GETFL returns the stored open flags; F_SETFL updates mutable status flags; anything else is EINVAL. A plain dup/dup2 clears cloexec; fork copies it.
- close-on-exec honored. vfsCloseCloexec(slot:) drops cloexec fds, called from processExec (kernel/user/process.swift) — POSIX exec semantics, so ash's relocated/redirect-saved fds (it uses F_DUPFD_CLOEXEC) don't leak into exec'd applets. O_CLOEXEC (newlib 0x40000 → kernel oCloexec 0x200, translated in _open) marks an fd cloexec at open time.
- Userland. newlib's fcntl (sysfcntl.o) is a hard ENOSYS stub that never calls a syscall stub, so a strong variadic fcntl in userland/compat/stubs.c (pulled before -lc) routes to SYS_FCNTL.
- Tests. New tests/redirect_test.sh (wired into make test): asserts > file writes content, >> appends, cmd | cat > file works, and a later echo still runs — proving the interactive shell survives the redirects (the exact regression that caused the M13c revert). tests/vi_test.sh hardened to match the saved content as a clean line (^hello-from-vi$) rather than vi's on-screen echo, since the M13c _open fix made vi genuinely save (previously a false positive).
- Follow-up (2026-06-08): nonblocking socket fd status. O_NONBLOCK uses newlib's _FNONBLOCK value (0x4000) because compat fcntl passes F_SETFL flags directly. F_SETFL currently records only that mutable status bit in the shared open description; F_GETFL reports it with the stored flags. TCP accept/read on nonblocking fds return EAGAIN when socketPollReadable says no child/data is ready, and accepted TCP children inherit O_NONBLOCK from the listener. HC17 later added socketPollWritable plus TCP send-space helpers, so VFS TCP writes can block or return EAGAIN based on actual send-buffer availability.
- Out of scope: dup3, file locking (F_GETLK/F_SETLK).
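The failure mode and its fix fit in a few lines. Below is a minimal C model of the dispatch, assuming a hypothetical fixed-size FdSlot table; it is not vfsFcntl (which is Swift), and only the command number 14 and the "default must be negative" rule are taken from the notes. The other command and errno numbers are the conventional ones.

```c
#include <assert.h>

enum { K_F_DUPFD = 0, K_F_GETFD = 1, K_F_SETFD = 2, K_F_DUPFD_CLOEXEC = 14 };
enum { K_EBADF = 9, K_EINVAL = 22, K_EMFILE = 24 };
#define MAX_FDS 16

/* Hypothetical per-process fd slot: desc stands in for the shared open description. */
typedef struct { int in_use; int desc; int cloexec; } FdSlot;

/* Duplicate fd into the lowest free slot >= min_fd. */
static int dup_from(FdSlot *t, int fd, int min_fd, int cloexec) {
    if (min_fd < 0) return -K_EINVAL;
    for (int n = min_fd; n < MAX_FDS; n++) {
        if (!t[n].in_use) {
            t[n].in_use = 1;
            t[n].desc = t[fd].desc;   /* share the description, like dup */
            t[n].cloexec = cloexec;
            return n;
        }
    }
    return -K_EMFILE;
}

int fd_fcntl(FdSlot *t, int fd, int cmd, int arg) {
    if (fd < 0 || fd >= MAX_FDS || !t[fd].in_use) return -K_EBADF;
    switch (cmd) {
    case K_F_DUPFD:         return dup_from(t, fd, arg, 0);
    case K_F_DUPFD_CLOEXEC: return dup_from(t, fd, arg, 1);
    case K_F_GETFD:         return t[fd].cloexec;
    case K_F_SETFD:         t[fd].cloexec = arg & 1; return 0;
    /* Never return 0 here: the caller would read it as "duplicated to fd 0". */
    default:                return -K_EINVAL;
    }
}
```

With stdin, stdout, and stderr open, the call ash makes, fd_fcntl(t, 1, K_F_DUPFD_CLOEXEC, 10), lands on slot 10 with the close-on-exec flag set, and an unknown command yields -22 instead of a value that looks like a descriptor.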
Native Swift /bin/ls (DONE, 2026-06-05)
A pure-Embedded-Swift /bin/ls with -l (userland/ls.swift), advancing the "Swift everywhere"
first principle and the "more Swift userland utilities" roadmap item. It dogfoods the M13c per-file
ownership work entirely in Swift instead of relying on busybox.
- What it does. Lists a directory (or a single file). -l formats mode nlink owner group size name: the mode string from the stat type/permission bits, and owner/group resolved by name from /etc/passwd / /etc/group (numeric fallback when unreadable), reusing the colon-table scan pattern from /bin/id.
- Bridge. userland/lib/swift_user.{h,c} gained swiftos_getdents (over SYS_GETDENTS) and swiftos_stat (over SYS_STAT, unpacking the 24-byte kstat into mode/uid/gid/nlink/size). It walks the kernel dirent records (d_reclen @16, d_name @19) and stats each entry by dir/name.
- Applet shadowing. The busybox standalone shell runs a bare ls as its own applet, so /bin/ls is invoked by absolute path to exec our binary (a command with a / is exec'd directly, not applet-dispatched — verified). exec.swift now routes /bin/ls to the packed disk ELF (removed from the busybox-applet fallback list); bare ls is unchanged (still busybox), so busybox_test and ls_l_test are unaffected.
- Test. tests/swift_ls_test.sh (wired into make test): /bin/ls /etc lists entries, and /bin/ls -l shows drwxr-xr-x … root root … swos, -rw-r--r-- … root root 21 motd, and a single-file -rwxr-xr-x … /bin/busybox.
- Out of scope: multi-path args, column/wide output, sorting, -a/-h/time columns.
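The mode column of ls -l is a direct bit-to-character mapping. The C sketch below shows one way to do it; the real tool is Swift, and the assumption here is that the kstat mode field carries POSIX-style type bits (0o040000 for a directory) above the nine permission bits, which these notes do not spell out.

```c
#include <assert.h>
#include <string.h>

/* Assumed POSIX-style encoding of the file type inside the mode word. */
#define MODE_TYPE_MASK 0170000u
#define MODE_TYPE_DIR  0040000u

/* Write a 10-character string such as "drwxr-xr-x" plus a NUL into out. */
void mode_string(unsigned mode, char out[11]) {
    static const char rwx[] = "rwxrwxrwx";
    out[0] = ((mode & MODE_TYPE_MASK) == MODE_TYPE_DIR) ? 'd' : '-';
    for (int i = 0; i < 9; i++) {
        /* Bit 0400 is the owner's read bit; each step moves one bit right. */
        out[1 + i] = (mode & (0400u >> i)) ? rwx[i] : '-';
    }
    out[10] = '\0';
}
```

Called with 0o040755, 0o100644, and 0o100600, this produces the three strings that appear in the test expectations in these notes: drwxr-xr-x, -rw-r--r--, and -rw-------.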
Native Swift cat / echo / pwd (DONE, 2026-06-05)
Three more pure-Swift coreutils (userland/{cat,echo,pwd}.swift), continuing the move off busybox.
- cat copies files (or stdin when given none) to stdout in 4 KiB chunks. echo prints its args space-separated + newline, with -n to suppress the newline. pwd prints getcwd().
- Bridge. swift_user.{h,c} gained swiftos_write (over SYS_WRITE) and swiftos_getcwd (over SYS_GETCWD).
- Invocation. Like /bin/ls, they are reached by absolute path (/bin/cat …) — exec.swift routes /bin/{cat,echo,pwd} to the packed disk ELFs (removed from the busybox-applet fallback). A bare cat/echo/pwd stays the busybox applet/ash builtin, so existing tests are unaffected.
- Test. tests/swift_coreutils_test.sh (wired into make test): /bin/echo prints args, /bin/cat /etc/motd prints the motd, cd /etc; /bin/pwd → /etc (proves getcwd + cwd inheritance across execve), and /bin/echo -n suppresses the newline.
Native Swift mkdir / rmdir / rm / mv (DONE, 2026-06-05)
Pure-Swift tmpfs-mutation utilities (userland/{mkdir,rmdir,rm,mv}.swift), built directly on the
existing kernel syscalls (no new kernel work).
- Bridge. swift_user.{h,c} gained swiftos_mkdir/swiftos_rmdir/swiftos_unlink/swiftos_rename over SYS_MKDIR/SYS_RMDIR/SYS_UNLINK/SYS_RENAME. They only affect the writable tmpfs; the base FS is read-only, and the calls already require capTmpWrite (M13b).
- Scope. rm is files-only (no -r); rmdir removes empty dirs; mv is a single rename. Reached by absolute path; exec.swift routes /bin/{mkdir,rmdir,rm,mv} to the packed disk ELFs. (busybox ships no mkdir/rm/mv applets in our config except mkdir, which is only used by ls_l_test as a bare command — unaffected.)
- Test. tests/swift_fileops_test.sh (wired into make test): /bin/mkdir /tmp/d, write a file, /bin/mv it, /bin/ls confirms the rename and /bin/cat confirms content survived, then /bin/rm + /bin/rmdir and /bin/ls /tmp confirms removal.
The native-Swift userland now covers ls cat echo pwd ps id mkdir rmdir rm mv — a usable coreutils
set, all over the swift_user bridge.
Native Swift chmod / chown (DONE, 2026-06-05)
/bin/chmod and /bin/chown (userland/{chmod,chown}.swift) plus the two kernel syscalls they need,
completing the M13c ownership story: tmpfs file mode/owner can now actually be changed and is reflected
by ls -l.
- Kernel. SYS_CHMOD (35) → vfsChmod(path, mode) sets a node's permission bits; SYS_CHOWN (36) → vfsChown(path, owner) sets its owning principal. Both are tmpfs-only (the base FS is read-only → EROFS) and require capTmpWrite, consistent with the other namespace mutations (M13b). Cosmetic only, since tmpfs is ephemeral, but it makes ownership/mode first-class and editable.
- Tools. chmod OCTAL FILE... (octal mode), chown UID FILE... (numeric principal id — swift-os principals are small numbers, no name lookup). Bridge: swiftos_chmod/swiftos_chown.
- Test. tests/swift_chmodown_test.sh (wired into make test): echo > /tmp/f, chmod 600 → ls -l shows -rw------- … root, chown 2 → ls -l shows … user user.
Native-Swift userland: ls cat echo pwd ps id mkdir rmdir rm mv chmod chown.
Native Swift head / touch / wc (DONE, 2026-06-06)
Three more pure-Swift coreutils over the existing bridge (no new kernel work, no new bridge calls):
userland/{head,touch,wc}.swift.
- head prints the first N lines (-n N, default 10) of each file, or of stdin. wc counts lines/words/bytes (L W C name), stdin when given no file. touch creates each missing file in the writable tmpfs (swift-os has no utimes, so it is "create if missing", not an mtime bump; the base FS is read-only).
- All three are byte-oriented (UnsafePointer + withUnsafeTemporaryAllocation), so unlike /bin/calc they pull no Unicode data tables — they link like ls/cat. Reached by absolute path; exec.swift routes /bin/{head,touch,wc} to the packed disk ELFs; bare names stay busybox/ash.
- Test. tests/swift_headwc_test.sh (wired into make test): builds a 3-line file with the shell, asserts wc reports 3 3 14, head -n 2 … | wc reports 2 2 8 (proving head stops at the limit), and touch + wc reports an empty 0 0 0 file.
Native-Swift userland: ls cat echo pwd ps id mkdir rmdir rm mv chmod chown head touch wc calc.
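Byte-oriented counting of the kind wc does needs only one bit of state carried between read chunks. The sketch below is a C illustration of that idea, not the Swift source; the WcCounts type and wc_update name are made up, and whitespace is taken to be space plus the ASCII control range tab through carriage return.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t lines, words, bytes; } WcCounts;

/* Fold one chunk into the running counts. *in_word carries the
 * "currently inside a word" state across chunk boundaries. */
void wc_update(WcCounts *c, int *in_word, const unsigned char *buf, size_t n) {
    for (size_t i = 0; i < n; i++) {
        unsigned char ch = buf[i];
        int is_space = (ch == ' ' || (ch >= '\t' && ch <= '\r'));
        if (ch == '\n') c->lines++;
        if (is_space) {
            *in_word = 0;
        } else if (!*in_word) {
            *in_word = 1;      /* first byte of a new word */
            c->words++;
        }
    }
    c->bytes += n;
}
```

Feeding a hypothetical three-line input such as "one\ntwo\nthree\n" in two arbitrary chunks gives 3 lines, 3 words, and 14 bytes, the same shape of result the test above asserts; a word that straddles the chunk boundary is counted once because in_word survives the call.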
Wall clock: PL031 RTC + /bin/date (DONE, 2026-06-05)
swift-os had no clock (timestamps showed the 1970 epoch). Added a real wall clock from the QEMU virt PL031 RTC.
- Kernel. platform.rtcBase (QEMU virt 0x0901_0000; 0 on the VBox board → disabled). rtcNow() (generic_timer.swift) reads the PL031 data register (Unix seconds; QEMU seeds it from the host). SYS_TIME (37) returns it to EL0.
- /bin/date (userland/date.swift): prints UTC YYYY-MM-DD HH:MM:SS. The epoch→calendar conversion (Howard Hinnant's civil-from-days) lives in the C bridge as swiftos_fmt_time so ls can reuse it; swiftos_time exposes the syscall.
- Test. tests/swift_date_test.sh asserts a plausible 20xx-..-.. ..:..:.. UTC line (year in the 2020s proves the RTC was actually read, not a zero/epoch fallback).
- Out of scope: timezones, settimeofday/RTC writes, DTB discovery of the RTC base (QEMU default is hardcoded, like the other pre-discovery defaults).
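For reference, the civil-from-days conversion named above looks roughly like this in C. It follows Hinnant's published algorithm, but the function name fmt_utc and its signature are assumptions for illustration and are not the signature of swiftos_fmt_time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format Unix seconds as "YYYY-MM-DD HH:MM:SS" (UTC) into out[20]. */
void fmt_utc(int64_t secs, char out[20]) {
    int64_t days = secs / 86400, rem = secs % 86400;
    if (rem < 0) { rem += 86400; days -= 1; }      /* floor for pre-1970 */

    /* civil-from-days: shift the epoch to 0000-03-01 so the leap day is
     * the last day of the internal year, then peel off 400-year eras. */
    int64_t z   = days + 719468;
    int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    int64_t doe = z - era * 146097;                               /* [0, 146096] */
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);        /* [0, 365] */
    int64_t mp  = (5 * doy + 2) / 153;                            /* Mar = 0 */
    int day     = (int)(doy - (153 * mp + 2) / 5 + 1);
    int month   = (int)(mp < 10 ? mp + 3 : mp - 9);
    int year    = (int)(yoe + era * 400 + (month <= 2 ? 1 : 0));

    snprintf(out, 20, "%04d-%02d-%02d %02d:%02d:%02d", year, month, day,
             (int)(rem / 3600), (int)(rem % 3600 / 60), (int)(rem % 60));
}
```

The routine uses only integer arithmetic and no tables, which is presumably why it suits a bridge that avoids pulling in a libc time implementation.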
Per-file mtime + ls -l date column (DONE, 2026-06-05)
Files now carry a real modification time, shown by ls -l.
- Kernel. VNode.mtime (Unix seconds). Set from rtcNow() on createTmpNode and on every tmpfs write/ftruncate; the base/literal tree (and /tmp) is stamped with the boot time at vfsInit, so read-only files show a real date instead of 1970. The kstat grew 24→32 bytes (mtime u64 at off 24; earlier fields keep their offsets).
- Userland. newlib_syscalls.c fills st_mtim/st_ctim/st_atim (so busybox ls -l shows the date too); fs.h and the swift_user kstat mirror the 32-byte layout; swiftos_stat gained an mtime out-param. Native /bin/ls -l prints a YYYY-MM-DD HH:MM column (reusing the bridge's swiftos_fmt_time). swift_ls_test/swift_chmodown_test updated for the new column.
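The append-only growth of the kstat record (16, then 24, then 32 bytes) can be pinned down with a struct and compile-time offset checks. This is a sketch under assumptions: the KStat name and the length-aware kstat_unpack helper are hypothetical, and the real mirrors in fs.h and swift_user may be written differently. Only the field order and offsets come from these notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field order and offsets as recorded in the notes. */
typedef struct {
    uint32_t mode;    /* off 0  */
    uint32_t uid;     /* off 4  */
    uint64_t size;    /* off 8  */
    uint32_t gid;     /* off 16 */
    uint32_t nlink;   /* off 20 */
    uint64_t mtime;   /* off 24 */
} KStat;

_Static_assert(offsetof(KStat, size)  == 8,  "first 16 bytes must not move");
_Static_assert(offsetof(KStat, gid)   == 16, "24-byte extension starts here");
_Static_assert(offsetof(KStat, mtime) == 24, "32-byte extension starts here");
_Static_assert(sizeof(KStat) == 32, "record is 32 bytes");

/* Hypothetical reader: accept a 16-, 24-, or 32-byte record and leave any
 * field the writer did not provide at zero. Returns 0, or -1 if too short. */
int kstat_unpack(const void *rec, size_t len, KStat *out) {
    if (len < 16) return -1;
    memset(out, 0, sizeof *out);
    memcpy(out, rec, len < sizeof *out ? len : sizeof *out);
    return 0;
}
```

The static assertions are the useful part: any later reordering of the struct fails the build instead of silently shifting a field under an older reader.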
Userland editors — busybox vi (DONE, 2026-06-05)
A side feature off the M9→M13 critical path: a usable full-screen text editor. We took the cheap path —
busybox already ships a self-contained vi applet (no terminfo/ncurses, draws with hardcoded ANSI escapes)
— rather than porting GNU nano (which would need an ncurses/terminfo port + locale/regex; recorded as larger
future work). The same porting pipeline as M8 busybox: cross-build against ./sysroot (newlib) + the
userland/compat shim layer, link with our crt0/syscall stubs, stage into the packed base image.
- Enable. scripts/build-busybox.sh now sets CONFIG_VI + a curated feature set (COLON, YANKMARK, SEARCH, DOT_CMD, SET/SETOPTS, UNDO). Three features are deliberately forced OFF because swift-os's headless serial tty breaks their assumptions: FEATURE_VI_USE_SIGNALS (needs SIGWINCH/SIGINT custom handler delivery while the editor is blocked in terminal reads; NPM10 only covers syscall-return delivery), FEATURE_VI_WIN_RESIZE (SIGWINCH; our console is a fixed 80×24, which ioctl(TIOCGWINSZ) already reports), and FEATURE_VI_ASK_TERMINAL (emits ESC[6n and blocks reading the cursor-position report, which our tty never sends back — vi would hang at startup). Note: the int-valued config symbols FEATURE_VI_MAX_LEN/FEATURE_VI_UNDO_QUEUE_MAX must be preset to a number before oldconfig (it errors on a NEW int symbol fed EOF).
- Compat fix. userland/compat/termios.h was missing the c_cc index VERASE (and the rest of the Linux c_cc table); vi's isbackspace macro needs it. Added the full Linux c_cc index set.
- New syscall. 33 ftruncate(fd, length) (see Syscall ABI) — vi saves by opening O_CREAT (no O_TRUNC), full_write, then ftruncate to the exact length, so without it a save that shrinks a tmpfs file would leave a stale tail. Architectural constraint kept: the base FS is read-only by design, so vi can only save into /tmp (tmpfs); editing a base file and :w-ing it elsewhere works, overwriting the base does not. This is the two-tier FS, not a bug.
- Root-cause kernel fix (the hard part). Enabling vi exposed a latent kernel bug: vi crashed the kernel (intermittent EL1 data abort in trap_return with a wild SP near RAM end, or a lower-EL sync with a wild PC) right after drawing its screen. A syscall trace pinned the trigger to poll() (syscall 26): vi polls stdin with a timeout to disambiguate ESC sequences. vfsPoll blocked by calling processYieldForIO() (a cooperative scheduler switch) in a loop with IRQs enabled — and that cooperative-yield-from-inside-a-blocking-syscall path is not robust under timer preemption (it can corrupt the resumed trap frame). The working ttyRead path, by contrast, blocks with enable_irq() + wfi() and never yields. First fix: vfsPoll waits with wfi() for tty/vnode fds (input arrives via the UART RX IRQ; the timer wakes wfi for the timeout) — exactly ttyRead's proven pattern, and it avoids a busy-spin for a single foreground reader. The cooperative-yield path stays only for pipe sets (a pipe becomes ready only when another process writes, so the CPU must be yielded).
- Root-cause yield fix (the underlying bug). The cooperative yield itself was unsafe, not just for poll: yieldToScheduler() ran cpu_switch_context with the surrounding currentProc/pState bookkeeping non-atomically with IRQs enabled. If a timer tick landed mid-switch it ran processOnTick → yieldToScheduler re-entrantly and overwrote the very CPUContext being saved/restored (and the single shared schedCtx), corrupting the resumed trap frame → the wild SP/PC panic. Why poll exposed it: it yields in a tight loop for the whole timeout, so a tick lands in the switch window with high probability; spawn/fork-wait yield once and rarely hit it, and the wfi paths never switch. Fix (process.swift): yieldToScheduler brackets the switch with irq_save()/irq_restore() (mask across the switch, restore the caller's prior IRQ state on resume — preemptive callers entered masked, cooperative ones enabled), and the schedule() loop runs IRQ-masked end to end, unmasking only around its idle wfi (safe: currentProc == -1 there, so processOnTick is a no-op and no switch is in flight). Added irq_save/irq_restore to io.h. Validated by temporarily forcing vi through the yield path: it crashed reliably before, survives 3/3 after. With the fix the yield path is preemption-safe, so vfsPoll's pipe branch is sound.
- Tests. tests/vi_test.sh (wired into make test) logs in, runs vi /tmp/vitest, inserts text, :wq, then cats the file back — asserting vi's alternate-screen banner, the saved content (proves :wq/ftruncate), and a trailing shell marker (proves the kernel did not panic). fdopsdemo (run on every boot, asserted by boot_test.sh) gained a pipe-poll preemption stress: a CPU-bound child streams a 0..63 counter through a pipe, busy-burning between writes so the 100 Hz timer preempts it mid-loop, while the parent poll()s the pipe with a timeout — crossing cpu_switch_context dozens of times under active preemption (the exact interleaving that used to panic). The byte counter also catches a dropped/reordered wakeup, not just a crash. 13 suites green.
- Framebuffer console VT100 support. vi worked on the serial console but was garbage on the graphical (ramfb/UEFI-GOP) display: fb.c was a line printer that drew \n \r \b \t and printable bytes but echoed ANSI escapes literally, so vi's cursor-positioning/erase sequences became junk glyphs. Added a small VT100/ANSI interpreter to fb_putc (a CSI state machine): CUP (H/f), relative moves (A/B/C/D, G, d), erase-in-display (J) and erase-in-line (K), the alternate-screen private modes (?1049/1047/47 → clear+home, since vi repaints in full), with SGR (m) and other sequences consumed and ignored so a stray escape never prints. The erase/move helpers update both the pixel framebuffer and the shadow cell buffer (and lift the blinking block cursor first). Keyboard input already worked (virtio_input.c maps arrows/Home/End/Del to the matching escapes). vi now renders correctly on the graphical window. Geometry note: TIOCGWINSZ still reports a fixed 80×24, so vi uses the top-left 80×24 of the (e.g. 100×37 at 800×600) display; reporting the real framebuffer size is a possible enhancement but would also affect serial terminals that share the one tty.
- Tests (fb). tests/fb_vi_test.sh (wired into make test) boots the graphical path headless (-device ramfb -display none), drives vi over the serial console, screendumps the framebuffer via QMP to a PPM, and parses the pixels: it asserts a column of ~ down the left over otherwise-blank lines (proving CUP/erase were interpreted, not printed), a non-empty status line near the bottom of the 80×24 editor, and no kernel panic. 14 suites green.
- nano: not done — it needs an ncurses/terminfo port plus locale/regex, a separate multi-step effort.
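The CSI state machine described for fb_putc can be modelled in isolation. The C sketch below is a reduced illustration, not the fb.c code: the Vt type and function names are invented, and it handles only cursor positioning (H/f) and the relative moves A/B/C/D, consuming every other sequence silently. Erase, the G and d moves, and the alternate-screen handling are left out.

```c
#include <assert.h>

typedef struct {
    int state;       /* 0 = ground, 1 = after ESC, 2 = inside CSI */
    int params[2];
    int index;       /* which parameter is being accumulated */
    int row, col;    /* 0-based cursor */
    int rows, cols;
} Vt;

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

static void vt_dispatch(Vt *vt, unsigned char final) {
    int n = vt->params[0] > 0 ? vt->params[0] : 1;  /* missing or 0 means 1 */
    int m = vt->params[1] > 0 ? vt->params[1] : 1;
    switch (final) {
    case 'H': case 'f': vt->row = n - 1; vt->col = m - 1; break;
    case 'A': vt->row -= n; break;
    case 'B': vt->row += n; break;
    case 'C': vt->col += n; break;
    case 'D': vt->col -= n; break;
    default: break;  /* SGR and anything unhandled: consumed, never printed */
    }
    vt->row = clampi(vt->row, 0, vt->rows - 1);
    vt->col = clampi(vt->col, 0, vt->cols - 1);
}

/* Returns 1 if the caller should draw/handle ch as ordinary output,
 * 0 if the byte was swallowed as part of an escape sequence. */
int vt_feed(Vt *vt, unsigned char ch) {
    if (vt->state == 0) {
        if (ch != 0x1b) return 1;
        vt->state = 1;
        return 0;
    }
    if (vt->state == 1) {
        vt->state = (ch == '[') ? 2 : 0;
        vt->params[0] = vt->params[1] = 0;
        vt->index = 0;
        return 0;
    }
    if (ch >= '0' && ch <= '9') {
        if (vt->index < 2 && vt->params[vt->index] < 10000)
            vt->params[vt->index] = vt->params[vt->index] * 10 + (ch - '0');
    } else if (ch == ';') {
        vt->index++;
    } else if (ch >= 0x40 && ch <= 0x7e) {   /* final byte ends the sequence */
        vt_dispatch(vt, ch);
        vt->state = 0;
    }                                        /* '?' and friends: skipped */
    return 0;
}

/* Feed a string; returns how many bytes were passed through for drawing. */
int vt_write(Vt *vt, const char *s) {
    int passed = 0;
    for (; *s; s++) passed += vt_feed(vt, (unsigned char)*s);
    return passed;
}
```

The key property is the one the notes call out: once ESC [ has been seen, nothing is drawn until a final byte arrives, so an unsupported sequence is discarded instead of turning into glyphs.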
First native Swift app: /bin/calc + free-capable allocator (DONE, 2026-06-06)
The first idiomatic Embedded Swift EL0 program on swift-os. Every prior userland tool
(ls/cat/ps/console-login, …) is hand-rolled with UnsafePointer/withUnsafeTemporaryAllocation
and manual byte loops — none ever used the high-level runtime, so ARC/String/Array/Dictionary/
generics were asserted to work but never exercised, and the bridge's allocator had never been
stressed. /bin/calc (an interactive Int64 expression REPL) drives all of it end to end:
classes + ARC, an indirect enum AST, Array/String/Dictionary<String,Int64>, generics, a
closure, a protocol witness table, and print() with String interpolation.
Runtime-low decision (locked): extend the minimal bridge, not newlib
For "real" Swift apps we keep building Embedded Swift on our own svc ABI + the
userland/lib/swift_user.* bridge, and we grow the bridge as the runtime demands — rather than
relinking Embedded Swift against newlib for malloc/stdio. Why: it keeps the userland Swift-first and
lightweight; the genuinely missing primitive is a free-capable allocator (ARC churn), which is a
~80-line addition, not a reason to pull in a second libc; and a working malloc/free over sbrk
is exactly the bottom end the long-horizon Node/JVM targets will need. Newlib stays the third-party
path (busybox, the newlib port). This is the answer to the session's "what runtime-low" fork.
Gaps that surfaced (verified empirically, all closed)
- Allocator never freed. The old swift_slowAlloc only bumped sbrk; swift_slowDealloc/free were no-ops. A REPL that builds+drops an AST per line would grow the break monotonically until sbrk failed. Replaced with a classic K&R free-list allocator with coalescing (16-byte units → 16-aligned payloads, the Embedded Swift heap alignment; grows the arena from sbrk in 64 KiB chunks). Now malloc/calloc/realloc/free are real; swift_slowAlloc/swift_slowDealloc route through it (over-aligned requests stash the base pointer in the preceding word); posix_memalign likewise. calc's :mem prints sbrk(0) and the test asserts the break is identical before/after a 24-line churn (0xA0010000 = heap base + one 64 KiB chunk) — proof the allocator recycles.
- print() needs putchar. Embedded print/String output lowers to putchar; added a thin one to the bridge over SYS_WRITE.
- String compare/hashing needs the Unicode data tables. Dynamic String == (and so Dictionary<String,_>) references _swift_stdlib_getNormData/nfd_decompositions/grapheme-break accessors. The toolchain ships libswiftUnicodeDataTables.a for aarch64-none-none-elf; we link it into /bin/calc only (SWIFT_UNICODE_DATA in the Makefile), and --gc-sections trims its 825 KiB to just the referenced tables (final ELF ~160 KiB).
- Dictionary/Set also need arc4random_buf (hash seed) — added a deterministic fill to the bridge (reproducible; the seed only randomises hash-table iteration order, and at that point we had no entropy source).
- FP at EL0 is fine (not relied upon): boot.S sets CPACR_EL1.FPEN=0b11, which permits FP/SIMD at EL0 too, so scalar FP would not trap. The calculator core stays Int64 anyway so acceptance does not hinge on soft-float/compiler-rt; floating point is recorded as available for a future app.
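A K&R-style free-list allocator with 16-byte units can be sketched as follows. This is an illustration of the technique named above, not the code in swift_user.c: sbrk is replaced by a static arena (fake_sbrk) so the sketch can run on a host, the kr_ names are invented, and realloc, calloc, and the over-aligned path are omitted. Like the K&R original, it compares pointers across different objects, which works on flat-memory targets but is not strictly portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Block header, padded to one 16-byte unit so payloads stay 16-aligned. */
typedef union header {
    struct { union header *next; size_t units; } s;
    _Alignas(16) unsigned char pad[16];
} Header;

#define CHUNK_UNITS (65536 / sizeof(Header))   /* grow in 64 KiB steps */

/* Stand-in for sbrk: a static arena plus a break offset. */
static _Alignas(16) unsigned char arena[256 * 1024];
static size_t brk_off;
static void *fake_sbrk(size_t n) {
    if (brk_off + n > sizeof arena) return NULL;
    void *p = arena + brk_off;
    brk_off += n;
    return p;
}
size_t kr_heap_break(void) { return brk_off; }

static Header base;            /* empty list head */
static Header *freep = NULL;   /* circular free list, address-ordered */

void kr_free(void *ap) {
    if (ap == NULL) return;
    Header *bp = (Header *)ap - 1, *p;
    /* Find the free block p that sits just before bp in address order. */
    for (p = freep; !(bp > p && bp < p->s.next); p = p->s.next)
        if (p >= p->s.next && (bp > p || bp < p->s.next)) break;
    if (bp + bp->s.units == p->s.next) {        /* merge with upper neighbour */
        bp->s.units += p->s.next->s.units;
        bp->s.next = p->s.next->s.next;
    } else {
        bp->s.next = p->s.next;
    }
    if (p + p->s.units == bp) {                 /* merge with lower neighbour */
        p->s.units += bp->s.units;
        p->s.next = bp->s.next;
    } else {
        p->s.next = bp;
    }
    freep = p;
}

static Header *morecore(size_t units) {
    if (units < CHUNK_UNITS) units = CHUNK_UNITS;
    Header *up = fake_sbrk(units * sizeof(Header));
    if (up == NULL) return NULL;
    up->s.units = units;
    kr_free(up + 1);           /* hand the new chunk to the free list */
    return freep;
}

void *kr_malloc(size_t nbytes) {
    size_t units = (nbytes + sizeof(Header) - 1) / sizeof(Header) + 1;
    Header *prevp = freep, *p;
    if (prevp == NULL) {       /* first call: self-linked empty list */
        base.s.next = freep = prevp = &base;
        base.s.units = 0;
    }
    for (p = prevp->s.next; ; prevp = p, p = p->s.next) {
        if (p->s.units >= units) {
            if (p->s.units == units) {
                prevp->s.next = p->s.next;      /* exact fit: unlink */
            } else {
                p->s.units -= units;            /* carve from the tail */
                p += p->s.units;
                p->s.units = units;
            }
            freep = prevp;
            return p + 1;
        }
        if (p == freep && (p = morecore(units)) == NULL) return NULL;
    }
}
```

In this model the break moves once, to 64 KiB, on the first allocation and then stays put through any amount of allocate-and-free churn that fits in the chunk, which is the same property the calc test checks through :mem.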
Files / tests
- userland/calc.swift — the REPL (lexer → indirect enum Expr → recursive-descent parser → final class Env with Dictionary → recursive evaluator returning an EvalResult enum). :help :mem :vars :sum :q commands.
- userland/lib/swift_user.{c,h} — the allocator, putchar, arc4random_buf, swiftos_heap_break.
- kernel/user/exec.swift — /bin/calc routing (disk-backed, like the other Swift tools).
- Makefile — SWIFT_UNICODE_DATA, user_calc.o/$(USER_CALC_ELF) rules (calc links the Unicode tables), base-image staging.
- tests/calc_test.sh (wired into make test): precedence/parens/assignment+lookup/modulo/unary/division-by-zero/:sum, plus the bounded-heap churn assertion, then returns to a working shell.
- Out of scope: floating point, multi-line input, functions/conditionals, REPL history editing.
Second native Swift app: /bin/kv (DONE, 2026-06-06)
An in-memory key-value store REPL — the second idiomatic Embedded Swift EL0 app. Where calc
stressed the runtime through a recursive-enum AST + ARC, kv leans on the String/Unicode
machinery: it stores arbitrary user-supplied keys and values in a Dictionary<String, String>
behind a final class Store, so every SET/GET/DEL hashes text the user typed (calc only
ever hashed String keys it minted itself), KEYS sorts those keys (String: Comparable,
Unicode-ordered), the verb dispatch runs through .uppercased() (Unicode case mapping), and
:stats reduces over map.values with a closure (reduce(0) { $0 + $1.utf8.count }). No new
kernel work and no new bridge calls — it reuses the calc-era allocator/putchar/arc4random_buf
and links libswiftUnicodeDataTables.a (SWIFT_UNICODE_DATA), trimmed by --gc-sections.
- Commands: SET k v… (value keeps interior spaces — the rest of the line), GET k, DEL k, KEYS (sorted), COUNT, plus :stats/:mem/:help/:q. Line parsing is a small splitFields(line, max:) over the UTF-8 bytes so the value field preserves spaces.
- Files: userland/kv.swift; kernel/user/exec.swift routes /bin/kv (disk-backed); Makefile user_kv.o/$(USER_KV_ELF) rules (links the Unicode tables like calc) + base-image staging.
- tests/kv_test.sh (wired into make test): SET with a multi-word value, GET/DEL of a missing key ((nil)), DEL of a present key, COUNT 3→2, KEYS sorted, :stats, then a SET/DEL churn loop with two :mem readings asserting the heap break stays identical (the free-capable allocator recycles), and a final return to the shell. The QEMU window is 75 s (boot+login+churn under emulation lags the scripted feed; the suite is sequential, so this is comfortable in practice).
- Out of scope: persistence (in-memory only, lost on exit by design), value quoting, TTL/expiry.
Native-Swift userland: ls cat echo pwd ps id mkdir rmdir rm mv chmod chown head touch wc date calc kv.
Open decisions / resolved
- Runtime-low for native Swift apps (2026-06-06): extend the swift_user bridge (real free-capable allocator on our own ABI), not Embedded-Swift-on-newlib. See the calc section above.
- Embedded Swift toolchain → swift.org 6.3.2-RELEASE (user-local xctoolchain).
- Embedded Swift flags & triple → pinned above (aarch64-none-none-elf).
- Linker → aarch64-elf-ld.
- Post-M8 direction (2026-06-04): keep aarch64 + UEFI boot (no amd64 port), capability/principal identity, virtio-blk packed RO base FS (no persistent writable FS). See "Post-M8 roadmap" above.
d5 — busybox cross-build: feasibility findings (2026-06-04)
Downloaded busybox 1.38.0; configured allnoconfig + ash/ls/cat/echo + static; cross-built with
aarch64-elf-gcc against ./sysroot (newlib). busybox is Linux-oriented; newlib is bare-metal, so
the bring-up needed a small userland/compat header surface for POSIX/Linux-ish declarations that newlib
does not ship.
- Header shims added under userland/compat/ for the minimal BusyBox build surface: endian/feature helpers, directory APIs, termios, sockets/netdb, mount/shadow/utmp placeholders, poll, mmap, statfs, sysinfo, sysmacros, utsname, wait/status, stdio/stdlib extensions, and related network headers.
- Repro target added: make busybox-check downloads pinned busybox 1.38.0, applies the minimal ash/ls/cat/echo/static config, includes userland/compat, and passes only if it produces a static AArch64 busybox binary. Current log: build/busybox-check.log.
Conclusion: busybox-on-newlib is viable for the minimal ash + ls/cat/echo configuration. The binary now
cross-builds statically; the next milestone is launching that image under the OS and filling runtime
syscall gaps (dup, pipe, ioctl/termios variants, uid/gid, process helpers, directory backing, etc.)
over our own syscall surface, not Linux syscall numbers.
d5 progress — busybox now COMPILES against newlib + compat (2026-06-04)
A userland/compat/ POSIX/Linux shim layer (≈30 headers, passed via -isystem before the newlib
sysroot) now lets busybox 1.38.0 (ash + ls/cat/echo, static) compile cleanly with aarch64-elf-gcc.
Key gaps filled: byteswap/endian/features, full termios.h (newlib aarch64 ships none — struct +
flags ICANON=1/ECHO=2/ISIG=4 matching the kernel ABI + baud table), dirent.h (newlib's is
"unsupported"), sys/{ioctl,mman,statfs,sysinfo,sysmacros,resource,wait,un,termios}.h,
netdb/sys/socket/netinet/arpa/net/if network stubs, poll/sched/mntent/utmpx/shadow, and
include_next shims for stdlib.h (rename newlib's nonstandard itoa/utoa), stdio.h (getline),
signal.h (SA_RESTART). busybox .config saved at userland/busybox/config-minimal.
Remaining for d5:
- Link-time stub layer (userland/compat/*.c): real opendir/readdir/closedir over getdents; tcgetattr/tcsetattr over syscalls 7/8 (+ tcflush/cf* stubs); lstat→stat, getuid/...→0, getpwuid/...→minimal, ioctl(TIOCGWINSZ/TCGETS), fork/execve/waitpid wrappers, and ENOSYS stubs for the networking/mount/utmp surface libbb references.
- Custom final link: busybox's default gcc link can't find -lc/crt0.o; relink the busybox objects with our crt0_newlib.o + stub lib + -T user_newlib.ld + newlib (--start-group).
- Runtime bring-up: get the ash prompt, then run ls/cat/echo (likely a few iterations: applet re-exec path, tty modes, missing syscalls surfaced at runtime).
d5 — busybox runs. M8 COMPLETE (2026-06-04)
scripts/build-busybox.sh (make busybox) cross-builds busybox 1.38.0 (ash standalone shell +
ls/cat/echo/pwd, static) with aarch64-elf-gcc against ./sysroot (newlib) + userland/compat, then
links the busybox objects with our crt0_newlib + newlib_syscalls + compat/stubs.c (dirent over
getdents, termios over syscalls 7/8, fork/execve/waitpid, uid/pwd/ioctl/getline/… ) using
user_newlib.ld → build/busybox.elf, embedded in the kernel (user_blob.S).
Standalone applet dispatch: the shell re-execs bb_busybox_exec_path (/proc/self/exe) with
argv[0]=<applet>; exec.swift resolves /proc/self/exe (and /bin/{busybox,sh,ls,cat,echo,pwd}) to
the embedded busybox image, so execve reloads busybox and it runs the named applet.
M8 acceptance MET: the kernel boots, runs every milestone demo, then launches busybox sh as the
init shell; tests/busybox_test.sh drives it and asserts:
BusyBox v1.38.0 ... built-in shell (ash)
# echo M8-BUSYBOX-OK -> M8-BUSYBOX-OK
# ls / -> bin etc readme.txt hello.txt tmp
# cat /etc/motd -> Welcome to swift-os.
# exit -> code 0
Prereqs: make newlib && make busybox once, then make build / make test. The full
M0 → M8 path is complete: a static busybox sh runs ls/cat/echo on our read-only base + tmpfs in QEMU.
Network stack (N-series) — own Swift, sans-IO
The next major arc is our own TCP/IP stack in Embedded Swift, following the sans-IO direction recorded
in docs/ARCHITECTURE.md ("Future network stack model"). Decisions locked at net-a:
- In-kernel for now. ARCHITECTURE's long-horizon target is a userland driver/stack service gated by capabilities, but restartable driver services are a non-goal "this stage" and the codebase is monolithic. net-a keeps the driver and the protocol core in-kernel. The sans-IO purity of the core is what preserves the option to lift it into a userland service later without rewriting its logic.
- Zero-copy data path. RX buffers are PMM pages the device DMAs into; the sans-IO core reads the Ethernet frame straight out of the RX buffer (no bounce copy in), and replies are written directly into the TX DMA buffer and handed to the transmit ring by address (no copy out). Only the 12-byte virtio_net_hdr is added. Honors the ARCHITECTURE N0–N4 zero-copy requirement from the start.
- sans-IO core in kernel/net/*.swift: pure Swift, no MMIO/heap-per-packet/syscalls, compiled both into the kernel (Embedded) and into a host unit test (tests/net_test.swift), exactly like fdt.swift ↔ tests/fdt_test.swift. The control-plane ARP cache is the only heap use; the per-packet path does not allocate.
net-a — virtio-net driver + sans-IO Ethernet/ARP/IPv4/ICMP (DONE, 2026-06-06)
- Driver kernel/drivers/virtio_net.swift (Swift). Mirrors virtio_blk.c but in Swift (the project default; uart.swift is the Swift-MMIO precedent) with two virtqueues plus an RX buffer pool. Scans the HAL virtio-mmio window for a modern device id 1, negotiates VIRTIO_F_VERSION_1 (+ VIRTIO_NET_F_MAC when offered), reads the MAC from config space, sets up the receive (queue 0) and transmit (queue 1) rings from PMM pages, pre-fills the RX ring, and polls the used rings (IRQs masked, like the blk driver and virtio-input). MMIO + cache maintenance go through the io.h C bridge (new dc_cvac/dc_ivac/dsb_sy inlines); everything else is Swift, including ~Copyable-style buffer ownership via the PMM pool.
- sans-IO core kernel/net/. packet.swift (byte/BE helpers, RFC 1071 internet checksum, MAC), ethernet.swift, arp.swift (request/reply + a tiny ARP cache), ipv4.swift (no options/frag), icmp.swift (echo), and stack.swift (NetStack.onFrame + buildArpRequest/buildEchoRequest). The core consumes one received frame and writes any reply into a caller buffer; it does no I/O. NB: ARP spa is at offset 14 (after the 6-byte sha at 8), not 12; this was an early bug caught by the host test.
- Boot probe runVirtioNetProbe (kernel/main.swift), run after vfsInit: brings up virtio-net, ARPs the slirp gateway 10.0.2.2, then sends an ICMP echo request and waits for the reply, logging net-a OK: ICMP echo reply from 10.0.2.2. A no-op (one log line) when no NIC is attached, so the other boot/test paths are unchanged (mirrors runVirtioBlkProbe). Static addressing: guest 10.0.2.15, gateway 10.0.2.2; no DHCP yet.
- Tests. tests/net_test.swift (host) feeds crafted frames and asserts ARP request/reply build + parse, ARP-cache population, IPv4/ICMP checksum correctness, echo reply recognition, the inbound echo responder, and rejection of runt/bad-checksum frames. tests/virtio_net_test.sh boots -kernel with -netdev user,id=n0 -device virtio-net-device,netdev=n0 and asserts the three net-a serial lines. Both wired into make test.
- QEMU launch: the slirp gateway answers ARP for and ICMP echo to 10.0.2.2 while the guest spins; the vCPU busy-poll does not starve QEMU's iothread, so the reply arrives. Acceptance is guest-initiated because slirp does not reliably originate ICMP to the guest headless.
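The RFC 1071 internet checksum that packet.swift provides can be sketched as follows. This is a minimal host-side illustration, not the project's code: the function name internetChecksum and the [UInt8] input are assumptions chosen for readability.

```swift
// Sketch of the RFC 1071 internet checksum (one's-complement sum of
// big-endian 16-bit words). `internetChecksum` is a hypothetical name.
func internetChecksum(_ bytes: [UInt8]) -> UInt16 {
    var sum: UInt32 = 0
    var i = 0
    // Accumulate the buffer as big-endian 16-bit words.
    // A UInt32 accumulator cannot overflow for frame-sized inputs.
    while i + 1 < bytes.count {
        sum += (UInt32(bytes[i]) << 8) | UInt32(bytes[i + 1])
        i += 2
    }
    // An odd trailing byte is treated as the high byte of a zero-padded word.
    if i < bytes.count {
        sum += UInt32(bytes[i]) << 8
    }
    // Fold the carries back into the low 16 bits.
    while (sum >> 16) != 0 {
        sum = (sum & 0xFFFF) + (sum >> 16)
    }
    return ~UInt16(sum)
}

// The example words from RFC 1071: 0001 f203 f4f5 f6f7.
let sample: [UInt8] = [0x00, 0x01, 0xf2, 0x03, 0xf4, 0xf5, 0xf6, 0xf7]
let csum = internetChecksum(sample)
```

For the RFC 1071 sample words the folded sum is 0xddf2, so the checksum is 0x220d. Re-running the function over the data with the checksum appended yields 0, which is the usual receive-side validity check.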
net-b — sans-IO UDP + a capability-gated socket syscall surface (DONE, 2026-06-06)
- sans-IO UDP kernel/net/udp.swift (pure, host-tested): parse/build + the IPv4 pseudo-header checksum, reusing a new sumBytes/sumWord/foldChecksum accumulator in packet.swift (so a checksum can span the pseudo-header + UDP header + payload). NetStack.onFrame gained a UDP branch that reports a received datagram via RxOutcome (gotUDP, src IP/port, dst port, payload offset+len) without copying, plus buildUDP; it also now learns L2 from inbound IPv4 (arp.insert(ipSrc, ethSrc)) so replies route without an extra ARP.
- Sockets are VFS fds. New fdKindSocket in kernel/vfs/vfs.swift; OpenDescription.node indexes a kernel socket table. close/poll work uniformly (poll pumps the NIC when a socket fd is present).
- Kernel socket layer kernel/net/socket.swift (kernel-only, not in the host test): one shared live NetStack (gNet), brought up once by netInit(); a fixed socket table with a small per-socket datagram ring backed by a single PMM region. netPump() drains the NIC and routes UDP to bound sockets (socketDeliverUDP, called from virtioNetPoll). socketRecv pumps until a datagram arrives or a bounded timeout expires. socketSend routes via the ARP cache, falling back to the slirp gateway. net-a's probe now shares gNet/netInit instead of a local stack.
- Syscalls 38–41: socket/bind/sendto/recvfrom. socket() requires the new capNet (1<<5); the boot context and root (store caps 31→63) hold it. The 3-arg ABI is kept: sendto/recvfrom pass a small swiftos_udp_msg struct by pointer (buf/len/ip/port), validated via user_access.
- Userland: swiftos_socket/bind/sendto/recvfrom in the swift_user.* bridge; userland/udpecho.swift → /bin/udpecho binds UDP 5555, echoes the first datagram, prints the size/sender.
- Tests: tests/net_test.swift gained UDP cases (build/parse + pseudo-header checksum + bad-checksum reject). tests/udp_echo_test.sh boots with -netdev user,hostfwd=udp::5555-:5555, runs /bin/udpecho, sends a datagram from the host with nc -u, and asserts the guest's "got 8 bytes from 10.0.2.2:" line and that nc received the echo back. Both wired into make test. (busybox_test updated: root caps now 0x3f.)
net-c1 — sans-IO TCP connection state machine (DONE, 2026-06-06)
- kernel/net/tcp.swift (pure, host-tested): TCP segment parse/build + the pseudo-header checksum (reusing sumBytes/sumWord/foldChecksum), wraparound-safe sequence comparisons (seqLT/seqLEQ, RFC 1982), and a TCPConnection state machine. It consumes parsed inbound segment fields (+ payload + a now tick) and emits outbound segment descriptors (TCPSegmentOut: flags/seq/ack/window/payload span) into a fixed queue the caller drains: no I/O, no kernel state, identical Swift for kernel and host.
- Scope: passive open (LISTEN→SYN_RCVD→ESTABLISHED) and active open (→SYN_SENT→ESTABLISHED); in-order data with cumulative ACK (out-of-order/old → drop + re-ACK); an app send buffer with a single-timer RTO retransmit of the oldest unacked data; a fixed window; the full close handshake (active FIN_WAIT_1→FIN_WAIT_2→TIME_WAIT; passive CLOSE_WAIT→LAST_ACK→CLOSED); RST. The SYN/FIN phantom sequence numbers are handled (passive-open completion is an explicit branch since processAck only tracks data + a queued FIN). Intentionally deferred to net-c2+: out-of-order reassembly, delayed ACK, Nagle, congestion control beyond the fixed window, SACK, timestamps. ISS is fixed (0x1000) for net-c1 determinism; net-c2 seeds it from the RTC.
- Not wired into the kernel yet: the engine is dead code in the image (--gc-sections drops it) until net-c2 connects it to sockets. It compiles into the kernel (Embedded) to keep it building.
- Tests: tests/net_test.swift drives the engine with crafted segments (checksum, passive handshake, in-order data + cumulative ACK, old-segment re-ACK, app send + ACK drain, RTO retransmit, passive close, active open + active close, and RST) plus the sequence-wraparound comparisons. Host gate in make test.
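The wraparound-safe comparisons named above (seqLT/seqLEQ) are commonly written as a signed interpretation of the 32-bit difference. The sketch below shows that technique; the function names come from these notes, but the bodies are an assumption about how tcp.swift implements them.

```swift
// Serial-number comparison for 32-bit TCP sequence numbers.
// Assumed bodies: the names match the notes, the implementation is a sketch.
func seqLT(_ a: UInt32, _ b: UInt32) -> Bool {
    // Wrapping subtraction, then reinterpret as signed: a negative
    // difference means `a` is "before" `b` within half the sequence space.
    return Int32(bitPattern: a &- b) < 0
}

func seqLEQ(_ a: UInt32, _ b: UInt32) -> Bool {
    return a == b || seqLT(a, b)
}

// A sequence number just below the wrap point is still "less than"
// one just past it, even though it is numerically larger.
let beforeWrap: UInt32 = 0xFFFF_FFF0
let afterWrap: UInt32 = 0x0000_0010
let ordered = seqLT(beforeWrap, afterWrap)
```

A plain a < b would get the wrap case backwards, which is why cumulative-ACK and old-segment checks need this form.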
net-c2 — TCP sockets + /bin/tcpecho, in-QEMU (DONE, 2026-06-06)
- NetStack reports TCP (stays pure): onFrame's IPv4 path validates the TCP checksum and fills RxOutcome TCP fields (flags/seq/ack/window/payload offset+len); buildTCP builds a segment frame (the payload is placed before the header is written, so the checksum covers it). tcp.swift gained an ISS parameter on the open calls and copySegmentPayload.
- Kernel TCP sockets (kernel/net/socket.swift): the socket table carries a protocol tag; a TCP socket is a listener or a connection (owns a TCPConnection, keyed by the 4-tuple). socketDeliverTCP (called from virtioNetPoll) demuxes by 4-tuple, spawns a connection on a SYN to a listener, drives onSegment, and tcpDrain transmits the emitted segments via buildTCP. tcpListen/tcpAccept/tcpRecv/tcpSend back the syscalls; socketClose sends a FIN first. ISS seeded from rtcNow().
- Accept latch (bug fixed during bring-up): a fast client (nc) sends SYN→ACK→data→FIN within one NIC pump, so the connection races past .established to .closeWait before accept polls. accept now matches a one-shot "handshake completed" latch (set on the established event) rather than the live state; otherwise accept never returns for a quick client.
- Sockets-as-fds: vfsSocket honors type (SOCK_STREAM→TCP); new listen (42)/accept (43) syscalls; TCP streams use read/write on the connection fd (vfsRead/vfsWrite dispatch fdKindSocket+TCP to tcpRecv/tcpSend); poll reports a listener readable when a connection awaits accept, a connection when it has data or peer-closed. UDP keeps sendto/recvfrom.
- Userland: swiftos_socket_stream/listen/accept bridges (stream I/O reuses swiftos_read/write); userland/tcpecho.swift → /bin/tcpecho (bind 5555, listen, accept one connection, read a chunk, echo, close).
- Acceptance: tests/tcp_echo_test.sh boots with -netdev user,hostfwd=tcp::5555-:5555, runs /bin/tcpecho, connects with nc, and asserts the guest's "got N bytes" line + that nc received the echo: the full SYN/data/echo/FIN round-trip. Wired into make test.
- Deferred: accept backlog > 1, graceful TIME_WAIT after close (the slot is freed once the FIN is flushed), congestion control. net-c (a+b+c1+c2) is complete.
net-d — TCP connect() (active client) + /bin/tcpget (DONE, 2026-06-06)
- socketConnect (kernel/net/socket.swift): assigns an ephemeral local port, resolves the dest MAC (ARP cache → slirp gateway), activeOpens the TCPConnection (RTC-seeded ISS), drains the SYN, then pumps the NIC until the established latch fires or a timeout expires; the 4-tuple demux already routes the SYN-ACK back. netPump now also runs each live TCP connection's tick + drain (RTO retransmit), closing a net-c2 gap.
- connect(fd, ip, port) = syscall 44 (fits the 3-arg ABI directly, no arg struct). vfsConnect validates the fd/port; read/write/close on the connected fd reuse the net-c2 stream paths.
- Userland: swiftos_connect bridge + userland/tcpget.swift → /bin/tcpget [ip] [port] (dotted-IP parser; default 10.0.2.2:5555): connect, send a request line, read the reply, print it, close.
- Acceptance: tests/tcp_connect_test.sh runs a host nc -l 5555 server (QEMU slirp maps 10.0.2.2 to the host, so it is reachable with no hostfwd), boots, runs /bin/tcpget, and asserts the guest received the server's srv-reply (host→guest) and the guest's request appears on the wire (guest→host) via a QEMU filter-dump pcap. The pcap is used for the guest→host check because nc's file output is block-buffered and its exit timing is unreliable; the guest's TX bytes on the NIC are the deterministic signal. (A live debug confirmed the guest correctly transmits data even from CLOSE_WAIT when a fast server FINs first.)
- Deferred: DNS/name resolution (numeric IP only), a real ephemeral-port allocator (currently 40000 + slot). The TCP stack now does both directions: inbound server (/bin/tcpecho) and outbound client (/bin/tcpget).
net-e — concurrent poll()-driven HTTP server /bin/httpd (DONE, 2026-06-06)
- A real concurrent server, the stated purpose of swift-os. userland/httpd.swift → /bin/httpd: socket/bind(8080)/listen, then a single poll() event loop multiplexing the listener plus all live connections (fixed table, cap 8). On listener-readable it accepts and tracks the new fd; on connection-readable it reads the request, sends a fixed HTTP/1.0 200 OK + Hello from swift-os (built via StaticString.withUTF8Buffer, so no String/Array/unicode-table dependency), and closes (Connection: close). Concurrency is real: several connections are in flight across poll iterations.
- Kernel at the time: no change needed. vfsPoll already pumps the NIC and reports socket readiness (socketPollReadable: a listener is readable when a connection awaits accept, a connection when it has data or peer-closed), and socketDeliverTCP spawns a connection socket per SYN. The calls are poll-gated, so the existing blocking accept/read returned immediately; later nginx work added minimal O_NONBLOCK handling for direct nonblocking socket calls.
- Only new plumbing: a swiftos_poll(fds, nfds, timeout_ms) userland bridge over the existing SYS_POLL (26); the Swift caller builds the 8-byte pollfd records (fd@0/events@4/revents@6) in a scratch buffer. Reached by absolute path (/bin/httpd).
- Acceptance: tests/httpd_test.sh boots with hostfwd=tcp::8080-:8080, runs /bin/httpd, fires two concurrent host curls (falls back to an nc-built GET), and asserts both receive the body and the serial shows ≥2 httpd: 200 lines, i.e. concurrent serving end to end. Wired into make test.
- Deferred: keep-alive (HTTP/1.0 close only), request parsing/routing (responds to any request), maxSockets/conn-table caps (8). swift-os now hosts a working concurrent network server.
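The 8-byte pollfd record layout mentioned above (fd@0/events@4/revents@6) can be packed by hand without touching a C struct. The sketch below is illustrative only: writePollfd, readRevents, and the pollIn constant are hypothetical names, the POLLIN value 0x0001 is the usual POSIX one and is assumed here, and little-endian byte order is assumed because the target is AArch64.

```swift
// Hypothetical event bit; 0x0001 is the conventional POLLIN value (assumed).
let pollIn: UInt16 = 0x0001

// Write one 8-byte pollfd record at slot `index`:
// fd as a little-endian Int32 at offset 0, events at 4, revents at 6.
func writePollfd(into buf: inout [UInt8], index: Int, fd: Int32, events: UInt16) {
    let base = index * 8
    let ufd = UInt32(bitPattern: fd)
    buf[base + 0] = UInt8(ufd & 0xFF)
    buf[base + 1] = UInt8((ufd >> 8) & 0xFF)
    buf[base + 2] = UInt8((ufd >> 16) & 0xFF)
    buf[base + 3] = UInt8((ufd >> 24) & 0xFF)
    buf[base + 4] = UInt8(events & 0xFF)
    buf[base + 5] = UInt8(events >> 8)
    buf[base + 6] = 0   // revents is cleared before the call
    buf[base + 7] = 0
}

// Read back the revents field the kernel filled in for slot `index`.
func readRevents(from buf: [UInt8], index: Int) -> UInt16 {
    let base = index * 8
    return UInt16(buf[base + 6]) | (UInt16(buf[base + 7]) << 8)
}

// Scratch buffer for two records: a listener on fd 3 and a connection on fd 4.
var scratch = [UInt8](repeating: 0, count: 16)
writePollfd(into: &scratch, index: 0, fd: 3, events: pollIn)
writePollfd(into: &scratch, index: 1, fd: 4, events: pollIn)
```

After the poll call returns, the event loop would call readRevents for each slot and accept or read accordingly.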
Process/resource monitor: native Swift /bin/top + CPU/mem accounting (DONE, 2026-06-07)
A live top (userland/top.swift) — the natural successor to /bin/ps. ps was a one-shot dump
because the kernel had no CPU/memory accounting ("CPU, memory, tty, and time columns need more kernel
accounting", M8 note). This adds that accounting and renders it as a refreshing top-style screen:
a summary header (uptime, task states, CPU busy/idle, RAM total/used/free, and the kernel's own
footprint) plus a per-process table (PID/PPID/USER/STATE/%CPU/RES/TIME+/COMMAND) sorted by %CPU.
- Kernel accounting (kernel/user/process.swift). New parallel arrays: pCpuTicks (per-process CPU ticks), pStartTick (systemTicks at creation), pResPages (resident user pages), plus an idleTicks counter. RES is tracked at the obvious map sites: createProcess = ELF image pages + stack; fork copies the parent's count (the eager clone duplicates every page); execve resets to the new image; sbrk adds heap growth. The image page count comes from elf.c, which now exports elf_last_load_pages() (distinct frames the last elf_load mapped, counted only on a fresh pmm_alloc_page, not on a shared-page perm upgrade).
- EL0-charged CPU time. processOnTick now takes fromEL0: it charges a tick as user time to the running process only when the timer interrupted EL0; EL1 ticks (the scheduler's idle wfi, and a process parked in a wfi-based blocking syscall such as poll/read) count as idle. irqHandler reads SPSR_EL1.M at entry (still the pre-IRQ PSTATE, since no nested EL1 exception is taken before it) and passes it down. Effect: an idle system reads ~100% idle and a process sleeping on input reads ~0% CPU, while a CPU-bound EL0 loop reads ~100%. The preemption decision is byte-identical to before; only which counter increments changed. Limitation (documented in the code): kernel "system" time is bucketed into idle, since a real syscall doing work and a syscall parked in wfi both look like "currentProc at EL1"; a separate sy% would need to tell them apart.
- Syscalls 45/46. sysinfo(buffer) fills a 64-byte stats blob; procstat(buffer, capacity) fills 56-byte per-process records. Both are additive: the 32-byte psinfo (22) record is untouched, so /bin/ps keeps working. Memory totals come from platform.ramSize, pmm_free_count/pmm_total_count (new), the kernel image span (__image_end − (ramBase + 0x80000)), and swiftos_kernel_heap_used_bytes. generic_timer.swift now publishes timerHz for tick↔second conversion.
- Userland. /bin/top is a pure byte-oriented Embedded Swift program (no String/Unicode tables, so it links ~27 KiB like /bin/ps, not ~160 KiB like calc). It builds each frame in one 8 KiB buffer and writes it once. %CPU is a per-interval rate from the delta in a process's CPU ticks between refreshes (the first frame falls back to the since-start average). USER resolves the principal→name from /etc/swos/passwd (the id/ls colon-scan pattern), TIME+ is M:SS.cc (centiseconds, exact at 100 Hz), RES is in KiB. The bridge (swift_user.{c,h}) gained swiftos_sysinfo_refresh/swiftos_top_refresh + scalar accessors (the proven ps pattern, so Swift never touches a C struct field) and swiftos_set_raw (clear ICANON+ECHO for single-key q, keep ISIG).
- Modes. top interactive (clear+repaint every 2s via an ANSI home, raw tty, q to quit; the delay is a poll(stdin, timeout) that doubles as the quit check); top -b batch (no cursor/raw, for scripts/logs); top -d SECS delay; top -n N iterations; top -h. Reached by absolute path; routed in exec.swift. Caveat: interactive mode leaves the tty raw if killed by Ctrl-C (no custom signal delivery yet); the next shell resets its own termios, and q is the clean exit.
- Test. tests/top_test.sh (wired into make test): logs in as root, runs /bin/top -b -n 2 -d 1, and asserts the uptime/Tasks/Cpu/Mem/Kernel header lines, the column header, that two frames rendered (the refresh/%CPU-delta path), that top lists its own row, and that the shell survives top.
Native-Swift userland: ls cat echo pwd ps top id mkdir rmdir rm mv chmod chown head touch wc date calc kv.
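The two derived columns described above, the per-interval %CPU rate and the M:SS.cc TIME+ field, reduce to a little integer arithmetic. The sketch below is a host-side illustration under stated assumptions: the function names are hypothetical, it uses String for readability (the real /bin/top is byte-oriented and avoids String), and it reports CPU in tenths of a percent to stay in integers.

```swift
// CPU share over one refresh interval, in tenths of a percent (0...1000).
// Hypothetical helper: the real top computes this from procstat records.
func cpuPermille(prevTicks: UInt64, nowTicks: UInt64, intervalTicks: UInt64) -> UInt64 {
    guard intervalTicks > 0, nowTicks >= prevTicks else { return 0 }
    return (nowTicks - prevTicks) * 1000 / intervalTicks
}

// Format accumulated CPU ticks as M:SS.cc. At hz == 100 one tick is
// exactly one centisecond, so the conversion is lossless.
func formatCpuTime(ticks: UInt64, hz: UInt64 = 100) -> String {
    let centis = ticks * 100 / hz
    let minutes = centis / 6000
    let seconds = (centis / 100) % 60
    let cc = centis % 100
    func two(_ v: UInt64) -> String {
        return v < 10 ? "0\(v)" : "\(v)"
    }
    return "\(minutes):\(two(seconds)).\(two(cc))"
}

// 50 ticks of CPU over a 200-tick (2 s at 100 Hz) interval is 25.0%.
let share = cpuPermille(prevTicks: 100, nowTicks: 150, intervalTicks: 200)
// 12345 ticks at 100 Hz is 123.45 s, i.e. 2 minutes 3.45 seconds.
let timePlus = formatCpuTime(ticks: 12345)
```

Working in ticks and permille keeps the frame builder free of floating point, which matches the byte-oriented style the notes describe.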
Kernel memory footprint (measured 2026-06-07, before this feature)
Recorded because /bin/top's Kernel: line reports it live. For the QEMU virt -m 256M build at the
time /bin/top was added (llvm-size build/kernel.elf + the linker symbols + the boot log):
- Static: .text+.rodata+.got ≈ 140 KiB, .data ≈ 2.3 KiB, .bss ≈ 55 KiB → ELF dec ≈ 197 KiB; kernel.bin (flat, loadable) ≈ 142 KiB.
- Resident at boot (_start 0x4008_0000 → __image_end, roughly 2.3 MiB with the current linker reservation): 144 KiB code/data + 55 KiB bss + 64 KiB boot stack + 2 MiB early bump heap.
- Dynamic: of 256 MiB RAM the kernel, the 512 KiB sub-load-base hole, and the PMM bitmap consume about 2.8 MiB before any process runs. The accounting/syscalls added by this feature grow the image by ~3 KiB.
- P13 server package smoke raised the disk-backed ELF exec staging buffer to 8 MiB because the first static nginx package is larger than the earlier 2 MiB busybox-sized buffer.
net-f — DNS resolver: sans-IO codec + resolve syscall + /bin/nslookup (DONE, 2026-06-07)
- sans-IO codec kernel/net/dns.swift (pure, host-tested): dnsBuildQuery (header + length-prefixed QNAME labels + QTYPE A/QCLASS IN) and dnsParseResponse (validate id/response/rcode, skip the question, walk answers, return the first A record). Handles name-compression pointers (0xC0) when skipping names and bounds-checks every read (a malformed/hostile response can't over-read).
- Kernel resolve dnsResolve (kernel/net/socket.swift): a transient UDP socket (reusing socketCreate/Bind/Send/Recv) sends the query to a DNS server and parses the reply. Query id from rtcNow(); a dedicated PMM scratch page holds the query/response. serverIP == 0 defaults to slirp's DNS at 10.0.2.3:53.
- resolve(name, server_ip, server_port) = syscall 45, gated on capNet; returns the IPv4 in x0 (0 = failure), a value return like time. userland/nslookup.swift → /bin/nslookup <name> [server] [port] prints name -> a.b.c.d.
- Tests: tests/net_test.swift gained DNS cases (query encoding; parse an A record reached via a compression pointer; CNAME-then-A; NXDOMAIN/wrong-id → 0). tests/dns_test.sh runs a tiny host python3 UDP DNS responder (answers any A query with 192.0.2.7); the guest /bin/nslookup test.swos 10.0.2.2 5354 queries it (slirp routes guest→10.0.2.2 to the host) and prints test.swos -> 192.0.2.7, so the test is fully hermetic. Skips cleanly if python3 is absent. Wired into make test.
- Deferred: connect-by-name in /bin/tcpget (small follow-up), caching, IPv6/AAAA, a real ephemeral port allocator. /bin/nslookup name (no server) resolves against slirp's real DNS for interactive use.
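The length-prefixed QNAME encoding that dnsBuildQuery emits can be sketched as below. This is an illustration, not the dns.swift code: encodeQName is a hypothetical name, it takes a String for brevity (the kernel codec is byte-oriented), and rejecting empty or over-long labels is an assumed policy based on the 63-byte DNS label limit.

```swift
// Encode "www.example.com" as 3 'w' 'w' 'w' 7 'e' ... 3 'c' 'o' 'm' 0.
// Hypothetical helper; returns nil for labels that are empty or > 63 bytes.
func encodeQName(_ name: String) -> [UInt8]? {
    var out: [UInt8] = []
    for label in name.split(separator: ".", omittingEmptySubsequences: false) {
        let bytes = Array(label.utf8)
        // A length byte with the top two bits set would read as a
        // compression pointer (0xC0), so labels are capped at 63 bytes.
        if bytes.isEmpty || bytes.count > 63 { return nil }
        out.append(UInt8(bytes.count))
        out.append(contentsOf: bytes)
    }
    out.append(0)   // the zero-length root label terminates the name
    return out
}

let qname = encodeQName("test.swos")
```

The 63-byte cap is what lets a parser tell a label length apart from the 0xC0 compression-pointer prefix when skipping names in a response.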
net-g — static-file HTTP server (/bin/httpd serves the VFS) (DONE, 2026-06-07)
- /bin/httpd now serves real files instead of a canned body. Per connection it parses the request line (GET <path>), maps the path into a /www docroot on the VFS (/ → /www/index.html), and streams the file with a stat-derived Content-Length (open/read→write in chunks), 404 on miss. The poll() concurrency from net-e is unchanged. Userland-only (userland/httpd.swift + the existing open/read/close/stat bridge); no kernel change.
- Docroot, not the whole VFS: only base/www/ is reachable (seed files index.html, hello.txt), so the server never exposes /etc/swos/passwd etc. A path-traversal guard rejects any .. in the request path (and requires a leading /) → 404; verified a raw GET /../etc/swos/passwd returns 404, no leak.
- Tests: tests/httpd_test.sh updated: two concurrent curls for /index.html both get the page (concurrency), /hello.txt returns its content (file serving), a missing path returns HTTP 404, and the serial shows ≥2 httpd: 200 lines. base/www/* ride along via the existing BASE_SEED_FILES glob.
- Deferred: keep-alive, MIME types (all served as text/html), large-file streaming tuning (the chunk loop is present but untuned), directory listings. swift-os now serves its filesystem over HTTP.
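The path-traversal guard described above (leading / required, any .. rejected) is small enough to show in full. The sketch below follows that description; isSafeRequestPath is a hypothetical name and the byte-array input is an assumption made to match the byte-oriented userland style.

```swift
// Returns true only for paths that start with "/" and contain no ".."
// anywhere. Hypothetical helper mirroring the guard described in the notes.
func isSafeRequestPath(_ path: [UInt8]) -> Bool {
    guard let first = path.first, first == UInt8(ascii: "/") else {
        return false
    }
    let dot = UInt8(ascii: ".")
    var i = 0
    // Reject two consecutive dots at any position, not only whole
    // "/../" segments; stricter than necessary but simple to audit.
    while i + 1 < path.count {
        if path[i] == dot && path[i + 1] == dot { return false }
        i += 1
    }
    return true
}

let okPath = isSafeRequestPath(Array("/index.html".utf8))
let badPath = isSafeRequestPath(Array("/../etc/swos/passwd".utf8))
```

The check also refuses harmless names such as /a..b, which is an acceptable trade for a guard that is trivially correct before percent-decoding exists.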
net-h2 — HTTP MIME types + directory listing (DONE, 2026-06-07)
- MIME by extension: /bin/httpd derives Content-Type from the request path's final extension (.html → text/html, .txt → text/plain, .css/.js/.json, else application/octet-stream) instead of the net-g hardcoded text/html. The extension is the last . within the final path segment (a / resets the scan).
- Directory listing: when the resolved /www path stats as a directory (S_IFDIR), httpd reads it with swiftos_getdents (same dirent layout as /bin/ls) and serves a generated HTML index (skipping ./..), buffered so Content-Length is accurate. / still prefers /www/index.html; a dir with no index (the new seed base/www/sub/) gets the listing. The .. guard is intact. Userland-only.
- Test: tests/httpd_test.sh extended: GET /hello.txt carries Content-Type: text/plain (via curl -D -), and GET /sub/ returns a listing containing note.txt, alongside the net-g concurrent index + 404 assertions.
- Deferred: keep-alive, percent-decoding, HTML-escaping dirent names, listing sort/size columns.
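The extension scan described above (last . within the final path segment, with / resetting the scan) can be sketched as a single pass. This is an illustration with assumptions: contentType(forPath:) is a hypothetical name, String is used for readability, and the types returned for .css, .js, and .json are conventional values that the notes do not spell out.

```swift
// Single-pass scan: a "/" starts a new segment and forgets any extension
// seen so far; a "." starts a new candidate extension.
func contentType(forPath path: String) -> String {
    var ext: [UInt8] = []
    var sawDot = false
    for b in path.utf8 {
        if b == UInt8(ascii: "/") {
            sawDot = false
            ext.removeAll()
        } else if b == UInt8(ascii: ".") {
            sawDot = true
            ext.removeAll()
        } else if sawDot {
            ext.append(b)
        }
    }
    guard sawDot else { return "application/octet-stream" }
    switch String(decoding: ext, as: UTF8.self) {
    case "html": return "text/html"
    case "txt": return "text/plain"
    // The three types below are assumed conventional values.
    case "css": return "text/css"
    case "js": return "text/javascript"
    case "json": return "application/json"
    default: return "application/octet-stream"
    }
}

let helloType = contentType(forPath: "/hello.txt")
```

Resetting on / matters for paths like /a.b/readme, where the dot belongs to a directory name and the file itself has no extension.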
net-rob — TCP/socket robustness (DONE, 2026-06-07)
Hardening pass, no new syscalls. Confined to kernel/net/tcp.swift, kernel/net/socket.swift,
tests/net_test.swift.
- Ephemeral-port allocator. A pure rotating allocator nextEphemeralPort(cursor:inUse:) over the IANA dynamic range 49152–65535 now lives in tcp.swift (sans-IO, so the host net_test can unit-check it). socket.swift keeps a live ephemeralCursor and an inUse predicate over the bind table; socketConnect, the UDP socketSend (implicit bind on first send), and dnsResolve all draw from it, replacing the old 40000 + slot scheme so two concurrent outbound connections (or a slot reused after close) can't collide on a stale port. The cursor wraps within the range and skips ports another bound socket already holds.
- Larger connection tables. maxSockets 16 → 32. The socket buffer pool scales with it (sockBufBytes = 32·4·1536 = 192 KiB → sockBufPages = 48, one PMM alloc at netInit); the DNS scratch is a separate single page, unaffected. Memory cost is ~96 KiB extra (the pool doubles from 96 KiB), allocated once.
- TCP teardown edge cases (engine). A RST from any non-LISTEN state (incl. SYN_SENT, the FIN_WAIT/CLOSING/CLOSE_WAIT/LAST_ACK close states, TIME_WAIT) now cleanly tears down: state→CLOSED, RTO off, queued output dropped, and both ev.reset and ev.closed flagged. TIME_WAIT already decays to CLOSED via tick after tcpTimeWaitTicks; simultaneous-close ordering is correct (FIN before the ACK-of-our-FIN → CLOSING → TIME_WAIT once acked; FIN+ACK together → TIME_WAIT directly).
- Slot reaping (socket layer). netPump now calls reapConnIfDead per live connection: a listener-spawned connection that reaches CLOSED (TIME_WAIT having decayed) and was never accepted is freed, so a refused/reset backlog entry or a TIME_WAIT remnant can't leak the (now larger) table. Accepted connections stay owned by their fd and are freed only by socketClose (which already discards the engine state regardless of TIME_WAIT), and active-open sockets (sockListenerOf == -1) are never reaped out from under the app.
- Tests (tests/net_test.swift): added RST-from-ESTABLISHED, RST-from-FIN_WAIT_1/FIN_WAIT_2, full active+passive close → TIME_WAIT → CLOSED (driving tick past the timer), simultaneous-close → CLOSING → TIME_WAIT, and an ephemeral-allocator unit check (rotation, skip-in-use, wrap). All prior cases still pass; the two in-QEMU acceptance scripts (tcp_echo_test.sh, tcp_connect_test.sh) still pass.
- Deferred: TIME_WAIT FIN re-ACK on a peer retransmit, SO_REUSEADDR semantics, a per-connection RTT estimator (RTO is still a fixed 1 s). The wider table is a cap bump, not a dynamic table.
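A rotating allocator with the behaviour described above (rotation, skip-in-use, wrap within 49152–65535) might look like the sketch below. Only the nextEphemeralPort(cursor:inUse:) label shape comes from these notes; the parameter types, the optional return, and the constant names are assumptions for illustration.

```swift
// IANA dynamic/private port range.
let ephemeralFirst: UInt16 = 49152
let ephemeralLast: UInt16 = 65535

// Returns the next free port at or after `cursor`, advancing the cursor
// past it. Returns nil only when every port in the range is in use.
// Assumed signature: the real tcp.swift version may differ.
func nextEphemeralPort(cursor: inout UInt16, inUse: (UInt16) -> Bool) -> UInt16? {
    let span = Int(ephemeralLast - ephemeralFirst) + 1
    if cursor < ephemeralFirst { cursor = ephemeralFirst }
    for _ in 0..<span {
        let candidate = cursor
        // Advance with wrap so the next call starts after this candidate.
        cursor = (cursor == ephemeralLast) ? ephemeralFirst : cursor + 1
        if !inUse(candidate) { return candidate }
    }
    return nil
}

// Start near the top of the range with the last port already bound.
var cursor: UInt16 = 65534
let bound: Set<UInt16> = [65535]
let first = nextEphemeralPort(cursor: &cursor, inUse: { bound.contains($0) })
let second = nextEphemeralPort(cursor: &cursor, inUse: { bound.contains($0) })
```

Because the function is pure apart from the caller-owned cursor, the same code can be driven from a host unit test with a fake inUse predicate, which is the property the notes rely on.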
net-h — ChaCha20-Poly1305 AEAD (RFC 8439), TLS groundwork (DONE, 2026-06-07)
- Pure crypto module kernel/crypto/chacha20poly1305.swift (no Foundation/MMIO/syscalls/heap, same purity as kernel/net/packet.swift, so it compiles both for the host test and for the kernel in Embedded mode):
  - chacha20Block (20-round keystream block) and chacha20Encrypt(key, counter, nonce, in, out, len) (the symmetric stream cipher; out may alias in). 256-bit key, 96-bit nonce, 32-bit block counter.
  - poly1305Mac(key, msg, len, tagOut): the one-time MAC over GF(2^130 − 5), implemented with a schoolbook 5×26-bit-limb multiply-reduce (no 128-bit-int dependency), final reduction + add-s.
  - aeadSeal(...) / aeadOpen(...) -> Bool: the AEAD construction (§2.8): block-0 keystream derives the Poly1305 key, ChaCha20 from counter 1 encrypts, the tag covers aad ‖ pad16 ‖ ct ‖ pad16 ‖ len64(aad) ‖ len64(ct). aeadOpen verifies the tag in constant time (no early-out) and only then decrypts. Callers pass a scratch buffer for the MAC input (no allocation inside the module).
- Self-contained byte helpers (cb8/cb8set/le32, file-private) rather than reusing kernel/net/packet.swift's b8/b8set: under -wmo the whole module compiles together, so the net helpers would collide; keeping crypto independent also lets --gc-sections drop it cleanly while unused.
- Wired into the kernel SWIFT_SRCS so it keeps building in Embedded mode; it is unused/gc'd for now, exactly like dns.swift was before net-f wired it up. No kernel paths call it yet.
- Host test tests/crypto_test.swift asserts the published RFC 8439 vectors: §2.4.2 ChaCha20 of the "Ladies and Gentlemen…" plaintext (key 00..1f, nonce …4a…, counter 1) → the published ciphertext (plus a symmetric round-trip), §2.5.2 Poly1305 of "Cryptographic Forum Research Group" → a8061dc1…27a9, and §2.8.2 the full AEAD seal (ciphertext + 16-byte tag) plus aeadOpen accepting the valid tag and rejecting a corrupted one. Built/run with $(HOST_SWIFTC) right after net_test in the test: target.
- This is TLS groundwork only. TLS 1.3 defines a ChaCha20-Poly1305 cipher suite (TLS_CHACHA20_POLY1305_SHA256, a SHOULD-implement in RFC 8446; the only mandatory suite is TLS_AES_128_GCM_SHA256); the handshake, key schedule (HKDF), and record layer are deliberately deferred to a later milestone. No networking or syscalls were added here.
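The core of the 20-round block function is the ChaCha quarter round from RFC 8439 §2.1. The sketch below shows it on a plain [UInt32] state and checks it against the §2.1.1 test vector; the function names and the array-based state are assumptions, not the layout used in chacha20poly1305.swift.

```swift
// 32-bit left rotation.
func rotl(_ x: UInt32, _ n: UInt32) -> UInt32 {
    return (x << n) | (x >> (32 - n))
}

// ChaCha quarter round on four words of the state (RFC 8439 §2.1):
// add, xor, rotate by 16, 12, 8, 7. Additions wrap modulo 2^32.
func quarterRound(_ s: inout [UInt32], _ a: Int, _ b: Int, _ c: Int, _ d: Int) {
    s[a] = s[a] &+ s[b]; s[d] = rotl(s[d] ^ s[a], 16)
    s[c] = s[c] &+ s[d]; s[b] = rotl(s[b] ^ s[c], 12)
    s[a] = s[a] &+ s[b]; s[d] = rotl(s[d] ^ s[a], 8)
    s[c] = s[c] &+ s[d]; s[b] = rotl(s[b] ^ s[c], 7)
}

// Test vector from RFC 8439 §2.1.1.
var state: [UInt32] = [0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567]
quarterRound(&state, 0, 1, 2, 3)
```

After the call the state equals ea2a92f4, cb1cf8ce, 4581472e, 5881c4bb, the values published in §2.1.1. A full block applies this round to columns and diagonals ten times each, then adds the original state.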
net-ipv6 — IPv6 foundation + NDP + dual-stack sockets + RA/EH/multicast + userland + E2E tests (DONE, net/ipv6 branch 2026-06)
Parallel workers delivered the IPv6 slice on top of the net stack (see git log on net/ipv6 for the subagent slices: foundation, protocol, userland, tests, integration). This is the concrete realisation of the "IPv6 later" placeholder in ARCHITECTURE.md "Future network stack model".
- Foundation (ipv6.swift + early netInit). struct IPv6 (two UInt64 for value semantics), byte accessors, ipv6LinkLocalFromMAC (modified EUI-64 → fe80::/10), ipv6SolicitedNodeMulticast, ipv6FromPrefixAndIID (for RA-derived globals), ip6WriteHeader/ip6* accessors, IPv6 pseudo-header checksum (sumIPv6Pseudo + ipv6UpperChecksum for UDP6/TCP6/ICMPv6). netInit derives link-local from virtio MAC and passes to NetStack(..., ipv6: our6); logs "net: IPv6 link-local configured (EUI-64 from MAC)". gNet now carries the IPv6; netLocalIPv6/netGatewayIPv6 globals for kernel use. (Cross-ref commit "net: IPv6 + NDP + dual-stack sockets (foundation...)").
- NDP (icmp6.swift + stack.swift NeighborCache + onFrame). Full NS/NA (types 135/136): icmp6WriteNS (with optional SLLA opt), icmp6WriteNA (Solicited|Override flags + TLLA), icmp6NDTarget, flag bits. NeighborCache (fixed Entry table, insert/lookup) in stack; on inbound NA learn target→MAC. On NS for us: reply with NA and learn peer from SLLA. socketSendv6 uses NDP cache (falls back to NS+wait for resolution). NDP also learns from any IPv6 src L2 (best-effort). Unsolicited NA (e.g. to all-nodes) also populates.
- Dual-stack sockets + VFS. socket.swift: AF_INET6 paths (socketCreateIPv6, socketCreateTCPIPv6), parallel tables sockRemoteIPv6/sockRemoteMacv6/dgSrcIPv6 etc, socketDeliverUDPv6/socketDeliverTCPv6, socketSendv6 (NDP resolve + buildUDPv6/buildTCPv6), socketRecvFromIPv6. stack.swift adds gotUDPv6/gotTCPv6 + v6 fields in RxOutcome, and full IPv6 dispatch in onFrame. vfs.swift (vfsOpen/vfsConnect etc): detects AF_INET6 via family, routes to v6 socket creators, uses 16-byte IPv6 in connect/send/recvfrom syscalls (new swiftos_*_ipv6 bridge calls). Sockets remain capNet-gated VFS fds; poll/close uniform.
- Protocol enhancements (RA/EH/multicast in kernel/net).
  - RA (RFC 4861): icmp6TypeRA, icmp6WriteRA (base + optional Prefix Information option type=3 with L/A flags, lifetimes), icmp6ParseRA (walks options, extracts hopLimit + first prefix). In stack.onFrame (IPv6 path): on RA, ndp.insert(routerLLA), set raReceived/raHopLimit/raHasPrefix/raPrefix/raFormedGlobal (via ipv6FromPrefixAndIID). (Added in "net/ipv6: add icmp6WriteRA for full RA build/parse roundtrips" + "fuller RA/NA/EH/multicast support").
  - Extension Headers (RFC 8200): IPv6 ingress walks next-header chain (up to 4 skips) and advances over Hop-by-Hop (0), Routing (43), Destination Options (60) using HdrExtLen, and fixed 8-byte Fragment (44). This ensures L4 (UDP/TCP/ICMPv6) and NDP/RA delivery even when EHs or HBH options are present in test frames or on the wire. Skips are bounded; malformed chains truncate safely.
  - Multicast acceptance: IPv6 path in onFrame accepts our unicast, our solicited-node multicast, the all-nodes link-local (ff02::1, for RA and unsolicited NA), plus loopback-for-test. Enables RA receipt and NDP without a full MLD impl.
  - Also: buildUDPv6/buildTCPv6 (with v6 pseudo checksums), ICMPv6 echo request/reply over v6, full checksum validation using base-header src/dst (not L4 addrs).
- Userland IPv6 support.
userland/lib/swift_user.{h,c}bridge gained the AF_INET6 + v6 msg variants:swiftos_socket_ipv6/swiftos_socket_stream_ipv6,swiftos_bind(reused),swiftos_sendto_ipv6/swiftos_recvfrom_ipv6(use 16-byte IPv6 layout), stream read/write unchanged.udpecho.swift: argv[1]=="6" → use v6 socket + recvfrom/sendto_ipv6 + printIPv6 (colon-hex groups); logs "listening on 5555 (IPv6)" and "got N bytes from:port". (Commits: "userland: udpecho IPv6 support", "net/ipv6: userland udpecho/tcpecho/nslookup IPv6 support (AF_INET6 + bridge)"). tcpecho.swift: analogous "6" path withswiftos_socket_stream_ipv6/listen/accept(logs IPv6 variant); uses plain read/write for the stream. ( "userland: tcpecho IPv6 support").nslookup.swift: AAAA support + IPv6 result printing (tightened in "tests: tighten ipv6_*_echo + dns for userland IPv6" + "userland: nslookup IPv6 + AAAA support"). All reached via absolute /bin/* paths from packed base (exec.swift); bare names stay busybox.
- Host unit coverage (aggressive). tests/net_test.swift (built + run in make test right after page allocator) gained a large IPv6 block after the v4 cases: header parse/build + version/nh/payload accessors, pseudo-header checksum roundtrips for UDP6/TCP6 (corruption detection), ICMPv6 echo writers + checksums (over v6 addrs), full NDP NS wire + on-stack NS→NA reply roundtrip in a dual-stack NetStack, NA parse/flags, ipv6LinkLocalFromMAC EUI-64 U/L bit, ipv6SolicitedNodeMulticast, RA parse with prefix-info option (hopLimit + formed global), bad-version/truncation guards, and v6 UDP/TCP delivery fields via onFrame. (Commit "net/ipv6: aggressively extend host net_test with IPv6 cases" + earlier foundation.) All exercised with dual-stack NetStack(mac, ip, ipv6: ...) instances.
- E2E QEMU tests (dedicated scripts, wired into make test). tests/ipv6_smoke_test.sh: boots with -netdev user,ipv6=on, reactive FIFO/await past M7, asserts "net: IPv6 link-local configured" + no panic. Early-boot only (foundation + NDP config path). tests/ipv6_udp_echo_test.sh: on Darwin, where QEMU rejects IPv6 hostfwd literals, requires the smoke test above to pass and reports the AF_INET6 echo path as skipped. On QEMU builds with usable IPv6 hostfwd this script currently boots with ipv6=on and exercises an IPv4 UDP echo roundtrip while the NIC is dual-stack, asserting link-local/NDP setup and no-crash behavior. True /bin/udpecho 6 + nc -6 E2E remains a follow-up once the hostfwd transport is portable. tests/ipv6_tcp_echo_test.sh: analogous for TCP: Darwin falls back to the required smoke test; the non-Darwin body currently validates TCP echo over IPv4 hostfwd under ipv6=on and keeps the AF_INET6 echo tightening point local to this script. All three run early in make test (after virtio_net, before v4 net tests). Host net_test remains the aggressive IPv6 protocol oracle; QEMU coverage proves link-local/NDP setup and dual-stack no-crash behavior on this Darwin/QEMU setup.
- Integration / boot / QEMU. netInit always configures IPv6 (on v4-only runs the vars are zero but harmless); ipv6=on is only needed for slirp to emit v6 and answer NDP/RA. All test launches that attach virtio-net for net tests now use ipv6=on where the dedicated scripts require it. No new syscalls; dual-stack lives behind the existing socket surface + bridge. build/base.img stages the IPv6-aware udpecho/tcpecho (and nslookup) ELFs.
- Status / deferred. Foundation + NDP + RA/EH/multicast ingress + aggressive host IPv6 tests are green; QEMU coverage currently verifies link-local/NDP and IPv4 echo under an IPv6-enabled NIC on Darwin. Future work (after this slice): a global IPv6 gateway learned via RA (or static), portable AF_INET6 echo hostfwd, full SLAAC, MLD, privacy addresses, fragment reassembly, larger connection tables for v6, AAAA in more tools, and lifting the stack to a userland service. All prior net-a..h and non-net tests remain green. (See ARCHITECTURE update in same session.)
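Sketch of the bounded extension-header walk described in the Extension Headers note above. This is a host-side illustration, not the code in the stack: UpperLayer and skipExtensionHeaders are hypothetical names, and payload is assumed to be everything after the fixed 40-byte IPv6 header.

```swift
// Hypothetical illustration of a bounded IPv6 extension-header walk.
struct UpperLayer: Equatable {
    let nextHeader: UInt8   // e.g. 6 = TCP, 17 = UDP, 58 = ICMPv6
    let offset: Int         // where the upper-layer header starts in `payload`
}

func skipExtensionHeaders(firstNextHeader: UInt8,
                          payload: [UInt8],
                          maxSkips: Int = 4) -> UpperLayer? {
    var next = firstNextHeader
    var offset = 0
    var skips = 0
    while true {
        let length: Int
        switch next {
        case 0, 43, 60:
            // Hop-by-Hop, Routing, Destination Options:
            // total size is (HdrExtLen + 1) * 8 bytes.
            guard offset + 2 <= payload.count else { return nil }
            length = (Int(payload[offset + 1]) + 1) * 8
        case 44:
            // Fragment header has a fixed 8-byte size.
            length = 8
        default:
            // Anything else is treated as the upper-layer protocol.
            return UpperLayer(nextHeader: next, offset: offset)
        }
        // Bounded skips; a header running past the frame is rejected.
        guard skips < maxSkips, offset + length <= payload.count else { return nil }
        next = payload[offset]      // first byte of every EH is Next Header
        offset += length
        skips += 1
    }
}
```

Returning nil for both "too many headers" and "header runs past the frame" keeps the caller's drop path to a single check.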
Threading runtime groundwork (R-series)
rt-a — threads + futex (DONE, 2026-06-07)
The kernel primitives a userland threading runtime (and later Swift-concurrency / Node / the JVM) needs: schedulable EL0 threads that share one address space, plus a futex to block/wake them.
- thread_create(entry, arg, stackTop) = syscall 46 (processThreadCreate, kernel/user/process.swift): allocates a fresh process-table slot whose TTBR0 is the creator's shared (not cloned) address space, with its own kernel stack and a crafted context that lands in a new EL0 trampoline user_thread_launch_arg (user_entry.S). The trampoline is identical to user_thread_launch but also delivers the argument in x0, so the thread starts at entry(arg) on the caller-supplied user stack. The thread is parented to the creator's parent so it is a sibling (not a waitpid-reapable child); threads join via futex. Returns a thread id (a pid in the shared table). Because the AS is shared and never freed, there are no teardown races: a thread exiting just frees its own slot (pIsThread short-circuits processExit to self-reap instead of zombify, and drops any futex wait record). The shared AS lives on for surviving threads.
- futex(uaddr, op, val) = syscall 47 (kernel/sched/futex.swift): op=0 FUTEX_WAIT blocks the caller iff *uaddr == val; op=1 FUTEX_WAKE wakes up to val waiters on uaddr (returns the count woken). A small in-kernel wait queue (slot, watched VA) keyed by the user VA is sufficient for a single multi-threaded process, since all its threads share the VA space. The compare-and-block runs under irq_save/irq_restore so the *uaddr load and the block transition can't race a concurrent WAKE on a preempted sibling. The user word is validated through userReadableBuffer before any access.
- Userland bridge: swiftos_thread_create / swiftos_futex / swiftos_thread_exit and atomic CAS/swap/load/fetch-add helpers (LL/SC the Swift layer can't express directly; a justified low-level bridge) in userland/lib/swift_user.{h,c}; SYS_THREAD_CREATE (46) / SYS_FUTEX (47) in userland/lib/syscall.h.
- Demo /bin/threadsdemo (userland/threadsdemo.swift): spawns 2 EL0 threads that each increment a shared counter 2000× under a 3-state futex mutex (Drepper's "Futexes Are Tricky"), joins them via a futex on a done-counter, and prints threadsdemo: counter=4000 (= 2N). Resolved in exec.swift and packed into the base image. tests/threads_test.sh boots -kernel + base.img, logs in root/swordfish, runs the demo, and asserts counter=4000. Wired into make test.
- Caveats / follow-ups: fd-table sharing is a snapshot at thread_create (like fork inherit), enough for the demo's shared stdout; true fd-table aliasing across threads is deferred. A thread's own kernel stack and the shared AS pages are not reclaimed (same global limitation as processes). processRunElf stops when the top process exits, so the runtime must join its threads before the main thread returns (the demo does). BOARD=virtualbox still builds; its boot path parks before the scheduler, so threads don't run there.
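Sketch of the FUTEX_WAIT / FUTEX_WAKE semantics described above, as a single-threaded model of the wait queue. FutexTable and its method names are hypothetical; the real code runs the compare-and-block under irq_save/irq_restore and reads the user word itself, whereas this model takes the current value as a parameter.

```swift
// Hypothetical single-threaded model of a futex wait queue keyed by user VA.
struct FutexTable {
    var waiters: [(slot: Int, uaddr: UInt64)] = []

    // FUTEX_WAIT: block only if the word still holds the expected value.
    // Returns true when the caller was queued (i.e. should block).
    mutating func wait(slot: Int, uaddr: UInt64, current: UInt32, expected: UInt32) -> Bool {
        guard current == expected else { return false }
        waiters.append((slot: slot, uaddr: uaddr))
        return true
    }

    // FUTEX_WAKE: wake up to `count` waiters on `uaddr`, oldest first.
    // Returns the slots that were woken.
    mutating func wake(uaddr: UInt64, count: Int) -> [Int] {
        var woken: [Int] = []
        var remaining: [(slot: Int, uaddr: UInt64)] = []
        for waiter in waiters {
            if waiter.uaddr == uaddr && woken.count < count {
                woken.append(waiter.slot)
            } else {
                remaining.append(waiter)
            }
        }
        waiters = remaining
        return woken
    }
}
```

The value check in wait is what closes the lost-wakeup window: a waiter that arrives after the word changed returns immediately instead of sleeping on a wake that already happened.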
Process teardown reclaims frames — the per-command page leak is closed (DONE, 2026-06-07)
- The leak. Process teardown set the slot to pUnused but never returned any frames to the PMM, so every command leaked its whole footprint: the address space (L0/L1 page tables + the L2/L3 tables and every user leaf page), the kernel stack (2 frames), and, on execve, the replaced old address space (fork clones the shell, the child execs, the clone is dropped). At ~2 MiB per busybox command the OS exhausted RAM after ~100 commands. This was the main barrier to long-running use.
- address_space_destroy(ttbr0) (kernel/mm/vm.swift). Walks the user half of the tables (L1 index ≥ 2, i.e. VA ≥ 0x8000_0000) and releases every leaf frame, then frees each L3/L2 table, then the L1 and L0 frames. The kernel/device identity entries (L1 indices 0,1) are 1 GiB block descriptors, not tables; the DESC_TABLE test skips them, so the shared kernel mapping is never freed. With COW fork, user leaf frames may be shared; teardown drops each leaf's PMM refcount and raw-frees only on the last owner, while page-table frames remain private to the address space. Safe on a partially built space (failed createProcess/buildExecImage now clean up too). If the doomed space is the currently installed TTBR0 (the case when the kernel scheduler reaps a just-exited top-level process, whose tables are still live in the register), it switches to the kernel identity map first so the MMU never walks frames being handed back.
- process.swift wiring. A new reapProcess(slot) frees the address space + kernel stack (tracked in a new pKstack array) and marks the slot unused; it replaces the bare pState = pUnused at all four reap sites (processRunElf, processRunPair, processSpawnChild, processWaitpid). processExec frees the old address space after switching to the new one (kernel stack is reused across exec, so it is not freed there). A zombie never runs again, so its space/stack are quiescent at reap time.
- Test / acceptance. runReclaimDemo (in main.swift, on the boot path) records the PMM free-frame count, runs 5 rounds of fork+waitpid (forkdemo), exec-replace (execdemo), and spawn+reap (spawndemo), the exact teardown paths a shell command takes, and asserts the count is identical before and after. Measured in QEMU: baseline=64747 after=64747 (zero leak; before the fix it would drop by hundreds of frames). tests/boot_test.sh greps for "reclaim OK: no frame leak across fork/exec/exit/reap". The host PageAllocator free/double-free tests already cover the frame allocator.
- Remaining efficiency holes (still future work, by design for now): no page cache; the PMM is O(n) first-fit (a buddy/free-list refinement is noted in docs/ARCHITECTURE.md); single core (no SMP). The footprint section above was measured before this feature; steady-state RAM is now flat across commands rather than monotonically shrinking.
Track B — COW fork PMM ownership audit (2026-06-08)
Before adding COW write-fault handling, every pmm_alloc_page / pmmAllocPage /
pmmAllocZeroedPage / pmmAllocPages caller was audited for frame ownership:
- User leaf frames (elfLoad, process stacks, sbrk, mmap) now start with PMM refcount 1 and are the only frames shared by COW fork. Address-space teardown and munmap drop a reference and raw-free only on the last owner. COW write faults allocate a private frame for the writer and drop the old frame's reference.
- Page tables, kernel stacks, driver DMA/ring buffers, network socket buffers, DNS scratch, and the ELF staging buffer are not mapped as COW user leaves. They remain single-owner or permanent kernel/device allocations and continue to use raw pmm_free_page only where a reclaim path exists.
- Fixed during the audit: address_space_create now frees a lone L0/L1 allocation if the paired table allocation fails; elfLoad frees a just-allocated page if mapping it fails; processSbrk rolls back any partially mapped heap pages when growth fails. address_space_clone now destroys a partially built child address space on link failure, dropping any shared-frame references it already acquired.
- Known pre-existing non-COW leak: EL0 thread kernel stacks are still not reclaimed on thread exit, as recorded in the rt-a notes; they are not COW-shared and were not changed in this track.
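Sketch of the ownership rule the audit settles on: user leaf frames start at refcount 1, fork adds a reference, and a frame is raw-freed only when its last owner lets go. FrameRefcounts and its methods are hypothetical names for illustration; the dictionary stands in for whatever per-frame counter the PMM actually keeps.

```swift
// Hypothetical model of refcounted user leaf frames under COW fork.
struct FrameRefcounts {
    var owners: [UInt64: Int] = [:]   // frame address -> number of owners
    var freed: [UInt64] = []          // frames handed back to the allocator

    // A fresh user leaf frame starts with exactly one owner.
    mutating func allocate(_ frame: UInt64) { owners[frame] = 1 }

    // COW fork: the child maps the same frame, so it gains an owner.
    mutating func share(_ frame: UInt64) {
        if let n = owners[frame] { owners[frame] = n + 1 }
    }

    // Teardown / munmap: drop one reference, raw-free on the last owner.
    mutating func release(_ frame: UInt64) {
        guard let n = owners[frame] else { return }
        if n > 1 {
            owners[frame] = n - 1
        } else {
            owners[frame] = nil
            freed.append(frame)
        }
    }

    // COW write fault: the writer gets a private frame and drops its
    // reference on the shared one.
    mutating func writeFault(shared old: UInt64, privateCopy new: UInt64) {
        allocate(new)
        release(old)
    }
}
```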
Timer-backed nanosleep/sleep (2026-06-08)
Why. nanosleep/sleep were silent no-op stubs (userland/compat/stubs.c), so any ported server or
script that slept returned instantly and busy-spun instead of yielding the CPU — wrong for an OS meant to
host server apps. The kernel already had every primitive (100 Hz systemTicks, the pBlocked +
yieldToScheduler block/wake model, and a per-tick hook that runs even while idle), so a real blocking
sleep was a small, self-contained add.
Design. New syscall SYS_NANOSLEEP = 57; ABI x0 = seconds, x1 = nanoseconds, returns 0.
processNanosleep parks the caller in pBlocked with a wake deadline recorded in pWakeTick[slot]
(systemTicks units; 0 = not sleeping) and yields. processOnTick wakes any blocked slot whose deadline has
passed — the scan runs first and unconditionally, so it fires even when currentProc == -1 during the
scheduler's idle wfi. A nonzero pWakeTick is what distinguishes a sleeper from a futex/waitpid/IO
blocker, so the scan never disturbs those. Resolution is one tick (10 ms); a sub-tick request rounds up to
one tick. Sleep always completes fully — blocked syscalls are not signal-interrupted yet, so there is no
unslept remainder (userland zeroes *rem).
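Sketch of the deadline arithmetic implied by the design above: 100 Hz ticks, one-tick resolution, sub-tick requests rounded up. The function names are hypothetical, and the minimum of one tick for a zero-length request is an assumption made here so that the stored deadline can never collide with the "0 = not sleeping" sentinel.

```swift
// Hypothetical tick arithmetic for a 100 Hz timer-backed sleep.
let tickHz: UInt64 = 100
let nanosPerTick: UInt64 = 1_000_000_000 / tickHz   // 10 ms per tick

// Convert a (seconds, nanoseconds) request into whole ticks, rounding up.
func sleepTicks(seconds: UInt64, nanoseconds: UInt64) -> UInt64 {
    let whole = seconds * tickHz
    let partial = (nanoseconds + nanosPerTick - 1) / nanosPerTick
    // Assumption: never return 0, so a sleeper always has a nonzero deadline.
    return max(whole + partial, 1)
}

// Absolute wake deadline in tick units, given the current tick count.
func wakeDeadline(now: UInt64, seconds: UInt64, nanoseconds: UInt64) -> UInt64 {
    return now + sleepTicks(seconds: seconds, nanoseconds: nanoseconds)
}

// Per-tick check: a slot is due once the tick counter reaches its deadline.
func isDue(deadline: UInt64, now: UInt64) -> Bool {
    return deadline != 0 && now >= deadline
}
```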
Artifacts / test. libc nanosleep/sleep now call the syscall (stubs.c); swiftos_nanosleep added
to the Swift bridge; native /bin/sleepprobe measures an RTC delta around nanosleep(2s) and is registered
in execResolve; busybox CONFIG_SLEEP=y ships a real /bin/sleep. tests/sleep_test.sh asserts
SLEEP_DELTA >= 2 (the old stub gave 0) and that busybox sleep completes end to end.
Cron — deliberately deferred. A real cron/crond is not on the roadmap and was not built: it needs
signal delivery to EL0, a supervisor/init daemon, and crontab storage — none on the critical path. Follow-on
timing surface if a scheduler daemon is ever wanted: SIGALRM/alarm, setitimer/POSIX timers,
signal-interruptible sleep (EINTR + a real rem), then crond/at.
I0 — host-verified tiny Llama2 inference core (DONE, 2026-06-09)
Scope. This is the smallest AI-hosting proof slice: a portable, I/O-free
userland/lib/llama2.swift implementation of the llama2.c checkpoint format,
transformer forward pass, SentencePiece-style BPE tokenizer, and deterministic
greedy generation. It is written so the same source can compile in the host
test runner now and later link into an EL0 /bin/llm demo.
Test model. scripts/fetch-model.sh fetches the tiny TinyStories
stories260K.bin checkpoint and tok512.bin tokenizer on demand into
models/ (gitignored). make model is idempotent and make test depends on
those artifacts before running the host inference test.
Acceptance. tests/llm_engine_test.swift loads the tiny checkpoint,
checks the parsed config (dim=64, layers=5, heads=8, kv=4,
vocab=512, seq=512), and asserts that temperature-0 generation for
Once upon a time matches the upstream llama2.c reference output byte-for-byte
for 64 steps. This pins both tokenizer behavior and the floating-point forward
path without adding a kernel ABI or an in-guest /bin/llm yet.
I1 — /bin/llm runs the inference engine in QEMU (DONE, 2026-06-09)
Scope. A native Embedded Swift EL0 app (userland/llm.swift, /bin/llm)
links the I0 engine (userland/lib/llama2.swift), reads the stories260K
checkpoint + tok512 tokenizer from the read-only base image into anonymous
mmap'd RAM, greedily generates text to the console, and reports tokens/sec.
This proves the engine runs end to end as an isolated EL0 process on the OS.
Pieces. The model files are packed into the base image under /models
(make base-image copies them from ./models). /bin/llm is registered in the
execResolve allow-list (kernel/user/exec.swift). The app links the Unicode
data tables (the BPE tokenizer hashes String keys), like /bin/calc. One
freestanding-math fix was needed for EL0 (no libm): Float.squareRoot() lowers
to a sqrtf libcall, so Mathf.sqrtf is now a pure-Swift Heron iteration — the
host test still matches the reference byte-for-byte, confirming the accuracy;
expf/sinf/cosf were already hand-rolled in I0.
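Sketch of a pure-Swift Heron (Newton) square root of the kind described above. This is an illustration, not the Mathf.sqrtf source: heronSqrt is a hypothetical name, and the starting guess and iteration cap are choices made here.

```swift
// Hypothetical Heron iteration: x_{n+1} = (x_n + a / x_n) / 2.
func heronSqrt(_ a: Float) -> Float {
    if a.isNaN || a < 0 { return Float.nan }
    if a == 0 || a.isInfinite { return a }
    // Start at or above the root so the iteration descends toward it.
    var guess: Float = a >= 1 ? a : 1
    for _ in 0..<100 {
        let next = 0.5 * (guess + a / guess)
        if next == guess { break }   // converged to a fixed point
        guess = next
    }
    return guess
}
```

No libm call is involved: the loop uses only add, multiply, and divide, which is what a freestanding EL0 build needs.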
Acceptance. tests/llm_run_test.sh (in make test) boots, logs in as root,
runs /bin/llm, and asserts the generated story matches the llama2.c reference
text and that a tokens/sec figure is reported. Measured ~640–710 tok/s for the
260K model under QEMU TCG emulation with scalar FP and -Osize (an honest
baseline, not native throughput).
Next. I2 replaces the read-into-RAM load with a file-backed read-only mmap
of the weights (the documented "mmap-backed weights" primitive; today's mmap
is anonymous-only). I3 serves generated tokens over TCP via poll().
I2 — file-backed mmap of model weights (DONE, 2026-06-09)
I2a — eager file-backed mmap. New mmap_file(fd, len, prot) [SYS_MMAP_FILE=59]:
a read-only file-backed mmap of a disk-backed base-image file. I2a maps the whole
extent eagerly (the kernel reads it into private frames at mmap time); /bin/llm
switched from read-into-anonymous to this. addressSpaceMmapFile (vm.swift),
vfsFileExtent(fd) (vfs.swift), processMmapFile (process.swift), bridge
swiftos_mmap_file.
I2b — demand paging (lazy). processMmapFile now only reserves the VA range
and records a per-process file-VMA (pFileVmas); no frames are mapped at mmap
time. A translation fault on the region is serviced by processHandleFileFault
→ addressSpaceMapFilePage, which reads just the faulting page from disk, maps
it read-only, and retries. Hooked into the EL0 sync handler (main.swift, ESR
EC=0x24), disjoint from the COW write-fault path. VMAs are reset on exec/fresh
image and copied on fork/thread. A one-shot klog line, "demand-paged file mmap active",
marks the path; fileDemandFaults counts serviced faults. This realizes the
documented "mmap-backed weights / page-cache-friendly immutable model bundle"
primitive: resident memory grows only with pages actually touched, and startup
no longer reads the whole model up front.
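Sketch of the lookup a file-fault handler has to do: find the VMA covering the faulting address, then work out which page to map and where its bytes live on disk. FileVma, resolveFileFault, and the 4 KiB page size are assumptions for illustration; the real pFileVmas record layout is not shown in these notes.

```swift
// Hypothetical file-VMA record and fault resolution.
struct FileVma {
    var active: Bool
    var base: UInt64        // page-aligned start of the reserved VA range
    var length: UInt64      // bytes reserved
    var diskOffset: UInt64  // byte offset of the file extent on disk
}

let pageSize: UInt64 = 4096   // assumed page size

// Returns the page to map and the disk offset to read it from,
// or nil when no active VMA covers the faulting address.
func resolveFileFault(_ vmas: [FileVma],
                      faultVA: UInt64) -> (pageVA: UInt64, diskOffset: UInt64)? {
    for vma in vmas where vma.active {
        if faultVA >= vma.base && faultVA - vma.base < vma.length {
            let pageVA = faultVA & ~(pageSize - 1)
            return (pageVA, vma.diskOffset + (pageVA - vma.base))
        }
    }
    return nil
}
```

A nil result corresponds to a fault outside every file mapping, which would fall through to the ordinary fault path.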
Tradeoff (honest). Dense inference touches every weight page each token, so
the first forward pass faults in the whole model (~258 single-page reads):
first-token latency rises and steady-state resident still ≈ full model. Measured
~426 tok/s demand-paged vs ~800 eager (QEMU TCG, scalar FP). Demand paging wins
for huge/sparse/over-committed models and future shared-across-cells mappings;
eager wins for dense single-tenant. Exposing eager-vs-lazy as an mmap_file flag
is a natural follow-up. munmap of a lazily-reserved region does not yet
deactivate its VMA (the model-serving path maps once and exits, which is covered).
Acceptance. tests/llm_run_test.sh asserts the file-backed and
demand-paged file mmap active markers and that the generated story still
matches the llama2.c reference. Next: I3 serves tokens over TCP via poll().
I3 — /bin/llmd serves inference over TCP (DONE, 2026-06-09)
Scope. userland/llmd.swift (/bin/llmd) is the model-serving daemon and
the conclusion of the AI-hosting proof arc: the same Swift engine, weights
file-backed mmap'd from /models (I2), served over the network through the
existing capability-gated socket surface. Userland-only; no new kernel ABI is
required on the current VFS-loaded exec path.
Server shape. A poll()-driven loop (the /bin/httpd pattern: listener +
queued connections, one poll() multiplexing all fds). Endpoints:
POST /completion (body = prompt) streams the generated pieces to the socket
as they are produced (HTTP/1.0, Connection: close delimits the body);
GET /health reports liveness + model config; GET /metrics reports
requests, tokens_total, last_ttft_ms, last_tok_s — the first slice of
the AI-serving metrics list in ARCHITECTURE.md. Each request also logs
llmd: served N tokens ttft=X ms rate=Y tok/s on serial. Request parsing
handles multi-segment TCP delivery (bounded read loop until the blank line +
Content-Length bytes arrive). Generation runs inline on the single core; the
KV cache is safely reused across requests because every position is rewritten
before it is attended to.
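Sketch of the completeness check a bounded read loop needs before it can parse a request that arrived in several TCP segments: wait for the blank line, then for Content-Length bytes of body. completeRequestLength is a hypothetical helper, not the llmd parser; it treats a missing Content-Length as an empty body.

```swift
// Hypothetical check: has a full HTTP request (headers + body) arrived?
// Returns the total request length in bytes, or nil if more data is needed.
func completeRequestLength(_ buf: [UInt8]) -> Int? {
    let crlfcrlf: [UInt8] = [13, 10, 13, 10]
    guard buf.count >= 4 else { return nil }

    // Locate the blank line that ends the header block.
    var headerEnd = -1
    for i in 0...(buf.count - 4) where Array(buf[i..<(i + 4)]) == crlfcrlf {
        headerEnd = i + 4
        break
    }
    guard headerEnd >= 0 else { return nil }

    // Header names are case-insensitive, so compare in lowercase.
    let head = String(decoding: buf[0..<headerEnd], as: UTF8.self).lowercased()
    let digitRange: ClosedRange<Character> = "0"..."9"
    var bodyLength = 0
    for line in head.split(whereSeparator: { $0.isNewline }) {
        if line.hasPrefix("content-length:") {
            let digits = line.dropFirst("content-length:".count)
                .filter { digitRange.contains($0) }
            bodyLength = Int(String(digits)) ?? 0
        }
    }

    let total = headerEnd + bodyLength
    return buf.count >= total ? total : nil
}
```

A caller would append each read to the buffer and retry until this returns a length or a size cap is hit.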
Measured (QEMU TCG, scalar FP, stories260K). ttft=70 ms on the cold first request — that includes demand-paging the whole model off virtio-blk (I2b) — and ~376 tok/s streaming rate.
Acceptance. tests/llm_serve_test.sh (in make test): boots with a slirp
hostfwd, starts /bin/llmd, then from the host asserts the POSTed completion
matches the llama2.c reference story, /health and /metrics respond with
real counters, and the serial metrics line appeared. With I0–I3 complete, the
flagship claim is demonstrated end to end: swift-os loads an immutable model
bundle, mmaps the weights, and serves deterministic inference over TCP from an
isolated, capability-confined EL0 process.
I4 — Q8_0 int8 quantization; llmd serves stories15M (DONE, 2026-06-09)
Why. The biggest product lever after I3: visibly coherent text (a 15M-param model instead of the 260K toy) in ~3.6× less weight memory (60.8 MB fp32 → 17.1 MB int8). CPU-only int8 is exactly the "small immutable inference appliance" profile.
Quantizer (host). tools/quantize.swift converts a legacy fp32 llama2.c
checkpoint into the llama2.c "version 2" Q8_0 format that upstream runq.c
consumes: 256-byte header (magic ak42, version 2, config, shared flag, group
size), fp32 rmsnorm weights, then per-tensor int8 q[] + fp32 s[] per-layer
interleaved. Quantization math is C-identical in fp32 (scale = max|v|/127,
round half-away-from-zero). GS is picked as the largest power of two ≤ 64
dividing both dim and hidden_dim — runq.c's matmul walks rows in GS steps,
so GS must divide every matmul row length (260K → GS=4, 15M → GS=32). Verified
by feeding the converted files to upstream runq.c itself. make model builds
models/stories260K-q8.bin + models/stories15M-q8.bin via Makefile rules;
fetch-model.sh also fetches stories15M.bin and the full 32000-entry Llama-2
tokenizer.bin.
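Sketch of the two rules stated above: the group-size pick (largest power of two ≤ 64 dividing both dim and hidden_dim) and the per-group quantization (scale = max|v|/127, round half away from zero). The function names are hypothetical and this is not tools/quantize.swift; mapping an all-zero group to q=0, s=0 follows the engine note in this entry and is assumed here for the quantizer as well.

```swift
// Hypothetical Q8_0 helpers.

// Largest power of two <= maxGS that divides both row lengths.
func pickGroupSize(dim: Int, hiddenDim: Int, maxGS: Int = 64) -> Int {
    var gs = maxGS
    while gs > 1 && (dim % gs != 0 || hiddenDim % gs != 0) {
        gs /= 2
    }
    return gs
}

// Quantize one group of fp32 values to int8 plus a single fp32 scale.
func quantizeGroup(_ values: [Float]) -> (q: [Int8], scale: Float) {
    var maxAbs: Float = 0
    for v in values { maxAbs = max(maxAbs, abs(v)) }
    let scale = maxAbs / 127
    // All-zero group: avoid dividing by zero, contribute nothing.
    if scale == 0 {
        return ([Int8](repeating: 0, count: values.count), 0)
    }
    let q = values.map { Int8(($0 / scale).rounded(.toNearestOrAwayFromZero)) }
    return (q, scale)
}

// Dequantize for inspection: q[i] * scale.
func dequantizeGroup(_ q: [Int8], scale: Float) -> [Float] {
    return q.map { Float($0) * scale }
}
```

With dim=64 and a hidden size of 172, pickGroupSize gives 4; with dim=288 and 768 it gives 32. The hidden sizes are the upstream model configs as recalled here, not values recorded in these notes.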
Engine. userland/lib/llama2.swift gains a LlamaModel protocol (the
fp32 Llama2 and the new QLlama2 both conform; llamaGenerate is generic,
statically dispatched per the kernel protocol guidance) and a faithful runq.c
int8 path: activations quantized per matmul into (int8, scales), int32
accumulation per group scaled by s_w * s_x. One deliberate divergence:
the token-embedding row is dequantized on the fly per token — element-for-
element the same values as runq.c's predequantized table, without spending
vocab*dim*4 bytes (36 MB for 15M) of RAM on a copy. An all-zero activation
group writes q=0, s=0 (same zero contribution as C, without its NaN-cast UB).
TDD. tests/llm_q8_engine_test.swift (host, -O, in make test) pins
both quantized checkpoints to upstream runq.c goldens byte-for-byte
(temperature 0, "Once upon a time", 64 steps) — including the 32000-vocab
tokenizer path. Both matched on the first run; the hand-rolled
expf/sinf/cosf/sqrtf survive the bigger model.
Serving. /bin/llmd now picks the engine by checkpoint magic and defaults
to /models/stories15M-q8.bin + /models/tokenizer.bin (argv can override:
llmd [model] [tokenizer]); /bin/llm stays on the fp32 260K demo (its test
is unchanged). The base image packs the q8 bundle (base.img 4.6 → 22.4 MB).
Measured (QEMU TCG, scalar). fp32 260K console demo: ~492 tok/s. Served
15M-q8 over TCP: ttft=1150 ms on the cold first request (demand-paging all
~17 MB of weights through I2b plus prompt prefill) and ~10 tok/s steady
streaming — a real, visibly coherent TinyStories model served by an isolated
Swift process. tests/llm_serve_test.sh asserts the richer 15M reference text
("She loved to play outside in the sunshine", "It was the sun!"), the
quantized-engine marker (llmd: model int8 Q8_0 GS=32), and live metrics.
I5 — verified model bundles with generation fallback (DONE, 2026-06-09)
Scope. The model-storage model from ARCHITECTURE.md made executable:
/models/<name>/<generation>/{manifest.toml, model.bin, tokenizer.bin} with
integrity verification at load and the verify-and-roll-back policy. Userland +
host tooling only; no kernel change.
Pieces.
- userland/lib/modelbundle.swift: I/O-free (host + Embedded, the llama2.swift pattern): a small-TOML-subset manifest parser (key = value, [table], # comments; unknown keys/tables tolerated for forward compatibility; a [signature] table slot is reserved for when an Ed25519 primitive exists), payload verification (size first, then SHA-256 via kernel/crypto/sha256.swift), and the newest-first generation policy. tests/llm_bundle_test.swift (host, in make test) covers parse, corrupt and size-mismatch rejection, case-insensitive hex, and ordering.
- tools/modelmanifest.swift: host generator: hashes the payloads and emits the manifest; make base-image stages generation 1 (the real q8 bundle) and a DELIBERATELY corrupt generation 2 (gen-1 manifest hashes over a truncated model.bin), so every boot demonstrates the fallback.
- /bin/llmd resolves the bundle by default: scan /models/stories15M via getdents for numeric generations, try newest first: parse manifest, mmap payloads, verify. A bad generation logs "llmd: generation 2 rejected (model size/sha256 mismatch)" and the loop falls back, then logs "llmd: bundle stories15M generation 1 verified (sha256)". argv still overrides with raw paths (no verification) for debugging. A rejected generation's partial mapping stays mapped until exit (lazy-VMA munmap remains a recorded follow-up; verify-model-first ordering bounds the cost to one VMA slot per bad generation).
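Sketch of the newest-first fallback policy described in the list above: keep only all-digit directory names, sort descending, and return the first generation that verifies. selectGeneration is a hypothetical name, and the verify closure stands in for the parse + mmap + size/SHA-256 check.

```swift
// Hypothetical newest-first generation selection with fallback.
func selectGeneration(names: [String], verify: (Int) -> Bool) -> Int? {
    let digitRange: ClosedRange<Character> = "0"..."9"
    // Only purely numeric directory names count as generations.
    let generations = names
        .filter { !$0.isEmpty && $0.allSatisfy { digitRange.contains($0) } }
        .compactMap { Int($0) }
        .sorted(by: >)
    for generation in generations {
        if verify(generation) {
            return generation       // newest generation that passes
        }
        // A rejected generation is skipped; the loop falls back to the next.
    }
    return nil
}
```

With the staged image from this entry (a good generation 1 and a corrupt generation 2), such a loop would try 2, reject it, and settle on 1.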
Measured effect. Verifying the mmap'd weights hashes every byte, which
demand-pages the whole model in at startup: first-request ttft dropped from
1150 ms (I4, cold fault-in) to 90 ms — verified means resident. Steady
rate unchanged (~10 tok/s, QEMU TCG). tests/llm_serve_test.sh asserts the
rejection + verification markers plus the I4 checks.
Still future. Real signatures (needs Ed25519), staging new generations at runtime (needs a writable model store — tmpfs or the persistent update store) and hot reload/drain, per-cell model servers (C6).
I6 — munmap drops file-VMA (demand-paging correctness) (DONE, 2026-06-09)
Bug. processMunmap reclaims the mmap cursor when the bottom region is
freed, so the next mmap reuses the same VA range — but a lazily-reserved file
VMA (I2b) survived munmap. A new mmap_file landing on the recycled VA would
demand-fill its pages from the OLD file's disk extent (the stale VMA matches
first), and repeated mmap_file+munmap cycles leaked VMA slots until the
8-slot table was exhausted (relevant to llmd-style reload loops; I5's bundle
fallback already consumes a slot per rejected generation).
Fix. processMunmap deactivates any file VMA overlapping the unmapped
range. A partial munmap drops demand paging for the whole VMA (materialized
pages stay mapped; untouched pages become fatal on access) — acceptable for the
map-whole/unmap-whole pattern and documented at the code site until a VMA split
is warranted.
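Sketch of the overlap rule in the fix: any active file VMA that intersects the unmapped range is deactivated as a whole, which also frees its slot for reuse. FileVmaSlot and deactivateOverlapping are hypothetical names; the 8-slot table size comes from the bug description above.

```swift
// Hypothetical fixed-size file-VMA table entry.
struct FileVmaSlot {
    var active = false
    var base: UInt64 = 0
    var length: UInt64 = 0
}

// Deactivate every active VMA overlapping [base, base + length).
// Returns how many slots were dropped.
func deactivateOverlapping(_ table: inout [FileVmaSlot],
                           base: UInt64, length: UInt64) -> Int {
    var dropped = 0
    let end = base + length
    for i in table.indices where table[i].active {
        let vmaEnd = table[i].base + table[i].length
        // Half-open interval overlap test.
        if table[i].base < end && base < vmaEnd {
            table[i].active = false   // whole VMA goes, even on partial unmap
            dropped += 1
        }
    }
    return dropped
}
```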
Regression. /bin/mmapdemo gains an I6 section: /etc/motd via
mmap_file must match read(); after munmap, /etc/hostname mapped into the
recycled VA must show hostname bytes (I6-OK file munmap drops stale VMA);
then 12 map/unmap cycles prove slot recycling past the 8-slot table
(I6-OK file vma slots recycled). tests/mmap_test.sh asserts both markers;
llm_run_test and llm_serve_test re-validated on the patched kernel.
I7 — Ed25519; signed model-bundle manifests (DONE, 2026-06-09)
Primitive. kernel/crypto/ed25519.swift — RFC 8032 Ed25519 in pure,
Embedded-compatible Swift, self-contained like the other crypto files: field
arithmetic mod 2^255−19 on sixteen 16-bit limbs (the compact TweetNaCl shape,
rewritten in Swift), edwards25519 in extended coordinates, a constant-time
conditional-swap ladder, scalar reduction mod the group order. SHA-512 (RFC
8032's hash) added as kernel/crypto/sha512.swift. Every constant was
generated by exact integer arithmetic (SHA-512 K/H from prime roots; d,
base point, sqrt(−1), L from first principles) rather than transcribed — a
from-memory sqrt(−1) would in fact have been the wrong root. Test vectors were
fetched from rfc-editor.org, not recalled.
TDD. tests/ed25519_test.swift (host, make test): FIPS 180-4 SHA-512
vectors (python3-hashlib cross-checked), RFC 8032 §7.1 TEST 1/2/3/SHA(abc) —
public-key derivation and deterministic signatures byte-for-byte, verification,
tampered-R/tampered-S/tampered-message/cross-key rejection. All green on the
first run after one carry-propagation fix.
Signed bundles. The signature covers every manifest byte before the
[signature] table (appended last by the signer; modelManifestSignedRange
in modelbundle.swift is shared by the host tool and the target verifier).
tools/modelsign.swift (host): keygen / sign (strip + re-sign, idempotent)
/ verify. make base-image generates a dev keypair under models/
(gitignored), signs BOTH generation manifests — gen 2's signature is valid;
its payload hash is what fails, proving the layers act independently — and
ships the public key as the trust root at /etc/swos/model-signing.pub.
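Sketch of the "signature covers every byte before the [signature] table" rule. signedPrefixLength is a hypothetical stand-in for modelManifestSignedRange, whose real signature is not shown in these notes; this version only recognizes a [signature] header that starts a line with no leading whitespace.

```swift
// Hypothetical: length of the manifest prefix that the signature covers.
func signedPrefixLength(_ manifest: [UInt8]) -> Int {
    let marker = Array("[signature]".utf8)
    var lineStart = 0
    while lineStart < manifest.count {
        let end = lineStart + marker.count
        if end <= manifest.count && Array(manifest[lineStart..<end]) == marker {
            return lineStart          // signed bytes are [0, lineStart)
        }
        // Advance to the start of the next line.
        var i = lineStart
        while i < manifest.count && manifest[i] != 10 { i += 1 }
        lineStart = i + 1
    }
    // No signature table: the whole manifest would be the message.
    return manifest.count
}
```

Matching only at line starts keeps a value that happens to contain the text "[signature]" from truncating the signed range.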
Policy. /bin/llmd loads the trust root at startup; when present,
manifests MUST carry a valid signature (generation N rejected (bad manifest signature) otherwise) and the accepted generation logs
verified (ed25519+sha256); without a trust root it stays in integrity-only
mode. The manifest deliberately does not carry the key. llm_serve_test.sh
asserts the trust-root marker and the dual-layer verification line.
Still future. Key rotation / multiple trust roots, signing the base image itself (the A/B story), and revocation.
I8 — signed base image (kernel is the root of trust) (DONE, 2026-06-09)
Scope. The packed base image is now signed, and the kernel refuses to mount
an unsigned or tampered one — the foundation of the A/B-image story. The kernel
itself is the trust anchor (loaded via -kernel or embedded in the EFI loader),
so a single compiled-in public key roots the whole userland.
Format (SWOSBASE v3). tools/packfs.swift gained a signed layout: 72-byte
entries (the v2 fields + a 32-byte per-file content SHA-256; directories carry
zeros) and a 64-byte Ed25519 signature over header|entries|strings sitting
between the string table and the payload. tools/basepack.swift, given the
image-signing seed, hashes each file and signs the metadata (closures keep
packfs.swift crypto-free; swpkg payloads stay v2). The Makefile mints a
dedicated IMAGE-signing keypair (distinct lifecycle from the model key) under
models/ and embeds its public half via kernel/security/trust_root.S
(.incbin build/image_trust_root.bin).
Kernel-grade crypto. kernel/crypto/{ed25519,sha512}.swift were rewritten
onto InlineArray (stack storage, the percpu.swift idiom) and stack temporary
allocations, so verification does no heap allocation beyond one message buffer —
safe on the 256 KiB bump heap at boot. Added a streaming Sha256Stream (also
InlineArray) so file content is hashed in 4 KiB chunks off virtio-blk with
bounded memory regardless of file size; the host test pins it to the one-shot
across tail/block-spanning sizes. ed25519/sha256/sha512 are now compiled into
the kernel image.
Two-layer verification. At mount, buildBaseFromDisk reads
header|entries|strings + the detached signature and ed25519Verifys against
image_trust_root BEFORE building a single vnode (base image signature verified (ed25519) on success; refuses the disk base otherwise). Then content
is verified lazily, once per file on first use: vfsOpen, vfsDiskImageExtent
(exec), and vfsFileExtent (mmap) all call vfsVerifyNodeContent, which
streams the extent and compares to the signed per-entry hash, caching the
result in the vnode. Fail-closed per file (a bad file returns EACCES; the OS
keeps running), not per boot.
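Sketch of the verify-once-per-file behavior described above: the first use hashes the extent and compares it with the signed per-entry hash, and later uses reuse the cached verdict. VnodeSketch and verifyOnFirstUse are hypothetical names, the hashExtent closure stands in for the streaming SHA-256, and caching a failed verdict (rather than re-hashing) is an assumption made here.

```swift
// Hypothetical per-vnode content verification cache.
enum ContentState { case unverified, verified, rejected }

struct VnodeSketch {
    let expectedHash: [UInt8]            // from the signed metadata
    var state: ContentState = .unverified
}

// Returns true when the file's content may be used.
// `hashExtent` is only invoked on the first call for a given node.
func verifyOnFirstUse(_ node: inout VnodeSketch,
                      hashExtent: () -> [UInt8]) -> Bool {
    switch node.state {
    case .verified:
        return true
    case .rejected:
        return false                     // fail closed for this file only
    case .unverified:
        let ok = hashExtent() == node.expectedHash
        node.state = ok ? .verified : .rejected
        return ok
    }
}
```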
Acceptance. tests/signed_image_test.sh (in make test): Case A flips a
byte in the signed metadata → mount refused (signature INVALID, no
mounted from disk); Case B flips a file payload byte → image mounts (metadata
intact) but cat /etc/motd trips content hash mismatch while the shell
survives. base_image_test.swift upgraded to v3 and re-hashes every entry;
boot_test asserts the mount-time signature marker. Verification cost is
negligible at boot (signature over ~5 KiB metadata; content hashed on first use).
Still future. Key rotation / multiple trust roots; signing the kernel image itself + an A/B boot manifest with rollback (loader/update-store territory); revocation.
A/B signed system updates (U-series)
Extends the trust chain (I5–I8: signed model bundles → signed base image, kernel as root of trust) toward ARCHITECTURE.md §"Persistent update store" + the "A/B image discipline" design value: two image slots + an atomic boot manifest, verified slot selection, rollback to the known-good slot. (The roadmap sequences A/B late in Phase 1; brought forward here as the trust-chain capstone — the Ed25519 primitive is ready. Storage-medium + scope forks confirmed with the maintainer: a dedicated writable virtio-blk disk, read-side first.)
U1a — A/B update store: verified slot selection + fallback (DONE, 2026-06-10)
Scope (read side of A/B). Select + verify + fall back. A persistent writable
virtio-blk "update store" disk carries a SWOSBOOT boot manifest + two slots,
each a full signed SWOSBASE-v3 base image. The kernel reads the manifest, picks
the active slot, mounts+verifies it via the unchanged I8 path, and rolls back to
the known-good fallback slot if the active image fails verification. Boot-state
write-back (attempt counter, health confirm, attempt-based rollback persisted
across reboots) is U1b; kernel-image A/B via the loader is U1c.
Format (SWOSBOOT v1). kernel/fs/swosboot.swift — an I/O-free, no-mutable-
global manifest core (parser + CRC32) shared by the kernel (Embedded), the host
builder, and the host test, like the crypto. One 512-byte sector, two copies
(LBA 0/1, double-buffered for U1b's torn-write-safe rewrite; reader picks the
valid copy with the highest sequence). Header {magic, version, slot_count=2,
active_slot, fallback_slot, sequence} + a 2-entry slot table {present, state,
base_lba, length_sectors, generation, attempt_count} + trailing CRC32 over
[0,508). Layout: manifest @ LBA 0–7, slot 0 image @ LBA 8, slot 1 after. CRC32 is
IEEE reflected (poly 0xEDB88320); the canonical check value crc32("123456789")
== 0xCBF43926 is pinned by the host test. Format documented in docs/UPDATE_STORE.md.
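The CRC described above can be sketched in a few lines of C. This is an illustrative bitwise version, not the project's Swift code: the function names are hypothetical, and storing the CRC little-endian in bytes 508..511 is an assumption (the note only says "trailing CRC32 over [0,508)").

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, IEEE reflected form (polynomial 0xEDB88320),
 * initial value and final XOR both 0xFFFFFFFF. */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            uint32_t mask = 0u - (crc & 1u);      /* all ones if the low bit is set */
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Checks one 512-byte manifest copy: CRC over bytes [0,508) compared with
 * the trailing word. ASSUMPTION: the trailing word is little-endian. */
int manifest_sector_crc_ok(const uint8_t sector[512])
{
    uint32_t stored = (uint32_t)sector[508]
                    | (uint32_t)sector[509] << 8
                    | (uint32_t)sector[510] << 16
                    | (uint32_t)sector[511] << 24;
    return crc32_ieee(sector, 508) == stored;
}
```

Calling crc32_ieee over the nine ASCII bytes "123456789" gives the canonical check value 0xCBF43926 that the host test pins.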
Trust boundary (deliberate). The manifest is CRC32-protected, NOT signed: the kernel holds only the public image key and so cannot sign the boot-state it writes at runtime (U1b). Sound because the manifest is not a trust anchor — it only selects among self-authenticating signed images. A store-disk attacker can at worst point "active" at the other (still-signed) slot or induce a boot loop: availability/DoS, never a code-integrity bypass (a forged image still fails Ed25519 at mount). Same posture the base-image disk already has.
Kernel. virtio_blk.swift gains a slot-relative read — blkBaseByteOffset
added to every virtioBlkReadRange, so the unchanged VFS mount/verify/exec/mmap
paths read the active slot transparently (a single choke point); the legacy
single-image disk keeps offset 0. blkFallbackByteOffset holds the known-good
slot, consumed once by virtioBlkUseFallbackBase(). virtioBlkInit now prefers
a SWOSBOOT store disk > a SWOSBASE base disk > the first device.
kernel/fs/updatestore.swift updateStoreInit() (called at the top of
vfsInit) reads both manifest copies, picks the active slot, sets the offsets,
logs the selection. vfsInit mounts the active slot via buildBaseFromDisk
(the I8 path); if it rejects the slot (bad signature/content), it calls
virtioBlkUseFallbackBase() and remounts the known-good slot. No virtio-blk
write yet (U1a is read-only; the disk is writable for U1b). Two new globals
(blkBaseByteOffset, blkFallbackByteOffset) added to docs/SMP_STATE_AUDIT.md —
set once at boot before EL0.
Host + test. tools/updatestore.swift builds the store (places two slot
images, writes the CRC'd manifest, self-parses to verify). tests/updatestore_test.swift (host, in make test) pins the CRC32 check value +
round-trip + corruption rejection. tests/ab_update_test.sh (in make test):
Case A (active=A) → "active slot A", mounted, exec from slot; Case B (active=B, a
different LBA) → "active slot B", mounted, exec — proves manifest-driven
selection, not "always slot 0"; Case FB (active=B with tampered slot-B metadata)
→ slot B rejected ("base image signature INVALID"), "rolling back to fallback
slot", slot A mounted and serves /etc/motd to a working shell — verified fallback
over a persistent disk.
Gotcha caught. The interactive M7 tty demo gates the boot before login, so an
A/B-selection assertion must await a pre-login marker (mount markers / the tty
prompt), not "swift-os login:", unless it drives the tty. And await is a literal
substring match — the rollback marker is "...failed verification — rolling back to
fallback slot", so the awaited substring must not include a non-contiguous prefix.
Still future (U1b+). virtio-blk write; boot-attempt counter + health-confirm (capability-gated /bin/swos-confirm + syscall) + attempt-based rollback persisted across reboots; staging a new generation into the inactive slot + atomic active flip; kernel-image A/B via the loader (Ed25519 + EFI Block I/O); key rotation.
U1b — persistent boot-state: manifest write-back + boot-attempt counter (DONE, 2026-06-10)
Scope. The writable half U1a lacked: the virtio-blk write path + durable, atomic write-back of the SWOSBOOT manifest, used here to persist a per-slot boot-attempt counter across reboots. (The attempt-based rollback policy + health-confirm that consume this counter are U1c.)
- kernel/drivers/virtio_blk.swift: blkDoWrite (VIRTIO_BLK_T_OUT; the data descriptor is device-READABLE, since the device reads our bytes) + a one-sector virtioBlkWriteSector(sector, buf). Absolute sectors, NOT slot-relative: the manifest at LBA 0/1 lives outside the A/B image slots, so writes skip blkBaseByteOffset.
- kernel/fs/swosboot.swift: serializeSwosbootManifest, the exact inverse of the parser; the host test pins parse(serialize(m)) == m.
- kernel/fs/updatestore.swift: after selecting the active slot, updateStoreInit increments that slot's attempt_count, bumps sequence, and writes the manifest to the OTHER double-buffer copy (torn-write safe: the reader picks the highest valid sequence, so an interrupted write leaves the prior copy intact). A CONFIRMED slot is skipped (forward-compat no-op until U1c sets that state). Marker: "update-store: recorded boot attempt N for active slot X".
- No new top-level globals (the driver gained funcs + one let); SMP audit unchanged at 173 entries.
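The double-buffer rule (read the valid copy with the highest sequence, write the other one) can be sketched as below. This is a minimal C illustration under stated assumptions: the struct and function names are hypothetical, validity stands in for "magic + CRC32 check passed", and a 32-bit sequence is assumed.

```c
#include <stdint.h>

/* One decoded manifest copy (LBA 0 or LBA 1). */
struct manifest_copy {
    int      valid;     /* nonzero if the copy parsed and its CRC matched */
    uint32_t sequence;  /* ASSUMPTION: 32-bit monotonically bumped counter */
};

/* Returns the index (0 or 1) of the copy to trust, or -1 if neither is valid. */
int pick_current_copy(const struct manifest_copy c[2])
{
    if (c[0].valid && c[1].valid)
        return c[1].sequence > c[0].sequence ? 1 : 0;
    if (c[0].valid)
        return 0;
    if (c[1].valid)
        return 1;
    return -1;
}

/* The write-back always targets the copy that is NOT currently trusted,
 * so a torn write can only damage the stale copy. */
int pick_write_target(int current)
{
    return current == 1 ? 0 : 1;
}
```

If a write is interrupted, the damaged copy fails validation on the next boot and pick_current_copy falls back to the prior, intact copy.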
Durability. A virtio-blk write completes when the device acks (polled used
ring). The acceptance test attaches the store with cache=writethrough so each
completed write is durable to the backing file even across an ungraceful kill.
(A virtio-blk FLUSH for durability without writethrough is future hardening.)
Acceptance. tests/ab_persist_test.sh (in make test): boots the SAME
writable store disk 3× and asserts the attempt counter increments 1→2→3 across
reboots — proving write + atomic double-buffered write-back + reboot persistence.
tests/updatestore_test.swift gains the serialize↔parse round-trip; U1a's
ab_update_test.sh still passes (write-back does not disturb selection/fallback).
Still future (U1c). Attempt-based rollback (switch active↔fallback when an unconfirmed slot exceeds a max-attempts threshold) + health-confirm (a capability-gated /bin/swos-confirm + syscall that marks the active slot CONFIRMED and resets attempts). Then U1d = kernel-image A/B via the loader.
U1c — health-confirm: /bin/swos-confirm pins a slot CONFIRMED (DONE, 2026-06-10)
Scope. The "confirm" half of the boot-state machine: an operator marks a freshly-activated slot healthy so it stops accruing boot attempts (and, once U1d lands, is never rolled back). Attempt-based rollback that consumes the counter is U1d.
- New syscall SYS_UPDATE_CONFIRM (65), capConsole-gated, dispatched to updateStoreConfirm() (kernel/fs/updatestore.swift): re-reads the manifest, marks the slot booted this session (tracked in the new updateStoreActiveSlot global) CONFIRMED + resets its attempt_count, persists via the U1b double-buffered write-back. Bridge: syscall.h update_confirm() + swiftos_update_confirm().
- /bin/swos-confirm (userland/swos-confirm.swift): calls it and prints the result; registered in execResolve + staged in the base image. capConsole means root can run it; a guest is refused (EPERM).
- updateStoreInit refactored onto shared helpers (updateStoreReadChosen / updateStoreWriteBack), now shared with the confirm path; the selection log shows the slot state (untried/confirmed/failed). One new global updateStoreActiveSlot (SMP audit → 174).
Acceptance. tests/ab_confirm_test.sh (in make test): boot 1 drives to a
root shell and runs /bin/swos-confirm → "active slot confirmed healthy" + kernel
"slot A confirmed healthy"; boot 2 (same writable store) → the kernel sees
"active slot A gen 1 confirmed" and records NO new boot attempt. U1a/U1b A/B
tests + the legacy disk path are unaffected.
Still future (U1d). Attempt-based rollback (switch active↔fallback past a max-attempts threshold; mark the exhausted slot FAILED) + stage-into-inactive-slot + atomic active flip; then kernel-image A/B via the loader (Ed25519 + EFI Block I/O).
U1d — attempt-based rollback: unconfirmed slot fails over (DONE, 2026-06-10)
Scope. Closes the "rollback on failed health check" loop. The counter (U1b) + confirm (U1c) infrastructure is now driven by a policy: an active slot that is not CONFIRMED and has reached maxBootAttempts (=3) boot attempts is presumed unhealthy (it booted but the operator never ran /bin/swos-confirm). The kernel marks it FAILED, swaps active↔fallback in the manifest, and boots the known-good fallback, all persisted via the U1b double-buffered write-back.
- kernel/fs/updatestore.swift: updateStoreInit gains the rollback decision before it commits to a slot, governed by maxBootAttempts (a let, no new global). The write-back now persists both the rollback swap and the (new) active slot's attempt increment in one update. This is the "boots-but-never-confirmed" path; the BAD-IMAGE path (Ed25519/content verification failure) stays in vfsInit (U1a, immediate verified fallback at mount). Marker: "active slot X exhausted N attempts — rolling back to slot Y".
- A CONFIRMED slot (U1c) is exempt: never counted, never rolled back. A FAILED slot is still a valid rollback target (if both slots are unconfirmable the system fails over back and forth until an operator confirms a good one; honest behavior, an availability concern, documented).
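The rollback policy can be sketched as a small pure function. This is a hedged C model, not the kernel's Swift: the type names, the enum encoding, and choose_boot_slot itself are hypothetical; only the threshold of 3, the CONFIRMED exemption, and the "mark FAILED, swap, then count the new active slot's attempt" order come from the notes above.

```c
#include <stdint.h>

enum slot_state { SLOT_UNTRIED, SLOT_CONFIRMED, SLOT_FAILED };  /* hypothetical encoding */
enum { MAX_BOOT_ATTEMPTS = 3 };

struct slot {
    enum slot_state state;
    uint32_t        attempt_count;
};

struct boot_state {
    uint32_t    active;    /* index of the slot to boot */
    uint32_t    fallback;  /* index of the known-good slot */
    struct slot slots[2];
};

/* Applies the attempt-based policy for one boot and returns the slot to boot.
 * The caller would persist *bs afterwards in a single write-back. */
uint32_t choose_boot_slot(struct boot_state *bs)
{
    struct slot *s = &bs->slots[bs->active];

    /* An unconfirmed slot that used up its attempts is presumed unhealthy. */
    if (s->state != SLOT_CONFIRMED && s->attempt_count >= MAX_BOOT_ATTEMPTS) {
        s->state = SLOT_FAILED;
        uint32_t old_active = bs->active;
        bs->active = bs->fallback;
        bs->fallback = old_active;
        s = &bs->slots[bs->active];
    }

    /* A CONFIRMED slot is exempt: it never accrues attempts. */
    if (s->state != SLOT_CONFIRMED)
        s->attempt_count++;

    return bs->active;
}
```

Run four times on a store with active=A and neither slot confirmed, this model records attempts 1, 2, 3 for slot A and then switches to slot B on the fourth boot, which is the sequence the acceptance test below asserts.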
Acceptance. tests/ab_rollback_test.sh (in make test): boots the SAME
store (active=A, both valid, neither confirmed) 4×; slot A records attempts
1/2/3, then boot 4 exhausts them and rolls over to slot B (which records its own
first attempt) — persisted across the reboots. U1a–U1c A/B tests + the legacy
disk path are unaffected.
A/B story complete (read + write + confirm + rollback). Remaining is forward build-out: stage-into-inactive-slot + atomic active flip from a running system, and kernel-image A/B via the loader (Ed25519 + EFI Block I/O); hardening: virtio-blk FLUSH (durability without cache=writethrough).
U1e — promote the inactive slot: /bin/swos-activate (DONE, 2026-06-10)
Scope. The operator "promote" control — switch which slot boots next, from a running system. This is the activation/atomic-flip half of staging; writing a NEW image into the inactive slot (the data half) is a separate piece with a genuine fork (image source + multi-device virtio-blk), surfaced before it is built.
- New syscall SYS_UPDATE_ACTIVATE (66), capConsole-gated -> updateStoreActivateOther() (kernel/fs/updatestore.swift): makes the inactive slot (1 − booted slot) the active slot, the current slot the fallback, marks the new active UNTRIED + attempts=0 (boots "on trial"), and persists via the U1b double-buffered write-back. Reuses updateStoreReadChosen / updateStoreWriteBack. Bridge: syscall.h update_activate() + swiftos_update_activate().
- /bin/swos-activate (userland/swos-activate.swift): calls it, prints the result; registered in execResolve + staged in the base image. root only (capConsole); guest EPERM.
- No new globals (reuses updateStoreActiveSlot); SMP audit unchanged (174).
Operator workflow now complete for slots that already hold images: activate
the inactive slot → reboot → it boots on trial → /bin/swos-confirm if healthy
(U1c), else attempt-based rollback returns to the fallback (U1d).
Acceptance. tests/ab_activate_test.sh (in make test): boot slot A, run
/bin/swos-activate from a shell → "activated slot B (on trial)"; reboot → slot
B is active, UNTRIED, records its first attempt. U1a–U1d + the legacy disk path
are unaffected.
Still future. Writing a new image into the inactive slot from a running
system (target-side swos-update) needs an image-source decision (read-only
payload disk vs network vs tmpfs) + multi-device virtio-blk; then kernel-image
A/B via the loader.
Fix — vfs_disk_test.sh red since I8 (signed base + sparse-disk S2b guard) (DONE, 2026-06-10)
tests/vfs_disk_test.sh had been failing since the I8 commit ("signed base
image"): I8 updated signed_image_test.sh / base_image_test.swift but not
this one. Two layered causes, two layered fixes — both confined to the test
fixture; no kernel, guard, or boot-path change.
1. Unsigned image refused at mount. The test packed its throwaway disk with
basepack <root> <img> (legacy v2, unsigned). Since I8 the kernel embeds an
image-signing trust root and buildBaseFromDisk refuses anything but signed v3
("unsigned base image refused — signed v3 required"), so it fell back to the
compiled-in literals (no real busybox) and the shell never started. Fix: sign
the disk with the same dev image key make base-image mints —
SEED="$ROOT/models/dev-image-signing.seed", require it, pass it as basepack's
4th arg. Also fixed the stale standalone-build fallback (line ~20): basepack now
needs tools/packfs.swift + the crypto sources, mirroring the Makefile
$(BASEPACK) rule.
2. S2b guard panic on the sparse no-console-login disk. Past the mount, the
boot hit panic: S2b secondary EL0 execution guard failed. The cause is not
an SMP bug: this sparse disk carries only busybox (no console-login/ttydemo), so
init runs the milestone EL0 demos straight through, and the post-userland S2b
guard (smpS2bNoSecondaryEl0Execution) requires CPU0 to have actually
dispatched an EL0 process — smpPerCpuEl0SwitchCount(primary) != 0. With every
demo binary missing from the disk ("demo: missing on disk /bin/…"), CPU0 ran
zero EL0 work and the guard tripped. The full-base boot paths (boot_test,
signed_image_test, ab_update_test) carry the demo binaries and pass. This
matches how the same guard's owner resolved it on the parallel SMP line (seed a
demo binary into the sparse disk). Fix: seed /bin/ps, the last demo before the
guard, so runPsDemo supplies the EL0 switch. The guard is correct as designed.
Verification. ./tests/vfs_disk_test.sh green (3/3 stable). No SMP
regression: ./tests/boot_test.sh and SMP_CPUS=4 ./tests/smp_boot_test.sh
both green (the latter logs "S2b OK: no secondary EL0 execution" at -smp 4).
Repo note. Applied on a branch off origin/main (a6391b1), where the bug
lives. The local main line had diverged (~100 commits of SMP S2c–S2h + package
work) and never received I8 signing, so the test already passed there — the fix
belongs on the I8 line.
U1f-1 — secondary read-only virtio-blk device (the A/B update payload) (DONE, 2026-06-10)
Scope. The multi-device foundation for staging a new image from a running system. The virtio-blk driver was single-device; U1f-1 lets it also see and read a second disk — the update payload (a signed SWOSBASE image) attached alongside the SWOSBOOT store. U1f-2 will copy that payload into the inactive slot.
- kernel/drivers/virtio_blk.swift: virtioBlkInit now scans ALL block devices and classifies each by sector-0 magic (store=SWOSBOOT, base/payload=SWOSBASE) instead of returning on the first store. When a store is selected, a separate SWOSBASE disk is recorded as the payload (blkPayloadDevice; blkStoreDevice keeps the store index so we can return to it). Accessors virtioBlkHasPayload(), virtioBlkSelectPayload() (selects the payload and returns its capacity), virtioBlkReselectStore(). The hardware path is reused by selecting between the two disks, which is fine since I/O is serial on the one CPU. Two new globals in docs/SMP_STATE_AUDIT.md (set once at boot).
- kernel/fs/updatestore.swift: updateStorePayloadProbe() (called from vfsInit after updateStoreInit) reads the payload's sector-0 header through the secondary path and verifies it is a signed v3 SWOSBASE image, logging "update-store: update payload disk present, N sectors, signed v3 base image", then re-selects the store so the base mounts from it.
Acceptance. tests/ab_payload_test.sh (in make test): boot with the store
disk + base.img attached as a read-only payload; assert the payload is discovered
and read, AND the active slot still mounts from the store (the probe's device
re-selection does not disturb the base mount). U1a–U1e A/B tests + the legacy
disk path unaffected.
Still future (U1f-2). The stage copy: /bin/swos-update reads the payload
disk and writes it into the inactive slot, then the operator runs swos-activate + reboots. Needs a chunked copy loop (read payload → write store slot) and, for acceptable speed on a multi-MB image under TCG, likely multi-sector virtio requests (the driver does one sector per request today).
Test-harness follow-up. The interactive to_shell serial drive (M7 tty +
login) intermittently drops a typed line on the emulated PL011 (~10-15%), seen
across all to_shell tests (ab_update_test, ab_confirm_test, signed_image_test).
ab_activate_test (U1e) fixed it with a settle + byte-by-byte send; the other
A/B tests still use whole-line printf and should be migrated to that pattern.
U1f-2a — multi-sector virtio-blk transfers (DONE, 2026-06-10)
Scope. The driver moved one 512-byte sector per virtio request, which makes
the U1f-2 stage copy of a multi-MB image untenably slow under TCG (thousands of
round trips). U1f-2a adds a variable-length data descriptor: one request now
transfers up to BLK_MULTI_SECTORS (128 = 64 KiB) consecutive sectors.
- kernel/drivers/virtio_blk.swift: a contiguous BLK_MULTI_PAGES-page DMA region (blkMultiBase, pmm_alloc_pages, allocated once like the ring/data pages; one new global in docs/SMP_STATE_AUDIT.md). blkDoMulti(sector, count, write:) drives a header→data→status chain where the single data descriptor is count*512 bytes (device-writable for T_IN, device-readable for T_OUT). Public API: virtioBlkReadSectors / virtioBlkWriteSectors (copy in/out of a caller buffer) and the no-copy virtioBlkFillMulti / virtioBlkFlushMulti / virtioBlkMultiMax for U1f-2b's disk-to-disk stage copy (blkMultiBase survives a bring-up, so read-from-payload then write-to-store needs no intermediate kernel buffer).
- virtioBlkReadRange, which backs EVERY base-image read (mount, signature/content verify, ELF load, file-backed mmap), now pulls whole sector runs per request, capped to the DMA region and capacity. The single-sector blkDoRead / blkDoWrite (sector-0 classification, manifest LBA 0/1 write-back) are unchanged.
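The "whole sector runs per request, capped to the DMA region and capacity" rule can be illustrated with a short C sketch. The function names are hypothetical and the real driver is Swift; only the 128-sector (64 KiB) cap is taken from the note above.

```c
#include <stdint.h>

enum { BLK_MULTI_SECTORS = 128 };  /* 128 * 512 bytes = 64 KiB per request */

/* Number of sectors the next request should move, starting at `sector`,
 * with `remaining` sectors still wanted on a disk of `capacity` sectors.
 * Returns 0 when nothing can be transferred. */
uint32_t next_run(uint64_t sector, uint64_t remaining, uint64_t capacity)
{
    if (sector >= capacity)
        return 0;
    uint64_t run = remaining;
    if (run > BLK_MULTI_SECTORS)
        run = BLK_MULTI_SECTORS;      /* cap to the DMA region */
    if (run > capacity - sector)
        run = capacity - sector;      /* cap to the end of the disk */
    return (uint32_t)run;
}

/* Counts how many virtio requests a range needs under that rule. */
uint32_t count_requests(uint64_t sector, uint64_t count, uint64_t capacity)
{
    uint32_t requests = 0;
    while (count > 0) {
        uint32_t run = next_run(sector, count, capacity);
        if (run == 0)
            break;                    /* ran off the end of the disk */
        sector += run;
        count -= run;
        requests++;
    }
    return requests;
}
```

For a file of about 1.1 MiB (2253 sectors) this yields 18 requests instead of 2253 single-sector round trips.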
Acceptance. tests/multisector_test.sh (in make test): the multi-sector
read path is verified end-to-end by the base image's own cryptography — a single
misread byte fails one of three checks across chunk sizes: the signed Ed25519
metadata region, a small payload file (/etc/motd), and busybox.elf (~1.1 MB ≈ 18
of the 64 KiB chunks, loaded by execResolve in one virtioBlkReadRange — the ash
shell only launches if that large multi-chunk read is byte-exact). boot_test +
signed_image + the U1a–U1f-1 A/B suite unaffected (all base reads now flow
through the multi-sector path).
U1f-2b — the A/B stage copy: /bin/swos-update (DONE, 2026-06-10)
Scope. Close the staging loop: copy the attached read-only payload disk (U1f-1) into the inactive A/B slot from a running system, so an operator can then swos-activate + reboot onto the new image.
- kernel/fs/updatestore.swift: updateStoreStagePayload() (syscall 67 SYS_UPDATE_STAGE, capConsole-gated). Reads the chosen manifest, picks the inactive slot (1−booted), brings up the payload and reads its SWOSBASE header: requires a signed v3 image and computes its length (dataOffset@48 + payloadLen@56, rounded up to sectors). Rejects a payload that is truncated on its disk (> payload capacity, EINVAL) or larger than the slot's length_sectors (EFBIG). Copies payload[0,N) → store[slotBaseLBA,+N) in 64 KiB runs via U1f-2a's no-copy virtioBlkFillMulti / FlushMulti (read into the driver's DMA buffer from the payload, re-select the store, flush it out; no intermediate kernel buffer; serial on the one CPU). Then marks the slot present + UNTRIED, attempts 0, generation++, persisted via the U1b double-buffered write-back. Copies BYTES only: the staged image's own Ed25519 signature is verified at the NEXT boot's mount (unchanged I8 path), so a corrupt payload simply fails on trial and U1a/U1d return to the known-good slot. No new globals.
- /bin/swos-update (userland/swos-update.swift, bridge swiftos_update_stage / update_stage); registered in execResolve + the Makefile ELF/staging rules.
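The length computation and the two rejection cases can be sketched as follows. This is an illustrative C model with a hypothetical function name; it ASSUMES dataOffset and payloadLen are 64-bit little-endian fields (the 8-byte spacing between offsets 48 and 56 suggests this, but the notes do not state the widths).

```c
#include <errno.h>
#include <stdint.h>

/* Reads a 64-bit little-endian value. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Computes the image length in 512-byte sectors from a SWOSBASE header.
 * Returns 0 and sets *sectors on success, -EINVAL if the image does not fit
 * on the payload disk (truncated), -EFBIG if it does not fit in the slot. */
int staged_image_sectors(const uint8_t header[512],
                         uint64_t payload_capacity_sectors,
                         uint64_t slot_length_sectors,
                         uint64_t *sectors)
{
    uint64_t data_offset = load_le64(header + 48);  /* ASSUMED u64 LE */
    uint64_t payload_len = load_le64(header + 56);  /* ASSUMED u64 LE */
    uint64_t bytes = data_offset + payload_len;
    if (bytes < data_offset)
        return -EINVAL;                             /* 64-bit overflow */

    uint64_t n = bytes / 512 + (bytes % 512 != 0);  /* round up to sectors */
    if (n > payload_capacity_sectors)
        return -EINVAL;
    if (n > slot_length_sectors)
        return -EFBIG;

    *sectors = n;
    return 0;
}
```

With dataOffset = 4096 and payloadLen = 1000 the image is 5096 bytes, which rounds up to 10 sectors.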
Acceptance. tests/ab_stage_test.sh (in make test): a store with a valid
active slot A and a deliberately CORRUPT slot B (a same-size copy of base.img with
a signed byte flipped — so it fits the payload exactly but fails verification) +
a valid payload disk. Boot A → shell → swos-update (stage) → swos-activate. Reboot
→ slot B is active AND its image now passes Ed25519 verification and mounts (no
"signature INVALID"): a clean verified mount of the once-corrupt slot proves the
stage copy wrote a valid image. The full operator update workflow is now complete:
swos-update → swos-activate → reboot on trial → swos-confirm (U1c) / rollback (U1d).
Still future. Kernel-image A/B via the loader (Ed25519 + EFI Block I/O);
virtio-blk FLUSH (durability without cache=writethrough); key rotation. Next
free syscall = 68.
U1h — virtio-blk FLUSH: durable boot-state writes (DONE, 2026-06-10)
Scope. Until now, durability of the manifest/stage writes relied on a host
cache=writethrough backend (forced in the A/B tests). U1h negotiates
VIRTIO_BLK_F_FLUSH and flushes the device write cache after each commit, so
boot-state survives a crash under a normal write-back cache.
- kernel/drivers/virtio_blk.swift: bring-up now reads device-feature word 0 (R_DEVFEAT / R_DEVFEATSEL = 0x010/0x014) and accepts VIRTIO_BLK_F_FLUSH (bit 9) when offered, recording it in blkFlushOK (one new SMP-audit global, set per bring-up; reflects the currently-bound device). blkDoFlush() issues a VIRTIO_BLK_T_FLUSH (type 4) request: a header (device-read) + status (device-write) chain, no data. Public virtioBlkFlush() (returns 0 also when the device exposes no cache; the write is then already durable) and virtioBlkFlushSupported().
- kernel/fs/updatestore.swift: updateStoreWriteBack flushes after the manifest sector write (treating a failed flush as a failed write-back, so a rejected FLUSH stalls rather than silently loses state); updateStoreStagePayload flushes the staged slot data before the manifest is pointed at it (so a crash can never leave a committed manifest referencing half-written slot bytes). updateStoreInit logs the durability mode ("write durability via virtio FLUSH").
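A minimal C sketch of the two FLUSH pieces, the feature test and the request header, is given below. The bit number (9) and request type (4) are the virtio-blk values quoted above; the struct and function names are hypothetical, and storing the fields directly assumes a little-endian CPU, as on the AArch64 targets in these notes.

```c
#include <stdint.h>
#include <string.h>

#define VIRTIO_BLK_F_FLUSH 9u   /* feature bit in device-feature word 0 */
#define VIRTIO_BLK_T_FLUSH 4u   /* request type */

/* virtio-blk request header: type, reserved, sector (16 bytes). */
struct virtio_blk_req_hdr {
    uint32_t type;
    uint32_t reserved;
    uint64_t sector;
};

/* Nonzero if the device offers FLUSH in feature word 0. */
int flush_offered(uint32_t device_features_word0)
{
    return (int)((device_features_word0 >> VIRTIO_BLK_F_FLUSH) & 1u);
}

/* Fills the header for a FLUSH request. The descriptor chain is then
 * header (device-readable) + 1-byte status (device-writable), no data. */
void build_flush_header(struct virtio_blk_req_hdr *hdr)
{
    memset(hdr, 0, sizeof *hdr);
    hdr->type = VIRTIO_BLK_T_FLUSH;
}
```

A driver would only enqueue this header when flush_offered() is true for the bound device, which corresponds to what blkFlushOK records.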
Acceptance. tests/ab_flush_test.sh (in make test): boots the SAME store
with the default write-back cache (no cache=writethrough) and asserts the
FLUSH marker AND that the boot-attempt counter persists 1→2→3 — which also
verifies the flush request succeeds (a rejected FLUSH would fail the write-back
and stall the counter). Caveat: QEMU writes land in the host page cache, which
survives a kill, so this exercises the negotiate+flush+commit path under the
realistic cache mode but cannot simulate host power loss. boot_test, ab_persist
(writethrough path), and the rest of the A/B suite unaffected. No new syscalls.
U1g-1 — UEFI loader reads the kernel from an ESP file (DONE, 2026-06-10)
Scope. First slice of kernel-image A/B (U1g). The loader compiled the kernel
in as an embedded blob (kernel_blob.S .incbin), which cannot be A/B-staged on
disk. U1g-1 decouples the kernel image from the loader binary: the loader now
reads the kernel from a file on the ESP via EFI_SIMPLE_FILE_SYSTEM_PROTOCOL.
Mechanism chosen with the maintainer: ESP file (Simple File System), not raw
Block I/O: lowest risk, the ESP is already FAT. Later slices add an A/B manifest + second kernel image + Ed25519 verification.
- boot/efi/efi.h: added EFI_LOADED_IMAGE_PROTOCOL (to reach the boot volume's DeviceHandle), EFI_SIMPLE_FILE_SYSTEM_PROTOCOL / EFI_FILE_PROTOCOL, EFI_FILE_INFO, the three GUIDs, and typed BootServices->HandleProtocol.
- boot/efi/loader.c: open_esp_kernel() (HandleProtocol(LoadedImage) → HandleProtocol(SimpleFileSystem) on its DeviceHandle → OpenVolume → Open \EFI\swift-os\kernel.bin → GetInfo for the size) and read_file_into() (a Read loop, since the File protocol may return short). efi_main opens the file to learn its size, reserves the right number of pages at KERNEL_LOAD_ADDR, reads it in, and logs "UEFI: kernel loaded from ESP file N bytes". The embedded blob stays as a fallback (file absent/unreadable → "using embedded blob"), so the boot path is never less robust than before.
- Makefile stages kernel.bin to build/esp/EFI/swift-os/kernel.bin; scripts/make-disk.sh copies it into the real GPT ESP (::/EFI/swift-os/).
Gotcha caught. First run fell back to the blob: GetInfo needs the full
EFI_FILE_INFO (80-byte prefix + the CHAR16 file name), so an 88-byte buffer
returned EFI_BUFFER_TOO_SMALL — bumped to 512.
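The arithmetic behind this gotcha is short enough to write down. The sketch below is illustrative C with hypothetical names; it assumes the firmware reports the bare file name ("kernel.bin", 10 characters) and counts the terminating NUL, with CHAR16 being 2 bytes.

```c
#include <stddef.h>

/* Fixed part of EFI_FILE_INFO before FileName[]:
 * Size, FileSize, PhysicalSize (3 x 8 bytes), three EFI_TIME stamps
 * (3 x 16 bytes), Attribute (8 bytes) = 80 bytes. */
enum { EFI_FILE_INFO_FIXED_BYTES = 80 };

/* Bytes GetInfo needs for a file whose name has `name_chars` characters,
 * counting the terminating NUL and 2 bytes per CHAR16. */
size_t file_info_bytes(size_t name_chars)
{
    return EFI_FILE_INFO_FIXED_BYTES + 2 * (name_chars + 1);
}
```

For "kernel.bin" this gives 80 + 2 * 11 = 102 bytes, which is why an 88-byte buffer was too small; 512 leaves ample room for longer names.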
Acceptance. tests/uefi_boot_test.sh (in make test, disk + SMP-4 variants)
now also asserts "UEFI: kernel loaded from ESP file"; the kernel boots all the way
to busybox from the ESP-loaded image (single-core and -smp 4). The embedded-blob
fallback keeps the path safe if the file is ever missing.
Still future (U1g-2/3). A kernel A/B manifest on the ESP + a second kernel image + slot selection; then Ed25519 verification of the selected kernel against the compiled-in trust root.
U1g-2 — kernel A/B manifest + slot selection on the ESP (DONE, 2026-06-10)
Scope. Second slice of kernel-image A/B. The loader now reads a small boot manifest from the ESP and chooses between two kernel slots, falling back to the other when the active slot's file is missing/unopenable.
- SWOSKERN manifest (\EFI\swift-os\kernel-boot, 24 bytes LE): magic "SWOSKERN", version=1, active (0/1), fallback (0/1), generation. Host-authored at image build for now (no CRC; a CRC + double-buffering, like SWOSBOOT, come once the OS writes it at runtime). Two slot images: kernelA.bin / kernelB.bin.
- boot/efi/loader.c: open_esp_kernel generalized to open_esp_file(path,…); read_kernel_manifest() parses+validates the manifest; efi_main selects the active slot's path, and if it won't open and a distinct fallback exists, rolls back to the fallback slot (logs "rolling back to slot X"). Logs the active slot ("kernel A/B manifest active slot B gen N") and the slot actually booted ("booted kernel slot A/B"). No manifest → defaults to slot A. The embedded blob remains the final fallback. The generic "kernel loaded from ESP file N bytes" line is kept (so uefi_boot_test still asserts it).
- tools/kernelboot.swift: host generator (kernelboot <out> A|B [gen]).
- Makefile / scripts/make-disk.sh: stage kernelA.bin, kernelB.bin, and an active-A kernel-boot into both the virtual-FAT ESP and the GPT image.
Acceptance. tests/uefi_kernel_ab_test.sh (in make test) edits ESP copies
of the GPT image with mtools: (1) active=B → loader reports "active slot B" +
"booted kernel slot B" and the kernel boots from slot B; (2) active=B but
kernelB.bin deleted → loader rolls back to slot A, "booted kernel slot A", boots.
uefi_boot_test (default active-A manifest) still boots to busybox, single-core and
-smp 4.
Still future (U1g-3). Ed25519 verification of the selected kernel against the
compiled-in trust root (kernel/security/trust_root.S), so a tampered/garbage
slot is rejected at load and triggers fallback — the kernel-image analogue of the
base-image signature check.
U1g-3a — kernel slot SHA-256 integrity verification (DONE, 2026-06-10)
Scope. The loader could select an A/B slot but not tell a corrupt/truncated kernel from a good one (a bad image just crashed after the jump). U1g-3a adds a SHA-256 integrity check: the manifest carries each slot's hash, the loader hashes the loaded image and rejects a mismatch, rolling back to the other slot — the same verify-then-fallback shape as the base-image content check. This is integrity (catches corruption), NOT yet authenticity; the manifest is still unsigned, so a tamperer who rewrites the slot can rewrite its hash too. Authenticity is U1g-3b (Ed25519 over the manifest/kernel).
- boot/efi/loader_sha256.h: header-only FIPS 180-4 SHA-256 (the loader has no libc/crypto). Host-tested so the exact code is trusted.
- SWOSKERN manifest v2: appends slotA_size + slotA_sha256 (off 24/32) and slotB_size + slotB_sha256 (off 64/72); 104 bytes. v1 (no hashes) still parses.
- boot/efi/loader.c: load_slot(slot, expect_hash) opens→allocates→reads→(if a hash is given) SHA-256-verifies into KERNEL_LOAD_ADDR, freeing its pages and returning 0 on any failure (missing file, alloc fail, OR hash mismatch; logs "kernel slot X FAILED integrity check (sha256)"). efi_main tries the active slot, rolls back to the other on failure, then the embedded blob. FreePages typed in efi.h so a rejected slot's pages are reclaimed before the retry.
- tools/kernelboot.swift v2: reads both kernel files, embeds their SHA-256 (host kernel/crypto/sha256.swift); now @main (multi-file build disallows top-level code). Makefile/make-disk stage the v2 manifest computed over kernel.bin.
Acceptance. tests/loader_sha256_test.c (host, in make test) checks the C
SHA-256 against FIPS 180-4 vectors. tests/uefi_kernel_ab_test.sh gains a third
case: corrupt kernelA.bin (byte-flipped, so its hash ≠ the manifest's) with
active=A → loader logs the slot-A integrity failure and boots the valid slot B.
Plus the existing active-B and missing-slot cases. uefi_boot_test (default
active-A, now SHA-256-verified) still boots to busybox, single-core and -smp 4.
Gotcha. kernelboot.swift compiled fine standalone but broke once
sha256.swift was added to the build ("expressions are not allowed at the top
level") — a multi-file Swift module needs @main, not top-level statements.
Still future (U1g-3b). Ed25519 signature over the manifest (or the kernel
images) verified against the compiled-in trust root, for authenticity — needs
Ed25519+SHA-512 in the loader (C port of kernel/crypto/{ed25519,sha512}.swift).
U1g-3b — kernel manifest Ed25519 authenticity in the loader (DONE, 2026-06-10)
Scope. U1g-3a gave integrity (a corrupt slot is caught) but not authenticity (the manifest was unsigned, so a tamperer who rewrites a slot can rewrite its hash). U1g-3b signs the manifest and has the loader verify it against the compiled-in image-signing key — the kernel-image analogue of I8's signed base image. This completes the kernel-A/B trust chain: a manifest is honored only with a valid signature; otherwise the loader boots its own embedded blob (never an attacker-chosen slot).
- boot/efi/loader_ed25519.h: header-only SHA-512 + Ed25519 verify (RFC 8032), the compact TweetNaCl shape ported from the tested kernel/crypto/{ed25519,sha512}.swift with curve constants copied verbatim. Host-tested.
- boot/efi/efi_pubkey.S: incbins build/image_trust_root.bin (the same image-signing pubkey the kernel embeds) as efi_image_signing_pubkey.
- SWOSKERN manifest v3: appends a 64-byte Ed25519 signature over the 104-byte body (168 bytes). read_kernel_manifest returns "trusted" only for v3 with a valid signature; v1/v2 (unsigned) and bad-signature manifests are refused ("kernel manifest signature INVALID" / "unsigned … ignoring"). efi_main boots the embedded blob when there is no trusted manifest. Integrity (U1g-3a) then runs within the trusted manifest, so authenticity + integrity are layered.
- tools/kernelboot.swift v3: signs the body with the image-signing seed (host ed25519Sign). Makefile passes $(IMG_SIGNING_SEED); the loader links efi_pubkey.obj.
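The accumulated v1 → v2 → v3 layout can be pinned with a C struct and compile-time offset checks. This is a sketch, not the loader's actual declaration: the struct name is hypothetical, and the widths of the v1 fields (four 32-bit words after the 8-byte magic) and of the size fields (64-bit) are assumptions chosen because they reproduce the documented 24/32/64/72/104/168 offsets exactly.

```c
#include <stddef.h>
#include <stdint.h>

/* SWOSKERN manifest, v3 shape. All multi-byte fields little-endian. */
struct swoskern_manifest_v3 {
    char     magic[8];          /* "SWOSKERN" */
    uint32_t version;           /* ASSUMED u32: 1, 2 or 3 */
    uint32_t active;            /* ASSUMED u32: 0 = A, 1 = B */
    uint32_t fallback;          /* ASSUMED u32 */
    uint32_t generation;        /* ASSUMED u32; v1 ends here (24 bytes) */
    uint64_t slotA_size;        /* offset 24 */
    uint8_t  slotA_sha256[32];  /* offset 32 */
    uint64_t slotB_size;        /* offset 64 */
    uint8_t  slotB_sha256[32];  /* offset 72; v2 ends here (104 bytes) */
    uint8_t  signature[64];     /* offset 104: Ed25519 over bytes [0,104) */
};

_Static_assert(offsetof(struct swoskern_manifest_v3, slotA_size) == 24, "v1 body is 24 bytes");
_Static_assert(offsetof(struct swoskern_manifest_v3, slotA_sha256) == 32, "slot A hash offset");
_Static_assert(offsetof(struct swoskern_manifest_v3, slotB_size) == 64, "slot B size offset");
_Static_assert(offsetof(struct swoskern_manifest_v3, slotB_sha256) == 72, "slot B hash offset");
_Static_assert(offsetof(struct swoskern_manifest_v3, signature) == 104, "v2 body is 104 bytes");
_Static_assert(sizeof(struct swoskern_manifest_v3) == 168, "v3 manifest is 168 bytes");
```

Because every field lands on its natural alignment, the struct needs no packing attribute; the static assertions would fail the build if a field were added or resized without updating the format.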
Acceptance. tests/loader_ed25519_test.c (host, in make test) checks the C
verify against RFC 8032 §7.1 vectors and that it rejects a tampered sig/message.
tests/uefi_kernel_ab_test.sh gains a fourth case: a byte flipped in the
manifest's signature → loader logs "signature INVALID" and boots the embedded
blob. The active-B, missing-slot, and SHA-256-mismatch cases now run against
signed v3 manifests. uefi_boot_test (default disk, signed v3) verifies the
signature and boots slot A to busybox, single-core and -smp 4 — an end-to-end
check that the loader's embedded pubkey matches the signing key.
Note. Verify-only in the loader; signing stays host-side (Swift). The manifest is still single-copy/no-CRC — runtime writes (CRC + double-buffering) come when the OS can flip the kernel slot. The kernel-image A/B trust chain (sign → verify → integrity → fallback) is now complete.
U1g-4a — kernel reaches + parses the ESP (GPT) boot disk (DONE, 2026-06-10)
Scope. First slice of runtime kernel staging (the kernel analogue of U1f's stage/activate). For the OS to stage a new kernel it must reach the ESP the loader boots from. Two findings shaped this:
- Transport. The ESP/GPT disk was attached if=virtio = virtio-PCI on -M virt; the kernel drives only virtio-mmio, so it never saw the ESP. Verified AAVMF boots fine from a virtio-mmio disk, so the boot configs now attach the ESP disk on mmio (if=none,id=esp + -device virtio-blk-device), and both firmware and kernel can drive it.
- Trust model (decided). Runtime staging will follow U1f's courier model: the OS writes pre-signed-offline artifacts (kernel image + signed manifest); it never signs. (The signed-manifest-vs-writable-selection split is a later slice.)

- kernel/drivers/virtio_blk.swift: the device scan now also recognizes a GPT disk by the "EFI PART" magic at LBA 1 (blkBounceIsEfiPart), recording it as blkEspDevice; blkServedDevice tracks the base/store device. Accessors virtioBlkHasEsp(), virtioBlkSelectEsp() (selects ESP, returns capacity), virtioBlkReselectServed(). Two new SMP-audit globals.
- kernel/fs/esp.swift: espProbe() (called after vfsInit) selects the ESP disk, parses the GPT header (LBA 1) + partition array, finds the ESP-type-GUID partition, logs "kernel-store: ESP partition found at LBA N, M sectors", then re-selects the served disk. Read-only; no mutable globals.
- Boot configs (Makefile UEFI flags, disk-run, run-gfx; uefi tests) moved the ESP disk to virtio-mmio.
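The probe above can be sketched in host-side C. This is a minimal illustration, not the kernel's Swift code: the function names (gpt_is_efi_part, gpt_find_esp) are hypothetical, and the field offsets (signature at byte 0 of LBA 1; partition entry with type GUID at 0, first LBA at 32, last LBA at 40) are taken from the UEFI GPT layout rather than from this log.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ESP type GUID C12A7328-F81F-11D2-BA4B-00A0C93EC93B in on-disk byte order
 * (the first three GUID fields are stored little-endian). */
static const uint8_t ESP_TYPE_GUID[16] = {
    0x28, 0x73, 0x2A, 0xC1, 0x1F, 0xF8, 0xD2, 0x11,
    0xBA, 0x4B, 0x00, 0xA0, 0xC9, 0x3E, 0xC9, 0x3B
};

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t rd64(const uint8_t *p) {
    return (uint64_t)rd32(p) | (uint64_t)rd32(p + 4) << 32;
}

/* True when the LBA 1 sector starts with the GPT signature. */
int gpt_is_efi_part(const uint8_t *lba1) {
    return memcmp(lba1, "EFI PART", 8) == 0;
}

/* Scan `count` partition entries of `entry_size` bytes each. On a match,
 * report the partition's start LBA and sector count and return 1. */
int gpt_find_esp(const uint8_t *entries, uint32_t count, uint32_t entry_size,
                 uint64_t *first_lba, uint64_t *sectors) {
    assert(entry_size >= 128);
    for (uint32_t i = 0; i < count; i++) {
        const uint8_t *e = entries + (size_t)i * entry_size;
        if (memcmp(e, ESP_TYPE_GUID, 16) != 0) continue;
        uint64_t first = rd64(e + 32), last = rd64(e + 40);
        if (last < first) return 0;      /* malformed entry */
        *first_lba = first;
        *sectors = last - first + 1;     /* last LBA is inclusive */
        return 1;
    }
    return 0;
}
```

A caller would read LBA 1, check gpt_is_efi_part, then read the partition array at the LBA the header names and pass it to gpt_find_esp.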
Acceptance. tests/uefi_boot_test.sh (disk + SMP-4) now asserts the kernel
locates the ESP partition, and still boots to busybox. uefi_kernel_ab_test.sh
(4 cases) unchanged in behavior with the ESP on mmio.
Then (U1g-4b/c/d + U1g-5). FAT32 read/write, activation, and the signed-manifest/writable-boot-state split landed in later U1g slices.
U1g-4b — kernel FAT32 reader: read the kernel A/B manifest from the ESP (DONE, 2026-06-10)
Scope. With the ESP reachable (U1g-4a), the kernel now reads the loader's kernel A/B manifest off the FAT32 ESP — the read half of runtime staging, and the groundwork for the FAT32 writer (U1g-4c).
- kernel/fs/esp.swift: a minimal read-only FAT32 in Fat32Vol + helpers: fatReadBPB (BPB at the partition's first sector: bytes/sec must be 512, sec/clus, reserved, #FATs, FATSz32, rootClus → firstDataSector), fatClusterLBA, fatNext (FAT32 chain lookup), and fatFindChild (directory walk matching a path component against the assembled LFN long name OR the reconstructed 8.3 short name, case-insensitively, so it finds "EFI" (8.3), "swift-os" (lowercase 8.3), and "kernel-boot" (LFN, short name "KERNEL~1") robustly). fatReadKernelManifest walks \EFI\swift-os\kernel-boot, reads the manifest's first sector, validates "SWOSKERN", and returns the active slot + generation. espProbe now logs "kernel-store: ESP kernel A/B active slot A gen N (read from FAT32)". All InlineArray/stack scratch; no heap on the boot path.
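The BPB arithmetic behind fatReadBPB, fatClusterLBA, and fatNext can be sketched as follows. This is an illustrative C model, not the Swift source: the struct and function names are hypothetical, and the BPB byte offsets (11, 13, 14, 16, 36, 44) follow the standard FAT32 layout.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t part_lba;           /* partition start LBA on the disk */
    uint32_t sec_per_clus;
    uint32_t first_data_sector;  /* relative to part_lba */
    uint32_t root_clus;
} Fat32Geom;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Parse the BPB from the partition's first sector. Returns 0 when the
 * geometry is unsupported (the notes require 512 bytes per sector). */
int fat32_parse_bpb(const uint8_t *sec, uint32_t part_lba, Fat32Geom *g) {
    if (rd16(sec + 11) != 512) return 0;
    uint32_t spc = sec[13], reserved = rd16(sec + 14), nfats = sec[16];
    uint32_t fatsz = rd32(sec + 36);
    if (spc == 0 || nfats == 0 || fatsz == 0) return 0;
    g->part_lba = part_lba;
    g->sec_per_clus = spc;
    g->first_data_sector = reserved + nfats * fatsz;
    g->root_clus = rd32(sec + 44);
    return 1;
}

/* Data clusters are numbered from 2. */
uint64_t fat32_cluster_lba(const Fat32Geom *g, uint32_t cluster) {
    assert(cluster >= 2);
    return (uint64_t)g->part_lba + g->first_data_sector +
           (uint64_t)(cluster - 2) * g->sec_per_clus;
}

/* Interpret a raw 32-bit FAT entry: only the low 28 bits are the cluster
 * number. Returns the next cluster, or 0 for end-of-chain, bad, or free. */
uint32_t fat32_next_from_entry(uint32_t raw) {
    uint32_t v = raw & 0x0FFFFFFF;
    return (v >= 0x0FFFFFF7 || v < 2) ? 0 : v;
}
```

With 32 reserved sectors, two FATs of 1000 sectors, and a partition at LBA 2048, cluster 2 lands at LBA 4080.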
Acceptance. tests/uefi_boot_test.sh (disk + SMP-4) now also asserts the
kernel reads the manifest from FAT32 and reports active slot A — so the BPB,
cluster chain, LFN directory walk, and manifest parse are all exercised end-to-end
(the value must match what the loader independently read and booted). The 4-case
uefi_kernel_ab_test is unaffected (the kernel read is log-only).
Then (U1g-4c/d + U1g-5). FAT32 write, activation, attempts, rollback, health confirmation, and mutable boot-state active selection landed in later U1g slices.
U1g-4c — kernel FAT32 writer: stage the inactive slot image (DONE, 2026-06-10)
Scope. The write half of runtime kernel staging: the kernel writes the inactive kernel slot on the FAT32 ESP. Kept deliberately safe — an in-place copy of the active slot's image into the inactive slot (the two files are the same size, so only data sectors are overwritten; no cluster allocation, FAT, or directory changes). A buggy write can only spoil the inactive slot, which the loader's SHA-256 check (U1g-3a) then rejects, falling back to the still-good active slot — so the bootable slot is never at risk.
- kernel/fs/esp.swift: fatCopyChain walks the src (active) and dst (inactive) cluster chains in lockstep, copying sector-by-sector (virtioBlkRead → virtioBlkWriteSector), then virtioBlkFlush; fatVerifyChain re-reads both chains and confirms every sector matches (so a no-op write fails the verify). espStageActiveToInactive() (capConsole) finds kernelA/B.bin + reads the manifest's active slot, requires equal sizes, copies active→inactive, flushes, verifies. Logs "kernel-store: staged active slot image into inactive slot, verified (FAT32)".
- Syscall 68 SYS_KERNEL_STAGE; /bin/swos-kstage (userland/swos-kstage.swift, bridge swiftos_kernel_stage/kernel_stage); registered in execResolve + the Makefile ELF/staging rules.
Acceptance. tests/uefi_kstage_test.sh (in make test): a disk copy whose
inactive slot B is a byte-flipped (same-size) copy of the kernel; boot under AAVMF
(ESP on mmio), reach a root shell, run /bin/swos-kstage. The kernel copies
slot A over slot B and verifies — which only passes if the write landed (a no-op
would leave B corrupt and fail the in-kernel verify). Proves the FAT32 write path
end-to-end without touching the bootable slot or the manifest.
Then (U1g-4d/U1g-5). The first activate path used a pre-signed courier
manifest; U1g-5 later moved attempts, health confirmation, and mutable active
selection into kernel-state.
U1g-4d — runtime kernel-slot activate: /bin/swos-kactivate (DONE, 2026-06-10)
Scope. The capstone of runtime kernel staging: flip the active kernel slot from a running system, persisted, so the loader boots the newly-activated slot. Because the OS cannot sign, it follows the courier model — it installs an offline-signed alternate manifest rather than producing one.
- A second manifest \EFI\swift-os\kernel-boot-alt (active = slot B) is generated by kernelboot at image build, signed with the image-signing key (Makefile + make-disk stage it alongside kernel-boot).
- kernel/fs/esp.swift: espActivateOtherKernel() (capConsole) reads the live kernel-boot and kernel-boot-alt active slots, requires the alternate to select the other slot, then copies the alternate's manifest sector over kernel-boot in place (virtioBlkWriteSector + virtioBlkFlush) and re-reads to confirm. Logs "kernel-store: activated kernel slot B for next boot (signed manifest)". No new globals.
- Syscall 69 SYS_KERNEL_ACTIVATE; /bin/swos-kactivate (userland/swos-kactivate.swift, bridge swiftos_kernel_activate/kernel_activate); execResolve + Makefile rules.
Acceptance. tests/uefi_kactivate_test.sh (in make test): boot the disk copy
(active A), reach a root shell, run /bin/swos-kactivate; reboot the SAME disk
(cache=writethrough) → the loader logs "kernel A/B manifest active slot B
(signature OK)" and "booted kernel slot B", with no "signature INVALID" — proving
the flip persisted and the offline signature held.
Superseded by U1g-5d. This courier-manifest flow proved activation end to
end, then U1g-5d moved mutable active selection into the writable boot-state so
activation no longer needs kernel-boot-alt.
Still future. A real new-kernel payload source (today both slots are the same build; staging a genuinely different signed kernel needs a payload disk or an update channel) plus key rotation / revocation.
U1g-5a — loader boot-attempt counter on the ESP (DONE, 2026-06-10)
Scope. First slice of kernel attempt-based rollback (the U1b analogue). The
loader gains its first ESP write: a per-slot boot-attempt counter in a writable,
hash-protected kernel-state file, persisted across reboots. This is the
"writable boot-state" half of the signed-selection split — the kernel images stay
independently signed/hashed, so the boot-state need not be (its SHA-256 only
guards against torn/garbage writes, like SWOSBOOT's CRC).
- boot/efi/loader.c: the loader boot-state helpers open \EFI\swift-os\kernel-state with READ|WRITE|CREATE (EFI File protocol), read + validate the 512-byte record ("SWOSKSTA", version, seq, attemptA/B, stateA/B, lastBooted, SHA-256 over [0,480)), re-initialize it if absent/corrupt, increment the booted slot's attempt + seq, rehash, and write it back (Close flushes). Self-managed: no build/disk staging needed; the loader creates it on first boot. Best-effort: a write failure logs but never blocks boot. Uses the existing loader_sha256.h.
- efi.h gains EFI_FILE_MODE_WRITE/CREATE.
Acceptance. tests/uefi_kattempt_test.sh (in make test): boots the SAME
writable disk copy three times under AAVMF (ESP on mmio, cache=writethrough) and
asserts the active slot's counter increments 1→2→3 across reboots — proving the
loader's EFI write lands and persists. The signed manifest (v3) is untouched, so
the existing kernel-A/B tests are unaffected.
Then. U1g-5d moves active into the writable boot-state so activate no
longer needs a pre-signed alternate manifest.
U1g-5b — attempt-based kernel rollback in the loader (DONE, 2026-06-10)
Scope. The U1d analogue for the kernel. The loader now uses the U1g-5a boot-attempt counter to fail over: an unconfirmed active slot that has exhausted its attempts is presumed unhealthy ("boots but never confirmed"), so the loader boots the other slot instead and marks the original FAILED — persisted in the writable boot-state.
- boot/efi/loader.c: loader_bump_attempt refactored into loader_open_kstate, loader_read_kstate (validates / re-inits) + loader_write_kstate (rehash + write). efi_main reads the kernel-state before loading; if the manifest's active slot is not CONFIRMED and attempt >= KS_MAX_ATTEMPTS (3) and a distinct fallback exists, it tries the fallback first (logs "kernel slot A unconfirmed after N attempts, rolling back to slot B"), marks the active slot FAILED, and counts the booted slot's attempt. The existing per-slot SHA-256 load + hash-failure fallback is preserved (a bad active image still fails over too). A CONFIRMED slot stops counting (the U1c hook, used in 5c).
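The rollback rule reads naturally as a small decision function. The sketch below is a model only: KS_MAX_ATTEMPTS = 3 and the CONFIRMED/FAILED states come from the note above, while the struct layout, the numeric state encodings, and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { KS_MAX_ATTEMPTS = 3 };   /* value from the note above */

/* Numeric encodings are hypothetical; only the state names are from the log. */
typedef enum { KS_UNTRIED = 0, KS_CONFIRMED = 1, KS_FAILED = 2 } KsState;

typedef struct {
    uint32_t attempt[2];   /* index 0 = slot A, 1 = slot B */
    KsState  state[2];
} KState;

/* Pick the slot to try first. An unconfirmed active slot that has used up
 * its attempts is marked FAILED and the other slot is returned, provided a
 * distinct fallback exists. */
int ks_pick_slot(KState *ks, int active, int fallback_exists) {
    assert(active == 0 || active == 1);
    if (ks->state[active] != KS_CONFIRMED &&
        ks->attempt[active] >= KS_MAX_ATTEMPTS && fallback_exists) {
        ks->state[active] = KS_FAILED;
        return 1 - active;
    }
    return active;
}

/* Count a boot attempt for the slot actually booted; a CONFIRMED slot
 * stops counting. */
void ks_count_attempt(KState *ks, int slot) {
    assert(slot == 0 || slot == 1);
    if (ks->state[slot] != KS_CONFIRMED) ks->attempt[slot]++;
}
```

Run four times against the same record with slot A active and unconfirmed, this model counts attempts 1, 2, 3 for A and picks slot B on the fourth call, which is the sequence uefi_krollback_test asserts.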
Acceptance. tests/uefi_krollback_test.sh (in make test): boot the same
disk copy 4× (ESP on mmio, cache=writethrough); boots 1–3 record attempts
1/2/3 for the unconfirmed slot A, and boot 4 fails over to slot B ("rolling back
to slot B" + "booted kernel slot B" + the kernel starts). uefi_kattempt_test
(3 boots, no rollback) and the signed kernel-A/B tests are unaffected.
U1g-5c — kernel-slot health confirm from userland (DONE, 2026-06-10)
Scope. The U1c analogue for the ESP kernel A/B path. The loader records the
slot it actually booted in the writable kernel-state; /bin/swos-kconfirm
marks that slot CONFIRMED, resets its attempt counter, rehashes the record, and
flushes the FAT32 ESP write. A confirmed slot stops accruing attempts and is not
rolled back by U1g-5b.
- boot/efi/loader.c: the boot-state offset 32 is now lastBooted; efi_main stores loaded_slot there whenever an ESP slot is booted.
- kernel/fs/esp.swift: adds espConfirmBootedKernel() and the in-place kernel-state read/validate/rehash/write path.
- Syscall 70 SYS_KERNEL_CONFIRM; /bin/swos-kconfirm (userland/swos-kconfirm.swift, bridge swiftos_kernel_confirm/kernel_confirm); Makefile stages it into the base image.
Acceptance. tests/uefi_kconfirm_test.sh (in make test): boot the disk copy
to a root shell, run /bin/swos-kconfirm, then reboot the same writable ESP copy
three more times. The loader stays on slot A, reports boot attempt 0 each time,
and never rolls back.
U1g-5d — mutable kernel active slot in kernel-state (DONE, 2026-06-10)
Scope. Retire kernel-boot-alt. The signed kernel-boot manifest now
authenticates slot sizes/hashes and provides a default active slot; the mutable
active slot lives in the loader-managed, hash-protected kernel-state record.
This matches the SWOSBOOT split: signed immutable bytes plus writable health and
selection state.
- boot/efi/loader.c: kernel-state layout adds active at offset 36. The loader verifies the signed manifest, resolves active from kernel-state when valid, logs the boot-state active slot, and persists the actually booted slot as active after successful load/rollback.
- kernel/fs/esp.swift: espActivateOtherKernel() no longer reads kernel-boot-alt or rewrites the signed manifest. It validates kernel-state, flips active to the other slot, resets that slot to UNTRIED/attempt 0, clears lastBooted, rehashes, writes, flushes, and verifies the sector.
- Makefile and scripts/make-disk.sh: stop generating/copying kernel-boot-alt; make uefi removes stale staged copies.
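How the loader might resolve the active slot under this split, as a hedged C sketch. The "SWOSKSTA" magic and offset 36 come from these notes; the 32-bit little-endian field width, the 0 = A / 1 = B encoding, the hash_ok parameter (standing in for the SHA-256 check), and the function name are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { KS_OFF_ACTIVE = 36 };   /* offset from the note above */

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the slot to treat as active (0 = A, 1 = B). The signed manifest
 * supplies the default; a valid kernel-state record overrides it. */
int resolve_active(const uint8_t *rec, int hash_ok, int manifest_default) {
    assert(manifest_default == 0 || manifest_default == 1);
    if (!hash_ok || memcmp(rec, "SWOSKSTA", 8) != 0) return manifest_default;
    uint32_t active = rd32(rec + KS_OFF_ACTIVE);
    return active <= 1 ? (int)active : manifest_default;
}
```

A torn or garbage record therefore degrades to the signed default rather than blocking boot.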
Acceptance. tests/uefi_kactivate_test.sh now asserts the disk image has no
kernel-boot-alt; after /bin/swos-kactivate, the next boot still reports the
signed manifest default active slot A, then reports kernel-state active slot B
and boots slot B. uefi_kattempt_test, uefi_kconfirm_test, and
uefi_krollback_test cover the adjacent boot-state flows.
Still future. A real new-kernel payload source and key rotation / revocation remain separate follow-ups.
NPM1 — newlib pthread facade probe (DONE, 2026-06-11)
Scope. Add the first C/newlib pthread compatibility slice required by the
Node.js/npm/pm2 runtime track. The facade stays on SwiftOS primitives:
pthread_create maps to thread_create, join/mutex/condition variables/once
use futex waits and wakes, and pthread-specific data is keyed by the current
SwiftOS thread id.
- userland/compat/pthread.h: exposes the pthread declarations for compat builds without requiring each port to pass _POSIX_THREADS.
- userland/compat/stubs.c: implements the weak pthread symbols plus mmap-backed default stacks for newlib-linked C programs.
- /bin/pthreadprobe: proves create/join return values, static mutex init, trylock contention, condition-variable wait/signal, pthread_once, and thread-specific data.
Acceptance. make pthread-test, make docs-test, make clock-test,
make mprotect-test, ./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM2 — newlib select/pselect facade probe (DONE, 2026-06-11)
Scope. Add a POSIX select/pselect surface for C runtimes that expect an
fd-set event primitive. The implementation translates fd_set inputs into the
existing SwiftOS poll syscall, maps readiness back into the read/write/except
sets, handles timeout-only calls, and preserves EBADF for invalid descriptors.
- userland/compat/stubs.c: implements weak select and pselect wrappers over SYS_POLL.
- /bin/selectprobe: proves empty-read timeout, pipe read readiness after a write, pselect write readiness, and select(0, ..., timeout).
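The fd_set-to-poll translation can be sketched on any POSIX host. This is not the stubs.c code: select_via_poll, MAX_FDS, and the demo function are hypothetical names, the timeout is simplified to milliseconds, and the host poll() stands in for SYS_POLL.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

enum { MAX_FDS = 64 };   /* arbitrary bound for this sketch */

/* select() expressed over poll(). Returns the number of set bits across the
 * three sets, 0 on timeout, or -1 with errno set (EBADF for a bad fd). */
int select_via_poll(int nfds, fd_set *rd, fd_set *wr, fd_set *ex, int timeout_ms) {
    struct pollfd pfds[MAX_FDS];
    int n = 0, ready = 0;
    assert(nfds >= 0 && nfds <= MAX_FDS);
    for (int fd = 0; fd < nfds; fd++) {
        short ev = 0;
        if (rd && FD_ISSET(fd, rd)) ev |= POLLIN;
        if (wr && FD_ISSET(fd, wr)) ev |= POLLOUT;
        if (ex && FD_ISSET(fd, ex)) ev |= POLLPRI;
        if (ev) { pfds[n].fd = fd; pfds[n].events = ev; pfds[n].revents = 0; n++; }
    }
    /* With n == 0 this is a pure timeout, like select(0, NULL, NULL, NULL, t). */
    if (poll(pfds, (nfds_t)n, timeout_ms) < 0) return -1;
    if (rd) FD_ZERO(rd);
    if (wr) FD_ZERO(wr);
    if (ex) FD_ZERO(ex);
    for (int i = 0; i < n; i++) {
        short ev = pfds[i].events, re = pfds[i].revents;
        if (re & POLLNVAL) { errno = EBADF; return -1; }
        if ((ev & POLLIN) && (re & (POLLIN | POLLHUP | POLLERR))) { FD_SET(pfds[i].fd, rd); ready++; }
        if ((ev & POLLOUT) && (re & (POLLOUT | POLLERR))) { FD_SET(pfds[i].fd, wr); ready++; }
        if ((ev & POLLPRI) && (re & POLLPRI)) { FD_SET(pfds[i].fd, ex); ready++; }
    }
    return ready;
}

/* Usage: a pipe's read end is not ready before a write and ready after. */
int demo_pipe_becomes_readable(void) {
    int p[2];
    fd_set r;
    if (pipe(p) != 0) return 0;
    if (p[0] >= MAX_FDS) { close(p[0]); close(p[1]); return 0; }
    FD_ZERO(&r); FD_SET(p[0], &r);
    int before = select_via_poll(p[0] + 1, &r, NULL, NULL, 0);
    int wrote = write(p[1], "x", 1) == 1;
    FD_ZERO(&r); FD_SET(p[0], &r);
    int after = select_via_poll(p[0] + 1, &r, NULL, NULL, 100);
    int ok = wrote && before == 0 && after == 1 && FD_ISSET(p[0], &r);
    close(p[0]); close(p[1]);
    return ok;
}
```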
Acceptance. make select-test, make docs-test, make pthread-test,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM3 — newlib fd-flag and socket facade probe (DONE, 2026-06-11)
Scope. Tighten the C/newlib fd and network surface that libuv-shaped
runtimes expect. The compat layer now applies SOCK_NONBLOCK and
SOCK_CLOEXEC on socket, exposes accept4, exposes pipe2, and relies on
the existing fcntl syscall for descriptor/status flag storage. Kernel pipe
read/write now honor O_NONBLOCK with EAGAIN, and the newlib _read,
_write, _close, and _lseek bottom-end stubs translate negative SwiftOS
errors to -1 plus errno.
- userland/compat/unistd.h: declares pipe2 while preserving the sysroot unistd.h through include_next.
- userland/compat/sys/socket.h: declares accept4 beside the existing socket facade.
- /bin/socketprobe: proves pipe2(O_NONBLOCK | O_CLOEXEC), socket(... SOCK_NONBLOCK | SOCK_CLOEXEC ...), guest TCP client exchange, and guest TCP server exchange through accept4.
Acceptance. make socket-test, make docs-test, make select-test,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM4 — newlib eventfd facade probe (DONE, 2026-06-11)
Scope. Add an event notification primitive for libuv-shaped runtimes on the
Node.js/npm/pm2 track. SwiftOS exposes its own eventfd syscall (71), backed by
a fixed VFS event-counter table and ordinary typed handles. The C compatibility
layer provides POSIX-shaped eventfd, eventfd_read, eventfd_write, and
sys/eventfd.h; this is source compatibility, not Linux syscall ABI
compatibility.
- kernel/vfs/handle.swift: adds .event as a typed handle kind.
- kernel/vfs/vfs.swift: adds event counters, blocking/nonblocking 8-byte read/write semantics, EFD_SEMAPHORE, EFD_CLOEXEC, fstat shape, and poll readiness. select inherits readiness through the existing newlib facade.
- /bin/eventfdprobe: proves flags, empty nonblocking EAGAIN, counter poll/read behavior, semaphore reads, and select readiness.
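The counter semantics the probe checks can be modelled in a few lines of C. This is a sketch that assumes SwiftOS follows the usual eventfd rules (read drains the counter, or takes 1 in semaphore mode; an empty nonblocking read reports EAGAIN); the type and function names are hypothetical and the overflow limits are borrowed from Linux eventfd, not from this log.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef struct {
    uint64_t count;
    int semaphore;   /* nonzero models EFD_SEMAPHORE */
} EventCounter;

/* Nonblocking read. Returns 0 and stores the value read, or -EAGAIN when
 * the counter is zero. */
int ev_read(EventCounter *e, uint64_t *out) {
    assert(e != 0 && out != 0);
    if (e->count == 0) return -EAGAIN;
    if (e->semaphore) { *out = 1; e->count -= 1; }
    else { *out = e->count; e->count = 0; }
    return 0;
}

/* Nonblocking write. Adds to the counter; the maximum stored value is
 * UINT64_MAX - 1 (Linux convention, assumed here). */
int ev_write(EventCounter *e, uint64_t add) {
    assert(e != 0);
    if (add == UINT64_MAX) return -EINVAL;
    if (add > UINT64_MAX - 1 - e->count) return -EAGAIN;
    e->count += add;
    return 0;
}

/* poll(POLLIN) readiness: readable while the counter is nonzero. */
int ev_readable(const EventCounter *e) { return e->count != 0; }
```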
Acceptance. make eventfd-test, make docs-test, make select-test,
make socket-test, ./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM5 — newlib signal lifecycle probe (DONE, 2026-06-11)
Scope. Add the first pid-targeted signal lifecycle slice required by the
Node.js/npm/pm2 track. SwiftOS now supports positive-PID kill(pid, 0) probes,
default/ignored signal dispositions through sigaction, signal, and raise,
and kill(child, SIGTERM) termination with waitpid reporting signaled status.
This is still source compatibility, not a complete POSIX signal subsystem:
process groups, blocked-syscall interruption, masks, userspace signal frames,
and libuv signal watchers remain future work.
- kernel/user/process.swift: adds pid-aware process termination for nonrunning targets and safely removes ready targets from the EL0 run queue before zombifying them.
- kernel/signal/signal.swift: tracks SIGTERM alongside SIGINT/SIGPIPE and exposes disposition lookup for process lifecycle control.
- userland/compat/stubs.c: maps kill, signal, raise, and sigaction onto the SwiftOS syscall ABI with POSIX-style errno behavior.
- /bin/signalprobe: proves kill(getpid(), 0), missing-pid ESRCH, SIGTERM ignore/restore old dispositions, child SIGTERM termination, and waitpid signaled status.
Acceptance. make signal-test, make docs-test, make eventfd-test,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM6 — newlib thread synchronization probe (DONE, 2026-06-11)
Scope. Extend the C/newlib thread-runtime slice for libuv-shaped runtimes. SwiftOS now exposes POSIX-shaped unnamed semaphores and pthread read/write locks over the existing futex syscall. This closes a concrete class of Node.js/libuv threading primitives while leaving a full upstream libuv thread audit as future work.
- userland/compat/semaphore.h: adds sem_t and POSIX semaphore declarations missing from the bare-metal newlib sysroot.
- userland/compat/pthread.h: enables the newlib reader/writer lock type and prototypes for compat builds.
- userland/compat/stubs.c: implements sem_init, sem_wait, sem_trywait, sem_timedwait, sem_post, sem_getvalue, pthread_rwlock_*, and rwlock attrs using atomic words plus SYS_FUTEX.
- /bin/threadsyncprobe: proves semaphore gate behavior, timeout reporting, writer exclusion, and concurrent readers under a pthread rwlock.
Acceptance. make threadsync-test, make docs-test, make pthread-test,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM7 — newlib large mmap probe (DONE, 2026-06-11)
Scope. Add an executable C/newlib proof for the large-mapping slice needed
by Node.js/V8-shaped runtimes before the heavier lazy-reservation design work.
SwiftOS already has an eager anonymous mmap arena; this milestone proves a
multi-MiB mapping can be zero-filled, touched across every page, partially
mprotected, partially unmapped, and reused without corrupting the remaining
live range. It does not claim V8-style overcommit/reserve semantics; the Node.js
catalog blocker is now the narrower lazy mmap reservation policy.
- /bin/largemmapprobe: maps 8 MiB through newlib mmap, verifies zero-fill and strided write/read across every page, flips one page RW->RX->RW with mprotect, unmaps the bottom 4 MiB, verifies the next 1 MiB mapping lands in the freed bottom half, and confirms the still-live upper half retained data.
- make largemmap-test: boots QEMU, logs in, runs the probe, and asserts the large-mmap markers.
- Port metadata now records the remaining Node.js memory blocker as lazy mmap reservation policy rather than generic large mmap support.
Acceptance. make largemmap-test, make docs-test, make mprotect-test,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM8 — anonymous mmap reservation/commit probe (DONE, 2026-06-11)
Scope. Add the first lazy anonymous mmap reservation contract needed by
V8-shaped runtimes. SwiftOS now accepts mmap(PROT_NONE) as virtual-address
reservation without resident frames. mprotect inside that reservation commits
missing pages for readable/writable/executable protections, and
mprotect(PROT_NONE) decommits live pages while preserving the reserved VA.
W^X remains enforced: RWX is still rejected. This resolved the generic lazy
reservation blocker and left the narrower MAP_FIXED/guard-page audit for NPM9.
- kernel/user/process.swift: adds per-process anonymous VMA tracking copied across fork/thread creation and reset across exec. The process layer owns PROT_NONE reservation/decommit; kernel/mm/vm.swift still owns real leaf mapping and W^X enforcement.
- userland/compat/sys/mman.h and userland/lib/syscall.h: define MAP_NORESERVE for source compatibility. The flag is accepted by the wrapper; the reservation behavior is driven by PROT_NONE.
- /bin/mmapreserveprobe: reserves 16 MiB with PROT_NONE|MAP_NORESERVE, commits a 1 MiB middle window, verifies zero-fill and writes, decommits and recommits it, proves zero-fill again, then commits a reserved JIT page RW->RX and executes it.
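The caller-side pattern the probe exercises looks roughly like this on a POSIX host. It is a sketch of the reserve/commit/decommit calls only, with hypothetical helper names; it does not reproduce the probe. One difference to keep in mind: on Linux, mprotect(PROT_NONE) keeps page contents, so the "zero-fill again after recommit" property described above is a SwiftOS contract and is not checked here.

```c
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve address space with no access rights. */
void *va_reserve(size_t len) {
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Commit a window inside the reservation as read/write. */
int va_commit_rw(void *base, size_t off, size_t len) {
    return mprotect((char *)base + off, len, PROT_READ | PROT_WRITE);
}

/* Return a window to the no-access state, keeping the reserved range. */
int va_decommit(void *base, size_t off, size_t len) {
    return mprotect((char *)base + off, len, PROT_NONE);
}

/* Usage: reserve 16 MiB, commit a 1 MiB middle window, check that it reads
 * as zero and accepts writes, then decommit it. Returns 1 on success. */
int demo_reserve_commit(void) {
    const size_t MiB = 1024 * 1024;
    unsigned char *a = va_reserve(16 * MiB);
    if (!a) return 0;
    int ok = va_commit_rw(a, 4 * MiB, MiB) == 0;
    for (size_t i = 0; ok && i < MiB; i += 4096)
        if (a[4 * MiB + i] != 0) ok = 0;
    if (ok) {
        a[4 * MiB] = 0x5A;
        ok = a[4 * MiB] == 0x5A && va_decommit(a, 4 * MiB, MiB) == 0;
    }
    munmap(a, 16 * MiB);
    return ok;
}
```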
Acceptance. make mmapreserve-test, make docs-test,
make ports-catalog-test, ./tests/mmap_test.sh, make mprotect-test,
make largemmap-test, ./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM9 — fixed-address mmap guard-page probe (DONE, 2026-06-11)
Scope. Add the fixed-address anonymous mmap contract needed by V8-style
reserved arenas. SwiftOS now passes addr and flags through the C mmap
wrapper. Without MAP_FIXED, addr remains only a hint and the descending
mmap arena chooses the address. With MAP_FIXED, the kernel may replace pages
inside an existing anonymous reservation; MAP_FIXED_NOREPLACE fails with
EEXIST when the target overlaps a reservation or live mapping. Arbitrary
sparse fixed mappings outside an anonymous reservation remain deliberately
unsupported.
- kernel/user/process.swift: accepts fixed-address anonymous mappings inside an existing anonymous VMA, decommits replaced live pages before remapping, and preserves W^X before any destructive replacement.
- userland/lib/syscall.h, userland/compat/sys/mman.h, and userland/compat/stubs.c: expose MAP_FIXED and MAP_FIXED_NOREPLACE and pass mmap flags through the SwiftOS syscall ABI.
- /bin/mapfixedprobe: reserves a PROT_NONE arena, fixed-maps an interior RW window, proves MAP_FIXED_NOREPLACE overlap rejection, proves MAP_FIXED replacement zero-fill, recommits a guard page, executes a fixed-region RW->RX JIT page, and verifies fixed RWX remains rejected.
Acceptance. make mapfixed-test, make docs-test,
make ports-catalog-test, ./tests/mmap_test.sh, make mmapreserve-test,
make mprotect-test, make largemmap-test, ./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM10 — current-process signal handler frame probe (DONE, 2026-06-11)
Scope. Add the smallest tested signal-frame slice needed by
Node.js/npm/pm2-shaped runtimes. SwiftOS now delivers current-process custom C
handlers at syscall-return safe points by building a user-stack signal frame,
entering the registered handler at EL0, and restoring the interrupted trap frame
through a compat sigreturn trampoline. This closes the Node.js catalog blocker
for signal handler frames. Full libuv signal watcher semantics remain future
work: signal masks, process groups, blocked-syscall interruption, and remote
async custom-handler delivery are still not implemented.
- kernel/signal/signal.swift: tracks a userspace restorer per disposition and delivers pending custom handlers only when a syscall-return trap frame is available.
- kernel/user/process.swift: stores/restores a kernel-built user signal frame, guards one active frame per process slot, and resets frame state across fork/thread creation, exec, and reap.
- kernel/syscall/syscall.swift, userland/lib/syscall.h, and userland/compat/stubs.c: add SwiftOS sigreturn number 76 and pass a compat restorer trampoline with sigaction.
- /bin/signalprobe: now proves custom SIGTERM handler delivery via raise, sigreturn frame restore, and old-handler reporting before the child termination status checks.
Acceptance. make signal-test, make docs-test,
make ports-catalog-test, ./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM11 — libuv async eventfd wake probe (DONE, 2026-06-11)
Scope. Add a focused event-loop wake proof for Node.js/libuv-shaped
runtimes. SwiftOS already had pthreads, eventfd counters, and poll readiness as
separate C/newlib probes; this milestone proves the combined pattern libuv
depends on: a worker thread writes to an eventfd while the main thread is blocked
inside poll, and the main thread wakes, drains the counter, and observes the
fd as no longer readable. This is not a full upstream libuv audit; it closes one
concrete async-wake surface while the catalog keeps the broader libuv thread
audit blocker.
- /bin/uvwakeprobe: creates a nonblocking close-on-exec eventfd, starts a pthread worker, waits in poll(POLLIN), verifies the worker's eventfd_write wakes the waiter with the expected counter value, joins the worker, and verifies a drained zero-timeout poll.
- make uvwake-test: boots QEMU, logs in, runs the probe, and asserts the cross-thread wake and drained-poll markers.
Acceptance. make uvwake-test, make docs-test,
make ports-catalog-test, make eventfd-test, make threadsync-test,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM12 — Node.js V8 lite-mode jitless policy (DONE, 2026-06-11)
Scope. Settle the first SwiftOS V8 policy for Node.js. The pinned Node
24.16.0 configure.py documents --v8-lite-mode as a constrained-environment
mode that implies no JIT support, so the initial SwiftOS Node.js recipe keeps
that flag as the accepted jitless profile. This avoids making executable-code
generation a prerequisite for the first runnable Node package; optional V8 JIT
enablement remains a future profile decision. Node.js is still blocked on the
full libuv thread audit before the runtime can be claimed runnable.
- ports/lang/nodejs/Port.json: documents --v8-lite-mode as the chosen jitless V8 profile while keeping the static/no-bundled-npm/no-corepack recipe shape.
- ports/catalog.json: removes the generic "V8 JIT or jitless policy" blocker from Node.js and keeps "full libuv thread audit" as the remaining runtime blocker.
- tests/swport_recipe_test.swift and tests/swport_catalog_test.swift: guard the recipe's V8-lite/static policy and the catalog blocker transition.
Acceptance. make ports-recipe-test, make ports-catalog-test,
make docs-test, ./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM13 — pthread barrier probe for libuv native path (DONE, 2026-06-11)
Scope. Cover the pthread barrier primitive selected by Node's vendored
libuv 1.52.1. SwiftOS newlib exposes PTHREAD_BARRIER_SERIAL_THREAD and
pthread_barrier_* declarations, so libuv's Unix thread layer uses its native
pthread barrier branch rather than the internal mutex/cond fallback. The compat
layer now provides process-local pthread_barrierattr_* and reusable
pthread_barrier_* behavior over the existing futex-backed thread primitives.
This closes one concrete libuv thread primitive while the catalog keeps the
broader full libuv thread audit blocker.
- userland/compat/pthread.h and userland/compat/stubs.c: enable _POSIX_BARRIERS and implement process-local barrier attrs, zero-count rejection, reusable barrier phases, one serial-thread return per phase, busy destroy rejection, and cleanup.
- /bin/uvbarrierprobe: proves libuv's native barrier shape with two worker threads plus the main thread across two reusable phases.
- make uvbarrier-test: boots QEMU, logs in, runs the probe, and asserts the barrier attr/native-path and reusable-phase markers.
Acceptance. make uvbarrier-test, make docs-test,
make ports-catalog-test, make threadsync-test, make pthread-test,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM14 — libuv local socketpair probe (DONE, 2026-06-11)
Scope. Cover the AF_UNIX socketpair(SOCK_STREAM) primitive used by
Node's vendored libuv 1.52.1 for local stream/process pipe paths. SwiftOS still
does not provide a Linux socket ABI, but the VFS now exposes a narrow local
full-duplex pair over two existing pipe queues. Each returned fd carries normal
read/write POSIX rights, supports SOCK_NONBLOCK and SOCK_CLOEXEC, reports
read/write readiness through poll, and reports peer close through
POLLHUP/POLLERR. This closes one concrete libuv local-stream primitive
while the catalog keeps the broader full libuv thread audit blocker.
- kernel/vfs/vfs.swift and kernel/syscall/syscall.swift: add SwiftOS syscall 78 and a full-duplex pipe-pair description that participates in the existing fd rights, read, write, poll, fcntl, close, and S4b VFS accounting paths.
- userland/compat/stubs.c: implements socketpair(AF_UNIX, SOCK_STREAM, 0, fds) over the SwiftOS syscall, including nonblocking/close-on-exec flags and SO_TYPE metadata.
- /bin/uvsocketpairprobe: proves unsupported-domain errors, flags, SO_TYPE, nonblocking empty reads, bidirectional read/write and send/recv, and peer-close readiness.
- make uvsocketpair-test: boots QEMU, logs in, runs the probe, and asserts the local-pair markers.
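From the caller's side, the surface this slice covers is the ordinary POSIX one. The sketch below runs on a POSIX host and only illustrates the shape of the exchange (bidirectional transfer, then peer-close readiness); it is not /bin/uvsocketpairprobe, and the helper names are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send `len` bytes one way across the pair and check they arrive intact. */
static int xfer(int from, int to, const char *msg, size_t len) {
    char buf[16];
    assert(len <= sizeof buf);
    if (write(from, msg, len) != (ssize_t)len) return 0;
    if (read(to, buf, len) != (ssize_t)len) return 0;
    return memcmp(buf, msg, len) == 0;
}

/* Returns 1 when both directions work and closing one end makes the other
 * end report readiness (POLLHUP, or POLLIN signalling end-of-file). */
int demo_socketpair(void) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return 0;
    int ok = xfer(fds[0], fds[1], "ping", 4) && xfer(fds[1], fds[0], "pong", 4);
    close(fds[1]);
    struct pollfd p = { .fd = fds[0], .events = POLLIN, .revents = 0 };
    ok = ok && poll(&p, 1, 100) == 1 && (p.revents & (POLLHUP | POLLIN)) != 0;
    close(fds[0]);
    return ok;
}
```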
Acceptance. make uvsocketpair-test, make docs-test,
make ports-catalog-test, make socket-test, ./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM15 — libuv timed condition wait probe (DONE, 2026-06-11)
Scope. Cover the pthread_cond_timedwait path used by Node's vendored
libuv 1.52.1 in deps/uv/src/unix/thread.c. libuv initializes Unix condition
variables with pthread_condattr_setclock(CLOCK_MONOTONIC) and then passes
monotonic absolute deadlines to pthread_cond_timedwait; SwiftOS previously
accepted the condattr clock but did not provide the timed wait implementation.
The C/newlib compat layer now records process-local condvar clock attributes
out-of-band and supports realtime plus monotonic timed waits over the existing
mutex, condition-sequence, and nanosleep primitives. This closes one concrete
libuv thread primitive while the catalog keeps the broader full libuv thread audit blocker.
- userland/compat/stubs.c: implements pthread_cond_timedwait, preserves condattr clock selection despite newlib's 32-bit pthread_cond_t, returns ETIMEDOUT for expired absolute deadlines, and reacquires the mutex before returning.
- /bin/uvcondprobe: proves a libuv-style CLOCK_MONOTONIC timeout and a worker-thread signal that wakes the waiter before its deadline.
- make uvcond-test: boots QEMU, logs in, runs the probe, and asserts the timed-condition markers.
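A timed wait built on relative sleeps needs two pieces of timespec arithmetic: the caller turns "now + timeout" into an absolute deadline, and the implementation turns that deadline back into a remaining interval (zero meaning ETIMEDOUT). The helper names below are hypothetical and the sketch does not claim to match stubs.c.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Absolute deadline = now + timeout, with tv_nsec kept in [0, 1e9). */
struct timespec deadline_after(struct timespec now, uint64_t timeout_ns) {
    assert(now.tv_nsec >= 0 && now.tv_nsec < NSEC_PER_SEC);
    struct timespec d = now;
    d.tv_sec += (time_t)(timeout_ns / NSEC_PER_SEC);
    d.tv_nsec += (long)(timeout_ns % NSEC_PER_SEC);
    if (d.tv_nsec >= NSEC_PER_SEC) {
        d.tv_sec += 1;
        d.tv_nsec -= NSEC_PER_SEC;
    }
    return d;
}

/* Nanoseconds left until the deadline, or 0 when it has already passed
 * (the case where a timed wait reports ETIMEDOUT). */
uint64_t remaining_ns(struct timespec now, struct timespec deadline) {
    if (now.tv_sec > deadline.tv_sec ||
        (now.tv_sec == deadline.tv_sec && now.tv_nsec >= deadline.tv_nsec))
        return 0;
    int64_t ns = (int64_t)(deadline.tv_sec - now.tv_sec) * NSEC_PER_SEC +
                 (deadline.tv_nsec - now.tv_nsec);
    return (uint64_t)ns;
}
```

Both timestamps must come from the same clock; with libuv that is CLOCK_MONOTONIC, which is why the condattr clock has to be remembered per condition variable.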
Acceptance. make uvcond-test, make docs-test,
make ports-catalog-test, make threadsync-test, make pthread-test,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM16 — libuv signal watcher self-pipe probe (DONE, 2026-06-11)
Scope. Cover the setup and dispatch shape used by Node's vendored libuv
1.52.1 signal watcher on Unix. SwiftOS already supports sigaction, raise,
current-process handler frames, and default SIGTERM termination. The C/newlib
compat layer now also exposes a pthread_sigmask facade over the existing
no-op sigprocmask mask surface, and the new probe exercises the libuv-style
path where a signal handler writes a compact message into a nonblocking pipe
that the event loop polls. This closes one concrete signal-watcher primitive
while the catalog keeps full signal-mask enforcement, remote async handler
delivery, and the broader full libuv thread audit blocker.
- userland/compat/stubs.c: adds pthread_sigmask with pthread-style error returns and validates sigprocmask operations when a new mask is supplied.
- /bin/uvsignalprobe: proves pthread_sigmask(SIG_SETMASK, ...), a libuv-shaped signal lock pipe, sigaction(SIGTERM, SA_RESTART), handler writes into a nonblocking signal pipe, poll(POLLIN), message drain, and disposition restoration.
- make uvsignal-test: boots QEMU, logs in, runs the probe, and asserts the signal-watcher markers.
Acceptance. make uvsignal-test, make docs-test,
make ports-catalog-test, make signal-test, ./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
NPM17 — libuv pthread_atfork probe (DONE, 2026-06-11)
Scope. Cover the pthread_atfork prepare/parent/child callback ordering
that Node's vendored libuv uses to reinitialize process-global state after
fork. SwiftOS still exposes its own POSIX-like syscall surface rather than a
Linux ABI, but the C/newlib compat layer now keeps a small process-local
atfork registry and routes fork/the current vfork alias through the same
handler path. This closes one concrete libuv process primitive while the
catalog keeps the broader full libuv thread audit blocker.
- userland/compat/stubs.c: replaces the old no-op pthread_atfork with a bounded handler registry, reverse-order prepare callbacks, registration-order parent/child callbacks, parent cleanup on failed fork, and child-side compat lock reset before child callbacks.
- /bin/uvatforkprobe: proves two-handler ordering, parent/child memory isolation after fork, and a pipe report from the child back to the parent.
- make uvatfork-test: boots QEMU, logs in, runs the probe, and asserts the atfork ordering markers.
Acceptance. make uvatfork-test, make docs-test,
make ports-catalog-test, make signal-test, ./tests/cow_test.sh,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
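The callback ordering can be observed with a small POSIX program. The sketch below is not the uvatforkprobe source; atfork_order and the single-letter markers are hypothetical. Handler set A is registered before set B, so prepare should run B then A, while parent and child callbacks run A then B.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char order[16];
static size_t order_len;

static void note(char c)
{
    if (order_len + 1 < sizeof order) {
        order[order_len++] = c;
        order[order_len] = '\0';
    }
}

static void prepare_a(void) { note('A'); }
static void prepare_b(void) { note('B'); }
static void parent_a(void)  { note('a'); }
static void parent_b(void)  { note('b'); }
static void child_a(void)   { note('x'); }
static void child_b(void)   { note('y'); }

/* Fork once and report the callback order seen on each side.
 * Both output buffers must hold at least 16 bytes. Returns 0 on success. */
int atfork_order(char *parent_seen, char *child_seen)
{
    static int registered;
    int report[2], status;
    ssize_t n;
    pid_t pid;

    if (!registered) {
        if (pthread_atfork(prepare_a, parent_a, child_a) != 0 ||
            pthread_atfork(prepare_b, parent_b, child_b) != 0)
            return -1;
        registered = 1;
    }
    order_len = 0;
    order[0] = '\0';
    if (pipe(report) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(report[0]);
        close(report[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: its private copy of order[] holds prepare + child marks. */
        n = write(report[1], order, order_len);
        _exit(n == (ssize_t)order_len ? 0 : 1);
    }
    close(report[1]);
    n = read(report[0], child_seen, 15);
    close(report[0]);
    child_seen[n > 0 ? n : 0] = '\0';
    strcpy(parent_seen, order); /* parent copy never sees the child marks */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because the child writes its own copy of the buffer back over a pipe, the same run also shows parent/child memory isolation: the parent's buffer never contains the child markers.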
NPM18 — libuv mutex type probe (DONE, 2026-06-11)
Scope. Cover the mutex attribute types used by Node's vendored libuv
1.52.1 Unix thread wrappers. uv_mutex_init() uses
PTHREAD_MUTEX_ERRORCHECK when available, and uv_mutex_init_recursive()
requires PTHREAD_MUTEX_RECURSIVE; SwiftOS previously accepted only normal
and default mutex types. The C/newlib compat layer now keeps a small
process-local mutex metadata table keyed by pthread_mutex_t *, preserving
the existing 32-bit futex word while adding owner tracking for error-check
mutexes and recursion depth for recursive mutexes. This closes one concrete
libuv thread primitive while the catalog keeps the broader full libuv thread audit blocker.
- userland/compat/stubs.c: accepts PTHREAD_MUTEX_ERRORCHECK and PTHREAD_MUTEX_RECURSIVE, records typed mutex metadata out of band, returns EDEADLK for same-thread relock of error-check mutexes, returns EPERM for foreign unlocks, and maintains recursive lock depth.
- /bin/uvmutexprobe: proves invalid type rejection, error-check lock behavior, cross-thread unlock rejection, recursive lock/trylock depth, and post-release cross-thread acquisition.
- make uvmutex-test: boots QEMU, logs in, runs the probe, and asserts the libuv mutex-type markers.
Acceptance. make uvmutex-test, make docs-test,
make ports-catalog-test, make pthread-test, make threadsync-test,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
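The single-threaded part of these mutex-type checks can be sketched in POSIX C. The helper names are hypothetical and this is not the uvmutexprobe source; the cross-thread cases (foreign unlock, post-release acquisition) are left out to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialise a mutex of the given type; returns a pthread-style error. */
static int init_typed(pthread_mutex_t *m, int type)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Relocking an error-check mutex from its owner should report EDEADLK
 * instead of deadlocking. */
int errorcheck_relock(void)
{
    pthread_mutex_t m;
    int rc;

    if (init_typed(&m, PTHREAD_MUTEX_ERRORCHECK) != 0)
        return -1;
    pthread_mutex_lock(&m);
    rc = pthread_mutex_lock(&m);
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}

/* Unlocking an error-check mutex the caller does not hold should
 * report EPERM. */
int errorcheck_unowned_unlock(void)
{
    pthread_mutex_t m;
    int rc;

    if (init_typed(&m, PTHREAD_MUTEX_ERRORCHECK) != 0)
        return -1;
    rc = pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}

/* Lock a recursive mutex `depth` times, unwind it, and check that one
 * extra unlock is rejected. Returns the depth reached, or -1. */
int recursive_depth(int depth)
{
    pthread_mutex_t m;
    int i, held = 0, extra;

    if (init_typed(&m, PTHREAD_MUTEX_RECURSIVE) != 0)
        return -1;
    for (i = 0; i < depth; i++)
        if (pthread_mutex_lock(&m) == 0)
            held++;
    for (i = 0; i < held; i++)
        pthread_mutex_unlock(&m);
    extra = pthread_mutex_unlock(&m); /* depth is back at zero */
    pthread_mutex_destroy(&m);
    return extra == EPERM ? held : -1;
}
```

The error codes asserted here are the ones POSIX specifies for error-check and recursive mutexes, which is the contract the compat layer has to reproduce out of band.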
NPM19 — libuv thread-name probe (DONE, 2026-06-12)
Scope. Cover the pthread thread-name helpers used by Node's vendored libuv
1.52.1 Unix thread layer. uv_thread_setname() and uv_thread_getname() call
pthread_setname_np and pthread_getname_np on the generic Unix path; SwiftOS
previously exposed the newlib declarations but did not implement the symbols in
the compat layer. The C/newlib facade now keeps bounded, process-local names for
created pthread records plus the main thread, using the 16-byte limit that libuv
selects for generic Unix/Linux-shaped pthread names. This closes another
concrete libuv thread primitive while the catalog keeps the broader full libuv thread audit blocker.
- userland/compat/pthread.h: exposes pthread_setname_np and pthread_getname_np to compat builds independent of feature-test macro choices in the bare-metal newlib sysroot.
- userland/compat/stubs.c: stores per-thread names, rejects overlong names with ERANGE, returns ERANGE for undersized get buffers, and returns ESRCH after a thread record has been joined and released.
- /bin/uvthreadnameprobe: proves default main-thread name, main-thread set/get, name length errors, missing-thread errors, parent-set worker names, worker self-set names, and joined-thread cleanup.
- make uvthreadname-test: boots QEMU, logs in, runs the probe, and asserts the libuv thread-name markers.
Acceptance. make uvthreadname-test, make docs-test,
make ports-catalog-test, make pthread-test, make uvmutex-test,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
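For reference, the 16-byte name contract looks like this on a Linux-shaped libc, assuming the two-argument pthread_setname_np(thread, name) and three-argument pthread_getname_np(thread, buf, len) signatures (other systems differ). thread_name_roundtrip is a hypothetical helper, not the uvthreadnameprobe source, and the ESRCH-after-join case is not covered.

```c
#define _GNU_SOURCE /* the *_np name helpers are non-portable extensions */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

#define THREAD_NAME_MAX 16 /* 15 characters plus the terminating NUL */

/* Rename the calling thread, read the name back into `out`, then restore
 * the previous name. Returns 0 or the first pthread-style error. */
int thread_name_roundtrip(const char *name, char *out, size_t out_len)
{
    char saved[THREAD_NAME_MAX];
    int rc;

    rc = pthread_getname_np(pthread_self(), saved, sizeof saved);
    if (rc != 0)
        return rc;
    rc = pthread_setname_np(pthread_self(), name);
    if (rc != 0)
        return rc; /* ERANGE when the name does not fit in 16 bytes */
    rc = pthread_getname_np(pthread_self(), out, out_len);
    /* rc is ERANGE here when out_len is below 16 bytes */
    pthread_setname_np(pthread_self(), saved);
    return rc;
}
```

Both length checks return ERANGE, matching the error mapping the compat layer implements for overlong names and undersized buffers.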
NPM20 — libuv semaphore probe (DONE, 2026-06-12)
Scope. Cover the POSIX semaphore behavior used by Node's vendored libuv
1.52.1 Unix semaphore wrappers. SwiftOS already provided process-local
sem_* primitives for the broader C thread-sync facade; this milestone ties
that surface directly to the libuv audit by proving init/destroy, empty
sem_trywait, realtime absolute sem_timedwait timeout, cross-thread
sem_post wakeup, counting post/wait semantics, and overflow rejection. This
closes another concrete libuv thread primitive while the catalog keeps the
broader full libuv thread audit blocker.
- /bin/uvsemprobe: proves the libuv-shaped POSIX semaphore paths over the existing newlib compat sem_* implementation plus pthread worker wakeup.
- make uvsem-test: boots QEMU, logs in, runs the probe, and asserts the semaphore markers.
- Catalog and command/API docs now list uvsemprobe alongside the other Node/libuv compatibility probes.
Acceptance. make uvsem-test, make docs-test,
make ports-catalog-test, make threadsync-test, make pthread-test,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
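Three of the semaphore behaviours listed above (empty sem_trywait, an expired realtime sem_timedwait, and counting post/wait) can be sketched with unnamed POSIX semaphores. The function names are hypothetical and this is not the uvsemprobe source; the cross-thread wakeup and overflow cases are omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* sem_trywait on an empty semaphore: returns the errno it fails with. */
int empty_trywait_errno(void)
{
    sem_t sem;
    int err = 0;

    if (sem_init(&sem, 0, 0) != 0)
        return -1;
    if (sem_trywait(&sem) != 0)
        err = errno;
    sem_destroy(&sem);
    return err;
}

/* sem_timedwait takes an absolute CLOCK_REALTIME deadline; one second in
 * the past must fail immediately. Returns the errno it fails with. */
int expired_timedwait_errno(void)
{
    sem_t sem;
    struct timespec deadline;
    int err = 0;

    if (sem_init(&sem, 0, 0) != 0)
        return -1;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec -= 1;
    if (sem_timedwait(&sem, &deadline) != 0)
        err = errno;
    sem_destroy(&sem);
    return err;
}

/* Post `posts` times, then count how many non-blocking waits succeed. */
int count_after_posts(int posts)
{
    sem_t sem;
    int i, taken = 0;

    if (sem_init(&sem, 0, 0) != 0)
        return -1;
    for (i = 0; i < posts; i++)
        sem_post(&sem);
    while (sem_trywait(&sem) == 0)
        taken++;
    sem_destroy(&sem);
    return taken;
}
```

Unlike the pthread calls, the sem_* functions return -1 and set errno, which is why these helpers report errno rather than a return code.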
NPM21 — libuv rwlock probe (DONE, 2026-06-12)
Scope. Cover the POSIX read/write lock behavior used by Node's vendored
libuv 1.52.1 Unix rwlock wrappers. SwiftOS already provided process-local
pthread_rwlock_* primitives for the broader C thread-sync facade; this
milestone ties that surface directly to the libuv audit by proving attr/init,
static initializer use, writer exclusion, concurrent readers, and a blocked
writer waking once readers release the lock. This closes another concrete libuv
thread primitive while the catalog keeps the broader full libuv thread audit
blocker.
- /bin/uvrwlockprobe: proves the libuv-shaped pthread rwlock paths over the existing newlib compat implementation plus pthread/sem worker coordination.
- make uvrwlock-test: boots QEMU, logs in, runs the probe, and asserts the rwlock markers.
- Catalog and command/API docs now list uvrwlockprobe alongside the other Node/libuv compatibility probes.
Acceptance. make uvrwlock-test, make docs-test,
make ports-catalog-test, make threadsync-test, make pthread-test,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
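The reader/writer exclusion rules can be checked without worker threads by using the non-blocking try variants. This is a sketch under that simplification, with a hypothetical function name; the probe's blocked-writer wakeup needs real threads and is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Returns 1 when the lock shows the expected exclusion behaviour:
 * readers share, a writer is refused while readers hold the lock, and
 * a reader is refused while a writer holds it. */
int rwlock_exclusion(void)
{
    pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER; /* static init path */
    int r1, r2, w_while_read, w_after, r_while_write;

    r1 = pthread_rwlock_tryrdlock(&lock);
    r2 = pthread_rwlock_tryrdlock(&lock);            /* second reader joins */
    w_while_read = pthread_rwlock_trywrlock(&lock);  /* writer is excluded */
    if (r2 == 0)
        pthread_rwlock_unlock(&lock);
    if (r1 == 0)
        pthread_rwlock_unlock(&lock);

    w_after = pthread_rwlock_trywrlock(&lock);       /* readers are gone */
    r_while_write = pthread_rwlock_tryrdlock(&lock); /* reader is excluded */
    if (w_after == 0)
        pthread_rwlock_unlock(&lock);

    pthread_rwlock_destroy(&lock);
    return r1 == 0 && r2 == 0 && w_while_read == EBUSY &&
           w_after == 0 && r_while_write == EBUSY;
}
```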
NPM22 — libuv thread stack probe (DONE, 2026-06-12)
Scope. Cover the stack-size path used by Node's vendored libuv 1.52.1
uv_thread_create_ex implementation. That path calculates a usable thread
stack from getrlimit(RLIMIT_STACK), getpagesize(), libuv's 8192-byte floor,
PTHREAD_STACK_MIN, and pthread_attr_setstacksize before creating a pthread.
SwiftOS already exposed the underlying pieces; this milestone ties them to the
libuv audit with a dedicated C/newlib probe and makes getpagesize() explicit
in the compat <unistd.h> header.
- /bin/uvthreadstackprobe: proves the libuv-shaped stack limit/page-size calculation, pthread attr bounds, a rounded requested-stack thread, and an RLIMIT_STACK-sized thread.
- make uvthreadstack-test: boots QEMU, logs in, runs the probe, and asserts the stack-sizing markers.
- Catalog and command/API docs now list uvthreadstackprobe alongside the other Node/libuv compatibility probes.
Acceptance. make uvthreadstack-test, make docs-test,
make ports-catalog-test, make pthread-test, ./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
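The stack-size arithmetic described above can be written as two pure functions. This is a sketch of the calculation's shape only: the function names are hypothetical, the inputs that libuv reads from getrlimit, getpagesize, and PTHREAD_STACK_MIN are passed as parameters so the example runs anywhere, and the fallback value is whatever default the caller supplies.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_FLOOR 8192u /* the 8192-byte floor mentioned above */

/* Effective minimum: the larger of the floor and PTHREAD_STACK_MIN. */
static size_t min_stack(size_t pthread_stack_min)
{
    return pthread_stack_min > STACK_FLOOR ? pthread_stack_min : STACK_FLOOR;
}

/* Default path: round the RLIMIT_STACK soft limit down to a page
 * multiple and use it only if it still meets the minimum. */
size_t stack_from_rlimit(size_t rlim_cur, size_t page,
                         size_t pthread_stack_min, size_t fallback)
{
    size_t usable = rlim_cur - (rlim_cur % page);
    return usable >= min_stack(pthread_stack_min) ? usable : fallback;
}

/* Explicit request: round up to a page multiple (page must be a power
 * of two), then clamp to the minimum. */
size_t stack_from_request(size_t requested, size_t page,
                          size_t pthread_stack_min)
{
    size_t rounded = (requested + page - 1) & ~(page - 1);
    size_t floor = min_stack(pthread_stack_min);
    return rounded < floor ? floor : rounded;
}
```

For example, with 4 KiB pages and a 16 KiB PTHREAD_STACK_MIN, a 100000-byte request rounds up to 102400 bytes, while a 10000-byte request is clamped to 16384.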
NPM23 - libuv process spawn handshake probe (DONE, 2026-06-12)
Node's child_process path enters libuv's Unix uv_spawn implementation, where
the parent blocks signals around fork, the child maps stdio with dup2, and a
close-on-exec error pipe tells the parent whether execvp succeeded. That
contract matters for Node, npm install scripts, and PM2 child lifecycles, so this
milestone captures it in a dedicated C/newlib probe before attempting a real
Node runtime.
- /bin/uvspawnprobe: proves the successful execvp path with EOF on the close-on-exec error pipe, argv/stdout capture through dup2, and waitpid exit status.
- The same probe also covers the failed-execvp path where the child writes -errno to the error pipe and exits with status 127.
- make uvspawn-test: boots QEMU, logs in, runs the probe, and asserts the libuv-shaped spawn markers.
- Catalog and command/API docs now list uvspawnprobe as the process-spawn bridge for Node, npm, and PM2 planning.
Acceptance. make uvspawn-test, make docs-test,
make ports-catalog-test, make signal-test, ./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
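The error-pipe handshake can be sketched in POSIX C as below. spawn_wait is a hypothetical name and this is not the uvspawnprobe source; signal blocking around fork and the dup2 stdio mapping are left out so the close-on-exec contract stays visible.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawn argv[0] via execvp. On return, *exec_error is 0 if the exec
 * succeeded (the parent read EOF on the error pipe) or the negative errno
 * the child reported. Returns the child's exit status, or -1 if the spawn
 * itself failed. */
int spawn_wait(char *const argv[], int *exec_error)
{
    int err_pipe[2], status, reported = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(err_pipe) != 0)
        return -1;
    /* Close-on-exec: a successful exec closes the child's write end, so
     * the parent sees EOF instead of an error code. */
    fcntl(err_pipe[0], F_SETFD, FD_CLOEXEC);
    fcntl(err_pipe[1], F_SETFD, FD_CLOEXEC);

    pid = fork();
    if (pid < 0) {
        close(err_pipe[0]);
        close(err_pipe[1]);
        return -1;
    }
    if (pid == 0) {
        int err;
        close(err_pipe[0]);
        execvp(argv[0], argv);
        err = -errno; /* only reached when exec failed */
        n = write(err_pipe[1], &err, sizeof err);
        (void)n;
        _exit(127);
    }

    close(err_pipe[1]);
    do {
        n = read(err_pipe[0], &reported, sizeof reported);
    } while (n < 0 && errno == EINTR);
    close(err_pipe[0]);
    *exec_error = (n == (ssize_t)sizeof reported) ? reported : 0;

    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent distinguishes the two outcomes purely by what it reads: zero bytes means the new image is running, and a full int means exec failed with that negated errno.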
NPM24 - libuv key/once/thread identity probe (DONE, 2026-06-12)
Node and libuv use the Unix thread layer for one-time initialization, thread-local request/runtime state, worker identity comparisons, joins, and detached helper threads. SwiftOS already had the underlying pthread facade; this milestone ties the remaining key/once/identity pieces directly to libuv-shaped wrappers so the Node runtime audit can retire another thread-layer gap.
- /bin/uvkeyonceprobe: proves uv_once-style one-time initialization, thread-local key create/get/set/delete behavior, thread self/equality checks, joined worker completion, and detached worker completion.
- make uvkeyonce-test: boots QEMU, logs in, runs the probe, and asserts the key/once/thread identity markers.
- Catalog and command/API docs now list uvkeyonceprobe alongside the other Node/libuv compatibility probes.
Acceptance. make uvkeyonce-test, make docs-test,
make ports-catalog-test, make pthread-test, make uvthreadstack-test,
./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
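A compact POSIX sketch of the once/key/identity checks follows. key_once_check and the tag variables are hypothetical; key deletion and detached workers, which the probe also covers, are omitted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_once_t once = PTHREAD_ONCE_INIT;
static pthread_key_t key;
static int init_calls;

/* Runs exactly once per process, no matter how many threads ask. */
static void init_key(void)
{
    init_calls++;
    pthread_key_create(&key, NULL);
}

/* Worker: a new thread starts with a NULL slot, stores its own tag, and
 * returns what it reads back. */
static void *worker(void *arg)
{
    pthread_once(&once, init_key);
    if (pthread_getspecific(key) != NULL)
        return NULL;
    pthread_setspecific(key, arg);
    return pthread_getspecific(key);
}

/* Returns how many times init_key ran (expected 1), or -1 on failure. */
int key_once_check(void)
{
    pthread_t t;
    void *seen = NULL;
    int main_tag = 1, worker_tag = 2;

    pthread_once(&once, init_key);
    pthread_once(&once, init_key); /* second call must not re-run init */
    pthread_setspecific(key, &main_tag);

    if (pthread_create(&t, NULL, worker, &worker_tag) != 0)
        return -1;
    if (pthread_equal(t, pthread_self()))
        return -1; /* the worker must have a distinct identity */
    if (pthread_join(t, &seen) != 0 || seen != &worker_tag)
        return -1;
    if (pthread_getspecific(key) != &main_tag)
        return -1; /* the main thread's slot is untouched by the worker */
    return init_calls;
}
```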
NPM25 - execve envp and libuv environment handoff probe (DONE, 2026-06-12)
Node, npm, and PM2 all rely on process.env plus custom child-process
environment handoff. Libuv's Unix spawn path can override environ in the child
before execvp, and newlib's getenv/setenv/unsetenv must see the same
environment vector that execve placed on the new process stack. SwiftOS
previously accepted an envp argument at the syscall boundary but did not copy
it into the replacement image.
- execve(path, argv, envp) now packs both argv and envp from the caller and builds both vectors on the new user stack.
- The newlib crt0 now initializes global environ from the incoming envp before calling main, so libc environment helpers operate on the inherited environment.
- execvpe now passes its explicit environment through SwiftOS path search instead of falling back to the ambient environ.
- /bin/uvenvprobe plus /bin/envchild prove parent getenv/setenv/unsetenv, libuv-style environ override before execvp, child-side main(..., envp)/environ agreement, and parent environment preservation after the child exits.
- Catalog and command/API docs now list uvenvprobe as the Node/npm/PM2 environment handoff bridge.
Acceptance. make uvenv-test, make docs-test,
make ports-catalog-test, make uvspawn-test, ./tests/boot_test.sh, and
SMP_CPUS=4 SMP_DTB=build/virt-smp4.dtb ./tests/smp_boot_test.sh.
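The environment handoff can be sketched on a generic POSIX host as follows. run_with_env is a hypothetical helper that uses /bin/sh as a stand-in for /bin/envchild; it shows the child-only environ override before execvp and leaves the parent's environment alone.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run `/bin/sh -c script` with child_env as the child's entire
 * environment. Returns the script's exit status, or -1 on failure. */
int run_with_env(char *const child_env[], const char *script)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *const argv[] = { "sh", "-c", (char *)script, NULL };
        /* Child-only override: the parent's environ is a separate copy. */
        environ = (char **)child_env;
        execvp("/bin/sh", argv);
        _exit(127);
    }
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller can set a variable in the parent, pass a different value in child_env, and confirm afterwards that getenv in the parent still returns the original value.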
NPM26 - first Node.js cross-build attempt / configure frontier (DONE, 2026-06-15)
The NPM1–NPM25 probes individually validated every libuv/newlib primitive the catalog lists under the Node.js "full libuv thread audit" blocker. The next step in discharging that blocker is to actually drive Node's own build and record the first concrete wall, rather than add more isolated probes. This milestone stands up the real build driver and asserts the current frontier.
- New scripts/build-node.sh is the growing cross-build entry point for ports/lang/nodejs. It reads the pinned source URL + sha256 directly from Port.json (so script and recipe cannot drift), fetches and verifies the Node 24.16.0 distfile, extracts it, and runs upstream configure.py with the exact argument vector recorded in the recipe (--dest-cpu=arm64 --dest-os=swiftos --cross-compiling --fully-static --without-dtrace --without-etw --without-npm --without-corepack --v8-lite-mode).
- New make node-configure-probe runs the driver. The distfile sha256 (f511d32e3876cb54fa6ddccaa8dd46649ae6ebe9e499c57531c5ca56e7ad4548) matches the recipe pin, confirming the scaffolded Port.json source is correct.
- Frontier found. Vanilla configure.py rejects --dest-os=swiftos: swiftos is not in its fixed valid_os tuple (win, mac, solaris, freebsd, openbsd, linux, android, aix, cloudabi, os400, ios, openharmony). Splicing swiftos into that tuple by hand only exposes the wall immediately behind it: GYP fails because no swiftos flavor exists across GYP, libuv (deps/uv), and V8 (deps/v8). Each selects platform backends by OS name (libuv linux=epoll, bsd=kqueue, sunos=event ports; there is no generic POSIX event backend), so a swiftos target requires a deliberate platform port across all three trees. The recipe's --dest-os=swiftos is therefore aspirational; the catalog's "full libuv thread audit" blocker resolves into a concrete platform-port series (configure flavor → GYP flavor → libuv backend → V8 platform), not a single switch.
- The probe asserts this state: build-node.sh treats "configure rejects swiftos at the valid_os wall" as a PASS, and fails loudly if configure ever succeeds or fails elsewhere (frontier moved → recipe needs advancing). This keeps the build driver honest as later milestones clear each wall.
Next (NPM27). Decide the platform strategy — add a first-class swiftos
flavor to configure.py + GYP and a libuv backend that uses our poll-based
event path (we have poll, eventfd, futex; no epoll), versus masquerading
as linux and shimming. Then re-run make node-configure-probe to advance the
frontier to the next wall (expected: GYP/libuv backend selection).
Acceptance. make node-configure-probe, make docs-test,
make ports-catalog-test.
NPM27 - Node configure passes via linux masquerade; libuv backend wall (DONE, 2026-06-15)
Strategy decision for the platform wall found in NPM26: for the first build pass,
masquerade as linux and close the resulting gaps in newlib/compat, rather
than standing up a first-class swiftos platform across configure, GYP, libuv,
and V8 (deferred; that is the larger, cleaner long-term port). Two findings
unblocked configure:
- Recipe carried dead flags. ports/lang/nodejs/Port.json passed --without-dtrace and --without-etw, which Node 24.16's configure.py no longer defines. configure forwards unknown args to GYP, so GYP aborted with gyp: --without-etw not found while trying to load --without-etw. Both flags are removed from the recipe; --without-npm, --without-corepack, --fully-static, and --v8-lite-mode remain valid.
- Masquerade works at configure time. With --dest-os=linux --dest-cpu=arm64 --cross-compiling --fully-static --v8-lite-mode (CC=aarch64-elf-gcc, CXX=aarch64-elf-g++), configure.py now reports configure completed successfully. The recipe args and build-node.sh were updated to this set; build-node.sh maps the eventual swiftos target to NODE_DEST_OS (default linux).
- Frontier moved into the build. libuv's linux backend (deps/uv/src/unix/linux.c) hard-includes <sys/epoll.h>, <sys/inotify.h>, and <sys/syscall.h>. The probe compiles a one-line TU per header with the SwiftOS include path and confirms all three are ABSENT: SwiftOS has poll/eventfd/futex but no epoll. So the next wall is libuv's event backend, not configure.
- make node-configure-probe now asserts this state: configure must succeed and the epoll-class headers must be absent; it fails loudly if either changes.
Next (NPM28). Steer libuv to its existing posix-poll.c backend (which uses
poll, present on SwiftOS) instead of shimming epoll, by adjusting the libuv
GYP backend selection for this target, then advance build-node.sh past the
libuv compile to the next wall (expected: further newlib/compat gaps in libuv
core or V8 platform glue, and the host-mksnapshot cross-build step).
Acceptance. make node-configure-probe, make docs-test,
make ports-catalog-test, make ports-recipe-test.
NPM28 - libuv linux-backend header surface (DONE, 2026-06-15)
The plan from NPM27 was to steer libuv to its posix-poll.c backend, but
inspection of deps/uv/uv.gyp showed the linux backend (src/unix/linux.c) is
monolithic: it bundles the epoll event loop, inotify fs-events, and procfs
cpu/memory queries. posix-poll.c only ships on aix/os400 and supplies just the
event loop, so swapping to it would drop libuv's cpu/mem/fs functions and create
a wave of undefined symbols. Decision: keep OS==linux and supply the missing
Linux headers as shims, emulating epoll over poll (SwiftOS has
poll/eventfd/futex, no epoll) rather than shimming epoll 1:1.
- New userland/node-compat/ holds the Linux-API shims, deliberately separate from userland/compat so adding epoll/inotify/etc. cannot change feature detection for the other source ports (nginx, curl, ...) that build against the shared compat layer. build-node.sh puts it on the include path ahead of userland/compat for the Node build only. Headers added (declarations only; behaviour deferred to the companion implementation): sys/epoll.h, sys/inotify.h, ifaddrs.h, netpacket/packet.h, net/ethernet.h, sys/prctl.h, sys/syscall.h, syscall.h, dlfcn.h, plus #include_next shadow headers that add MAP_POPULATE (sys/mman.h), IFF_UP/RUNNING/LOOPBACK (net/if.h), and AF_PACKET/PF_PACKET (sys/socket.h).
- Two build-config requirements identified. libuv keys its loop struct platform fields (epoll fd, inotify watchers, io_uring) on the compiler-defined __linux__, which aarch64-elf-gcc does not set, so the build must pass -D__linux__. newlib gates pthread_rwlock_t/pthread_barrier_t typedefs on _POSIX_READER_WRITER_LOCKS/_POSIX_BARRIERS (matching the existing NEWLIB_COMPAT_CFLAGS), so those -Ds are required too.
- Result: with the shims + those -Ds, deps/uv/src/unix/linux.c (the file carrying every Linux-only dependency) now compiles to an object. make node-configure-probe asserts this (configure succeeds AND linux.c compiles against node-compat); it fails loudly if the surface regresses.
- Surface enumerated for NPM29+. A full sweep of libuv's unix sources to .o shows the remaining work splits cleanly: (a) a constant long-tail in 8 other files (cpu_set_t/CPU_* + pthread_*affinity_np in thread.c, CMSG_* in stream.c, sys/sendfile.h in fs.c, linux/errqueue.h in udp.c, rusage fields + SYS_close/SYS_gettid + FIONBIO/MSG_CMSG_CLOEXEC in core.c, SA_RESETHAND in signal.c, TIOCGPTN in tty.c, SSIZE_MAX in strscpy.c); and (b) a 13-function implementation surface: epoll_create1/epoll_ctl/epoll_pwait, getifaddrs/freeifaddrs, inotify_init1/add_watch/rm_watch, prctl, syscall, dlopen/dlsym/dlclose/dlerror.
Next (NPM29). Close the constant long-tail so all of libuv's unix layer
compiles, then (NPM30) implement the 13-function shim — epoll emulated over
poll/eventfd, getifaddrs from the SwiftOS net stack, inotify/syscall
returning -ENOSYS, dlopen failing cleanly — and link libuv.a.
Acceptance. make node-configure-probe, make docs-test,
make ports-catalog-test, make ports-recipe-test.
NPM29 - libuv unix layer fully compiles under the masquerade (DONE, 2026-06-15)
Closed the Linux constant/type long-tail so every libuv unix source compiles to
an object (not just the linux backend from NPM28). Added shims to
userland/node-compat:
- sched.h: cpu_set_t + CPU_SETSIZE/CPU_ZERO/SET/CLR/ISSET/COUNT (static inlines) + sched_get_priority_max/min.
- pthread.h: pthread_get/setaffinity_np, pthread_get/setschedparam.
- sys/resource.h: a full BSD struct rusage (timeval ru_utime/ru_stime + all named counters) reusing compat's include guard so it supersedes compat's minimal rusage for the Node build only (libuv reads ru_utime.tv_sec etc.).
- sys/socket.h: CMSG_FIRSTHDR/CMSG_NXTHDR, MSG_CMSG_CLOEXEC, MSG_ERRQUEUE, struct mmsghdr + recvmmsg/sendmmsg.
- sys/stat.h (UTIME_NOW/OMIT), sys/ioctl.h (FIONBIO, TIOCGPTN, _IOC/_IO/_IOR/_IOW/_IOWR), dirent.h (scandir/alphasort), limits.h (SSIZE_MAX), signal.h (SA_RESETHAND), sys/sendfile.h (sendfile), sys/syscall.h (SYS_close/SYS_gettid), linux/errqueue.h (struct sock_extended_err, SO_EE_OFFENDER, SOL_IP/IP_RECVERR/...).
- netinet/in.h: IPPROTO_IPV6, IPv4/IPv6 multicast + (source-)membership option constants, struct ip_mreq/ip_mreq_source/ipv6_mreq/group_source_req, extern in6addr_any.
- Build also needs -D_UNIX98_THREAD_MUTEX_ATTRIBUTES=1 (newlib gates PTHREAD_MUTEX_RECURSIVE/ERRORCHECK on it), added alongside the NPM28 -Ds.
make node-configure-probe now compiles all 34 libuv unix sources to objects and
asserts zero failures, then enumerates the still-undefined external shim surface:
epoll_create1/ctl/pwait, inotify_init1/add_watch/rm_watch, getifaddrs/freeifaddrs, sendfile, recvmmsg/sendmmsg, syscall, dlopen/dlsym/dlclose/dlerror. (in6addr_any is a data symbol to provide at link too.)
Next (NPM30). Implement that shim surface in a node-compat translation unit —
epoll_* emulated over poll+eventfd; getifaddrs from the SwiftOS net stack
(or empty list); inotify_*/syscall/sendfile/recvmmsg/sendmmsg returning
-ENOSYS so libuv falls back; dlopen family failing cleanly; define
in6addr_any — then link libuv.a and advance to the V8 platform glue.
Acceptance. make node-configure-probe, make docs-test,
make ports-catalog-test, make ports-recipe-test.
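As an illustration of the sched.h shim's static inlines, a CPU-set bitmap can be built as below. The swos_ prefix, the 1024-CPU set size, and the exact layout are assumptions made for this sketch so it does not collide with a host sched.h; the real shim exposes the standard CPU_* names.

```c
#include <assert.h>
#include <string.h>

#define SWOS_CPU_SETSIZE 1024
#define SWOS_WORD_BITS   ((int)(8 * sizeof(unsigned long)))

/* Fixed-size bitmap, one bit per CPU (hypothetical layout). */
typedef struct {
    unsigned long bits[SWOS_CPU_SETSIZE / (8 * sizeof(unsigned long))];
} swos_cpu_set_t;

static inline int swos_cpu_valid(int cpu)
{
    return cpu >= 0 && cpu < SWOS_CPU_SETSIZE;
}

static inline void swos_cpu_zero(swos_cpu_set_t *s)
{
    memset(s, 0, sizeof *s);
}

/* Out-of-range CPU numbers are ignored rather than written. */
static inline void swos_cpu_set(int cpu, swos_cpu_set_t *s)
{
    if (swos_cpu_valid(cpu))
        s->bits[cpu / SWOS_WORD_BITS] |= 1UL << (cpu % SWOS_WORD_BITS);
}

static inline void swos_cpu_clr(int cpu, swos_cpu_set_t *s)
{
    if (swos_cpu_valid(cpu))
        s->bits[cpu / SWOS_WORD_BITS] &= ~(1UL << (cpu % SWOS_WORD_BITS));
}

static inline int swos_cpu_isset(int cpu, const swos_cpu_set_t *s)
{
    return swos_cpu_valid(cpu) &&
           ((s->bits[cpu / SWOS_WORD_BITS] >> (cpu % SWOS_WORD_BITS)) & 1UL);
}

/* Population count across all words (clears the lowest set bit each step). */
static inline int swos_cpu_count(const swos_cpu_set_t *s)
{
    size_t i;
    int n = 0;

    for (i = 0; i < sizeof s->bits / sizeof s->bits[0]; i++) {
        unsigned long w = s->bits[i];
        while (w) {
            w &= w - 1;
            n++;
        }
    }
    return n;
}
```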
NPM30 - libuv links into a static AArch64 ELF on SwiftOS (DONE, 2026-06-15)
Implemented the shim surface in userland/node-compat/node_compat.c, archived
libuv.a, and linked a minimal libuv program — libuv is now usable on SwiftOS
through the linux masquerade.
- epoll emulated over poll(). Each epoll_create1 allocates a real eventfd (so the descriptor is unique and libuv's close() works) and a dynamic interest list; epoll_ctl ADD/MOD/DEL maintains it; epoll_pwait builds a pollfd[], calls poll(), and translates revents back to epoll events with the stored epoll_data. SwiftOS has poll/eventfd/futex but no epoll, so this is emulation, not a 1:1 shim. (sigmask is ignored since libuv passes NULL; an empty interest list waits via poll(NULL,0,timeout).)
- ENOSYS / clean fallbacks so libuv uses portable paths: inotify (no fs watching), sendfile/recvmmsg/sendmmsg (read/write + recvmsg loops), raw syscall, and the dlopen family (static-only OS, returns a clear error). getifaddrs returns an empty list for now; in6addr_any is defined.
- POSIX functions newlib lacks, implemented over what SwiftOS has: pread/pwrite via save/seek/io/restore, dup3 via dup2+FD_CLOEXEC, scandir via opendir/readdir+qsort, fdatasync→0 (tmpfs), pathconf→4096; and no-op/ENOSYS for sched_yield/sched_getcpu/sched_get_priority_*, pthread_get/setaffinity_np (reports CPU 0), pthread_get/setschedparam, setgroups, getpwuid_r/getgrgid_r/lchown/futimens/utimensat.
- make node-configure-probe now runs the full chain: configure (linux masquerade) → compile all 34 libuv unix sources → archive libuv.a → link build/uvhello.elf (uv_loop_init/uv_run/uv_loop_close + uv_version_string) and assert it is a static AArch64 ELF with no undefined symbols.
These shims live in node-compat (isolated from the shared userland/compat); a
few of the generic ones (pread/pwrite/scandir/dup3) could be promoted to the
shared layer later if other ports need them.
Next (NPM31). Run uvhello.elf in QEMU to prove the epoll-over-poll event
loop works at runtime (not just links), wired through the base image like the
other probes. After that, the V8 platform build (host mksnapshot cross-build,
the largest remaining wall).
Acceptance. make node-configure-probe, make docs-test,
make ports-catalog-test, make ports-recipe-test.
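The translation step at the heart of the epoll-over-poll emulation can be sketched as follows. This is not node_compat.c: the emu_ names, the fixed-size interest list (the real one is dynamic), and the reduced event struct are assumptions chosen so the example compiles on a host that already has a real sys/epoll.h.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

#define EMU_MAX_FDS 64
#define EMU_IN  0x001u
#define EMU_OUT 0x004u
#define EMU_ERR 0x008u
#define EMU_HUP 0x010u

/* Stand-in for struct epoll_event with data.fd as the user data. */
struct emu_event { uint32_t events; int fd; };

struct emu_epoll {
    int count;
    struct emu_event interest[EMU_MAX_FDS];
};

int emu_add(struct emu_epoll *ep, int fd, uint32_t events)
{
    if (ep->count == EMU_MAX_FDS)
        return -1;
    ep->interest[ep->count].fd = fd;
    ep->interest[ep->count].events = events;
    ep->count++;
    return 0;
}

int emu_del(struct emu_epoll *ep, int fd)
{
    int i;
    for (i = 0; i < ep->count; i++) {
        if (ep->interest[i].fd == fd) {
            ep->interest[i] = ep->interest[--ep->count];
            return 0;
        }
    }
    return -1;
}

/* Build a pollfd[] from the interest list, poll, and translate revents
 * back into emu events. Returns the number of events written to out. */
int emu_wait(struct emu_epoll *ep, struct emu_event *out, int max_out,
             int timeout_ms)
{
    struct pollfd pfds[EMU_MAX_FDS];
    int i, ready, n = 0;

    for (i = 0; i < ep->count; i++) {
        pfds[i].fd = ep->interest[i].fd;
        pfds[i].events = 0;
        pfds[i].revents = 0;
        if (ep->interest[i].events & EMU_IN)  pfds[i].events |= POLLIN;
        if (ep->interest[i].events & EMU_OUT) pfds[i].events |= POLLOUT;
    }
    ready = poll(pfds, (nfds_t)ep->count, timeout_ms);
    if (ready <= 0)
        return ready; /* 0 = timeout, -1 = error with errno set */

    for (i = 0; i < ep->count && n < max_out; i++) {
        uint32_t got = 0;
        if (pfds[i].revents & POLLIN)  got |= EMU_IN;
        if (pfds[i].revents & POLLOUT) got |= EMU_OUT;
        if (pfds[i].revents & POLLERR) got |= EMU_ERR;
        if (pfds[i].revents & POLLHUP) got |= EMU_HUP;
        if (got) {
            out[n].fd = pfds[i].fd;
            out[n].events = got;
            n++;
        }
    }
    return n;
}
```

With an empty interest list the poll() call degenerates to a plain timed sleep, which matches the empty-list behaviour noted above.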
NPM31 - epoll-over-poll emulation runs in QEMU (DONE, 2026-06-15)
NPM30 proved libuv links; this proves the SwiftOS-authored epoll emulation
runs. Rather than drag the whole Node distfile + 1 MB uvhello.elf into the base
image, a self-contained probe links the same node_compat.c epoll translation
unit and exercises the API at runtime in the QEMU guest.
- New userland/epollprobe.c (/bin/epollprobe): epoll_create1 → epoll_ctl(ADD) an eventfd for EPOLLIN → assert a 50 ms wait times out with zero events → write() the eventfd → assert epoll_wait returns exactly one event carrying the right data.fd and EPOLLIN → drain, epoll_ctl(DEL), and assert no further events. It links node_compat.o (the NPM30 epoll-over-poll implementation) via a new NODE_COMPAT_CFLAGS (node-compat shims + the masquerade -Ds) and is wired into the base image like the other NPM probes.
- Runtime bug found and fixed: the static epoll_table lives in BSS (zero-initialised), but free-slot detection used backing_fd < 0; since 0 is a valid fd, every slot looked occupied and epoll_create1 returned EMFILE. Added an explicit used flag (0 = free, the BSS default). Known first-pass limitation: instances are not reclaimed when libuv close()s the backend fd (no epoll_close hook); 16 concurrent instances is ample for current use.
- make epoll-test boots the base image and asserts the markers epollprobe: idle timeout OK, epollprobe: readable event OK, epollprobe: ctl del OK, EPOLLPROBE-OK.
Next (NPM32+). The V8 platform build under the masquerade — host-toolset
mksnapshot cross-build, V8's GN/gyp platform assumptions, and the C++ newlib
gap surface. The largest remaining wall.
Acceptance. make epoll-test, make node-configure-probe, make docs-test,
make ports-catalog-test.
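The BSS free-slot bug is easy to reproduce in isolation. The sketch below uses hypothetical names (emu_instance, claim_slot, release_slot) rather than the real epoll_table code; it contrasts the sentinel check that fails on a zero-initialised table with the explicit used flag.

```c
#include <assert.h>

#define EMU_MAX_INSTANCES 16

struct emu_instance {
    int used;       /* 0 = free, which is also the zero-initialised default */
    int backing_fd;
};

/* Static storage: every field starts at zero, so backing_fd starts at 0. */
static struct emu_instance table[EMU_MAX_INSTANCES];

/* Buggy shape: treats backing_fd < 0 as "free". A zeroed table holds fd 0
 * everywhere, so no slot ever looks free. */
int find_free_by_fd(void)
{
    int i;
    for (i = 0; i < EMU_MAX_INSTANCES; i++)
        if (table[i].backing_fd < 0)
            return i;
    return -1;
}

/* Fixed shape: an explicit flag whose free value matches the zero default.
 * Returns the claimed slot index, or -1 when the table is full (the caller
 * would map that to EMFILE). */
int claim_slot(int backing_fd)
{
    int i;
    for (i = 0; i < EMU_MAX_INSTANCES; i++) {
        if (!table[i].used) {
            table[i].used = 1;
            table[i].backing_fd = backing_fd;
            return i;
        }
    }
    return -1;
}

void release_slot(int slot)
{
    if (slot >= 0 && slot < EMU_MAX_INSTANCES)
        table[slot].used = 0;
}
```

The general rule this illustrates: for statically allocated tables, pick a "free" encoding that equals the all-zero state, or initialise the table explicitly before first use.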
NPM32 - V8 recon: blocked on a missing target C++ standard library (DONE, 2026-06-15)
Bounded reconnaissance of the V8 build before committing to it. The decisive finding came from a single compile probe rather than an hours-long build:
- The aarch64-elf GCC toolchain (Homebrew aarch64-elf-gcc 16.1.0) is bare-metal: it ships no libstdc++, meaning no <vector>/<memory>/<atomic> C++ headers and no libstdc++.a for the target (g++ -print-file-name=libstdc++.a returns the bare name; the only C++ headers on the machine are the host LLVM libc++ for macOS). Its #include <...> search path for C++ is just the GCC builtin C headers.
- V8 is overwhelmingly C++ and needs a C++ standard library even when built -fno-exceptions -fno-rtti (std::vector, std::unique_ptr, <atomic>, <type_traits>, operator new/delete, __cxa_* static guards). So V8 is blocked on a missing C++ runtime for aarch64-elf+newlib, a prerequisite that sits before any GYP/mksnapshot work.
- make node-configure-probe now ends with an NPM32 recon check: it tries to compile #include <vector> with the target g++ and reports the V8 C++-stdlib blocker when (as today) that fails; if a target C++ stdlib is later present it instead says the V8 build can proceed.
Path forward (the V8 prerequisite, not yet chosen). Provide a C++ standard
library for the target: (a) rebuild the cross toolchain with
--enable-languages=c,c++ and a newlib-targeted libstdc++ (the well-trodden
arm-none-eabi approach — those toolchains ship libstdc++ over newlib), or
(b) cross-build LLVM libc++/libc++abi for aarch64-elf+newlib. Either is a
sizable sub-project (toolchain/runtime work) that must land before V8 compiles.
Mksnapshot cross-exec and V8's GYP platform assumptions remain to be probed once
a C++ stdlib exists.
Acceptance. make node-configure-probe, make docs-test,
make ports-catalog-test.
NPM33 - C++ standard library for aarch64-elf+newlib (V8 prerequisite) (DONE, 2026-06-15)
Cleared the NPM32 blocker by giving the target a C++ runtime. New
scripts/build-cxx-toolchain.sh rebuilds GCC from source — matching the
Homebrew version (16.1.0) — with --enable-languages=c,c++ --with-newlib,
building libstdc++ against the newlib already in ./sysroot, and installs
the c/c++ compilers + libstdc++ into the same ./sysroot prefix (gitignored,
like the newlib sysroot). After it runs, sysroot/bin/aarch64-elf-g++ exists
with sysroot/aarch64-elf/lib/libstdc++.a.
- Two issues found and folded into the script:
  - The installed driver looked for its assembler/linker in $prefix/aarch64-elf/bin but binutils live in the Homebrew prefix, so it fell back to the host as and miscompiled target assembly. The script now symlinks aarch64-elf-{as,ld,ar,nm,ranlib,strip,objcopy,objdump} into sysroot/aarch64-elf/bin.
  - Linking any C++ program pulled an undefined _getentropy from libstdc++ (std::random_device). Added a _getentropy syscall stub to userland/lib/newlib_syscalls.c backed by SYS_RANDOM (virtio-rng).
- Validated: a C++ program using std::vector/std::atomic/std::unique_ptr compiles (hosted, -fno-exceptions -fno-rtti; libstdc++ rejects -ffreestanding) and statically links to an AArch64 ELF (build/cxxhello.elf) with no undefined symbols. build-node.sh now prefers this toolchain for both CC and CXX and ends with an NPM33 assert that compiles+links that C++ program.
Next (NPM34+). The V8/Node compile itself: host-toolset mksnapshot
cross-build, V8's GYP platform assumptions (is_linux), and any remaining C++
newlib gaps surfaced at compile/link. The largest remaining wall.
Acceptance. make node-configure-probe, make docs-test,
make ports-catalog-test.
NPM34 - V8 reconnaissance: C++ compiles, no fundamental blocker (DONE, 2026-06-15)
Bounded recon of the V8 compile with the new C++ toolchain, before committing to
the (hours-long) full build. Probe-compiled representative V8 sources directly
(build/v8-probe/, gitignored) rather than running gyp+make.
- V8's C++ compiles against the libstdc++/newlib toolchain. With C++20, -fno-exceptions -fno-rtti, V8 include dirs, and userland/{node-compat,compat} on the include path, these deps/v8/src/base TUs compile clean: bits.cc, division-by-constant.cc, cpu.cc, platform/condition-variable.cc, platform/semaphore.cc, utils/random-number-generator.cc. Notably condition-variable.cc pulls in Abseil (deps/v8/third_party/abseil-cpp, vendored) and still compiles, so V8 + Abseil's C++ is viable here. No fundamental C++-runtime blocker.
- Remaining gaps are the familiar header-shim class (same as libuv), seen in platform-posix.cc/platform-posix-time.cc/sys-info.cc: MADV_DONTNEED, PRIO_PROCESS, PTHREAD_STACK_MIN, RTLD_DEFAULT, __NR_gettid, struct tm.tm_gmtoff/tm_zone, RLIMIT_*/getrlimit. Several already exist in node-compat (e.g. sys/resource.h constants).
- Include-ordering is the central NPM35 task. node-compat's value comes from shadowing newlib headers (it augments them via #include_next), which needs node-compat ahead of the system dirs; but putting it ahead with -isystem breaks libstdc++'s own #include_next <stdlib.h> chain (cstdlib fails to find stdlib.h). -idirafter fixes libstdc++ but then the node-compat augmentations aren't picked up where newlib already has a (thinner) header. Reconciling these (per-header shadow vs fallback) is the main work to get V8 compiling, not a toolchain or runtime limitation.
- mksnapshot looks feasible: config.gypi has host_arch=arm64, target_arch=arm64, so V8's host-toolset mksnapshot (built with the host clang/libc++, not our newlib) can bake an arm64 target snapshot on this host.
Next (NPM35+). Wire the node-compat/compat include strategy into Node's V8 build (gyp cflags), add the remaining base-platform constant shims, then drive the actual V8 + Node compile (host mksnapshot, then target objects) — the long multi-milestone haul. No fundamental blocker identified; the path is tractable.
Acceptance. make node-configure-probe, make docs-test,
make ports-catalog-test.
NPM35a - V8 base/platform layer compiles under the masquerade (DONE, 2026-06-15)
First concrete step of the V8 compile: V8's OS-interface layer
(deps/v8/src/base/platform) — where the Linux header/constant gaps
concentrate — now compiles against the new C++ toolchain.
- Include strategy nailed down. Put userland/node-compat then userland/compat on the include path with -isystem (so node-compat's #include_next augmentations of newlib headers take effect), but do NOT -isystem the newlib dir itself: the toolchain already places it after the C++ headers, so libstdc++'s #include_next <stdlib.h> (from <cstdlib>) still resolves. Explicitly -isystem-ing newlib was what broke earlier C++ probes.
- Shims added to node-compat for the base/platform gaps: MADV_* + madvise (sys/mman.h), RTLD_DEFAULT/RTLD_NEXT (dlfcn.h), __NR_gettid (sys/syscall.h), pthread_getattr_np (pthread.h), and new sys/auxv.h + linux/auxvec.h (getauxval/AT_HWCAP). Implementations in node_compat.c: madvise no-op, pthread_getattr_np reports an 8 MiB default stack, getauxval returns 0 (AArch64 baseline, no optional CPU bits).
- Two build knobs: -D__TM_GMTOFF=tm_gmtoff -D__TM_ZONE=tm_zone (newlib gates those struct tm fields behind these macros; V8 reads them by the standard names); and an extern "C" guard added to the shared userland/compat/stdlib.h (its memalign decl clashed with newlib's C-linkage one in C++ TUs; a latent bug, now fixed harmlessly for C consumers).
- make node-configure-probe now compiles 6 representative V8 base/platform TUs (bits, cpu, sys-info, platform-posix, platform-posix-time, condition-variable; the last pulls in vendored Abseil) and asserts they build. V8 + Abseil C++ is viable; no fundamental blocker.
Next (NPM35b+). Wire this include strategy + defines into Node's V8 gyp
cflags, then drive the full V8 + Node compile: Torque/bytecode generators, the
host-toolset mksnapshot (host clang, bakes the arm64 snapshot), the thousands
of target TUs (expect more header-shim whack-a-mole outside base/platform), and
the final link. The long multi-hour, multi-milestone haul; the groundwork
(toolchain, libuv, include strategy, base/platform) is in place.
Acceptance. make node-configure-probe, make docs-test,
make ports-catalog-test.
D-series — persistent /data storage (durable SQLite), 2026-06-16
Why. Hosting our own site (nginx + Let's Encrypt + Node/Strapi + SQLite)
needs storage that survives reboot. The bring-up FS was deliberately two-tier
(read-only signed base + RAM tmpfs; "data loss on reboot acceptable by design").
This series adds a third, persistent writable tier at /data and is an
explicit, reviewed change to that hard decision (CLAUDE.md updated to three-tier).
Decisions recorded.
- New tier lives on a dedicated, separate virtio-blk disk (id swosdata), not the base/ESP disks, so the signed base stays immutable. The kernel scan (virtioBlkInit) identifies it positively by an SWDATAFS sector-0 magic.
- No FS journaling (consistent with the project stance). datafs is a small inode-table + block-bitmap filesystem. Crash-safety comes from honest fsync plus the application's own journaling (SQLite's rollback journal), not from FS journaling. The superblock is written only in sector 0 of block 0, so the D0 raw boot-counter (sector 2) and the FS metadata never overlap.
- File size cap is single-indirect (one index block -> ~4 MiB/file at 4 KiB blocks) for now; double-indirect is a later extension if needed.
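The ~4 MiB cap follows from index-block arithmetic. A minimal sketch, assuming 4-byte block pointers (the notes give the 4 KiB block size but not the pointer width) and counting only blocks reached through the index block; all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* 4 KiB blocks come from the notes; the 4-byte pointer width is an
 * assumption made for this sketch. */
enum { BLOCK_SIZE = 4096, POINTER_SIZE = 4 };

/* Number of data blocks one single-indirect index block can address. */
uint64_t pointers_per_index_block(void) {
    assert(BLOCK_SIZE % POINTER_SIZE == 0); /* index block holds whole pointers */
    return BLOCK_SIZE / POINTER_SIZE;
}

/* Largest file reachable through one index block, in bytes. */
uint64_t single_indirect_cap_bytes(void) {
    return pointers_per_index_block() * BLOCK_SIZE;
}

/* Extra bytes a double-indirect level would make reachable. */
uint64_t double_indirect_extra_bytes(void) {
    return pointers_per_index_block() * single_indirect_cap_bytes();
}
```

Under these assumptions one index block holds 1024 pointers, so the cap is 1024 x 4 KiB = 4 MiB, and a double-indirect level would add 4 GiB.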
Milestones (all on branch claude/funny-ishizaka-2b024a).
- D0 (acd659d): second writable virtio-blk "data" disk + raw read/write/flush range; boot self-test proves a counter survives reboot. Gate: make data-persist-test.
- D1 (7deacfb): kernel/fs/datafs.swift on-disk FS, mounted at /data, mirrored into VNodes; vfs.swift routes create/open/read/write/lseek/ftruncate/mkdir/unlink/rmdir/rename to datafs. Gate: make datafs-test.
- D2 (4a61aef): fsync/fdatasync and sync flush the data disk to media (SYS_FSYNC=86, SYS_SYNC=87 as committed; now 87/88 after the merge with main, see Open items); newlib stubs wired. Gate: make datafs-fsync-test.
- D3 (4bcb6d4): the packaged sqlite3 shell baked into the base image at /bin/sqlite3; vfsFcntl accepts POSIX record locks (F_GETLK/F_SETLK/F_SETLKW = newlib 7/8/9) as no-op success so SQLite's unix VFS proceeds. Gate: make datafs-sqlite-test (create+insert into /data/app.db, reboot, SELECT the row back).
Open items.
- SYS_FSYNC/SYS_SYNC are syscall numbers 87/88 (security_info_ex from the sudo arc is 86); the earlier 86 collision was resolved when main was merged in.
- make docs-test has pre-existing failures unrelated to this series (ptyprobe/ptysigprobe command entries and several swiftos_* PTY/waitpid Swift bridge entries from the HC34/HC36 sessions). The D-series reference entries (sqlite3 command; fsync/sync/openpty/pty_set_foreground syscalls) are documented.
NPM35b - full Node build attempt: build-driver mechanics + two macOS-host walls (IN PROGRESS, 2026-06-15)
First real make of Node under the masquerade (on branch node-v8-build).
Established the build-driver mechanics and hit two substantial environment walls.
Mechanics established (work):
- gyp generates out/ and the build runs. The masquerade target compiler (sysroot/bin/aarch64-elf-{gcc,g++}) is set via CC/CXX at configure; the host toolset (Torque, js2c, the snapshot host tools, all of which must run on macOS) needs CC_host=cc CXX_host=c++ or it wrongly uses the cross compiler and dies on -pthread.
- Target-only flag injection works: make CFLAGS="…" CXXFLAGS="…" is routed by gyp-make to the target toolset only (host uses CFLAGS_host). Confirmed: our -isystem userland/node-compat -isystem userland/compat + the -D knobs appear on target compiles and NOT on host ones. So no common.gypi patch needed.
- Configured with --without-snapshot --without-node-snapshot --without-node-code-cache --without-inspector --without-intl to shrink the first build (no ICU, no inspector, snapshotless V8; pairs with --v8-lite-mode).
Two walls (both point away from a macOS build host):
- Host tools build libuv as Linux on macOS. Node still builds a host libuv (obj.host/libuv) even with snapshots off; under OS=="linux" gyp picks deps/uv/src/unix/linux.c for both toolsets, so host cc (macOS clang) tries to compile the Linux backend and fails on netpacket/packet.h / syscall.h. gyp uses a single OS for source selection across toolsets, so a macOS host can't produce the Linux-shaped host tools.
- Toolchain has no threads. scripts/build-cxx-toolchain.sh built GCC --disable-threads, so the target aarch64-elf-gcc rejects -pthread (which gyp's linux config adds). V8 is heavily threaded, so it needs --enable-threads=posix anyway. The toolchain must be rebuilt with threads.
Recommended pivot (fork for the user). Cross-building Node for Linux on
macOS fights the toolchain at the host-toolset layer. The conventional, far more
tractable path is to run the Node/V8 build inside a Linux build environment
(Docker) — host=linux builds the host tools natively, target = the SwiftOS
aarch64 masquerade — which removes the entire host-as-linux class. Independently,
the cross toolchain should be rebuilt --enable-threads=posix (V8 needs threads;
also fixes -pthread). All the SwiftOS-side groundwork (node-compat shims,
include strategy, libuv port, base/platform) carries over unchanged.
Acceptance. make node-configure-probe (groundwork still green); full Node
build deferred pending the build-environment decision.
H-series — bare-metal Hetzner ARM bring-up, 2026-06-16
Why. The real deployment target is the user's Hetzner ARM cloud VM
(ssh root@swiftos.tech -p 651, currently Ubuntu 24.04 aarch64, fully wipeable).
SwiftOS today assumes the QEMU virt board (device-tree firmware, virtio-mmio,
GICv2, virtio-blk). The Hetzner VM presents a different device model (ACPI
firmware, virtio-PCI, GICv3, virtio-scsi). This series writes the missing
drivers/boot support so SwiftOS boots as the actual OS of that VM, reachable
over SSH. Stages H0–H6; develop against a local QEMU profile that reproduces the
VM device model, use the real server only for final bring-up.
H0 — Hetzner-faithful local QEMU profile + firmware investigation (DONE, 2026-06-16)
Deliverables.
- make hetzner-run (Makefile) boots the current UEFI disk under a QEMU profile matching the live VM: -M virt,gic-version=3 -cpu max -m 4G -smp 2, ACPI on (no acpi=off), boot disk on virtio-scsi-pci, plus virtio-net-pci and virtio-rng-pci. This reproduces all four gaps (ACPI / virtio-PCI / GICv3 / virtio-scsi) locally so H1–H5 can be developed without the server.
- Findings recorded here (this decides H5's approach).
What the loader/kernel sees on the Hetzner profile (QEMU 11.0.1, firmware
edk2-stable202408-prebuilt.qemu.org, representative of the VM's EDK2/BOCHS):
- EFI loader works over virtio-scsi-pci. Firmware boots BOOTAA64.EFI from the GPT/ESP at PciRoot(0x0)/Pci(0x1,0x0)/Scsi(0x0,0x0); the loader reads the kernel slot from the ESP via the firmware Simple File System with NO change. Decision input for H3: the loader can read base.img from the ESP the same way (transport-agnostic via firmware), so the ESP-ramdisk route is viable and we need not write a virtio-scsi kernel driver just to mount the root FS.
- FDT configuration table: ABSENT. Under ACPI mode the edk2 firmware does not publish the FDT table (gFdtTableGuid, the standard b1b621d5-f19c-41a5-... GUID, verified correct). Loader prints "device tree NOT in config table". The kernel's RAM scan then finds no DTB and keeps QEMU-virt compiled-in defaults. Decisive for H5: there is NO FDT fallback on the real target; H5 must parse ACPI (RSDP→XSDT→MADT for GIC, SPCR for UART, MCFG for ECAM, GTDT for timer). (Note: with acpi=off the QEMU virt firmware does publish an FDT table; that is the existing uefi-run/disk-run profile, which stays working. The dual path is firmware-mode driven.)
- ACPI 2.0 table: PRESENT (EFI_ACPI_20_TABLE_GUID → RSDP). So the RSDP is reachable as an EFI configuration table; the loader already probes it and can forward its pointer to the kernel for H5.
- CurrentEL 1, MMU on at handoff, same as the existing UEFI path.
- Memory: largest conventional region base 0x4800_0000, size 0xF460_E000 (~3.8 GiB) for -m 4G. RAM base is still 0x4000_0000; firmware reserves 0x4000_0000–0x4800_0000. The compiled-in ramSize (256 MiB) is wrong for this VM; H5/ACPI (or the EFI memory map) must supply real RAM size.
- No GOP framebuffer: headless, serial-only (PL011 @ 0x0900_0000, matches).
- Kernel panics at GICv2 init as expected: with DT/ACPI giving nothing it keeps the GICv2 defaults and faults reading the GICv2 CPU interface at 0x0801_0000 (FAR_EL1=0x08010000, ESR 0x96000050 = data abort), because there is no MMIO GICC under GICv3. This is the concrete H1 evidence (GICv3 needed).
Re-plan note (H5). H0 resolves the open H5 question: the FDT-config-table fallback is NOT available on the ACPI VM, so H5 is "parse minimal ACPI", not "consume the FDT table". The loader already locates the RSDP; the remaining work is XSDT/MADT/SPCR/MCFG/GTDT parsing in the kernel and forwarding the RSDP from the loader (currently it only prints present/absent).
Acceptance. make hetzner-run boots the loader under the VM device model and
the survey above is reproduced; findings committed. (A clean kernel boot is not
expected until H1/H5 land.)
H1 — GICv3 driver (detect v2/v3; redistributor + ICC_* CPU interface) (DONE, 2026-06-16)
Goal. The Hetzner VM (and -M virt,gic-version=3) present a GICv3: a
distributor + per-CPU redistributors + a system-register CPU interface
(ICC_*). The kernel had a GICv2-only MMIO driver and faulted at the GICv2 GICC
window (0x0801_0000) on a GICv3 machine (H0). Make the GIC driver dual-path
(detect, don't replace) so the same kernel drives both, and prove interrupts
work under the GICv3 profile.
What changed.
- kernel/drivers/gic.swift rewritten as a dual-path driver. Version is detected from ID_AA64PFR0_EL1.GIC (bits [27:24]; nonzero ⇒ GICv3 sysreg interface). This is a CPU register, so it is fault-free. An early attempt to read GICD_PIDR2 (offset 0xFFE8) instead aborted on the GICv2 distributor (QEMU's v2 GICD has no register there; v2 ID regs live at 0xFE8). Lesson recorded: do not probe v3-only MMIO offsets to detect the version.
  - GICv3 init: distributor GICD_CTLR = ARE | EnableGrp1 | EnableGrp0 (QEMU/Hetzner run the GIC in the single-security-state DS=1 view, so the NS group bits are settable from EL1); per-PE redistributor wake (clear GICR_WAKER.ProcessorSleep, wait for ChildrenAsleep to clear), SGIs/PPIs → Group 1 in the SGI frame; system-register CPU interface (ICC_SRE_EL1.SRE=1, ICC_PMR_EL1=0xFF, ICC_BPR1_EL1=0, ICC_CTLR_EL1=0, ICC_IGRPEN1_EL1=1).
  - Ack/EOI via ICC_IAR1_EL1/ICC_EOIR1_EL1; SGI generation via ICC_SGI1R_EL1 (TargetList = the 8-bit CPU mask, single cluster Aff1/2/3=0).
  - SPI routing uses GICD_IROUTER (64-bit/INTID, valid under ARE) instead of the v2 GICD_ITARGETSR; SGI/PPI enable/priority go to this PE's redistributor SGI frame (found by matching GICR_TYPER affinity to MPIDR).
  - The same surface (gicInit, gicInitCpuInterfaceForCurrentCpu, gicEnableInterrupt, gicAcknowledge, gicEndInterrupt, the SGI helpers, gicSoftwareGeneratedInterruptSelfTest) branches on the detected version, so the SMP per-CPU init and the IPI substrate are correct on both.
- kernel/arch/aarch64/io.h: ICC_* msr/mrs bridges + read_id_aa64pfr0_el1 (Embedded Swift cannot emit msr/mrs; this is the documented low-level bridge exception, like the existing MMIO/cntp_* shims).
- HAL: platform.gicRedist (default 0x080A_0000, the GICR base on QEMU virt GICv3 and the Hetzner VM). fdt.swift recognises arm,gic-v3 and records the second reg range as the redistributor (via a gicIsV3 flag kept in the PlatformInfo.flags word; a stored Bool between the UInt fields broke the struct's 8-byte alignment and alignment-faulted at M1 with the MMU off, recorded as a strict-align gotcha for that struct).
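The detection rule above reduces to a 4-bit field decode. A minimal host-testable sketch in C, not the driver's actual code; the function and type names are hypothetical, and the register value is passed in so the decode can run off-target:

```c
#include <assert.h>
#include <stdint.h>

/* ID_AA64PFR0_EL1.GIC is bits [27:24]; nonzero means the system-register
 * GIC CPU interface is implemented. */
#define PFR0_GIC_SHIFT 24u
#define PFR0_GIC_MASK  0xFull

typedef enum { GIC_V2 = 2, GIC_V3 = 3 } gic_version_t;

/* Pure decode step. On the target the argument would come from an mrs
 * bridge such as the read_id_aa64pfr0_el1 shim named in the notes. */
gic_version_t gic_version_from_pfr0(uint64_t pfr0) {
    uint64_t gic_field = (pfr0 >> PFR0_GIC_SHIFT) & PFR0_GIC_MASK;
    return gic_field != 0 ? GIC_V3 : GIC_V2;
}

/* Offset of the distributor ID register for an already-detected version
 * (0xFFE8 on v3, 0xFE8 on v2, per the notes). Only meaningful after the
 * version is known from the CPU register, never as the probe itself. */
uint32_t gicd_pidr2_offset(gic_version_t v) {
    assert(v == GIC_V2 || v == GIC_V3);
    return v == GIC_V3 ? 0xFFE8u : 0xFE8u;
}
```

Keeping the decode separate from the register read is a choice made for this sketch: it lets the rule be checked on a host without any MMIO or system-register access.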
Acceptance. make gicv3-test (tests/gicv3_test.sh) boots the kernel on
-M virt,gic-version=3 -cpu max -smp 2 and asserts interrupts are live
multi-core, before any base FS / userland: M2 GIC: GICv3 … (detection), S2a
per-CPU timer-IRQ heartbeat for CPU0 and the secondary, S1 secondary
online, and S3b SGI/IPI substrate (ICC_SGI1R). On both GICv2 and GICv3 the
boot then reaches the identical point (the pre-existing no-base.img S2
userland guard — out of scope here). Regression: the GICv2 path is unchanged
(M2 GIC: GICv2, same markers); the host fdt_test + qemu_virt_hardware_map
gates still pass. Wired into make test.
Bonus. make hetzner-run (the ACPI/PCI/GICv3 profile, no DTB) now also
clears GIC init — it detects GICv3 via the CPU register, uses the default GICD/
GICR bases (correct for the VM), brings the secondary online over PSCI, and
reaches the same S2 point. So H1 directly removes the GIC blocker on the real
target; virtio-PCI (H2/H3) and ACPI platform config (H5) remain.
Not done here. Full SMP+userland validation on GICv3 (the S2–S5 / userland
suite under gic-version=3) is deferred until a root FS boots on the GICv3
profile (after H3); the existing suite still runs on GICv2. The GIC primitives
themselves (per-CPU timer IRQ + SGI on two CPUs) are proven by gicv3-test.
H2 — PCIe ECAM enumeration + virtio-PCI transport (DONE, 2026-06-16)
Goal. The Hetzner VM (and -M virt,gic-version=3 -cpu max) expose virtio
devices over PCIe, not virtio-mmio. Enumerate PCI config space, drive a modern
virtio-pci device, and introduce a transport abstraction so a device driver
works on either transport. Port the simplest device (virtio-rng) first.
Addresses (QEMU virt high-ECAM == the Hetzner VM, from the DTB pcie node).
- ECAM config space: base 0x40_1000_0000 (just above 256 GiB), size 256 MiB.
- 32-bit MMIO window (BARs): CPU 0x1000_0000..0x3eff_0000 (pci==cpu).
- 64-bit MMIO window: 0x80_0000_0000 (512 GiB), where UEFI firmware places modern virtio BARs (BAR4 is a 64-bit memory BAR).
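These addresses can be cross-checked with a little arithmetic. The sketch below uses the standard PCIe ECAM layout (4 KiB of config space per function) and assumes a 4 KiB translation granule, where a level-1 entry covers 1 GiB and a level-0 entry covers 512 GiB; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define ECAM_BASE 0x4010000000ull  /* 0x40_1000_0000 from the notes */
#define ECAM_SIZE 0x10000000ull    /* 256 MiB = 256 buses x 1 MiB */

/* Standard ECAM layout: bus[27:20] | device[19:15] | function[14:12] |
 * register offset[11:0], added to the ECAM base. */
uint64_t ecam_address(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off) {
    assert(dev < 32 && fn < 8 && off < 4096);
    return ECAM_BASE + (((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
                        ((uint64_t)fn << 12) | off);
}

/* Table indices under the assumed 4 KiB granule (9 index bits per level). */
uint64_t l1_index(uint64_t pa) { return (pa >> 30) & 0x1FF; }
uint64_t l0_index(uint64_t pa) { return (pa >> 39) & 0x1FF; }
```

Under that granule assumption the ECAM base falls in level-1 slot 256 of the first level-0 entry, and the 64-bit window at 0x80_0000_0000 starts level-0 slot 1.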
What changed.
- kernel/mm/vm_early.c: the identity map reached only the first 3 GiB and TCR IPS was 36-bit (64 GiB max PA); neither could touch the 256 GiB ECAM. Raised IPS to 40-bit (max/cortex-a72 both support ≥40-bit PARange) and added two device blocks: l1_table[256] for the ECAM 1 GiB block, and a second L1 table at l0_table[1] mapping the first 4 GiB of the 64-bit PCI window. Gotcha recorded: under UEFI the firmware-assigned BAR landed at 0x80_0000_8000, which faulted (level-0 translation) until the 64-bit window was mapped.
- kernel/drivers/pci.swift: ECAM accessors (needs 8/16-bit MMIO, added to io.h), BAR sizing + assignment (assign in the 32-bit window when unassigned on the -kernel path; reuse the firmware base under UEFI), and a virtio capability walk (COMMON/NOTIFY/ISR/DEVICE → mapped addresses). Device matching handles both modern ids (0x1040+type) and transitional ids (0x1000..0x103F, type in the PCI Subsystem ID); QEMU's virtio-rng-pci is transitional (0x1af4:0x1005) yet exposes the modern caps.
- kernel/drivers/virtio_transport.swift: VirtioTransport, one control-plane surface (reset/status, VERSION_1 negotiation + FEATURES_OK, queue setup with ring addresses, notify doorbell, ISR ack) over mmio | pci. The virtqueue ring memory is identical, so only this plane branches.
- virtio_rng.swift refactored onto VirtioTransport: tries virtio-mmio first, then virtio-pci.
- platform.pcieEcamBase (default 0x40_1000_0000; 0 disables).
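The modern/transitional device-matching rule can be written as one small function. A sketch only, with hypothetical names; it follows the ID ranges quoted above and treats virtio device type 0 (reserved) as "no match":

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_PCI_VENDOR 0x1AF4u

/* Returns the virtio device type for a PCI function, or 0 if it is not a
 * virtio device. Modern IDs encode the type directly (0x1040 + type);
 * transitional IDs (0x1000..0x103F) carry it in the PCI Subsystem ID. */
uint16_t virtio_pci_device_type(uint16_t vendor, uint16_t device,
                                uint16_t subsystem_id) {
    if (vendor != VIRTIO_PCI_VENDOR) return 0;
    if (device >= 0x1040u && device <= 0x107Fu)
        return (uint16_t)(device - 0x1040u);
    if (device >= 0x1000u && device <= 0x103Fu)
        return subsystem_id;
    return 0;
}

/* True when the function is a virtio device of the wanted type. */
bool virtio_pci_matches(uint16_t vendor, uint16_t device,
                        uint16_t subsystem_id, uint16_t wanted_type) {
    assert(wanted_type != 0); /* type 0 is reserved */
    return virtio_pci_device_type(vendor, device, subsystem_id) == wanted_type;
}
```

For example, the transitional rng at 0x1af4:0x1005 with subsystem ID 4 and a modern rng at 0x1af4:0x1044 both resolve to type 4.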
Acceptance. make virtio-pci-test (tests/virtio_pci_test.sh) boots
-M virt,gic-version=3 -cpu max with -device virtio-rng-pci and asserts the
kernel enumerates the ECAM, assigns the BAR, resolves the caps, and runs a full
virtqueue round trip (descriptor → avail → notify → used) returning entropy:
H2 OK: virtio-pci rng exchanged a queue, bytes 32. Emitted during early driver
bring-up, no base image needed. Wired into make test.
Validated on the real-target path too. make hetzner-run (UEFI / ACPI /
GICv3, firmware-assigned BARs in the 64-bit window) reaches the same H2 OK,
exercising the firmware-BAR-reuse + 64-bit-window-mapping path. Regression:
virtio-mmio rng still exchanges a queue (H2 OK: virtio-mmio …); GICv2/GICv3
direct boots unaffected (MMU/IPS change verified).
Open for H5/H6. The ECAM base is a compiled-in default (correct for both targets); ACPI MCFG parsing (H5) should supply it on the real server rather than assume it. virtio-net over PCI is H4.
H3 — root filesystem from RAM (ESP-ramdisk), no block driver (DONE, 2026-06-16)
Goal. The Hetzner VM's boot disk is virtio-scsi over PCIe, which the kernel does not drive. Rather than write a virtio-scsi driver just to mount the read-only base FS, the UEFI loader reads the packed base image from the ESP into RAM and hands the kernel a ramdisk; the kernel mounts the read-only base from RAM (/tmp is RAM anyway, so this fits the FS design). Acceptance: boots to login with NO block driver bound.
What changed.
- boot/efi/loader.c: load_base_ramdisk opens \EFI\swift-os\base.img on the ESP (firmware Simple File System, which works over virtio-scsi-pci, as H0 found), allocates with AllocatePages(AllocateMaxAddress, 0x8000_0000) to keep it below 2 GiB (the kernel identity-maps only the first 1 GiB of RAM as normal memory), reads it in, cleans the dcache, and passes base/size to the kernel.
- Entry ABI: boot.S preserves x4/x5 (ramdisk base/size) alongside the existing x0–x3 (dtb + framebuffer); kernel_main gains two params and calls ramdiskInit. The QEMU -kernel path leaves x4/x5 = 0 → no ramdisk.
- kernel/fs/ramdisk.swift: the RAM base-image source. ramdiskReadRange mirrors the virtio-blk read contract the VFS expects: 0 on success, a negative errno on a short/out-of-range read (the bug that first broke the mount was returning a byte count instead of 0). Bounds are overflow-safe.
- kernel/vfs/vfs.swift: vfsImageReadRange serves the base image from the ramdisk when present (else virtio-blk); the two virtioBlkAvailable() mount guards now also accept a ramdisk. buildBaseFromDisk still prefers a virtio-blk base when one is attached (swosbaseCount > 0) and uses the ramdisk only when no block base disk is present, so existing virtio-blk boots are unchanged and the ramdisk activates on the Hetzner-style profile.
- Build: make-disk.sh + the uefi target stage base.img on the ESP (in \EFI\swift-os). The GPT disk is ~96 MiB; base.img is ~41 MiB.
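The read contract and the overflow-safe bounds check can be illustrated in a few lines. This is a sketch in C rather than the Swift implementation; the struct, the function name, and the specific errno values are assumptions:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const uint8_t *base; /* start of the staged image in RAM */
    uint64_t size;       /* image size in bytes */
} ramdisk_t;

/* Copies `len` bytes at `offset` into `dst`. Returns 0 on success and a
 * negative errno otherwise; it never returns a byte count. */
int ramdisk_read_range(const ramdisk_t *rd, uint64_t offset,
                       void *dst, uint64_t len) {
    assert(rd != NULL && dst != NULL);
    if (rd->base == NULL || rd->size == 0) return -ENODEV;
    /* Compare against the remaining space instead of computing
     * offset + len, which could wrap around. */
    if (offset > rd->size || len > rd->size - offset) return -EINVAL;
    memcpy(dst, rd->base + offset, (size_t)len);
    return 0;
}
```

Returning 0 rather than the byte count is the point of the contract: a caller that tests `!= 0` for failure would treat any positive count as an error.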
Acceptance. make h3-ramdisk-test (tests/h3_ramdisk_test.sh) boots the GPT
disk under UEFI on the Hetzner profile (GICv3, boot disk on virtio-scsi-pci,
no virtio-blk), drives the tty demo + login, and asserts: loader staged base.img
into RAM, M11b: no virtio-blk disk attached, the RAM base verified
(ed25519) + M11c mounted, swift-os login: reached, and a command served from
the RAM base ran. So H0–H3 now boot the real-target device model end-to-end to a
login prompt with no kernel block driver. Wired into make test.
Regression. The QEMU -kernel path (x4/x5 = 0, virtio-blk base) is unchanged
— it binds the virtio-blk base (M11b: virtio-blk disk …), mounts (M11c), and
runs the userland (S5f OK). gicv3-test / virtio-pci-test still pass (they
exercise the new entry ABI).
Open for H4/H6. /data (datafs) on the real server still needs a PCI block
path (virtio-blk-pci or virtio-scsi) — out of scope for the read-only root. H4
brings virtio-net over PCI + SSH.
H4 — virtio-net over PCI + SSH reachable (DONE, 2026-06-16)
Goal. Port virtio-net to the PCI transport and prove a bounded SSH command end-to-end over it under the Hetzner network/IRQ model (GICv3 + virtio-net-pci).
What changed.
- kernel/drivers/virtio_transport.swift: extended for multi-queue devices with per-queue notify doorbells (notifyAddrs, since virtio-net has rx=0/tx=1 with distinct queue_notify_off), 64-bit device-feature read/write (deviceFeatures/setDriverFeatures/setFeaturesOk), and device-config reads (configRead32, for the MAC). negotiateVersion1 now builds on these.
- kernel/drivers/virtio_net.swift: the NIC binds over VirtioTransport, trying virtio-mmio first, then virtio-pci (virtioPciFindDevice(deviceType: 1)). Only the control plane (status/features/queue setup/notify/ISR/MAC) moved to the transport; the RX/TX buffer-pool + zero-copy logic is unchanged. Removed the now-dead per-device MMIO register/feature constants.
- GICv3 SPI fix (gic.swift): the UART RX interrupt (SPI 33) was silent on GICv3 because SPIs default to Group 0 in GICD_IGROUPR, but EL1 only takes Group 1 (we set ICC_IGRPEN1). gicv3EnableInterrupt now sets the SPI's GICD_IGROUPR bit to Group 1. H1 had only exercised the timer (PPI, via the redistributor GICR_IGROUPR0) and SGIs; the UART/NIC SPIs were the first real GICv3 SPI consumers and surfaced this. (PPIs were already Group 1, so this is SPI-only.)
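The SPI fix comes down to setting one bit in the right GICD_IGROUPR<n> word. A sketch of the indexing, assuming the architectural register offset 0x0080 and one group bit per interrupt ID; the helper names are hypothetical and the distributor is modelled as a plain word array so it runs on a host:

```c
#include <assert.h>
#include <stdint.h>

#define GICD_IGROUPR_OFFSET 0x0080u /* GICD_IGROUPR<n>, one bit per INTID */
#define GIC_FIRST_SPI 32u           /* INTIDs 0..31 are SGIs/PPIs */

/* Byte offset of the GICD_IGROUPR<n> register that holds `intid`. */
uint32_t igroupr_offset(uint32_t intid) {
    return GICD_IGROUPR_OFFSET + 4u * (intid / 32u);
}

/* Bit within that register; 1 = Group 1, 0 = Group 0. */
uint32_t igroupr_bit(uint32_t intid) {
    return 1u << (intid % 32u);
}

/* Moves an SPI to Group 1. `gicd` is the distributor block viewed as
 * 32-bit words (MMIO on the target, an array in a host test). */
void spi_set_group1(volatile uint32_t *gicd, uint32_t intid) {
    assert(intid >= GIC_FIRST_SPI); /* SGI/PPI groups live in the redistributor */
    gicd[igroupr_offset(intid) / 4u] |= igroupr_bit(intid);
}
```

For interrupt ID 33 this touches bit 1 of the word at offset 0x84 and leaves the SGI/PPI word at 0x80 alone.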
Acceptance. make h4-ssh-pci-test (tests/h4_ssh_pci_test.sh) boots GICv3
with the NIC + RNG on PCIe (the Hetzner net/IRQ device model), and asserts the
full path: the guest brings the NIC up over PCIe and gets a DHCP lease
(net-dhcp OK), autostarts /bin/sshd, seeds KEX entropy from virtio-rng-pci,
and a host OpenSSH client runs a bounded /bin/id over the network (QEMU
hostfwd → guest :22) — publickey auth accepted, session exec completed status 0, ssh exit 0, output principal=1(root). The root FS rides on virtio-blk here
for a fast boot (RAM-base boot is the separate H3 gate); this gate isolates
"virtio-net over PCI + SSH". Wired into make test.
Regression. virtio-net over mmio still works (DHCP + ARP + ICMP on QEMU
virt); gicv3-test / virtio-pci-test / h3-ramdisk-test still pass. The
GICv3 SPI fix also benefits every SPI consumer (UART RX, NIC) on the Hetzner
profile — e.g. the H3 ramdisk login over serial now takes keystrokes via the
real UART IRQ.
Status. H0–H4 now boot the Hetzner device model end-to-end and are reachable over SSH in QEMU. Remaining: H5 (derive platform config from ACPI on the real firmware — no FDT, per H0) and H6 (bring-up on the real server).
H5 — platform config from ACPI (no device tree) (DONE, 2026-06-16)
Goal. On the real Hetzner VM there is no FDT (H0) — the firmware publishes only ACPI. Derive the platform map from ACPI so the kernel does not depend on a device tree.
What changed.
- The UEFI loader (boot/efi/loader.c) forwards the ACPI RSDP pointer to the kernel in x6 (it already located it via EFI_ACPI_20_TABLE_GUID); boot.S preserves x6 and kernel_main passes it to platformInit.
- kernel/arch/aarch64/acpi.swift: a minimal parser. RSDP → XSDT → tables: MADT (GICD base + version → GICv3; GICR base; one CPU per enabled GICC, with MPIDR.Aff0 and the PSCI enable mask), MCFG (PCIe ECAM base), SPCR (console UART), FADT (PSCI conduit HVC/SMC from the ARM boot flags). Like the FDT parser it runs with the MMU off, so every field is assembled from non-inlined byte reads (rd8), because unaligned multi-byte access to Device-typed RAM faults.
- platformInit(dtbPhys, acpiRsdp) now prefers ACPI when an RSDP was passed (the real firmware path), else the device tree, else defaults. The whole ACPI apply happens with the MMU off, because the ACPI tables sit high in RAM (~5 GiB on the VM, RSDP @ 0x1_3CB4_3018) and are unmapped once the MMU is on. CPU topology is copied through applyAcpiTopology, marked @_optimize(none), so the eight adjacent cpuAff0_* stores are not coalesced into a wider unaligned access (the FDT path defers this to post-MMU; the ACPI path can't).
- Gotcha repeated from H1: adding a UInt field (ecamBase) to PlatformInfo perturbed its size and triggered an unaligned vectorized store at M1 (MMU off, strict-align), on both the ACPI and FDT paths. Fix: don't grow that struct; the MCFG parse writes platform.pcieEcamBase directly (one aligned global store). The boot-log "M9 OK: hardware discovered from device tree" klog is now conditional on platformDiscoveredFromAcpi.
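The byte-assembled field reads can be sketched as below. This is an illustration in C, not the Swift parser: rd32/rd64 and rsdp_xsdt_address are hypothetical names, the RSDP offsets are the ACPI 2.0+ layout (signature at 0, revision at 15, XsdtAddress at 24), and checksum validation is deliberately left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Single-byte load; on the target this would be the non-inlined rd8. */
uint8_t rd8(const uint8_t *p) { return *p; }

/* Little-endian fields assembled one byte at a time, so the source never
 * asks for a multi-byte, possibly unaligned load. */
uint32_t rd32(const uint8_t *p) {
    return (uint32_t)rd8(p) | ((uint32_t)rd8(p + 1) << 8) |
           ((uint32_t)rd8(p + 2) << 16) | ((uint32_t)rd8(p + 3) << 24);
}

uint64_t rd64(const uint8_t *p) {
    return (uint64_t)rd32(p) | ((uint64_t)rd32(p + 4) << 32);
}

/* Returns the XSDT physical address from an RSDP, or 0 if the bytes do
 * not carry the "RSD PTR " signature or the revision predates the XSDT. */
uint64_t rsdp_xsdt_address(const uint8_t *rsdp) {
    static const char sig[8] = {'R', 'S', 'D', ' ', 'P', 'T', 'R', ' '};
    assert(rsdp != NULL);
    for (int i = 0; i < 8; i++)
        if (rd8(rsdp + i) != (uint8_t)sig[i]) return 0;
    if (rd8(rsdp + 15) < 2) return 0;
    return rd64(rsdp + 24);
}
```

On a host the compiler is free to merge these byte loads; the notes keep rd8 non-inlined on the target precisely so that does not happen.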
Acceptance. make h5-acpi-test (tests/h5_acpi_test.sh) boots the GPT disk
under UEFI on the Hetzner device model (ACPI firmware, GICv3, virtio-PCI) and
asserts M9 OK: hardware discovered from ACPI (not "device tree"), the exact
derived map (gic 0x0800_0000 redist 0x080A_0000 uart 0x0900_0000 ecam 0x40_1000_0000), then the whole stack on those values: GICv3 (M2 GIC: GICv3),
the secondary CPU online via PSCI (S1 OK), a virtio-pci queue (ECAM,
H2 OK), and a DHCP lease over virtio-net-pci. Wired into make test.
Regression. The device-tree paths are unchanged — direct -kernel and the
acpi=off UEFI boot still log "M9 OK: hardware discovered from device tree"
(the klog several tests assert); gicv3-test/virtio-pci-test still pass.
Status. H0–H5 complete: the kernel boots the Hetzner device model
end-to-end — GICv3, virtio over PCIe, RAM-base root FS, SSH-reachable — deriving
its platform map from ACPI with no device tree. Remaining: H6 (bring-up on
the real swiftos.tech server: build the GPT image, dd it onto the boot disk
via the provider rescue system, confirm with the user before the destructive
step, iterate over serial/VNC until SSH reaches SwiftOS).
QW-series — quick-win hardening (post-M13 remediation)
QW6 — one shared enum Errno: Int32 (DONE, 2026-06-18)
Goal. Collapse the duplicated per-subsystem errno let constants and the
scattered inline negative literals into a single source of truth, without
touching a single numeric value or the Int-at-the-syscall-boundary ABI. This
is the safe slice of the "typed errors internally, one flat status at the
boundary" pattern: the kernel names errors with an enum at call sites, but the
trap still returns a plain Int — no throws/Result crosses the boundary
(see the error-handling note in docs/ARCHITECTURE.md).
Mechanism. New kernel/errno.swift defines enum Errno: Int32 covering
every errno value in use (EPERM -1 … EHOSTUNREACH -101) with POSIX-style
case names, plus var code: Int { Int(rawValue) } (@inline(__always)) for the
frame[0] boundary form. A raw-value enum carries no witness/existential cost in
Embedded Swift — .rawValue is a plain integer load — so this is a compile-time
constant table with no runtime or allocation cost and adds no shared mutable
state (SMP-safe by construction). The file is dependency-free (no MMIO/syscall/
heap), linked first in SWIFT_SRCS and standalone-compilable by the host test.
Migration. Deleted the 15 private let err* in vfs/vfs.swift and the 6
let netErr* in net/socket.swift; both now use Errno.*.code. Inline errno
literals migrated in syscall, sched/futex, user/process, pkg/store,
fs/esp, fs/updatestore, mm/vm (the @_cdecl map/mmap/munmap/mprotect fns
return Int32, so they use .rawValue), tty/tty, drivers/virtio_rng,
crypto/sysrng.
Deliberately left as raw numbers (not errnos / not the errno ABI):
- non-errno numeric returns: sbrk break, time value, resolve-IPv4, and the mmap base VA encoded in [-4095,-1] in syscall.swift; the boundary write frame[0] = UInt(bitPattern: result) is unchanged.
- internal sentinels: slot/pid/index -1 ("not found / no free slot") in process.swift (pickReady/allocSlot/createProcess), the pkg/store find-helpers and the Int32 read-range codes, and the esp/updatestore slot variables.
- driver-internal status codes (-1..-4) in virtio_blk/virtio_input, which callers interpret internally (!= 0) and which never cross the trap as errnos; mapping e.g. -3 to ESRCH would be semantically wrong.
Acceptance. New host unit test tests/errno_test.swift (make errno-test,
also wired into make test next to handle_test) pins the exact raw value of
every case — they are ABI — and the .code boundary form. Gates:
make {errno,socket,eventfd,smp}-test PASS; make build clean single-core and
at -smp 4. The grep gate
grep -nE 'let (err|netErr)[A-Za-z]+ *= *-[0-9]+' kernel/vfs/vfs.swift kernel/net/socket.swift
returns nothing.
QW4 — orderly power control: shutdown/reboot, Ctrl+Alt+Del, panic auto-reboot (DONE, 2026-06-18)
Goal. Give the OS a real power-control surface so a headless server can be
cycled cleanly and recovers itself if the kernel wedges: shutdown/reboot
commands, Ctrl+Alt+Del on a real keyboard, and a 90 s auto-reboot after a kernel
panic with an on-screen countdown.
Mechanism — PSCI. All paths funnel through PSCI (the same conduit S1 uses for
CPU_ON), dispatched per the firmware-discovered platform.psciMethod
(HVC on QEMU virt). Two new no-argument wrappers in
kernel/arch/aarch64/io.h — psci_call0_hvc/smc — issue SYSTEM_RESET
(0x8400_0009, warm reboot) and SYSTEM_OFF (0x8400_0008, power off → QEMU
exits). New kernel/power/power.swift holds powerReset/powerOff (each does
vfsSyncAll() first, then the PSCI call), the powerControl(command:) syscall
backing, powerCtrlAltDelReboot(), and panicReboot(seconds:).
Syscall + commands. SYS_REBOOT = 90 (reboot(cmd): 0=reset, 1=off), gated
on capConsole. Userland bridge swiftos_reboot/swiftos_poweroff
(userland/lib/swift_user.{h,c} + sys_reboot inline in syscall.h). Two new
programs /bin/reboot and /bin/shutdown (userland/{reboot,shutdown}.swift),
packed into the base image.
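The command-to-PSCI mapping and the capability gate can be sketched as a pure decision function. The PSCI function IDs and the 0/1 command values are the ones quoted above; the function name and the error constants are hypothetical, and the actual sync plus HVC/SMC call is left to the caller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PSCI function IDs as quoted in the notes. */
#define PSCI_SYSTEM_OFF   0x84000008u
#define PSCI_SYSTEM_RESET 0x84000009u

/* reboot(cmd) commands from the notes: 0 = reset, 1 = power off. */
enum { POWER_CMD_RESET = 0, POWER_CMD_OFF = 1 };

/* Hypothetical errno-style results for this sketch. */
enum { POWER_EPERM = -1, POWER_EINVAL = -22 };

/* Picks the PSCI function for a reboot(cmd) request. Writes it to *fn and
 * returns 0, or returns a negative error and leaves *fn untouched. */
int power_select_psci(int cmd, bool has_console_cap, uint32_t *fn) {
    assert(fn != NULL);
    if (!has_console_cap) return POWER_EPERM; /* gated on the console capability */
    switch (cmd) {
    case POWER_CMD_RESET: *fn = PSCI_SYSTEM_RESET; return 0;
    case POWER_CMD_OFF:   *fn = PSCI_SYSTEM_OFF;   return 0;
    default:              return POWER_EINVAL;
    }
}
```

Checking the capability before decoding the command means an unprivileged caller learns nothing about which commands exist; that ordering is a choice of this sketch, not something the notes specify.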
Ctrl+Alt+Del. Implemented on the virtio-input keyboard only — a serial
console is a raw byte stream with no real modifier concept. virtio_input.swift
now tracks Ctrl (evdev 29/97) and Alt (56/100) alongside Shift; Del (111) while
both are held calls powerCtrlAltDelReboot(). (USB HID can hook the same path
once enumeration lands.)
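The modifier tracking can be pictured as a tiny state machine over evdev key events. A sketch only, using the key codes quoted above; the struct and function names are hypothetical, and the caller is assumed to invoke the reboot path when the function returns true:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* evdev key codes quoted in the notes. */
enum { KEY_LEFTCTRL = 29, KEY_RIGHTCTRL = 97, KEY_LEFTALT = 56,
       KEY_RIGHTALT = 100, KEY_DELETE = 111 };

typedef struct {
    bool left_ctrl, right_ctrl, left_alt, right_alt;
} cad_state_t;

/* Feeds one key event. Returns true when Del goes down while at least one
 * Ctrl and one Alt are held. */
bool cad_on_key(cad_state_t *s, uint16_t code, bool pressed) {
    assert(s != NULL);
    switch (code) {
    case KEY_LEFTCTRL:  s->left_ctrl  = pressed; return false;
    case KEY_RIGHTCTRL: s->right_ctrl = pressed; return false;
    case KEY_LEFTALT:   s->left_alt   = pressed; return false;
    case KEY_RIGHTALT:  s->right_alt  = pressed; return false;
    case KEY_DELETE:
        return pressed && (s->left_ctrl || s->right_ctrl)
                       && (s->left_alt || s->right_alt);
    default:
        return false;
    }
}
```

Tracking left and right modifiers separately avoids a stuck state when one side is released while the other is still held.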
Panic auto-reboot. exceptionHandler (the EL1 fault path — also where Swift
traps land) now ends with panicReboot(seconds: 90) instead of spinning. The
countdown polls CNTPCT_EL0 directly rather than relying on the timer interrupt,
because a panic is taken with IRQs masked (DAIF set on exception entry) and the
kernel may be wedged. Per the project logging policy it does NOT touch the
disk — a faulted kernel must not write to /data — it just prints the countdown
to UART, records to the klog ring, and resets. EL0 (userland) faults are
unaffected: they still kill the process, not the kernel.
Logging. Reboot/poweroff/CAD/panic-countdown events log at warn/panic to the
in-RAM klog ring + UART (no disk writes from the kernel). Serial capture is the
durable record; clean reboot/shutdown additionally sync() before the PSCI call.
Acceptance. New make reboot-test (tests/reboot_test.sh): drives to a root
shell (capConsole) and proves /bin/reboot issues SYSTEM_RESET and the machine
actually resets (boot prompts reach the M7 marker a 2nd time); an unprivileged
user (caps=14) running /bin/reboot is refused and the box does not reset;
/bin/shutdown issues SYSTEM_OFF and QEMU exits on its own. The panic
countdown + reset was verified manually with a temporary EL1 fault injection
(5 reboot cycles observed, since reverted). make build clean.
QW5 — rights = intersection on capability transfer (DONE, 2026-06-18)
Goal. Adopt the L4/seL4 delegation rule on IPC handle transfer: an ipc_send
sender can hand its peer fewer rights than it holds by computing
effective = held ∩ requested at transfer time and installing a fresh, attenuated
handle in the receiver (monotonic attenuation, never widening). This is the IPC
twin of the spawn-time attenuation already in vfs.swift
(childEntry.rights = attenuate(...)).
See docs/CAPABILITIES.md §4.2.
ABI. No new syscall. The send msg struct gained a trailing unsigned int requested_rights at offset 20; buf/len/handle_fd keep offsets 0/8/16 untouched, so
the kernel's existing LE parse for those fields is unchanged. ipcSendMsgSize grew
20 → 24. The ipc_send wrapper took a new trailing requested_rights parameter;
the SWIFTOS_RIGHTS_ALL_INHERIT (0xFFFFFFFF) sentinel is the identity intersection
("grant everything I hold"), so every existing caller updated to pass it is
byte-for-byte unchanged in behavior. Found and updated all in-tree callers
(grep ipc_send userland/): forkdemo, c4b_sockxfer, spawndemo, argvdemo, qw2_ipc,
qw4_badge, drvinputd, drvsvcdemo.
Kernel. vfsIpcSend reads requested = Rights(rawValue: le32(m, 20)) and, in the
move-commit block, installs moved.rights = attenuate(moved.rights, to: requested)
into endpoints[ep].handle. The existing .transfer precondition on the source
entry is untouched — the intersection only narrows what crosses and can never conjure
.transfer/.write the sender lacks. vfsIpcRecv is unchanged: it already installs
the endpoint's stored entry into a fresh fd, which is now the attenuated one. All under
the existing vfsLock window (no new globals) → -smp 4 boot unaffected.
Acceptance. New make qw5-rights-intersection-test
(tests/qw5_rights_intersection_test.sh + /bin/qw5-rightsxfer from
userland/qw5_rightsxfer.c): a parent opens /dev/zero O_RDWR
(READ|WRITE|TRANSFER), forks, and ipc_sends it requesting only READ|TRANSFER
(dropping WRITE). The child proves a read succeeds, a write fails (WRITE
attenuated away), and the parent proves its source fd was invalidated (move
semantics). Marker QW5: PASS. Host handle_test (make c5-device-rights-test)
still green; make smp-test green. make build clean. (Pre-existing docs-test
failures for /bin/{acme,reboot,shutdown} and a few API_REFERENCE bridges are
unrelated and predate this milestone.)
QW4 — endpoint badges so one server endpoint can serve many clients (DONE, 2026-06-18)
Goal. Adopt the L4/seL4 badge pattern: a server-chosen UInt32 on the IPC
send-capability (not the endpoint) so one receiving endpoint shared among many
clients can tell them apart with no side-channel identity lookup — the structural
confused-deputy defense in docs/CAPABILITIES.md §4.2. Pure, backward-compatible
addition: unbadged callers (badge == 0) are unchanged.
Where the badge lives.
- kernel/vfs/handle.swift: struct HandleEntry gained var badge: UInt32 = 0
(last init param, defaulted, so every existing call site stays valid). The
file stays dependency-free so tests/handle_test.swift still compiles it
stand-alone.
- kernel/vfs/vfs.swift: struct Endpoint gained var badge: UInt32 = 0, the
per-message carrier. vfsIpcSend copies endpoints[ep].badge = sender.badge (the
send handle's badge) under vfsLock; vfsIpcRecv writes it back to a new trailing
out-badge VA and clears it on consume. vfsIpcReplyRecv also clears it on
consume so no stale badge leaks to a later recv. resetEndpointSlotForReuse
zeroes it via Endpoint().
- New vfsIpcBadge(fd:badge:) stamps a send-end endpoint handle (rejects a
non-endpoint / recv-end fd with EINVAL), all under vfsLock.
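The badge's per-message lifecycle can be modeled in a few lines. This is a toy sketch under stated assumptions: endpoint_model, model_send, and model_recv are hypothetical names, and locking, payload bytes, and handle transfer are left out. It only illustrates that the badge travels with the message and is cleared when the message is consumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the kernel's Endpoint; not the real structure. */
struct endpoint_model {
    bool     has_msg;
    uint32_t badge; /* per-message carrier, 0 = unbadged */
};

/* Send: the badge comes from the sender's handle, not from the endpoint. */
static int model_send(struct endpoint_model *ep, uint32_t sender_badge) {
    if (ep->has_msg) return -1; /* single-slot channel is busy */
    ep->has_msg = true;
    ep->badge = sender_badge;
    return 0;
}

/* Recv: report the badge, then clear it so it cannot leak to a later recv. */
static int model_recv(struct endpoint_model *ep, uint32_t *out_badge) {
    if (!ep->has_msg) return -1;
    if (out_badge != NULL) *out_badge = ep->badge; /* NULL = don't report */
    ep->badge = 0;
    ep->has_msg = false;
    return 0;
}
```

Sending with badge 0xA1 and then with badge 0 on the same endpoint reports 0xA1 followed by 0, which is the "no sticky badge" property.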
Syscall number. ipc_badge = 93. (The QW4 prompt said 90, but 90/91/92 were
already taken by reboot/ipc_call/ipc_reply_recv from intervening milestones;
93 is the next free number.)
ABI. The ipc_recv msg struct grew 24→32 bytes with a trailing out_badge VA
(0 = don't report). The kernel always reads 32 bytes and the ipc_recv wrapper
now routes through ipc_recv_badged(..., 0), so kernel and ABI move together and
every caller sends 32 bytes — old 3-arg ipc_recv callers are byte-for-byte
compatible. New wrappers: ipc_badge(fd, badge) and ipc_recv_badged(fd, buf, cap, out_handle_fd, out_badge).
Acceptance. New make qw4-badge-test (tests/qw4_badge_test.sh + /bin/qw4-badge
from userland/qw4_badge.c): two endpoint pairs, a distinct badge stamped into each
send handle (0xA1, 0xB2), and ipc_recv_badged reports each correctly; a third
unbadged send reports 0; badging a recv-end fd is rejected. Markers
QW4-BADGE-{RECVEND-REJECTED,A1,B2,UNBADGED-ZERO}-OK + QW4 OK. Passes single-core
and -smp 4. The existing make ipc-socket-transfer-test (c4b) still passes through
the now-32-byte recv struct (back-compat). Host handle_test extended (fresh entry
badge == 0, stamped entry round-trips). make build clean.
QW3 — endpoint owner-tagging + orphan-zombie reaper, and a PCIe-table teardown leak fix (DONE, 2026-06-18)
Goal. Adopt the L4/seL4 owner-tagging + deterministic reclamation-on-death discipline for IPC endpoints, and stop leaking process slots for orphaned children that are reparented to the kernel and then exit with no waiter.
Part (a) — Endpoint ownerProc + reclamation on death (kernel/vfs/vfs.swift).
- struct Endpoint gained var ownerProc = -1, mirroring DeviceGrant.ownerProc.
- vfsEndpointCreate stamps the creating process as owner (under vfsLock).
- releaseEndpointsOwnedBy(slot) (new) is called from vfsProcessCloseAll after
the FD-close loop, as a deterministic owner-tagged backstop. It funnels through
the existing resetEndpointSlotForReuse (preserving the bump-allocated bufPtr
for reuse) and is idempotent. This is defense-in-depth: the FD-close path
already reclaims endpoints whose ends were all FDs of the dying slot.
Ownership transfer across IPC/fork is a follow-up (creator-owns is sufficient
here).
Part (b) — Orphan-zombie reaper leak (kernel/user/process.swift). The real
leak: when a process P with a live child C was reaped, reapProcess
reparented C to the kernel (pParent = -1); when C later exited,
wakeParent was a no-op (parent -1) and nothing at runtime reaped a
-1-parented zombie, so C permanently consumed one of the 16 maxProc slots
until reboot. Fix:
- reapProcess now reaps already-quiesced zombie children directly
(re-scanning, since a reap can recursively reparent/reap descendants) instead
of only reparenting; still-live adopted children are reparented to the kernel
and flagged in a new pReparentedOrphan array.
- schedule() collects a runtime-reparented orphan zombie
(pParent == -1 && pReparentedOrphan) the instant it quiesces. The flag gates
this so it never races an orchestrator (processRunElf/processRunPair/S5
helpers) waiting on a born-top-level (parent: -1 at creation) zombie it reaps
itself; those are not flagged. SMP-safe: general EL0 (forks) is CPU0-homed, so
the in-scheduler reap fires only on the dispatching CPU under the existing
IRQ-mask + quiesce discipline.
- No new syscalls; the ABI is unchanged. processLiveSlotCount /
vfsEndpointInUseCount are kernel-internal observability only.
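The gating rule for the in-scheduler reap reduces to one predicate. The sketch below is illustrative only: proc_model and scheduler_may_reap are hypothetical names, and the quiesce and IRQ-mask discipline is collapsed into a single zombie flag.

```c
#include <stdbool.h>

/* Toy per-slot state; the real kernel keeps these in parallel arrays. */
struct proc_model {
    bool in_use;
    bool zombie;            /* exited and quiesced */
    int  parent;            /* -1 = parented to the kernel */
    bool reparented_orphan; /* set only when adopted at runtime */
};

/* The scheduler may reap a slot itself only if it is a kernel-parented zombie
 * that became an orphan at runtime. A zombie born with parent -1 is left for
 * the orchestrator that is waiting on it. */
static bool scheduler_may_reap(const struct proc_model *p) {
    return p->in_use && p->zombie && p->parent == -1 && p->reparented_orphan;
}
```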
Root-cause fix surfaced by the test (kernel/mm/vm.swift). On non-VirtualBox
boards address_space_create allocates an l0[1] PCIe-64-bit-MMIO-window L1
table (l1pci, the H2 device window mirrored into every space), but
address_space_destroy only walked l0[0] and never freed l1pci — one
leaked frame per address-space teardown. This was pre-existing on main (the
runReclaimDemo self-test was already red, ~8 frames/round across fork/exec/spawn
churn; confirmed on a clean tree before touching it). address_space_destroy now
frees the l0[1] table (guarded on a valid table descriptor, so the VirtualBox
path — which leaves l0[1] empty — is a no-op). This is required for QW3's
frame-baseline assertion and also turns reclaim green again.
Acceptance. New make orphan-reap-test (tests/orphan_reap_test.sh +
userland/orphandemo.c + in-kernel runOrphanReapDemo in kernel/main.swift).
The self-test runs 20 rounds of the orphan scenario — a parent forks a child
(which owns and abandons an IPC endpoint) and exits without waiting, so the child
is reparented to the kernel and later exits — and asserts that live process slots,
PMM frames, and endpoint slots all return to baseline. PASS single-core and
-smp 4 (slots 0→0, frames 60901→60901, endpoints 0→0). reclaim OK (was FAIL);
make smp-test still PASS; make build clean (no new warnings).
QW2 — blocking IPC park/wake (DONE, 2026-06-18)
Goal. Replace the vfsIpcRecv busy-spin with a true L4/seL4-family
rendezvous: a receiver that finds an empty endpoint parks its process slot
in a fixed-size waiter table and is woken directly by ipc_send (or by the
last sender closing), instead of cycling through the run queue on every timer
tick.
Kernel (kernel/vfs/vfs.swift). A module-level
endpointRecvWaiters[maxEndpoints × maxRecvWaitersPerEndpoint] array (4
slots per endpoint, Int32, allocation-free, always under vfsLock) replaces
the busy-loop:
- ipcRecordWaiter / ipcClearWaiterSlot / ipcClearEndpointWaiters /
ipcWakeWaiters helpers, all under vfsLock.
- ipcForgetSlot(_ slot: Int) mirrors futexForgetSlot; it drops a dying slot
from every endpoint's waiter list so a reused slot cannot be spuriously woken.
- vfsIpcRecv loop: under vfsLock, if no message and senders are alive, records
processCurrentSlot() in the waiter table, calls processPrepareBlockOnFutex()
(sets pBlocked) before releasing the lock, then vfsUnlock +
processYieldAfterPreparedFutexBlock(). This is the same lock-discipline as the
futex park/wake backend and closes the lost-wakeup window on SMP. If the
per-endpoint waiter table is full, falls back to the old
vfsUnlock + processYieldForIO() path (correctness preserved, no wake guarantee
for the overflow case). Spurious wakes are safe: the loop always re-validates
hasMsg/sendRefs under lock.
- vfsIpcSend calls ipcWakeWaiters(ep) after endpoints[ep].hasMsg = true (still
under vfsLock).
- releaseDescription EOF wake: when a send-end description is released and
sendRefs transitions to 0, ipcWakeWaiters(ep) is called; the woken receiver
re-checks sendRefs == 0 and returns errPipe.
- resetEndpointSlotForReuse calls ipcClearEndpointWaiters(ep) so a reused slot
cannot inherit stale waiter records.
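A fixed-size, allocation-free waiter table of this shape might look like the sketch below. The names, the MAX_ENDPOINTS value, and the "wake every recorded waiter" policy are assumptions for illustration; the notes only state four Int32 slots per endpoint. Locking is omitted, so every call is assumed to happen with the equivalent of vfsLock held.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    MAX_ENDPOINTS = 8,  /* assumed; the notes do not give maxEndpoints */
    WAITERS_PER_EP = 4, /* maxRecvWaitersPerEndpoint from the notes */
    NO_WAITER = -1
};

static int32_t waiters[MAX_ENDPOINTS * WAITERS_PER_EP];

static void waiters_init(void) {
    for (int i = 0; i < MAX_ENDPOINTS * WAITERS_PER_EP; i++)
        waiters[i] = NO_WAITER;
}

/* Record a parked process slot. Returns false when the endpoint's row is
 * full, in which case the caller falls back to a plain yield. */
static bool record_waiter(int ep, int32_t slot) {
    int32_t *row = &waiters[ep * WAITERS_PER_EP];
    for (int i = 0; i < WAITERS_PER_EP; i++) {
        if (row[i] == NO_WAITER) { row[i] = slot; return true; }
    }
    return false;
}

/* Drain the row into woken[] (room for WAITERS_PER_EP entries) and return
 * how many process slots should be made runnable. */
static int wake_waiters(int ep, int32_t *woken) {
    int32_t *row = &waiters[ep * WAITERS_PER_EP];
    int n = 0;
    for (int i = 0; i < WAITERS_PER_EP; i++) {
        if (row[i] != NO_WAITER) { woken[n++] = row[i]; row[i] = NO_WAITER; }
    }
    return n;
}

/* Drop a dying process slot from every endpoint so a reused slot number is
 * never woken by a stale record. */
static void forget_slot(int32_t slot) {
    for (int i = 0; i < MAX_ENDPOINTS * WAITERS_PER_EP; i++)
        if (waiters[i] == slot) waiters[i] = NO_WAITER;
}
```

A receiver that is woken still has to re-check the endpoint state, because a wake only says "something changed", not "a message is yours".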
Process teardown (kernel/user/process.swift). ipcForgetSlot(slot/me)
is called from both teardown paths that already call futexForgetSlot —
processRemoteTerminate and the thread-exit branch in processExit — so a
slot freed before delivery cannot be woken after reuse.
No ABI changes. Same syscall numbers, same message layouts (SEND 20 bytes,
RECV 24 bytes), same userland headers. The change is purely internal and
invisible to userland except that ipc_recv no longer burns CPU while
waiting.
Lock ordering. vfsLock → processRunQueueLock(cpu) (via
markProcessReadyOnHomeCpu inside processWakeFromFutex). The reverse
ordering (processRunQueueLock → vfsLock) does not exist anywhere in the
codebase, so no deadlock is possible.
Acceptance. New make qw2-blocking-ipc-test (tests/qw2_blocking_ipc_test.sh +
userland/qw2_ipc.c). Two scenarios exercised at -smp 4:
- Recv-then-send: child prints QW2-RECV-PARKED, parks on ipc_recv before any
message; parent sleeps 200 ms and sends 5 bytes; child receives and prints
QW2-RECV-OK 5.
- EOF wake: child closes its own copy of the send end, parks on ipc_recv;
parent closes the send end → sendRefs reaches 0 → child wakes with
errPipe (-32) and prints QW2-EOF-OK.
Final marker QW2 OK. Running at -smp 4 means a lost cross-CPU wakeup causes
the child to hang and the await to time out. PASS at -smp 4;
make ipc-socket-transfer-test (C4b) and make smp-test still PASS;
make build clean.
QW1 — ipc_call / ipc_reply_recv synchronous request/reply (DONE, 2026-06-18)
Goal. Add the L4/seL4-family call / reply_recv verbs in our 256-byte
byte-message model so a server hot loop is a single ipc_reply_recv per request
(reply to the previous request, block for the next), with caller-blocking and
request/reply correlation done by the kernel via a transient reply port — instead
of the hand-built two-endpoint duplex drvsvcdemo uses. Byte buffers stay; no
register frame, no VMOs/badges/multi-handle transfer (still C4a future work).
Syscall numbers. ipc_call = 91, ipc_reply_recv = 92. (The QW1 prompt
assumed 90/91, but QW4's reboot took 90 first, so the next free pair is 91/92.)
51/52/53 (endpoint_create/ipc_send/ipc_recv) are unchanged.
Reply port (kernel/vfs/vfs.swift). A module-level
replyPorts[maxReplyPorts=16] table — the synchronous-RPC counterpart to the
single-slot Endpoint. Each port's 256-byte buffer is allocated lazily once and
kept attached across free (mirrors allocEndpoint), so the hot path never calls
swiftos_kernel_alloc. The port is named to the server only as a kernel-internal
token (generation << 32) | (index + 1) (0 = "no reply" sentinel), carried to the
receiver in the new Endpoint.replyToken field. decodeReplyPort validates the
token on every reply (in range, inUse, generation-matched), and the reply phase
additionally requires the port to belong to the server's own endpoint and to be
awaiting — so a user cannot forge a token or reply to another caller's port. The
generation is bumped per alloc and persisted across free, so a freed token never
revalidates.
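The token scheme above can be sketched as follows. Function and type names are hypothetical, and the ownership and "awaiting" checks of the reply phase are left out; the sketch covers only the encoding (generation << 32) | (index + 1) and the three decode checks (in range, in use, generation match).

```c
#include <stdbool.h>
#include <stdint.h>

enum { MAX_REPLY_PORTS = 16 }; /* maxReplyPorts from the notes */

struct reply_port_model {
    bool     in_use;
    uint32_t generation; /* bumped per alloc, kept across free */
};

static struct reply_port_model ports[MAX_REPLY_PORTS];

static uint64_t token_make(uint32_t generation, uint32_t index) {
    return ((uint64_t)generation << 32) | (uint64_t)(index + 1);
}

/* Returns the port index, or -1 if the token is 0, out of range, names a
 * free port, or carries an old generation. */
static int token_decode(uint64_t token) {
    uint32_t low = (uint32_t)(token & 0xFFFFFFFFu);
    uint32_t gen = (uint32_t)(token >> 32);
    if (low == 0 || low > MAX_REPLY_PORTS) return -1;
    int index = (int)(low - 1);
    if (!ports[index].in_use) return -1;
    if (ports[index].generation != gen) return -1;
    return index;
}

/* Mint a port; returns 0 (the "no reply" sentinel) when none is free. */
static uint64_t port_alloc(void) {
    for (uint32_t i = 0; i < MAX_REPLY_PORTS; i++) {
        if (!ports[i].in_use) {
            ports[i].in_use = true;
            ports[i].generation++;
            return token_make(ports[i].generation, i);
        }
    }
    return 0;
}

static void port_free(int index) {
    ports[index].in_use = false; /* generation deliberately persists */
}
```

Because the generation survives a free, a token captured before the free fails the generation check even after the same index is handed out again.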
- vfsIpcCall(fd, &msg) (modeled on vfsIpcSend + vfsIpcRecv's block loop):
validates the send end (.write + .transfer, endpoint send end, live recvRefs,
slot free), validates the optional moved handle, mints a port (errNoSpace if
none free), delivers the request bytes ± moved handle into the endpoint slot
exactly as ipc_send, stamps replyToken, ipcWakeWaiters(ep), then parks on the
reply port via the QW2 path (processPrepareBlockOnFutex under vfsLock, then
processYieldAfterPreparedFutexBlock). On wake it re-validates the port
(slot/caller/token), copies the reply (≤ reply_cap), installs any replied
handle as a new fd, frees the port, returns the reply byte count.
- vfsIpcReplyRecv(fd, &msg): the reply phase (skipped on token 0, the first
turn) validates token + endpoint ownership, copies ≤256 reply bytes + moves
the optional reply handle into the port, marks hasReply, and calls
processWakeFromFutex on the parked caller. The receive phase is the same QW2
park/wake loop as vfsIpcRecv, plus it writes the new request's replyToken to
*out_reply_port so the server can reply next turn.
Lifecycle / failure modes.
- Server death before reply. When the last receiver closes and recvRefs hits
0, releaseDescription calls replyPortsWakeForEndpointEOF(ep); the woken caller
re-checks recvRefs == 0 and returns errPipe, then frees its own port.
- Caller death. replyPortForgetSlot(slot) (mirrors ipcForgetSlot) is wired
into both teardown paths (processRemoteTerminate, thread-exit) and reclaims
any port the dying caller parked on, releasing an uncollected replied handle
so its description ref balances. A later server reply finds a stale token →
errInvalid (clear "gone" state, never a dangling callerSlot).
- Bogus/forged token → errInvalid; busy single-slot channel → errAgain.
S4b accounting. vfsS4bAccountingSelfTestLocked now counts a reply port's
moved handle toward descRefs exactly as it does an endpoint's in-flight handle,
plus a sanity walk (bufPtr != 0 when inUse, replyLen ∈ [0,256]), so the
refcount invariant stays balanced. docs/SMP_STATE_AUDIT.md covers the new
replyPorts (and the previously-undocumented QW2 endpointRecvWaiters) globals.
ABI. userland/lib/syscall.h adds SYS_IPC_CALL/SYS_IPC_REPLY_RECV (91/92)
and static inline ipc_call / ipc_reply_recv wrappers. The msg structs lead with
the u64 fields so the trailing int needs no struct padding the kernel must skip
(CALL = 44 bytes, REPLY_RECV = 60 bytes, byte-for-byte the kernel's LE parse).
Acceptance. New make ipc-call-test (tests/ipc_call_test.sh +
userland/ipc_call_test.c), at -smp 4: a server child runs the one-syscall
hot loop; the parent issues several ipc_calls and asserts each reply correlates
(reply N correlated), a pipe write end round-trips caller→server→caller
(handle round-tripped), a bogus reply-port token is refused (EINVAL), and a
server that exits without replying fails EPIPE (not a hang/panic). The ping-pong
is self-synchronizing (each side blocks for the other), so no sleeps are needed in
the correlation path. PASS at -smp 4; make smp-test (S4b balanced),
qw2-blocking-ipc-test, orphan-reap-test, ipc-socket-transfer-test still
PASS; make build clean.
Known pre-existing gap (not QW1). make smp-state-audit is red on this branch
independent of QW1: its SMP_STATE_AUDIT.md manifest has not been maintained since
pre-USB/datafs, so the scanner reports ~57 globals missing across unrelated
subsystems (sysrng, usb_xhci, datafs, virtio_gpu, …) plus 2 stale entries. QW1
documents its own state (replyPorts, endpointRecvWaiters); the broader drift
needs a separate doc-sync pass.
SU-series — reflash-free static-site updates (post-M13)
Goal: update the static site swiftos.tech serves (in-kernel nginx) on a running
box without rebuilding swift-os.img and re-flashing the whole image via Rescue
dd. Reuses persistent /data (datafs + fsync), Ed25519/SHA-256, bounded-exec
sshd, and the key-baking pattern from image/pkg signing. The site content
trust anchor is an Ed25519 signature on the bundle; the trigger is gated by
the operator SSH key in the bounded-exec allowlist (no new kernel capability).
SU-A — persistent docroot + boot seed/recovery (DONE, 2026-06-18)
nginx's production docroot moved from the read-only baked /usr/share/nginx/html
to /data/www/current (base/usr/etc/nginx/nginx-prod.conf). A new native Swift
/bin/swupdate (userland/swupdate.swift) provides swupdate seed, run by
swos-init (seed_site()) on every boot before any service:
- Fresh / empty /data → recursively copies the baked default site into
/data/www/current (fsync), so a freshly-flashed box still serves a site.
- Crash recovery of an interrupted atomic swap. Generations live as real dirs
under /data/www/ (current, next, prev). datafs has no symlinks and rejects
rename onto a populated dir (errNotEmpty, vfsRename), so the swap always
renames into a fresh name (O(1): a dir's children track it by inode number,
unchanged by rename). If a power loss lands between the two swap renames
(current→prev done, next staged), the next boot's seed finishes it
(next→current); else it rolls back (prev→current).
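One way to read the boot-time decision is as a function of which generation directories exist. This is a sketch of that reading, not the shipped logic: the enum, the function name, and the exact precedence (in particular what "else" covers) are assumptions.

```c
#include <stdbool.h>

enum seed_action {
    SEED_NOTHING,      /* current exists: serve it as-is */
    SEED_FINISH_SWAP,  /* rename next -> current */
    SEED_ROLL_BACK,    /* rename prev -> current */
    SEED_COPY_DEFAULT  /* fresh or empty /data: copy the baked site */
};

/* Decide what seed should do from the presence of the three generations.
 * A crash between the two swap renames leaves: no current, next staged,
 * prev present. */
static enum seed_action seed_decide(bool have_current, bool have_next,
                                    bool have_prev) {
    if (have_current) return SEED_NOTHING;
    if (have_next) return SEED_FINISH_SWAP;
    if (have_prev) return SEED_ROLL_BACK;
    return SEED_COPY_DEFAULT;
}
```

Every outcome ends with a populated current directory, and each recovery step is a single rename into a name that does not exist yet, which is the only kind of rename datafs accepts for directories here.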
swupdate is freestanding Embedded Swift over NUL-terminated [CChar] / [UInt8]
buffers — it deliberately avoids Swift String, whose ==/interpolation pull in
Unicode-normalization tables that aren't linked in the userland runtime.
Gate make site-seed-test (3 boots, fresh data disk): boot 1 seeds + nginx serves
the baked default byte-for-byte; boot 2 stages a mid-swap crash state; boot 3's
seed recovers it and nginx serves the new content — all on /data, surviving
reboot, no reflash. nginx-data-test still PASS (shared boot path unchanged).
SU-B — signed SWSITE bundle format + offline apply (DONE, 2026-06-18)
A static site is published as a signed SWSITE bundle and applied to a running
box with /bin/swupdate apply-local <bundle.swsite> (the HTTPS-fetch trigger is
SU-C). The trust anchor is an Ed25519 signature; the content never travels as
scp/writable-root.
- Bundle = [64-byte Ed25519 sig over body][body]. The body header carries
magic SWSITE01, version, entry count, string-table/blob offsets, and a SHA-256
over the payload region; then fixed 24-byte entry records (name/blob
offsets+lens, type file|dir, mode), a string table of relative path names, and
the blobs. Entries are pre-order (a dir precedes its contents). Layout is
defined once in tools/sitepack.swift and mirrored byte-for-byte by
userland/swupdate.swift.
- Host tool tools/sitepack.swift (build/sitepack): create <dir> <out> --seed
walks a directory and writes the signed bundle; verify <bundle> --pubkey
checks it. Reuses kernel/crypto/{ed25519,sha256,sha512}.swift. The
site-signing keypair is models/dev-site-signing.{seed,pub} (minted by
modelsign keygen, like the image key); the public half is baked at
/etc/swupdate/site-root.pub.
- Apply (swupdate apply-local, links the same crypto): verify Ed25519 against
the baked pubkey → verify payload SHA-256 → bounds + inode-budget check
(maxSiteEntries = 64, since current+next+prev ≈ 3× the site against datafs's
256 inodes) → reject any unsafe (.. or absolute) entry name. Only then unpack
into /data/www/next (fsync) and atomically swap (current→prev, next→current,
sync). A bad bundle is refused before next is touched, so current is never
disturbed.
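The entry-name check at the end of the apply pipeline can be illustrated with a small byte-wise filter. This is a sketch, not the on-box safeName: it covers only the two rejections the notes name (absolute paths and .. components) plus the empty name, and it treats names as length-delimited bytes because string-table entries are not assumed to be NUL-terminated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if name is a relative path with no ".." component. */
static bool safe_name(const char *name, size_t len) {
    if (len == 0 || name[0] == '/') return false; /* empty or absolute */
    size_t start = 0; /* first byte of the current path component */
    for (size_t i = 0; i <= len; i++) {
        if (i == len || name[i] == '/') {
            size_t comp_len = i - start;
            if (comp_len == 2 && name[start] == '.' && name[start + 1] == '.')
                return false; /* a ".." component would escape the docroot */
            start = i + 1;
        }
    }
    return true;
}
```

Comparing whole components rather than searching for the substring ".." keeps legitimate names such as ..hidden or a..b valid while still rejecting a/../b.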
Gate make site-bundle-test (image built INCLUDE_SITE_TEST=1, which bakes a signed
test bundle + a tampered copy under /usr/share/swupdate-test; production images carry
neither): a tampered bundle is rejected and the docroot stays byte-identical to the
baked default; a valid bundle is applied and nginx serves the new content; the new
content survives reboot. Assertions are over curl (QEMU serial stdout is buffered);
applies are backgrounded so the slow console can't swallow a queued command.
SU-C — HTTPS fetch + the operator SSH path (DONE, 2026-06-18)
swupdate site <https-url> pulls a SWSITE bundle over TLS 1.3 and applies it, so an
operator updates the site with a single SSH command:
ssh root@box /bin/swupdate site https://host/site.swsite
- swupdate links the same TLS 1.3 stack as /bin/tlsget (TLS_SWIFT_SRCS) plus
ed25519+sha512. It parses https://host[:port]/path (byte-wise, still no Swift
String), resolves a literal IPv4 directly or a name via swiftos_resolve,
drives the sans-IO TLS13Client over the socket (handshake → GET →
read+decrypt the whole response), strips the HTTP headers, and feeds the body
to the SU-B applyBundleBytes.
- Trust split. The trigger is gated by the operator SSH key: bounded-exec sshd
already allows /bin/* and parseExecArgv forwards site <url> as argv. The
content is gated by the Ed25519 signature. TLS is MITM-open (cert unverified),
which is acceptable because the signature is the authenticity anchor: a MITM
serving a different bundle fails verify. Documented as such.
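A byte-wise parse of https://host[:port]/path might look like the sketch below. It is not the project's parseHTTPSURL: the struct, the default port of 443, and the rule that a path is required are assumptions made for the example. The result borrows slices of the input instead of copying, which mirrors the no-String, buffer-oriented style described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct https_url {
    const char *host; size_t host_len; /* slice of the input, not copied */
    uint16_t    port;
    const char *path; size_t path_len; /* includes the leading '/' */
};

static bool parse_https_url(const char *url, size_t len,
                            struct https_url *out) {
    static const char scheme[] = "https://";
    const size_t slen = sizeof(scheme) - 1;
    if (len <= slen || memcmp(url, scheme, slen) != 0) return false;

    size_t i = slen;
    const size_t host_start = i;
    while (i < len && url[i] != ':' && url[i] != '/') i++;
    if (i == host_start) return false; /* empty host */

    uint32_t port = 443; /* assumed default when no :port is given */
    const size_t host_end = i;
    if (i < len && url[i] == ':') {
        size_t digits = 0;
        port = 0;
        i++;
        while (i < len && url[i] >= '0' && url[i] <= '9') {
            port = port * 10 + (uint32_t)(url[i] - '0');
            if (port > 65535) return false; /* also bounds the arithmetic */
            i++;
            digits++;
        }
        if (digits == 0 || port == 0) return false;
    }
    if (i >= len || url[i] != '/') return false; /* path is required here */

    out->host = url + host_start;
    out->host_len = host_end - host_start;
    out->port = (uint16_t)port;
    out->path = url + i;
    out->path_len = len - i;
    return true;
}
```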
Gate make site-update-test: boots with a host HTTPS server (python, self-signed,
reached at 10.0.2.2 via slirp — same pattern as acme-mock-test), drives the
console past the tty demo, logs in and starts nginx, then runs swupdate site over a
pinned-key OpenSSH exec (host known_hosts derived from the baked host seed). A
tampered URL is rejected (ssh exits nonzero, docroot unchanged); a valid bundle is
fetched, verified, swapped in, and served within seconds; the update survives reboot.
QEMU can't catch every HW path, so swupdate site should also be run on the real box.
User-facing docs: swupdate in docs/COMMAND_REFERENCE.md, sitepack in
docs/HOST_TOOL_REFERENCE.md, and the operator runbook "Update The Hosted Static
Site (Reflash-Free)" in docs/UPDATE_GUIDE.md.
SU-T — fast host coverage for the SWSITE trust path (DONE, 2026-06-19)
SU-A/B/C shipped with QEMU acceptance gates only (site-{seed,bundle,update}-test),
which are slow, not in make test, and only exercise the happy path plus a
signature flip. The trust-critical parsing — the byte-for-byte SWSITE layout shared
between the host packer and the on-box reader, and the path-traversal defense — had
no fast, hostile-input coverage. Two additions close that, both host-only (no QEMU,
sub-second, wired into make test):
- make sitepack-test (tests/sitepack_test.swift) is an INDEPENDENT third
implementation of the SWSITE reader: it packs the fixtures with
sitepack create, re-parses the bundle from scratch, and reconstructs the tree
byte-for-byte, catching any drift from the layout swupdate reads. It then
drives sitepack verify against a flipped signature, a flipped payload byte,
the wrong pubkey, and a truncated file, asserting each is rejected.
- make swsite-test (tests/swsite_test.swift) unit-tests the device-side
parsers directly. To make them testable without the syscall/crypto/TLS deps,
the pure logic moved out of userland/swupdate.swift into a new freestanding
module userland/lib/swsite.swift (added to SWUPDATE_SWIFT_SRCS): the
layout/inode-budget validator swsiteParseEntries, safeName, le32/magicMatches,
and the SU-C parseHTTPSURL/parseIPv4Bytes/httpBody. applyBundleBytes now calls
swsiteParseEntries and maps its SWSiteLayoutError to the same operator
messages, so behavior is unchanged. The test hits hostile input the
integration tests never produce: ../, absolute, and ..-component entry names
(rejected as .unsafeName), entry counts of 0 and > the 64 inode budget,
offsets that run past the buffer, malformed/non-https/bad-port URLs, and
non-200 / headerless HTTP.
The on-box behavior path (apply-local/site) is still covered by the SU-B/SU-C
QEMU gates; SU-T only adds fast pure-logic coverage underneath them. Re-run
make build && make site-bundle-test site-update-test once on the embedded
toolchain to confirm the swsite.swift split still links and boots.
TH-series — aggressive coverage of untested trust boundaries (post-M13)
A three-agent audit of test coverage (QW-series + kernel core +
concurrency/durability/drivers/net) found that the suite proves the code works
on the happy path but barely exercises adversarial/negative input. The
TH-series adds fast host unit tests (and, where they surface a real bug, the
minimal fix) for the highest-risk untested boundaries, one milestone at a time.
TH1 — ELF loader + copyin/copyout hostile-input coverage (DONE, 2026-06-19)
The EL0 trust boundary — kernel/user/elf.swift:elfLoad (parses attacker-supplied
ET_EXEC images from disk/the package store) and kernel/user/user_access.swift
(every syscall's copyin/copyout guard) — had zero negative tests; both ran only
on trusted binaries / well-behaved processes inside QEMU.
- Real bug found + fixed in elfLoad. Three bounds used a + b > size-style
checks where a/b are attacker-controlled u64 fields (e_phoff + table,
p_offset + p_filesz, p_vaddr + p_memsz). Embedded Swift's + traps on overflow,
so a crafted ELF with a near-UInt.max field crashed EL1: a DoS on any box
handed a bad binary. Rewrote all three overflow-safe (compare against the
remaining space; guard pVaddr+pMemsz before forming vaEnd). Behavior for valid
images is unchanged.
- tests/elf_loader_test.swift (make elf-loader-test) links elfLoad against a
fake address space + PMM (real host frames, so copy-to-user actually writes)
and asserts: a valid image loads (entry, page count, perms, bytes copied), and
reject for truncated/bad-magic/ELFCLASS32/big-endian/non-ET_EXEC/wrong-machine,
phdr table past EOF, p_filesz past EOF, filesz>memsz, the three
integer-overflow fields, PMM exhaustion, plus the "executable wins"
shared-page upgrade and an empty PT_LOAD skip. Proven non-vacuous: built
against the pre-fix loader it dies with SIGTRAP on the overflow cases.
- tests/user_access_test.swift (make user-access-test) pins the copyin/copyout
guards against a fake mapping: kernel-range / low-device / past-window VAs,
count > Int.max, ranges overrunning the window, unmapped pages, a range
straddling a mapped→unmapped boundary, the writable/COW-resolve path, and
userCString NULL / bad-maxLen / kernel-range, all without dereferencing a fake
VA. (These guards were already correct; the test is regression armor.)
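The "compare against the remaining space" rewrite follows a standard pattern, sketched here in C. The function name is hypothetical. One difference from the kernel is worth stating: unsigned addition in C wraps silently instead of trapping, so the naive check would not crash here, it would wrongly accept the range. The overflow-safe form avoids both failure modes because it never forms offset + length.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the byte range [offset, offset + length) lies inside a buffer
 * of `size` bytes. Both offset and length may be attacker-controlled. */
static bool range_fits(uint64_t offset, uint64_t length, uint64_t size) {
    if (offset > size) return false; /* start is already past the end */
    return length <= size - offset;  /* cannot underflow: offset <= size */
}
```

With offset = UINT64_MAX and length = 2 the naive sum wraps to 1 and would pass a "> size" test against any non-trivial size; range_fits rejects it on the first comparison.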
Both are host-only (sub-second, wired into make test). Validated on the embedded
toolchain: make build compiles the fixed loader and console-login +
swift-coreutils boot tests still exec real ELFs. Remaining audit findings
(datafs crash
injection, A/B wrong-key, IPC capability boundaries, SMP atomics, futex/signal,
driver malformed-device, DNS pointer-loop, panic-loop guard) are queued as later
TH milestones.
TH2 — QW5 capability attenuation: the escalation direction (DONE, 2026-06-19)
qw5_rights_intersection_test only proved the downgrade direction — a sender
holding READ|WRITE|TRANSFER granting only READ|TRANSFER. That shows narrowing
works, not that widening is impossible, which is the actual security property
(monotonic attenuation: moved.rights = attenuate(held, to: requested) =
intersection, kernel/vfs/vfs.swift:3008). A bug that honored requested directly,
or flipped intersection to union, would have passed the old test.
userland/qw5_rightsxfer.c now runs three scenarios through one helper and only
prints QW5: PASS after all three pass:
- downgrade: hold R|W|T, request R|T -> receiver loses WRITE (unchanged);
- escalation: hold only R|T (open /dev/zero O_RDONLY), request R|W|T -> WRITE
must NOT appear (a right the sender never held can't be conjured);
- all-inherit: hold only R|T, request SWIFTOS_RIGHTS_ALL_INHERIT (0xFFFFFFFF)
-> receiver still gets only R|T.
Each receiver asserts a read of /dev/zero succeeds but a write is DENIED (the
kernel checks the WRITE right before /dev/zero's accept-everything write), and
the sender's source fd is invalidated (move, not copy). O_RDONLY yields
READ|TRANSFER (posixRights always adds .transfer, so the move is permitted;
vfs.swift:1256). Non-vacuous: if escalation leaked WRITE the child prints
QW5: FAIL ... rights were widened and QW5: PASS never appears. Verified in
QEMU (make qw5-rights-intersection-test, PASS). Next IPC milestones: QW1
reply-port double-reply/forgery/generation-after-free, QW4
stale-badge-on-reuse.
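The three scenarios all reduce to one bitwise intersection. In the sketch below the bit positions of the rights are assumptions chosen for illustration (the notes do not give the real encoding), and attenuate is a C stand-in for the kernel's Swift helper of the same name.

```c
#include <stdint.h>

/* Hypothetical bit assignments; only the names come from the notes. */
enum {
    RIGHT_READ     = 1u << 0,
    RIGHT_WRITE    = 1u << 1,
    RIGHT_TRANSFER = 1u << 2
};

#define SWIFTOS_RIGHTS_ALL_INHERIT 0xFFFFFFFFu

/* Monotonic attenuation: the result is always a subset of `held`, whatever
 * the sender asks for. */
static uint32_t attenuate(uint32_t held, uint32_t requested) {
    return held & requested;
}
```

Under these assumptions, downgrade gives R|T from R|W|T, escalation gives R|T when R|W|T is requested from R|T, and the all-inherit sentinel returns exactly what is held. A union or a "use requested directly" bug would pass the first case and fail the other two, which is why the escalation direction is the one that proves the property.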
TH3 — QW1 reply-port: a double reply to a used token is rejected (DONE, 2026-06-19)
ipc_call_test proved a bogus reply-port token (0xDEADBEEF, out of range) is
rejected EINVAL, but not the sharper capability case: a real, previously valid
token replayed after it was already answered. That is what the generation
counter + !hasReply guard exist to stop (decodeReplyPort + vfs.swift:3334): a
server must not be able to reply twice, nor reuse a consumed/stale token to
wake a caller a second time.
userland/ipc_call_test.c gains Scenario 4: a dedicated server receives req1
(captures tok1), replies to it while receiving req2 (tok2), then attempts a
second reply to tok1 and asserts it returns EINVAL (the reply phase refuses it
before blocking), then replies to tok2 so the caller is released and exits with the
verdict. The caller issues two correlated ipc_calls; the sequence is
self-synchronizing (each call blocks for its reply), so it is robust at -smp 4.
Non-vacuous: an honored double reply makes the server print
double-reply NOT rejected / the caller print double-reply scenario FAILED,
both caught by the test script, and IPC-CALL OK never appears. Verified in
QEMU at -smp 4 (make ipc-call-test, PASS). Remaining IPC items: cross-endpoint
reply (the endpoint == ep guard), generation-after-slot-reuse, and QW4
stale-badge-on-reuse.
TH4 — QW4 badge: per-message tracking + slot-reuse hygiene (DONE, 2026-06-19)
qw4_badge_test proved badges distinguish clients, but used three separate
endpoint pairs — so it never exercised the badge's lifecycle on a single endpoint:
the per-message update/clear and the freed-slot reset (endpoints[ep].badge = 0 on
recv at vfs.swift:3102/3397; Endpoint() zeroing on slot reuse).
userland/qw4_badge.c adds two single-endpoint checks:
- mixed — re-stamp one send handle A1 -> 0 -> B2 and confirm each recv reports the current badge (catches a "sticky" endpoint badge that fails to update or clear between messages);
- reuse — badge an endpoint, exchange a message, close it (freeing its slot),
then create a fresh endpoint (which reuses the slot) and confirm an unbadged
send reports 0 (the freed slot's badge must not bleed into its reuse).
New markers QW4-BADGE-MIXED-OK / QW4-BADGE-REUSE-CLEAN-OK are asserted by the
test script; a sticky or bled badge makes the program exit before printing
them and the run fails. Verified in QEMU (make qw4-badge-test, PASS).
IPC/capability track complete (TH2 QW5 escalation, TH3 QW1 double-reply, TH4 QW4 badge). Still open from the audit: cross-endpoint reply + generation-after-reuse (QW1), and the non-IPC tracks — A/B wrong-key, datafs crash injection, SMP atomic contention, futex/signal races, driver fault injection, DNS pointer-loop, and the panic-loop guard.
TH5 — signed base image: a valid signature by the WRONG key is refused (DONE, 2026-06-19)
signed_image_test proved a base image is refused when a SIGNED byte is flipped
(Case A) and a file is rejected when its payload is flipped (Case B) — but both only
break well-formedness. Neither tested the actual forgery threat: an image that is
well-formed and carries a valid, internally consistent Ed25519 signature, just by
a key the kernel does not trust. A trust anchor that checked only signature
well-formedness (not the key) would have shipped undetected.
New Case C re-packs the SAME build/base-root with basepack under a random
attacker seed (dd /dev/urandom, 32 bytes — any 32 bytes is a valid Ed25519 seed),
producing a valid v3 signed image by an untrusted key. It asserts the forged image
differs from the trusted base.img (the seed took effect), boots it, and requires
vfs: base image signature INVALID with no M11c: read-only base mounted marker —
the kernel's compiled-in trust root (trust_root.S incbin of image_trust_root.bin)
rejects the wrong key exactly like a corrupt signature. Standalone make signed-image-test added (deps build + base-image, which provide basepack +
base-root). Verified in QEMU, PASS. Remaining audit tracks: datafs crash injection,
SMP atomic contention, futex/signal races, driver fault injection, DNS pointer-loop,
panic-loop guard.
TH6 — panic auto-reboot loop guard (real fix + test, DONE, 2026-06-19)
panicReboot (kernel/power/power.swift) auto-rebooted a faulted kernel forever:
nothing counted consecutive panic-reboots, so a kernel that faults again before it
finishes booting would PSCI-reset → fault → reset … in an invisible loop. The audit
(G1) flagged the missing guard; this adds it and proves it end to end.
- Fix. A small cookie (magic + count) in a fixed, reset-surviving RAM cell at
0x4007_0000, the gap between RAM base and the kernel image
(PHYS_BASE = ramBase + 0x80000), below the -kernel reload and below the
PMM-managed region, so neither the image reload nor the allocator clobbers it.
panicReboot bumps the count (flushed past the cache with dc_cvac/dsb_sy so it
survives on real caching HW too); once it reaches maxConsecutivePanicReboots
(3) it HALTS for an operator instead of resetting. panicLoopMarkHealthyBoot()
clears the counter at the steady-state milestone (start of runInit), so an
isolated post-healthy runtime fault reboots-and-recovers as before; only a
tight pre-healthy loop trips the limit. The cell is RAM-only: a faulted kernel
still never touches disk.
- Test. make panic-loop-test builds a test-only kernel variant via a recursive
make with EXTRA_SWIFT_DEFS="-D PANIC_LOOP_INJECT" (a new empty-by-default knob
in SWIFT_FLAGS; production kernel.elf is untouched and carries no injector).
The #if PANIC_LOOP_INJECT hook faults on every boot, early in kernelMain
(after PSCI/MMU/heap, before any interactive stage).
tests/panic_loop_test.sh boots it WITHOUT -no-reboot so PSCI SYSTEM_RESET
actually warm-resets (one QEMU process, serial accumulates) and asserts
exactly 3 injections then the halt marker, which also proves the cookie
survives the warm reset (otherwise the count would never accumulate and it
would loop forever).
Verified: panic-loop-test PASS, production build + console-login boot still
PASS (the guard/healthy-reset don't perturb a normal boot). Remaining audit
tracks: datafs crash injection, SMP atomic contention, futex/signal races,
driver fault injection, DNS pointer-loop.
TH7 — DNS compression-pointer DoS: verified safe by design + armored (DONE, 2026-06-19)
The audit (P2 net) flagged that dnsSkipName "could infinite-loop on a
compression-pointer cycle" — the classic DNS decompression-bomb DoS. On inspection
this is a false alarm for our implementation: the in-kernel resolver
(kernel/net/dns.swift) never follows compression pointers — dnsSkipName treats a
0xC0 pointer as a 2-byte terminator and returns, and dnsParseResponse locates the
A record by walking fixed-size answer records, reading the RDATA directly. Every
loop strictly advances (label by ≥1, answer index by 1) or returns -1 on overrun, so
a cyclic/forward pointer can never loop. No bug, no fix — but the property was
untested.
Added four adversarial cases to tests/net_test.swift (sections 24–27) that pin it:
a question name that is a self-referential pointer, after which the parser must
still find the A record (the test merely completing is the proof that it
terminates; a follow-the-pointer parser would hang here); a label whose length runs
past the message end; a reserved-bits label (0x80, not a pointer, len > 63); and a
compression pointer truncated to one byte at the end. Each must return 0, not hang
or over-read. Proven non-vacuous (a forced-wrong expectation makes check print
FAIL). Host-only, already in make test. Remaining audit tracks: datafs crash
injection, SMP atomic contention, futex/signal races, driver fault injection.
TH8 — direct-futex boundary probe + an unreachable-boundary finding (DONE, 2026-06-19)
pthread already drives futex on the happy path; nothing tested SYS_FUTEX directly
at its boundaries. userland/futexprobe.c (make futex-test, run at -smp 4 so
the wait/wake handoff crosses CPUs) covers, via raw svc syscalls:
- val-mismatch fast path: FUTEX_WAIT with *uaddr != val returns 0 at once, never
blocking;
- wake-empty: FUTEX_WAKE on an address with no waiters wakes nobody (0), never
faults;
- multi-waiter wake / no lost wakeup: N threads FUTEX_WAIT on one word, then setting
it + one FUTEX_WAKE releases every one. Robust by construction: the word is set
before the wake, so a not-yet-parked waiter still exits via the fast path, and the
join can only hang if a genuinely-parked waiter's wakeup is LOST.
Finding — the 16-slot queue-full EAGAIN path is effectively unreachable.
futexWaitOn returns EAGAIN when its 16-entry wait table is full, but a probe that
tried to fill it found that pthread_create fails at about the 13th thread:
maxProc = 16 (kernel/user/process.swift) caps total processes/threads, and the
live system already holds several, so you exhaust thread slots before the 16-slot
futex table — the EAGAIN branch is defensive/dead under the current cap. Recorded
rather than tested; if maxProc ever rises above maxFutexWaiters, add the
oversubscription case. Verified make futex-test PASS at -smp 4. Remaining audit
tracks: datafs crash injection, SMP atomic contention, signal races, driver fault
injection.
TH9 — multi-signal default-terminate coverage + a corrected audit claim (DONE, 2026-06-19)
signal_test exercised the default-terminate path for exactly one signal
(SIGTERM). The audit worried that "only SIGINT/TERM/PIPE are delivered;
SIGSEGV/SIGKILL etc. are defined but never delivered." Reading processKill
(kernel/user/process.swift) shows that claim is too strong: kill(otherPid, sig)
to a process that is NOT currently on-CPU takes a DIRECT teardown
(pExit = 128 + sig) for ANY valid signal — the SIGINT/TERM/PIPE restriction is
only on the async-pending delivery path (signalDeliverToForeground/CurrentFrame,
i.e. Ctrl-C and raise-to-foreground). So kill(child, SIGKILL|SIGSEGV) already
terminates correctly.
userland/signalprobe.c now forks a nanosleeping child and kills it with SIGINT,
SIGKILL, and SIGSEGV in turn, asserting each yields WIFSIGNALED with
WTERMSIG == sig (the 128+signo status). Non-vacuous: a signal that failed to
terminate would hang waitpid and the probe would never print SIGNALPROBE-OK.
Verified make signal-test PASS. An exploratory in-kernel change to add SIGKILL to
the async-delivery list was reverted — it has no reachable test (the direct-teardown
and self-kill paths already handle SIGKILL), so shipping it would be an unverified
no-op. Genuinely open signal items (left for a focused milestone): kill of a
process running on ANOTHER CPU returns EBUSY (no remote teardown), and per-process
(vs process-global) dispositions. Remaining audit tracks: datafs crash injection,
SMP atomic contention, driver fault injection.