SwiftOS Performance And Sizing Guide
This guide explains how to reason about current SwiftOS performance, resource usage, sizing, and benchmark evidence. It is written for operators, application authors, package maintainers, and reviewers who need to distinguish supported measurements from roadmap goals.
SwiftOS is still QEMU-first and serial-first. Many current performance numbers are useful for regression detection and relative comparison, not for production capacity promises. Treat QEMU TCG results, especially AI inference throughput, as correctness and integration evidence unless a test explicitly defines a performance guard.
Use this guide with:
- Observability Guide for available signals.
- Testing Guide for validation gates and failure reading.
- Networking Guide for network test profiles.
- AI Hosting Guide for /bin/llmd health and metrics.
- Service Guide for service readiness and authoring.
- Configuration Reference for QEMU and build knobs.
- Risk Remediation Roadmap for SMP and observability hardening.
Current Performance Model
| Area | Current reality |
|---|---|
| Primary runtime | QEMU virt on AArch64 |
| Default memory | 256 MiB in documented QEMU profiles |
| CPU model | Single-core default; SMP hardening includes gated S5f run-any EL0 placement tests |
| User programs | Static EL0 binaries |
| Storage | Read-only packed base image plus RAM-backed /tmp |
| Networking | virtio-net with QEMU user networking in tests |
| AI inference | CPU TinyStories inference paths under QEMU |
| Persistent metrics store | Not implemented |
| Production capacity claims | Not established yet |
The most important rule: compare measurements within the same host, QEMU version, build mode, memory size, CPU count, and boot profile. Do not compare a QEMU TCG number from one laptop with a number from a different host and call the difference a product capacity limit.
What You Can Measure Today
| Signal | How to collect it | Useful for |
|---|---|---|
| Boot markers | Serial log or ./tests/boot_test.sh | Boot regressions and milestone health |
| Memory and CPU snapshot | /bin/top -b -n 1 | Current process and system resource view |
| Process list | /bin/ps | Which programs are alive |
| HTTP readiness and status | Host curl plus serial log | Service availability |
| LLM health | GET /health | Bundle and model readiness |
| LLM request metrics | GET /metrics and serial llmd: served ... | Relative serving regression checks |
| Network throughput guard | tests/net_zero_copy_throughput_test.sh | Bounded HTTP burst regression guard |
| Full acceptance suite | make test | Broad functional regression evidence |
Choose A Measurement Package
Start with the claim you want to make. A useful performance note names the claim, the environment, the focused proof, and the boundary of what the result does not prove.
| Claim or concern | Collect | Focused proof | Do not claim |
|---|---|---|---|
| Boot still reaches the product baseline | Serial log, git log -1 --oneline, QEMU command | ./tests/boot_test.sh | Real hardware boot time |
| Artifact footprint changed | ls -lh build/kernel.elf build/kernel.bin build/base.img build/swift-os.img | make build base-image or make disk base-image | Runtime memory pressure from size alone |
| Memory use changed | top -b -n 1 before and after the workload | Workload test plus ./tests/top_test.sh | Persistent memory trend or production capacity |
| HTTP path regressed | Host request output, serial markers, host/QEMU version | ./tests/httpd_test.sh, bash ./tests/net_zero_copy_throughput_test.sh | Real NIC throughput |
| LLM serving changed | /health, one /completion, /metrics, serial llmd: served ... line | ./tests/llm_serve_test.sh | Hardware-independent tokens/sec |
| Local inference changed | Model generation, serial output, optional before/after top | ./tests/llm_run_test.sh | Production LLM throughput |
| SMP-sensitive behavior changed | SMP_CPUS, boot profile, per-CPU top output, S5 markers | make s5-run-any-placement-test or active SMP gate | Completed load-balancing policy |
| Package footprint changed | .swpkg, payload image, package-store image sizes, installed paths | Matching package fixture and install test | Persistent package upgrade/rollback behavior |
| Documentation performance claim changed | Exact guide section and validation command | git diff --check, make docs-test | Any behavior not covered by the cited test |
Example HTTP regression evidence:
git log -1 --oneline
qemu-system-aarch64 --version
./tests/httpd_test.sh
bash ./tests/net_zero_copy_throughput_test.sh
Example LLM serving evidence:
./tests/llm_serve_test.sh
curl -fsS http://127.0.0.1:8080/health
curl -fsS -X POST --data "Once upon a time" http://127.0.0.1:8080/completion
curl -fsS http://127.0.0.1:8080/metrics
What Not To Claim Yet
Current SwiftOS documentation should not claim:
- Production multi-core throughput.
- Real hardware performance outside documented QEMU and VirtualBox notes.
- Persistent storage write performance.
- Production TLS trust performance or certificate validation cost.
- Production LLM serving capacity.
- Stable API-level service-level objectives.
- A completed scheduler load-balancing policy.
- A finished per-cell resource accounting model.
Those are roadmap topics. Current docs may describe working smoke paths, acceptance criteria, and relative guardrails.
Baseline Host Profile
Record the test environment before sharing results:
git log -1 --oneline
git status --short --branch
qemu-system-aarch64 --version
make tools-check
Also record:
- Host machine and CPU model if relevant.
- QEMU acceleration mode if known.
- Memory size passed to QEMU.
- SMP_CPUS value.
- Boot path: direct -kernel, UEFI disk, graphical smoke, or VirtualBox.
- Whether model, package, base image, or disk artifacts were rebuilt.
- The exact test or manual command.
Use the Support Guide when handing results to another person.
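The fields above can be captured in one step so they are not retyped for every run. The following is a minimal bash sketch, not a script shipped with SwiftOS. The function name capture_host_profile and the QEMU_MEM, BOOT_PATH, and PERF_CMD variables are hypothetical. Only SMP_CPUS matches a variable the documented tests already read. The function prints "unavailable" when git or QEMU is missing and "unrecorded" for fields it cannot discover. It does not replace make tools-check.

```shell
# Hypothetical helper (not shipped with SwiftOS): print the baseline host
# profile as "Field: value" lines. Undiscoverable values come from
# environment variables or are marked "unrecorded"/"unavailable".
capture_host_profile() {
  local rev tree qemu changes
  # Current revision; "unavailable" outside a git work tree or without git.
  rev=$(git log -1 --oneline 2>/dev/null) || rev="unavailable"
  # Uncommitted changes make a result harder to reproduce, so record them.
  if changes=$(git status --short 2>/dev/null); then
    if [ -n "$changes" ]; then tree="dirty"; else tree="clean"; fi
  else
    tree="unavailable"
  fi
  if command -v qemu-system-aarch64 >/dev/null 2>&1; then
    qemu=$(qemu-system-aarch64 --version | head -n 1)
  else
    qemu="unavailable"
  fi
  echo "Revision: $rev"
  echo "Working tree: $tree"
  echo "Host: $(uname -srm)"
  echo "QEMU version: $qemu"
  # QEMU_MEM, BOOT_PATH and PERF_CMD are hypothetical names used only here.
  echo "QEMU memory: ${QEMU_MEM:-unrecorded}"
  echo "SMP_CPUS: ${SMP_CPUS:-unrecorded}"
  echo "Boot path: ${BOOT_PATH:-unrecorded}"
  echo "Command or test: ${PERF_CMD:-unrecorded}"
}
```

For example, SMP_CPUS=4 BOOT_PATH="direct -kernel" QEMU_MEM=256M capture_host_profile > host-profile.txt saves a profile you can attach to a report.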
Fast Smoke Versus Performance Guard
SwiftOS tests use two different ideas:
| Test kind | Meaning |
|---|---|
| Smoke test | Proves behavior is present and coherent |
| Guard test | Fails when a bounded performance-sensitive path regresses too far |
Examples:
| Command | Kind | Notes |
|---|---|---|
| ./tests/boot_test.sh | Smoke | Verifies required boot and userland markers |
| ./tests/httpd_test.sh | Smoke | Verifies HTTP behavior and concurrency |
| ./tests/llm_serve_test.sh | Smoke plus metrics | Verifies serving and exposes relative metrics |
| bash ./tests/net_zero_copy_throughput_test.sh | Guard | Bounded concurrent HTTP burst and network path marker |
| make test | Full regression suite | Functional confidence across many areas |
A smoke test passing does not establish a throughput target. A guard test establishes only the bounded condition encoded in the test.
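To make the smoke-versus-guard distinction concrete, here is a minimal bash sketch of a guard wrapper. It passes only when the wrapped command succeeds and finishes within a wall-clock bound. The name run_guard and the whole-second limit are illustrative assumptions. The real guard scripts, such as net_zero_copy_throughput_test.sh, encode their own limits and checks.

```shell
# Hypothetical guard wrapper: fail if the command fails OR exceeds the bound.
# Usage: run_guard LIMIT_SECONDS command [args...]
# Timing uses whole seconds (date +%s); the command is not killed on overrun.
run_guard() {
  local limit=$1; shift
  local start end elapsed status
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  elapsed=$((end - start))
  if [ "$status" -ne 0 ]; then
    echo "GUARD FAIL: command exited $status" >&2
    return 1
  fi
  if [ "$elapsed" -gt "$limit" ]; then
    echo "GUARD FAIL: ${elapsed}s exceeds ${limit}s limit" >&2
    return 1
  fi
  echo "GUARD PASS: ${elapsed}s within ${limit}s limit"
}
```

A smoke test corresponds to checking only the exit status. The elapsed-time check is what turns it into a guard, and that guard proves only the bound it encodes.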
Inspect CPU And Memory
Inside the guest, use batch mode for reproducible logs:
top -b -n 1
top -b -n 2 -d 1
top reads SYS_SYSINFO and SYS_PROCSTAT and reports:
- Uptime.
- Task count.
- Aggregate CPU busy/idle percentage.
- Discovered CPU count and per-CPU busy percentages.
- Total and free memory.
- Kernel image and heap footprint.
- Per-process state, principal, CPU time, and resident bytes.
Use ps for a simpler process list:
ps
ps -f
ps aux
Current limits:
- The maximum process count is small and fixed by the current kernel tables.
- There is no persistent historical metrics database.
- There is no per-cell view yet.
- S5 per-CPU utilization is an observability signal; it is not a completed load-balancing or capacity contract.
- CPU numbers under QEMU are relative indicators, not production capacity promises.
Boot And Image Sizing
Useful artifact sizes:
ls -lh build/kernel.elf build/kernel.bin build/base.img build/swift-os.img
Use this after changes that affect:
- Kernel source.
- Userland programs staged into /bin.
- Base files under base/.
- Model bundles.
- Package payloads.
- UEFI loader or disk layout.
Relevant gates:
make build
make base-image
make disk
./tests/boot_test.sh
UEFI_BOOT=disk ./tests/uefi_boot_test.sh
If a change increases artifact size, explain why in the review or release note. SwiftOS prioritizes a small trusted core and lightweight static images.
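When explaining a size change, exact byte counts compared against a saved baseline are more useful than rounded ls -lh output. The following minimal bash sketch assumes you saved an earlier run's output to a file. The function names record_sizes and compare_sizes and the "path bytes" file format are hypothetical.

```shell
# Hypothetical helper: print "<path> <bytes>" for each artifact, or
# "<path> missing" if the file does not exist.
record_sizes() {
  local f
  for f in "$@"; do
    if [ -f "$f" ]; then
      # wc -c is portable across Linux and macOS, unlike stat flags.
      printf '%s %s\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
    else
      printf '%s missing\n' "$f"
    fi
  done
}

# Compare two record_sizes outputs and report per-artifact byte deltas.
# Usage: compare_sizes BASELINE_FILE CURRENT_FILE (baseline must be non-empty)
compare_sizes() {
  awk '
    NR == FNR { base[$1] = $2; next }   # first file: load the baseline
    ($1 in base) && base[$1] != "missing" && $2 != "missing" {
      delta = $2 - base[$1]
      printf "%s %+d bytes%s\n", $1, delta, (delta > 0 ? " (grew: explain in review)" : "")
      next
    }
    { printf "%s: no comparable baseline\n", $1 }
  ' "$1" "$2"
}
```

For example, run record_sizes build/kernel.elf build/kernel.bin build/base.img build/swift-os.img > sizes-before.txt before the change and repeat it into sizes-after.txt afterwards. Then run compare_sizes sizes-before.txt sizes-after.txt.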
Memory Sizing
The common QEMU profiles use:
-m 256M
This is enough for the checked-in boot, userland smoke programs, networking tests, package tests, and TinyStories inference paths. Larger future workloads such as Node.js, JVM, database ports, and full Swift runtime support will need deliberate sizing work.
When testing memory-sensitive changes:
- Record QEMU memory size.
- Capture top -b -n 1.
- Run the focused workload.
- Capture top -b -n 1 again.
- Run the focused acceptance test.
- If the change touches shared memory management, run make test.
Useful checks:
./tests/mmap_test.sh
./tests/cow_test.sh
./tests/threads_test.sh
./tests/top_test.sh
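The checks above can be run as one batch that continues past a failure and ends with a summary, which helps when collecting evidence for a review. This is a minimal bash sketch. run_checks and the CHECK_LOG variable are hypothetical names. The helper reports only exit status, not memory figures, so the top -b -n 1 snapshots are still needed.

```shell
# Hypothetical helper: run each check command string, continue past failures,
# and print a PASS/FAIL line per check plus a summary. Check output is
# appended to $CHECK_LOG (discarded if unset). Returns non-zero on any failure.
run_checks() {
  local cmd failed=0
  for cmd in "$@"; do
    if bash -c "$cmd" >>"${CHECK_LOG:-/dev/null}" 2>&1; then
      echo "PASS $cmd"
    else
      echo "FAIL $cmd"
      failed=$((failed + 1))
    fi
  done
  echo "$failed of $# checks failed"
  [ "$failed" -eq 0 ]
}
```

For example: CHECK_LOG=memory-checks.log run_checks ./tests/mmap_test.sh ./tests/cow_test.sh ./tests/threads_test.sh ./tests/top_test.sh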
Network Performance
Current network performance evidence comes from QEMU slirp (user-mode) networking. Use it for regression detection and protocol-path confidence, not for real NIC capacity claims.
Functional network checks:
./tests/virtio_net_test.sh
./tests/httpd_test.sh
./tests/tcp_echo_test.sh
./tests/udp_echo_test.sh
./tests/tcp_connect_test.sh
./tests/dns_test.sh
./tests/tls_test.sh
Throughput guard:
bash ./tests/net_zero_copy_throughput_test.sh
That guard boots /bin/httpd, sends a bounded concurrent HTTP burst from the
host, verifies all responses, checks that the run completes within the script's
limit, and asserts the kernel reported the expected network zero-copy/batched
path marker.
Environment override:
NET_ZC_HOST_PORT=18082 bash ./tests/net_zero_copy_throughput_test.sh
When reporting network performance, include:
- Host QEMU version.
- Host port and guest port.
- Test script name.
- Number of requests and concurrency when relevant.
- Serial marker lines.
- Whether the result came from QEMU slirp or another backend.
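The fields above can be assembled from a saved serial log. This is a minimal bash sketch. net_report and the NET_MARKER_RE, NET_BACKEND, and GUEST_PORT variables are hypothetical. The net-zc marker prefix is an assumption based on the example in the Reporting Template, so adjust the pattern to the marker your run actually prints. Request count and concurrency are passed in because only the test script knows them.

```shell
# Hypothetical helper: print network reporting fields from a saved serial log.
# Usage: net_report SERIAL_LOG TEST_NAME [REQUESTS] [CONCURRENCY]
net_report() {
  local log=$1 test_name=$2 qemu="unavailable"
  local re=${NET_MARKER_RE:-net-zc}   # ASSUMED marker prefix; adjust as needed
  if command -v qemu-system-aarch64 >/dev/null 2>&1; then
    qemu=$(qemu-system-aarch64 --version | head -n 1)
  fi
  echo "QEMU version: $qemu"
  echo "Backend: ${NET_BACKEND:-QEMU slirp}"
  # NET_ZC_HOST_PORT is the documented override for the throughput guard.
  echo "Host port: ${NET_ZC_HOST_PORT:-script default}"
  echo "Guest port: ${GUEST_PORT:-unrecorded}"
  echo "Test: $test_name"
  echo "Requests: ${3:-unrecorded}"
  echo "Concurrency: ${4:-unrecorded}"
  echo "Serial markers:"
  if grep -qE "$re" "$log"; then
    grep -E "$re" "$log" | sed 's/^/  /'
  else
    echo "  (none found)"
  fi
}
```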
Service Performance
For simple services, use readiness markers first, then one or more host requests.
HTTP server:
./tests/httpd_test.sh
LLM server:
./tests/llm_serve_test.sh
Manual LLM service metrics:
curl -fsS http://127.0.0.1:8080/health
curl -fsS -X POST --data "Once upon a time" http://127.0.0.1:8080/completion
curl -fsS http://127.0.0.1:8080/metrics
Expected metric keys:
requests
tokens_total
last_ttft_ms
last_tok_s
Use last_ttft_ms and last_tok_s for same-host regression comparisons. Do
not present them as hardware-independent service-level objectives.
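For same-host comparisons, two saved /metrics captures can be compared mechanically. This bash sketch assumes each metric appears on its own line as a key followed by a number, separated by a space, "=", or ":". The guide does not specify the /metrics wire format, so confirm it against real output, and use a JSON tool instead if it is JSON. metric_value and compare_llm_metrics are hypothetical names. Higher last_tok_s is better; lower last_ttft_ms is better.

```shell
# Hypothetical helper: print one numeric metric from a saved /metrics capture.
# ASSUMPTION: lines look like "key value", "key=value" or "key: value".
# Usage: metric_value FILE KEY   (returns non-zero if KEY is absent)
metric_value() {
  awk -v key="$2" '
    { line = $0; gsub(/[=:]/, " ", line); split(line, f, " ") }
    f[1] == key { print f[2]; found = 1; exit }
    END { if (!found) exit 1 }
  ' "$1"
}

# Compare throughput and time-to-first-token between two captures.
# Usage: compare_llm_metrics BASELINE_FILE CANDIDATE_FILE
compare_llm_metrics() {
  local key b c
  for key in last_tok_s last_ttft_ms; do
    b=$(metric_value "$1" "$key") || { echo "$key missing in $1"; continue; }
    c=$(metric_value "$2" "$key") || { echo "$key missing in $2"; continue; }
    # awk does the floating-point percentage, which bash arithmetic cannot.
    awk -v k="$key" -v b="$b" -v c="$c" 'BEGIN {
      if (b == 0) { printf "%s: baseline %s, candidate %s\n", k, b, c; exit }
      printf "%s: baseline %s, candidate %s (%+.1f%%)\n", k, b, c, (c - b) * 100 / b
    }'
  done
}
```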
AI Inference Performance
The TinyStories inference paths prove:
- Native EL0 inference can run.
- Model data can live in the base image.
- Verified model-bundle generations can be selected and rejected.
- /bin/llmd can serve HTTP completions and expose metrics.
They do not prove production LLM throughput. Under QEMU TCG, inference is expected to be slow.
Focused checks:
./tests/llm_run_test.sh
./tests/llm_serve_test.sh
When comparing LLM changes:
- Use the same host and QEMU version.
- Use the same model generation.
- Warm up with one request if measuring steady-state behavior.
- Record
/metrics. - Keep the serial
llmd: served ...line. - Confirm bundle verification still happens.
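The warm-up-then-measure steps above can be scripted from the host. This minimal bash sketch uses only the /completion and /metrics endpoints and the 127.0.0.1:8080 address shown earlier in this guide. llm_measure and the LLM_URL variable are hypothetical. The default prompt is the one used in the manual examples.

```shell
# Hypothetical helper: send one warm-up completion, one measured completion,
# then print the /metrics snapshot taken right after the measured request.
llm_measure() {
  local url="${LLM_URL:-http://127.0.0.1:8080}"
  local prompt="${1:-Once upon a time}"
  # Warm-up request: output discarded; it only primes steady-state behavior.
  curl -fsS -X POST --data "$prompt" "$url/completion" >/dev/null || return 1
  # Measured request: keep the completion so the run can be inspected later.
  curl -fsS -X POST --data "$prompt" "$url/completion" || return 1
  echo
  curl -fsS "$url/metrics"
}
```

Save the output alongside the serial llmd: served ... line. The resulting numbers are still QEMU-relative and are only comparable to runs on the same host, QEMU version, and model generation.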
SMP And CPU Count
The default product contract is still conservative around broad multi-core EL0
execution. SMP foundations and hardening work exist, several tests run with
SMP_CPUS=4, and S5f proves a gated run-any EL0 placement policy, but
performance claims should still reflect the active roadmap state.
Useful gates:
SMP_CPUS=4 ./tests/smp_boot_test.sh
SMP_CPUS=4 UEFI_BOOT=disk ./tests/uefi_boot_test.sh
make s1-test
make s4-resource-stress-test
make s5-run-any-placement-test
When recording CPU-count-sensitive results:
- Include SMP_CPUS.
- Include whether direct or UEFI boot was used.
- Include boot markers for CPU discovery and online state.
- Avoid claiming production load balancing or throughput scaling. S5f proves placement coverage in a gated acceptance path, not a complete CPU policy.
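To record CPU-count-sensitive results consistently, the same gate can be run once per SMP_CPUS value, with the value, the UEFI_BOOT setting, and the outcome logged together. This is a minimal bash sketch. smp_sweep is a hypothetical name. It assumes the gate reads SMP_CPUS from the environment, as the documented SMP_CPUS=4 ./tests/smp_boot_test.sh invocation does. It records pass/fail only and says nothing about throughput scaling.

```shell
# Hypothetical helper: run one gate per CPU count and log count, boot
# setting, and result. Usage: smp_sweep "1 4" ./tests/smp_boot_test.sh
smp_sweep() {
  local counts=$1; shift
  local n status
  for n in $counts; do   # unquoted on purpose: split the space-separated list
    SMP_CPUS=$n "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 0 ]; then
      echo "SMP_CPUS=$n UEFI_BOOT=${UEFI_BOOT:-unset} PASS"
    else
      echo "SMP_CPUS=$n UEFI_BOOT=${UEFI_BOOT:-unset} FAIL (exit $status)"
    fi
  done
}
```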
Package And Image Footprint
Packages let you test optional software without permanently growing the base image.
Useful commands:
make package-fixture
make package-overlay-test
make package-store-test
make package-local-install-test
Record:
- .swpkg size.
- Payload image size.
- Package store image size.
- Installed paths.
- Guest command used for proof.
- Whether base image size changed.
Use packages for optional tools when they do not need to be boot-critical.
Suggested Measurement Recipes
Boot Smoke With Artifact Sizes
make build base-image build/virt.dtb
ls -lh build/kernel.elf build/kernel.bin build/base.img
./tests/boot_test.sh
Resource Snapshot Around A Command
Inside the guest:
top -b -n 1
/bin/llm
top -b -n 1
For automated LLM evidence, prefer:
./tests/llm_run_test.sh
HTTP Service Regression Check
./tests/httpd_test.sh
bash ./tests/net_zero_copy_throughput_test.sh
LLM Serving Regression Check
./tests/llm_serve_test.sh
For a manual run with metrics, boot the network profile, start /bin/llmd, and
run:
curl -fsS http://127.0.0.1:8080/metrics
Reporting Template
When reporting a performance observation, include:
Revision:
Host:
QEMU version:
Boot path:
QEMU memory:
SMP_CPUS:
Artifact sizes:
Command or test:
Result:
Serial markers:
Comparison baseline:
Notes:
Example:
Revision: 68b140f docs: add SwiftOS concepts guide
Boot path: direct -kernel
QEMU memory: 256M
SMP_CPUS: 1
Command or test: bash ./tests/net_zero_copy_throughput_test.sh
Result: PASS, 32 HTTP requests completed inside guard limit
Serial markers: net-zc OK: ...
Comparison baseline: previous local main on same host
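Before sharing a report, a quick check can catch template fields that were dropped or left blank. This is a minimal bash sketch. check_report is a hypothetical name. It treats every template field as required, so write n/a for a field that genuinely does not apply. Run against the example above, it would flag Host, QEMU version, Artifact sizes, and Notes.

```shell
# Hypothetical checker: list template fields that are missing or empty in a
# saved report file. Returns non-zero if any field needs attention.
check_report() {
  local field value missing=0
  local fields=("Revision" "Host" "QEMU version" "Boot path" "QEMU memory"
                "SMP_CPUS" "Artifact sizes" "Command or test" "Result"
                "Serial markers" "Comparison baseline" "Notes")
  for field in "${fields[@]}"; do
    # Text after "Field:" on the first matching line, whitespace removed.
    value=$(grep -m 1 "^$field:" "$1" | cut -d: -f2- | tr -d '[:space:]')
    if [ -z "$value" ]; then
      echo "missing: $field"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```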
Performance Review Checklist
Before merging a performance-sensitive change:
- State what resource or latency path should improve or stay bounded.
- Run the focused functional test.
- Run the relevant performance guard if one exists.
- Capture top -b -n 1 when memory or CPU behavior matters.
- Record artifact sizes when image footprint changes.
- Run make test for shared kernel, VFS, networking, scheduler, package, or ABI changes.
- Update docs if the user-visible expectation changes.
Roadmap Boundaries
Future performance work includes:
- Production SMP load balancing and CPU policy.
- Per-cell and per-service resource accounting.
- Persistent metrics export.
- Stronger service supervisor health and restart metrics.
- Real hardware performance profiles.
- Production package repository and update metrics.
- Larger runtime targets such as native Swift apps, Node.js, and the JVM.
Until those land, keep performance claims precise: name the artifact, test, host, QEMU profile, and evidence.