ADR-003: Deployment & OTA Update Strategy

Accepted (2026-03-02)

paiOS runs on embedded AI devices (e.g., RK3588-based boards) deployed in the field, often without physical access. These devices need a reliable and secure mechanism for receiving software updates. Key requirements:

  1. Atomic updates: no partial update states that could brick devices
  2. Automatic rollback: failed updates revert to a known-good state without user intervention
  3. Offline-capable: updates must work without internet (e.g., via USB), not only over the air
  4. Minimal downtime: updates applied in background, single reboot to activate
  5. Security: cryptographically signed images to prevent malicious firmware injection
  6. Build integration: seamless creation of update bundles from the OS building tool (ADR-002)

We adopt RAUC (Robust Auto-Update Controller) for A/B partition-based atomic updates with Ed25519 image signing.

This ADR covers embedded hardware deployments where paiOS controls the full OS image (e.g., paiBox, paiScribe). For users running just the paiEngine in a Docker container (e.g., on a home server or cloud VM), updates follow Docker’s native mechanism (docker pull + restart), which handles atomicity and rollback via image tags out of the box.

| Deployment | Update mechanism | Managed by |
|---|---|---|
| Embedded hardware (paiBox, paiScribe) | RAUC A/B partition swap (this ADR) | paiOS |
| Docker container (engine-only) | docker pull + restart | Docker / user |

The rest of this ADR focuses on the embedded case.

The device storage is partitioned into two root filesystem slots (A and B). One slot is active (booted), the other is inactive (target for updates). A persistent /data partition holds user data, AI models, and configuration across updates.

┌─────────────────────────────────────────────┐
│               Device Storage                │
├──────────┬──────────┬──────────┬────────────┤
│ Boot     │ Slot A   │ Slot B   │ Data       │
│ (U-Boot) │ (rootfs) │ (rootfs) │ (/data)    │
│          │ [active] │ [target] │ [persist]  │
└──────────┴──────────┴──────────┴────────────┘

Why A/B over package-based updates (APT)?

| Criteria | A/B (RAUC) | Package-based (APT) |
|---|---|---|
| Atomicity | ✅ Full image swap | ❌ Partial state possible |
| Rollback | ✅ Hardware-level, instant | ❌ Complex, unreliable |
| Reproducibility | ✅ Identical to build output | ⚠️ Drift over time |
| Downtime | ⚠️ Reboot required | ✅ Live updates possible |
| Disk usage | ⚠️ 2× rootfs space | ✅ Single rootfs |

For embedded devices where reliability trumps flexibility, A/B partitioning is the clear choice.

Option A (chosen): We do not reserve separate appfs slots. Extensions, custom apps, and user content use the persistent /data partition (subdirectories, managed by the extension system). Downstream products that need RAUC-managed app slots can define their own partition layout and system.conf.

2. Partition filesystems and read-only policy

We define filesystem types and mount policy per partition so that updates are safe and storage lifetime is preserved.

| Partition | Filesystem | Mount | Rationale |
|---|---|---|---|
| Boot | vfat (when separate, e.g. RPi) or part of rootfs (e.g. Rockchip in /boot) | Read-only where possible | Boot partition holds kernel, DTB, RAUC keyring. Read-only avoids accidental overwrite; some boards require vfat for firmware. |
| Rootfs (slot A/B) | ext4 | Read-only in normal operation | RAUC writes a full image to the inactive slot. At runtime the active rootfs is mounted read-only so power loss or crashes cannot corrupt it. Writes (e.g. /tmp, /var, /etc overrides) go to an overlay: overlayfs with upperdir on tmpfs or on a small writable area (e.g. on /data). |
| Data | ext4 | Read-write | Persistent user data, AI models, config overrides, logs. Preserved across rootfs updates. |

Read-only rootfs in practice: The rootfs image is built and shipped as ext4. At boot, the initramfs or early userspace mounts the rootfs and, if using overlayfs, creates a writable overlay (lowerdir = rootfs, upperdir/workdir = tmpfs or a dedicated dir on /data). That way the block device for the slot is never written at runtime, which reduces wear and prevents filesystem corruption. Services that need to write (e.g. logs, machine-id, RAUC status) use the overlay or explicit bind-mounts to /data.
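
The overlay arrangement described above can be sketched as a small helper that assembles the mount command. This is only an illustration: the function name, the `upper`/`work` subdirectory names, and the example paths are assumptions, not part of the paiOS image. It relies on the overlayfs requirement that upperdir and workdir live on the same filesystem.

```python
from pathlib import PurePosixPath


def overlay_mount_args(lowerdir: str, writable_root: str, target: str) -> list[str]:
    """Build a mount(8) argv for an overlay on top of a read-only lower layer.

    upperdir and workdir must be on the same filesystem, so both are derived
    from one writable root (a tmpfs or a directory on /data).
    """
    root = PurePosixPath(writable_root)
    options = ",".join(
        [
            f"lowerdir={lowerdir}",
            f"upperdir={root / 'upper'}",  # hypothetical subdirectory name
            f"workdir={root / 'work'}",    # hypothetical subdirectory name
        ]
    )
    return ["mount", "-t", "overlay", "overlay", "-o", options, target]
```

For example, `overlay_mount_args("/etc", "/data/overlay/etc", "/etc")` yields a command that keeps /etc overrides on /data while the slot's block device stays untouched.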

Why ext4 for rootfs and data: ext4 is the standard choice for RAUC slots (best-supported slot type), is robust, and supports resize=true and tar-based installation. SquashFS is an alternative for a read-only root (smaller, no runtime writes at all) but RAUC’s typical flow is ext4; we keep ext4 for simplicity and tooling alignment.

We evaluated four candidates:

| Tool | Approach | License | Verdict |
|---|---|---|---|
| RAUC | A/B slot-based, bootloader integration | LGPL-2.1 | Selected |
| Mender | A/B with hosted/self-hosted backend | Apache-2.0 / Commercial | ❌ SaaS dependency |
| SWUpdate | Flexible update framework, Lua scripting | GPL-2.0 | ⚠️ More complex than needed |
| systemd-repart | Partition management (not a full update system) | LGPL-2.1 | ❌ No signing or rollback |

RAUC wins on the combination of:

  • LGPL-2.1 license: compatible with paiOS’s AGPL-3.0 core (ADR-001)
  • Bootloader flexibility: native support for U-Boot (used by Radxa/Rockchip boards) and GRUB, plus a custom bootloader backend for boards with non-standard boot flows (e.g., Raspberry Pi firmware uses tryboot via a custom script). This lets us support multiple hardware targets without changing the update tooling.
  • No SaaS dependency: fully self-hosted, no mandatory cloud service
  • Mature and battle-tested: used in industrial IoT, medical devices, and automotive
  • D-Bus API: programmatic control from the paiOS Engine for status monitoring and update triggers
  • Bundle format: self-contained .raucb bundles with manifest, images, and cryptographic signatures

See Rationale for detailed comparisons with each alternative.

All update bundles are cryptographically signed with Ed25519 to prevent malicious firmware injection. The private key lives in CI/CD secrets (GitHub Actions encrypted secrets, HSM for production). The public key is embedded in the OS image at /etc/rauc/keyring.pem (mounted read-only at runtime).

| Step | Actor | Action |
|---|---|---|
| 1. Build | CI/CD (GitHub Actions) | Debos creates rootfs image |
| 2. Bundle | CI/CD | RAUC packages image into .raucb bundle |
| 3. Sign | CI/CD (with HSM/secrets) | Bundle signed with Ed25519 private key |
| 4. Distribute | CDN or local media | Signed bundle made available for devices |
| 5. Verify | Device (RAUC) | Bundle signature verified against embedded public key |
| 6. Install | Device (RAUC) | Image written to inactive slot only if verification passes |

Key rotation is supported: new public keys can be included in update bundles, with the old key verifying the transition bundle.

Automatic rollback ensures devices recover from failed updates without manual intervention.

Health Check Flow:

Boot new slot → systemd starts → watchdog timer begins
        ┌──────────────┴──────────────┐
        │                             │
Services healthy              Watchdog expires
 within timeout               (services failed)
        │                             │
  Mark slot as               Bootloader reverts
"good" (confirmed)            to previous slot
        │                             │
Normal operation                Boot old slot
                                (known-good)

| Component | Mechanism | Details |
|---|---|---|
| Boot counter | U-Boot boot_count variable | Incremented on each boot; reset on successful health check |
| Max attempts | U-Boot boot_count_limit | Default: 3 attempts before reverting to previous slot |
| Health check | rauc status mark-good | Called by a systemd service after critical services pass readiness checks |
| Watchdog | systemd RuntimeWatchdogSec | Hardware watchdog resets the device if the OS hangs during boot |
| Critical services | systemd Type=notify | paiOS Engine and core services must report readiness via sd_notify |
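
A minimal sketch of the health-check service follows, assuming it simply asks systemd whether each critical unit is active before confirming the slot. The unit name `pai-engine.service` and the injectable `runner` parameter are hypothetical; the ADR does not define the real list of critical services. `systemctl is-active` and `rauc status mark-good` are the only external commands used.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical unit list; the actual critical services are not named in this ADR.
CRITICAL_UNITS = ("pai-engine.service",)

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def confirm_slot(units: Sequence[str] = CRITICAL_UNITS, runner: Runner = run) -> bool:
    """Mark the booted slot good only if every critical unit is active."""
    for unit in units:
        # `systemctl is-active --quiet` exits 0 only when the unit is active.
        if runner(["systemctl", "is-active", "--quiet", unit]) != 0:
            return False  # slot stays unconfirmed, so the boot counter is not reset
    return runner(["rauc", "status", "mark-good"]) == 0
```

Because a Type=notify unit only becomes active after it reports readiness, checking `is-active` lines up with the sd_notify requirement in the table. Passing a fake `runner` makes the gate testable off-device.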

Failure Scenarios:

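| Scenario | Result |
|---|---|
| Kernel panic on new slot | Watchdog resets the device; U-Boot increments the boot counter on each retry and boots the old slot once the limit is reached |
| paiOS Engine fails to start | Health check service times out and mark-good is never called, so the counter is not reset and the next boot increments it again |
| Power loss during update | Only the inactive slot was being written; the active slot is unaffected and boots normally |
| Power loss during reboot | U-Boot boots whichever slot was last marked active, which is safe |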
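
The boot-counter behaviour can be illustrated with a toy simulation. It follows the component table (counter incremented on each boot, reset on a successful health check, revert after the limit) and is not U-Boot's actual boot script; the class, field, and function names are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class BootState:
    """Toy model of the bootloader state; not the real U-Boot environment."""
    active: str = "B"        # slot just installed and selected for boot
    previous: str = "A"      # known-good slot to fall back to
    boot_count: int = 0
    boot_count_limit: int = 3


def boot_once(state: BootState, healthy_slots: set[str]) -> str:
    """Simulate one boot attempt and return the slot that was booted."""
    if state.boot_count >= state.boot_count_limit:
        # Attempts exhausted: swap back to the previous slot.
        state.active, state.previous = state.previous, state.active
        state.boot_count = 0
    state.boot_count += 1
    booted = state.active
    if booted in healthy_slots:
        state.boot_count = 0  # stands in for `rauc status mark-good`
    return booted
```

With a broken slot B and a healthy slot A, five consecutive calls boot B three times and then A twice, which matches the "3 attempts before reverting" default.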

Updates can reach the device via three paths. The verification and installation flow is identical regardless of how the bundle arrives.

| Method | How it works |
|---|---|
| USB drive | User plugs in a USB stick containing the .raucb bundle; the Engine detects it and triggers RAUC |
| Local file | Bundle transferred via local network (e.g., SCP) or pre-loaded on storage |
| OTA (online) | Device polls a metadata server and downloads from CDN |
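
As a rough sketch of the USB path, the Engine could scan the top level of a mounted stick for a bundle and hand it to RAUC. The function names and the "pick the lexicographically last file" rule are assumptions for illustration; how the Engine detects the mount and which bundle it prefers are not specified here.

```python
import subprocess
from pathlib import Path
from typing import Optional


def find_bundle(mount_point: Path) -> Optional[Path]:
    """Return the lexicographically last .raucb file in the top level of mount_point."""
    bundles = sorted(p for p in mount_point.glob("*.raucb") if p.is_file())
    return bundles[-1] if bundles else None


def install_bundle(bundle: Path) -> int:
    """Hand the bundle to RAUC; signature verification happens inside `rauc install`."""
    return subprocess.run(["rauc", "install", str(bundle)], check=False).returncode
```

Lexicographic order is not version order (0.10.0 sorts before 0.9.0), so a real implementation would need a proper version comparison. The sketch performs no trust checks of its own because RAUC rejects bundles whose signature does not verify.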

OTA Version Discovery (online only):

  1. Device polls https://updates.pai.dev/v1/manifest.json (configurable interval, default: 6 hours)
  2. Manifest contains available versions, hardware compatibility, and bundle URLs
  3. If an update is available and compatible, the device downloads the .raucb bundle
  4. RAUC verifies the signature and installs to the inactive slot

Manifest Format (example):

{
  "latest": "0.3.0",
  "channels": {
    "stable": {
      "version": "0.3.0",
      "compatible": "paios-rk3588",
      "bundle_url": "https://cdn.pai.dev/releases/0.3.0/paios-rk3588-0.3.0.raucb",
      "bundle_sha256": "a1b2c3...",
      "bundle_size": 524288000,
      "release_notes": "https://pai.dev/changelog/0.3.0"
    },
    "beta": { "...": "..." }
  }
}
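
Steps 2 and 3 of the discovery flow amount to a small decision function over the manifest. The sketch below assumes the field names from the example manifest and plain dotted-integer version strings; the function names are hypothetical, and pre-release suffixes are not handled.

```python
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "0.3.0" into (0, 3, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def select_update(
    manifest: dict, channel: str, compatible: str, installed: str
) -> Optional[dict]:
    """Return the channel entry if it targets this hardware and is newer than `installed`."""
    entry = manifest.get("channels", {}).get(channel)
    if not entry or entry.get("compatible") != compatible:
        return None  # unknown channel, or bundle built for different hardware
    if parse_version(entry["version"]) <= parse_version(installed):
        return None  # already up to date
    return entry
```

A device on 0.2.9 with compatible string `paios-rk3588` would get the stable entry back and download its `bundle_url`; a device already on 0.3.0 gets `None`.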

The exact user-facing workflow for offline updates (e.g., button confirmation, progress indication) will be defined when the update UI is implemented.

We ship full rootfs images compressed with zstd (RAUC default, 30-50% savings). Delta updates (e.g., casync-based chunk differencing) would reduce transfer size further, but they add significant build pipeline complexity: every release would need deltas generated against multiple previous versions. Following the project’s preference for stability and simplicity, we defer delta updates to a later phase if download size becomes a real bottleneck.

Large AI model files (ONNX, RKNN) live on the persistent /data partition and are updated via a separate, lighter mechanism (download + verify). This keeps rootfs updates small, since models change on a different cadence than the OS.
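
The "download + verify" step for models could look roughly like this: hash the staged file in chunks and move it into place only when the digest matches. The function names and the idea of a staging file are assumptions; the ADR does not specify the model update mechanism in detail.

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def activate_if_valid(staged: Path, final: Path, expected_sha256: str) -> bool:
    """Move a staged download into place only if its digest matches."""
    if sha256_of(staged) != expected_sha256.lower():
        staged.unlink()  # discard the corrupt or tampered download
        return False
    os.replace(staged, final)  # atomic when both paths are on the same filesystem
    return True
```

A digest check only proves integrity relative to the expected value, so the expected hash itself would need to come from a trusted (for example, signed) source.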

The OS building tool Debos (ADR-002) creates rootfs images that are packaged into RAUC bundles:

Debos Recipe → rootfs.ext4 → RAUC Bundle (.raucb) → Sign → Upload

| Step | Tool | Output |
|---|---|---|
| 1. Build rootfs | Debos | rootfs.ext4 (Debian-based image) |
| 2. Bundle + sign | rauc bundle (signing certificate and key passed via --cert and --key) | Signed paios-rk3588-0.3.0.raucb |
| 3. Upload | CI/CD script | Push to CDN + update manifest |
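
For the upload step, the CI/CD script has to produce a manifest entry for the new bundle. A possible sketch, assuming the field names and URL layout from the manifest example in this ADR (the function name and `CDN_BASE` constant are hypothetical):

```python
import hashlib
from pathlib import Path

# URL prefix taken from the manifest example; treat it as an assumption.
CDN_BASE = "https://cdn.pai.dev/releases"


def manifest_entry(bundle: Path, version: str, compatible: str) -> dict:
    """Describe a signed bundle using the fields from the manifest example."""
    digest = hashlib.sha256()
    with bundle.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "version": version,
        "compatible": compatible,
        "bundle_url": f"{CDN_BASE}/{version}/{bundle.name}",
        "bundle_sha256": digest.hexdigest(),
        "bundle_size": bundle.stat().st_size,
        "release_notes": f"https://pai.dev/changelog/{version}",
    }
```

The script would then write this entry under the appropriate channel in manifest.json after the bundle itself has been pushed, so devices never see a manifest that points at a missing file.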

Mender has excellent UX, but:

  • SaaS dependency: the full feature set requires Mender Server (hosted or self-hosted)
  • Commercial licensing: enterprise features are behind a paywall
  • Bootloader integration friction: integration often requires a Yocto layer and manual patches for newer hardware (e.g., Compute Module 5)
  • Overhead: more complex than needed for our use case

SWUpdate is flexible and powerful, but:

  • Configuration complexity: uses Lua scripting for update logic, which adds a runtime dependency and maintenance burden
  • Broader scope: SWUpdate is a general-purpose update framework (supports single-copy, dual-copy, and custom flows), so it requires more configuration to get a clean A/B setup that RAUC provides out of the box

systemd-repart handles partition management, but:

  • Not an update system: no signing, no rollback logic, no bundle format
  • Would require custom tooling: we would need to build everything RAUC provides out of the box

Positive consequences:

  • Brick-proof updates: A/B partitioning means a bad update never prevents booting
  • Automatic recovery: watchdog + boot counter handle failures without human intervention
  • Cryptographic security: Ed25519 signing prevents unauthorized firmware
  • Offline-capable: USB and local updates work identically to OTA
  • No SaaS lock-in: fully self-hosted distribution infrastructure
  • Simple pipeline: full-image updates avoid the complexity of delta generation

Negative consequences:

  • ⚠️ Double storage: A/B partitioning requires 2× rootfs space (~2-4 GB overhead)
  • ⚠️ Reboot required: updates are not live, a reboot is needed to activate
  • ⚠️ Bootloader coupling: tight integration with U-Boot means bootloader updates require extra care
  • ⚠️ Full image downloads: no delta updates initially, so each update transfers the full rootfs (mitigated by zstd compression and model separation)

Implementation notes:

  • U-Boot environment: RAUC communicates with U-Boot via environment variables (boot_order, boot_count). The U-Boot configuration must be set up during initial image provisioning.
  • First boot: factory images ship with only Slot A populated. Slot B is empty until the first OTA update.
  • Rollback scope: rollback restores the OS and Engine. AI models on /data are preserved independently.
  • Platform extensibility: teams building products on top of paiOS can add their own update layers. RAUC’s slot system supports multiple named slots (e.g., rootfs, appfs, config), so a downstream product could define an additional app slot for their custom software without touching the core OS update flow.

Flexible sizing (e.g. for future app slots) is often suggested via BTRFS subvolumes or LVM. The following is accurate as of RAUC 1.15:

| Approach | RAUC support | Notes |
|---|---|---|
| BTRFS subvolumes | No. RAUC has no type=btrfs slot type. Supported slot types are raw, ext4, vfat, nand, nor, ubivol, ubifs, jffs2, plus bootloader types. | Systems like SteamOS use BTRFS for A/B by switching the default subvolume at boot; that is a different update mechanism, not RAUC. To use RAUC you target block devices (partitions or LVs), not subvolumes. |
| LVM | Yes. RAUC accepts any block device. You can use logical volumes (e.g. device=/dev/vg0/rootfs-a) with type=ext4. RAUC writes the image to the LV as usual. | LVs can be resized at runtime (e.g. in a pre-install hook or out-of-band). That avoids fixing slot sizes at image creation time. Downside: extra complexity (LVM on embedded), and the partition table still needs one (or more) partition(s) for the PV. |

paiOS choice: We use simple partitions (Boot | rootfs A | rootfs B | data) for clarity and minimal moving parts. If a downstream product needs resizable app slots, they can adopt LVM and point RAUC at LVs; we do not mandate it in the base image.