Project Roadmap

paiOS is a privacy-first, open-source AI engine for edge hardware. The architecture is designed so that each product is a different configuration of the same modular codebase, activating only the modules it needs via Cargo feature flags.
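As a minimal sketch of what feature-gated modules look like in Rust (the module and feature names below are illustrative, not the actual paiOS crate layout):

```rust
// Modules compiled in only when their Cargo feature is enabled
// (feature names here are hypothetical examples).
#[cfg(feature = "audio")]
mod audio {
    pub fn init() { /* capture, AEC, ring buffer */ }
}

#[cfg(feature = "peripherals")]
mod peripherals {
    pub fn init() { /* buttons, LEDs, haptics */ }
}

/// Reports which modules this binary was built with.
fn enabled_modules() -> Vec<&'static str> {
    let mut m = vec!["common", "core", "inference", "api"]; // always compiled in
    if cfg!(feature = "audio") {
        m.push("audio");
    }
    if cfg!(feature = "peripherals") {
        m.push("peripherals");
    }
    m
}

fn main() {
    // A paiBox-style build (no extra features) reports only the base modules.
    println!("{:?}", enabled_modules());
}
```

A paiScribe-style build would simply be the same crate compiled with something like `cargo build --features audio,peripherals`.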

Each milestone adds one more architecture domain, validating the stack incrementally.

paiOS Engine

The open-source core: a modular, vendor-agnostic AI runtime that manages inference across heterogeneous backends (NPU, CPU, GPU), exposes multi-protocol APIs (Ollama, OpenAI, MCP, gRPC), and runs fully air-gapped. Every product built on paiOS is a different set of feature flags compiled from this codebase.

Why it matters for contributors: Everything you build here powers every product. Inference improvements, API extensions, and security hardening have the widest impact.

paiBox

A plug-and-play local AI server for businesses that need to keep data on-premise. Runs on commodity ARM boards (Radxa Rock 5C / RK3588) or x86 hardware. A drop-in replacement for cloud AI APIs: the same Ollama/OpenAI endpoints, but nothing leaves the network.

Target use cases: GDPR-compliant AI for European SMBs, on-premise inference for healthcare and legal, air-gapped environments (defense, classified).
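Because paiBox exposes the same Ollama/OpenAI endpoints, existing clients can point at it unchanged. As a std-only Rust sketch, building a standard OpenAI-style chat-completions request body looks like this (the host, port, and model name in the comment are assumptions, not documented paiBox defaults):

```rust
/// Builds a minimal OpenAI-compatible chat-completions request body.
/// (Hand-rolled JSON for illustration; a real client would use serde_json.)
fn chat_request_body(model: &str, prompt: &str) -> String {
    format!(
        r#"{{"model":"{}","messages":[{{"role":"user","content":"{}"}}]}}"#,
        model, prompt
    )
}

fn main() {
    // On a deployment this body would be POSTed to an endpoint such as
    // http://paibox.local:11434/v1/chat/completions — the hostname and
    // port are hypothetical here (11434 is Ollama's usual default).
    let body = chat_request_body("llama3.2:3b", "Summarize this meeting.");
    println!("{}", body);
}
```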

paiScribe

A privacy-first meeting transcription device that works entirely offline. It records, transcribes, identifies speakers, and generates summaries, all on-device, with no cloud dependency.

Target use cases: Law firms (attorney-client privilege), medical practices (HIPAA), government agencies, defense contractors, and anyone who needs meeting records without data leaving the room.

Each product activates a different subset of engine modules. The following diagram shows which modules power which product:

```mermaid
graph LR
  subgraph products ["Products"]
      paiBox["paiBox<br/>Private AI Server"]
      paiScribe["paiScribe<br/>Meeting Device"]
  end

  subgraph modules ["Engine Modules"]
      Common["common<br/>Config · Logging · Permissions"]
      Core["core<br/>Sessions · Events · Flows"]
      Inference["inference<br/>LLM · STT · TTS · VAD"]
      API["api<br/>Ollama · OpenAI · MCP · gRPC"]
      Audio["audio<br/>Capture · AEC · Ring Buffer"]
      Peripherals["peripherals<br/>Buttons · LEDs · Haptics"]
      Vision["vision<br/>Camera · RGA · Motion"]
  end

  paiBox --> Common
  paiBox --> Core
  paiBox --> Inference
  paiBox --> API

  paiScribe --> Common
  paiScribe --> Core
  paiScribe --> Inference
  paiScribe --> API
  paiScribe --> Audio
  paiScribe --> Peripherals

  style Vision stroke-dasharray: 5 5,opacity:0.5
```

Dashed: Vision module is designed but not yet activated by a shipped product.

| Milestone | Deliverable | Modules Activated | Key Capabilities |
|-----------|-------------|-------------------|------------------|
| M0 | paiOS Engine + paiBox v0.1 | Common, Core, Inference, API | Text inference, Ollama/OpenAI API, MCP server |
| M1 | paiBox v1.0 + paiScribe v0.1 | + Audio, Peripherals | STT (Whisper), USB HID, buttons, LEDs |
| M2 | paiScribe v1.0 | + Audio (AEC, Diarization) | Speaker diarization, echo cancellation, TTS, meeting summaries |

M0: paiOS Engine + paiBox v0.1

Modules: common, core, inference, api

  • Monorepo Setup: Establish structure for engine, os, and apps.
  • Architecture Definitions: Define the Hexagonal Architecture layers and the IPC transport (gRPC over Unix domain sockets).
  • Infrastructure: Set up CI/CD, License Compliance via cargo-deny, and CLA bot (CLAassistant).
  • Documentation: Launch Starlight documentation site.
  • Inference Engine: LLM inference via Rockchip NPU (infer_rkllm) and CPU fallback (infer_llamacpp_cpu).
  • API Layer: Ollama-compatible API, OpenAI-compatible API, MCP server, gRPC gateway.
  • Configuration: Model management, user permissions, audit logging.
  • paiBox v0.1: Plug-and-play local AI server running on Radxa Rock 5C / RK3588 boards.
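The IPC transport named above (gRPC over Unix domain sockets) can be illustrated with std-only Rust; a real implementation would layer gRPC (for example via a crate like tonic) on top of the socket rather than exchanging raw bytes:

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream; // Unix-only, as UDS implies

fn main() -> std::io::Result<()> {
    // A connected pair of Unix domain sockets stands in for
    // two local processes; real paiOS IPC would frame gRPC
    // messages over a named socket path instead.
    let (mut client, mut server) = UnixStream::pair()?;

    client.write_all(b"ping")?;

    let mut buf = [0u8; 4];
    server.read_exact(&mut buf)?;
    println!("{}", std::str::from_utf8(&buf).unwrap());
    Ok(())
}
```

UDS keeps the traffic on-box (no network listener at all), which fits the air-gapped design goal.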

M1: paiBox v1.0 + paiScribe v0.1

Adds: audio, peripherals

  • Audio Pipeline: Microphone capture, STT via Sherpa-ONNX (Whisper), Voice Activity Detection (Silero VAD).
  • Peripherals: Button input, LED status indicators, USB HID keyboard injection.
  • paiBox v1.0: Multi-user management, web dashboard, fleet management basics.
  • paiScribe v0.1: Basic offline meeting transcription, single-speaker dictation mode.
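For intuition, voice activity detection reduces to classifying each audio frame as speech or silence. The pipeline above uses Silero VAD (a neural model); the core idea can be sketched with a simple RMS-energy threshold, where the frame size and threshold value below are illustrative only:

```rust
/// Returns true if a PCM frame's root-mean-square energy exceeds
/// the threshold — a toy stand-in for a real VAD model.
fn is_speech(frame: &[f32], threshold: f32) -> bool {
    let rms = (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt();
    rms > threshold
}

fn main() {
    let silence = vec![0.001_f32; 160]; // one 10 ms frame at 16 kHz
    let speech = vec![0.3_f32; 160];
    println!("{} {}", is_speech(&silence, 0.01), is_speech(&speech, 0.01));
}
```

A neural VAD like Silero replaces the threshold with a learned classifier, which is far more robust to background noise.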

M2: paiScribe v1.0

Enhances: Audio (AEC, diarization, TTS)

  • Speaker Diarization: Multi-speaker identification in meetings.
  • Echo Cancellation: WebRTC AEC3 for rooms with speakers.
  • Text-to-Speech: Local TTS via Piper (ONNX).
  • MCP Client: Tool execution for extending paiScribe capabilities.
  • paiScribe v1.0: Full meeting transcription with speaker labels, AI summaries, secure export.
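As a conceptual sketch of the diarization step: once each utterance has a speaker embedding, labeling reduces to nearest-centroid assignment by cosine similarity. Real systems use high-dimensional learned embeddings; the 2-D vectors below are placeholders:

```rust
/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Assigns an utterance embedding to the most similar speaker centroid.
fn assign_speaker(emb: &[f32], centroids: &[Vec<f32>]) -> usize {
    centroids
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| {
            cosine(emb, a).partial_cmp(&cosine(emb, b)).unwrap()
        })
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    // Two placeholder speaker centroids (speaker 0 and speaker 1).
    let centroids = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    println!("{}", assign_speaker(&[0.9, 0.1], &centroids)); // closer to speaker 0
    println!("{}", assign_speaker(&[0.2, 0.8], &centroids)); // closer to speaker 1
}
```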

Beyond M2, the architecture supports additional domains, including vision and wearable form factors. These milestones will be detailed as M0-M2 mature. The modular design means future products reuse the same crates; contributors working on the foundation today are building the groundwork for everything that follows.