Project Roadmap

paiOS is a privacy-first, open-source AI engine for edge hardware. The architecture is designed so that each product is a different configuration of the same modular codebase, activating only the modules it needs via Cargo feature flags.
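As a minimal sketch of what feature-gated modules look like in Rust (the module and feature names below are illustrative, not the actual paiOS crate layout):

```rust
// Modules compiled in only when their Cargo feature is enabled
// (feature names here are hypothetical examples).
#[cfg(feature = "audio")]
mod audio {
    pub fn init() { /* capture, AEC, ring buffer */ }
}

#[cfg(feature = "peripherals")]
mod peripherals {
    pub fn init() { /* buttons, LEDs, haptics */ }
}

/// Reports which modules this binary was built with.
fn enabled_modules() -> Vec<&'static str> {
    let mut m = vec!["common", "core", "inference", "api"]; // always compiled in
    if cfg!(feature = "audio") {
        m.push("audio");
    }
    if cfg!(feature = "peripherals") {
        m.push("peripherals");
    }
    m
}

fn main() {
    // A paiBox-style build (no extra features) reports only the base modules.
    println!("{:?}", enabled_modules());
}
```

A paiScribe-style build would simply be the same crate compiled with something like `cargo build --features audio,peripherals`.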

Each milestone adds one more architecture domain, validating the stack incrementally.

paiOS Engine

The open-source core: a modular, vendor-agnostic AI runtime that manages inference across heterogeneous backends (NPU, CPU, GPU), exposes multi-protocol APIs (Ollama, OpenAI, MCP, gRPC), and runs fully air-gapped. Every product built on paiOS is a different set of feature flags compiled from this codebase.

Why it matters for contributors: Everything you build here powers every product. Inference improvements, API extensions, and security hardening have the widest impact.

paiBox

A plug-and-play local AI server for businesses that need to keep data on-premise. Runs on commodity ARM boards (Radxa Rock 5C / RK3588) or x86 hardware. A drop-in replacement for cloud AI APIs: the same Ollama/OpenAI endpoints, but nothing leaves the network.

Target use cases: GDPR-compliant AI for European SMBs, on-premise inference for healthcare and legal, air-gapped environments (defense, classified).
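Because paiBox exposes the same Ollama/OpenAI endpoints, existing clients can point at it unchanged. As a std-only Rust sketch, building a standard OpenAI-style chat-completions request body looks like this (the host, port, and model name in the comment are assumptions, not documented paiBox defaults):

```rust
/// Builds a minimal OpenAI-compatible chat-completions request body.
/// (Hand-rolled JSON for illustration; a real client would use serde_json.)
fn chat_request_body(model: &str, prompt: &str) -> String {
    format!(
        r#"{{"model":"{}","messages":[{{"role":"user","content":"{}"}}]}}"#,
        model, prompt
    )
}

fn main() {
    // On a deployment this body would be POSTed to an endpoint such as
    // http://paibox.local:11434/v1/chat/completions — the hostname and
    // port are hypothetical here (11434 is Ollama's usual default).
    let body = chat_request_body("llama3.2:3b", "Summarize this meeting.");
    println!("{}", body);
}
```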

paiScribe

A privacy-first meeting transcription device that works entirely offline. It records, transcribes, identifies speakers, and generates summaries, all on-device, with no cloud dependency.

Target use cases: Law firms (attorney-client privilege), medical practices (HIPAA), government agencies, defense contractors, and anyone who needs meeting records without data leaving the room.

Each product activates a different subset of engine modules. The following diagram shows which modules power which product:

```mermaid
graph LR
  subgraph products ["Products"]
      paiBox["paiBox<br/>Private AI Server"]
      paiScribe["paiScribe<br/>Meeting Device"]
  end

  subgraph modules ["Engine Modules"]
      Common["common<br/>Config · Logging · Permissions"]
      Core["core<br/>Sessions · Events · Flows"]
      Inference["inference<br/>LLM · STT · TTS · VAD"]
      API["api<br/>Ollama · OpenAI · MCP · gRPC"]
      Audio["audio<br/>Capture · AEC · Ring Buffer"]
      Peripherals["peripherals<br/>Buttons · LEDs · Haptics"]
      Vision["vision<br/>Camera · RGA · Motion"]
  end

  paiBox --> Common
  paiBox --> Core
  paiBox --> Inference
  paiBox --> API

  paiScribe --> Common
  paiScribe --> Core
  paiScribe --> Inference
  paiScribe --> API
  paiScribe --> Audio
  paiScribe --> Peripherals

  style Vision stroke-dasharray: 5 5,opacity:0.5
```

Dashed: Vision module is designed but not yet activated by a shipped product.

| Milestone | Deliverable | Modules Activated | Key Capabilities |
|-----------|-------------|-------------------|------------------|
| M0 | paiOS Engine + paiBox v0.1 | Common, Core, Inference, API | Text inference, Ollama/OpenAI API, MCP server |
| M1 | paiBox v1.0 + paiScribe v0.1 | + Audio, Peripherals | STT (Whisper), USB HID, buttons, LEDs |
| M2 | paiScribe v1.0 | + Audio (AEC, Diarization) | Speaker diarization, echo cancellation, TTS, meeting summaries |

M0: paiOS Engine + paiBox v0.1

Modules: common, core, inference, api

  • Monorepo Setup: Establish structure for engine, os, and apps.
  • Architecture Definitions: Define the Hexagonal Architecture layers and the IPC transport (gRPC over Unix domain sockets).
  • Infrastructure: Set up CI/CD, License Compliance via cargo-deny, and CLA bot (CLAassistant).
  • Documentation: Launch Starlight documentation site.
  • Inference Engine: LLM inference via Rockchip NPU (infer_rkllm) and CPU fallback (infer_llamacpp_cpu).
  • API Layer: Ollama-compatible API, OpenAI-compatible API, MCP server, gRPC gateway.
  • Configuration: Model management, user permissions, audit logging.
  • paiBox v0.1: Plug-and-play local AI server running on Radxa Rock 5C / RK3588 boards.
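The IPC transport named above (gRPC over Unix domain sockets) can be illustrated with std-only Rust; a real implementation would layer gRPC (for example via a crate like tonic) on top of the socket rather than exchanging raw bytes:

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream; // Unix-only, as UDS implies

fn main() -> std::io::Result<()> {
    // A connected pair of Unix domain sockets stands in for
    // two local processes; real paiOS IPC would frame gRPC
    // messages over a named socket path instead.
    let (mut client, mut server) = UnixStream::pair()?;

    client.write_all(b"ping")?;

    let mut buf = [0u8; 4];
    server.read_exact(&mut buf)?;
    println!("{}", std::str::from_utf8(&buf).unwrap());
    Ok(())
}
```

UDS keeps the traffic on-box (no network listener at all), which fits the air-gapped design goal.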

M1: paiBox v1.0 + paiScribe v0.1

Adds: audio, peripherals

  • Audio Pipeline: Microphone capture, STT via Sherpa-ONNX (Whisper), Voice Activity Detection (Silero VAD).
  • Peripherals: Button input, LED status indicators, USB HID keyboard injection.
  • paiBox v1.0: Multi-user management, web dashboard, fleet management basics.
  • paiScribe v0.1: Basic offline meeting transcription, single-speaker dictation mode.
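For intuition, voice activity detection reduces to classifying each audio frame as speech or silence. The pipeline above uses Silero VAD (a neural model); the core idea can be sketched with a simple RMS-energy threshold, where the frame size and threshold value below are illustrative only:

```rust
/// Returns true if a PCM frame's root-mean-square energy exceeds
/// the threshold — a toy stand-in for a real VAD model.
fn is_speech(frame: &[f32], threshold: f32) -> bool {
    let rms = (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt();
    rms > threshold
}

fn main() {
    let silence = vec![0.001_f32; 160]; // one 10 ms frame at 16 kHz
    let speech = vec![0.3_f32; 160];
    println!("{} {}", is_speech(&silence, 0.01), is_speech(&speech, 0.01));
}
```

A neural VAD like Silero replaces the threshold with a learned classifier, which is far more robust to background noise.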

M2: paiScribe v1.0

Enhances: Audio (AEC, diarization, TTS)

  • Speaker Diarization: Multi-speaker identification in meetings.
  • Echo Cancellation: WebRTC AEC3 for rooms with speakers.
  • Text-to-Speech: Local TTS via Piper (ONNX).
  • MCP Client: Tool execution for extending paiScribe capabilities.
  • paiScribe v1.0: Full meeting transcription with speaker labels, AI summaries, secure export.
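As a conceptual sketch of the diarization step: once each utterance has a speaker embedding, labeling reduces to nearest-centroid assignment by cosine similarity. Real systems use high-dimensional learned embeddings; the 2-D vectors below are placeholders:

```rust
/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Assigns an utterance embedding to the most similar speaker centroid.
fn assign_speaker(emb: &[f32], centroids: &[Vec<f32>]) -> usize {
    centroids
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| {
            cosine(emb, a).partial_cmp(&cosine(emb, b)).unwrap()
        })
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    // Two placeholder speaker centroids (speaker 0 and speaker 1).
    let centroids = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    println!("{}", assign_speaker(&[0.9, 0.1], &centroids)); // closer to speaker 0
    println!("{}", assign_speaker(&[0.2, 0.8], &centroids)); // closer to speaker 1
}
```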

Beyond M2, the architecture supports additional domains, including vision and wearable form factors. These milestones will be detailed as M0-M2 mature. The modular design means future products reuse the same crates; contributors working on the foundation today are building the groundwork for everything that follows.