# ADR-004: Engine Architecture
## Status

Accepted (2025-03-01)
## Context

paiOS is designed to power Personal AI Hardware across multiple form factors, from paiBox (Private AI Server) to future wearable devices. This requires an architecture that is:
- Hardware-agnostic: The same core logic must run on different silicon (Rockchip, NVIDIA, Apple Silicon)
- Privacy-first: Users must be able to trust what has access to sensors
- UX-friendly: Users should be able to use standard GGUF models without complex conversion pipelines
- Efficient: Embedded devices have limited RAM; we cannot afford microservice overhead
### Trust, Not Just Privacy

Inspired by Android’s permission model, we ensure that apps and AI models cannot access hardware (camera, microphone, NPU) without explicit user permission. All hardware access goes through a single trust boundary, giving users real control over their data.
### The Core Challenge

The RK3588 SoC offers heterogeneous compute resources:
| Resource | Capability | Trade-off |
|---|---|---|
| NPU | 6 TOPS, high throughput | Requires proprietary .rknn format (vendor lock-in) |
| GPU | Mali-G610, parallel compute | Lower throughput than NPU |
| CPU | Cortex A76/A55, ARM NEON | Flexible but slower for matrix ops |
The challenge: How do we leverage all three resources efficiently while maintaining a simple, portable architecture?
### Key Constraints

- Embedded RAM: Single-process architecture preferred over microservices
- Inference flexibility: The system must offer multiple ways to run models: lightweight CPU-friendly options (e.g. GGUF via llama.cpp) for quick testing without proprietary conversion; fully hardware-optimized backends (e.g. RKNN) for best performance; and the ability to mix backends for parallel workloads (heterogeneous compute).
- Security: All hardware access must go through a single trust boundary
- Maintainability: Contributors and agents must immediately understand which domain they work in (e.g. Vision, Inference) and, within that, which hexagonal layer (domain logic, ports, or adapters).
### Documentation Approach

We document the paiOS Engine architecture using the C4 Model:
- Level 1 (Context): System context showing paiOS’s relationships with external actors (users, apps, hardware)
- Level 2 (Container): Container diagram showing the paiEngine process and its interactions
- Level 3 (Component): Component diagram zooming into the pai-engine container (Hexagonal view: driving/driven, engine domains)
See C4 Architecture for system-level diagrams (including Level 3) and OS & Infrastructure for the OS/hardware layer. This ADR focuses on the “why” behind the Engine’s architectural decisions; the C4 diagrams show the “what” at different abstraction levels. For detailed implementation of each domain, see Engine Domains.
## Decision

We adopt a Modular Monolith architecture using Hexagonal Architecture (Ports and Adapters) combined with a Hybrid Inference Strategy that distributes workloads across NPU, CPU, and GPU.
### 1. Architectural Style: Modular Monolith

paiEngine is a single executable compiled from a single monorepo with clear module boundaries. No microservices.
Why Modular Monolith over Microservices? On embedded devices with limited RAM:
- Lower overhead: Single process = fewer context switches, shared memory, simpler lifecycle
- Faster startup: No inter-process communication or service discovery
- Simpler security: Single trust boundary, easier permission model
- Better resource utilization: Shared memory pools, unified resource management
The modularity comes from strict architectural boundaries (domain vs adapters) enforced by Rust’s module system and workspace structure, not from separate processes.
### 2. Hexagonal Architecture (Ports and Adapters)

We adopt Hexagonal Architecture because it is:
- Rust-native: Rust traits map directly to ports/adapters
- Embedded alignment: The Embedded Rust Book and embedded-hal use traits as abstraction boundaries, the same idea as ports and adapters
- Pragmatic: Simpler than full Clean Architecture, ideal for MVP
- Zero overhead: Compile-time polymorphism
Core principle: The Core (Domain) defines Ports (Rust traits); the Edge (Adapters) implements them. Dependencies point inwards; main.rs is the Composition Root that wires adapters into the domain at startup.
SoC Vendor Independence & Rapid Porting: While the Rockchip RK3588 is the first SoC supported by paiOS, the architecture is explicitly designed to prevent vendor lock-in. By isolating hardware-specific SDKs (like Rockchip’s RKNN or librga) into interchangeable Adapters, the entire engine can be rapidly ported to new SoCs (e.g., NVIDIA Jetson, Snapdragon, or Apple Silicon) simply by writing new adapters for the existing Ports. The core logic remains completely untouched.
| Adapter Type | Direction | Examples |
|---|---|---|
| Driving Adapters (Left) | Inbound → Domain | gRPC API Gateway, HMI (buttons), CLI |
| Driven Adapters (Right) | Domain → Outbound | Camera, Microphone, NPU, Storage |
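As a minimal sketch of the port/adapter split (all names here — `CameraPort`, `MockCamera`, `VisionService` — are hypothetical illustration names, not taken from the actual codebase):

```rust
/// Driven port: defined by the domain, implemented at the edge.
/// (Hypothetical names for illustration only.)
trait CameraPort {
    fn capture_frame(&mut self) -> Result<Vec<u8>, String>;
}

/// Mock adapter: lets the domain be tested without hardware.
struct MockCamera;

impl CameraPort for MockCamera {
    fn capture_frame(&mut self) -> Result<Vec<u8>, String> {
        Ok(vec![0u8; 4]) // a fake 4-byte "frame"
    }
}

/// Domain logic is generic over the port (compile-time polymorphism,
/// zero runtime overhead).
struct VisionService<C: CameraPort> {
    camera: C,
}

impl<C: CameraPort> VisionService<C> {
    fn frame_len(&mut self) -> usize {
        self.camera.capture_frame().map(|f| f.len()).unwrap_or(0)
    }
}

fn main() {
    // Composition root: wire the adapter into the domain at startup.
    let mut vision = VisionService { camera: MockCamera };
    println!("frame_len = {}", vision.frame_len());
}
```

Swapping `MockCamera` for a real V4L2-backed adapter changes only the composition root; the domain code is untouched.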
For detailed component implementations, see the individual domain pages:
- Core: Orchestrator, SessionManager, FlowRunner
- Audio: Audio capture, playback, signal conditioning
- Vision: Camera, frame processing, motion gating
- Inference: AI backends, resource management, MCP client
- API: API Gateway, protocol adapters, routing
- Peripherals: HMI, buttons, LEDs, haptics, USB keyboard emulation
### 3. Rust Workspace (Summary)

We enforce architectural boundaries using a Cargo workspace under `engine/`:

- Domain crates (`common`, `core`, `vision`, `audio`, `inference`, `api`, `peripherals`) each contain domain logic, ports (traits), and feature-gated adapters.
- Adapters live inside their domain crate, not in separate top-level crates.
- Domain crates never import each other; cross-domain coordination goes through `core`.
- `pai-engine` is the only executable and the Composition Root.
For the full workspace layout, feature flags, capability vs profile features, and the sys-crate pattern, see ADR-008: Workspace Layout and Workspace and Build.
### 4. Hybrid Inference Strategy

We distribute workloads based on their characteristics:
| Resource | Workload | Rationale |
|---|---|---|
| NPU | Main LLM (Llama, Mistral, Qwen) | Highest throughput, primary inference |
| CPU | Speech-to-Text (Whisper), VAD, Wake Word | ARM NEON optimized, saves NPU memory |
| CPU | Vector Database (RAG) | SIMD benefits, fast RAM access |
| GPU | Small logic models (classifiers) | Parallel aux tasks, avoids NPU context switch |
We offer several inference paths:

- GGUF via upstream llama.cpp: a user-friendly path (no proprietary conversion, CPU/GPU fallback) for LLMs and quick testing; we use llama.cpp in CPU-only mode (no dependency on community forks).
- RKNN/RKLLM: NPU-optimized vision and language models.
- ONNX (e.g. Sherpa-ONNX for audio: wake word, STT, TTS, VAD): small, portable models.

Further adapters (MCP client, mock) and backends can be added behind the same port traits, and the architecture allows mixing backends for parallel workloads. All are encapsulated behind Rust port traits.
See Inference for the full adapter table, resource management, and hardware allocation strategy.
### 5. Runtime & Concurrency Model

We use a Mono-Daemon architecture with a clear separation between async I/O and blocking compute.
#### When to use what

| Primitive | When to use | Examples |
|---|---|---|
| `std::thread` | Long-lived, blocking, or CPU-bound work that must not block the Tokio event loop. Each thread owns its resources and communicates via channels. | Dedicated inference thread (NPU/GPU driver calls are blocking FFI); camera capture thread (V4L2 ioctl blocks). |
| Tokio task (`tokio::spawn`) | I/O-bound, async work that benefits from cooperative scheduling. Use for anything that awaits network, file, or timer futures. | gRPC/REST request handlers; MCP server/client; EventBus dispatch; health checks; scheduled timers. |
| `tokio::task::spawn_blocking` | Short-lived, blocking work that doesn’t justify a dedicated thread. Tokio runs it on a thread pool and returns a future. | One-off config file parsing; SQLite permission lookups; model file validation before loading. |
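The first row (a dedicated, long-lived blocking thread that owns its resources and communicates via channels) can be sketched with std-only primitives. The real engine would use bounded `tokio::sync::mpsc` on the async side; all names below are illustrative, not the actual codebase API:

```rust
use std::sync::mpsc;
use std::thread;

/// Commands sent into the dedicated inference thread. Each request
/// carries its own reply channel (a oneshot-style pattern).
enum Command {
    Generate { prompt: String, reply: mpsc::Sender<String> },
}

fn spawn_inference_thread() -> mpsc::SyncSender<Command> {
    // Bounded channel: enforces backpressure on callers.
    let (tx, rx) = mpsc::sync_channel::<Command>(8);
    thread::spawn(move || {
        // The thread owns its (blocking) backend for its whole lifetime.
        for cmd in rx {
            match cmd {
                Command::Generate { prompt, reply } => {
                    // Stand-in for a blocking FFI call into an NPU/GPU driver.
                    let _ = reply.send(format!("echo: {prompt}"));
                }
            }
        }
        // Channel closed (all senders dropped): thread exits cleanly.
    });
    tx
}

fn main() {
    let tx = spawn_inference_thread();
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Command::Generate { prompt: "hi".into(), reply: reply_tx })
        .unwrap();
    println!("{}", reply_rx.recv().unwrap());
}
```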
#### Communication between threads and tasks

| Mechanism | Use case | Details |
|---|---|---|
| Bounded `tokio::sync::mpsc` | Primary channel between Tokio tasks and between Tokio tasks and `std::thread`s. | Sender can be passed to a `std::thread`; receiver lives in a Tokio task (or vice versa). Always bounded to enforce backpressure. |
| `tokio::sync::broadcast` | One-to-many event distribution (EventBus). | The EventBus is a broadcast channel: when Audio detects a wake word or Peripherals registers a button press, the event is broadcast to all subscribers (SessionManager, active Flows). Bounded; slow receivers miss events (acceptable: missed events are logged, not queued indefinitely). |
| `tokio::sync::oneshot` | Single request → single response. | Used when the Core sends a command to an inference thread and waits for the result (e.g., “generate text” → generated text). |
| Shared state (`Arc<Mutex<_>>` / `Arc<RwLock<_>>`) | Configuration and session state that is read often, written rarely. | SessionConfig, PermissionManager cache. Prefer `RwLock` for read-heavy access. Avoid holding locks across `.await` points (use `tokio::sync::RwLock` if unavoidable, but prefer channels). |
#### Control vs Data plane

The Core orchestrates setup and teardown of pipelines (control plane). Once a pipeline is running, data flows directly between domains via bounded channels without routing every frame through the SessionManager:
```mermaid
flowchart LR
    C["Core control plane"] --> S["Start Voice flow / set up channels"]
    S --> A[Audio]
    A --> STT[STT task]
    STT --> LLM[LLM thread]
    LLM --> TTS[TTS task]
```
The Core holds the channel handles and can tear down the pipeline (close senders, cancel tasks) at any time; this is the Saga rollback mechanism. But while the pipeline runs, data moves point-to-point for efficiency.
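The "close senders" teardown mechanism can be illustrated with std-only channels (hypothetical names; the real engine uses bounded Tokio channels): dropping the last sender ends the receiving stage's loop, so the whole pipeline unwinds without explicit shutdown messages.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // A bounded data-plane channel between two pipeline stages.
    let (frame_tx, frame_rx) = mpsc::sync_channel::<u32>(4);

    // Downstream stage: processes frames until the channel closes.
    let stage = thread::spawn(move || {
        let mut processed = 0u32;
        // Iterating a Receiver ends when every Sender is dropped.
        for _frame in frame_rx {
            processed += 1;
        }
        processed // stage exits cleanly; no Zombie State
    });

    for frame in 0..3 {
        frame_tx.send(frame).unwrap();
    }
    drop(frame_tx); // teardown: the control plane closes the channel

    println!("processed = {}", stage.join().unwrap());
}
```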
### 6. Sys-Crate Pattern

All unsafe and FFI code lives in sys-crates under `libs/`. The domain crates never depend on them; only adapters (inside domain crates) do.
Benefits: Clear unsafe boundary, domain testability, backend swappability, feature isolation.
See Workspace and Build for full details on the sys-crate pattern and its relationship to adapters.
### 7. API Gateway & Security Architecture

The api crate serves as the primary ingress point (Driving Adapter). It enforces strict, hardcoded routing rules that prevent “Confused Deputy” attacks:
| Adapter | Accessible Ports | Restricted |
|---|---|---|
| LocalSystem (UDS/IPC) | ALL ports | None |
| SecureNetwork (gRPC TCP) | Inference, SensorRelay, SessionConfig | DeviceControlPort |
| McpServer (MCP) | Inference, SensorRelay, SessionConfig | DeviceControlPort |
| Ollama/OpenAI (HTTP) | Inference only | All others |
The Permission System (HITL) adds a second layer: sensitive actions require physical button press on the device for confirmation, even if the adapter has port access.
See API for gateway details and Security Architecture for the full security model.
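A hardcoded routing matrix of this shape might be expressed as a single exhaustive `match` (the enum and function names below mirror the table but are illustrative, not the actual codebase API):

```rust
#[derive(Clone, Copy)]
enum Adapter { LocalSystem, SecureNetwork, McpServer, OllamaHttp }

#[derive(Clone, Copy)]
enum Port { DeviceControl, SessionConfig, SensorRelay, Inference }

/// Compile-time-auditable allow list: no runtime data drives this
/// decision, so a compromised client cannot widen its own access.
fn is_allowed(adapter: Adapter, port: Port) -> bool {
    match (adapter, port) {
        // Local IPC is fully trusted: all ports.
        (Adapter::LocalSystem, _) => true,
        // Network and MCP clients never reach device control.
        (Adapter::SecureNetwork | Adapter::McpServer, Port::DeviceControl) => false,
        (Adapter::SecureNetwork | Adapter::McpServer, _) => true,
        // HTTP-compatible endpoints get inference only.
        (Adapter::OllamaHttp, Port::Inference) => true,
        (Adapter::OllamaHttp, _) => false,
    }
}

fn main() {
    println!("{}", is_allowed(Adapter::OllamaHttp, Port::Inference));
}
```

Because the `match` is exhaustive, adding a new port or adapter forces the author to make an explicit routing decision before the code compiles.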
### 8. MCP Integration (Bidirectional)

paiOS uses MCP in two directions:
| Role | Feature | Purpose |
|---|---|---|
| MCP Server | api_mcp_server | Allows host PCs to use the AI device as a remote sensor/tool |
| MCP Client | infer_mcp_client | Allows the device’s LLM to use external tools (Home Assistant, memory, web search) |
Key principle: The LLM never executes tools directly. It generates structured JSON tool-calls that are routed through the ToolExecutionPort to external MCP servers, gated by the PermissionManager.
See Inference for the MCP client architecture.
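The "LLM never executes tools directly" principle can be sketched as a gating function between the model's structured output and the port (all trait and type names here are hypothetical illustrations of the pattern, not the actual codebase):

```rust
/// A structured tool call as emitted by the LLM (JSON left as a raw
/// string for this sketch).
struct ToolCall { tool: String, args_json: String }

/// Port to external MCP servers; the domain never sees transport details.
trait ToolExecutionPort {
    fn execute(&self, call: &ToolCall) -> Result<String, String>;
}

trait PermissionManager {
    fn is_permitted(&self, tool: &str) -> bool;
}

struct AllowList(Vec<String>);
impl PermissionManager for AllowList {
    fn is_permitted(&self, tool: &str) -> bool {
        self.0.iter().any(|t| t == tool)
    }
}

/// Stand-in for an external MCP server adapter.
struct EchoToolServer;
impl ToolExecutionPort for EchoToolServer {
    fn execute(&self, call: &ToolCall) -> Result<String, String> {
        Ok(format!("{} ran with {}", call.tool, call.args_json))
    }
}

/// The Core gates every call: the model's output is data, never code.
fn route_tool_call(
    call: &ToolCall,
    perms: &dyn PermissionManager,
    port: &dyn ToolExecutionPort,
) -> Result<String, String> {
    if !perms.is_permitted(&call.tool) {
        return Err(format!("permission denied: {}", call.tool));
    }
    port.execute(call)
}

fn main() {
    let perms = AllowList(vec!["web_search".into()]);
    let call = ToolCall { tool: "web_search".into(), args_json: "{}".into() };
    println!("{:?}", route_tool_call(&call, &perms, &EchoToolServer));
}
```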
### 9. App–Engine Topology

The engine is always the server; clients connect to it. The topology varies by form factor, but the architecture is the same:
| Scenario | Client | Transport | Profile notes |
|---|---|---|---|
| paiBox (AI Server) | Host PC (Claude Desktop, CLI, IDE) | gRPC UDS (same device) or MCP (stdio/SSE) | Primary use case for Milestone 0–2. The engine runs on the device; the host PC is the client. |
| Standalone Wearable (future) | The device itself (voice/buttons) | In-process (PeripheralsInterface) | Agentic: the user talks to the device, the engine runs the full pipeline locally. No external client required. |
| Companion App (Smartphone) | Smartphone app | gRPC TCP/TLS over WiFi/BT | The app is just another remote client, subject to the same routing matrix. No architecture change. |
| GUI (Desktop) | Local desktop app | gRPC UDS or REST | See API: GUI / Companion App. |
“Direct vs overlay”: A smartphone companion app that sits “on top” of the engine is not an overlay layer; it’s a standard API client connecting via gRPC TCP. The engine remains the single backend.
Agentic angle: When the primary interaction is “user talks to one model” (engine as autonomous backend), the engine handles the full voice pipeline locally. External clients (phone, PC) are optional; they can configure or monitor, but the engine operates independently. This is already supported by the existing architecture: PeripheralsInterface drives the engine locally, API adapters enable optional remote access.
Disabling remote APIs: If a build should have no network API (e.g. minimal embedded, no TCP), simply omit api_grpc_tcp from the profile. The engine still works via local IPC (api_grpc_uds) and direct peripheral interaction.
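Profile-driven adapter selection of this kind reduces to compile-time feature checks; a hedged sketch (the feature names mirror the text above, but the function is purely illustrative):

```rust
/// Returns the API adapters compiled into this build.
/// Sketch only: in the real engine, each enabled feature would gate
/// the construction of an actual adapter in the composition root.
fn enabled_apis() -> Vec<&'static str> {
    let mut apis = vec!["api_grpc_uds"]; // local IPC is always compiled in
    if cfg!(feature = "api_grpc_tcp") {
        apis.push("api_grpc_tcp"); // omitted entirely in network-free builds
    }
    apis
}

fn main() {
    // Built without the `api_grpc_tcp` feature, only local IPC remains.
    println!("{:?}", enabled_apis());
}
```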
### 10. Simplification Review (YAGNI)

We reviewed the architecture for over-engineering. Decisions:
| Area | Decision | Rationale |
|---|---|---|
| Port count (4 capability ports) | Keep all four (DeviceControlPort, SessionConfigPort, SensorRelayPort, InferencePort) | Merging ports (e.g. SessionConfig + SensorRelay into one “ControlPort”) would weaken the routing matrix; each port is a distinct security boundary. |
| Crate count (7 domain crates) | Keep all seven, but implement incrementally | The crate boundaries enforce Hexagonal Architecture at compile time. For Milestone 0, only common, core, inference, api, and pai-engine need code; the rest can be stubs with ports defined and mock adapters only. |
| Build profiles | Keep desktop / rockchip / test only | No additional profiles unless a concrete new board requires one (e.g. raspi). Avoid premature profile_* variants. |
| FlowRunner abstraction | Keep the trait, implement only HardcodedFlowRunner for MVP | The trait is one file; the cost is near zero; the value (future extensibility) is high. No scripting runtime in MVP. |
| Composition root | Keep single main.rs; extract a composition module only if main.rs exceeds ~300 lines | See Composition Root. |
| Module merging | Do not merge crates (e.g. API into Core) | Blurs domain boundaries and makes the routing matrix harder to audit. |
### 11. Future Scope & Deferred Features (YAGNI)

Features intentionally excluded from the MVP, with architectural readiness:
| Feature | Why Deferred | How Architecture Supports It |
|---|---|---|
| Dynamic / User-Defined Flows | Static, strongly-typed Rust flows (flows::Voice, flows::Chat) are faster to implement, fully testable, and sufficient for MVP. The long-term vision of users defining their own personalised flows (e.g. custom transcription pipelines, night-vision alerts, interview prep sessions) requires a scripting/agentic runtime that is out of scope for the initial release. | SessionManager abstracts all flow execution behind the FlowRunner trait. A ScriptedFlowAdapter (Rhai), AgenticFlowAdapter (LangGraph), or a JSON/YAML DSL adapter can be injected at startup without any Core changes. See Core: MVP Flows & FlowRunner. |
| Native Local Vector DB | Consumes precious RAM/Storage on the embedded device | MCP-based memory today; LocalMemoryAdapter (SQLite-VSS or Qdrant-embedded) in inference later |
| BLE Provisioning & OpenThread | Massive complexity for MVP | New Driving Adapter in API Gateway (api_ble_provisioning); no Core changes needed |
### 12. Cross-Cutting Robustness Guarantees

The following patterns are required across all domain crates. They were explicitly identified during the architecture design phase as critical for production reliability. Violating any of these will cause crashes, Zombie States, or corrupted session flows.
| Pattern | Requirement | Applies To |
|---|---|---|
| Bounded Contexts | No direct cross-domain imports. All inter-domain coordination goes through core. | All domain crates |
| Cancellation Safety | Every domain crate that holds hardware resources must expose clean stop_*() / cancel_*() methods that are safe to call at any time. Core invokes these during Saga rollback. | audio, vision, inference |
| Saga / Rollback | Multi-step flows (Voice, Stream) are Sagas: each step registers a compensating action. On interrupt or error, compensating actions execute in reverse order, guaranteeing no Zombie States. | core flows + all domain crates |
| Idempotency | State-changing operations (start recording, load model) must be idempotent. Duplicate requests return AlreadyInProgress, not an error or a second parallel operation. | core, api |
| Backpressure & Bounded Queues | All internal Tokio channels and data pools (FramePool, RingBuffer) use bounded capacities. When full, old data is dropped. OOM is never acceptable. | vision, audio, inference |
See Core: Production-Grade Robustness for the canonical reference implementation of these patterns.
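The Idempotency requirement, for example, reduces to a small state machine: a duplicate "start" reports `AlreadyInProgress` rather than failing or spawning a second operation. A minimal sketch with hypothetical names:

```rust
#[derive(Debug, PartialEq)]
enum StartOutcome { Started, AlreadyInProgress }

#[derive(PartialEq)]
enum State { Idle, Recording }

struct Recorder { state: State }

impl Recorder {
    /// Idempotent start: calling it twice is safe by design.
    fn start(&mut self) -> StartOutcome {
        if self.state == State::Recording {
            // Not an error, and no second parallel operation.
            return StartOutcome::AlreadyInProgress;
        }
        self.state = State::Recording;
        StartOutcome::Started
    }
}

fn main() {
    let mut rec = Recorder { state: State::Idle };
    println!("{:?}", rec.start()); // first call starts
    println!("{:?}", rec.start()); // duplicate is reported, not retried
}
```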
### 13. Scope and Completeness Check

We compared the architecture against typical Edge-AI and embedded-agent systems to identify critical gaps. Status of each component:
| Component | Status | Where |
|---|---|---|
| Observability (logs, metrics, tracing) | Partially covered | Logger in common; structured tracing (e.g. tracing crate with spans) not yet specified. Recommendation: adopt tracing from day one; add a tracing subscriber in main.rs. Metrics and trace export (e.g. OpenTelemetry) deferred. |
| OTA / Update system | Covered | ADR-003 (A/B partitioning). |
| Secure boot / attestation | Out of scope for MVP | Depends on hardware support; document as deferred. |
| Key storage / secrets | Not covered | No secrets management specified. For MVP, API keys (if any) can live in config; for production, integrate with Linux keyring or a hardware-backed store. Add as a deferred item. |
| Time / sync (NTP, RTC) | Implicit | Handled by the host OS (Linux NTP). No engine-level concern unless offline operation requires monotonic timestamps. |
| Power / battery | Covered | SystemStatus port in Core, SystemMonitorAdapter. |
| Recovery / safe mode | Partially covered | A/B partitioning (ADR-003) provides rollback. A dedicated “safe mode” (minimal boot with diagnostics only) is not yet specified. Add as a deferred item. |
| Health / diagnostics endpoint | Not covered | A /health or similar endpoint for monitoring (e.g. companion app checking if the engine is alive) is missing. Recommendation: add a basic health check to the API gateway (always enabled, no authentication, returns engine state and uptime). |
| Graceful shutdown | Not covered in docs | See ADR-008 Improvement Checklist: startup/shutdown lifecycle is tracked there. |
| Cross-domain data flow | Partially covered | Documented in this ADR under Runtime & Concurrency (control vs data plane). |
Nothing critical was found missing. The main gaps (structured tracing, health endpoint, graceful shutdown, secrets management) are important for production readiness but do not require architectural changes; they fit cleanly into the existing structure.
## Rationale

### Why Hexagonal over an unstructured Monolith?

- Contributors immediately see which domain they work in and which hexagonal layer (domain vs adapters) they touch
- Testability: The domain can be tested without hardware (mock adapters)
- Portability: The same domain runs on paiBox today and on future wearable devices tomorrow
### Why distribute across NPU/CPU/GPU?

Using only the NPU creates bottlenecks. By distributing workloads, we achieve true parallelism without expensive context switches.
### Why Modular Monolith instead of Microservices?

Single process = fewer context switches, shared memory, simpler lifecycle, faster startup, better resource utilization on embedded devices. Modularity comes from Rust’s module system, not separate processes.
### Why Sys-Crates instead of inline unsafe?

- Auditability: All unsafe code in one place
- Vendor SDK integration: Rockchip RKNN SDK requires C bindings
- Future-proofing: When better Rust-native alternatives emerge, we swap the sys-crate
## Alternatives Considered

### Architectural Style Alternatives (Final Review)

Before comparing specific technology choices, we evaluated the overarching architectural style. This section records that review to close the question definitively.
| Style | Strengths for paiOS | Weaknesses for paiOS | Verdict |
|---|---|---|---|
| Hexagonal (Ports & Adapters) | Rust traits = ports (zero-cost); embedded-hal alignment; single composition root; adapters are naturally feature-gated; domain logic testable without hardware | Indirection grows with the number of ports; can feel like “boilerplate” for small projects | Chosen |
| Layered / Clean Architecture | Well-documented in literature; strict dependency rule | More layers than needed (entities, use cases, interface adapters, frameworks); the extra layers add ceremony without benefit for a monolith where traits already enforce boundaries | Rejected |
| Vertical Slices (by feature/flow) | Good for teams owning entire features end-to-end; reduces cross-cutting coordination | Doesn’t naturally express hardware abstraction (a camera adapter isn’t a “feature”); harder to share ports across slices; would duplicate infrastructure in each slice | Rejected |
| Actor Model (message-passing components) | Natural concurrency model; each actor is independently testable | Complex in Rust (ownership + message-passing friction); harder to reason about synchronous orchestration (Sagas, rollback); actor frameworks add weight | Not chosen as primary; the EventBus and channel-based communication borrow ideas from actors without the full framework |
| HAL-only (embedded-hal traits, no orchestration layer) | Minimal overhead; proven in bare-metal Rust | Too narrow: covers hardware abstraction but doesn’t address orchestration, session management, API routing, or security; paiOS is an OS-level daemon, not a bare-metal driver | Rejected as sole pattern; embedded-hal concepts are used within adapters |
Conclusion: keep Hexagonal.

- Natural Rust fit. Traits are ports; structs are adapters; `#[cfg(feature)]` gates adapters at compile time. No runtime framework needed.
- Embedded-HAL alignment. The embedded Rust ecosystem already uses this pattern (trait = abstraction boundary). Contributors with embedded Rust experience feel at home.
- Single composition root. One place (`main.rs`) wires everything; the rest of the codebase is hardware-agnostic and testable.
- Scaling path. If the project grows: keep one composition root, optionally split it into a `composition/` module (see Composition Root); new domains become new crates with the same `domain/ports/adapters` layout. No structural redesign needed.
- Agent-native angle. Ports are stable API boundaries; MCP and external agents are modeled as driving adapters. Agent-parity is maintained at the port level without special-casing.
Risk to watch: As the number of ports grows, the Orchestrator’s type signature can become unwieldy (currently 7 generic parameters). Mitigations: use a type-family / associated-type bundle to reduce parameters, or use Box<dyn Trait> for orchestration-level ports where vtable cost is negligible.
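The associated-type-bundle mitigation can be sketched as follows: one `Ports` trait groups the port types, so the orchestrator needs a single generic parameter instead of one per port. All names here are hypothetical, chosen only to illustrate the technique:

```rust
// Two example ports (stand-ins for the real seven).
trait InferencePort { fn generate(&self, prompt: &str) -> String; }
trait SensorPort { fn read(&self) -> u32; }

/// Bundle trait: associated types replace a long list of generics.
trait Ports {
    type Inference: InferencePort;
    type Sensor: SensorPort;
}

/// One generic parameter regardless of how many ports exist.
struct Orchestrator<P: Ports> {
    inference: P::Inference,
    sensor: P::Sensor,
}

// One concrete bundle per build profile (mock profile shown).
struct MockInference;
impl InferencePort for MockInference {
    fn generate(&self, prompt: &str) -> String { format!("mock: {prompt}") }
}

struct MockSensor;
impl SensorPort for MockSensor {
    fn read(&self) -> u32 { 42 }
}

struct MockPorts;
impl Ports for MockPorts {
    type Inference = MockInference;
    type Sensor = MockSensor;
}

fn main() {
    let orch = Orchestrator::<MockPorts> {
        inference: MockInference,
        sensor: MockSensor,
    };
    println!("{} / {}", orch.sensor.read(), orch.inference.generate("hi"));
}
```

A `rockchip` profile would supply a second `Ports` implementation; the `Orchestrator` signature never changes.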
### Technology-Specific Alternatives

| Alternative | Why Not Chosen |
|---|---|
| Native RKNN only | Vendor lock-in, requires Python conversion, no CPU fallback |
| NPU for all inference | Bottleneck when multiple models needed simultaneously |
| Microservices architecture | Too much RAM/IPC overhead on embedded devices |
| Direct hardware access (no HAL) | Security risk, no permission control, not portable |
| Pure Safe Rust (no sys-crates) | Impractical: vendor SDKs require FFI bindings |
| Shared memory for IPC | License implications (see ADR-001), complex lifecycle |
## Consequences

### Positive

- Hardware-agnostic core enables multi-form-factor support (server → wearable)
- Clear boundaries help contributors understand the architecture
- Security auditable: all unsafe code isolated in sys-crates
- UX-friendly: GGUF models work out of the box
- Parallel inference: NPU handles LLM while GPU runs auxiliary tasks
### Negative

- llama.cpp scope: We use upstream llama.cpp in CPU-only mode; we do not depend on the `rk-llama.cpp` community fork (which adds flexible NPU/CPU inference and fallback when ops are unsupported on the Rockchip NPU). If that fork is merged upstream, we benefit from improved flexibility and robustness, but our design does not require it.
- Complexity: Three inference backends add maintenance burden
- Trait abstraction overhead: Minimal performance impact, but exists
### Mitigation

| Risk | Mitigation Strategy |
|---|---|
| llama.cpp backend | Upstream llama.cpp, CPU-only; Rust traits allow backend swap if we add NPU via llama.cpp later |
| Backend complexity | Strict domain separation; use established crates (whisper-rs, llama-cpp-2) |
| Unsafe code risks | Isolated in sys-crates; regular security audits |
## Architecture Diagram

For detailed visualizations, see:
- C4 Architecture: System Context and Container diagrams
- OS & Infrastructure: Full layer stack with technology table
## Related

- ADR-001: Licensing Strategy: IPC boundary as license boundary, shared memory constraints
- ADR-002: OS Building Tool: Debos for reproducible image building
- ADR-003: Update System: A/B partitioning and OTA updates
- OS & Infrastructure: Detailed layer descriptions and technology stack
- C4 Architecture: Container and Component diagrams with data flows
- Engine Domains: Detailed domain documentation
- Workspace and Build: Feature flags, sys-crate pattern, build configuration
- Security Architecture: Permission system, port isolation, threat model