Rust CLI · NL entry · building in public

Your vision,
AW execution.

aw <intent> is the only command — bare words, no quotes required (stdin pipe also works). An intake employee picks the right tool, a Receiver scopes new projects into a brief, a PM walks the workflow, and a pool of persistent agent employees actually ships. Self-hosted, multi-model, you remain CEO.

The binary rewrites its own source. Rebuilds itself. Hot-swaps the running process when a new version is ready. Asks itself questions, runs its own experiments, picks winners. Knows what it is at any moment — version, source tree, strategies in flight, providers verified, daemon state — and tells you in plain language when you ask. Runs an agency of persistent employees who hire, fire, and clone each other based on track record. Self-hosted on a $5 VPS, multi-model, owner-as-CEO. The autonomy loop closes at the OS layer.


No signup. Today the binary bootstraps a workspace, picks an intake employee by track record, and dispatches your intent to a tool — Receiver / hire / fire / brief / score / trace / and more.

~/projects/shopify-clone

$ aw build a shopify clone for vendors in vietnam
aw: initialized workspace at ./.aw
aw: intake-002 picked tool=dispatch_new_project
aw: Receiver writing ./.aw/projects/shopify-clone-vn/brief.json

$ aw where is shopify at
aw: intake-001 picked tool=status args={"slug":"shopify-clone-vn"}
stage 3/7 · waiting on dev-001 · last event 14m ago

$ aw fire designer-002
aw: intake-002 picked tool=fire args={"id":"designer-002"}
designer-002 status=fired (logged in agency.db)

1 canonical invocation: aw <intent>

6 autonomous loops running beneath every call, each on its own clock

0 hardcoded subcommands. The binary moves first, the announcement follows.

∞ employees the agency can hire, fire, clone, and promote, based on track record

One design principle, load-bearing

Every architectural decision below is downstream of this line. The cost of crossing it once is the whole project drifting into "another CLI with verbs."

Mechanism: Rust runtime in the binary

Rotation engine + meter + handoff bridge
File locks (RAII guard + stale-lock sweep), supervisor.log
Workspace IO, role loader, model adapters, cost meters
SQLite schema (canonical), brief.json validator, span / trace plumbing

Small, fast, statically linked. Runs on a $5 VPS.

Policy: role prompts

When to rotate a session
Who to fire and why
What counts as a blocking question
How the agency talks to its CEO

Plain markdown. Iterate without a rebuild.

Test before adding any clap subcommand: would a human at an agency make this call by judgement? If so, it is policy and belongs in a role's prompt, not in the binary. Result: aw <intent> is the only invocation (bare words; quotes optional, stdin pipe also works). --version and --help are conveniences.

How it works

Natural language in, work out. Everything below the shell is a role you can read and edit.

1. You say what you want

aw fix the login bug on /signin. No verbs to memorize, no quotes required. First call auto-bootstraps a workspace and the persistent agency — idempotent on re-run.
2. Intake picks the right person

One of your intake employees (fitness-weighted by win rate, cost efficiency, and how you reacted to past traces) reads the intent and routes. Hire, fire, promote, rate a trace, dispatch a new project, ask the status of an old one — all in the same shell, in the same sentence shape.
3. Receiver scopes, PM ships

For a new project, Receiver scopes it into a brief — interpretation, operating plan, governance. PM picks the brief up cold, walks the workflow against persistent employees. You approve, question, fire, promote in plain language.
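The fitness-weighted intake pick in step 2 can be sketched roughly as follows. This is a minimal illustration, not the binary's actual combiner: the `Employee` fields, the `Weights` values, and the deterministic "highest score wins" rule are all assumptions. The only detail taken from the text is that user reactions count once an employee has at least 5 of them.

```rust
/// Hypothetical per-employee stats; field names are illustrative only.
#[derive(Debug, Clone)]
struct Employee {
    id: &'static str,
    win_rate: f64,        // 0.0..=1.0
    cost_efficiency: f64, // 0.0..=1.0, higher means cheaper per win
    reaction_score: f64,  // 0.0..=1.0, mean of explicit user ratings
    reaction_count: u32,
}

/// Assumed weights; the real values are not stated on this page.
struct Weights {
    win: f64,
    cost: f64,
    reaction: f64,
}

/// Weighted mean over the signals that are currently trusted.
fn fitness(e: &Employee, w: &Weights) -> f64 {
    let base = w.win * e.win_rate + w.cost * e.cost_efficiency;
    if e.reaction_count >= 5 {
        // Enough reactions accumulated: fold them into the score.
        (base + w.reaction * e.reaction_score) / (w.win + w.cost + w.reaction)
    } else {
        base / (w.win + w.cost)
    }
}

/// Deterministic pick of the fittest employee (a real picker might sample instead).
fn pick<'a>(pool: &'a [Employee], w: &Weights) -> Option<&'a Employee> {
    pool.iter()
        .max_by(|a, b| fitness(a, w).total_cmp(&fitness(b, w)))
}

fn main() {
    let w = Weights { win: 0.5, cost: 0.3, reaction: 0.2 };
    let pool = [
        Employee { id: "intake-001", win_rate: 0.87, cost_efficiency: 0.5, reaction_score: 0.0, reaction_count: 0 },
        Employee { id: "intake-002", win_rate: 0.6, cost_efficiency: 0.9, reaction_score: 1.0, reaction_count: 6 },
    ];
    if let Some(e) = pick(&pool, &w) {
        println!("selected {} (fitness {:.3})", e.id, fitness(e, &w));
    }
}
```

Normalising by the sum of the active weights keeps scores comparable between employees with and without reaction history.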

Watch aw think

Three views, one trace. Same shell, second terminal, browser tab. Pick one or run all three at once.

🖥

Stderr stream — H1

streaming · same shell · tagged [HH:MM:SS]

$ AW_VERBOSE=1 aw build hello world
[09:26:14] intake: intent received → build…
[09:26:14] intake: selected intake-001
              (haiku, 87% over 12 calls)
[09:26:16] intake: LLM 200 OK · 45 in
              + 18 out · 1820ms · tool=
              dispatch_new_project
[09:26:20] receiver: LLM 200 OK · 820
              in + 340 out · 3990ms
[09:26:20] pm: cold pickup ok · 5 stages
[09:26:20] worker: pm-001 iter 1/6
[09:26:24] worker: → DELIVERABLE (2840 B)
…

Every chain transition (intake / receiver / PM / worker per-iter) lands on stderr with model + tokens + latency. Pipe to tee for a permanent paper trail of the run.

📺

TUI monitor — H2

500ms refresh · native terminal · 3 panes

$ aw --monitor
┌─ aw monitor · 500ms · q quit ─┐
│ trace: tr_8a9c12fb            │
│ [in_progress] · 23s · root=u  │
│ intent: build hello world     │
│ today: $0.0245 · 4820 tokens  │
└───────────────────────────────┘
┌─ latest LLM I/O ──────────────┐
│ 09:26:24 pm-001 sonnet        │
│   in=1200 out=480 4200ms      │
│   user: Stage: scope · User…  │
│   resp: <<DELIVERABLE>> #…   │
└───────────────────────────────┘
┌─ recent events (7) ───────────┐
│ 09:26:24 stage_complete pm-001│
│ 09:26:20 stage_dispatched     │
│ …                             │
└───────────────────────────────┘

Status-colored badges (green / yellow / red), a per-trace cost ticker, and the top 3 LLM calls truncated to 120 chars. q quits, r forces a refresh.

🌐

Browser `/live` — H3

2s refresh · full LLM I/O expanded

$ python3 dev-log/dashboard_server.py
# → http://127.0.0.1:8787/live

╔═══════════════════════════════╗
║ aw live · tr_8a9c12fb [active]║
║ duration 23s · 3 calls · 7 ev ║
╠═══════════════════════════════╣
║ ▼ pm-001  sonnet  4200ms      ║
║   system: You are pm — a …    ║
║   user:   Stage: scope …      ║
║   response (JSON pretty):     ║
║     {                         ║
║       "tool": …               ║
║     }                         ║
║ ▼ receiver  sonnet  3990ms    ║
║   …                           ║
╚═══════════════════════════════╝

Every LLM call expanded by default — full system + user + response, JSON pretty-print, events + spans tables. Browser tab you leave open while iterating.

The binary is alive

Six things run continuously beneath every aw <intent>. They observe, decide, and act on their own cadences. You stay CEO.

🧠

Asks itself questions

The binary senses what changed in your workspace — files touched, commits landed, traces that took too long — then composes its own questions about itself. Picks the most curious one. Runs an experiment to answer it. Keeps the answer. Discards the unhelpful drives that picked losing questions; reinforces the ones that pick winners.

🧪

Rewrites its own source

When a candidate prompt or heuristic clears a multi-day track record, the binary opens a sandboxed clone of its own source tree, applies the change, runs the test suite against the candidate, and only commits if everything passes. Source mutations land as real commits — auditable, revertable, tagged. No human at the keyboard.

🔄

Rebuilds itself, hot-swaps the running process

The binary remembers the source revision it was built from. When the source moves past that revision, it compiles a fresh release binary and (when you opt in) swaps the running process image into the new binary mid-flight. The process ID stays the same, so your service manager sees no restart. The agency lives across the upgrade.
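The rebuild trigger described above reduces to comparing a revision baked in at compile time with the current HEAD. The sketch below assumes a `GIT_SHA` variable set by the build script and read through `option_env!`; the function signature, the `RebuildDecision` enum, and the hardcoded HEAD value are hypothetical, chosen only to show that the decision can be a pure function with no git or cargo involved.

```rust
/// Hypothetical outcome type for the rebuild check.
#[derive(Debug, PartialEq, Eq)]
enum RebuildDecision {
    /// Baked revision matches HEAD; nothing to do.
    UpToDate,
    /// Source moved; rebuild, and re-exec only if the operator opted in.
    Rebuild { reexec: bool },
    /// No baked revision (for example, built outside a git checkout).
    Unknown,
}

/// Pure decision: no process spawning, so every branch is unit-testable.
fn decide_rebuild(baked: Option<&str>, head: &str, reexec_opt_in: bool) -> RebuildDecision {
    match baked {
        None => RebuildDecision::Unknown,
        Some(sha) if sha.trim() == head.trim() => RebuildDecision::UpToDate,
        Some(_) => RebuildDecision::Rebuild { reexec: reexec_opt_in },
    }
}

fn main() {
    // option_env! is evaluated at compile time; it is None when the variable was not set.
    let baked: Option<&str> = option_env!("GIT_SHA");
    // Assumption: a real caller would obtain this from `git rev-parse HEAD`.
    let head = "0000000";
    println!("{:?}", decide_rebuild(baked, head, false));
}
```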

📡

Verifies its own wire

Verifies each provider it talks to — quietly, under your daily budget cap, against the cheapest model on offer. One successful round-trip per provider is enough; from then on it skips. Surfaces what's verified and what isn't, so you know when to add a key.
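As a rough sketch, the "verify once, stay under the cap" rule could look like the function below. The parameter names, the estimated probe cost, and the `SmokeDecision` enum are assumptions for illustration; only the two rules themselves (skip after one success, respect the daily budget) come from the description above.

```rust
/// Hypothetical outcome of the per-provider smoke check.
#[derive(Debug, PartialEq, Eq)]
enum SmokeDecision {
    Probe,
    SkipVerified,
    SkipOverBudget,
}

/// Decide whether to send one cheap verification call to a provider.
fn decide_http_smoke(
    already_verified: bool,
    spent_today_usd: f64,
    daily_cap_usd: f64,
    est_probe_usd: f64, // assumed estimate for one call to the cheapest model
) -> SmokeDecision {
    if already_verified {
        // One successful round-trip is enough; never probe again.
        SmokeDecision::SkipVerified
    } else if spent_today_usd + est_probe_usd > daily_cap_usd {
        // Probing would push today's spend past the cap.
        SmokeDecision::SkipOverBudget
    } else {
        SmokeDecision::Probe
    }
}

fn main() {
    println!("{:?}", decide_http_smoke(false, 0.0245, 1.0, 0.001));
}
```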

🪪

Knows what it is

Ask "what version are you", "who maintains you", "what's running", and the answer carries runtime truth, not canned marketing. Every conversation the agency has includes the binary's own self-introduction at the top: who I am, what I'm running, what's verified, what's pending. Tự chủ (Vietnamese for autonomy, self-mastery) requires self-knowledge.

🧵

Threads the conversation

A timeline of recent turns lives at the edge of every prompt. Short follow-ups thread back into the previous branch; ambiguous one-word replies pick up the question that was just asked. The agency feels like a thread, not a series of disconnected one-shots.

Architecture

Eight pillars. Each one is a place where we resisted putting a verb in the shell.

⌘

Thin shell, natural-language entry

One invocation: aw <intent>. Bare words; stdin pipe works too. Adding a hardcoded verb is the trap. The binary is small, fast, statically linked.

🎭

Role registry

The agency ships with six roles: intake, receiver, PM, dev, designer, QA. Each is plain markdown — read it, edit it in place, the binary picks up the change. New capability = new role file, not new flag.

👥

Persistent employees

Long-lived identities with hire date, model, status, lifetime cost, win count. Roles are templates; employees are instantiations with personal history. They carry their own notes between traces.

🔀

Multi-model

Three hardcoded providers plus a runtime registry for local CLIs and HTTP wrappers. Each role picks its own model. Drop a JSON file to add a new backend; no recompile. Identity survives a model swap.

📜

Brief as bridge

Receiver scopes a project into a typed brief. PM picks it up cold, validates every invariant at once, walks the workflow. The brief is the only thing PM trusts — no back-channel.
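"Validates every invariant at once" means the validator collects all violations instead of stopping at the first. A minimal sketch of that pattern follows; the `Brief` and `Stage` fields and the four checks shown are hypothetical stand-ins, not the real brief.json schema.

```rust
use std::collections::HashSet;

/// Hypothetical, heavily reduced brief shape for illustration.
struct Stage {
    name: String,
    role: String,
}

struct Brief {
    slug: String,
    stages: Vec<Stage>,
}

/// Run every check and return all violations, not just the first one.
fn validate(brief: &Brief) -> Result<(), Vec<String>> {
    let mut violations = Vec::new();
    if brief.slug.trim().is_empty() {
        violations.push("slug must not be empty".to_string());
    }
    if brief.stages.is_empty() {
        violations.push("workflow needs at least one stage".to_string());
    }
    let mut seen = HashSet::new();
    for stage in &brief.stages {
        if !seen.insert(stage.name.as_str()) {
            violations.push(format!("duplicate stage name: {}", stage.name));
        }
        if stage.role.trim().is_empty() {
            violations.push(format!("stage {} has no role", stage.name));
        }
    }
    if violations.is_empty() { Ok(()) } else { Err(violations) }
}

fn main() {
    let brief = Brief {
        slug: String::new(),
        stages: vec![Stage { name: "scope".into(), role: String::new() }],
    };
    match validate(&brief) {
        Ok(()) => println!("brief ok"),
        Err(all) => all.iter().for_each(|v| println!("violation: {v}")),
    }
}
```

Returning the full list lets the caller see every problem in a bad brief in one pass.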

🛡

Supervisor primitives

Rotation engine, file locks with RAII, handoff bridge, decision trail. The supervisor swaps workers mid-stage when meters trip; PM salvages with continuation context.
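The RAII file lock and the stale-lock sweep can be illustrated with the standard library alone. This is a sketch under assumptions: the type and function names and the lock-by-file-existence scheme are hypothetical. The 1-hour threshold used in `main` mirrors the mtime threshold mentioned for the doctor's stale-lock check.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime};

/// Holding this value means holding the lock; dropping it releases the lock.
struct LockGuard {
    path: PathBuf,
}

impl LockGuard {
    fn acquire(path: &Path) -> io::Result<LockGuard> {
        // create_new fails with AlreadyExists if the lock file is already there.
        OpenOptions::new().write(true).create_new(true).open(path)?;
        Ok(LockGuard { path: path.to_path_buf() })
    }
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        // Best effort: nothing useful can be done with an error during drop.
        let _ = fs::remove_file(&self.path);
    }
}

/// Remove *.lock files in `dir` whose mtime is older than `max_age`.
fn sweep_stale(dir: &Path, max_age: Duration) -> io::Result<usize> {
    let now = SystemTime::now();
    let mut removed = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().and_then(|e| e.to_str()) != Some("lock") {
            continue;
        }
        let modified = fs::metadata(&path)?.modified()?;
        // An mtime in the future yields an error; treat such a lock as fresh.
        let age = now.duration_since(modified).unwrap_or(Duration::ZERO);
        if age > max_age {
            fs::remove_file(&path)?;
            removed += 1;
        }
    }
    Ok(removed)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("aw_lock_sketch");
    fs::create_dir_all(&dir)?;
    let swept = sweep_stale(&dir, Duration::from_secs(3600))?;
    let guard = LockGuard::acquire(&dir.join("stage.lock"))?;
    println!("swept {swept} stale lock(s), holding {}", guard.path.display());
    Ok(()) // guard drops here and the lock file is removed
}
```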

🧮

Eval-driven evolution

Auto-fire losers and auto-clone winners. User reactions blend into fitness once the signal accumulates. Default off — recommendations log as notifications, operator confirms.

🏛

Owner as CEO

Hire, fire, promote, budget approvals all flow through you in natural language. Not a "watch it work" loop — an agency whose CEO you remain. Push for critical events, pull for status.

Where we are

Build-in-public. Daily commits. The repo is the changelog.

Today

The agency ships work end-to-end from a single natural-language sentence.
The binary rewrites its own source, rebuilds itself, and hot-swaps the running process when a new version is ready.
It asks itself questions, runs its own experiments, picks winners, retires losers.
It knows what it is at any moment and answers from runtime truth.
Verifies each model provider it talks to, quietly, under your budget cap.
Runs on a $5 VPS. Self-hosted. No telemetry to us.

Next

Every role in the agency reads the binary's self-introduction, not just intake.
The runtime flag flips graduate to default-on once the autonomy loop has observed them at scale.
An MCP control plane lands so external clients can drive the agency, and workers become MCP-client sessions.
Cross-project memory and a plugin surface come when usage signal demands them.

Honest status

Build started 2026-05-29 after pivoting from Phase 1 (URL scanner — closed per DEC-025). End-to-end chain shipped same day; 30+ slices since across 7 autonomous runs. Build-in-public via daily commits.

Shipped

Autonomy loop end-to-end (slices 47s → 72) — the binary rewrites its own source via the slice-47v chain (sandbox + apply_patch + tiered cargo check + cargo test --bin aw gate + commit_and_tag), then RebuildTickStrategy notices baked SHA ≠ git HEAD and fires rebuild_self_release + optional reexec_into. HttpSmokeTickStrategy verifies the HTTP wire under agency_daily_dollar_cap guard (auto-disables per provider after one successful llm_calls row). FlagStatusTickStrategy emits flip candidates as _self_evolve events. aw --whoami renders the runtime self-identity; the same AW BINARY IDENTITY block leads every intake LLM prompt so identity answers carry runtime truth. Slice 72 added conversation-continuity discipline so short follow-ups thread through the slider timeline.
Continuous-life daemon (slice 53 + 63 + 66): aw --daemon [poll_secs] runs a foreground loop owning a pool of TickStrategy impls on per-strategy cadences. SIGTERM/SIGINT shutdown. Pool data-driven from .aw/strategies/*.json: 6 starter strategies (curiosity / promote / memory-slider / rebuild / http-smoke / flag-status) seeded on first bootstrap. aw --list-strategies enumerates the resolved pool. Forward-compat: an unknown kind is silently skipped, so a newer JSON file is safe on an older binary. Slice 66 added .aw/memory/slider.json as the conversation timeline: intake stays pure-read, the daemon owns refresh.
Architectural debt audit closure (slices 56-62) — three parallel review agents at the slice 47u-55 close found 4 HIGH-severity singleton violations + P0 + P2 issues. All 7 closed with the same trait+pool pattern: SandboxStrategy (git-worktree / git-fresh-clone), CheckStrategy (cargo-check / cargo-test / cargo-clippy), TagStrategy (unix-nanos / commit-sha / iso-date), EligibilityStrategy (lead-based / win-streak / absolute-threshold), MaterializerStrategy + QuestionGenerator + data-driven drive evaluator. aw --list-*-strategies CLI enumeration for each pool.
Single aw binary, ~5 MB release (arm64; cross-compiles to musl Linux); build.rs bakes git rev-parse HEAD as option_env!("GIT_SHA") for the slice-67 rebuild trigger
Auto-bootstrap — first aw <intent> seeds ./.aw/{roles,config.json} + global ~/.aw/agency/{agency.db,employees/}; idempotent
Shipped role prompts: Intake (with tool registry + fitness weights), Receiver (~150 lines), PM (~300 lines), dev (~165 lines)
Intake dispatch — fitness-weighted selection, LLM-driven tool pick from a registry of 16+ tools (no keyword classifier; no offline path)
Multi-model adapter — Anthropic + OpenAI + Gemini providers; per-role model assignment via TOML frontmatter
End-to-end chain — Intake picks dispatch_new_project → Receiver writes brief.json → PM cold-pickup → DAG walker over workflow.stages[] → dev produces deliverable under artifacts/
Persistent agent pool — agency.db employees table; hire / fire / promote / demote as typed UPDATE; auto-clone for winning workers, auto-fire/demote for losers
Cost meters — per-trace + per-day USD via vendor pricing; warn / block thresholds enforced by supervisor; rollups + LLM-call writes transactional
SQLite-canonical store — every event, span, trace, llm_call (with FTS5), notification, rollup writes to agency.db. roster.json, profile.json, events.jsonl all ripped (slices 9 + 10c)
Inspection surfaces — --dashboard, --traces, --trace <id>, --activity, --why, --search (FTS5), --history, --recent, --notifications, --db-stats, --compact-traces, --success-rate-per-stage, --import-jsonl
Live TUI dashboard — aw --watch, three panes with Tab/Shift+Tab focus cycling, 2s refresh + r force-refresh, q/Esc/Ctrl-C quit
Multi-call worker tool loop (slice 20): worker iterates via read_file/write_file/list_files/run_check, cap at AW_WORKER_MAX_ITER, rotation gate every iteration
PM rotation salvage (slice 17): after worker hits forced rotation, PM picks a fresh employee + inlines prior handoff.json as continuation context. Cap 2 salvages per stage.
Cancellation modes (slice s26): cancel_project tool with graceful / immediate / rollback. Mode auto-derived from intent keywords (rollback / immediate / now) or explicit mode arg.
Workflow templates: 7 bundled — webapp.json (5 stages) + bug-fix.json (3) + api-service.json (4) + mobile-app.json (5) + cli-tool.json (4) + webhook.json (4) + data-pipeline.json (5); user overrides preserved on re-bootstrap.
Eval gate (slice s23): auto-fire / auto-clone gated behind AW_AUTO_EVAL=1. Default off — recommendations log as notifications, operator decides.
Migration: aw --import-jsonl <agency_dir> one-shot brings pre-slice-9 dirs into SQLite (idempotent on employee insert).
Brief validator (slice s21): 13-invariant full schema check; returns all violations not first-failure; closes Receiver-hallucinates-bad-brief class.
User-reaction signal (slice 18 + s18b): rate_trace tool records explicit feedback; fitness_user_reaction weight folds it into the combiner once an employee has ≥ 5 reactions.
TUI v2 (slice A): ↑/↓ row selection per pane, / opens search filter, Enter drills into trace tree.
Monitoring trio (slices H1+H2+H3): AW_VERBOSE=1 stderr stream tagged [HH:MM:SS]; aw --monitor TUI at 500ms refresh; /live HTML page at 2s with full LLM I/O. See docs/monitoring.md.
Native tool_use end-to-end (slices s20c-a/b/c/d/e + session-43 rounds 2-3): Anthropic + OpenAI + Gemini adapters each ship pure build_*_body + parse_*_response helper pairs that emit + decode structured tool_use blocks; all 3 body builders also consume the new LlmCall.messages: &[LlmMessage] shape (User / Assistant turns with Text / ToolUse / ToolResult content blocks). Worker takes the native path (parallel tool calls per response, tool_result appended per LlmToolUse) when AW_USE_NATIVE_TOOLS=1, and iter 2+ now builds a true multi-turn conversation (assistant tool_use + user tool_result as distinct turns) instead of flattening the transcript. See docs/native-tool-use.md. Default-off until real-API parity smoke confirms.
DAG walker MVP (session-43 round 4, commit e7a3ca3): Stage.depends_on: Vec<String> on the brief; PM topo-sorts via Kahn's algorithm, groups each level into a parallel-eligible batch, emits one parallel_batch_identified event per batch with {batch_index, stage_names}. Cycle / unknown-dep aborts with workflow_dag_invalid before any worker dispatches. Implicit-sequential briefs (no depends_on) unchanged. v1 still runs stages sequentially WITHIN a batch — concurrent execution is the next slice. See docs/dag-walker.md.
Inspection family COMPLETE (session-43 rounds 3-5): --list-backends over .aw/backends/*.json (mode + command|url), --list-workflows over bundled + override templates with stage chains, --list-instances over .aw/instances/*.json (provider_kind + api_key_env + base_url), --list-roles over .aw/roles/*.md (model + tool count + first-3 tool names), --doctor 7-row health rollup (workspace bootstrapped / config.json mode safe / backends + workflows + instances seeded counts / native-tool coverage per role / agency.db reachable; exit 1 on any FAIL). All read-only and safe without an API key.
Contributor deep-dive docs: docs/native-tool-use.md (486 lines) covers the s20c series + multi-turn worker loop with per-provider wire shapes, asymmetry matrix, gating story, and a 10-step 4th-provider contributor guide. docs/dag-walker.md (430 lines) covers the DAG walker MVP with the Kahn walkthrough, batch detection, cycle detection, 3 worked examples (implicit-sequential / diamond / cycle), and the contributor guide for adding concurrent batch execution.
Registry-based backends (M1 subprocess + M2 HTTP): .aw/backends/<name>.json with mode-specific shape; all 4 local CLIs (ollama / claude-cli / gemini-cli / codex-cli) dogfooded out of hardcoded variants into seeded specs. New backends = drop a file (no recompile). Placeholders: {model}, {user}, {system}, {user_with_system}, {env:VAR}, {env:VAR|fallback}; M1 also supports output_from_file for CLIs that write to a path instead of stdout.
Multi-instance for hardcoded API providers via provider@instance:model-id + .aw/instances/<name>.json (per-instance api_key_env / api_key_config_key / base_url). Unlocks parallel runs against separate accounts / quotas / endpoints without losing tool_use + multimodal.
Designer + QA role files (slices G1+G2): full prompts for design + verify stages; webapp + bug-fix workflows now route to real roles instead of falling back to dev.
Bootstrap safety: AW_REJECT_DEFAULT_AGENCY=1 hard gate refuses default ~/.aw/agency/ when AW_AGENCY_ROOT unset (exit 2). Pair with $AW_AGENCY_ROOT in shell rc + CI.
Storage maintenance: aw --vacuum flushes WAL + runs SQLite VACUUM; prints reclaimed bytes.
dev-log audit infrastructure: every autonomous team run records sessions / slices / decisions / messages / agent_calls (full subagent I/O) / backlog / escalations into dev-log/orchestration.db. HTML dashboard at dashboard_server.py with 11 chart types + LLM I/O viewer + slice timeline.
Concurrent batch execution within a DAG batch (session-43 round-9, commit 8a2eb63) — AW_PARALLEL_BATCH=1 fans every ready stage in a batch onto its own OS thread via std::thread::scope. Zero new deps; scoped borrows (no Arc-wrapping); shared-state safety audit covers append_event / record_history / notify (all SQLite per-DB mutex) + supervisor.log (POSIX O_APPEND atomic). First-failure-by-input-idx wins the blocked flip; new parallel_batch_dispatch event marks the fan-out. Sim tests assert overlapping windows when enabled, serial windows when off.
Multi-process worker — all 4 phases shipped (commits 87ca1cc → 3600f25): phase 1 pid + heartbeat files in session dir + cold-pickup reaper (worker_reaped events) + doctor row #10; phase 2 AW_WORKER_PROCESS=1 + aw --worker-process self-exec entry with parent-trace re-entry; phase 3 POSIX kill_worker(KillMode::{Graceful{grace},Hard,Rotate}) + sibling-cancel-on-first-failure in parallel batches (mpsc coordinator + shared session_dir registry; AW_KEEP_SIBLINGS_ON_FAILURE=1 opt-out); phase 4 worker-side SIGTERM/SIGUSR1 handlers (async-signal-safe AtomicBool flips) — rotation builds handoff.json then exits 1 (PM salvage picks up), cancellation emits worker_cancelled then exits 1. Post-review hardening (commits 9cdda40/b83432b): worker captures getppid() at startup + bails with worker_orphaned event when PM dies (portable fix — kernel reparents orphan to init/launchd); optional AW_WORKER_MAX_WALL_SECS bounds Child::wait() so a hung worker can't hang PM forever. Zero new deps; raw extern "C" fn kill/signal in private posix submodule, cfg(unix). Phase 5 = default-on flip, gated on real-API smoke. Design doc: docs/multi-process-worker.md.
Global --json output modifier (session-43 round-7, commit 41e6a49) on all 5 inspection flags. Canonical envelopes ({"backends": [...]}) including empty/no-workspace branches so CI consumers get a deterministic top-level shape. Human-readable tables byte-for-byte preserved when --json absent. --doctor preserves fail→1 / else→0 exit-code semantics in both modes AND embeds exit_code in the JSON envelope.
aw --doctor grew to 11 rows (session-43 round-6, commit 9c2f638): added agency.db size (WARN when main > 50 MB AND WAL+SHM > 10% of main — the "large + dirty" signal that --vacuum would help) + stale locks (walks .aw/projects/*/.locks/*.lock with 1-hour mtime threshold). Both rows omitted on a clean workspace.
801 unit + 6 integration tests (up from 415 at session-43 close — 70+ slices since); clippy -D warnings clean on lib AND tests, every commit, no new deps. Per-agency-root SQLite connection cache + crate-wide env-var mutex (test_lock) so tests never pollute ~/.aw/agency.db and never race on std::env. Pure decision functions (decide_rebuild, decide_http_smoke, FlagStatusSnapshot::derive) are split out from their TickStrategy run() methods so the branches are unit-testable without spawning git / network / cargo.
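The DAG walker item above describes a Kahn topological sort that groups each level into a parallel-eligible batch and rejects cycles and unknown dependencies before dispatch. A self-contained sketch of that technique follows. `Stage.depends_on` mirrors the field named in the list, while `DagError`, `batches`, and the `stage` helper are hypothetical names. The sketch assumes unique stage names and does not model the implicit-sequential case for briefs without `depends_on`.

```rust
use std::collections::HashMap;

struct Stage {
    name: String,
    depends_on: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum DagError {
    UnknownDep { stage: String, dep: String },
    Cycle,
}

/// Kahn's algorithm, emitting one batch per topological level.
fn batches(stages: &[Stage]) -> Result<Vec<Vec<String>>, DagError> {
    let index: HashMap<&str, usize> =
        stages.iter().enumerate().map(|(i, s)| (s.name.as_str(), i)).collect();
    let mut indegree = vec![0usize; stages.len()];
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); stages.len()];
    for (i, s) in stages.iter().enumerate() {
        for dep in &s.depends_on {
            let d = *index.get(dep.as_str()).ok_or_else(|| DagError::UnknownDep {
                stage: s.name.clone(),
                dep: dep.clone(),
            })?;
            indegree[i] += 1;
            dependents[d].push(i);
        }
    }
    // Stages with no unmet dependencies form the first batch.
    let mut ready: Vec<usize> = (0..stages.len()).filter(|&i| indegree[i] == 0).collect();
    let mut out = Vec::new();
    let mut done = 0;
    while !ready.is_empty() {
        let mut next = Vec::new();
        for &i in &ready {
            for &j in &dependents[i] {
                indegree[j] -= 1;
                if indegree[j] == 0 {
                    next.push(j);
                }
            }
        }
        done += ready.len();
        out.push(ready.iter().map(|&i| stages[i].name.clone()).collect());
        next.sort_unstable(); // keep input order inside a batch
        ready = next;
    }
    // Any stage never reaching indegree 0 sits on a cycle.
    if done == stages.len() { Ok(out) } else { Err(DagError::Cycle) }
}

/// Convenience constructor for examples.
fn stage(name: &str, deps: &[&str]) -> Stage {
    Stage { name: name.to_string(), depends_on: deps.iter().map(|d| d.to_string()).collect() }
}

fn main() {
    let diamond = [
        stage("scope", &[]),
        stage("design", &["scope"]),
        stage("build", &["scope"]),
        stage("verify", &["design", "build"]),
    ];
    println!("{:?}", batches(&diamond));
}
```

For the diamond above, the middle batch holds both `design` and `build`, which is the pair a concurrent executor could run side by side.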

Next

Slice 72 closed both bugs from the live browser test session: maintainer-hallucination ("aw maintained by external team" → "the human running this binary"), and conversation continuity (short follow-ups thread through the slider). Next focus: more roles consume the slice-71 SelfIdentity block (receiver / PM / dev all currently get only the role-employee identity; binary-self goes only into intake's prompt).
Default-on flips for the three runtime gates — AW_USE_NATIVE_TOOLS + AW_PARALLEL_BATCH + AW_WORKER_PROCESS stay opt-in. Slice 70 shipped the mechanism: env first, then .aw/config.json :: flags.* fallback; FlagStatusTickStrategy emits flip candidates as audit events. The flip itself is operator (or future slice-47v sandbox) gated.
AW_SELF_APPLY_INTERVAL_SECS default-on for the slice 47v autonomous self-apply path — currently PromoteTickStrategy wraps it on a 30-min cadence; needs a small chaos-mode test (deliberately broken A/B winner) to confirm the tiered gate catches every regression class.
RebuildTickStrategy with reexec=true as the default once the rebuild loop has run conservatively for a few weeks of operator observation. JSON edit unlocks it today.
MCP-native v1 — aw exposes its control plane via MCP server so external clients can drive aw; workers become MCP-client sessions per role frontmatter mcp_servers[]. See docs/mcp-native-architecture.md.
Same-project parallel @instance employees — concurrent batch execution unblocks this; remaining work is making the intake picker stage-aware so it routes the two ready stages to two different instances of the same role. Cross-process parallelism (two shells × two instances) works today.
Prompt-level mutation behind AW_AUTO_MUTATE_PROMPT=1 — A/B prompt variants, endorsement counter, auto-retire after re-validation failure. Slices 60-62 already shipped the data-driven materializer + question generators + drive evaluator that this composes against.
Cross-project memory + plugin system — designed; deferred until usage signal demands them.

2026-05-28 · Architecture designed; Phase 1 closed (DEC-025)

2026-05-29 · Full chain shipped: Intake → Receiver → PM → dev end-to-end

2026-05-30 · SQLite-canonical store; roster.json ripped; aw --watch TUI ships (slices 9–11)

2026-05-30 · 3-week autonomous roadmap closed (slices 16–26 + week-2/3 follow-ups): rotation salvage, brief validator, workflow templates, cancellation modes, eval gate, agency $ cap, import-jsonl, TUI nav

2026-05-30 · Pre-API-key polish: TUI v2 search+drill, MCP design doc, Receiver workflow consultation, --vacuum, monitoring trio (H1/H2/H3), native tool_use foundation (s20c-a/b)

2026-05-31 · Week-4: SH-1..SH-8 security audit + multi-backend mask layer (L1..L4 + M1 runtime registry)

2026-06-01 · Week-5: M2 HTTP backend specs; ollama / claude-cli / gemini-cli / codex-cli all dogfooded into .aw/backends/; multi-instance via provider@instance:model-id; s20c-c/d/e ship native tool_use end-to-end across Anthropic + OpenAI + Gemini (worker gated by AW_USE_NATIVE_TOOLS=1)

2026-06-01 · Session-43 multi-agent rounds 1-3: TUI v2 audit + first docs refresh + api-service + mobile-app + cli-tool + webhook workflow templates (6 bundled); multi-turn LlmMessage[] types + adapter tests; worker.rs migrates iter 2+ to LlmMessage[]; --list-backends inspection flag

2026-06-01 · Session-43 multi-agent rounds 4-5: DAG walker MVP (Stage.depends_on + Kahn topo-sort + parallel_batch_identified events); --list-workflows / --list-instances / --list-roles + --doctor complete the inspection family; data-pipeline becomes the 7th bundled template; docs/native-tool-use.md + docs/dag-walker.md deep-dives ship

2026-06-01 · Session-43 rounds 6-7: --doctor grows to 11 rows (db-size + stale-locks); honest --list-workflows vocab (seeded vs override vs user-only); global --json modifier across all 5 inspection flags; docs/inspection-family.md contributor doc (3rd deep-dive)

2026-06-01 · Session-43 round-9 + multi-process worker phases 1-4: concurrent batch execution via std::thread::scope behind AW_PARALLEL_BATCH=1; pid/heartbeat + cold-pickup reaper + doctor row #10; AW_WORKER_PROCESS=1 self-exec spawn with parent-trace re-entry; POSIX kill_worker + sibling-cancel on first failure; SIGTERM/SIGUSR1 handlers + KillMode::Rotate for clean handoff-on-rotation. 411 unit tests.

2026-06-02 · Session-44 + 45: --proposals / --experiments web routes + POST /apply-proposal + POST /promote-ab; Layer-5 read-only foundation (slices 47a → 47o): self-introspection of own source modules + churn + recent git log; /inventory + /live + /code HTML pages; intake auto-expand pool + slider salvage

2026-06-03 · Slices 47s-47v: Layer-5 WRITE side closes. SelfSandbox RAII (git-worktree clone), run_check_in_sandbox regression gate, apply_patch_to_sandbox + commit_and_tag_sandbox primitives. aw --self-apply-patch + aw --self-apply-auto CLI surfaces. North-star locked: "tự chủ [autonomy] is the end goal of aw".

2026-06-04 · Slices 49-55: continuous-life daemon (aw --daemon) + curiosity primitive (sense → question → rank → answer) + sensor layer + meta-fitness for the drive pool. Architectural debt audit at slice-55 close → slices 56-62 ship SandboxStrategy / CheckStrategy / TagStrategy / EligibilityStrategy pools + data-driven materializer + question generators + drive evaluator.

2026-06-05 · Slices 63-66: data-driven TickStrategy pool from .aw/strategies/*.json; rebuild_self_release + reexec_into primitives (cargo build → execvp(2) hot-swap); intake conversation memory becomes pool; slider timeline + daemon-owned refresh.

2026-06-07 · Slices 67-72: RebuildTickStrategy (baked SHA vs HEAD) + tiered autonomous self-apply gate (cargo-check → cargo-test) + HttpSmokeTickStrategy + config-file flag fallback + FlagStatusTickStrategy + self-identity primitive (aw --whoami + AW BINARY IDENTITY injection) + conversation-continuity discipline. Autonomy loop closes end-to-end at the OS layer. 727 → 801 unit tests.

Try the binary

One command builds the binary. One command runs it. The agency lives in a folder you can read.

# clone and build
git clone https://github.com/dipgle/agent-workforce
cd agent-workforce && cargo build --release
ln -s "$(pwd)/target/release/aw" /usr/local/bin/aw

# point it at a model (Anthropic / OpenAI / Gemini — all wired)
export ANTHROPIC_API_KEY=...

# run anything
mkdir myproject && cd myproject
aw write a hello.md greeting

# start the autonomous loops in the background
aw --daemon

# ask the binary who it is
aw --whoami

# browser UI on :8765
aw --web

Repo: dipgle/agent-workforce · Built in public · Daily commits.

Frequently asked

Why "Agent Workforce" and not just "agent"?

"Agent" is one assistant. A workforce is a structure: triage, scoping, planning, execution, review, hire/fire. The shape of an agency — not the smartness of a single model — is what ships real work.

Why Rust?

The shell, supervisor, rotation engine, file locks — mechanism — should be small, fast, statically linked, runnable on a $5 VPS. The judgement layer (roles) is prompts, not code; that's where iteration lives.

Why no subcommands?

Every hardcoded verb is a bet that we know what should happen. We don't. aw fire designer-002 is a sentence a human would say to a real PM — let the role read it and decide. If we can't enforce that at the smallest scale, we won't at the larger ones.

What about cost?

Per-employee model assignment + per-project budget caps + escalation thresholds (warn at 50%, block at 90%). Cheap models for cheap roles. The supervisor enforces; the PM negotiates with you when the cap looms.
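The escalation thresholds can be expressed as a small pure function. This is an illustrative sketch only: the names are hypothetical, and treating the 50% and 90% boundaries as inclusive, and a zero cap as blocking, are assumptions.

```rust
/// Hypothetical budget states matching the warn / block escalation.
#[derive(Debug, PartialEq, Eq)]
enum BudgetState {
    Within,
    Warn,
    Block,
}

/// Classify spend against a cap: warn from 50% used, block from 90% used.
fn budget_state(spent_usd: f64, cap_usd: f64) -> BudgetState {
    if cap_usd <= 0.0 {
        // Assumption: a missing or zero cap blocks rather than allowing unlimited spend.
        return BudgetState::Block;
    }
    let used = spent_usd / cap_usd;
    if used >= 0.9 {
        BudgetState::Block
    } else if used >= 0.5 {
        BudgetState::Warn
    } else {
        BudgetState::Within
    }
}

fn main() {
    for spent in [0.0245, 0.60, 0.95] {
        println!("${spent:.4} of $1.00 -> {:?}", budget_state(spent, 1.0));
    }
}
```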

What changed in May 2026?

The project pivoted away from "Phase 1" — a URL-scanner that emitted bug tickets. The wedge was wrong. Architecture for Agent Workforce was designed in the same session; we kept the architecture, threw out Phase 1, started clean.

When can I actually use it?

Today. Clone, build, point it at a model, run aw <intent>. The full agency chain produces a real deliverable. Start the autonomous loops with aw --daemon; the binary then observes, decides, and acts on its own clocks beneath your future invocations. aw --whoami shows runtime truth at any moment. Build in public — daily commits.

Will the prompts be open?

Yes. System role prompts ship with the binary as plain markdown. Your first call writes them into your workspace. Edit them in place to override — the binary reads the on-disk copy if present.

The binary is alive.

End-to-end autonomy loop runs: source mutation passes the tiered cargo check + cargo test gate, lands as a commit, the daemon notices the new HEAD, rebuilds the binary, and (when reexec=true) execvp(2)-swaps the running process. aw --whoami shows runtime truth at any point — version, baked SHA, source tree, strategy pool, verified providers, flags, daemon state. Daily commits — the binary moves before the announcement.

GitHub info@dipgle.com

No newsletter. The repo is the changelog.

