Build a personal AI operating system with Claude Code

The phrase "personal AI operating system" is doing a lot of work, so let's strip it down. I don't mean an actual operating system. I don't mean a SaaS product. I don't mean a chatbot wrapper. I mean: the layer of tooling, conventions, and persistent state that turns Claude Code from a clever CLI into the daily substrate you build everything on top of.

Most people use Claude Code one project at a time. The shift happens when you stop treating each session as an isolated tool use and start treating the broader setup as infrastructure. Conventions accumulate. Knowledge persists. Dispatch becomes routine. The same setup that ran a one-off refactor last week is now running an agent fleet across six projects with no friction.

This post is about the layers that make that work, and the order to build them in.

What "personal AI operating system" actually means

There's a useful frame that separates tools from systems. A tool does one thing well. A system is what emerges when several tools share conventions, share state, and compose. A hammer is a tool. A workshop is a system.

Claude Code as a CLI is a tool. Claude Code combined with a tmux convention, a context vault, a manifest of your projects, and a dispatch grammar is a system. The cost of getting from the first to the second isn't infrastructure spend; it's discipline. You have to write the conventions down and follow them.

The payoff: tasks that took thirty minutes of setup take thirty seconds. The cold-start tax disappears. Each new project boots into a workflow that already understands itself.

The four layers

The setup that has worked for me has four layers, each load-bearing for the layers above it:

Layer 1: The command surface. A CLI binary that exposes the high-level operations of the system: create a workspace, dispatch a worker, list running agents, check resource pressure. The CLI is what your fingers learn. It's the closest thing the system has to a user interface, and the more it can do in one verb, the more your daily workflow accelerates. In my setup this is omni. The specifics matter less than the principle: one CLI, scoped to your operations, that you reach for instinctively.

Layer 2: Persistent state. A context vault for atemporal knowledge (insights, patterns, decisions). A workspace manifest for project metadata (paths, deploy commands, validation selectors, required tokens). A task store for in-flight work. These three are the system's memory. They survive every restart, every fresh session, every reorganization. The CLI in layer 1 is the surface; the state in layer 2 is the substrate.

Layer 3: Convention. The rules, written down, in a known location, loaded into every session. How to escalate. How to dispatch. When to restart. What goes where. What gets committed. These rules are what make the system yours: a stranger reading the rules can predict how the system will behave, and so can a fresh agent reading the same rules in a fresh session. Without conventions, the system is a pile of tools. With them, the system has personality and predictability.

Layer 4: Tier architecture. Once you have convention, you can have tiers. A long-running coordinator at the top. Workers spawned for specific tasks below. Clear escalation paths between them. The tier model is what lets the system handle work bigger than any single session, work that runs across days, across projects, across multiple parallel agents, without the operator becoming the bottleneck.

The order matters. You build the CLI first because everything else hooks off it. Persistent state next because convention without state is theater. Convention next because tiers without convention collapse into chaos. Tiers last, once the substrate underneath them is stable enough to support them.
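To make layer 2 concrete, here is a minimal sketch of a workspace manifest and a lookup by project name. Everything in it is an assumption for illustration: the JSON layout, the field names, the `load_manifest` and `resolve` helpers, and the example path and deploy command are hypothetical, not the format my own setup uses.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Workspace:
    """One manifest entry. Field names are illustrative, not a fixed schema."""
    name: str
    path: Path
    deploy_command: str = ""
    required_tokens: tuple = ()


def load_manifest(text: str) -> dict:
    """Parse manifest JSON shaped like {"workspaces": [{...}, ...]}."""
    raw = json.loads(text)
    workspaces = {}
    for entry in raw["workspaces"]:
        ws = Workspace(
            name=entry["name"],
            path=Path(entry["path"]).expanduser(),
            deploy_command=entry.get("deploy_command", ""),
            required_tokens=tuple(entry.get("required_tokens", [])),
        )
        workspaces[ws.name] = ws
    return workspaces


def resolve(manifest: dict, name: str) -> Workspace:
    """Look a project up by identity instead of by hardcoded path."""
    if name not in manifest:
        known = ", ".join(sorted(manifest)) or "(none)"
        raise KeyError(f"unknown workspace {name!r}; known: {known}")
    return manifest[name]


# Hypothetical manifest content; in practice this would live in a file.
EXAMPLE = """
{"workspaces": [
  {"name": "kaizen", "path": "/srv/projects/kaizen",
   "deploy_command": "make deploy", "required_tokens": ["DEPLOY_TOKEN"]}
]}
"""

if __name__ == "__main__":
    ws = resolve(load_manifest(EXAMPLE), "kaizen")
    print(ws.path, ws.deploy_command)
```

The design choice worth noticing is that callers only ever pass a name. If the project moves, one manifest entry changes and every script that resolves by name keeps working.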

Why a CLI is the right surface

The instinct is to reach for a web dashboard, a desktop app, a Notion database, or a custom IDE plugin. All of these are wrong, and the wrongness compounds.

A CLI wins for three reasons:

It's scriptable. Every command becomes a primitive other commands can compose. A web app pushes you toward GUI workflows that don't compose with anything. A CLI pushes you toward pipes and chains that build on themselves over years.

It's the surface agents already speak. Claude Code is a CLI. Every shell command is something an agent can invoke. The moment you build a web UI, you've put a wall between the agents and your tooling that the agents have to climb over to do their jobs.

It's diff-able. A CLI's behavior is captured in code and shell history. A web app's behavior is captured in a database you don't fully understand. When something breaks, the CLI is debuggable; the web app is opaque.

This isn't a religious anti-GUI stance. It's an empirical observation: the systems I've watched compound year-over-year are the ones where the operating surface is text. The systems that decay are the ones with shiny UIs that never keep up with the underlying changes.
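The "scriptable" point is easier to see in code. Below is a rough sketch of a command surface built with Python's standard `argparse`, where every subcommand returns plain data and prints it as JSON so that other scripts, or an agent, can consume the output. The program name `mycli`, the two subcommands, and their stubbed behavior are all hypothetical; this is not how `omni` is implemented.

```python
import argparse
import json
import sys
from typing import Optional, Sequence


def cmd_list(args: argparse.Namespace) -> dict:
    # Stub: a real version would read live agent state from the task store.
    return {"agents": [{"name": "coordinator", "status": "running"}]}


def cmd_dispatch(args: argparse.Namespace) -> dict:
    # Stub: a real version would start the worker; this only echoes the request.
    return {"dispatched": args.worker, "workspace": args.workspace}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mycli")
    sub = parser.add_subparsers(dest="command", required=True)

    p_list = sub.add_parser("list", help="list running agents")
    p_list.set_defaults(func=cmd_list)

    p_dispatch = sub.add_parser("dispatch", help="dispatch a worker")
    p_dispatch.add_argument("worker")
    p_dispatch.add_argument("--workspace", required=True)
    p_dispatch.set_defaults(func=cmd_dispatch)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse argv, run the chosen subcommand, and print its result as one JSON line."""
    args = build_parser().parse_args(argv)
    json.dump(args.func(args), sys.stdout)
    sys.stdout.write("\n")
    return 0
```

Calling `main(["dispatch", "refactor", "--workspace", "kaizen"])` prints a single JSON object. Because each handler returns data rather than printing ad hoc text, the same functions can be called from other Python code or piped into a JSON-aware tool without a separate API.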

The compounding decision

The most important decision in building a personal operating system isn't a technology choice. It's the decision to write conventions down and follow them.

Every agent run is an opportunity. The first time you encounter a pattern, you might solve it ad-hoc. The second time, the temptation is to solve it ad-hoc again. The third time, you have a choice: turn it into a rule, or pay the ad-hoc tax forever.

Rules are how the system gets smarter without you having to remember to be smart. A new session reading the rules behaves like a session that has seen the patterns before, even though it hasn't. That's the compounding. Without rules, you're the bottleneck on the system's intelligence; with them, the system can outgrow you.

The discipline isn't large. It's roughly: any time you correct an agent or work around a tool limitation, ask whether it generalizes. If yes, write it down where the next session will read it. That single habit, applied consistently, is most of what separates systems that compound from systems that don't.
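The habit is cheap enough to script. As a sketch only, here is a helper that records one rule as a markdown file in a rules directory. The one-file-per-rule layout, the file naming, and the helper names are my assumptions for illustration; the only requirement from the text is that the rule ends up somewhere the next session reads.

```python
import re
from datetime import date
from pathlib import Path


def slugify(title: str) -> str:
    """Turn a rule title into a filesystem-friendly file stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "rule"


def render_rule(title: str, body: str, added: date) -> str:
    """Render a rule as a small markdown document."""
    return f"# {title}\n\nAdded: {added.isoformat()}\n\n{body.strip()}\n"


def record_rule(rules_dir: Path, title: str, body: str) -> Path:
    """Write one rule per file; refuse to silently overwrite an existing rule."""
    rules_dir.mkdir(parents=True, exist_ok=True)
    target = rules_dir / f"{slugify(title)}.md"
    if target.exists():
        raise FileExistsError(f"rule already exists: {target}")
    target.write_text(render_rule(title, body, date.today()), encoding="utf-8")
    return target
```

Refusing to overwrite is deliberate: if a rule with the same title already exists, the better move is usually to read it and refine it rather than to stack a second, conflicting version on top.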

What to build first

If I were starting from zero, in order:

A scratch CLI, even just a shell function or two, that wraps the operations you do most. Add commands as you find yourself doing the same shell incantation twice.

A vault of insights in whatever shape lets you write and search fast. A directory of markdown files works. The fancy MCP layer can come later.

A rules folder loaded into every Claude Code session. Start with five rules. Add one per friction event.

A workspace manifest that tells the CLI where your projects are. This enables identity-based references; the moment your scripts say "deploy the kaizen project" instead of hardcoding paths, moves and reorganizations stop breaking everything.

A tmux dispatch convention. One named session per long-running agent. Workers spawned by name. tmux is the supervisor, and it's already on your machine; use it.
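The tmux convention in the last item can be sketched in a few lines. This is a rough illustration, not my actual dispatch code: the helper names, the session-name rule, and the choice to start the agent by running `claude` in the project directory are assumptions. The tmux pieces it relies on are standard: `new-session -d -s NAME -c DIR` starts a detached named session in a directory, and `has-session -t =NAME` checks for an exact name match.

```python
import re
import shlex
import subprocess

# tmux treats "." and ":" specially in targets, so keep names conservative.
VALID_NAME = re.compile(r"^[A-Za-z0-9_-]+$")


def session_exists(name: str) -> bool:
    """True if a tmux session with exactly this name is running."""
    result = subprocess.run(
        ["tmux", "has-session", "-t", f"={name}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def new_session_command(name: str, command: list, cwd: str) -> list:
    """Build the argv for a detached, named tmux session running `command`."""
    if not VALID_NAME.match(name):
        raise ValueError(f"invalid session name: {name!r}")
    return ["tmux", "new-session", "-d", "-s", name, "-c", cwd, shlex.join(command)]


def dispatch(name: str, command: list, cwd: str, dry_run: bool = False) -> list:
    """Start one named session per agent; refuse to start a duplicate."""
    argv = new_session_command(name, command, cwd)
    if dry_run:
        return argv  # Let callers inspect the command without touching tmux.
    if session_exists(name):
        raise RuntimeError(f"session {name!r} is already running")
    subprocess.run(argv, check=True)
    return argv
```

For example, `dispatch("kaizen-worker", ["claude"], "/srv/projects/kaizen", dry_run=True)` returns the tmux command without running it, which is handy while the naming convention is still settling. The path here is a placeholder; in a fuller setup it would come from the workspace manifest.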

What to never build

A few things I've watched people start and not finish, all of which are red flags:

A web UI for any of the layers. See "Why a CLI is the right surface" above.

A custom LLM wrapper around Claude Code. The wrapper goes stale faster than the underlying tool moves. Use the tool directly.

A monolithic configuration system. Per-workspace config beats global config; convention beats config; rules beat both.

A new file format. Your knowledge wants to be markdown. Don't fight it.

The summary, compressed: build the CLI, store the state, write the rules, run the tiers. Anything else is decoration. The actual system is the discipline of treating your own setup as infrastructure rather than as scattered tools, and the compounding of that decision, year over year, is what an "operating system" really means in this context.