Changing it up a bit this week.
For the last few weeks I’ve been hitting usage limits faster than I want to admit. Part of that is the 1M-context / token-burn issue I wrote about last week. But part of it is just me pushing the workflow harder: multiple coding projects, multiple background agents, GBrain maintenance, Slack/email/calendar ingestion, transcript processing, finance summaries, and random one-off automations all competing for the same Claude Code lane.
That forced a useful question:
What if Claude Code should not be the place every agent job runs?
This week I finally started treating the setup more like an operating system:
- Claude Code for deep project work where the Claude-specific skills/plugins matter most.
- Hermes for orchestration and scheduled agent jobs.
- GBrain as the memory layer across my vault, transcripts, Slack, email, calendar, people, and decisions.
- Codex as another serious coding lane instead of a fallback.
- Local/open models for bounded summarization and maintenance jobs.
- Deterministic scripts for the boring parts that should not be left to a model.
Still early, but the shape feels right.
The lineage here matters too. Andrej Karpathy’s LLM Wiki framing is what made me want durable, compiled knowledge instead of making the model rediscover context every session. Garry Tan’s open source GBrain is the version I’ve been able to actually run with: markdown as source of truth, plus retrieval that makes the brain usable by agents at scale.
1. Hermes as the router, not just another chatbot
The problem I kept running into was not “I need one more model.” It was “I need somewhere for background work to run that doesn’t consume my main Claude Code workflow.”
My old habit was to put almost everything into Claude Code (CC): explore this repo, update this doc, run this collector, summarize this output, check this schedule, debug this cron, search the brain, create the follow-up task. That works fine when you’re running one or two jobs. It gets ugly when you have five long-running things happening in parallel and every one of them wants context, tool calls, and quota.
Hermes is becoming the router for that.
Right now I have 19 scheduled Hermes jobs running against the setup. A few examples:
- email-to-brain at 6:35am
- calendar-to-brain at 6:00am
- morning GBrain health report at 6:25am
- Slack collector/import at 2:30pm
- GBrain graph refresh at 7:25pm
- nightly dream / maintenance jobs
- weekly transcript enrichment and reconciliation
- model refresh checks
- vault document optimization
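For a sense of the shape, a schedule like this can be sketched as plain data that a runner loops over. The job names and the `due_jobs` helper below are illustrative only, not Hermes’s actual configuration format:

```python
from datetime import time

# Hypothetical schedule table mirroring a few of the jobs above.
# Real Hermes jobs carry more than a time: a command, a lane, retries.
SCHEDULE = {
    "calendar-to-brain":    time(6, 0),
    "gbrain-health-report": time(6, 25),
    "email-to-brain":       time(6, 35),
    "slack-collector":      time(14, 30),
    "gbrain-graph-refresh": time(19, 25),
}

def due_jobs(now: time) -> list[str]:
    """Return jobs whose scheduled time has passed today.
    A real runner would also track which jobs already ran."""
    return [name for name, t in SCHEDULE.items() if t <= now]
```

The point is that the schedule is boring data, not model territory.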
The pattern I like most is simple:
- Deterministic script collects or transforms the data.
- The model reads structured output.
- The model summarizes, flags judgment calls, or updates the brain.
That split matters. I do not want an LLM deciding how to paginate Slack, manage OAuth state, detect duplicate thread files, or parse calendar events. Code should do that. The model should answer questions like: What matters here? Is this person-page-worthy? What should Aaron see? What can stay silent?
That is a much better use of tokens.
The bigger workflow change: a lot of useful work can now happen when I’m not staring at the terminal. Hermes can run the boring jobs, send me the short version, and leave Claude Code quota for the work where I actually need Claude Code.
2. GBrain as the memory layer
The thing I’ve wanted for a while is an agent that remembers across tools without me re-explaining my life every session.
Not “memory” as in a few preference bullets.
Real memory:
- What projects am I running?
- Who did I meet with?
- What did I promise?
- What decisions did I make?
- What do I keep praying about?
- What Slack threads mattered?
- What did last week’s agent already discover?
- What setup detail should never have to be rediscovered again?
That’s what GBrain is starting to become.
My local GBrain is currently on version 0.22.4, with 8,509 pages indexed and 100% embedding coverage. The content is not just one folder of notes. It’s the broader operating context: Obsidian vault, meeting transcripts, Slack imports, email summaries, calendar events, people pages, decisions, daily planning, finance notes, Hermes change logs, and Claude session history.
That changes the prompt pattern.
Old version:
“Claude, here’s 12 paragraphs of context about what happened, who this person is, what I tried before, where the files live, and what I care about.”
New version:
“Check the brain first, then answer.”
I open a lot of prompts now by typing some version of: “check my brain first, then ____.” It’s a small habit with a big workflow change behind it, and one of the clearest signs that the system is starting to work.
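The habit can be captured as a tiny prompt wrapper. `search_brain` here is a stand-in for GBrain retrieval, not a real API:

```python
def brain_first(question: str, search_brain) -> str:
    """Prepend retrieved brain context so the agent grounds itself
    before answering, instead of me re-explaining everything."""
    hits = search_brain(question, top_k=5)
    context = "\n".join(f"- {h}" for h in hits)
    return (
        "Check the brain first, then answer.\n\n"
        f"Brain context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The interesting part is not the wrapper itself but that retrieval happens before the model is asked to reason at all.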
It’s not perfect. The brain score still has room to improve, especially around graph and timeline coverage. Some pages need cleanup. Some ingestion paths are too noisy. Some entity resolution needs human judgment. But the direction is the part I care about: the agent stops being a blank chat window and starts becoming an operating layer over accumulated context.
The surprising part is how much this changes my trust level. If an agent can search prior sessions, read the current setup docs, inspect the changelog, and see today’s calendar context before answering, I spend less time re-grounding it and more time asking it to reason.
3. Codex as a second serious coding lane
Usage limits forced this one. I still use Claude Code heavily, and the ecosystem around it is excellent: skills, MCPs, plugins, browser tooling, session management, slash-command workflows. But running every project through one quota bucket does not scale once agents become part of daily operations.
A lot of people are trying to recreate Claude Code-style orchestration around Codex. I’ve played with that setup too. But my current answer is simpler: use Codex as a coding model inside my Hermes/GBrain harness, and let the harness handle routing + memory.
Hermes is now configured with OpenAI Codex / gpt-5.5 as the default provider, with a 400k context length. It lets me split jobs by shape:
- Claude Code: deep project work where the CC skills/plugins matter most.
- Codex: sessions that do not need the full CC setup.
- Hermes/GBrain: routing, scheduled jobs, and memory.
- Local models: cheaper summaries and recurring audits.
- Scripts: the boring reliable glue.
The question becomes less “which model is best?” and more “which lane should this job run in?”
That feels like the right abstraction.
A lot of AI tooling discourse still treats model choice as the center of the workflow. I increasingly think routing is the center. A good setup should know when to use the expensive long-horizon agent, when to use a cheaper summarizer, when to use a local model, and when no model should be involved at all.
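A sketch of what routing-as-the-center might look like. The lane names and job attributes below are made up, standing in for whatever signals a real router would read off a job:

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    needs_cc_plugins: bool = False  # deep work needing CC skills/MCPs
    is_coding: bool = False
    is_deterministic: bool = False  # no model judgment required
    bounded_summary: bool = False   # cheap, repetitive summarization

def route(job: Job) -> str:
    """Pick a lane by job shape, not by 'which model is best'."""
    if job.is_deterministic:
        return "script"        # no model should be involved at all
    if job.needs_cc_plugins:
        return "claude-code"   # the expensive long-horizon lane
    if job.is_coding:
        return "codex"         # second serious coding lane
    if job.bounded_summary:
        return "local-model"   # cheaper summarizer
    return "local-model"       # default to the cheap lane
```

Even a dumb rule table like this beats the implicit default of "everything goes to the flagship agent."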
4. The lesson: parallel agents need operations
The deeper lesson this week is that once you start running agents in parallel, the bottleneck stops being prompt quality and starts being operations.
The questions become more boring and more important:
- What runs where?
- What gets remembered?
- What gets scheduled?
- What wakes me up?
- What stays silent?
- What is deterministic code vs model judgment?
- What gets written back to the brain?
- What should be retrievable six months from now?
- What should never enter the context window in the first place?
That is the layer I’m building now.
Claude Code is still the best deep-work coding interface for a lot of my day-to-day work. But the rest of the system needs to exist around it: Hermes for orchestration, GBrain for memory, Codex/open models as extra lanes, and scripts as the boring reliable glue.
The end state I’m aiming for is simple: less re-explaining, less quota burn, more background progress, and an agent that gets smarter because the system remembers what actually happened.
Curious how others are splitting this up. Are you still mostly in one AI coding tool, or are you starting to build a router around Claude Code, Codex, local models, and open-source agent frameworks?