More notes on what’s been working for me in my Claude Code setup.
1. /counselors for multi-model code review
When I asked Claude to review code that Claude wrote, I kept finding things it missed. I was running multiple review passes to feel confident, which defeated the point.
Aaron Francis built counselors, a CLI that dispatches a review prompt to multiple AI agents in parallel. Different models, different providers, different blind spots. I point it at a spec or diff and get back independent reviews.
I have 10 agents configured across three providers:
| Provider | Agents |
|---|---|
| Anthropic | Claude Opus, Sonnet, Haiku |
| OpenAI | GPT-5.3 Codex (medium, high, xhigh reasoning) |
| Google | Gemini 3 Pro Preview, Gemini 2.5 Pro, Gemini 3 Flash, Gemini 2.5 Flash |
The agents all run read-only — they can explore the codebase and run git commands, but never modify files. There’s a loop mode for deeper analysis where each round’s findings feed into the next, and built-in presets for common patterns: bughunt, security, contracts, regression, invariants, hotspots.
```shell
counselors run -f review-prompt.md --group best --json
```
The moment that sold me on this: On a Snowflake integration spec, Claude and Codex both approved the schema but Gemini flagged that my date filtering would silently return empty results for future-dated records. That one disagreement between models would have taken a week to surface in production.
The pattern I keep seeing: when two models agree but the third doesn’t, that disagreement is almost always worth investigating. I run the “best” group (Opus + GPT-5.3 Codex + Gemini 3 Pro) before merging. All agents work in parallel with a max concurrency of 4, and a full review takes about 20 minutes of wall time.
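That disagreement signal is easy to automate. A minimal sketch of the idea — the per-agent summary format here is invented for illustration, not counselors’ actual `--json` schema:

```shell
#!/bin/sh
# Hypothetical per-agent verdict summary; the real counselors --json
# output will look different, so adapt the grep pattern accordingly.
cat > reviews.json <<'EOF'
{"agent": "opus",   "verdict": "approve"}
{"agent": "codex",  "verdict": "approve"}
{"agent": "gemini", "verdict": "flag"}
EOF

# If the verdicts are not unanimous, flag the run for human review.
distinct=$(grep -o '"verdict": "[a-z]*"' reviews.json | sort -u | wc -l)
if [ "$distinct" -gt 1 ]; then
  echo "models disagree -- investigate before merging"
fi
```

The point is the shape of the check, not the parsing: unanimity passes silently, any split gets escalated to a human.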
2. /cleanup-docs for organizing scattered markdown files
Working with Claude Code, every repo accumulates loose markdown: specs, research notes, completion reports. After a few sessions you have 40+ files scattered across the directory and no idea what’s current.
/cleanup-docs is a skill that scans the repo, classifies each file by content — not filename — and proposes an organized structure, explaining why each file goes where. It runs in five phases: discovery, strategic planning, self-critique, presenting the plan for approval, and execution. Nothing moves without explicit approval.
The classification is content-driven. A file titled notes.md that contains “next steps” and “action items” goes to .project-status/. A file titled research.md that contains past-tense completion language goes to .project-status/completed-tasks/. It reads the content, not the name.
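A toy version of that content-driven heuristic, assuming just two grep-able signals (the skill’s real rules are much richer, and the destination paths mirror the ones it proposes):

```shell
#!/bin/sh
# Classify a markdown file by what it says, not what it's named.
classify() {
  if grep -qiE 'next steps|action items' "$1"; then
    echo ".project-status/"
  elif grep -qiE 'completed|shipped|done' "$1"; then
    echo ".project-status/completed-tasks/"
  else
    echo "docs/"
  fi
}

printf 'Notes\n\n## Next steps\n- ship it\n' > notes.md
classify notes.md   # -> .project-status/
```

Filename never enters the decision, which is why a misleadingly named notes.md still lands in the right place.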
After approval, it uses git mv for tracked files to preserve history, stages everything, and generates a completion report.
Concrete result: Ran it on a repo with 47 docs and had them organized in about 3 minutes. It grouped them into docs/specs/, docs/research/, docs/implementation/, .project-status/completed-tasks/, and _archived/ — with per-file rationale for each decision. The self-critique phase caught two misclassifications before I even reviewed the plan. I used to spend 30+ minutes doing this manually, and I’d still miss files.
3. Anthropic’s /skill-creator for packaging reusable skills
Whenever I create or update a skill using Claude Code, I run Anthropic’s official /skill-creator skill to make sure Claude Code follows best practices.
The design principle that changed how I think about skills: the context window is a public good. Every skill competes for tokens with your conversation, your system prompt, and the actual work. So skill-creator enforces progressive disclosure:
- Metadata (name + description) — always loaded, ~100 tokens. This is what triggers the skill.
- SKILL.md body — loaded only when the skill triggers, under 500 lines.
- Bundled resources (scripts/, references/, assets/) — loaded only when Claude decides it needs them.
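The layout those three tiers imply, sketched by hand — scripts/init_skill.py scaffolds this for you, and everything past the name and description frontmatter keys is my own filler text:

```shell
#!/bin/sh
# Hand-rolled equivalent of the skeleton init_skill.py generates.
mkdir -p my-skill/scripts my-skill/references my-skill/assets

cat > my-skill/SKILL.md <<'EOF'
---
name: my-skill
description: One-line trigger description -- the only part always loaded.
---

Core instructions go here. Keep the body short; push detailed
diagnostic steps into references/ so they load only on demand.
EOF
```

The frontmatter is the ~100-token always-on cost; everything below it waits for the skill to actually trigger.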
Before this, my skills were 400-line monoliths burning context every conversation. I had a debugging skill that loaded into every single session, whether I was debugging or not. After restructuring it with skill-creator, the core instructions are 80 lines and the detailed diagnostic steps live in references/ — only loaded when Claude is actually debugging. Idle cost went from 400 lines to ~100 tokens.
The init script (scripts/init_skill.py) generates the directory structure. The packaging script (scripts/package_skill.py) validates everything and creates a .skill file for distribution. I’ve packaged 35+ skills this way — from Snowflake query patterns to Docker deployment workflows to OAuth setup guides.
This gives me confidence that all my skills are lean and follow Anthropic’s best practices.
Side note: Recommend reading Thariq Shihipar’s post from this week on how the Claude Code team uses skills internally. Good context on why the progressive disclosure pattern matters so much.
Link: Ships with Claude Code. Run /skill-creator to use it, or install from Anthropic’s skill marketplace.
4. Portless for multi-app local dev
I’m sometimes testing 9 apps locally (love me some localhosts…). I was juggling port numbers and restarting apps because OAuth callbacks were hardcoded to one port.
Chris Tate’s Portless gives each project a stable .dev subdomain with HTTPS. Now I start one app at am-dashboard.dev, another at sales-engage.dev, and work across all of them.
```shell
portless am-dashboard npm run dev   # → https://am-dashboard.dev
portless sales-engage npm run dev   # → https://sales-engage.dev
portless voice-dash npm run dev     # → https://voice-dash.dev
```
A dedicated auth relay handles Google OAuth so no app doubles as a proxy. The relay decodes which .dev subdomain initiated the login and redirects back. I don’t need to register new callback URLs in Google Console when I add a project.
The result: One command per app, zero port conflicts. I used to burn 10-15 minutes every morning figuring out which ports were free and restarting things in the right order. Now I never think about ports.
The setup was not trivial — I’ve documented 17 distinct issues across 300+ sessions, including a subtle one where cut -d= -f2 silently stripped base64 padding from shared secrets, causing OAuth to “succeed” but never redirect back. But once it’s working, it just works.
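That cut bug is easy to reproduce: base64 uses = for padding, and -f2 keeps only the second =-delimited field, so trailing padding silently vanishes. -f2- (fields two onward) keeps it, and shell parameter expansion avoids the problem entirely (the variable names here are just for the demo):

```shell
#!/bin/sh
line='SHARED_SECRET=dmFsdWU='   # base64 value ending in = padding

printf '%s\n' "$line" | cut -d= -f2    # -> dmFsdWU   (padding stripped!)
printf '%s\n' "$line" | cut -d= -f2-   # -> dmFsdWU=  (padding intact)

# Safest: strip the key with parameter expansion, never touch the value.
secret=${line#*=}                      # -> dmFsdWU=
```

The broken variant still yields a plausible-looking secret, which is exactly why the failure mode was “succeed but never redirect back.”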
Link: Portless on GitHub
5. cmux terminal for multi-agent sessions
I’m usually running 7+ Claude Code sessions simultaneously. The problem was knowing when an agent needed attention without clicking through every tab.
cmux is a native macOS terminal built on libghostty (Ghostty’s rendering engine — not a fork, it uses libghostty the same way apps use WebKit). The feature that matters: notification rings. When a Claude Code session finishes a task or needs input, the pane lights up with a blue ring, the tab gets an unread badge, and macOS fires a desktop notification.
My workflow: start three agents on different tasks, keep working, switch to whichever pane rings.
My current workspace setup: 7 workspaces across 4 repos, up to 4 split panes per workspace with parallel Claude Code sessions inside git worktrees. I color-code workspaces by priority — red for active feature work, magenta for config, purple for personal projects. The tab sidebar shows git branch, listening ports, and the latest notification text, so I can decide whether to switch or keep going without opening the pane.
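The worktree half of that setup is plain git: one checkout per agent, so parallel sessions never fight over the same index or branch. A minimal sketch (repo and branch names made up):

```shell
#!/bin/sh
# One worktree per parallel agent, each on its own branch.
git init -q demo && cd demo
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m "init"

git worktree add ../demo-feature -b feature-x
git worktree add ../demo-bugfix  -b bugfix-y

git worktree list   # each path is a fully independent checkout
```

Each Claude Code session gets pointed at its own worktree directory; merges happen back in the main checkout when an agent finishes.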
What it fixed: Before cmux, I’d lose 5-10 minutes per session just clicking through tabs to check on agents. With 7 parallel sessions, that adds up fast. The notification rings cut polling to zero — I never check a tab that doesn’t need attention.
Link: cmux on GitHub (3,500+ stars)
If any of this is useful or you’ve built similar workflows, I’d like to hear about it.