Two items this week. Both are downstream of the same shift: Claude Code now does much longer-horizon work than it used to, and the workflow has to catch up.
Opus 4.7 shipped yesterday, April 16. Opus 4.6 got a 1M token context window in March. Both push in the same direction — one agent session can now run tasks that used to take three. Great, except my old habits stopped scaling. I was coming back to a 40-minute background session with no idea whether the code actually worked. And my weekly usage limits were evaporating roughly twice as fast as they used to.
This log is two fixes for those two problems.
1. The /go pattern from Boris Cherny: give Claude a way to verify its own work
Boris Cherny — the creator of Claude Code — posted a six-item thread on getting the most out of 4.7. All six are good. The one that hit me hardest was #6:
Give Claude a way to verify its work. This has always been a way to 2-3x what you get out of Claude, and with 4.7 it’s more important than ever.
Verification means different things for different stacks. Backend: make sure Claude knows how to start the server and hit it end-to-end. Frontend: use a browser control tool (Claude Chromium extension, browser MCP, agent-browser). Desktop apps: computer use.
What I like about Boris’s version is that he’s packaged the whole thing into one trigger:
Personally, many of my prompts these days look like “Claude do blah blah blah /go”. /go is a skill that has Claude
- Test itself end to end using bash, browser, or computer use
- Run the /simplify skill
- Put up a PR
The sequencing is the insight. Each step catches different mistakes. The end-to-end test proves the code actually runs. /simplify strips the over-engineered bits Claude tends to add during a long session — dead abstractions, premature helpers, the defensive try/catch around code that can’t fail. Putting up a PR forces the work to cohere into a real reviewable artifact rather than a half-finished branch.
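For reference, a Claude Code skill is just a markdown file with frontmatter. A minimal sketch of how Boris's three steps could be captured — the wording and step details here are mine, not his actual skill:

```markdown
---
name: go
description: Verify the change end to end, simplify it, then open a PR
---

When invoked after completing a task:

1. Verify the work end to end: use bash for backend services, a browser
   tool for frontend work, or computer use for desktop apps. Do not report
   success until the verification actually passes.
2. Run the /simplify skill on the resulting diff.
3. Commit and open a PR with `gh pr create`, describing in the PR body
   what was tested and what remains.
```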
Why this matters more with 4.7. For a while I’ve been treating “did Claude actually run this?” as the tax I pay for using Claude Code. I’d come back to a long session, skim the tool calls, and spend 20 minutes re-running things it claimed worked. The underlying problem is what I’ve come to think of as wrong-approach drift: Claude picks a suboptimal strategy (serial instead of parallel, external DB instead of local dev DB, the wrong git command) and runs with it for a while before I notice. That drift is the #1 source of wasted time I have with this tool. Verification at the end of the loop forces the reality check early.
How I’m adapting Boris’s pattern. I had the individual pieces — a browser-verify skill, agent-browser for UI walkthroughs, a manual /simplify pass, a commit-and-PR script. What I didn’t have was a single handoff that chains them. I’m wiring up a /go skill that branches on project type:
- Backend: boot the dev server, run integration tests, hit the new endpoint with curl, check the response.
- Frontend: start the local Next.js dev server, launch agent-browser against pandadoc.dev, walk the happy path, read the console, verify no errors.
- Library code: run vitest against the changed files, then the type checker.
All three branches end the same way: /simplify → commit → gh pr create with a description that includes what was tested and what’s left.
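The routing itself can be a few lines of shell inside the skill. A sketch of the branch detection, where the marker files are my assumptions about how each project type is identified, not a finished implementation:

```shell
#!/bin/sh
# Hypothetical sketch: route /go verification by project type.
# The marker files below are assumptions; adjust for your repos.
detect_branch() {
  dir="$1"
  if [ -e "$dir/next.config.js" ] || [ -e "$dir/next.config.mjs" ]; then
    echo frontend   # Next.js app: verify in the browser with agent-browser
  elif [ -e "$dir/vitest.config.ts" ]; then
    echo library    # library code: vitest on changed files, then type check
  else
    echo backend    # default: boot the server and curl the new endpoint
  fi
}
```

Each branch then runs its verification commands and falls through to the shared /simplify → commit → PR tail.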
What changes when you actually do this. I ran a long background task overnight last week — a dashboard refactor Claude Code was working on in a separate pane while I was asleep. I came back in the morning to a green build, a PR description, and a simplification diff. I trusted it enough to merge without rereading the whole thing. Without verification baked in, that same task would have been a 30-minute morning spent re-verifying everything by hand.
Boris’s claim that verification is a 2-3x multiplier matches my informal experience. It’s probably the single highest-leverage change I’ve made to my Claude Code workflow in the last month.
Link: Boris Cherny’s thread on X
2. Capping the 1M context window at 400k: why my weekly quota is burning 2-3x faster, and how to slow it down
For the last three weeks I’ve been hitting my Max-plan weekly limit 2-3x faster than I used to. It’s been the most frustrating part of my workflow. I thought it was just me — too many parallel sessions, too many subagents. It wasn’t just me.
Three compounding changes explain it:
1. Opus 4.6 expanded to a 1M token context window in March. Bigger working window means more tokens per turn. Every file read, every tool call, every model response gets billed against a larger pool. Claude Code’s autocompact only triggers near the top of the window, which means one long session can consume an enormous amount of weekly quota before it ever summarizes. The r/claude thread titled “Anthropic broke your limits with the 1M context update” hit 368 upvotes and 164 comments. Widespread, not just me.
2. Opus 4.7 ships with a new tokenizer that uses up to 35% more tokens than 4.6 for the same prompt. This one I didn’t know about until an r/ClaudeAI PSA went up on day one of the 4.7 release. The tokenizer change is in the release notes; the usage implications are not. 35% more tokens for the same conversation means 35% faster quota burn, on top of whatever the 1M window was already doing. It’s also likely why a lot of “why did my session die early today?” posts spiked yesterday.
3. Prompt cache TTL dropped from 1 hour to 5 minutes earlier this month. The cache is what makes iterative work on the same codebase cheap — repeated reads of the same system prompt and file contents get discounted. A 5-minute TTL means any session where I step away for coffee starts paying full price again. Compounded with the above, it’s the third multiplier.
None of these are problems individually. All three at once means my weekly quota is paying a compound tax.
Two things have helped.
Cap the autocompact threshold yourself
Thariq on the Claude Code team pointed out an environment variable I had missed:
CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude
That caps the working window at 400k tokens instead of 1M. Still massive — Sonnet’s default was 200k for years and nobody complained. But it forces compaction to happen earlier, which keeps per-turn token usage lower and stops a single long session from consuming a third of my weekly quota in one sitting. I’ve added it to the shim I use to launch Claude Code so every new session starts at 400k by default.
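The shim itself is tiny. A sketch, assuming the real binary lives at /usr/local/bin/claude and the wrapper sits earlier on PATH — both are assumptions about your setup:

```shell
#!/bin/sh
# Hypothetical launcher shim: save as e.g. ~/bin/claude, ahead of the real
# binary on PATH. The CLAUDE_BIN default location is an assumption.
CLAUDE_BIN="${CLAUDE_BIN:-/usr/local/bin/claude}"

# Every session launched through the shim starts with a 400k-token cap.
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000

# Hand off to the real binary if present, passing all arguments through.
if command -v "$CLAUDE_BIN" >/dev/null 2>&1; then
  exec "$CLAUDE_BIN" "$@"
fi
```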
Use the session management tools deliberately
Anthropic’s post this week on session management and the 1M context window is worth reading in full. The parts that actually changed my workflow:
- /rewind instead of retyping corrections. When Claude heads down the wrong path, I used to say “no, actually do X instead” and let it course-correct. That leaves the failed attempt in context forever. /rewind drops back to an earlier message and discards the bad attempt while keeping the file reads that were useful. The pattern I’ve landed on for serious drift: spin up a new Claude Code instance with /branch → /rename → /rewind to fork the session cleanly rather than keep trying to salvage the current one. Net: less context rot, fewer wasted tokens.
- /clear over /compact when I know what to keep. /compact is automatic and lossy. /clear is manual and surgical: I tell Claude exactly what context to carry forward. For task switches between unrelated work, /clear is almost always the right call.
- Subagents for tool-call-heavy work. If I’m going to have Claude read 40 files to answer one question, that’s a subagent job. The intermediate reads don’t need to live in my main context. This has noticeably changed how much quota an “explore this codebase” session costs.
- Be selective about skills per project. This one came out of a separate conversation — Boris himself commented on a GitHub cache thread that people are bloating context with too many skills loaded per project. Every skill SKILL.md adds system-prompt tokens to every turn. After reading that, I went through my skill library and cut the auto-loaded set in half for most projects. That alone trimmed noticeable tokens off every turn.
Result: my sessions are shorter, my /rewind usage is up, and I’m no longer waking up on Wednesday morning to a “You’ve used 70% of your weekly limit” warning.
Link: Anthropic’s session management post
Credit where it’s due: the fact that Anthropic published this post this week tells me the team heard the “limits feel worse” feedback loud and clear after the 1M context change and the 4.7 tokenizer shift. Kudos to Thariq and the Claude Code team for doing the user research and putting out a concrete response rather than waiting it out.
Both items are the same shape of lesson. The defaults are tuned for short tasks. When you start doing long-horizon work with Claude Code, you have to bring your own verification (Boris’s /go) and your own context discipline (Thariq’s CLAUDE_CODE_AUTO_COMPACT_WINDOW, Anthropic’s session tools, Boris’s skill selectivity). Otherwise the longer horizon just amplifies whatever was already wrong with your loop — including the 35% extra tokens 4.7 is now charging you for each one.
Curious what thresholds other heavy Claude Code users have landed on for autocompact, and whether anyone has a /go-style verification skill they’re willing to share. I’ll post mine once it handles the three project types cleanly.