Sharing a few things that worked in my agent setup this week. No single theme this time. Three separate setups: keeping draft review in a file, letting the agent run its own verification, and turning Codex loose on years of Gmail rules.
1. Keeping draft review out of chat
I draft a lot of Markdown with agents. Plans, posts, specs, finance notes. Drafting with agents works. Reviewing their drafts is where I lost things.
I’d lose the feedback in the transcript. I’d react to a draft in chat and spread notes across a dozen turns, and the agent’s next pass would catch some and miss the nuance on the rest. Half the plan sat in my head, half in the chat log. A day later I couldn’t tell which version was current.
This week I started using Roughdraft (Nathan Baschez, open source) to turn a Markdown file into a local review surface. The agent writes the draft to disk, opens it in Roughdraft, and waits while I mark it up with comments and suggested edits in CriticMarkup. Then it reads my markup back from the file and revises from it.
The review feels closer to working in a Google Doc with a teammate than chatting with a bot. I highlight the exact sentence I want changed and leave the comment right there. The agent revises from that comment, anchored to that text, instead of a paraphrase I retyped into chat three messages later.
The waiting is the part that matters. The agent opens the file and holds the process until I click “Done Reviewing.” It won’t background that wait, kill it, or treat it as cleanup. My review is a step in the work, so the agent treats it like one.
Example use: “Write the plan to Markdown, open it in Roughdraft, wait for me to finish, then apply only the CriticMarkup edits I left and don’t touch the parts I didn’t mark.”
What it’s helped with: Review turned into a file I can point at. The comments, insertions, deletions, and substitutions live in it. That gives the next agent something concrete to read, and it gives me one artifact to come back to instead of a chat history to re-skim. It runs as a local app, no cloud and no account, so drafts I’d never paste into a hosted tool can go through the same loop.
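To make "something concrete to read" tangible, here is a minimal sketch of pulling review marks out of a Markdown file. It assumes the standard CriticMarkup delimiters (`{++ ++}`, `{-- --}`, `{~~ ~> ~~}`, `{>> <<}`, `{== ==}`); it is not Roughdraft's code, and `Mark` and `extract_marks` are hypothetical names chosen for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class Mark(NamedTuple):
    kind: str                          # insertion, deletion, substitution, comment, highlight
    text: str                          # the marked text, or the comment body
    replacement: Optional[str] = None  # only set for substitutions


# Standard CriticMarkup delimiters; DOTALL lets a mark span several lines.
_MARKS = re.compile(
    r"\{\+\+(?P<insertion>.*?)\+\+\}"
    r"|\{--(?P<deletion>.*?)--\}"
    r"|\{~~(?P<old>.*?)~>(?P<new>.*?)~~\}"
    r"|\{>>(?P<comment>.*?)<<\}"
    r"|\{==(?P<highlight>.*?)==\}",
    re.DOTALL,
)


def extract_marks(markdown: str) -> List[Mark]:
    """Return every review mark in document order."""
    marks = []
    for m in _MARKS.finditer(markdown):
        if m.group("old") is not None:
            # Substitutions carry both the old and the new text.
            marks.append(Mark("substitution", m.group("old"), m.group("new")))
        else:
            # For every other mark, the matched group name is the kind.
            kind = m.lastgroup
            marks.append(Mark(kind, m.group(kind)))
    return marks
```

Because the marks sit inline in the file, a list like this stays tied to the exact text that was highlighted, which is what a follow-up pass would revise from.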
2. Letting the agent do the verification
The pattern I kept hitting: an agent produces output, then a person burns time checking it.
For knowledge work, that checking is the expensive part. It means opening the sources, a CRM record or a doc or a spreadsheet, comparing names, dates, and numbers against what the agent wrote, then deciding what’s safe to send and what needs another set of eyes. The agent saved 20 minutes on the draft and I spent 30 verifying it, a net loss of 10. That math doesn’t compound.
So I flipped which side of the loop I sit on. Up front, I define what good looks like, give the agent the sources or tools to check itself against that definition, and tell it to loop until the output passes before it stops.
One concrete example. For a sales rep’s cold outbound email, “good” means every personalization claim is true. The account detail, the recent news, the title, the prior touch. The agent gets Salesforce access and a rule: check each claim in the draft against the CRM fields, the account page, and prior emails, and keep going until everything lines up. I’m not reopening the same Salesforce tabs after it stops. I read the result and the trail of what it checked.
Example use: “Write this cold outbound email and then verify the personalization facts against what’s in Salesforce. Use the Salesforce MCP to find the specific fields referenced and verify that the drafted email’s contents align with what’s in SFDC. Loop until everything is verified.”
What it’s helped with: Delba de Oliveira had a great post this week on giving agents a feedback loop for this. The framing that stuck with me: the best instructions teach the agent how I’ll know the work is done. That’s a higher bar than telling it what to do. Once that check lives in the prompt or the skill, I write it down once instead of running it by hand every time. Next I want to promote the checks I still do by hand (the browser click-through, the test run, the diff read) into skills and a clean-context reviewer that inspects the work before I see it.
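For the cold-email example, the verify-and-loop shape can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual setup: a plain dict stands in for whatever the Salesforce lookup returns, and `Claim`, `failed_claims`, `draft_until_verified`, and `toy_draft` are hypothetical names. No real Salesforce or MCP calls appear here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Claim:
    field: str  # CRM field the claim depends on, e.g. "title"
    value: str  # what the draft asserts about that field


def _norm(value) -> str:
    return str(value or "").strip().lower()


def failed_claims(claims: List[Claim], record: Dict[str, str]) -> List[Claim]:
    """Claims whose value does not match the record (the stand-in for the CRM)."""
    return [c for c in claims if _norm(record.get(c.field)) != _norm(c.value)]


DraftFn = Callable[[Dict[str, str]], Tuple[str, List[Claim]]]


def draft_until_verified(draft_fn: DraftFn, record: Dict[str, str], max_rounds: int = 3):
    """Redraft until every claim matches the record; return the email and a check trail."""
    corrections: Dict[str, str] = {}
    trail: List[str] = []
    for round_no in range(1, max_rounds + 1):
        email, claims = draft_fn(corrections)
        failures = failed_claims(claims, record)
        trail.append(f"round {round_no}: checked {len(claims)} claims, {len(failures)} failed")
        if not failures:
            return email, trail
        # Feed the verified values back into the next draft.
        corrections.update({c.field: str(record.get(c.field) or "") for c in failures})
    raise RuntimeError(f"still unverified after {max_rounds} rounds: {trail[-1]}")


def toy_draft(corrections: Dict[str, str]) -> Tuple[str, List[Claim]]:
    """Hypothetical drafter that starts with one wrong fact."""
    facts = {"title": "VP Sales", "last_touch": "webinar in March"}
    facts.update(corrections)
    email = f"Hi, as {facts['title']} you may remember our {facts['last_touch']}."
    return email, [Claim(k, v) for k, v in facts.items()]


record = {"title": "VP Marketing", "last_touch": "webinar in March"}
email, trail = draft_until_verified(toy_draft, record)
```

The `max_rounds` cap is a design choice in this sketch: a loop that cannot pass should fail loudly with its trail rather than spin forever or quietly return an unverified draft.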
3. Letting Codex clean up years of Gmail rules
My Gmail filters and labels had been piling up for years. I knew the pile was a mess. I also knew I was never going to audit hundreds of rules myself or fix them one at a time through the Gmail settings UI.
This is the kind of chore that sits on a list for years. The work is tedious, the stakes feel high (a bad filter can trash mail you wanted), and there’s no satisfying way to do it by hand. A good thing to hand to an agent, as long as the agent works carefully.
I had Codex do it. It exported all my filters and labels through the Gmail API, audited them against context from my second brain about how I work and what I care about, and applied the approved changes. The context mattered. The agent wasn’t guessing at which newsletters I read or which senders I route where. It had my patterns to reason from.
It found stale and overlapping rules, consolidated the redundant ones, and cut the clutter hard: 637 filters down to 320 (317 removed, about 50%), and 79 labels down to 33 (46 removed, about 58%). Half the filters and more than half the labels, gone, without me clicking through a single settings page.
Example use: “After Gmail API OAuth and settings/labels permissions are set up, audit my Gmail filters and labels. Use my mailbox patterns and second-brain context to propose what to keep, merge, rename, or delete.”
What it’s helped with: I cleared a backlog chore I’d avoided for years in one session. The shape is the reusable part. A boring high-volume cleanup, an API that exposes the state, and second-brain context so the agent prunes the way I would. The same setup applies to a lot of accumulated mess beyond email.
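A rough sketch of the audit step, assuming the filters and labels have already been exported as JSON in the shape the Gmail API returns (`criteria` and `action.addLabelIds` on filters, `id`, `name`, and `type` on labels). The function names and sample data are hypothetical, and this is not what Codex actually ran. It only proposes candidates; nothing is deleted.

```python
import json
from collections import defaultdict
from typing import Dict, List


def criteria_key(gmail_filter: Dict) -> str:
    """Normalize a filter's criteria so trivially different duplicates compare equal."""
    criteria = gmail_filter.get("criteria", {})
    normalized = {
        k: (v.strip().lower() if isinstance(v, str) else v)
        for k, v in criteria.items()
    }
    return json.dumps(normalized, sort_keys=True)


def merge_candidates(filters: List[Dict]) -> List[List[str]]:
    """Groups of filter ids that share the same normalized criteria."""
    groups = defaultdict(list)
    for f in filters:
        groups[criteria_key(f)].append(f["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def unreferenced_labels(filters: List[Dict], labels: List[Dict]) -> List[str]:
    """User labels that no filter applies. Candidates for review only,
    since a label can still be applied by hand."""
    used = {
        label_id
        for f in filters
        for label_id in f.get("action", {}).get("addLabelIds", [])
    }
    return sorted(
        label["name"]
        for label in labels
        if label.get("type") == "user" and label["id"] not in used
    )


# Made-up sample data in the exported shape.
filters = [
    {"id": "f1", "criteria": {"from": "news@example.com"}, "action": {"addLabelIds": ["Label_1"]}},
    {"id": "f2", "criteria": {"from": "News@Example.com "}, "action": {"addLabelIds": ["Label_1"]}},
    {"id": "f3", "criteria": {"subject": "invoice"}, "action": {"addLabelIds": ["Label_2"]}},
]
labels = [
    {"id": "Label_1", "name": "Newsletters", "type": "user"},
    {"id": "Label_2", "name": "Finance", "type": "user"},
    {"id": "Label_3", "name": "Old project", "type": "user"},
    {"id": "INBOX", "name": "INBOX", "type": "system"},
]

duplicates = merge_candidates(filters)         # [["f1", "f2"]]
stale = unreferenced_labels(filters, labels)   # ["Old project"]
```

Exact-match grouping like this only catches the easy duplicates. Deciding which overlapping rules to keep, merge, or rename is where the second-brain context and an approval step come in.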
No tidy theme this week. Three changes, each one making a different part of my week less annoying. More setup notes next time.