My AI Assistant Recommendations: A June 2025 Check-In
A snapshot that will be outdated by December, but useful for now
I get the same question at least three times a week. My dad asks while trying to write better prayers. A manufacturing executive needs help with SEO copy for his website. A realtor wants to review inspection reports with her clients.
“Which AI assistant should I use every day?”
I was scrolling through my responses yesterday when I realized I follow the same mental process each time. It’s not totally systematic, but it gives me a useful snapshot of where things stand in June 2025.
This will be outdated fast. But I want to capture what I’m recommending right now and why.
My Current Rankings
Here’s how I answer that question today:
1. Claude with Opus 4 → Sonnet 4 if budget matters
Anthropic seems to have figured something out that others haven't: the models have "taste." Both Opus 4 and Sonnet 4 think through problems in a way that matches how I actually work.
The catch: you’ll hit usage limits quickly unless you’re on their Max plan ($100-$200/month). But for complex business problems, it’s worth it.
2. Google AI Studio with Gemini 2.5 Pro for the budget-conscious
If you don’t need the prettiest interface and work mostly on desktop, this is your move. It’s essentially free with generous limits. The model itself is genuinely strong - recent benchmarks show Gemini 2.5 Pro scoring 63.2% on SWE-bench for coding tasks, which puts it in serious contention.
Google’s approach feels different. Where Claude thinks through problems step by step, Gemini processes everything at once and gives you the result. Both work, but the experience is distinct.
3. ChatGPT with o3 → o4-mini-high → 4o for beginners
If this is your first serious dive into AI assistants, start here. The interface is polished. The learning curve is gentle. And ChatGPT’s memory feature helps it understand your preferences over time, which makes daily use smoother.
It’s not just familiarity bias. ChatGPT handles the transition from casual user to power user better than the others.
What Actually Determines Quality
People think model performance comes down to raw intelligence. That’s part of it, but there’s more happening behind the scenes.
Each company wraps their models differently. The system prompts, the safety rails, the way they handle context - it all shapes what you get back. This is why Claude Code excels at coding while Lovable produces better landing pages, even when using similar underlying models.
Your prompting matters more than the model choice. I’ve watched people blame a model for “hallucinating” when they’re actually feeding it ten irrelevant PDFs and asking vague questions. Context management isn’t optional.
My Testing Approach
I run the same prompt across Opus 4, Sonnet 4, o3, and Gemini 2.5 Pro regularly. Not for everything - that would be exhausting. But for important projects where I want to see different approaches to the same problem.
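If you'd rather script that comparison than paste the same prompt into three browser tabs, here's a rough sketch of what it looks like against the raw APIs. The model IDs, SDK choices, and the sample prompt below are my assumptions as of this writing, not anything official - they change often, so check each provider's docs before copying.

```python
# Minimal side-by-side test: send one prompt to three providers and eyeball the answers.
# Assumes ANTHROPIC_API_KEY, OPENAI_API_KEY, and GOOGLE_API_KEY are set in the environment,
# and that the model IDs below are still current (they change frequently).
import os

import anthropic
import google.generativeai as genai
from openai import OpenAI

PROMPT = "Outline a 90-day plan to cut onboarding time for new sales hires."  # placeholder prompt


def ask_claude(prompt: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # swap in the Opus 4 ID for harder problems
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text


def ask_chatgpt(prompt: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY
    resp = client.chat.completions.create(
        model="o3",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def ask_gemini(prompt: str) -> str:
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-2.5-pro")
    resp = model.generate_content(prompt)
    return resp.text


if __name__ == "__main__":
    for name, ask in [("Claude", ask_claude), ("ChatGPT", ask_chatgpt), ("Gemini", ask_gemini)]:
        print(f"\n=== {name} ===\n{ask(PROMPT)}")
```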
Each model has a personality. Claude tends to be thorough and structured. ChatGPT balances creativity with practicality. Gemini excels at mathematical reasoning without external tools.
I also test web search features against Perplexity and Grok. The results vary enough that it’s worth the extra effort for research-heavy work.
A Few Specifics Worth Noting
On Perplexity: I get asked about this constantly. In my experience, using o3 through Perplexity isn’t as good as using o3 directly in ChatGPT. The multi-model platforms usually add a layer that degrades performance. But Perplexity’s Sonar API is excellent for building apps that need web search.
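For the builders asking about Sonar: it speaks the standard OpenAI chat-completions dialect, so a minimal call looks roughly like the sketch below. The model name and the system prompt are placeholders of mine; check Perplexity's current docs for the real IDs before relying on them.

```python
# Rough sketch of a web-search-backed answer via Perplexity's Sonar API.
# The API is OpenAI-compatible; assumes PERPLEXITY_API_KEY is set and that
# "sonar-pro" is still a valid model name (verify against current docs).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai",
)

resp = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "system", "content": "Answer concisely and cite your sources."},
        {"role": "user", "content": "What changed in this week's Fed guidance on rates?"},
    ],
)
print(resp.choices[0].message.content)
```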
On Grok: I want it to be better at finding and using X content than it is. The real-time access is useful, but the execution feels unfinished.
On pricing: Claude Sonnet 4 offers the best balance of performance and cost at $3/$15 per million input/output tokens, compared to Claude Opus 4 at $15/$75. For most business use cases, Sonnet gives you 90% of Opus performance at a fifth of the price.
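To make that gap concrete, here's the back-of-the-envelope math at the published per-million-token rates. The monthly token volumes are made up for illustration; plug in your own.

```python
# Back-of-the-envelope monthly API cost at published per-million-token prices.
# The 5M input / 1M output volumes are an assumption, not a measurement.
PRICES = {  # (input $/MTok, output $/MTok)
    "claude-sonnet-4": (3.00, 15.00),
    "claude-opus-4": (15.00, 75.00),
}
input_mtok, output_mtok = 5, 1  # hypothetical monthly usage, in millions of tokens

for model, (p_in, p_out) in PRICES.items():
    cost = input_mtok * p_in + output_mtok * p_out
    print(f"{model}: ${cost:.2f}/month")
# At these volumes: claude-sonnet-4 ≈ $30/month, claude-opus-4 ≈ $150/month
```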
Why I Use Seven Different Services
Two reasons. First, I need to know these tools to advise clients properly. Second, I enjoy optimizing and comparing outputs.
But for most people, the better move is to pick one primary assistant and get really good at using it. The switching costs add up, and each platform rewards a slightly different prompting style.
What Hasn’t Changed
The fundamentals remain the same. Clear context gets better results. Specific requests work better than vague ones. And the person who takes time to learn effective prompting will outperform someone using the “best” model poorly.
I’m also seeing that the gaps between top models are narrowing. Claude 4 Sonnet, o3, and Gemini 2.5 Pro all perform within a few percentage points of each other on most benchmarks. The choice increasingly comes down to interface, pricing, and which ecosystem you’re already in.
Looking Forward
This landscape moves fast. What I’m recommending in June might be completely different by December. New models drop monthly. Pricing changes. Features get added.
But the patterns I’m seeing suggest we’re moving toward specialization. Claude is positioning itself as the coding champion, with GitHub planning to use Sonnet 4 as the base model for its new coding agent. Google is leaning into multimodal capabilities and enterprise integration. OpenAI is focusing on being the reliable generalist.
The question isn’t which assistant will “win.” It’s which combination of tools will make you most effective at your specific work.
For now, that’s my take. Ask me again in six months.