We got lots of new AI models in 2025. Here is your cheatsheet (which to use for what)
And a review of all the apps that let you use all the models at the same time!
Happy March 1st! I’m replacing the lighting fixture in my front room and have only had to go to Home Depot 3 times so far. So that is progress. Today let’s talk about AI models and which are good for what.
We are two months into 2025 and just about every week there have been major new announcements in the land of AI.
Here's a cheatsheet.
What AI models were released in 2025 so far, and what are they good at? ✨
TechCrunch wrote a partial rundown, which I condensed, added to, and expanded here. In order of release, from most recent to oldest.
OpenAI’s GPT 4.5 ‘Orion’
https://openai.com/index/introducing-gpt-4-5/
Price: $200 a month
GPT 4.5 is hot off the press. It is OpenAI's biggest model: expressive, accurate, and very good at writing. However, 4.5 underperforms on certain benchmarks compared to newer reasoning models, and it is expensive and slow. Orion is available to subscribers of OpenAI’s $200 a month plan.
GPT‑4.5 has access to up-to-date information via search and supports file and image uploads. It passed all of Roboflow’s vision tests but one.
Use GPT 4.5 if you want: A great writing buddy, or a personal “friendly AI”
Claude Sonnet 3.7
https://www.anthropic.com/news/claude-3-7-sonnet
Price: limited free use, $20/month for full access, API available
Claude 3.5 was released 8.5 months ago, but was still the best model for coding. That is until 3.7 came out.
Claude 3.7 is great at coding, and lots of coding tools like Cursor and Windsurf added it immediately. However, it is less good at following instructions: it sometimes makes way too many changes and doesn’t listen to you.
Free users can access Claude 3.7 Sonnet for basic tasks like writing, summarization, and general Q&A, but Thinking Mode is disabled. 3.7 is multimodal and can handle files and images (but not videos) natively, and it doesn’t search the web.
Claude Pro users (the $20/month paid plan) get full access to Thinking Mode, along with higher message limits and priority access during peak usage times.
Use Claude 3.7 if you want: The best AI for coding (but can go too far)
xAI’s Grok 3
Price: $8 a month
Grok 3 is the newest release from xAI/X/Twitter. It is good at lots of things, and includes a very useful Deep Research mode they call Deep Search. It’s claimed to outperform other leading models on math, science, and coding. It is available to X Premium users (which costs $8 a month).
Grok 3 is really smart and understands small details. It can do chain-of-thought reasoning. Overall, Grok 3 feels like a novel and different AI model.
Grok 3 is multimodal and can handle text, images, and files (but not videos), and it natively searches the web, including Twitter/X, which other models can’t access as freely.
Roboflow hasn’t tested Grok 3 vision capabilities yet.
Use Grok 3 if you want: A great all around AI that includes Deep Research, and chain of thought reasoning, and results from Twitter/X
OpenAI o3-mini
https://openai.com/index/openai-o3-mini/
o3-mini is OpenAI’s latest reasoning model and is optimized for coding, math, and science. It’s smaller and significantly lower cost. It is available for ChatGPT Plus, Team, and Pro users.
OpenAI's o3-mini-high excels at reasoning tasks (second only to o1-pro), offers strong math performance comparable to o1, provides function calling support, reveals summarized chain-of-thought, operates with superior speed, and delivers exceptional value (15x cheaper than o1 with similar capabilities).
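Since function calling is one of o3-mini's selling points, here is a minimal sketch of what a function-calling request payload looks like. This uses the standard OpenAI-style "tools" schema; the `get_weather` tool and its parameters are hypothetical examples made up for illustration.

```python
# Sketch of an OpenAI-style function-calling payload for o3-mini.
# The tool name and parameters are hypothetical examples; only the
# overall "tools" schema shape follows the standard API format.

def build_function_calling_payload(model: str, prompt: str) -> dict:
    """Build a chat-completions payload that offers the model one tool."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool name
            "description": "Get the current weather for a city.",
            "parameters": {  # JSON Schema describing the arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool],
    }

payload = build_function_calling_payload("o3-mini", "What's the weather in Austin?")
print(payload["tools"][0]["function"]["name"])  # get_weather
```

The model can then respond with a structured call to `get_weather` instead of plain text, which your code executes and feeds back.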
It is not as good at coding as other AIs.
o3-mini is multimodal and can handle text, images, and files. It can search the web.
It passed all of Roboflow’s vision tests but two.
Use o3-mini if you want: Super solid STEM and reasoning at a much lower cost than o1.
OpenAI Deep Research
https://openai.com/index/introducing-deep-research/
Price: $200 a month
OpenAI’s Deep Research is pretty amazing. My feed is full of smart people I trust saying it can deliver, in a few minutes, the kind of report you would get from an intern after a few days of research. Designed for in-depth research on a topic with clear citations, this service is only available with ChatGPT’s $200 per month Pro subscription.
Use OpenAI Deep Research if you want: The best researching AI available.
Mistral Le Chat
https://mistral.ai/products/le-chat
Price: Free
Mistral has launched app versions of Le Chat, a multimodal AI personal assistant. Le Chat responds faster than many other chatbots. It is available for free, with a paid version that adds up-to-date journalism from the AFP. It is fast, but more limited and makes more errors than the others.
Le Chat is multimodal and can handle text, pdfs, and images and can search the internet for new info.
Use Le Chat if you want: Super fast responses, and another free option.
OpenAI Operator
https://openai.com/index/introducing-operator/
Price: $200 a month
OpenAI’s Operator is meant to be a personal intern that can do things independently, like help you buy groceries. It requires a $200 a month ChatGPT Pro subscription. It still gets stuck a lot, but people are experimenting with it.
Operator is multimodal and can handle text, pdfs, and images and can search the internet for new info.
Use Operator if you want: an AI agent controlling your computer.
Google Gemini 2.0 Pro Experimental
https://deepmind.google/technologies/gemini/pro/
Price: $19.99 per month
Google Gemini’s much-awaited flagship model is very good at coding and understanding general knowledge. But the biggest differentiator is it has a super-long context window of 2 million tokens, helping users who need to quickly process massive chunks of text.
2 million tokens is ~3,000 pages of text. The average person would need 200 days of talking to fill 3,000 pages.
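The back-of-envelope arithmetic behind that claim, assuming the common rules of thumb of roughly 0.75 words per token and roughly 500 words per page (rough estimates, not exact figures for any particular tokenizer):

```python
# Back-of-envelope: how much text fits in a 2-million-token context window?
# Assumes ~0.75 words per token and ~500 words per page -- rough rules of
# thumb, not exact figures for any particular tokenizer or page layout.

TOKENS = 2_000_000
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

words = TOKENS * WORDS_PER_TOKEN   # 1,500,000 words
pages = words / WORDS_PER_PAGE     # 3,000 pages

print(f"{words:,.0f} words ≈ {pages:,.0f} pages")  # 1,500,000 words ≈ 3,000 pages
```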
Gemini is multimodal, meaning you can prompt with text, images, video, or code. It can also search.
The service requires (at minimum) a Google One AI Premium subscription of $19.99 a month.
Use Gemini 2.0 Pro if you want: Super large context window to input lots of text, or you need to input video.
Alibaba Qwen 2.5-Max
Released on January 29, 2025, during the Lunar New Year, this model from Alibaba claims to outperform GPT-4o, DeepSeek-V3, and Meta’s Llama 3.1-405B on various benchmarks. Its timing suggests a rushed response to DeepSeek’s momentum in the Chinese AI market.
Qwen 2.5 Max hasn’t really made a splash in the US. It has multimodal capabilities, meaning it can process text, images, audio, and video in 29 languages, including Mandarin, Arabic, and Hindi.
Qwen 2.5 Max can generate images and videos, but it can’t accept images as an input, just text and files. It can also search the web.
Use Qwen 2.5 if you want: To try another newer model? (I don’t have a good use case for this one)
DeepSeek R-1
https://api-docs.deepseek.com/news/news250120
DeepSeek took the internet by storm for a few reasons: 1. It was a big release from a previously unknown company. 2. It does really well on lots of benchmarks. 3. It open-sourced some of the model to allow others to run it. 4. It exposed the chain-of-thought reasoning so you could ‘see the model thinking,’ which gives people more confidence in the answer. 5. It was much cheaper than some of the other reasoning models at the time.
Other models were doing reasoning, but DeepSeek chose to expose that and people loved it.
There are some questions about data security and privacy, but it is a great model, and if that worries you, it is also available in other places, like Perplexity, running on US servers.
DeepSeek R-1 can only accept text as an input. You can put images in, but it will just try to extract text and won’t ‘look’ at the image itself.
R-1 can search the web, and on release it was the first AI to support reasoning AND search at the same time, but others released since, like Deep Research, Grok, and Gemini, can also do that.
(I’ll have a longer breakdown of DeepSeek coming in the next few weeks)
Use DeepSeek R-1 if you want: chain of thought reasoning, and a cheaper API
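What "exposing the chain of thought" looks like in practice: DeepSeek's reasoner API returns the reasoning in a separate field from the final answer, so you can display or log each one independently. A minimal sketch, assuming a response shaped like DeepSeek's documented `reasoning_content` field; the sample data below is made up for illustration.

```python
# Sketch: DeepSeek's reasoner API returns the chain of thought separately
# from the final answer (a "reasoning_content" field alongside "content").
# The sample response below is made up for illustration.

def split_reasoning(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a deepseek-reasoner style response."""
    message = response["choices"][0]["message"]
    return message.get("reasoning_content", ""), message["content"]

sample = {
    "choices": [{
        "message": {
            "reasoning_content": "The user asked for 2+2. Adding gives 4.",
            "content": "2 + 2 = 4",
        }
    }]
}

reasoning, answer = split_reasoning(sample)
print(answer)  # 2 + 2 = 4
```

Showing the reasoning alongside the answer is exactly what made people trust the output more.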
Summary
All models can handle text and files.
Most can use images as input.
Only Gemini can do video out of the box (you can break any video down into images but that takes pre-processing)
Only Claude 3.7 can’t search.
I’ll update the chart to include reasoning and chain of thought when I write the DeepSeek article.
Josh’s Picks
The newest model isn’t always the best. These are the models I recommend for the 3 tasks of coding, research, and general purpose AI.
If you need video, Gemini 2.0 is the only model in the game.
Still not sure which to use? Use them all in one app!
Multi Model Apps
Knowing which model is right for which task can be a little hard. I think we need intelligent model routing (I’m writing up that idea on my other newsletter), but until we have it, it can be very useful to try a few models on the same task to see which performs better, which you like better, and which has better vibes.
There are a few ways to use multiple models at the same time:
Manual! (just open a bunch of tabs)
Chorus (Mac app)
T3.chat
Poe
Manual! Multi Tab Approach
I literally keep most of the AI chats/models in my bookmark bar and will open multiple at once and give them the same prompt by copying and pasting.
The upside is this approach is free. The downside is lots of copying and pasting.
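If you're comfortable with a little scripting, you can skip the tabs and fan the same prompt out to several chat APIs yourself. A minimal sketch: the base URLs and model names below are illustrative placeholders (many providers expose an OpenAI-compatible chat completions endpoint, but check each provider's docs and bring your own API keys).

```python
# Sketch: fan one prompt out to several chat APIs instead of copy/pasting
# between browser tabs. Base URLs and model names are illustrative
# placeholders; check each provider's docs and supply your own API keys.

PROVIDERS = {
    "openai":   ("https://api.openai.com/v1", "gpt-4.5-preview"),
    "deepseek": ("https://api.deepseek.com/v1", "deepseek-reasoner"),
}

def build_requests(prompt: str) -> list[dict]:
    """Build one chat-completions request description per provider."""
    return [
        {
            "url": f"{base_url}/chat/completions",
            "json": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for base_url, model in PROVIDERS.values()
    ]

for req in build_requests("Summarize this article in one sentence."):
    print(req["url"])
    # then e.g. requests.post(req["url"], json=req["json"],
    #                         headers={"Authorization": "Bearer <key>"})
```

It's the same workflow as the bookmark bar, just without the copying and pasting.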
Chorus (Mac App)
The Mac app Chorus lets you chat with all the models at the same time. And when I say the same time, literally at the same time. You can see all the models responding in their own column.
You get 100 free requests. After that you have to pay for Pro, or bring your own API keys.
T3 Chat
Super fast web chat interface made by programmer/streamer Theo (https://x.com/theo). T3.chat lets you pick from a bunch of models for each individual chat. Theo usually adds models the same day they come out, and the whole interface is snappy.
You can use their cheapest models with limits for free, or pay $8 a month for all models.
Poe
Poe is a multi-model web chat (with mobile and desktop apps as well) from Quora (kinda weird connection, but we will go with it).
You can use it a little for free or subscribe for $5 a month for more usage.
Other options that I haven’t personally reviewed or used yet include:
https://faune.ai/ (Mac App)
https://teamai.com/multiple-models/ (web app)
https://lmstudio.ai/ (local LLMs)
https://infermatic.ai/ (lesser known LLM models)
https://chathub.gg/models
https://chromewebstore.google.com/detail/multigpt-access-all-chatb/dfobejficjaelohpjceiicphofmmglop (Chrome extension)
https://www.chatplayground.ai/
Final thought
You don’t have to keep up with every single AI release. But you should be experimenting to:
find ways to use AI to dramatically speed up your workflow,
or make it easier to do your job,
or just to have fun.
AI models are eager interns that want to work for you all day for pennies. Figure out how to use them to make the world better!
2025 has been a wild AI ride so far, and it is only going to get crazier.
Did I miss your favorite model? Let me know in the comments.