Industry · June 15, 2025 · 11 min read

The Great Model Race

How four companies fought for image generation supremacy in the year everything changed

The best image generator is the one that makes you forget you’re looking at a generation.

David Holz, Midjourney CEO


There is a moment, familiar to anyone who has watched technology evolve, when a product stops being a novelty and becomes infrastructure. Email did it. Search did it. Smartphones did it. In 2025, AI image generation crossed that threshold—not with a single breakthrough, but with four simultaneous ones that split the landscape into competing visions of what synthetic imagery should become.

Let’s be precise about what happened. In March, OpenAI launched GPT Image 1—the model formerly known as “4o image generation”—and in doing so, quietly killed the DALL-E brand that had defined AI art for three years. The move was more than nomenclature. By embedding image generation directly into ChatGPT, OpenAI made a declaration: images are not a separate product. They are a feature of conversation. You don’t go to a tool to make pictures. You talk, and pictures happen.

The market’s response was immediate and telling. When users discovered they could upload a photo and say “make this look like a Studio Ghibli film,” the resulting viral wave crashed OpenAI’s servers. Sam Altman posted that their “GPUs are melting.” Free users were throttled to three generations per day. The Ghibli moment wasn’t just a meme—it was proof that the killer use case for AI images isn’t creation from nothing, but transformation of what already exists.

By December, OpenAI doubled down with GPT Image 1.5: four times faster, twenty percent cheaper, with text rendering good enough for infographics and marketing materials. TechCrunch called it part of OpenAI’s “code red warpath.” But speed and cost miss the deeper story. OpenAI’s thesis is that image generation belongs inside a general intelligence system. It’s not a canvas. It’s a capability.

Midjourney disagreed. V7, released in April 2025, represented what CEO David Holz described as “a totally different architecture” trained on entirely new datasets. Where OpenAI optimized for accessibility, Midjourney optimized for artistry. Draft Mode generates images ten times faster. Omni Reference lets you maintain character and style consistency across a series of images. The hands are finally right—and if that sounds trivial, you haven’t spent three years watching AI art produce six-fingered horrors.

But V7’s real achievement is harder to quantify. The images feel considered. They have compositional weight. Light falls with intention. There’s a reason professional illustrators and concept artists still reach for Midjourney over anything else: it makes aesthetic choices, not just pixel predictions. It’s opinionated software, and in a market racing toward generality, that opinion is Midjourney’s competitive moat.

Then there’s FLUX, which might be the most interesting story of 2025 for reasons that have nothing to do with image quality. Black Forest Labs was founded by the researchers who built Stable Diffusion—the people who, in a very real sense, created the open-source AI image movement—after leaving Stability AI amid its well-documented implosion. Their twelve-billion-parameter model hit its stride with FLUX.1 Kontext in May: a suite that treats image editing and image generation as the same operation.

The numbers are staggering. Inference speeds up to eight times faster than competing models. Open-source and commercial tiers let anyone from a hobbyist to Adobe deploy the technology. Speaking of Adobe: by September, FLUX.1 Kontext Pro was integrated into Photoshop’s generative fill. When the tool that defines professional image editing chooses your model as a backend option, you’ve arrived.

FLUX produces the most photorealistic images in the field. Full stop. The detail work—pores, fabric texture, the way light scatters through a glass of water—is at a level where detection becomes genuinely difficult. This is not a boast from Black Forest Labs. It’s a problem for everyone else, including us.

And then Google showed up. Truly showed up, not in the half-hearted way of Imagen 2 or the overly cautious Gemini image features of 2024. Gemini 3 Pro Image, released in November 2025, delivered super-detailed visuals with art style control and fast inference. More importantly, Google did something nobody else bothered to do: they baked SynthID watermarking into every single output from day one. Ten billion pieces of content watermarked and counting.

Here’s the divergence that matters. OpenAI believes image generation is a conversational feature. Midjourney believes it’s an artistic tool. FLUX believes it’s infrastructure for other software. Google believes it’s a responsibility. Four philosophies. Four architectures. Four different answers to the same question: what are AI images for?

If 2023 was the year AI art arrived, and 2024 was the year it explored, then 2025 is the year it professionalized. Global investment in generative AI solutions tripled, reaching roughly thirty-seven billion dollars. Nearly nine in ten enterprises now deploy AI in at least one business function. The toy became a tool.

But professionalization brings something the early enthusiasts didn’t anticipate: fragmentation. There is no longer a single “best” model. There are best models for specific workflows, specific aesthetics, specific ethical frameworks. The great model race of 2025 didn’t produce a winner. It produced a market. And markets, unlike races, don’t end. They evolve, consolidate, and—eventually—regulate.

For those of us in the detection space, the implications are sobering. Every new architecture means new artifacts to learn, new fingerprints to catalog, new evasion patterns to anticipate. When one company controls the landscape, detection is a cat-and-mouse game. When four companies are simultaneously pushing four different boundaries, detection becomes a cat-and-four-mice game played on four different boards.
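To make the “fingerprint” idea concrete, here is a deliberately simplified sketch of what cataloging per-generator signatures can look like: a radially averaged frequency profile per model, matched against a small catalog of known profiles. This is an illustration only, not our production pipeline; every name, parameter, and matching rule below is hypothetical.

```python
# Toy illustration of per-generator "fingerprints" (not a real detector).
# Assumes: numpy and Pillow installed; all names and thresholds are hypothetical.
import numpy as np
from PIL import Image


def spectral_fingerprint(path: str, size: int = 256, bins: int = 64) -> np.ndarray:
    """Radially averaged log-magnitude frequency spectrum of a grayscale image."""
    img = np.asarray(Image.open(path).convert("L").resize((size, size)), dtype=np.float64)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    log_mag = np.log1p(spectrum)

    # Distance of each frequency coefficient from the center of the spectrum.
    yy, xx = np.indices(log_mag.shape)
    center = (size - 1) / 2.0
    radius = np.hypot(yy - center, xx - center)

    # Average magnitude within concentric frequency bands, then normalize.
    band = np.clip((radius / radius.max() * bins).astype(int), 0, bins - 1)
    totals = np.bincount(band.ravel(), weights=log_mag.ravel(), minlength=bins)
    counts = np.bincount(band.ravel(), minlength=bins)
    profile = totals / np.maximum(counts, 1)
    return profile / np.linalg.norm(profile)


def nearest_generator(query: np.ndarray, catalog: dict[str, np.ndarray]) -> str:
    """Match a query fingerprint against a catalog of known generator profiles."""
    return min(catalog, key=lambda name: np.linalg.norm(query - catalog[name]))
```

A single spectral profile per model would be far too brittle to use in earnest; the point is only that every new architecture adds another profile to maintain and another set of evasion behaviors to anticipate, which is exactly why four simultaneous frontiers make the job harder.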

The models will keep getting better. That’s not a prediction; it’s thermodynamics. What matters now is whether verification infrastructure scales at the same rate as generation capability. Based on 2025, the honest answer is: not yet. But we’re working on it.



Want to check an image?

Our detection engine analyzes synthetic patterns across 30+ generators. Free, private, and fast.

Try the detector