Lights, Camera, AI
The year video generation went from party trick to Sundance premiere
“Cinema is truth twenty-four times a second. What is AI video?”
— Adapted from Jean-Luc Godard
It is useful to remember how recently AI video was a joke. In early 2024, OpenAI’s Sora demos showed people with melting faces walking through streets that warped like funhouse mirrors. Runway’s Gen-2 produced clips where physics was more suggestion than law. The consensus was clear: AI could generate still images that fooled people, but motion—coherent, physically plausible motion—was years away.
The consensus was wrong by about eighteen months.
At the Sundance Film Festival in January 2026, multiple finished short films made with AI video generation tools premiered. They were not novelties screened in a “future of cinema” sidebar. They were films. With narratives. With emotional arcs. With characters that maintained consistent appearances across scenes. The temporal consistency problem—the “jitter” that made early AI video look like a fever dream—had been solved.
Three systems converged to make this possible, and each took a different approach.
OpenAI’s Sora 2 shipped to select users in the U.S. and Canada at the end of September 2025. Its January 2026 update introduced “character cameos”—persistent character embeddings that maintain identity across scenes. Pair that with synchronized audio generation (dialogue and sound effects, not just background music) and durations of up to twenty-five seconds, and you have something that starts to feel like a shot, not a clip. Sora currently ranks seventh on the AI video leaderboard. Seventh. That’s how fast the field is moving.
Runway Gen-4.5, released in December 2025, tops that leaderboard. Its core capability is what the company calls “infinite character consistency”—the ability to maintain a character’s appearance, clothing, and mannerisms across an arbitrary number of generated shots. For filmmakers, this is everything. Consistency is what separates a video from a film. It’s what allows you to cut between angles without the viewer’s brain flagging something as wrong. Runway solved it.
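To make “consistency” something you can measure rather than merely assert, here is a minimal sketch of how it might be scored. This is not Runway’s method, and the names (embed_character, consistency_score) are invented for the example: crop the same character from several shots, embed each crop with whatever image encoder you have, and check that the embeddings stay close.

    import numpy as np

    def embed_character(crop: np.ndarray) -> np.ndarray:
        # Stand-in for a real image-embedding model (a face or CLIP-style
        # encoder). Flattening and normalizing pixels keeps the sketch
        # self-contained; swap in an actual model for meaningful scores.
        v = crop.astype(np.float32).ravel()
        return v / (np.linalg.norm(v) + 1e-8)

    def consistency_score(character_crops: list[np.ndarray]) -> float:
        # Mean pairwise cosine similarity between crops of the same character
        # taken from different shots. Values near 1.0 suggest the appearance
        # is stable; low values flag identity drift between shots.
        embs = [embed_character(c) for c in character_crops]
        sims = [float(np.dot(embs[i], embs[j]))
                for i in range(len(embs)) for j in range(i + 1, len(embs))]
        return float(np.mean(sims)) if sims else 1.0

    if __name__ == "__main__":
        # Toy example: three "shots" of one character with small variations.
        rng = np.random.default_rng(0)
        base = rng.random((64, 64, 3))
        shots = [np.clip(base + rng.normal(0, 0.02, base.shape), 0, 1)
                 for _ in range(3)]
        print(f"consistency: {consistency_score(shots):.3f}")

Swap in a real encoder and this becomes a crude quality gate for multi-shot generation, which is roughly the bar “infinite character consistency” has to clear on every cut.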
Google’s Veo 3, released in May 2025, took a different tack. Its defining feature is natively synchronized audio: not audio generated separately and aligned in post, but audio that emerges from the same generation process as the video. Dialogue, sound effects, ambient noise—all produced as a unified output. Veo 3.1 added vertical video for YouTube Shorts and upscaling to 4K. Over seventy million videos have been created since launch. Google, being Google, watermarks every single one with SynthID.
And then there are the specialists. Pika’s “Pikaformance” model generates hyper-realistic facial expressions synced to any sound input at near real-time speeds. Kling 2.0, from the Chinese company Kuaishou, delivers cinematic 1080p quality with lip-sync capabilities that make dubbing look native. The ecosystem is deep, diverse, and moving fast enough to give anyone in the detection space vertigo.
But let’s talk about what hasn’t been solved.
The jitter problem was technical. Character consistency was technical. Audio synchronization was technical. Technical problems, given sufficient compute and talent, get solved. The problems that remain are not technical. They are artistic, and they are epistemological.
The artistic problem: AI video generation excels at producing footage that looks like other footage. It is, by training and by design, a recombination engine. Give it a prompt that references existing cinematic language—“close-up, golden hour, shallow depth of field”—and it will produce something competent. But competent is not interesting. Interesting requires intent. It requires a filmmaker who chooses this angle instead of that one not because the training data suggests it, but because it serves a story only they can tell.
The Sundance films that worked—the ones that felt like films rather than demos—worked because human directors used AI generation as a production tool, not a replacement for direction. They chose the shots. They edited the sequence. They imposed meaning on the generated footage. The AI provided the raw material. The humans provided the vision. When that relationship inverts—when the prompt is the direction and the output is the film—you get something technically impressive and artistically vacant.
The epistemological problem is darker. We have spent more than a century developing a shared understanding that moving images are records of events. Newsreel footage, documentary film, smartphone video of police encounters—all of these carry evidentiary weight because we trust that a camera was present and recording. AI video generation destroys that trust by producing footage of events that never occurred, indistinguishable from footage of events that did.
This is not hypothetical. AI-generated videos of Venezuelan celebrations circulated to millions in January 2026. They were not obviously fake. They would not have been obviously fake to a trained eye. They were detected by systems like ours that analyze generation artifacts below the threshold of human perception. But detection after viral spread is damage control, not prevention.
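What does analyzing “generation artifacts below the threshold of human perception” actually involve? The sketch below shows one family of heuristics, the spectral statistics of individual frames, purely as an illustration. It is not our production pipeline, the function names are invented for this example, and no single signal of this kind decides anything on its own.

    import numpy as np

    def high_freq_energy_ratio(frame: np.ndarray, cutoff: float = 0.25) -> float:
        # Fraction of a frame's spectral energy above a radial frequency cutoff.
        # Some generation pipelines leave unusual high-frequency statistics
        # (upsampling patterns, over-smooth regions), so an atypical ratio across
        # many frames is one weak signal among many, never proof on its own.
        gray = frame.mean(axis=2) if frame.ndim == 3 else frame
        spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
        h, w = gray.shape
        yy, xx = np.mgrid[0:h, 0:w]
        # Normalized radial distance from the center of the shifted spectrum.
        r = np.hypot((yy - h / 2) / (h / 2), (xx - w / 2) / (w / 2)) / np.sqrt(2)
        return float(spectrum[r > cutoff].sum() / (spectrum.sum() + 1e-12))

    def score_clip(frames: list[np.ndarray]) -> float:
        # Average the per-frame signal. A real system also weighs temporal
        # residuals, compression history, and provenance marks such as SynthID.
        return float(np.mean([high_freq_energy_ratio(f) for f in frames]))

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        clip = [rng.random((128, 128, 3)) for _ in range(8)]
        print(f"mean high-frequency ratio: {score_clip(clip):.3f}")

The specific statistic matters less than the principle: generators leave measurable traces, and the traces live in places the eye never looks.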
The numbers tell us where this is heading. Seventy million videos generated through Veo alone. Millions more through Sora, Runway, Kling, and Pika. The volume of synthetic video will soon rival the volume of synthetic images. And if the image detection landscape is any guide—where thirty-two percent of social media images already show evidence of AI augmentation—the video landscape will become equally contested, equally unreliable, equally in need of verification infrastructure.
Jean-Luc Godard said cinema is truth twenty-four times a second. AI video is prediction sixty times a second. The difference is everything.