BIP America News & Media Platform

Google’s new anything-to-anything AI model is wild

May 25, 2026  Twila Rosenbaum

Google’s anything-to-anything AI: First impressions of Omni

Google has launched a new family of generative AI models called Omni, which the company claims will eventually be able to convert any type of input—text, photo, video, audio—into any other type of output. For now, Omni Flash, the first released model, focuses on video generation. Integrated into Google's Flow platform (the successor to Veo), Omni promises better character consistency, more real-world knowledge, and the ability to use a video clip as a starting point along with text prompts.

Allison Johnson, a senior reviewer at The Verge, put Omni through its paces by reviving her earlier experiment with her son’s stuffed deer, Buddy. She tested the model’s ability to create videos from text prompts, edit existing clips, and generate deepfake videos of herself. The results were, in her words, “a mixed bag so baffling.”

The test with Buddy the deer

Last year, Johnson used Google’s earlier Veo model to create vacation videos of Buddy—a stuffed deer—to replicate a Gemini ad. That experiment revealed both the potential and the flaws of AI video generation. With Omni, she returned to Buddy to see if improvements had been made. She uploaded a photo of Buddy and used prompts like “Buddy skydiving” or “Buddy packing for a cruise.”

Omni produced scenes that were more consistent than Veo’s: Buddy’s antlers (which he doesn’t have in real life) appeared less frequently, and the character’s form remained stable across multiple clips. Yet glitches persisted. In one skydiving clip, Buddy suddenly switched orientation mid-air. In a montage showing Buddy packing honey and later applying it as sunscreen, the bottle’s shape and contents changed jarringly from frame to frame. Johnson described the final frame of that video as “the model barfing up elements” from the sequence.

Text-based video editing: Improved but imperfect

Omni allows users to edit generated videos through text prompts. Johnson found this feature noticeably better than Veo 3, where edits often failed or ruined the video. With Omni, she could request “enhance Buddy’s facial reactions” or “remove the antlers” and see the model attempt to comply. However, the results were unpredictable. Emphasizing facial reactions made Buddy look strange and unnatural. Removing antlers from one scene caused the model to add them to all other scenes—a classic AI hallucination.

Editing is costly: each edit round costs 40 credits. Johnson’s $20-per-month AI Pro plan includes 1,000 credits. After generating about 20 clips with some edits, she had only 145 credits left. This makes iterative refinement expensive for average users.
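The arithmetic behind that figure can be sketched in a few lines of Python. This is a rough back-of-the-envelope check, not anything taken from Google's billing documentation: the 15-to-40-credit price per clip comes from the cost figures later in this article, "about 20 clips" is treated as exactly 20, and every variable name is made up for illustration.

```python
# Figures quoted in the article; all names here are illustrative.
PLAN_CREDITS = 1000     # monthly allowance on the AI Pro plan
CREDITS_LEFT = 145      # what Johnson had left after testing
EDIT_COST = 40          # credits per edit round
CLIPS = 20              # assumption: "about 20 clips" taken as exactly 20
CLIP_COST_MIN = 15      # cheapest clip, per the article's cost section
CLIP_COST_MAX = 40      # most expensive clip, per the same section

spent = PLAN_CREDITS - CREDITS_LEFT

# Credits that must have gone to edits, depending on how pricey the clips were.
edit_credits_low = spent - CLIPS * CLIP_COST_MAX
edit_credits_high = spent - CLIPS * CLIP_COST_MIN

# Whole edit rounds those credits could pay for.
min_edit_rounds = edit_credits_low // EDIT_COST
max_edit_rounds = edit_credits_high // EDIT_COST

print(f"credits spent: {spent}")
print(f"edit rounds: between {min_edit_rounds} and {max_edit_rounds}")
```

Under those assumptions, the 855 credits spent would cover 20 clips plus somewhere between 1 and 13 edit rounds, which is consistent with "some edits."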

Deepfake me: Omni’s real-world video augmentation

One of Omni’s touted strengths is inserting AI-generated elements into real video. Johnson tested this by providing a selfie video (neutral expression) and prompting Omni to create clips of herself eating spaghetti, sitting in an airplane seat, and posing in front of the Eiffel Tower biting a baguette. The results were “convincing as hell,” she wrote. The spaghetti-eating clip fooled her husband, who has seen her daily for a decade—his only clue was an unfamiliar bowl. The airplane video had a background figure appearing twice, and the fork-clink sounded manufactured. But for social media, such tells would likely go unnoticed.

The Eiffel Tower deepfakes varied: some clips looked cartoonish, but one was nearly indistinguishable from reality. The only tell was that the AI version had her hair in a ponytail when she normally wears it down. Johnson noted that she herself needed multiple viewings to spot the AI, which highlights how close these models are to producing undetectable fake footage.

Background and implications

Omni represents the latest step in Google’s aggressive push into generative AI, following the Veo series and the broader Gemini ecosystem. Google first teased the “anything-to-anything” concept at Google I/O 2024, and the company has been iterating rapidly. The model is built on a multimodal architecture that integrates understanding of text, images, and video. While Google promises eventual support for audio input and output, the current release focuses on text-to-video and video-to-video tasks.

The technology raises significant ethical and practical questions. Generative AI video tools are already used for harmless fun (like Buddy’s vacations) but also for misinformation and deepfake scams. As realism improves, the line between genuine and synthetic content blurs. Johnson’s experience suggests that while Omni is not yet perfect, its output is good enough to fool casual observers, and even close family members. “We’re definitely deep in the uncanny valley,” she concluded.

Cost and accessibility

Omni is not free. Users need a Google account and a paid AI Pro subscription ($20/month for 1,000 credits). Video generation costs between 15 and 40 credits per clip, depending on length and complexity. Editing adds more cost. For professionals who need high-quality video quickly, this may be acceptable. But for casual users, the expense quickly adds up, potentially limiting adoption to those with a dedicated budget for AI tools.
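To make the budget concrete, here is a small Python sketch of what a month of credits buys. It assumes the prices quoted above hold exactly and that each edit round costs the 40 credits mentioned earlier in the article. The function name and the idea of a fixed number of edits per clip are illustrative assumptions, not part of any Google API.

```python
PLAN_PRICE_CENTS = 2000   # $20 per month, expressed in cents
PLAN_CREDITS = 1000       # credits included in the plan
EDIT_COST = 40            # credits per edit round (from the article)


def clips_per_month(clip_cost, edits_per_clip=0):
    """Whole clips one month of credits covers (hypothetical helper)."""
    credits_per_clip = clip_cost + edits_per_clip * EDIT_COST
    return PLAN_CREDITS // credits_per_clip


# Effective price of a single credit, in cents.
cents_per_credit = PLAN_PRICE_CENTS // PLAN_CREDITS

for clip_cost in (15, 40):  # cheapest and most expensive clip
    unedited = clips_per_month(clip_cost)
    one_edit = clips_per_month(clip_cost, edits_per_clip=1)
    price_cents = clip_cost * cents_per_credit
    print(f"{clip_cost}-credit clip: about {price_cents} cents each, "
          f"{unedited} per month unedited, {one_edit} with one edit apiece")
```

At these prices a plan covers roughly 25 to 66 unedited clips a month, or about 12 to 18 if every clip gets one edit round.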

Comparison with competitors

Omni enters a crowded field. Competitors include OpenAI’s Sora, Runway Gen-3, Pika Labs, and others. Sora has shown impressive realism but remains in limited beta. Runway and Pika offer text-to-video with varying degrees of consistency. Google’s advantage lies in its integration with Gemini and the ability to use existing video as input. However, the credit system and lingering artifacts may put Omni behind the curve. Johnson’s tests suggest Omni’s character consistency is better than Veo’s but still not reliable enough for professional productions.

Another differentiator is Omni’s promise of an “any-to-any” future in which audio and even code could be converted into one another. This would make Google’s model uniquely versatile, but the timeline for broader modalities remains unclear.

Reaction from the tech community

Johnson’s review garnered attention for its honest assessment. Many readers appreciated the hands-on, real-world testing rather than a corporate demo. The deepfake section sparked discussion about privacy and consent. Johnson’s willingness to deepfake herself (with permission) demonstrated how easily the technology can target individuals. The review also noted that Google has not yet implemented robust safeguards against misuse, beyond basic content policies.

Some critics argue that releasing such powerful tools without tighter guardrails is irresponsible. Conversely, proponents point to the creative potential. Johnson herself acknowledged the exhaustion of being repeatedly amazed by AI’s rapid improvement: “The edge has worn off.”

Technical details of Omni Flash

Omni Flash is a distilled version of the full Omni model, optimized for speed and cost. It runs on Google’s TPU v5 infrastructure. The model uses a diffusion transformer architecture with cross-attention layers to handle multimodal inputs. According to Google, it “incorporates more real-world knowledge” by training on a larger, curated dataset that includes object interactions, physics, and human poses. This may explain why Buddy’s body stayed consistent, but the honey bottle still changed—indicating that the model struggles with object permanence across scenes.
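For readers unfamiliar with the term, cross-attention is the mechanism by which one set of tokens (say, video patches being generated) looks up information from another set (say, the text prompt). The article gives no implementation details for Omni, so the following is only a generic textbook sketch of scaled dot-product cross-attention in plain Python. It omits the learned projection matrices a real model would use, and all names are illustrative.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention over lists of vectors.

    queries: vectors from the stream being generated (e.g. video tokens)
    keys, values: vectors from the conditioning input (e.g. prompt tokens)
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every conditioning token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Two identical keys get equal weight, so the output is the mean of the values.
print(cross_attention([[1.0, 0.0]],
                      [[1.0, 0.0], [1.0, 0.0]],
                      [[2.0, 0.0], [4.0, 2.0]]))
```

Each output vector is a blend of the conditioning values, weighted by how well the query matches each key. That is the sense in which a prompt token can steer a given patch of video.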

Johnson’s request to “remove the antlers” led to the model adding antlers elsewhere, a behavior sometimes described as “prompt overshooting,” in which the model overcompensates for an instruction. Such issues are common in generative models and often call for careful rewording of the prompt.

Future outlook

Google is expected to roll out Omni across more products, including Google Photos, YouTube, and perhaps even Android. At Google I/O 2026, the company hinted at a “universal media layer” where AI seamlessly transforms content. But as Omni shows, the path to that vision is littered with small but persistent errors. Johnson’s final word: “It’s still not quite as easy to make an AI-generated cinematic masterpiece as Google would like you to believe. But Omni does improve on Veo in recognizable ways.”

The real test will come when the model is widely available and the public begins experimenting. For now, Omni stands as a fascinating, flawed step toward a future where any content can be turned into any other—with all the promise and peril that entails.


Source: The Verge News

