Google I/O 2024’s keynote session was a 112-minute affair in which the company made several major announcements focused on artificial intelligence (AI). The announcements ranged from new AI models to the integration of AI into Google products, but perhaps one of the most interesting introductions was Veo, an AI-powered video generation model that can generate 1080p videos. The tech giant said that the AI tool can generate videos that go beyond the one-minute mark. Notably, OpenAI also unveiled its video AI model, dubbed Sora, in February.
During the event, Demis Hassabis, co-founder and CEO of Google DeepMind, unveiled Veo. Announcing the AI model, he said, “Today, I’m excited to announce our newest and most capable generative video model called Veo. Veo creates high-quality 1080p videos from text, image and video prompts. It can capture the details of your instructions in different visual and cinematic styles.”
The tech giant claims that Veo can closely follow prompts, understanding the nuance and tone of a phrase and then generating a video that matches it. The AI model can generate videos in different styles, such as timelapses, close-ups, fast-tracking shots, aerial shots, and shots with varied lighting and depth of field. Apart from video generation, the AI model can also edit videos when the user provides it with an initial video and a prompt to add or remove something. Further, it can generate videos beyond the one-minute mark, either through a single prompt or via multiple sequential prompts.
To address the problem of consistency in video generation models, Veo uses latent diffusion transformers. This helps reduce instances of characters, objects, or the entire scene flickering, jumping, or morphing unexpectedly between frames. Google highlighted that videos created by Veo will be watermarked using SynthID, the company’s in-house tool for watermarking and identifying AI-generated content. The model will soon be available to select creators via the VideoFX tool at Google Labs.
Veo’s similarities with OpenAI’s Sora
While neither of the AI models is available to the public yet, the two share several similarities. Veo can generate 1080p videos that can surpass one minute in duration, whereas OpenAI’s Sora can generate videos of up to 60 seconds. Both models can generate videos from text prompts, images, and videos. Both are based on diffusion models and are capable of generating videos with multiple shots, styles, and cinematography techniques. Both Sora and Veo also come with AI-generated content labels. Sora uses the Coalition for Content Provenance and Authenticity (C2PA) standard, while Veo uses Google’s native SynthID.