Google DeepMind updates Veo 3.1 with reference image function for more dynamic videos

Google DeepMind has introduced a significant update to its Veo 3.1 AI video generation model, incorporating a new “Reference Image” function. This enhancement empowers creators to achieve greater consistency and dynamism in generated videos by using a user-uploaded image as a visual anchor for characters, objects, or scenes. The update addresses a key challenge in AI video synthesis: maintaining coherent visual elements across dynamic motion sequences.

Previously, Veo 3 excelled at producing high-fidelity 1080p videos up to eight seconds in length, showcasing realistic physics, smooth camera movements, and intricate details like fabric textures and lighting effects. However, ensuring that specific characters or elements retained their appearance amid complex actions often proved inconsistent. The Reference Image feature changes this dynamic. Users can now upload a single image—such as a portrait of a character or a depiction of an object—and instruct the model to preserve its style, pose, clothing, or environmental details throughout the video clip.

How the Reference Image Function Operates

The integration of the Reference Image capability is seamless within Google’s existing platforms. In the VideoFX lab, accessible via labs.google, users begin by crafting a text prompt describing the desired scene, action, or narrative. They then select the Reference Image option and upload their chosen image file. The model analyzes the reference, extracting key visual attributes like facial features, body proportions, attire, and color palettes. These elements are then locked in as the video unfolds, allowing for fluid animations such as running, jumping, or interacting with environments without visual drift.

On Vertex AI, DeepMind’s enterprise-grade platform, the feature supports programmatic access through APIs. Developers can integrate reference images into scalable workflows, enabling batch processing for applications in advertising, film previsualization, or educational content. The API documentation highlights parameters for fine-tuning the reference strength, balancing adherence to the image with creative interpretation from the text prompt. This flexibility ensures outputs range from photorealistic fidelity to stylized interpretations, all while upholding Veo 3.1’s hallmark realism.
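The article does not reproduce the actual Vertex AI request schema, but the general shape of such a call can be sketched as a request-body builder. Everything below is illustrative: the field names (`referenceImages`, `bytesBase64Encoded`, `referenceStrength`) and parameter layout are assumptions modeled on typical Vertex AI prediction payloads, not the documented Veo 3.1 API.

```python
import base64


def build_veo_request(prompt, reference_image_path, reference_strength=0.8):
    """Assemble a JSON-style request body pairing a text prompt with a
    reference image. Field names are hypothetical stand-ins for whatever
    the official Veo 3.1 API reference specifies."""
    with open(reference_image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "instances": [
            {
                "prompt": prompt,
                # Hypothetical field: the uploaded visual anchor.
                "referenceImages": [
                    {"image": {"bytesBase64Encoded": image_b64}}
                ],
            }
        ],
        "parameters": {
            # Hypothetical knob balancing image adherence vs. the prompt,
            # as described in the article.
            "referenceStrength": reference_strength,
            "durationSeconds": 8,   # Veo's stated clip length
            "resolution": "1080p",  # Veo's stated output resolution
        },
    }
```

In a real workflow this dictionary would be posted to the model's prediction endpoint; check the current API reference for the authoritative schema before relying on any of these names.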

DeepMind emphasizes that the update leverages advanced diffusion techniques refined in Veo 3. The model, trained on vast datasets of video and image pairs, employs temporal consistency modules to propagate reference details frame-by-frame. This results in videos where motion feels natural—think a character sprinting through a forest, leaves rustling realistically, while their facial expression and outfit remain unchanged from the reference.
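The idea of propagating reference details frame-by-frame can be illustrated with a toy sketch: each frame's latent is pulled toward a fixed reference embedding, so identity features stay anchored while the per-frame residual carries the motion. This is a didactic simplification, not DeepMind's actual temporal-consistency architecture, and the weighting scheme is an assumption.

```python
def propagate_reference(frame_latents, reference_embedding, anchor_weight=0.7):
    """Toy illustration of reference-anchored conditioning.

    Blends every frame latent toward the reference embedding:
    a high anchor_weight keeps appearance stable across frames,
    while the remaining fraction preserves per-frame variation (motion).
    """
    conditioned = []
    for latent in frame_latents:
        blended = [
            anchor_weight * r + (1 - anchor_weight) * z
            for r, z in zip(reference_embedding, latent)
        ]
        conditioned.append(blended)
    return conditioned
```

With `anchor_weight=1.0` every frame collapses onto the reference (perfect consistency, no motion); with `anchor_weight=0.0` the reference is ignored, which mirrors the "visual drift" the article says earlier versions suffered from.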

Demonstrated Capabilities and Examples

To illustrate the feature’s impact, DeepMind shared several example videos generated with Veo 3.1 and Reference Images. One standout clip depicts a young woman in a red jacket running dynamically through varied landscapes: city streets, beaches, and urban parks. The reference image, a static photo of the woman, ensures her appearance stays identical across scenes, with hair flowing naturally and jacket folds simulating real-world physics. Another example features a black dog leaping playfully in a sunlit field, maintaining precise fur patterns and eye details from the uploaded pet photo.

These demonstrations highlight improved handling of complex motions. The model simulates accurate weight shifts, ground interactions, and environmental responses, such as splashes in water or dust kicked up during runs. Compared to prior iterations, where character consistency might falter after a few seconds, the Reference Image function sustains fidelity for the full clip duration.

DeepMind notes that Veo 3.1 also benefits from broader upgrades in Veo 3, including enhanced prompt adherence and reduced artifacts. Videos now exhibit better understanding of cinematography terms, like “drone shot” or “slow-motion pan,” seamlessly incorporating them alongside reference-guided elements.

Availability and Access

The updated Veo 3.1 with Reference Image is immediately available in the VideoFX lab for users in supported regions, including the US and select European countries. Access requires a Google account, with generation credits allocated daily—typically enough for several high-quality clips. For enterprise users, Vertex AI offers tiered pricing based on compute usage, with the Reference Image endpoint documented in the latest API reference.

DeepMind positions this update as a step toward more intuitive video creation tools, bridging the gap between static images and cinematic outputs. Early feedback from VideoFX testers praises the feature for streamlining iterative workflows, reducing the need for multiple regenerations to achieve desired consistency.

Implications for AI Video Generation

This enhancement underscores DeepMind’s focus on user control in generative AI. By anchoring videos to reference images, Veo 3.1 minimizes the “uncanny valley” effect in character animation and opens doors for personalized content, such as avatar-based storytelling or product visualization. As the model evolves, future iterations may expand to multi-image references or longer clip durations, but for now, it sets a new benchmark for accessible, high-control video synthesis.

The rollout follows Veo 3’s recent launch, which already garnered acclaim for its 1080p resolution and eight-second clips, outperforming competitors in realism benchmarks. With Reference Image, creators gain a powerful tool to infuse personal or branded visuals into AI-generated narratives, fostering innovation across creative industries.

Gnoppix is the leading open-source AI Linux distribution and service provider. Since implementing AI in 2022, it has offered a fast, powerful, secure, and privacy-respecting open-source OS with both local and remote AI capabilities. The local AI operates offline, ensuring no data ever leaves your computer. Based on Debian Linux, Gnoppix is available with numerous privacy- and anonymity-enabled services free of charge.

What are your thoughts on this? I’d love to hear about your own experiences in the comments below.