Runway’s Gen-4.5 edges past Google and OpenAI in text-to-video benchmark

In a significant advancement for AI-driven video synthesis, Runway’s newly released Gen-4.5 model has claimed the top spot on the text-to-video benchmark leaderboard, surpassing established competitors from Google and OpenAI. According to results published by Artificial Analysis, Gen-4.5 achieved an overall score of 71.5 out of 100, edging out Google’s Veo 2 at 66.5 and OpenAI’s Sora at 64.3. This positions Runway ahead of other prominent models, including Luma’s Dream Machine V2 (65.3), Kling 1.6 (63.8), and Pika 1.5 (62.1).

The benchmark, known as VBench-2.0, evaluates text-to-video models across 12 distinct dimensions critical to high-quality output. These include subject consistency, background consistency, motion quality, spatial relationships, temporal flickering, and more specialized criteria such as camera movement emulation, dynamic color changes, and 3D object awareness. VBench-2.0 builds on its predecessor by incorporating advanced video understanding models like Google’s Video-MME and Qwen2.5-VL to provide more precise scoring, addressing limitations in human evaluations that can introduce bias.
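
To make the idea of a multi-dimensional leaderboard concrete, here is a minimal illustrative sketch of rolling per-dimension scores into a single overall number. The dimension names, values, and equal weighting below are placeholders, not VBench-2.0’s published methodology or data.

```python
# Illustrative only: the dimension names, scores, and equal weighting below are
# placeholders, not VBench-2.0's published methodology or data.
DIMENSION_SCORES = {
    "subject_consistency": 78.3,
    "background_consistency": 76.0,
    "motion_quality": 80.1,
    "spatial_relationships": 76.5,
    "temporal_flickering": 74.2,
    "camera_movement": 72.8,
}

def overall_score(scores: dict[str, float]) -> float:
    """Average per-dimension scores (0-100) into one leaderboard score."""
    return sum(scores.values()) / len(scores)

print(f"Overall: {overall_score(DIMENSION_SCORES):.1f}")
```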

Gen-4.5’s dominance is particularly evident in motion-related metrics, where it scored 80.1—well above Veo 2’s 75.2 and Sora’s 72.4. This reflects the model’s superior handling of smooth, natural movements, avoiding common artifacts like unnatural distortions or jerky transitions. In temporal consistency, Gen-4.5 earned 78.3, demonstrating robust frame-to-frame coherence even in complex scenes involving multiple interacting elements. Spatial quality followed closely at 76.5, with strong performance in maintaining accurate proportions, lighting, and depth perception.

Comparatively, while Veo 2 excels in camera control (scoring 82.1) and dynamic framing (79.4), it lags in overall integration, particularly in multi-subject interactions and background stability. Sora, a pioneer in the space, shines in 3D structure understanding (77.2) but struggles with temporal flickering (68.1), leading to occasional visual glitches in longer clips. Luma’s Dream Machine V2 performs admirably in color vividness (74.3) but falls short in subject consistency (62.7). Kling 1.6, from Kuaishou, leads in some aesthetic categories like character details (75.6) but averages lower due to inconsistencies in motion expansion.

Runway’s CEO, Cristóbal Valenzuela, highlighted the iterative progress behind Gen-4.5 during a recent announcement, noting that it represents “a leap in cinematic quality” achieved through extensive training on diverse video datasets and refined diffusion architectures. From a text prompt, the model generates clips of up to 10 seconds at 1080p resolution, with extensions possible via Runway’s image-to-video and video-to-video tools. Early testers praise its ability to generate hyper-realistic human motion, intricate physics simulations, and seamless scene transitions, as demonstrated in sample clips like a “chef flipping pancakes in slow motion” or “a surfer riding a massive wave at golden hour.”
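
As a rough illustration of what driving such a model programmatically might look like, the sketch below submits a text prompt over HTTP. The endpoint URL, field names, and response shape are assumptions made purely for illustration; Runway’s actual API may differ, so consult its documentation.

```python
# Hypothetical text-to-video request. The endpoint URL, field names, and
# response format are assumptions, not Runway's documented API.
import os
import requests

API_KEY = os.environ["RUNWAY_API_KEY"]                 # assumed env variable
ENDPOINT = "https://api.example.com/v1/text_to_video"  # placeholder URL

payload = {
    "model": "gen-4.5",                                # assumed identifier
    "prompt": "a surfer riding a massive wave at golden hour",
    "duration_seconds": 10,   # article cites a 10-second cap
    "resolution": "1080p",    # article cites 1080p output
}

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # assumed to return a job ID or a URL to the finished clip
```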

Access to Gen-4.5 remains limited, available only to a select group of creators through Runway’s API platform. Pricing starts at $0.05 per second of generated video, with higher tiers offering priority queueing and longer durations. This controlled rollout allows Runway to gather feedback and iterate rapidly, a strategy that has propelled the company from Gen-3 Alpha, which debuted earlier this year, to this benchmark-topping iteration in mere months.
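
At the quoted base rate, budgeting a batch of clips is straightforward arithmetic; the short sketch below uses only the $0.05-per-second figure cited above (higher-tier rates are not specified in the article).

```python
# Cost estimate using the article's quoted base rate of $0.05 per generated second.
BASE_RATE_USD_PER_SECOND = 0.05

def estimated_cost(clip_seconds: float, num_clips: int = 1) -> float:
    """Estimated charge for num_clips clips of clip_seconds each, at the base rate."""
    return clip_seconds * num_clips * BASE_RATE_USD_PER_SECOND

# Example: twenty 10-second clips at the base rate.
print(f"${estimated_cost(10, 20):.2f}")  # $10.00
```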

The implications of Gen-4.5’s lead extend beyond raw scores. As text-to-video technology matures, benchmarks like VBench-2.0 become essential for quantifying progress amid a proliferation of models. Runway’s edge underscores the value of specialized fine-tuning for video-specific challenges, such as preserving narrative flow and emotional expressiveness. Industry observers note that while Google and OpenAI boast massive resources, Runway’s agility in model releases gives it a competitive foothold.

Challenges persist across the board. Even the strongest models clear 80 in only a few categories, and most scores sit well below that mark, revealing room for improvement in areas like multi-turn editing and real-world physics accuracy. Ethical considerations, including deepfake risks and copyright in training data, also loom large, though Runway emphasizes responsible AI practices with built-in safeguards.

As the race intensifies, Gen-4.5 sets a new standard, signaling that text-to-video generation is approaching production-ready fidelity for filmmakers, advertisers, and educators alike.

What are your thoughts on this? I’d love to hear about your own experiences in the comments below.