Generate photoreal AI video with synced audio from a single image or prompt with Grok Imagine 1.5, the #1 video model on the Image-to-Video Arena leaderboard.
#1 on the Image-to-Video Arena
Released by xAI on May 31, 2026, Grok Imagine 1.5 is a 52-Elo jump over the 1.0 model in Arena.ai's blind testing — enough to clear ByteDance's Seedance 2.0, Google's Veo, and Alibaba's HappyHorse and take the top spot on the public Image-to-Video leaderboard. The wins are concentrated where it matters: audio realism, frame-to-frame coherence, and faces that hold their identity through whole shots.
Up to 12 reference files in one generation
Grok Imagine 1.5 takes four kinds of input at once: up to 9 images, up to 3 video clips (≤15 seconds total), up to 3 audio files, and your text prompt. Mix and match up to 12 files across modalities to lock in characters, places, props, music, and a voice in a single shot — no chaining tools, no stitching results afterwards.
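If you prepare reference sets in a script before uploading, the limits above can be checked up front. The sketch below is only an illustration: the `Reference` class, `check_references` function, and constant names are hypothetical and not part of any Overchat AI or xAI API. The numbers come from the limits stated on this page.

```python
from dataclasses import dataclass

# Limits as stated on this page. All names here are hypothetical.
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO = 3
MAX_TOTAL_FILES = 12
MAX_VIDEO_SECONDS = 15.0  # combined length of all video clips


@dataclass
class Reference:
    kind: str             # "image", "video", or "audio"
    seconds: float = 0.0  # duration; only checked for video clips


def check_references(refs):
    """Return a list of limit violations; an empty list means the set fits."""
    problems = []
    counts = {"image": 0, "video": 0, "audio": 0}
    video_seconds = 0.0
    for ref in refs:
        if ref.kind not in counts:
            problems.append(f"unknown reference kind: {ref.kind}")
            continue
        counts[ref.kind] += 1
        if ref.kind == "video":
            video_seconds += ref.seconds

    if counts["image"] > MAX_IMAGES:
        problems.append(f"too many images: {counts['image']} > {MAX_IMAGES}")
    if counts["video"] > MAX_VIDEOS:
        problems.append(f"too many video clips: {counts['video']} > {MAX_VIDEOS}")
    if counts["audio"] > MAX_AUDIO:
        problems.append(f"too many audio files: {counts['audio']} > {MAX_AUDIO}")
    if video_seconds > MAX_VIDEO_SECONDS:
        problems.append(f"video total too long: {video_seconds}s > {MAX_VIDEO_SECONDS}s")
    if len(refs) > MAX_TOTAL_FILES:
        problems.append(f"too many files: {len(refs)} > {MAX_TOTAL_FILES}")
    return problems
```

Note that the per-type caps add up to 15, so maxing out every type at once still breaks the 12-file total. For example, 9 images, 2 video clips, and 1 audio file fit, but 9 images, 3 clips, and 3 audio files do not.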
Native audio you don't have to add later
1.5's headline upgrade is audio: ambient soundscapes, room tone, music, and dialogue render in the same pass as the picture. Wrap any sound or line of dialogue in quotation marks inside your prompt and it lands in the final clip already mixed — no Premiere session, no separate text-to-audio model, no manual sync.
Photoreal faces with character consistency
Faces in 1.5 hold their identity across an entire shot, a long-standing weak point of earlier video models. Bring your own voice track or a celebrity reference and the same character moves, blinks, and lip-syncs the whole way through. Pair an uploaded audio file with an image and the speaker's lips track the dialogue automatically.
What can you create with Grok Imagine 1.5? Get inspired with these ideas:
Short-form video that sounds finished
TikToks, Reels, and Shorts where the SFX, music, and any voiceover render in-frame with the video — no separate audio pass, no manual sync, no exported timeline.
AI shorts with real dialogue
Write a script, drop a character image, and Grok Imagine 1.5 generates the scene with the actor lip-syncing the line. Character consistency holds across multiple shots, so the same face shows up the same way every time.
YouTube b-roll and explainers
Travel cutaways, product demos, explainer visuals, talking-head intros — photoreal video with synced ambient sound, ready to drop on the timeline next to your own footage.
Talking-head clips in your own voice
Upload a photo of yourself plus a voice recording and Grok Imagine 1.5 animates your face speaking the line, with lip-sync, expression, and head movement handled in one pass. Perfect for course intros, founder updates, or social posts when you can't get in front of a camera.
Product marketing video
Upload product photos as references, write the scene in plain English, and get a polished ad ready for the landing page — with the right ambient sound and a voiceover that matches the brand if you want one.
Cinematic photoreal scenes
Up to 12 reference slots let you lock in real places, real props, and real actors, then combine them with anything imaginary. Faces and lighting hold across the whole shot, the kind of consistency that used to require a full VFX team.
Create AI videos with Grok Imagine 1.5 in 3 simple steps
Describe Your Video
Write your prompt and optionally drop in references — up to 9 images, 3 video clips, 3 audio tracks, or any mix up to 12 files total.
AI Generates The Video
Grok Imagine 1.5 generates the video with synced ambient sound, dialogue, and music — typically in under a minute.
Download and Use
Get your video ready to share, post, or integrate into your projects.
What is Grok Imagine 1.5?
Grok Imagine 1.5 is xAI's flagship AI video generator, released on May 31, 2026. It turns a text prompt or a single image into a short photoreal video with synced audio — ambient sound, music, and dialogue all rendered in the same pass. As of release, it sits at #1 on the public Image-to-Video Arena leaderboard, ahead of ByteDance's Seedance 2.0, Google's Veo, and Alibaba's HappyHorse.
How do I use Grok Imagine 1.5?
Sign in to Overchat AI with email, Google, or Apple, open the video generator, and pick Grok Imagine 1.5 from the model dropdown. Write your prompt, optionally attach up to 12 reference files (images, video clips, audio tracks), and generate. Clips typically come back in under a minute with audio already mixed in. No API key, no xAI account, no install.
What's the difference between Grok Imagine 1.5 and Grok Imagine 1.0?
Grok Imagine 1.5 scores +52 Elo over 1.0 in Arena.ai blind tests. The biggest gains are in audio (native generation instead of a tacked-on track), character consistency (the same face holds across the whole shot), photorealism, and lip-sync accuracy. Practically: 1.0 felt like a video model with audio bolted on; 1.5 feels like one model generating both.
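To get a feel for what a 52-point gap means, here is a rough sketch using the standard Elo expected-score formula with the usual 400-point scale. This assumes the leaderboard's ratings follow that convention, which this page does not confirm, so treat the result as a ballpark figure. The function name is illustrative.

```python
def expected_win_rate(elo_gap, scale=400.0):
    """Standard Elo expected score for the higher-rated side.

    Assumes the conventional 400-point logistic scale; the leaderboard's
    exact rating method is not stated here.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / scale))


print(f"{expected_win_rate(52):.1%}")  # prints 57.4%
```

Under that assumption, 1.5 would be preferred over 1.0 in roughly 57 of every 100 head-to-head blind comparisons.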
Can Grok Imagine 1.5 generate video with sound?
Yes — native audio is the headline upgrade in 1.5. Write what you want into the prompt and wrap any specific sound effect or line of dialogue in quotation marks; the model renders and mixes it inline. You can also upload your own audio (up to 3 tracks per generation) and the model will sync the speaker's lips to your dialogue track automatically.
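If you assemble prompts in a script, the quotation-mark convention can be applied consistently with a small helper. This is a minimal sketch: `build_prompt` and `quote` are hypothetical names, and the "says" and "Sound:" phrasing is just one possible wording. The only rule taken from this page is that dialogue and specific sounds go inside quotation marks.

```python
def quote(text):
    """Wrap text in double quotes, swapping inner double quotes for single ones."""
    inner = text.strip().replace('"', "'")
    return f'"{inner}"'


def build_prompt(scene, dialogue=(), sounds=()):
    """Join a scene description with quoted dialogue lines and sound effects.

    dialogue: iterable of (speaker, line) pairs
    sounds:   iterable of sound-effect descriptions
    """
    parts = [scene.strip()]
    for speaker, line in dialogue:
        parts.append(f"{speaker} says {quote(line)}")
    for sound in sounds:
        parts.append(f"Sound: {quote(sound)}.")
    return " ".join(parts)


prompt = build_prompt(
    "A barista hands over a coffee in a busy cafe.",
    dialogue=[("The barista", "Here you go, enjoy!")],
    sounds=["espresso machine hiss"],
)
print(prompt)
```

The resulting string can be pasted into the prompt field as-is.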
How long can a Grok Imagine 1.5 video be?
A single generation produces a short clip — the model is optimized for high-quality short-form output rather than long-running scenes. For longer pieces, generate multiple clips and chain them together; character consistency holds well enough across shots that the same face and voice carry across cuts. Check the generator UI for the current per-clip duration limit.
Is Grok Imagine 1.5 free?
Grok Imagine 1.5 on Overchat AI is a paid model because photoreal video with native audio is genuinely expensive to render. Overchat AI also includes many free AI tools, including a free AI image generator and chat with frontier LLMs. Create a free account to try those, then pay only for the video clips you generate.