xAI Grok Imagine Video generates short videos (1-15s) from natural-language prompts at 480p or 720p.

xAI Grok Imagine Video animates a starting frame image with natural-language motion prompts at 480p or 720p.

xAI Grok Imagine Video generates videos guided by 1-7 reference images that contribute people, objects, or styles. Output up to 10s at 480p or 720p.

xAI Grok Imagine Video continues an existing 2-15s mp4 with a 2-10s prompt-driven extension. Output resolution matches the input, capped at 720p.
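
The limits for the extension variant can be checked before a request is submitted. The sketch below is illustrative only: `check_extension` is a hypothetical helper, not part of any xAI SDK, and it assumes that "matches input" refers to the input video's resolution (expressed here as frame height in pixels).

```python
def check_extension(input_seconds: float, extension_seconds: float, input_height: int) -> int:
    """Validate the stated limits and return the expected output height.

    Hypothetical helper: the ranges come from the description above,
    the function name and signature are assumptions.
    """
    # Source video must be 2-15 seconds long.
    if not 2 <= input_seconds <= 15:
        raise ValueError("input video must be between 2 and 15 seconds")
    # The prompt-driven extension must be 2-10 seconds long.
    if not 2 <= extension_seconds <= 10:
        raise ValueError("extension must be between 2 and 10 seconds")
    # Assumed reading: output keeps the input height, but never exceeds 720p.
    return min(input_height, 720)
```

For example, a 1080p input with a 5s extension would be expected to come back at 720p under this reading, while a 480p input stays at 480p.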

xAI Grok Imagine Video edits an mp4 with natural-language instructions. Output retains the source duration, capped at 8.7s. Billed per second of input video (output duration equals input duration).

xAI Grok Imagine generates polished visuals from natural-language prompts at 1K or 2K resolution, with 14 aspect ratios.

xAI Grok Imagine edits one or more reference images with natural-language instructions at 1K or 2K resolution. Supports single image and multi-image (<IMAGE_0>, <IMAGE_1>) reference editing.
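
For multi-image editing, it can help to confirm that every placeholder in a prompt points at an image that was actually supplied. The sketch below assumes placeholders are zero-indexed in upload order, which is inferred from the `<IMAGE_0>`, `<IMAGE_1>` examples above rather than confirmed; `referenced_images` is a hypothetical helper, not an xAI API.

```python
import re

# Matches placeholders of the form <IMAGE_0>, <IMAGE_1>, ...
PLACEHOLDER = re.compile(r"<IMAGE_(\d+)>")


def referenced_images(prompt: str, image_count: int) -> list[int]:
    """Return the sorted image indices a prompt refers to.

    Hypothetical helper. Assumes zero-based indexing, so valid indices
    are 0 .. image_count - 1.
    """
    indices = sorted({int(n) for n in PLACEHOLDER.findall(prompt)})
    out_of_range = [i for i in indices if i >= image_count]
    if out_of_range:
        raise ValueError(f"prompt references missing images: {out_of_range}")
    return indices


prompt = "Put the hat from <IMAGE_1> on the person in <IMAGE_0>"
print(referenced_images(prompt, image_count=2))
```

A prompt that mentions `<IMAGE_2>` when only two images are supplied raises a `ValueError` under this assumption.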

Gemini Omni Flash is Google's multimodal video generation model. This image-to-video variant creates subject-consistent videos from up to 7 reference images combined with a text prompt, preserving visual identity across the full generated video.

