
Vidu Q2 Pro Reference-to-Video API by Vidu
Vidu Q2-Pro Reference-to-Video is an advanced AI video generation model that animates specific subjects taken from reference images. Upload one or more subject images and describe the motion you want; the model generates high-quality video with smooth animation, optional audio, and resolutions up to 1080p.
Vidu Q2-Pro Reference-to-Video
Vidu Q2-Pro Reference-to-Video is a professional-grade AI video generation model that generates video featuring specific subjects with cinematic precision. Provide subject images alongside a motion prompt, and the model delivers up to 1080p video with rich detail, strict subject fidelity, and smooth natural motion — ideal for high-end creative, brand, and production workflows.
Why Choose This?
- Professional quality: cinematic detail and smooth motion with faithful subject preservation at up to 1080p.
- Subject-driven generation: feature specific characters or objects with strict visual fidelity throughout the video.
- Flexible duration: create videos up to 10 seconds long.
- Audio generation: optional audio, configurable as full audio, speech only, or sound effects only.
- Motion control: adjust movement amplitude for subtle or dynamic animation.
- Prompt Enhancer: a built-in tool that automatically improves your motion descriptions.
Parameters
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Text description of the desired motion and action |
| subjects | Yes | One or more subject images to feature in the video (URL or upload) |
| resolution | No | Output quality: 540p, 720p (default), 1080p |
| duration | No | Video length in seconds (1-10, default: 5) |
| aspect_ratio | No | Aspect ratio of the output: 16:9 (default), 9:16, 1:1, 4:3, 3:4 |
| movement_amplitude | No | Motion intensity: auto (default), small, medium, large |
| generate_audio | No | Whether to generate audio for the video (default: true) |
| audio_type | No | Audio type when generate_audio is true: all (default), speech_only, sound_effect_only |
| seed | No | Seed for generation (default: 0); use -1 for a random seed |
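The parameter table can be turned into a request builder that fills in the documented defaults and rejects out-of-range values before anything is sent. This is a minimal sketch: the parameter names, allowed values, and defaults come from the table, but the flat dictionary layout, treating duration as a whole number of seconds, and omitting audio_type when generate_audio is false are assumptions, not the confirmed wire format. Check the platform's API reference for the exact request schema.

```python
from typing import Any, Dict, List

# Allowed values copied from the parameter table.
RESOLUTIONS = {"540p", "720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
AMPLITUDES = {"auto", "small", "medium", "large"}
AUDIO_TYPES = {"all", "speech_only", "sound_effect_only"}


def build_request(
    prompt: str,
    subjects: List[str],
    resolution: str = "720p",
    duration: int = 5,
    aspect_ratio: str = "16:9",
    movement_amplitude: str = "auto",
    generate_audio: bool = True,
    audio_type: str = "all",
    seed: int = 0,  # -1 requests a random seed
) -> Dict[str, Any]:
    """Validate inputs against the documented limits and return a request body.

    Assumption: the service accepts a flat JSON object keyed by parameter name.
    """
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not subjects:
        raise ValueError("at least one subject image is required")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 1 <= duration <= 10:
        raise ValueError("duration must be between 1 and 10 seconds")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if movement_amplitude not in AMPLITUDES:
        raise ValueError(f"unsupported movement_amplitude: {movement_amplitude}")
    if audio_type not in AUDIO_TYPES:
        raise ValueError(f"unsupported audio_type: {audio_type}")

    body: Dict[str, Any] = {
        "prompt": prompt,
        "subjects": list(subjects),
        "resolution": resolution,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "movement_amplitude": movement_amplitude,
        "generate_audio": generate_audio,
        "seed": seed,
    }
    # Assumption: audio_type only applies when audio is generated, so it is
    # left out of the body when generate_audio is False.
    if generate_audio:
        body["audio_type"] = audio_type
    return body
```

For example, `build_request("The woman turns and smiles at the camera", ["https://example.com/woman.png"], resolution="1080p", generate_audio=False)` returns a 1080p, 5-second request with no audio settings.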
How to Use
- Upload your subject images — provide one or more images of the subjects to feature in the video.
- Write your prompt — describe the motion, camera movement, and desired action.
- Set resolution — higher resolution for better quality, lower for faster processing.
- Adjust duration — set video length up to 10 seconds.
- Configure audio (optional) — enable audio and select the audio type: all, speech_only, or sound_effect_only.
- Set motion intensity (optional) — adjust movement_amplitude for subtle or dynamic animations.
- Run — submit and download your video.
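The final "Run" step usually means submitting an asynchronous job and polling until the video is ready. This page does not document the endpoints, authentication, or job status format, so the sketch below takes the client calls as parameters. The callables `submit` and `get_status`, and the status fields `state`, `video_url`, and `error` with the values `"succeeded"` and `"failed"`, are hypothetical placeholders to be replaced with whatever the platform actually provides.

```python
import time
from typing import Any, Callable, Dict


def run_generation(
    submit: Callable[[Dict[str, Any]], str],
    get_status: Callable[[str], Dict[str, Any]],
    request: Dict[str, Any],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Submit a job, poll until it finishes, and return the video URL.

    Hypothetical: `submit` returns a job id, and `get_status` returns a dict
    such as {"state": "succeeded", "video_url": "..."}.
    """
    job_id = submit(request)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        state = status.get("state")
        if state == "succeeded":
            return status["video_url"]
        if state == "failed":
            raise RuntimeError(f"job {job_id} failed: {status.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
        time.sleep(poll_interval)  # wait before asking again
```

Passing the client functions in keeps the polling logic independent of any particular SDK and makes it easy to test with fakes; the returned URL can then be downloaded with any HTTP client.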
Pricing
| Resolution | Cost |
|---|---|
| 540p | Starts at 0.0250/sec |
| 720p | Starts at 0.0250/sec |
| 1080p | Starts at 0.0500/sec |
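Because prices are listed per second, a lower-bound estimate is the starting rate multiplied by the video duration. This sketch assumes linear per-second billing of the output length; the table gives only starting rates and no currency, so real charges may be higher.

```python
# Starting per-second rates from the pricing table (currency not stated).
STARTING_RATE_PER_SEC = {"540p": 0.0250, "720p": 0.0250, "1080p": 0.0500}


def estimate_min_cost(resolution: str, duration_sec: int) -> float:
    """Lower-bound cost: starting rate for the resolution times duration."""
    if resolution not in STARTING_RATE_PER_SEC:
        raise ValueError(f"no listed rate for {resolution}")
    if not 1 <= duration_sec <= 10:
        raise ValueError("duration must be between 1 and 10 seconds")
    return round(STARTING_RATE_PER_SEC[resolution] * duration_sec, 4)
```

For example, a default 5-second clip at 720p starts at 5 × 0.0250 = 0.125, and a 10-second clip at 1080p starts at 10 × 0.0500 = 0.5.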
Best Use Cases
- Character Consistency — Generate high-quality video featuring a specific character or subject with strict visual fidelity.
- Brand & Product Videos — Produce professional-grade product animations while preserving brand identity.
- Film & Narrative Production — Animate reference imagery for previs, concept reels, or final narrative content.
- Style-Consistent Campaigns — Create multiple video assets that maintain a unified visual style across a campaign.
- Premium Social Media Content — Publish cinematic, reference-guided video for high-visibility channels.
Pro Tips
- Use the Prompt Enhancer to refine your motion descriptions.
- Provide high-resolution, well-composed subject images for the strongest visual consistency.
- Be specific about movement direction, speed, camera angles, and framing in your prompt.
- Use multiple subject images to define different characters or scene elements independently.
- Set movement_amplitude to "small" for precise, controlled motion or "large" for expressive action.
- Set audio_type to speech_only when the scene involves dialogue, or sound_effect_only for purely ambient audio.
- Describe lighting, atmosphere, and environmental effects in the prompt for richer scene quality.
Notes
- Both prompt and subjects are required fields.
- Maximum video duration is 10 seconds.
- When generate_audio is true, audio_type controls what is generated: all includes speech and sound effects, speech_only generates voice audio, sound_effect_only generates ambient and environmental sounds.
- Ensure uploaded subject image URLs are publicly accessible.
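A quick local check can catch subject URLs that a remote service obviously cannot fetch, such as file paths, `localhost`, or private network addresses. This is a sketch using only the Python standard library; it cannot confirm that a URL is actually reachable from the internet, only rule out common mistakes, and domain names are assumed to be potentially public.

```python
import ipaddress
from typing import List
from urllib.parse import urlparse


def find_non_public_urls(urls: List[str]) -> List[str]:
    """Return the URLs that are clearly not publicly fetchable web addresses."""
    flagged = []
    for url in urls:
        parsed = urlparse(url)
        host = parsed.hostname or ""
        # Not an http(s) URL with a host, e.g. a local file path.
        if parsed.scheme not in ("http", "https") or not host:
            flagged.append(url)
            continue
        if host == "localhost":
            flagged.append(url)
            continue
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            continue  # a domain name; assume it may be public
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            flagged.append(url)
    return flagged
```

Running this over the subjects list before submitting avoids spending a generation request on images the service cannot download.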
Related Models
- Vidu Q2-Pro-Fast Reference-to-Video — Pro quality reference-to-video with significantly faster generation speed.
- Vidu Q2-Pro Image-to-Video — Professional quality animation from a single reference image.
- Vidu Q2-Pro Start-End-to-Video — Professional quality video generation with precise start and end frame control.
