
Vidu Q2 Reference-to-Video API by Vidu
Vidu Q2 Reference-to-Video is an AI video generation model for video that features specific subjects. Provide one or more subject images alongside a motion prompt, and the model produces smooth, natural video that preserves each subject's appearance and identity, with optional audio and output up to 1080p. It offers a balance of quality and cost for subject-driven workflows.
Why Choose This?
- Balanced quality and speed: Solid visual consistency and motion quality at a mid-tier price point.
- Subject-driven generation: Feature specific characters or objects with consistent appearance throughout the video.
- High resolution output: Generate videos in 540p, 720p, or 1080p quality.
- Flexible duration: Create videos from 1 to 10 seconds in length.
- Audio generation: Optional audio with configurable type: full audio, speech only, or sound effects only.
- Prompt Enhancer: Built-in tool to automatically improve your motion descriptions.
Parameters
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Text description of the desired motion and action |
| subjects | Yes | One or more subject images to feature in the video (URL or upload) |
| resolution | No | Output quality: 540p, 720p (default), 1080p |
| duration | No | Video length in seconds (1-10, default: 5) |
| aspect_ratio | No | Aspect ratio of the output: 16:9 (default), 9:16, 1:1, 4:3, 3:4 |
| movement_amplitude | No | Motion intensity: auto (default), small, medium, large |
| generate_audio | No | Whether to generate audio for the video (default: true) |
| audio_type | No | Audio type when generate_audio is true: all (default), speech_only, sound_effect_only |
| seed | No | Seed for generation (default: 0); use -1 for a random seed |
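As a rough illustration of how the parameters and defaults in the table fit together, the sketch below models a request as a Python dataclass. The class name `ReferenceToVideoRequest` and the `to_payload` helper are hypothetical, and the flat JSON-style payload is an assumption; only the field names and default values come from the table. Dropping `audio_type` when audio is disabled is also an assumption, not documented behavior.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class ReferenceToVideoRequest:
    """Hypothetical container; field names and defaults mirror the table."""

    prompt: str                       # required
    subjects: List[str]               # required, assumed here to be image URLs
    resolution: str = "720p"          # 540p, 720p, or 1080p
    duration: int = 5                 # seconds, 1-10
    aspect_ratio: str = "16:9"        # 16:9, 9:16, 1:1, 4:3, 3:4
    movement_amplitude: str = "auto"  # auto, small, medium, large
    generate_audio: bool = True
    audio_type: str = "all"           # all, speech_only, sound_effect_only
    seed: int = 0                     # -1 requests a random seed

    def to_payload(self) -> dict:
        """Return a plain dict (assumed payload shape, not a confirmed schema)."""
        payload = asdict(self)
        # Assumption: audio_type is only meaningful when audio is generated.
        if not self.generate_audio:
            payload.pop("audio_type")
        return payload


request = ReferenceToVideoRequest(
    prompt="The character turns toward the camera and waves",
    subjects=["https://example.com/character.png"],
)
payload = request.to_payload()
```

Only `prompt` and `subjects` need to be supplied; every other field falls back to the default listed in the table.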
How to Use
- Upload your subject images — provide one or more images of the subjects to feature in the video.
- Write your prompt — describe the motion, camera movement, and desired action.
- Set resolution — higher resolution for better quality, lower for faster processing.
- Adjust duration — set video length up to 10 seconds.
- Configure audio (optional) — enable audio and select the audio type: all, speech_only, or sound_effect_only.
- Set motion intensity (optional) — adjust movement_amplitude for subtle or dynamic animations.
- Run — submit and download your video.
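The steps above could translate into an HTTP request along the following lines. This is a sketch only: the endpoint URL, the bearer-token header, and the `build_request` helper are placeholders invented for illustration, since this page does not specify them. The code builds the request object with Python's standard library but does not send it.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the URL given by your provider.
API_URL = "https://api.example.com/vidu/q2/reference-to-video"


def build_request(payload: dict, api_key: str, api_url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a generation job."""
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check your provider's documentation.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = {
    "prompt": "The product rotates slowly while the camera pushes in",
    "subjects": ["https://example.com/product.png"],  # step 1: subject images
    "resolution": "720p",                             # step 3
    "duration": 5,                                    # step 4
    "generate_audio": True,                           # step 5
    "audio_type": "sound_effect_only",
    "movement_amplitude": "small",                    # step 6
}
request = build_request(payload, api_key="YOUR_API_KEY")
# Step 7 would be urllib.request.urlopen(request); it is omitted here because
# the endpoint above is a placeholder.
```

How the finished video is returned (direct response, polling, or webhook) is not described on this page, so the sketch stops at request construction.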
Pricing
| Resolution | Cost |
|---|---|
| 540p | Starts at 0.0250/sec |
| 720p | Starts at 0.0250/sec |
| 1080p | Starts at 0.0500/sec |
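For budgeting, a starting cost can be estimated by multiplying the per-second rate by the video length. The sketch below assumes billing is linear in whole seconds and uses the "starts at" figures from the table as a lower bound; the currency is not stated on this page, and options such as audio may change the actual price. The function name `estimate_starting_cost` is hypothetical.

```python
from decimal import Decimal

# "Starts at" rates per second of video, copied from the pricing table.
STARTING_RATE_PER_SECOND = {
    "540p": Decimal("0.0250"),
    "720p": Decimal("0.0250"),
    "1080p": Decimal("0.0500"),
}


def estimate_starting_cost(resolution: str, duration_seconds: int) -> Decimal:
    """Return rate x duration; a lower-bound estimate, not a quote."""
    if resolution not in STARTING_RATE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    if not 1 <= duration_seconds <= 10:
        raise ValueError("duration must be between 1 and 10 seconds")
    return STARTING_RATE_PER_SECOND[resolution] * duration_seconds


preview_cost = estimate_starting_cost("540p", 5)
final_cost = estimate_starting_cost("1080p", 10)
```

`Decimal` is used instead of `float` so that small per-second rates multiply without binary rounding noise.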
Best Use Cases
- Character Consistency — Generate video featuring a specific character or subject across multiple scenes.
- Product Videos — Animate product imagery while maintaining accurate brand appearance.
- Style-Consistent Content — Produce video that matches the visual aesthetic of existing creative assets.
- Social Media Content — Create animated clips grounded in your existing image library.
- Concept Development — Explore reference-guided motion ideas quickly and affordably.
Pro Tips
- Use the Prompt Enhancer to refine your motion descriptions.
- Provide clear, well-lit subject images for the most consistent visual output.
- Be specific about movement direction, speed, and camera angles in your prompt.
- Use multiple subject images when the scene involves more than one character or object.
- Set audio_type to speech_only when the scene involves dialogue, or sound_effect_only for purely ambient audio.
- Start with 540p for previews and switch to 720p or 1080p for final output.
Notes
- Both prompt and subjects are required fields.
- Maximum video duration is 10 seconds.
- When generate_audio is true, audio_type controls what is generated: all includes speech and sound effects, speech_only generates voice audio, sound_effect_only generates ambient and environmental sounds.
- Ensure uploaded subject image URLs are publicly accessible.
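The notes above can be turned into a quick client-side check before submitting. The following is a minimal sketch with a hypothetical `check_request` helper; it assumes subjects are passed as URL strings and only verifies that each one looks like an http(s) URL. It cannot confirm that a URL is actually publicly accessible.

```python
from urllib.parse import urlparse

AUDIO_TYPES = {"all", "speech_only", "sound_effect_only"}


def check_request(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not payload.get("prompt"):
        problems.append("prompt is required")
    subjects = payload.get("subjects") or []
    if not subjects:
        problems.append("subjects is required")
    for url in subjects:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"subject is not an http(s) URL: {url}")
    duration = payload.get("duration", 5)
    if not 1 <= duration <= 10:
        problems.append("duration must be between 1 and 10 seconds")
    # audio_type only matters when generate_audio is true (the default).
    if payload.get("generate_audio", True):
        if payload.get("audio_type", "all") not in AUDIO_TYPES:
            problems.append("audio_type must be all, speech_only, or sound_effect_only")
    return problems


ok = check_request({"prompt": "A cat jumps", "subjects": ["https://example.com/cat.png"]})
bad = check_request({"subjects": ["cat.png"], "duration": 12, "audio_type": "music"})
```

Collecting every problem in one pass, rather than raising on the first, makes it easier to fix a request in a single round.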
Related Models
- Vidu Q2 Text-to-Video — Generate video directly from text descriptions without a reference image.
- Vidu Q2-Pro Reference-to-Video — Higher quality reference-guided video with 1080p support and extended duration.
- Vidu Q2-Pro-Fast Reference-to-Video — Pro quality reference-to-video with faster generation speed.