
PixVerse V6 Reference-to-Video API by PixVerse
PixVerse V6 Reference-to-Video model. High-quality video generation guided by reference images.
PixVerse V6 Reference-to-Video
PixVerse V6 Reference-to-Video enables subject-consistent video generation using up to 7 image references. Tag references as subjects or backgrounds, and V6 generates a video that faithfully preserves the visual identity of your referenced elements with improved consistency and motion quality.
Why Choose This?
- Subject consistency: Maintain character and object identity throughout the generated video.
- Multiple references: Use up to 7 image references for rich, multi-subject scenes.
- Reference typing: Tag each reference as "subject" or "background" for precise control.
- High-resolution output: Generate videos in 360p, 540p, 720p, or 1080p quality.
- Flexible duration: Create videos from 1 to 15 seconds in length.
- Audio generation: Optional synchronized audio that matches your scene.
Parameters
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Text description of the desired scene and motion |
| images | Yes | Array of image reference objects (1–7 items) |
| images[].image | Yes | Reference image (public URL or Base64, max 20MB) |
| images[].type | No | Reference type: "subject" or "background" |
| images[].ref_name | No | Name for this reference, used in prompt mentions (max 30 bytes) |
| model | Yes | Model name (default: v6) |
| duration | No | Video length in seconds (1–15, default: 5) |
| quality | No | Output resolution: 360p, 540p, 720p (default), 1080p |
| aspect_ratio | No | Video aspect ratio (default: 16:9) |
| sound | No | Generate synchronized audio (default: enabled) |
| seed | No | Random seed for reproducibility |
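As a rough illustration of how the parameters above fit together, the following sketch assembles a request body and checks the documented limits before anything is sent. The helper name `build_payload`, the JSON shape, and the use of a boolean for `sound` are assumptions made for this example; the table does not specify the wire format or the endpoint, so consult the API reference before relying on it.

```python
ALLOWED_QUALITIES = {"360p", "540p", "720p", "1080p"}
ALLOWED_TYPES = {"subject", "background"}


def build_payload(prompt, images, duration=5, quality="720p",
                  aspect_ratio="16:9", sound=True, seed=None, model="v6"):
    """Hypothetical helper: assemble a request body from the documented parameters."""
    if not prompt:
        raise ValueError("prompt is required")
    if not 1 <= len(images) <= 7:
        raise ValueError("images must contain 1 to 7 references")
    if not 1 <= duration <= 15:
        raise ValueError("duration must be between 1 and 15 seconds")
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    for ref in images:
        if "image" not in ref:
            raise ValueError("each reference needs an 'image' field")
        if "type" in ref and ref["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unsupported reference type: {ref['type']}")

    payload = {
        "prompt": prompt,
        "images": images,
        "model": model,
        "duration": duration,
        "quality": quality,
        "aspect_ratio": aspect_ratio,
        "sound": sound,  # assumption: audio is toggled with a boolean
    }
    if seed is not None:
        payload["seed"] = seed  # only included when reproducibility is wanted
    return payload


payload = build_payload(
    prompt="The hero walks through the market at dusk",
    images=[
        {"image": "https://example.com/hero.png", "type": "subject", "ref_name": "hero"},
        {"image": "https://example.com/market.jpg", "type": "background"},
    ],
    duration=8,
    quality="1080p",
)
```

Validating on the client side catches out-of-range values (for example, 8 references or a 20-second duration) before a request is submitted.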
How to Use
- Prepare your reference images: gather subject and/or background images.
- Write your prompt: describe the scene and motion; mention subjects by `ref_name` if assigned.
- Build the images array: provide each image with its type and optional `ref_name`.
- Set quality: choose resolution based on your quality and speed requirements.
- Adjust duration: set video length up to 15 seconds.
- Configure audio (optional): enable or disable synchronized audio generation.
- Run: submit and download your video.
Best Use Cases
- Character Animation — Generate videos with specific people or characters with consistent appearance.
- Product Videos — Keep product appearance consistent throughout the generated video.
- Brand Content — Maintain brand identity elements across video generations.
- Multi-character Scenes — Include multiple distinct subjects in a single scene.
- Custom Backgrounds — Fix a specific environment as the persistent video backdrop.
Pro Tips
- Assign `ref_name` to each reference and mention them naturally in your prompt.
- Use `type: "subject"` for characters and objects, `type: "background"` for environments.
- Use clean, high-quality images with clear subjects for the best consistency.
- Limit references to the most essential subjects to maintain quality.
- Ensure image URLs are publicly accessible; base64 is supported for private assets.
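For private assets, a local file can be turned into a Base64 reference entry along the following lines. This is a minimal sketch: the helper names are hypothetical, and whether the API expects a bare Base64 string (as produced here) or a data URI is an assumption to verify against the API reference. The 20MB ceiling comes from the parameter table; treating it as 20 × 1024 × 1024 bytes is also an assumption.

```python
import base64
from pathlib import Path

MAX_IMAGE_BYTES = 20 * 1024 * 1024  # documented 20MB limit; binary megabytes assumed


def encode_image_base64(path):
    """Hypothetical helper: read a local image and return it as a Base64 string."""
    data = Path(path).read_bytes()
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"{path} exceeds the 20MB image limit")
    return base64.b64encode(data).decode("ascii")


def make_reference(path, ref_type, ref_name=None):
    """Hypothetical helper: build one entry for the images array from a local file."""
    if ref_type not in ("subject", "background"):
        raise ValueError('type must be "subject" or "background"')
    ref = {"image": encode_image_base64(path), "type": ref_type}
    if ref_name is not None:
        ref["ref_name"] = ref_name
    return ref
```

A call such as `make_reference("hero.png", "subject", "hero")` yields a dictionary that can be placed directly in the `images` array.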
Pricing
| Quality | Billing Standard | Without Audio | With Audio |
|---|---|---|---|
| 360p | per second | $0.025 | $0.035 |
| 540p | per second | $0.035 | $0.045 |
| 720p | per second | $0.045 | $0.060 |
| 1080p | per second | $0.090 | $0.115 |
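To make the per-second rates concrete, the sketch below estimates the cost of a single generation from the table above. It assumes billing is strictly linear in the requested duration, with no minimum charge or rounding; the pricing table does not state either way. The function name `estimate_cost` is illustrative only.

```python
from decimal import Decimal

# Per-second rates in USD from the pricing table: (without audio, with audio)
RATES = {
    "360p": (Decimal("0.025"), Decimal("0.035")),
    "540p": (Decimal("0.035"), Decimal("0.045")),
    "720p": (Decimal("0.045"), Decimal("0.060")),
    "1080p": (Decimal("0.090"), Decimal("0.115")),
}


def estimate_cost(quality, duration_seconds, with_audio=True):
    """Estimate the USD cost of one video, assuming simple per-second billing."""
    without_audio_rate, with_audio_rate = RATES[quality]
    rate = with_audio_rate if with_audio else without_audio_rate
    return rate * duration_seconds


print(estimate_cost("720p", 5))                      # 5 s at 720p with audio
print(estimate_cost("1080p", 15, with_audio=False))  # 15 s at 1080p, silent
```

Under these assumptions, a default 5-second 720p video with audio costs 5 × $0.060 = $0.30, and a 15-second 1080p video without audio costs 15 × $0.090 = $1.35.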
Notes
- `prompt` and `images` are required fields.
- Up to 7 image references supported for V6 and C1 models.
- `ref_name` max length is 30 bytes (UTF-8).
- Supported image formats: PNG, JPEG, JPG, WebP.
- Image aspect ratio should be between 1:2.5 and 2.5:1; minimum dimension 300px.
- Maximum video duration is 15 seconds.
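The limits listed above can be checked locally before uploading. The sketch below is one possible pre-flight check with hypothetical helper names. It reads "minimum dimension 300px" as "the shorter side must be at least 300 pixels", which is an interpretation rather than something the notes spell out, and it judges the format by file extension only.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".png", ".jpeg", ".jpg", ".webp"}


def ref_name_ok(name):
    """ref_name is limited to 30 bytes when encoded as UTF-8, not 30 characters."""
    return len(name.encode("utf-8")) <= 30


def extension_ok(filename):
    """Check the file extension against the supported formats (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def dimensions_ok(width, height):
    """Aspect ratio between 1:2.5 and 2.5:1, shorter side at least 300px (assumed)."""
    short_side, long_side = sorted((width, height))
    if short_side < 300:
        return False
    # long/short <= 2.5 rewritten with integers to avoid float rounding
    return 2 * long_side <= 5 * short_side
```

Because the `ref_name` limit is in bytes, multi-byte characters reduce the usable length: a name of 15 two-byte characters fits exactly, while 16 does not.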
Related Models
- PixVerse V6 Text-to-Video — Generate video from text without reference images.
- PixVerse V6 Image-to-Video — Animate a single reference image.
- PixVerse C1 Reference-to-Video — Previous generation C1 reference-to-video model.


















