google/veo3.1/reference-to-video

Create richly detailed videos guided by visual references. Veo 3.1 Reference-to-Video preserves characters, style, and composition across scenes for consistent, visually coherent storytelling.





Input schema

The following parameters are accepted in the request body.


No parameters available.

Example request body

```json
{
  "model": "google/veo3.1/reference-to-video"
}
```
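The example body above shows only the `model` field. A fuller request, using hypothetical field names inferred from the capabilities described later on this page (`prompt`, `reference_images`, `negative_prompt`, `resolution`, `generate_audio`, and `seed` are assumptions, not documented parameters), might look like:

```json
{
  "model": "google/veo3.1/reference-to-video",
  "prompt": "The man in image 1 waves to the penguins in image 2 under bright sunlight",
  "reference_images": [
    "https://example.com/man.png",
    "https://example.com/penguins.png"
  ],
  "negative_prompt": "no text, no flicker",
  "resolution": "1080p",
  "generate_audio": true,
  "seed": 42
}
```

Consult the provider's API reference for the actual field names before sending a request.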


Google Veo 3.1 — Reference-to-Video Model

Veo 3.1 Reference-to-Video brings static images to life by combining visual reference consistency with cinematic motion generation. Powered by Google DeepMind’s next-generation Veo 3.1 architecture, this model transforms up to three reference images into coherent 8-second videos with smooth motion, accurate visual alignment, and synchronized native audio.

🌟 Key Features

🧠 Multi-Image Reference Support

  • Accepts up to three reference images to define the subject, environment, or style.
  • Maintains consistent identity, lighting, and appearance across frames.
  • Ideal for animating people, objects, or scenes with reliable fidelity.

🎬 Cinematic Video Generation

  • Produces 8-second motion clips at 720p or 1080p resolution.
  • Adds camera dynamics such as panning, zooming, or subtle perspective drift.
  • Supports synchronized audio generation, matching dialogue or ambient context.

💡 Smart Prompt Adherence

  • Interprets both text instructions and visual cues for precise motion storytelling.
  • Automatically harmonizes character interactions, props, and backgrounds.

⚙️ Capabilities

  • Input:

    • Up to 3 reference images (JPEG / PNG / WEBP)
    • Text prompt describing motion, action, and scene context
  • Output:

    • 8-second MP4 video (720p or 1080p)
    • Optional synchronized audio
  • Negative Prompt (optional):

    • Exclude unwanted artifacts or elements (e.g., “no text”, “no flicker”).
  • Seed (optional):

    • Reproduce specific results for consistent creative control.

💰 Pricing

| Duration  | Resolution | With Audio | Without Audio |
|-----------|------------|------------|---------------|
| 8 seconds | 720p       | $3.20      | $1.60         |
| 8 seconds | 1080p      | $3.20      | $1.60         |

✅ Commercial use allowed
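The flat per-clip pricing above can be expressed as a small budget helper (the function names are illustrative, not part of any API):

```python
# Illustrative cost helper based on the pricing table above.
# Pricing is flat per 8-second clip at either resolution:
# $3.20 with audio, $1.60 without.

def clip_cost(with_audio: bool) -> float:
    """Return the cost in USD of one 8-second clip."""
    return 3.20 if with_audio else 1.60

def runs_for_budget(budget_usd: float, with_audio: bool) -> int:
    """Return how many clips a given budget covers."""
    return int(budget_usd // clip_cost(with_audio))
```

For example, a $10 budget covers 6 clips without audio but only 3 with audio enabled.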

🧩 How to Use

  1. Upload up to 3 reference images — define the subject, object, or visual style.
  2. Write a text prompt — describe the action, setting, and camera motion.
  3. (Optional) Add a negative prompt to remove unwanted details.
  4. Choose resolution (720p or 1080p).
  5. (Optional) Enable audio generation for synchronized sound.
  6. Click Run to generate your 8-second cinematic video.
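The steps above can be sketched as an API call. This is a minimal sketch, assuming a generic JSON-over-HTTP interface: the endpoint URL, header names, and all payload fields except `model` are assumptions, not documented values.

```python
import json
import urllib.request

# Hypothetical endpoint -- replace with the provider's real URL.
API_URL = "https://api.example.com/v1/generate"

def build_request(api_key: str, prompt: str, reference_images: list[str],
                  resolution: str = "720p",
                  generate_audio: bool = True) -> urllib.request.Request:
    """Assemble (but do not send) a generation request."""
    payload = {
        "model": "google/veo3.1/reference-to-video",  # documented field
        # The fields below are assumed names based on the steps above.
        "prompt": prompt,
        "reference_images": reference_images[:3],  # at most 3 references
        "resolution": resolution,                  # "720p" or "1080p"
        "generate_audio": generate_audio,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_request("YOUR_KEY", "A slow pan across a sunlit harbor",
                    ["https://example.com/ref1.png"])
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would return the generated MP4 or a job handle, depending on how the provider's API is actually structured.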

💡 Best Practices

  • Use clear, well-lit reference images with similar styles and proportions.
  • Keep prompts concise but specific (e.g., “The man in image 1 waves to the penguins in image 2 under bright sunlight”).
  • Avoid overly complex scenarios with many characters or fast movement.
  • Enable audio for more immersive storytelling results.

📝 Notes

  • Ensure reference images are accessible URLs or valid locally uploaded files.
  • If the output looks unstable, reduce reference count or simplify the prompt.
  • Follow Google’s content safety rules; modify the prompt if flagged.
  • For best performance, prefer portrait-oriented subjects and balanced lighting.
