google/gemini-omni-flash/reference-to-video-developer

Video to Video

DEV

Gemini Omni Flash Reference-to-Video Developer API by Google

Reference-to-video-developer

Gemini Omni Flash is Google's multimodal video generation model. This reference-to-video variant transforms existing video clips using reference images and text prompts, enabling video style transfer, scene editing, and character insertion.

Each run costs $0.12. With $10 you can run approximately 83 times.

Code example
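import os
import time

import requests

# Read the API key from the environment; Python does not expand "$VAR" inside strings.
API_KEY = os.environ["ATLASCLOUD_API_KEY"]

# Step 1: Start video generation
generate_url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}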
data = {
    "model": "google/gemini-omni-flash/reference-to-video-developer",  # Required. model name
    "prompt": "A beautiful sunset over the ocean with gentle waves",  # Required. Text prompt for generation
    "images": [
        "https://example.com/image1.jpg"
    ],  # Images to use as character, scene, or style references
    "video_clips": [
        {
            "url": "example_url",
            "start": 0,
            "ends": 10
        }
    ],  # Required. Source video clips to use as references for generation
    "duration": 8,  # The duration of the generated video in seconds. options: 4 | 6 | 8 | 10
    "aspect_ratio": "16:9",  # The aspect ratio of the generated video. options: 16:9 | 9:16
    "resolution": "720p",  # The resolution of the generated video. options: 720p | 1080p | 4k
    "seed": -1,  # Random seed for reproducibility
}

generate_response = requests.post(generate_url, headers=headers, json=data)
generate_result = generate_response.json()
prediction_id = generate_result["data"]["id"]

# Step 2: Poll for result
poll_url = f"https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}"

def check_status():
    while True:
        # Reuse the headers defined in Step 1
        response = requests.get(poll_url, headers=headers)
        result = response.json()
        status = result["data"]["status"]

        if status in ["completed", "succeeded"]:
            print("Generated video:", result["data"]["outputs"][0])
            return result["data"]["outputs"][0]
        elif status in ["failed", "timeout"]:
            # "timeout" is listed as a status in the API schema; stop polling on it too
            raise Exception(result["data"].get("error") or "Generation failed")
        else:
            # Still processing, wait 2 seconds
            time.sleep(2)

video_url = check_status()

Install

Install the required dependency.

pip install requests

Authentication

All API requests require authentication with an API key. You can get your API key from the Atlas Cloud dashboard.

export ATLASCLOUD_API_KEY="your-api-key-here"

HTTP headers

import os

API_KEY = os.environ.get("ATLASCLOUD_API_KEY")
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}

Keep your API key secure

Never expose your API key in client-side code or public repositories. Use environment variables or a backend proxy instead.
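
The advice above can be enforced at startup. The following is a minimal sketch, not part of any Atlas Cloud client library: the helper name `build_auth_headers` is hypothetical, and it only assumes the `ATLASCLOUD_API_KEY` environment variable described in this section.

```python
import os


def build_auth_headers(env_var: str = "ATLASCLOUD_API_KEY") -> dict:
    """Build request headers, failing early if the API key is not set.

    Hypothetical helper: the name and behavior are illustrative only.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```

Failing before the first request avoids sending a malformed `Authorization` header and keeps the key out of source files.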

Submit a request

import os

import requests

url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    # Python does not expand "$VAR" inside strings, so read the key explicitly
    "Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"
}
data = {
    "model": "your-model",
    "prompt": "A beautiful landscape"
}

response = requests.post(url, headers=headers, json=data)
print(response.json())

Submit a request

Submit an asynchronous generation request. The API returns a prediction ID that you can use to check the status and retrieve the result.

POST /api/v1/model/generateVideo

Request body

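import os

import requests

url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    # Python does not expand "$VAR" inside strings, so read the key explicitly
    "Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"
}

data = {
    "model": "google/gemini-omni-flash/reference-to-video-developer",
    "prompt": "A beautiful sunset over the ocean with gentle waves",
    # video_clips is required by the input schema
    "video_clips": [{"url": "example_url", "start": 0, "ends": 10}]
}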

response = requests.post(url, headers=headers, json=data)
result = response.json()

print(f"Prediction ID: {result['data']['id']}")
print(f"Status: {result['data']['status']}")

Response

{
  "code": 200,
  "data": {
    "id": "pred_abc123",
    "status": "processing",
    "model": "model-name",
    "created_at": "2025-01-01T00:00:00Z"
  }
}
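
Before polling, a client may want to confirm that the submission was accepted. The sketch below assumes that a `code` other than 200 signals a rejected request and that `message` then carries the reason (the output schema on this page describes `message` as non-empty on failure). The helper name `extract_prediction_id` is hypothetical.

```python
def extract_prediction_id(response_body: dict) -> str:
    """Return data.id from a submit response, or raise with the server message.

    Assumption: code == 200 means the request was accepted.
    """
    code = response_body.get("code")
    data = response_body.get("data") or {}
    if code != 200 or "id" not in data:
        message = response_body.get("message") or "unexpected response"
        raise ValueError(f"Submit failed (code={code}): {message}")
    return data["id"]
```

With the sample response above, `extract_prediction_id(result)` returns `"pred_abc123"`.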

Check status

Query the prediction endpoint to check the current status of your request.

GET /api/v1/model/prediction/{prediction_id}

Polling example

import os
import time

import requests

prediction_id = "pred_abc123"
url = f"https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}"
# Read the key from the environment; Python does not expand "$VAR" inside strings.
headers = {"Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"}

while True:
    response = requests.get(url, headers=headers)
    result = response.json()
    status = result["data"]["status"]
    print(f"Status: {status}")

    if status in ["completed", "succeeded"]:
        output_url = result["data"]["outputs"][0]
        print(f"Output URL: {output_url}")
        break
    elif status in ["failed", "timeout"]:
        # "timeout" appears in the API schema's status list; stop polling on it too
        print(f"Error: {result['data'].get('error', 'Unknown')}")
        break

    time.sleep(3)

Status values

processing: The request is still being processed.

completed: Generation is complete. Outputs are available.

succeeded: Generation succeeded. Outputs are available.

failed: Generation failed. Check the error field.
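
The API schema on this page also lists `created` and `timeout` as possible values of `status`. A small sketch of how a client might group all of them follows; it assumes that `timeout` is terminal and that `created` means the task is still queued, neither of which the page states explicitly. The names are hypothetical.

```python
SUCCESS_STATUSES = {"completed", "succeeded"}
FAILURE_STATUSES = {"failed", "timeout"}  # assumption: timeout is terminal


def classify_status(status: str) -> str:
    """Map a raw status string to 'success', 'failure', or 'pending'."""
    if status in SUCCESS_STATUSES:
        return "success"
    if status in FAILURE_STATUSES:
        return "failure"
    # created, processing, and any unknown value keep the client polling
    return "pending"
```

Treating unknown values as pending is the conservative choice here, but it only works when the caller also enforces an overall deadline.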

Completed response

{
  "data": {
    "id": "pred_abc123",
    "status": "completed",
    "outputs": [
      "https://storage.atlascloud.ai/outputs/result.mp4"
    ],
    "metrics": {
      "predict_time": 45.2
    },
    "created_at": "2025-01-01T00:00:00Z",
    "completed_at": "2025-01-01T00:00:10Z"
  }
}
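
A polling loop that never gives up can hang if a prediction stalls. The sketch below adds an overall deadline and returns the `outputs` list from a response shaped like the one above. It is an illustration under assumptions: `wait_for_outputs` is a hypothetical name, `fetch` is any caller-supplied function returning the parsed JSON body, and the 600-second default is arbitrary.

```python
import time
from typing import Callable, List


def wait_for_outputs(
    fetch: Callable[[], dict],
    interval: float = 3.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll via fetch() until a terminal status arrives or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch().get("data") or {}
        status = data.get("status")
        if status in ("completed", "succeeded"):
            # outputs is nullable in the API schema, so fall back to an empty list
            return data.get("outputs") or []
        if status in ("failed", "timeout"):
            raise RuntimeError(data.get("error") or f"Generation ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No terminal status after {timeout} seconds")
        sleep(interval)
```

In real use, `fetch` could be `lambda: requests.get(url, headers=headers).json()`. Passing it in as a parameter keeps the loop testable without network access.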

Upload files

Upload files to Atlas Cloud storage and get a URL that you can use in your API requests. Use multipart/form-data for the upload.

POST /api/v1/model/uploadMedia

Upload example

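import os

import requests

url = "https://api.atlascloud.ai/api/v1/model/uploadMedia"
# Read the key from the environment; Python does not expand "$VAR" inside strings.
headers = {"Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"}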

with open("image.png", "rb") as f:
    files = {"file": ("image.png", f, "image/png")}
    response = requests.post(url, headers=headers, files=files)

result = response.json()
download_url = result["data"]["download_url"]
print(f"File URL: {download_url}")

Response

{
  "data": {
    "download_url": "https://storage.atlascloud.ai/uploads/abc123/image.png",
    "file_name": "image.png",
    "content_type": "image/png",
    "size": 1024000
  }
}
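
The upload example hard-codes the file name and content type. A minimal sketch of deriving both from the path is shown here; `build_upload_files` is a hypothetical helper, the `file` form field name is taken from the example above, and whether the endpoint accepts a given media type (for instance video) is an assumption to verify.

```python
import mimetypes
import os


def build_upload_files(path: str, handle) -> dict:
    """Build the `files` argument for requests.post from a path and open handle."""
    filename = os.path.basename(path)
    # Fall back to a generic binary type when the extension is unknown
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return {"file": (filename, handle, content_type)}
```

Usage would look like `with open(path, "rb") as f: requests.post(url, headers=headers, files=build_upload_files(path, f))`, after which `data.download_url` from the response can be passed in `images` or as a clip `url`.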

Input schema

The following parameters are accepted in the request body.

Total: 8 | Required: 3 | Optional: 5

model (string, required)

model name

Default: "google/gemini-omni-flash/reference-to-video-developer"

prompt (string, required)

Text prompt for generation. Describes the target content, style, camera language, or character actions. Maximum 20,000 characters.

images (array[string])

Images to use as character, scene, or style references. Accepts 1 to 5 images when combined with a video reference (video costs 2 units out of a total quota of 7). Supported formats: PNG, JPEG, JPG, WebP. Each image is limited to 20MB.

Min items: 1, Max items: 5

video_clips (array[object], required)

Source video clips to use as references for generation. Supports 1 video clip. Each video is limited to 100MB and 30 seconds duration. The trimmed segment (ends - start) must not exceed 10 seconds.

Min items: 1, Max items: 1

video_clips[].url (string, required)

URL of the source video clip.

Format: uri

video_clips[].start (integer, required)

Start time in seconds for trimming the video clip.

Default: 0, Min: 0, Max: 29

video_clips[].ends (integer, required)

End time in seconds for trimming the video clip. The difference between ends and start must not exceed 10 seconds.

Default: 10, Min: 1, Max: 30

duration (integer)

The duration of the generated video in seconds.

Default: 8

Options: 4, 6, 8, 10

aspect_ratio (string)

The aspect ratio of the generated video.

Default: "16:9"

Options: 16:9, 9:16

resolution (string)

The resolution of the generated video.

Default: "720p"

Options: 720p, 1080p, 4k

seed (integer)

Random seed for reproducibility. Use -1 to use a random seed.

Default: -1

Example request body

{
  "model": "google/gemini-omni-flash/reference-to-video-developer",
  "prompt": "A beautiful landscape",
  "video_clips": [
    {
      "url": "example_url",
      "start": 0,
      "ends": 10
    }
  ],
  "duration": 8,
  "aspect_ratio": "16:9",
  "resolution": "720p",
  "seed": -1
}
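
The enumerated options and item limits from the input schema can be checked on the client before a request is sent. This is a sketch under assumptions: `build_request_body` is a hypothetical helper, rejecting an empty prompt is my own choice (the schema only states the 20,000-character maximum), and the server remains the authority on validation.

```python
MODEL_ID = "google/gemini-omni-flash/reference-to-video-developer"
ALLOWED = {
    "duration": (4, 6, 8, 10),
    "aspect_ratio": ("16:9", "9:16"),
    "resolution": ("720p", "1080p", "4k"),
}


def build_request_body(prompt, clip, images=None, duration=8,
                       aspect_ratio="16:9", resolution="720p", seed=-1):
    """Assemble a request body, rejecting values outside the documented options."""
    if not prompt or len(prompt) > 20000:
        raise ValueError("prompt must be non-empty and at most 20,000 characters")
    chosen = {"duration": duration, "aspect_ratio": aspect_ratio, "resolution": resolution}
    for name, value in chosen.items():
        if value not in ALLOWED[name]:
            raise ValueError(f"{name} must be one of {ALLOWED[name]}, got {value!r}")
    if images is not None and not 1 <= len(images) <= 5:
        raise ValueError("images must contain 1 to 5 URLs when provided")
    # Exactly one clip is supported, so wrap the single clip in a list
    body = {"model": MODEL_ID, "prompt": prompt, "video_clips": [dict(clip)],
            **chosen, "seed": seed}
    if images is not None:
        body["images"] = list(images)
    return body
```

Called with only a prompt and a clip, the helper reproduces the example body above with the documented defaults.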

Output schema

The API returns a prediction response with the generated output URLs.

code (integer)

HTTP status code of the response.

message (string)

Human-readable message; non-empty on failure.

data (object)

Example response

{
  "id": "pred_abc123",
  "status": "completed",
  "model": "model-name",
  "outputs": [
    "https://storage.atlascloud.ai/outputs/result.mp4"
  ],
  "metrics": {
    "predict_time": 45.2
  },
  "created_at": "2025-01-01T00:00:00Z",
  "completed_at": "2025-01-01T00:00:10Z"
}

Atlas Cloud Skills

Atlas Cloud Skills integrates more than 400 AI models directly into your AI coding assistant. Install it with a single command, then use natural language to generate images and videos and to chat with LLMs.

Supported clients

Claude Code

OpenAI Codex

Gemini CLI

Cursor

Windsurf

VS Code

Trae

GitHub Copilot

Cline

Roo Code

Amp

Goose

Replit

40+ supported clients

Install

npx skills add AtlasCloudAI/atlas-cloud-skills

Configure the API key

Get your API key from the Atlas Cloud dashboard and set it as an environment variable.

export ATLASCLOUD_API_KEY="your-api-key-here"

Features

Once installed, you can use natural language in your AI assistant to access all Atlas Cloud models.

Image generation: Generate images with models such as Nano Banana 2, Z-Image, and more.

Video creation: Create videos from text or images with Kling, Vidu, Veo, and others.

LLM chat: Chat with Qwen, DeepSeek, and other large language models.

Media upload: Upload local files for image editing and image-to-video workflows.

Learn more

github.com/AtlasCloudAI/atlas-cloud-skills

MCP Server

Atlas Cloud MCP Server connects your IDE to more than 400 AI models through the Model Context Protocol. It works with any MCP-compatible client.

Supported clients

Cursor

VS Code

Windsurf

Claude Code

OpenAI Codex

Gemini CLI

Cline

Roo Code

100+ supported clients

Install

npx -y atlascloud-mcp

Configuration

Add the following configuration to your IDE's MCP configuration file.

{
  "mcpServers": {
    "atlascloud": {
      "command": "npx",
      "args": [
        "-y",
        "atlascloud-mcp"
      ],
      "env": {
        "ATLASCLOUD_API_KEY": "your-api-key-here"
      }
    }
  }
}

Available tools

atlas_generate_image: Generate images from text prompts.

atlas_generate_video: Create videos from text or images.

atlas_chat: Chat with large language models.

atlas_list_models: Browse more than 400 available AI models.

atlas_quick_generate: One-step content creation with automatic model selection.

atlas_upload_media: Upload local files for API workflows.

Learn more

github.com/AtlasCloudAI/mcp-server

API Schema

{
  "info": {
    "title": "AtlasCloud API",
    "version": "1.0.0",
    "description": "The AtlasCloud API."
  },
  "openapi": "3.0.0",
  "paths": {
    "/api/v1/model/generateVideo": {
      "post": {
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/Input"
              }
            }
          },
          "required": true
        },
        "responses": {
          "200": {
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/PredictionResponse"
                }
              }
            },
            "description": "The request status."
          }
        }
      },
      "x-api-name": "model_run"
    },
    "/api/v1/model/prediction/{request_id}": {
      "get": {
        "parameters": [
          {
            "in": "path",
            "name": "request_id",
            "required": true,
            "schema": {
              "description": "Request ID",
              "type": "string"
            }
          }
        ],
        "responses": {
          "200": {
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/PredictionResponse"
                }
              }
            },
            "description": "Result of the request."
          }
        }
      },
      "x-api-name": "model_result"
    }
  },
  "components": {
    "schemas": {
      "Input": {
        "properties": {
          "model": {
            "type": "string",
            "description": "model name",
            "default": "google/gemini-omni-flash/reference-to-video-developer"
          },
          "prompt": {
            "description": "Text prompt for generation. Describes the target content, style, camera language, or character actions. Maximum 20,000 characters.",
            "type": "string"
          },
          "images": {
            "description": "Images to use as character, scene, or style references. Accepts 1 to 5 images when combined with a video reference (video costs 2 units out of a total quota of 7). Supported formats: PNG, JPEG, JPG, WebP. Each image is limited to 20MB.",
            "items": {
              "type": "string",
              "format": "uri"
            },
            "maxItems": 5,
            "minItems": 1,
            "type": "array"
          },
          "video_clips": {
            "description": "Source video clips to use as references for generation. Supports 1 video clip. Each video is limited to 100MB and 30 seconds duration. The trimmed segment (ends - start) must not exceed 10 seconds.",
            "type": "array",
            "items": {
              "type": "object",
              "required": [
                "url",
                "start",
                "ends"
              ],
              "properties": {
                "url": {
                  "type": "string",
                  "format": "uri",
                  "description": "URL of the source video clip.",
                  "x-ui-component": "uploader"
                },
                "start": {
                  "type": "integer",
                  "description": "Start time in seconds for trimming the video clip.",
                  "default": 0,
                  "minimum": 0,
                  "maximum": 29
                },
                "ends": {
                  "type": "integer",
                  "description": "End time in seconds for trimming the video clip. The difference between ends and start must not exceed 10 seconds.",
                  "default": 10,
                  "minimum": 1,
                  "maximum": 30
                }
              }
            },
            "minItems": 1,
            "maxItems": 1
          },
          "duration": {
            "default": 8,
            "description": "The duration of the generated video in seconds.",
            "enum": [
              4,
              6,
              8,
              10
            ],
            "type": "integer",
            "x-ui-component": "select"
          },
          "aspect_ratio": {
            "default": "16:9",
            "description": "The aspect ratio of the generated video.",
            "enum": [
              "16:9",
              "9:16"
            ],
            "type": "string",
            "x-placeholder": "Select one",
            "x-ui-component": "select"
          },
          "resolution": {
            "default": "720p",
            "description": "The resolution of the generated video.",
            "enum": [
              "720p",
              "1080p",
              "4k"
            ],
            "type": "string",
            "x-placeholder": "Select one",
            "x-ui-component": "select"
          },
          "seed": {
            "default": -1,
            "description": "Random seed for reproducibility. Use -1 to use a random seed.",
            "type": "integer"
          }
        },
        "required": [
          "model",
          "prompt",
          "video_clips"
        ],
        "type": "object",
        "x-order-properties": [
          "model",
          "prompt",
          "images",
          "video_clips",
          "duration",
          "aspect_ratio",
          "resolution",
          "seed"
        ]
      },
      "PredictionResponse": {
        "type": "object",
        "properties": {
          "code": {
            "description": "HTTP status code of the response.",
            "type": "integer"
          },
          "message": {
            "description": "Human-readable message; non-empty on failure.",
            "type": "string"
          },
          "data": {
            "type": "object",
            "properties": {
              "id": {
                "description": "Unique identifier for the prediction.",
                "type": "string"
              },
              "model": {
                "description": "Model ID used for the prediction.",
                "type": "string"
              },
              "outputs": {
                "description": "Array of URLs to the generated content. Null when status is not completed.",
                "type": "array",
                "items": {
                  "type": "string"
                },
                "nullable": true
              },
              "urls": {
                "description": "Object containing related API endpoints.",
                "type": "object",
                "properties": {
                  "get": {
                    "description": "URL to poll for the prediction result.",
                    "type": "string",
                    "format": "uri"
                  }
                }
              },
              "has_nsfw_contents": {
                "description": "Array of boolean values indicating NSFW detection for each output. Null if not applicable.",
                "type": "array",
                "items": {
                  "type": "boolean"
                },
                "nullable": true
              },
              "status": {
                "description": "Status of the task: created, processing, completed, timeout, or failed.",
                "type": "string"
              },
              "created_at": {
                "description": "ISO timestamp of when the request was created (e.g., \"2023-04-01T12:34:56.789Z\").",
                "format": "date-time",
                "type": "string"
              },
              "error": {
                "description": "Error message if the task failed, empty string otherwise.",
                "type": "string"
              },
              "error_code": {
                "description": "Error code if the task failed.",
                "type": "integer"
              },
              "executionTime": {
                "description": "Total execution time in milliseconds.",
                "type": "number"
              },
              "timings": {
                "description": "Detailed timing breakdown.",
                "type": "object",
                "properties": {
                  "inference": {
                    "description": "Inference time in milliseconds.",
                    "type": "number"
                  }
                }
              }
            }
          }
        }
      }
    },
    "securitySchemes": {
      "apiKeyAuth": {
        "in": "header",
        "name": "Authorization",
        "type": "apiKey"
      }
    }
  },
  "servers": [
    {
      "url": "https://api.atlascloud.ai"
    }
  ]
}

LLM-Friendly Prompt Template

# google/gemini-omni-flash/reference-to-video-developer

> Gemini Omni Flash is Google's multimodal video generation model. This reference-to-video variant transforms existing video clips using reference images and text prompts, enabling video style transfer, scene editing, and character insertion.


## Overview

- **Submit endpoint (POST)**: `https://api.atlascloud.ai/api/v1/model/generateVideo` — start an async generation; returns a `prediction_id`
- **Poll endpoint (GET)**: `https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}` — poll this until the prediction finishes
- **Model ID**: `google/gemini-omni-flash/reference-to-video-developer`


## API Information

This model can be used via our HTTP API or more conveniently via our client libraries.
See the input and output schema below, as well as the usage examples.


### Input Schema

The API accepts the following input parameters:

- **`model`** (`string`, _required_):
  model name
  - Default: `"google/gemini-omni-flash/reference-to-video-developer"`

- **`prompt`** (`string`, _required_):
  Text prompt for generation. Describes the target content, style, camera language, or character actions. Maximum 20,000 characters.

- **`images`** (`array[string]`, _optional_):
  Images to use as character, scene, or style references. Accepts 1 to 5 images when combined with a video reference (video costs 2 units out of a total quota of 7). Supported formats: PNG, JPEG, JPG, WebP. Each image is limited to 20MB.
  - Min items: 1
  - Max items: 5

- **`video_clips`** (`array[object]`, _required_):
  Source video clips to use as references for generation. Supports 1 video clip. Each video is limited to 100MB and 30 seconds duration. The trimmed segment (ends - start) must not exceed 10 seconds.
  - Min items: 1
  - Max items: 1
  - Item properties:
    - **`url`** (`string`, _required_):
      URL of the source video clip.

    - **`start`** (`integer`, _required_):
      Start time in seconds for trimming the video clip.
      - Default: `0`
      - Min: 0
      - Max: 29

    - **`ends`** (`integer`, _required_):
      End time in seconds for trimming the video clip. The difference between ends and start must not exceed 10 seconds.
      - Default: `10`
      - Min: 1
      - Max: 30


- **`duration`** (`integer`, _optional_):
  The duration of the generated video in seconds.
  - Default: `8`
  - Options: 4, 6, 8, 10

- **`aspect_ratio`** (`string`, _optional_):
  The aspect ratio of the generated video.
  - Default: `"16:9"`
  - Options: "16:9", "9:16"

- **`resolution`** (`string`, _optional_):
  The resolution of the generated video.
  - Default: `"720p"`
  - Options: "720p", "1080p", "4k"

- **`seed`** (`integer`, _optional_):
  Random seed for reproducibility. Use -1 to use a random seed.
  - Default: `-1`



**Required Parameters Example**:

```json
{
  "model": "google/gemini-omni-flash/reference-to-video-developer",
  "prompt": "",
  "video_clips": [
    {
      "url": "",
      "start": 0,
      "ends": 10
    }
  ]
}
```


**Full Example**:

```json
{
  "model": "google/gemini-omni-flash/reference-to-video-developer",
  "prompt": "",
  "images": [
    ""
  ],
  "video_clips": [
    {
      "url": "",
      "start": 0,
      "ends": 10
    }
  ],
  "duration": 8,
  "aspect_ratio": "16:9",
  "resolution": "720p",
  "seed": -1
}
```


### Output Schema

The API returns the following output format:


- **`code`** (`integer`, _optional_):
  HTTP status code of the response.

- **`message`** (`string`, _optional_):
  Human-readable message; non-empty on failure.

- **`data`** (`object`, _optional_):
  - Properties:
    - **`id`** (`string`, _optional_):
      Unique identifier for the prediction.

    - **`model`** (`string`, _optional_):
      Model ID used for the prediction.

    - **`outputs`** (`array[string]`, _optional_):
      Array of URLs to the generated content. Null when status is not completed.

    - **`urls`** (`object`, _optional_):
      Object containing related API endpoints.
      - Properties:
        - **`get`** (`string`, _optional_):
          URL to poll for the prediction result.


    - **`has_nsfw_contents`** (`array[boolean]`, _optional_):
      Array of boolean values indicating NSFW detection for each output. Null if not applicable.

    - **`status`** (`string`, _optional_):
      Status of the task: created, processing, completed, timeout, or failed.

    - **`created_at`** (`string`, _optional_):
      ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z").

    - **`error`** (`string`, _optional_):
      Error message if the task failed, empty string otherwise.

    - **`error_code`** (`integer`, _optional_):
      Error code if the task failed.

    - **`executionTime`** (`number`, _optional_):
      Total execution time in milliseconds.

    - **`timings`** (`object`, _optional_):
      Detailed timing breakdown.
      - Properties:
        - **`inference`** (`number`, _optional_):
          Inference time in milliseconds.





**Example Response**:

```json
{
  "code": 0,
  "message": "",
  "data": {
    "id": "",
    "model": "",
    "outputs": [
      ""
    ],
    "urls": {
      "get": ""
    },
    "has_nsfw_contents": [],
    "status": "",
    "created_at": "",
    "error": "",
    "error_code": 0,
    "executionTime": 0,
    "timings": {
      "inference": 0
    }
  }
}
```
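
The schema describes `has_nsfw_contents` as one boolean per output, with both arrays possibly null. A minimal sketch of pairing the two by index and keeping only unflagged URLs follows. The helper name `filter_flagged_outputs` is hypothetical, and treating a missing flag as "not flagged" is an assumption.

```python
def filter_flagged_outputs(data: dict) -> list:
    """Return output URLs whose matching has_nsfw_contents flag is not True."""
    outputs = data.get("outputs") or []       # nullable in the schema
    flags = data.get("has_nsfw_contents") or []  # nullable in the schema
    kept = []
    for index, url in enumerate(outputs):
        # Assumption: an output with no corresponding flag is treated as unflagged
        flagged = index < len(flags) and bool(flags[index])
        if not flagged:
            kept.append(url)
    return kept
```

The function takes the inner `data` object of a prediction response, not the full body.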


## Usage Examples

### cURL

```bash
# Step 1: Start generation (async)
curl -X POST "https://api.atlascloud.ai/api/v1/model/generateVideo" \
  -H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "google/gemini-omni-flash/reference-to-video-developer",
  "prompt": "",
  "images": [
    ""
  ],
  "video_clips": [
    {
      "url": "",
      "start": 0,
      "ends": 10
    }
  ],
  "duration": 8,
  "aspect_ratio": "16:9",
  "resolution": "720p",
  "seed": -1
}'

# Response will contain: {"code": 200, "data": {"id": "prediction_id", "status": "processing"}}

# Step 2: Poll for result (replace {prediction_id} with the id returned above)
curl -X GET "https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}" \
  -H "Authorization: Bearer $ATLASCLOUD_API_KEY"

# Keep polling until status is "completed", "succeeded" or "failed"
# When completed, outputs will contain the generated content URL(s)
```

## Additional Resources

### Documentation

- [Model Playground](https://www.atlascloud.ai/models/google/gemini-omni-flash/reference-to-video-developer)

Gemini Omni Flash — Reference to Video (Developer)

Model ID: google/gemini-omni-flash/reference-to-video-developer

Gemini Omni is Google's multimodal video generation model designed to create high-quality video content from diverse input types. This variant accepts a text prompt, reference images, and a source video clip, enabling the most expressive form of video generation: transforming existing footage while preserving coherence and injecting new creative direction.

Overview

Gemini Omni brings together Google's deep knowledge of physics, narrative logic, biology, culture, and visual composition to produce contextually coherent videos. Rather than simple clip synthesis, the model reasons about scene dynamics, camera language, and temporal flow to produce results that feel intentional and cinematic.

With both image and video inputs, the model can change what happens in a scene, add or remove objects, adjust camera perspective, apply new visual effects, or completely reimagine a clip's style — all while maintaining temporal coherence with the source material.

The developer tier provides direct API access with full control over generation parameters including resolution, aspect ratio, and random seed.

Key Capabilities

Video-guided generation — Provide a source video clip as a structural or stylistic reference; the model builds upon it to produce new content.
Image reference anchoring — Supply 1 to 5 reference images alongside the video to define subjects, characters, or visual style.
Precise clip trimming — Specify start and end timestamps within the source video to use only the most relevant segment (trim window ≤ 10 seconds).
Rich prompt understanding — Describe transformations, additions, camera language, and mood in a prompt of up to 20,000 characters.
Multi-resolution output — Generate at 720p, 1080p, or 4K.
Flexible aspect ratios — 16:9 landscape or 9:16 portrait.
Reproducible results — Set a fixed seed to reproduce or iterate on a specific generation.

Input Parameters

Parameter	Type	Required	Default	Description
`model`	string	Yes	`google/gemini-omni-flash/reference-to-video-developer`	Model identifier.
`prompt`	string	Yes	(none)	Text description of the desired transformation or new content. Max 20,000 characters.
`images`	array	No	(none)	1 to 5 reference image URLs used alongside the video. Supported formats: PNG, JPEG, JPG, WebP. Max 20MB each.
`video_clips`	array	Yes	(none)	Exactly 1 source video clip. See the video clip object below.
`duration`	integer	No	`8`	Output duration in seconds. Enum: `4`, `6`, `8`, `10`.
`aspect_ratio`	string	No	`16:9`	Output aspect ratio. Enum: `16:9`, `9:16`.
`resolution`	string	No	`720p`	Output resolution. Enum: `720p`, `1080p`, `4k`.
`seed`	integer	No	`-1`	Random seed for reproducibility. `-1` uses a random seed.

Video Clip Object

Each entry in video_clips is an object with the following fields:

Field	Type	Required	Description
`url`	string (URI)	Yes	URL of the source video. Max 100MB, up to 30 seconds total duration.
`start`	integer	Yes	Start time in seconds for trimming the clip. Default 0, range 0 to 29.
`ends`	integer	Yes	End time in seconds. Default 10, range 1 to 30. The difference `ends - start` must not exceed 10 seconds.
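
The trim constraints can be checked locally before a clip is submitted. The sketch below uses the `start` and `ends` field names and ranges from the API schema on this page. `validate_clip` is a hypothetical helper, and requiring `ends` to be greater than `start` is an assumption the page does not state.

```python
MAX_TRIM_SECONDS = 10


def validate_clip(start: int, ends: int) -> None:
    """Raise ValueError if the trim window violates the documented limits."""
    if not 0 <= start <= 29:
        raise ValueError("start must be between 0 and 29 seconds")
    if not 1 <= ends <= 30:
        raise ValueError("ends must be between 1 and 30 seconds")
    if ends <= start:
        # Assumption: an empty or negative window is rejected
        raise ValueError("ends must be greater than start")
    if ends - start > MAX_TRIM_SECONDS:
        raise ValueError("ends - start must not exceed 10 seconds")
```

For example, `validate_clip(20, 30)` passes, while `validate_clip(0, 11)` raises because the window is 11 seconds long.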

Resource Quota

The total input quota per request is 7 units:

Each image consumes 1 unit
The video clip consumes 2 units

Maximum images when a video is present: 5 (5 × 1 + 1 × 2 = 7).
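
The quota arithmetic above can be written out as a short sketch. The constants come from this section; the function names are hypothetical.

```python
IMAGE_UNITS = 1   # each image consumes 1 unit
VIDEO_UNITS = 2   # the video clip consumes 2 units
TOTAL_QUOTA = 7   # total input quota per request


def quota_used(num_images: int, num_videos: int = 1) -> int:
    """Units consumed by a request with the given inputs."""
    return num_images * IMAGE_UNITS + num_videos * VIDEO_UNITS


def max_images(num_videos: int = 1) -> int:
    """Largest image count that still fits in the quota."""
    return (TOTAL_QUOTA - num_videos * VIDEO_UNITS) // IMAGE_UNITS
```

With one video present, `max_images()` gives 5 and `quota_used(5)` gives 7, matching the figures above.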

Image Input Notes

Accepts 1 to 5 images per request (when combined with a video).
Supported formats: PNG, JPEG, JPG, WebP.
Minimum image dimensions: 128×128 pixels.
Each image must be under 20MB.

Use Cases

Video style transfer — Re-render existing footage in a new visual style described by prompt and reference images.
Scene editing — Add, remove, or modify objects and characters in a clip using natural language.
Camera perspective changes — Shift the implied viewpoint or camera movement of existing footage.
Character insertion — Inject a character (defined by reference images) into a scene from source video.
Iterative video production — Use rough footage as scaffolding and refine it into polished content through prompt-driven iteration.

Pricing

Pricing for this variant is fixed per generation based on output resolution only. Duration does not affect the price.

Resolution	Price per Generation
720p / 1080p	$1.60
4k	$2.40

Formula: resolution == "4k" ? $2.40 : $1.60

The fixed-price model reflects the additional compute cost of processing and conditioning on video input. 720p and 1080p are identically priced.
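
The pricing formula above translates directly into code. This is a sketch based only on the table in this section. `price_per_generation` is a hypothetical name, and the amounts are in US dollars as listed.

```python
from decimal import Decimal

PRICES = {
    "720p": Decimal("1.60"),
    "1080p": Decimal("1.60"),
    "4k": Decimal("2.40"),
}


def price_per_generation(resolution: str) -> Decimal:
    """Return the fixed price for one generation at the given resolution."""
    if resolution not in PRICES:
        raise ValueError(f"Unknown resolution: {resolution!r}")
    return PRICES[resolution]
```

`Decimal` is used instead of `float` so that totals over many generations do not accumulate rounding error.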

Explore Similar Models

Gemini Omni Flash Reference-to-Video

A natively multimodal Google DeepMind model that generates cinematic, sound-enabled videos from a text prompt plus 1-5 reference images, carrying a consistent subject, scene, or style across generations.

Gemini Omni Flash Image-to-Video

A natively multimodal Google DeepMind model that animates a still image into a cinematic, sound-enabled video guided by a text prompt while preserving the source subject and composition.

Gemini Omni Flash Video Edit

A natively multimodal Google DeepMind model that edits an existing video from a text prompt with optional reference images, applying scene-consistent changes and native audio while preserving the untouched footage.

Gemini Omni Flash Text-to-Video

A natively multimodal Google DeepMind model that generates cinematic videos with synchronized native audio from a text prompt alone, grounded in real-world physics for controllable, high-speed video generation.

Gemini Omni Flash Image-to-Video Developer

Gemini Omni Flash is Google's multimodal video generation model. This image-to-video variant creates subject-consistent videos from up to 7 reference images combined with a text prompt, preserving visual identity across the full generated video.

Gemini Omni Flash Text-to-Video Developer

Gemini Omni Flash is Google's multimodal video generation model. This text-to-video variant generates high-quality cinematic videos from text prompts with support for multiple resolutions, aspect ratios, and controllable duration.

Veo 3.1 Lite Text-to-video

High-efficiency Veo 3.1 Lite text-to-video: create video with synchronized audio from text prompts. Targets high-volume applications with strong price efficiency; 720p/1080p and flexible duration options. Does not support 4K outputs or Extension.

Veo 3.1 Lite Start-End Frame to Video

Veo 3.1 Lite start-end frame to video: generate motion between a first and last frame with audio. Lightweight, developer-oriented option with 8s duration and 720p/1080p. Does not support 4K outputs or Extension.

Veo 3.1 Lite Image-to-video

High-efficiency Veo 3.1 Lite image-to-video: animate an input image into video with synchronized audio. Cost-effective for scalable workflows; supports 720p/1080p and common aspect ratios. Does not support 4K outputs or Extension.

Veo3.1 Fast Image-to-video

Quickly animate static images into motion-rich, high-quality clips. Veo 3.1 Fast Image-to-Video accelerates rendering for fast previews and iterative visual storytelling.

Veo3.1 Fast Text-to-video

Generate visually compelling videos from text in record time. Veo 3.1 Fast Text-to-Video prioritizes speed and responsiveness while maintaining impressive fidelity for rapid creative iteration.

Veo3.1 Image-to-video

Bring still images to life with smooth, expressive motion. Veo 3.1 Image-to-Video transforms photos or keyframes into cinematic video sequences with realistic continuity and sound.

Veo3.1 Reference-to-video

Create richly detailed videos guided by visual references. Veo 3.1 Reference-to-Video preserves characters, style, and composition across scenes for consistent, visually coherent storytelling.

Veo3.1 Text-to-video

Generate high-fidelity videos from text prompts with Google’s most advanced generative video model. Veo 3.1 delivers cinematic quality, dynamic camera motion, and lifelike detail for storytelling and creative production.

Sync.so Lipsync v3

Sync.so Lipsync v3 (sync-3) is Sync Labs' state-of-the-art lip-sync model, re-syncing the lips of an existing video to a new audio track with industry-leading naturalness.

VEED Lipsync

VEED Lipsync re-drives the lip movements of an existing talking-head video to match a new audio track, preserving identity, appearance and background.
