Gemini Omni Flash Video Edit API by Google

A natively multimodal Google DeepMind model that edits an existing video from a text prompt with optional reference images, applying scene-consistent changes and native audio while preserving the untouched footage.

Gemini Omni Flash — Video Edit

Model ID: google/gemini-omni-flash/video-edit

Gemini Omni Flash is Google DeepMind's high-performance, natively multimodal model built for high-speed video generation, editing, and cinematic control. This variant accepts a source video plus a text prompt (and, optionally, reference images), transforming an existing clip according to your instructions.

Overview

Gemini Omni Flash (gemini-omni-flash-preview) was introduced by Google alongside Nano Banana 2 Lite as a new generation of multimodal media models. Unlike traditional pipelines that stitch modalities together, Omni Flash is a single transformer that processes text, images, audio, and video simultaneously, producing output that is more cohesive, consistent, and controllable.

What sets it apart from earlier video models (such as the Veo family) is that it natively generates audio with every video — dialogue, ambience, music, and sound design are produced together with the picture rather than added afterward. The model is grounded in Gemini's real-world knowledge, so it reasons about physics, narrative logic, culture, and visual composition to produce results that feel intentional and cinematic. Generated media carries an invisible SynthID watermark.

AtlasCloud exposes Gemini Omni Flash through four endpoints — text-to-video, image-to-video, reference-to-video, and video-edit. All four route to the same gemini-omni-flash-preview model and differ only by the input modality they accept, corresponding to the model's task parameter (text_to_video, image_to_video, reference_to_video, edit). This endpoint maps to edit.
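The endpoint-to-task correspondence described above can be written down as a small lookup. This is only an illustrative sketch: the helper name `task_for_model_id` is hypothetical, and the assumption that the other three model IDs share the `google/gemini-omni-flash/<endpoint>` pattern is not confirmed by this page (only the video-edit ID is).

```python
# Endpoint name -> value of the model's task parameter, as listed in the text.
ENDPOINT_TASKS = {
    "text-to-video": "text_to_video",
    "image-to-video": "image_to_video",
    "reference-to-video": "reference_to_video",
    "video-edit": "edit",
}


def task_for_model_id(model_id: str) -> str:
    """Return the task for an ID such as 'google/gemini-omni-flash/video-edit'.

    Assumes the endpoint name is the last path segment of the model ID.
    """
    endpoint = model_id.rsplit("/", 1)[-1]
    try:
        return ENDPOINT_TASKS[endpoint]
    except KeyError:
        raise ValueError(f"unknown endpoint: {endpoint!r}") from None
```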

Inputs

This variant takes a source video and a text prompt, with optional reference images. The prompt describes the edit to apply — adding, removing, or transforming elements, restyling, or changing the audio — while Omni Flash preserves the rest of the clip. Because the model understands the whole scene, edits stay consistent with the surrounding footage rather than looking pasted on.

  • Video — the source clip to edit. Up to 100 MB and 30 seconds in duration.
  • Prompt — Natural-language description of the edit to apply (up to 20,000 characters).
  • Images (optional) — 1 to 5 reference images to guide the edit (e.g. a subject, object, or style to introduce). PNG/JPEG/JPG/WebP, ≤20 MB each. URL or base64.
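The limits listed above lend themselves to a client-side check before uploading anything. The following is a minimal sketch, not part of any official SDK: the function name, the tuple layout for images, and the choice of 1 MB = 1024 × 1024 bytes are all assumptions, and the file extension is used as a rough stand-in for the actual image format.

```python
import os

MAX_VIDEO_BYTES = 100 * 1024 * 1024  # assumption: MB means 1024 * 1024 bytes
MAX_VIDEO_SECONDS = 30
MAX_PROMPT_CHARS = 20_000
MAX_IMAGES = 5
MAX_IMAGE_BYTES = 20 * 1024 * 1024
ALLOWED_IMAGE_EXTS = {".png", ".jpeg", ".jpg", ".webp"}


def validate_edit_inputs(prompt, video_size_bytes, video_seconds, images=()):
    """Return a list of human-readable problems; an empty list means OK.

    `images` is an iterable of (filename, size_bytes) tuples.
    """
    errors = []
    if not prompt:
        errors.append("prompt is required")
    elif len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if video_size_bytes > MAX_VIDEO_BYTES:
        errors.append("video is larger than 100 MB")
    if video_seconds > MAX_VIDEO_SECONDS:
        errors.append("video is longer than 30 seconds")
    images = list(images)
    if len(images) > MAX_IMAGES:
        errors.append(f"at most {MAX_IMAGES} reference images are allowed")
    for name, size in images:
        ext = os.path.splitext(name)[1].lower()
        if ext not in ALLOWED_IMAGE_EXTS:
            errors.append(f"{name}: unsupported image format {ext!r}")
        if size > MAX_IMAGE_BYTES:
            errors.append(f"{name}: image is larger than 20 MB")
    return errors
```

Returning a list rather than raising on the first failure lets a caller report every problem with a submission at once.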

Key Capabilities

  • Instruction-driven editing — Add, remove, replace, or restyle elements of a clip from a plain-language description.
  • Scene-consistent results — Edits blend into the existing footage, preserving untouched regions, lighting, and motion.
  • Reference-guided edits — Optionally supply up to 5 images to introduce a specific subject, object, or style.
  • Native audio generation — Edits can regenerate or adjust the accompanying soundtrack alongside the picture.
  • World-grounded realism — Physics, motion, and scene dynamics informed by Gemini's real-world knowledge.
  • Adjustable reasoning — The thinking_level control trades latency for quality on complex edits.
  • Reproducible results — Set a fixed seed to reproduce or iterate on a specific generation.

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | google/gemini-omni-flash/video-edit | Model identifier |
| prompt | string | Yes | | Text description of the edit to apply. Max 20,000 characters. |
| video | string (uri) | Yes | | Source video to edit. ≤100 MB and ≤30 seconds. |
| images | array of string (uri) | No | | 1–5 optional reference images for character, scene, or style. PNG/JPEG/JPG/WebP, ≤20 MB each. URL or base64. |
| thinking_level | string | No | default | Internal reasoning effort. Enum: default, high, low. |
| resolution | string | No | 720p | Output resolution. Enum: 720p. |
| seed | integer | No | -1 | Random seed for reproducibility. -1 uses a random seed. |

Output duration and aspect ratio follow the source video, so this variant has no duration or aspect_ratio parameter.
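As an illustration of how the parameters above fit together, here is a sketch that assembles a JSON request body. The field names and defaults come from the parameter list; the function name `build_edit_payload` and the example URLs are hypothetical. This page does not give the HTTP endpoint or authentication scheme, so the sketch stops at the body and does not send anything.

```python
import json

MODEL_ID = "google/gemini-omni-flash/video-edit"
THINKING_LEVELS = {"default", "high", "low"}


def build_edit_payload(prompt, video, images=None,
                       thinking_level="default", seed=-1):
    """Build the request body for a video edit (hypothetical helper)."""
    if thinking_level not in THINKING_LEVELS:
        raise ValueError(f"thinking_level must be one of {sorted(THINKING_LEVELS)}")
    if images is not None and not 1 <= len(images) <= 5:
        raise ValueError("images must contain 1 to 5 entries when provided")
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "video": video,                # URI of the source clip
        "thinking_level": thinking_level,
        "resolution": "720p",          # the only documented value
        "seed": seed,                  # -1 requests a random seed
    }
    if images:
        payload["images"] = list(images)  # URLs or base64 strings
    return payload


if __name__ == "__main__":
    body = build_edit_payload(
        prompt="Replace the overcast sky with a clear sunset.",
        video="https://example.com/source.mp4",      # placeholder URL
        images=["https://example.com/sunset.jpg"],   # placeholder URL
        thinking_level="high",
        seed=42,
    )
    print(json.dumps(body, indent=2))
```

No duration or aspect_ratio field appears in the body because the output inherits both from the source video.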

Use Cases

  • Element edits — Add, remove, or replace objects, characters, or backgrounds in existing footage.
  • Restyling — Transform the look, color grade, or mood of a clip while keeping its motion.
  • Localization & cleanup — Swap on-screen elements or refresh assets without reshooting.
  • Reference-driven insertion — Bring a specific product or character (supplied as images) into an existing scene.
  • Iterative refinement — Apply successive edits to converge on a desired result.

Pricing

Billing is based on the duration of the source video, charged at a flat per-second rate.

| SKU | Rate |
|---|---|
| Per second of source video | $0.14 |

Formula: clamp(video_duration, 3, 30) × $0.14

  • Billing follows the source video's duration, clamped to a 3-second minimum and a 30-second maximum.
  • Example: a 10-second source video costs 10 × $0.14 = $1.40.
  • Example: a 30-second source video costs 30 × $0.14 = $4.20.
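The pricing formula above can be checked with a few lines of Python. This is a sketch under one assumption the page does not settle: fractional durations are billed as-is rather than rounded to whole seconds. The function name `edit_cost` is hypothetical.

```python
RATE_PER_SECOND = 0.14   # USD per second of source video
MIN_BILLED_SECONDS = 3
MAX_BILLED_SECONDS = 30


def edit_cost(video_duration: float) -> float:
    """Return the cost in USD: clamp(video_duration, 3, 30) * 0.14."""
    billable = min(max(video_duration, MIN_BILLED_SECONDS), MAX_BILLED_SECONDS)
    # Round to cents to hide binary floating-point noise.
    return round(billable * RATE_PER_SECOND, 2)
```

Under the 3-second minimum, a 1-second clip is billed as 3 seconds, so `edit_cost(1)` gives 0.42, while `edit_cost(10)` and `edit_cost(30)` reproduce the $1.40 and $4.20 examples.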
