Nano Banana 2: Pro Quality at Flash Speed
V2
Also known as Gemini 3.1 Flash Image
The latest image model from Google DeepMind combines the advanced capabilities of Nano Banana Pro with the speed of Gemini Flash, offering 3-5x faster generation, up to 4K resolution, and character consistency for up to 5 characters in a single workflow.
Next-Generation Image Generation
Output up to 4K resolution (512px / 1K / 2K / 4K tiers)
10+ aspect ratios, including 21:9, 1:4, 8:1, and more
Accurate, legible text rendering in images
Near-Pro quality (~95%) at Flash speed
Intelligent Editing and Consistency
Character consistency for up to 5 characters across scenes
Object fidelity for up to 14 objects in one workflow
Targeted edits via natural language (remove, replace, repose)
Multi-image blending and seamless compositing
What's New in Nano Banana 2
3-5x Faster than Pro
Built on the Gemini 3.1 Flash architecture, Nano Banana 2 generates standard images in 4-8 seconds, compared with 10-20 seconds for Nano Banana Pro.
Image Search Grounding
The standout feature of NB2: during generation it can retrieve real-world reference images via Google Search, significantly improving accuracy for landmarks, famous people, and brand logos.
Accurate Text Rendering
Generate accurate, legible text for marketing mockups, greeting cards, and localized content. You can even translate and localize text inside an image.
Multi-Character Consistency
Maintain visual consistency for up to 5 characters and 14 objects across scenes, ideal for storyboards, comics, and marketing campaigns.
Prompt Examples & Templates
Explore curated prompt templates showcasing Nano Banana 2's key capabilities — text rendering, character consistency, search grounding, and 4K output.
Text Rendering
Marketing Mockup with Text
Generate marketing visuals with accurate, legible text — one of NB2's standout improvements
Prompt
A minimalist coffee shop promotional poster with the text 'MORNING BREW — Fresh Roasted Daily' in elegant serif font, warm earth tones, steam rising from a ceramic cup, clean layout with plenty of whitespace
Character Consistency
Multi-Scene Character
Maintain character consistency across multiple scenes — supports up to 5 characters per workflow
Prompt
A young woman with short red hair and freckles, wearing a green jacket, standing in a rainy Tokyo street at night with neon reflections on wet pavement, cinematic lighting, photorealistic
Photo to Action Figure
Person to Action Figure
Transform people from photos into collectible action figures with custom packaging
Prompt
Transform the person in the photo into an action figure, styled after [CHARACTER_NAME] from [SOURCE / CONTEXT]. Next to the figure, display the accessories including [ITEM_1], [ITEM_2], and [ITEM_3]. On the top of the toy box, write "[BOX_LABEL_TOP]", and underneath it, "[BOX_LABEL_BOTTOM]". Place the box in a [BACKGROUND_SETTING] environment.
Search Grounding
Real-World Reference Generation
Leverage Image Search Grounding to generate accurate real-world subjects like landmarks and brands
Prompt
A photorealistic aerial view of the Eiffel Tower at golden hour, with the Seine River winding through Paris below, warm sunset light casting long shadows, high detail, 4K resolution
Product Photography
Product Design Render
Create professional product photography with precise control over lighting and composition
Prompt
A frosted glass perfume bottle with a marble cap on a white marble surface, soft studio lighting from the left, subtle reflections, minimalist luxury aesthetic, product photography style
Style Transfer
Artistic Style Transformation
Apply diverse artistic styles while maintaining subject integrity
Prompt
Transform this photo into Studio Ghibli animation style, keeping the same composition and subjects, lush watercolor backgrounds, soft diffused lighting, whimsical atmosphere
4K Output
Ultra High Resolution Scene
Generate detailed scenes at up to 4K resolution with rich textures
Prompt
A cozy Japanese ramen shop interior at night, steam rising from bowls, warm amber lighting, detailed wooden counter with various condiments, a chef working in the background, 4K, ultra detailed
Use Cases
🎬
Storyboarding and comics
📸
Product photography
📊
Marketing mockups
📱
Social media content
🔤
Text overlay design
👤
Character design
✨
Photo editing and retouching
🎨
Visual brand content
Why Choose Nano Banana 2?
⚡
Flash Speed
3-5x faster than Nano Banana Pro, with a standard generation time of 4-8 seconds
🎯
Near-Pro Quality
Achieves roughly 95% of Pro's image quality in most scenarios
💰
Cost-Effective
Roughly half the cost of Nano Banana Pro, making high-quality AI image generation more accessible
API access: Gemini API, Vertex AI, AI Studio, Gemini CLI
Try Nano Banana 2
Pro-level image generation at Flash speed: create striking visuals with character consistency, text rendering, and 4K resolution support.
✨ Free credits to get started
⚡ Instant API access
🌐 No setup required
Google Nano Banana 2 Reference to Image
Nano Banana 2 Reference to Image (Gemini 3.1 Flash Image) is Google's advanced AI-powered video-to-image generation model, designed to generate high-quality static images from video clips combined with text instructions. Built on the same cutting-edge model as Nano Banana 2 Edit, it adds the ability to use video content as a rich reference source — extracting visual context, themes, and key frames to synthesize new images with precision and semantic awareness.
This model is ideal for creating thumbnails, posters, promotional artwork, and scene summaries by leveraging the visual richness of existing video content alongside natural language guidance.
Why Choose This?
Video as reference — Provide a video clip (HTTP URL or YouTube URL) and let the model extract its visual context to guide image generation.
Multi-image reference — Optionally upload up to 10 additional reference images to complement the video input for complex compositions.
Natural language control — Describe exactly what you want with a text prompt; the model understands context, themes, and relationships from both the video and text.
Thinking levels — Choose how much internal reasoning the model applies — higher thinking levels improve quality on complex tasks.
Media resolution control — Balance detail and token usage for input video frames with LOW, MEDIUM, or HIGH media resolution modes.
Web & image search grounding — Optionally enable real-time web or image search to enrich generation with current information.
Multi-resolution output — Generate at 1K, 2K, or 4K resolution.
Flexible aspect ratios — Multiple options including 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, and 21:9.
Format choice — Export in PNG or JPEG format.
How It Works
The model analyzes your video clip by sampling frames at the specified FPS rate, then interprets the visual content within its multimodal context window. Combined with your text prompt and any additional reference images, it synthesizes a new image grounded in the video's themes, style, and key visual elements. This makes it especially powerful for creating content that is visually consistent with existing video assets.
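As a rough illustration of the sampling step, the number of frames the model receives can be estimated from the clip window and the FPS rate. This is a sketch that assumes frames are sampled uniformly between the start and end times; the function name and the rounding rule are hypothetical and are not taken from the API.

```python
import math


def estimate_sampled_frames(start_s: float, end_s: float, fps: float) -> int:
    """Estimate how many frames are sampled from a clip window.

    Assumes uniform sampling at `fps` over the window from start_s to
    end_s (in seconds); the real service may round differently.
    """
    if end_s < start_s:
        raise ValueError("end_s must not be earlier than start_s")
    if fps < 0:
        raise ValueError("fps must be non-negative")
    return math.floor((end_s - start_s) * fps)


# A 2-minute window at two different sampling rates.
print(estimate_sampled_frames(0, 120, 1))
print(estimate_sampled_frames(0, 120, 0.5))
```

Halving the FPS halves the estimated frame count, which is why a lower rate leaves more room in the context window for long clips.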
Parameters
Core Inputs
| Parameter | Required | Description |
| --- | --- | --- |
| prompt | Yes | Text description of the desired output image |
| video_clips | Yes | Source video clip(s) for reference generation (max: 1, see below) |
| thinking_level | No | Reasoning depth: default, high, minimal. Higher levels improve quality on complex tasks but increase latency. |
| media_resolution | No | How input media frames are processed: default, low, medium, high. Low reduces tokens per frame, allowing longer videos. |
| enable_web_search | No | If enabled, grounds generation with real-time web information. |
| enable_image_search | No | If enabled, grounds generation with real-time image search results. |
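The parameters above could be assembled into a request body along these lines. This is a minimal sketch: the parameter names and allowed values come from this page, but the `build_request` helper is hypothetical, and the shape of each `video_clips` entry (a dict with a `url` key) is an assumption, since the exact schema is not shown here.

```python
ALLOWED_THINKING_LEVELS = {"default", "high", "minimal"}
ALLOWED_MEDIA_RESOLUTIONS = {"default", "low", "medium", "high"}


def build_request(prompt, video_clips, thinking_level="default",
                  media_resolution="default",
                  enable_web_search=False, enable_image_search=False):
    """Assemble a request body from the documented parameters.

    `video_clips` is passed through unchanged because its exact schema
    is not documented on this page.
    """
    if not prompt:
        raise ValueError("prompt is required")
    if not video_clips:
        raise ValueError("video_clips is required")
    if thinking_level not in ALLOWED_THINKING_LEVELS:
        raise ValueError(f"unknown thinking_level: {thinking_level!r}")
    if media_resolution not in ALLOWED_MEDIA_RESOLUTIONS:
        raise ValueError(f"unknown media_resolution: {media_resolution!r}")
    return {
        "model": "google/nano-banana-2/reference-to-image",
        "prompt": prompt,
        "video_clips": video_clips,
        "thinking_level": thinking_level,
        "media_resolution": media_resolution,
        "enable_web_search": bool(enable_web_search),
        "enable_image_search": bool(enable_image_search),
    }


# Hypothetical clip entry; the real field names may differ.
body = build_request(
    "Create a cinematic poster based on the key scenes in this video",
    [{"url": "https://example.com/clip.mp4"}],
    thinking_level="high",
)
print(body["thinking_level"], body["media_resolution"])
```

Validating the enumerated values locally catches typos before a request is sent.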
How to Use
Provide a video clip — enter the video URL (HTTP or YouTube) and set start/end times and FPS sampling rate.
Write your prompt — describe the output image clearly (e.g., "Create a cinematic poster based on the key scenes in this video").
Add reference images (optional) — upload additional images to guide composition or style.
Choose aspect ratio (optional) — select a preset or leave empty for default.
Select resolution — choose 1K, 2K, or 4K based on your quality needs.
Choose output format — PNG for transparency support, JPEG for smaller file size.
Adjust advanced settings (optional) — set thinking level, media resolution, or enable search grounding.
Run — submit and download your generated image.
Pricing
The total cost is determined by the output image resolution multiplied by the number of output images, plus optional per-request fees for video clip input, web search, and image search grounding.
The video clip fee ($0.07), web search fee ($0.014), and image search fee ($0.014) are each charged once per request when the respective feature is enabled, regardless of content volume.
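The pricing rule can be written out as a small calculation. This is a sketch only: the three flat fees are taken from this page, but the per-image price for each resolution is not listed here, so the caller supplies it, and the value used in the example is purely hypothetical.

```python
from decimal import Decimal

VIDEO_CLIP_FEE = Decimal("0.07")
WEB_SEARCH_FEE = Decimal("0.014")
IMAGE_SEARCH_FEE = Decimal("0.014")


def estimate_cost(price_per_image, num_images=1, video_clip=True,
                  web_search=False, image_search=False):
    """Per-image price times image count, plus flat per-request fees.

    `price_per_image` depends on the chosen resolution and must be
    supplied by the caller; it is not defined on this page.
    """
    total = Decimal(str(price_per_image)) * num_images
    if video_clip:
        total += VIDEO_CLIP_FEE
    if web_search:
        total += WEB_SEARCH_FEE
    if image_search:
        total += IMAGE_SEARCH_FEE
    return total


# Hypothetical per-image price of $0.10, two images, web search enabled.
print(estimate_cost("0.10", num_images=2, web_search=True))
```

`Decimal` is used instead of floats so that small fees such as $0.014 add up without binary rounding noise.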
Best Use Cases
Video Thumbnail Generation — Automatically create compelling thumbnails that reflect the video's content and mood.
Promotional Posters — Generate movie-style or campaign posters grounded in actual video footage.
Scene Summarization Art — Create visual summaries or highlight artwork from long-form video content.
Brand Content Creation — Produce consistent image assets from brand video campaigns.
Educational Infographics — Transform instructional videos into static visual materials.
Social Media Assets — Generate platform-optimized images (vertical, square, landscape) from video content.
Pro Tips
Use low FPS (0.5–1) for long videos to keep token usage within limits while still capturing key frames.
Set precise start/end times to focus the model on the most relevant segment of your video.
Combine specific text prompts with the video input — vague prompts may produce generic results.
Add reference images alongside the video to guide composition style more precisely.
Use thinking_level: high for complex scene interpretations or when visual fidelity matters most.
Set media_resolution: low when analyzing long videos to allow more frames within the context window.
2K offers excellent quality at a reasonable price — only $0.04 more than 1K per image.
YouTube URLs are supported directly — no need to download and re-upload public videos.
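The advice on low FPS for long videos can be turned into a simple rule of thumb. This sketch assumes you have a frame budget in mind; `max_frames` is a hypothetical number chosen by the caller, since this page does not state a token or frame limit. The 24 FPS ceiling comes from the documented FPS range.

```python
def pick_fps(duration_s: float, max_frames: int, max_fps: float = 24.0) -> float:
    """Return the highest FPS that keeps duration_s * fps within max_frames.

    The result is capped at max_fps (24 by default).
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if max_frames <= 0:
        raise ValueError("max_frames must be positive")
    return min(max_fps, max_frames / duration_s)


# A 10-minute video with a hypothetical budget of 300 frames.
print(pick_fps(600, 300))
# A 30-second clip with a generous budget hits the FPS ceiling.
print(pick_fps(30, 3000))
```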
Notes
Both prompt and video_clips are required fields.
Maximum video clips: 1 per request.
HTTP video URLs are limited to 15MB; use YouTube URLs for larger videos.
Maximum additional reference images: 10.
FPS range: 0–24. Higher FPS captures more frames but consumes more tokens.
The video clip fee ($0.07) is a flat per-request charge, not per frame or per second.
If aspect_ratio is not selected, the model uses a default ratio.
4K resolution costs 2× the standard 1K rate.
Ensure your content and prompts comply with Google's Safety Guidelines.
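The limits in these notes can be checked on the client before sending a request. The following is a sketch under stated assumptions: the numeric limits come from the notes above, while the helper names, the reading of "15MB" as 15 MiB, and the list of hosts treated as YouTube are all hypothetical.

```python
from urllib.parse import urlparse

MAX_VIDEO_CLIPS = 1
MAX_REFERENCE_IMAGES = 10
MAX_HTTP_VIDEO_BYTES = 15 * 1024 * 1024  # assumption: "15MB" read as 15 MiB
MIN_FPS, MAX_FPS = 0, 24
# Assumption: these hosts are what the service treats as YouTube URLs.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def check_limits(video_urls, reference_images, fps, http_video_bytes=None):
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    if not video_urls:
        problems.append("video_clips is required")
    elif len(video_urls) > MAX_VIDEO_CLIPS:
        problems.append(f"at most {MAX_VIDEO_CLIPS} video clip per request")
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(f"at most {MAX_REFERENCE_IMAGES} reference images")
    if not MIN_FPS <= fps <= MAX_FPS:
        problems.append(f"fps must be between {MIN_FPS} and {MAX_FPS}")
    # The size limit applies only to plain HTTP video URLs, not YouTube links.
    if (http_video_bytes is not None and video_urls
            and not is_youtube_url(video_urls[0])
            and http_video_bytes > MAX_HTTP_VIDEO_BYTES):
        problems.append("HTTP video exceeds 15MB; use a YouTube URL instead")
    return problems


print(check_limits(["https://example.com/a.mp4"], [], fps=1))
print(check_limits(["https://example.com/a.mp4"], ["img"] * 11, fps=30))
```

Returning a list instead of raising on the first problem lets a caller report every issue at once.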
Related Models
Nano Banana 2 Edit — Edit images using text prompts and reference images (no video input).
python
import os
import time

import requests

# Read the key from the environment; "$ATLASCLOUD_API_KEY" is not expanded inside a Python string.
API_KEY = os.environ["ATLASCLOUD_API_KEY"]
AUTH_HEADER = {"Authorization": f"Bearer {API_KEY}"}

# Step 1: Start image generation
generate_url = "https://api.atlascloud.ai/api/v1/model/generateImage"
headers = {"Content-Type": "application/json", **AUTH_HEADER}
# Generic sample fields; this model also requires video_clips (see Notes),
# whose schema is not shown in this document.
data = {
    "model": "google/nano-banana-2/reference-to-image",
    "prompt": "A beautiful landscape with mountains and lake",
    "width": 512,
    "height": 512,
    "steps": 20,
    "guidance_scale": 7.5,
}
generate_response = requests.post(generate_url, headers=headers, json=data)
generate_result = generate_response.json()
prediction_id = generate_result["data"]["id"]

# Step 2: Poll for result
poll_url = f"https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}"


def check_status():
    while True:
        response = requests.get(poll_url, headers=AUTH_HEADER)
        result = response.json()
        status = result["data"]["status"]
        if status == "completed":
            print("Generated image:", result["data"]["outputs"][0])
            return result["data"]["outputs"][0]
        elif status == "failed":
            raise Exception(result["data"]["error"] or "Generation failed")
        else:
            # Still processing, wait 2 seconds
            time.sleep(2)


image_url = check_status()
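The polling loop in the sample above runs until the job completes or fails, so a stuck job would block forever. One possible variant adds a timeout. This is a sketch, not part of the Atlas Cloud SDK: `wait_for_result` is a hypothetical helper, and it assumes the status payload has the same `status`, `outputs`, and `error` fields as the `data` object in the sample. The fetch function is injected so the loop can be exercised without network access.

```python
import time


def wait_for_result(fetch_status, timeout_s=120.0, interval_s=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until completion, failure, or timeout.

    fetch_status must return a dict with a "status" key and, depending
    on the status, "outputs" or "error".
    """
    deadline = clock() + timeout_s
    while True:
        data = fetch_status()
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":
            raise RuntimeError(data.get("error") or "Generation failed")
        if clock() >= deadline:
            raise TimeoutError(f"no result after {timeout_s} seconds")
        sleep(interval_s)


# Demo with canned responses instead of real HTTP calls.
responses = iter([
    {"status": "processing"},
    {"status": "completed", "outputs": ["https://example.com/image.png"]},
])
print(wait_for_result(lambda: next(responses), sleep=lambda s: None))
```

With `requests`, the fetch function could be a lambda that calls `requests.get` on the poll URL and returns the `data` field of the JSON response.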
Install
Install the required package for your programming language.
bash
pip install requests
Authentication
All API requests require authentication with an API key. You can get your API key from the Atlas Cloud dashboard.
bash
export ATLASCLOUD_API_KEY="your-api-key-here"
HTTP Headers
python
import os

API_KEY = os.environ.get("ATLASCLOUD_API_KEY")
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}
Protect Your API Key
Never expose your API key in client-side code or public repositories. Use environment variables or a backend proxy instead.
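One way to follow this advice is to load the key from the environment and fail early when it is missing, so that an empty `Bearer` header is never sent. This is a minimal sketch; `load_api_key` and `auth_headers` are hypothetical helper names, and only the `ATLASCLOUD_API_KEY` variable name comes from this page.

```python
import os


def load_api_key(env_var="ATLASCLOUD_API_KEY", environ=None):
    """Return the API key from the environment, or raise if it is unset."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running")
    return key


def auth_headers(api_key):
    """Build the JSON and bearer-token headers used by the API requests."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }


# Demo with an explicit mapping so the example does not depend on your shell.
key = load_api_key(environ={"ATLASCLOUD_API_KEY": "your-api-key-here"})
print(auth_headers(key)["Authorization"])
```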
Upload files to Atlas Cloud storage to get a URL that you can use in your API requests. Use multipart/form-data for the upload.
Atlas Cloud Skills integrates over 300 AI models directly into your AI coding assistant. Install it with one command, then use natural language to generate images and videos and to chat with LLMs.
Supported Clients
Claude Code
OpenAI Codex
Gemini CLI
Cursor
Windsurf
VS Code
Trae
GitHub Copilot
Cline
Roo Code
Amp
Goose
Replit
40+ supported clients
Install
bash
npx skills add AtlasCloudAI/atlas-cloud-skills
Set Up Your API Key
Get your API key from the Atlas Cloud dashboard and set it as an environment variable.
bash
export ATLASCLOUD_API_KEY="your-api-key-here"
Features
After installation, you can use natural language in your AI assistant to access all Atlas Cloud models.
Image generation: Generate images with models such as Nano Banana 2, Z-Image, and more.
Video creation: Create videos from text or images with Kling, Vidu, Veo, and others.
LLM chat: Chat with Qwen, DeepSeek, and other large language models.
Media upload: Upload local files for image editing and image-to-video workflows.