Nano Banana 2 Reference to Image Developer
image-to-image
DEV

Nano Banana 2 Reference-to-Image Developer API by Google

google/nano-banana-2/reference-to-image-developer
Reference-to-image-developer

Google's advanced AI-powered video-to-image generation model, designed to generate high-quality static images from video clips combined with text instructions.

Nano Banana 2: Pro Quality at Flash Speed

V2

Also known as Gemini 3.1 Flash Image

Google DeepMind's latest image model combines the advanced capabilities of Nano Banana Pro with the speed of Gemini Flash, delivering 3-5x faster generation, resolution up to 4K, and character consistency for up to 5 characters in a single workflow.

Next-Generation Image Generation
  • Output up to 4K resolution (512px / 1K / 2K / 4K tiers)
  • 10+ aspect ratios, including 21:9, 1:4, 8:1, and more
  • Accurate, legible text rendering within images
  • Near-Pro quality (~95%) at Flash speed
Smart Editing and Consistency
  • Character consistency for up to 5 characters across scenes
  • Object fidelity for up to 14 objects in one workflow
  • Targeted edits via natural language (remove, replace, repose)
  • Multi-image fusion and seamless composition

What's New in Nano Banana 2

3-5x Faster than Pro

Built on the Gemini 3.1 Flash architecture, Nano Banana 2 generates standard images in 4-8 seconds, versus 10-20 seconds for Nano Banana Pro.

Image Search Grounding

NB2's headline feature: it can retrieve real-world reference images via Google Search during generation, substantially improving accuracy for landmarks, celebrities, and brand logos.

Accurate Text Rendering

Generate accurate, legible text for marketing mockups, greeting cards, and localized content. You can even translate and localize the text within an image.

Multi-Character Consistency

Maintain visual consistency for up to 5 characters and 14 objects across scenes, ideal for storyboards, comics, and marketing campaigns.

Prompt Examples & Templates

Explore curated prompt templates showcasing Nano Banana 2's key capabilities — text rendering, character consistency, search grounding, and 4K output.

Marketing Mockup with Text
Text Rendering

Generate marketing visuals with accurate, legible text — one of NB2's standout improvements
Prompt

A minimalist coffee shop promotional poster with the text 'MORNING BREW — Fresh Roasted Daily' in elegant serif font, warm earth tones, steam rising from a ceramic cup, clean layout with plenty of whitespace

Multi-Scene Character
Character Consistency

Maintain character consistency across multiple scenes — supports up to 5 characters per workflow
Prompt

A young woman with short red hair and freckles, wearing a green jacket, standing in a rainy Tokyo street at night with neon reflections on wet pavement, cinematic lighting, photorealistic

Person to Action Figure
Photo to Action Figure

Transform people from photos into collectible action figures with custom packaging
Prompt

Transform the person in the photo into an action figure, styled after [CHARACTER_NAME] from [SOURCE / CONTEXT]. Next to the figure, display the accessories including [ITEM_1], [ITEM_2], and [ITEM_3]. On the top of the toy box, write "[BOX_LABEL_TOP]", and underneath it, "[BOX_LABEL_BOTTOM]". Place the box in a [BACKGROUND_SETTING] environment.

Real-World Reference Generation
Search Grounding

Leverage Image Search Grounding to generate accurate real-world subjects like landmarks and brands
Prompt

A photorealistic aerial view of the Eiffel Tower at golden hour, with the Seine River winding through Paris below, warm sunset light casting long shadows, high detail, 4K resolution

Product Design Render
Product Photography

Create professional product photography with precise control over lighting and composition
Prompt

A frosted glass perfume bottle with a marble cap on a white marble surface, soft studio lighting from the left, subtle reflections, minimalist luxury aesthetic, product photography style

Artistic Style Transformation
Style Transfer

Apply diverse artistic styles while maintaining subject integrity
Prompt

Transform this photo into Studio Ghibli animation style, keeping the same composition and subjects, lush watercolor backgrounds, soft diffused lighting, whimsical atmosphere

Ultra High Resolution Scene
4K Output

Generate detailed scenes at up to 4K resolution with rich textures
Prompt

A cozy Japanese ramen shop interior at night, steam rising from bowls, warm amber lighting, detailed wooden counter with various condiments, a chef working in the background, 4K, ultra detailed

Use Cases

  • Storyboards and comics
  • Product photography
  • Marketing mockups
  • Social media content
  • Text overlay design
  • Character design
  • Photo retouching and editing
  • Brand visual content

Why Choose Nano Banana 2?

Flash Speed

3-5x faster than Nano Banana Pro, with a standard generation time of 4-8 seconds.

Near-Pro Quality

Reaches about 95% of Pro's image quality in most scenarios.

Cost-Effective

About half the cost of Nano Banana Pro, making high-quality AI image generation more accessible.

Technical Specifications

Architecture: Gemini 3.1 Flash (GEMPIX2)
Resolution support: 512px to 4K (512px / 1K / 2K / 4K tiers)
Aspect ratios: 1:1, 4:3, 3:4, 2:3, 3:2, 16:9, 9:16, 1:4, 4:1, 8:1, 21:9
Consistency: Up to 5 characters + 14 objects per workflow
Content safety: SynthID watermark, compatible with the C2PA standard
API access: Gemini API, Vertex AI, AI Studio, Gemini CLI

Discover Nano Banana 2

Pro-level image generation at Flash speed: create stunning visuals with character consistency, text rendering, and 4K resolution support.

Free credits to get started
Instant API access
No setup required

Google Nano Banana 2 Reference to Image Developer

Nano Banana 2 Reference to Image Developer (Gemini 3.1 Flash Image) is Google's advanced AI-powered video-to-image generation model, designed to generate high-quality static images from video clips combined with text instructions. Built on the same cutting-edge model as Nano Banana 2 Edit, it adds the ability to use video content as a rich reference source — extracting visual context, themes, and key frames to synthesize new images with precision and semantic awareness.

This is the developer-tier variant of Nano Banana 2 Reference to Image, offering a streamlined parameter set. It is ideal for API integrations and workflows where output format flexibility and per-frame media resolution control are not required.

Why Choose This?

  • Video as reference — Provide a video clip (HTTP URL or YouTube URL) and let the model extract its visual context to guide image generation.

  • Multi-image reference — Optionally upload up to 10 additional reference images to complement the video input for complex compositions.

  • Natural language control — Describe exactly what you want with a text prompt; the model understands context, themes, and relationships from both the video and text.

  • Thinking levels — Choose how much internal reasoning the model applies — higher thinking levels improve quality on complex tasks.

  • Web search grounding — Optionally enable real-time web search to enrich generation with current information.

  • Multi-resolution output — Generate at 1K, 2K, or 4K resolution.

  • Flexible aspect ratios — Multiple options including 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, and 21:9.

How It Works

The model analyzes your video clip by sampling frames at the specified fps, then interprets the visual content within its multimodal context window. Combined with your text prompt and any additional reference images, it synthesizes a new image grounded in the video's themes, style, and key visual elements. This makes it especially useful for creating content that stays visually consistent with existing video assets.
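Because frames are sampled at the requested rate, the amount of video the model sees grows with the trimmed window length times fps. The sketch below estimates that frame count, assuming uniform sampling over the window (the exact sampling strategy is not documented here) and assuming that ends = 0 extends the window to the end of the video. The helper name and the video_duration argument are hypothetical; you supply the duration yourself.

```python
import math

def estimate_sampled_frames(start: float, ends: float, fps: float,
                            video_duration: float) -> int:
    """Estimate how many frames are sampled from the trimmed clip.

    Assumption: frames are taken uniformly at `fps` over [start, end].
    Assumption: `ends == 0` ("use the whole video") runs to `video_duration`.
    """
    end = video_duration if ends == 0 else ends
    window = max(0.0, end - start)  # seconds of video actually sampled
    return math.floor(window * fps)
```

For example, a 30-second segment (start=10, ends=40) sampled at 1 fps gives about 30 frames, while the whole of a 2-minute video at 24 fps gives about 2,880, which is why lower fps values keep token usage down.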

Parameters

Core Inputs

| Parameter | Required | Description |
| --- | --- | --- |
| prompt | Yes | Text description of the desired output image |
| video_clips | Yes | Source video clip(s) for reference generation (max: 1; fields below) |
| images | No | Additional reference images (max: 10; click "+ Add Item" to add more) |

Video Clip Fields

| Field | Required | Description |
| --- | --- | --- |
| url | Yes | URL of the source video clip. Supports an HTTP URL or a YouTube video URL. HTTP videos are limited to 15 MB. |
| start | Yes | Start time in seconds for trimming the video clip (min: 0). |
| ends | Yes | End time in seconds for trimming the video clip. Set to 0 to use the whole video. |
| fps | Yes | Frame sampling rate (FPS) of the video clip. Range: 0–24. Lower values reduce token usage. |
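Before submitting a request, it can help to check a video_clips entry against the constraints in the field table. The following is a minimal sketch; the helper name is hypothetical, and the rule that a nonzero ends must come after start is an assumption rather than a documented constraint. The 15 MB limit for HTTP videos cannot be verified from the URL alone, so it is not checked.

```python
from urllib.parse import urlparse

def validate_video_clip(clip: dict) -> list[str]:
    """Return a list of problems with one `video_clips` entry (empty if OK)."""
    errors = []
    for field in ("url", "start", "ends", "fps"):
        if field not in clip:
            errors.append(f"missing required field: {field}")
    if errors:
        return errors  # cannot check values until all fields are present

    # HTTP and YouTube URLs both use the http/https schemes.
    if urlparse(clip["url"]).scheme not in ("http", "https"):
        errors.append("url must be an http(s) URL")

    start, ends, fps = clip["start"], clip["ends"], clip["fps"]
    if start < 0:
        errors.append("start must be >= 0")
    # ends == 0 means "use the whole video"; otherwise it is assumed
    # (not documented) that the end must come after the start.
    if ends != 0 and ends <= start:
        errors.append("ends must be 0 or greater than start")
    if not 0 <= fps <= 24:
        errors.append("fps must be in the range 0-24")
    return errors
```

An empty list means the clip passes these local checks; the service may still reject it for reasons not covered here, such as an oversized HTTP video.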

Output Options

| Parameter | Required | Description |
| --- | --- | --- |
| aspect_ratio | No | Aspect ratio: 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9 |
| resolution | No | Output resolution: 1k (default), 2k, 4k |

Advanced Options

| Parameter | Required | Description |
| --- | --- | --- |
| thinking_level | No | Reasoning depth: default, high, minimal. Higher levels improve quality on complex tasks but increase latency. |
| enable_web_search | No | If enabled, grounds generation with real-time web information. |

How to Use

  1. Provide a video clip — enter the video URL (HTTP or YouTube) and set start/end times and FPS sampling rate.
  2. Write your prompt — describe the output image clearly (e.g., "Create a cinematic poster based on the key scenes in this video").
  3. Add reference images (optional) — upload additional images to guide composition or style.
  4. Choose aspect ratio (optional) — select a preset or leave empty for default.
  5. Select resolution — choose 1K, 2K, or 4K based on your quality needs.
  6. Adjust advanced settings (optional) — set thinking level or enable web search grounding.
  7. Run — submit and download your generated image.
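The steps above map onto a single request body. The sketch below assembles one in Python, assuming a JSON payload whose keys match the parameter names in the tables. The submission endpoint and authentication are not given in this document and are left out, and passing reference images as URL strings is an assumption. build_request_body is a hypothetical helper.

```python
import json
from typing import Optional

# Allowed values copied from the parameter tables above.
ASPECT_RATIOS = {"1:1", "3:2", "2:3", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
RESOLUTIONS = {"1k", "2k", "4k"}
THINKING_LEVELS = {"default", "high", "minimal"}

def build_request_body(prompt: str, video_clip: dict,
                       images: Optional[list] = None,
                       aspect_ratio: Optional[str] = None,
                       resolution: str = "1k",
                       thinking_level: Optional[str] = None,
                       enable_web_search: bool = False) -> str:
    """Assemble a JSON request body following the "How to Use" steps."""
    if not prompt:
        raise ValueError("prompt is required")
    images = images or []  # assumed: reference images given as URL strings
    if len(images) > 10:
        raise ValueError("at most 10 reference images are allowed")
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if thinking_level is not None and thinking_level not in THINKING_LEVELS:
        raise ValueError(f"unsupported thinking_level: {thinking_level}")

    body = {
        "prompt": prompt,
        "video_clips": [video_clip],  # exactly one clip per request
        "resolution": resolution,
        "enable_web_search": enable_web_search,
    }
    # Optional fields are omitted when unset so the service applies its defaults.
    if images:
        body["images"] = images
    if aspect_ratio is not None:
        body["aspect_ratio"] = aspect_ratio
    if thinking_level is not None:
        body["thinking_level"] = thinking_level
    return json.dumps(body)
```

Leaving aspect_ratio unset mirrors step 4 ("leave empty for default"), and resolution defaults to 1k as in the Output Options table.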

Pricing

The total cost is the per-image price for the chosen output resolution multiplied by the number of output images, plus optional flat per-request fees for video clip input and web search grounding.

SKU Prices

| SKU | Description | Unit Price |
| --- | --- | --- |
| sku_1k | 1K resolution output image | $0.08 |
| sku_2k | 2K resolution output image | $0.12 |
| sku_4k | 4K resolution output image | $0.16 |
| sku_video_clip | Video clip input (per request) | $0.07 |
| sku_web_search | Web search grounding (per request) | $0.014 |

Pricing Formula

cost = (resolution == "2k" ? sku_2k : (resolution == "4k" ? sku_4k : sku_1k)) * images + (enable_web_search ? sku_web_search : 0) + (len(video_clips) > 0 ? sku_video_clip : 0)

Here, images is the number of output images; each example below assumes a single output image.

Examples:

| Resolution | Video Clip | Web Search | Total Cost |
| --- | --- | --- | --- |
| 1K | Yes | No | $0.08 + $0.07 = $0.15 |
| 2K | Yes | No | $0.12 + $0.07 = $0.19 |
| 4K | Yes | No | $0.16 + $0.07 = $0.23 |
| 1K | Yes | Yes | $0.08 + $0.07 + $0.014 = $0.164 |
| 1K | No | No | $0.08 |
| 2K | No | No | $0.12 |
| 4K | No | No | $0.16 |

The video clip fee ($0.07) and web search fee ($0.014) are each charged once per request when the respective feature is used, regardless of content volume.
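The pricing formula and example table above can be checked with a short Python sketch. It uses Decimal to avoid floating-point rounding in currency sums. request_cost and its parameter names are hypothetical, and num_images is read as the number of output images, as in the pricing description.

```python
from decimal import Decimal

# Unit prices from the SKU table (USD).
SKU_PRICES = {
    "sku_1k": Decimal("0.08"),
    "sku_2k": Decimal("0.12"),
    "sku_4k": Decimal("0.16"),
    "sku_video_clip": Decimal("0.07"),
    "sku_web_search": Decimal("0.014"),
}

def request_cost(resolution: str, num_images: int = 1,
                 has_video_clip: bool = True,
                 enable_web_search: bool = False) -> Decimal:
    """Per-image price times image count, plus flat per-request fees."""
    # As in the formula, anything other than "2k" or "4k" is billed at the 1K price.
    per_image = {"2k": SKU_PRICES["sku_2k"],
                 "4k": SKU_PRICES["sku_4k"]}.get(resolution, SKU_PRICES["sku_1k"])
    cost = per_image * num_images
    if enable_web_search:
        cost += SKU_PRICES["sku_web_search"]  # once per request
    if has_video_clip:
        cost += SKU_PRICES["sku_video_clip"]  # once per request, not per frame
    return cost
```

For instance, request_cost("1k", enable_web_search=True) returns Decimal('0.164'), matching the fourth row of the examples table.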

Best Use Cases

  • Video Thumbnail Generation — Automatically create compelling thumbnails that reflect the video's content and mood.
  • Promotional Posters — Generate movie-style or campaign posters grounded in actual video footage.
  • Scene Summarization Art — Create visual summaries or highlight artwork from long-form video content.
  • Brand Content Creation — Produce consistent image assets from brand video campaigns.
  • Educational Infographics — Transform instructional videos into static visual materials.
  • Social Media Assets — Generate platform-optimized images (vertical, square, landscape) from video content.

Pro Tips

  • Use low FPS (0.5–1) for long videos to keep token usage within limits while still capturing key frames.
  • Set precise start/end times to focus the model on the most relevant segment of your video.
  • Combine specific text prompts with the video input — vague prompts may produce generic results.
  • Add reference images alongside the video to guide composition style more precisely.
  • Use thinking_level: high for complex scene interpretations or when visual fidelity matters most.
  • YouTube URLs are supported directly — no need to download and re-upload public videos.
  • 2K offers excellent quality at a reasonable price — only $0.04 more than 1K per image.
  • If you need output_format (PNG/JPEG) or media_resolution control, use the standard Reference to Image model instead.
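Following the low-FPS tip for long videos, a hypothetical helper can pick the largest fps that keeps the sampled frame count under a budget you choose. This sketch assumes uniform sampling over the trimmed window; the frame budget is your own choice, since no per-frame token cost is given in this document.

```python
def fps_for_frame_budget(window_seconds: float, max_frames: int) -> float:
    """Largest fps whose sampled frame count stays within max_frames.

    Assumes frames are sampled uniformly over the trimmed window.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Clamp to the documented fps range of 0-24.
    return min(24.0, max(0.0, max_frames / window_seconds))
```

For a 10-minute segment (600 seconds) and a 300-frame budget it returns 0.5, within the 0.5 to 1 range suggested above; short clips hit the 24 fps ceiling instead.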

Notes

  • Both prompt and video_clips are required fields.
  • Maximum video clips: 1 per request.
  • HTTP video URLs are limited to 15MB; use YouTube URLs for larger videos.
  • Maximum additional reference images: 10.
  • FPS range: 0–24. Higher FPS captures more frames but consumes more tokens.
  • The video clip fee ($0.07) is a flat per-request charge, not per frame or per second.
  • Output format is not configurable in this variant; use the standard model if PNG/JPEG selection is required.
  • Ensure your content and prompts comply with Google's Safety Guidelines.
