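- Up to 4K output resolution (512px / 1K / 2K / 4K tiers)
- Supports more than 10 aspect ratios, including 21:9, 1:4, and 8:1
- Accurate, legible text rendering inside images
- Near-Pro quality (about 95%) at Flash speed
- Consistency for up to 5 characters across scenes
- Fidelity for up to 14 objects in a single workflow
- Precise editing through natural language (remove, replace, change pose)
- Multi-image blending and seamless compositing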
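What's New in Nano Banana 2
3-5x faster than Pro
Built on the Gemini 3.1 Flash architecture, Nano Banana 2 generates a standard image in 4-8 seconds, compared with 10-20 seconds for Nano Banana Pro.
Image search grounding
A core NB2 feature: during generation it retrieves real reference images through Google Search, which substantially improves accuracy for landmarks, celebrities, and brand logos.
Accurate text rendering
Generates accurate, legible text for marketing mockups, greeting cards, and localized content. Text inside images can also be translated and localized.
Multi-character consistency
Maintains visual consistency for up to 5 characters and 14 objects across scenes, which suits storyboards, comics, and marketing campaigns.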

Text Rendering
A minimalist coffee shop promotional poster with the text 'MORNING BREW — Fresh Roasted Daily' in elegant serif font, warm earth tones, steam rising from a ceramic cup, clean layout with plenty of whitespace

Character Consistency
A young woman with short red hair and freckles, wearing a green jacket, standing in a rainy Tokyo street at night with neon reflections on wet pavement, cinematic lighting, photorealistic

Photo to Action Figure
Transform the person in the photo into an action figure, styled after [CHARACTER_NAME] from [SOURCE / CONTEXT]. Next to the figure, display the accessories including [ITEM_1], [ITEM_2], and [ITEM_3]. On the top of the toy box, write "[BOX_LABEL_TOP]", and underneath it, "[BOX_LABEL_BOTTOM]". Place the box in a [BACKGROUND_SETTING] environment.

Search Grounding
A photorealistic aerial view of the Eiffel Tower at golden hour, with the Seine River winding through Paris below, warm sunset light casting long shadows, high detail, 4K resolution

Product Photography
A frosted glass perfume bottle with a marble cap on a white marble surface, soft studio lighting from the left, subtle reflections, minimalist luxury aesthetic, product photography style

Style Transfer
Transform this photo into Studio Ghibli animation style, keeping the same composition and subjects, lush watercolor backgrounds, soft diffused lighting, whimsical atmosphere

4K Output
A cozy Japanese ramen shop interior at night, steam rising from bowls, warm amber lighting, detailed wooden counter with various condiments, a chef working in the background, 4K, ultra detailed
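Why Choose Nano Banana 2?
⚡ Flash speed
3-5x faster than Nano Banana Pro, with standard generation times of 4-8 seconds
🎯 Near-Pro quality
Achieves about 95% of Pro image quality in most scenarios
💰 Cost-effective
About half the cost of Nano Banana Pro, making high-quality AI image generation more accessible
Technical Specifications
Architecture: Gemini 3.1 Flash (GEMPIX2)
Resolution support: 512px to 4K (512px / 1K / 2K / 4K tiers)
Aspect ratios: 1:1, 4:3, 3:4, 2:3, 3:2, 16:9, 9:16, 1:4, 4:1, 8:1, 21:9
Consistency: up to 5 characters + 14 objects per workflow
Content safety: SynthID watermark, C2PA standard compatible
API access: Gemini API, Vertex AI, AI Studio, Gemini CLI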
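Try Nano Banana 2
Pro-level image generation at Flash speed: create striking visuals with character consistency, text rendering, and 4K resolution support.
✨ Start with free credits
⚡ Instant API access
🌐 No setup required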
Google Nano Banana 2 Reference to Image Developer
Nano Banana 2 Reference to Image Developer (Gemini 3.1 Flash Image) is Google's advanced AI-powered video-to-image generation model, designed to generate high-quality static images from video clips combined with text instructions. Built on the same cutting-edge model as Nano Banana 2 Edit, it adds the ability to use video content as a rich reference source — extracting visual context, themes, and key frames to synthesize new images with precision and semantic awareness.
This is the developer-tier variant of Nano Banana 2 Reference to Image, offering a streamlined parameter set. It is ideal for API integrations and workflows where output format flexibility and per-frame media resolution control are not required.
Why Choose This?
- Video as reference: Provide a video clip (HTTP URL or YouTube URL) and let the model extract its visual context to guide image generation.
- Multi-image reference: Optionally upload up to 10 additional reference images to complement the video input for complex compositions.
- Natural language control: Describe exactly what you want with a text prompt; the model understands context, themes, and relationships from both the video and text.
- Thinking levels: Choose how much internal reasoning the model applies; higher thinking levels improve quality on complex tasks.
- Web search grounding: Optionally enable real-time web search to enrich generation with current information.
- Multi-resolution output: Generate at 1K, 2K, or 4K resolution.
- Flexible aspect ratios: Multiple options including 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, and 21:9.
How It Works
The model analyzes your video clip by sampling frames at the specified FPS rate, then interprets the visual content within its multimodal context window. Combined with your text prompt and any additional reference images, it synthesizes a new image grounded in the video's themes, style, and key visual elements. This makes it especially powerful for creating content that is visually consistent with existing video assets.
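To get a feel for how the trim window and FPS setting interact, the sketch below estimates how many frames a clip would contribute. It assumes frames are sampled uniformly at `fps` between `start` and `ends`, which is a simplification; the service's actual sampling behaviour is not specified here. The function name and the `video_duration` argument are hypothetical helpers, not part of the API.

```python
import math
from typing import Optional


def estimate_sampled_frames(
    start: float,
    ends: float,
    fps: float,
    video_duration: Optional[float] = None,
) -> int:
    """Rough frame count for a trimmed clip, assuming uniform sampling at `fps`."""
    if start < 0:
        raise ValueError("start must be >= 0")
    if not 0 <= fps <= 24:
        raise ValueError("fps must be within the documented 0-24 range")
    if ends == 0:
        # ends == 0 means "use the whole video", so the full duration is needed.
        if video_duration is None:
            raise ValueError("video_duration is required when ends == 0")
        ends = video_duration
    if ends < start:
        raise ValueError("ends must not be earlier than start")
    # Assumption: one frame per 1/fps seconds across the trimmed window.
    return math.ceil((ends - start) * fps)
```

For example, a 30-second window sampled at 0.5 FPS yields an estimate of 15 frames, while the same window at 24 FPS yields 720, which illustrates why lower FPS values reduce token usage.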
Parameters
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Text description of the desired output image |
| video_clips | Yes | Source video clip(s) for reference generation (max: 1, see below) |
| images | No | Additional reference images (max: 10) |
Video Clip Fields
| Field | Required | Description |
|---|---|---|
| url | Yes | URL of the source video clip. Supports HTTP URL or YouTube video URL. HTTP video is limited to 15MB. |
| start | Yes | Start time in seconds for trimming the video clip (min: 0) |
| ends | Yes | End time in seconds for trimming the video clip. Set 0 to use the whole video. |
| fps | Yes | Frame sampling rate (FPS) of the video clip. Range: 0–24. Lower values reduce token usage. |
Output Options
| Parameter | Required | Description |
|---|---|---|
| aspect_ratio | No | Aspect ratio: 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9 |
| resolution | No | Output resolution: 1k (default), 2k, 4k |
Advanced Options
| Parameter | Required | Description |
|---|---|---|
| thinking_level | No | Reasoning depth: default, high, minimal. Higher levels improve quality on complex tasks but increase latency. |
| enable_web_search | No | If enabled, grounds generation with real-time web information. |
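The parameter tables above can be summarized as a small request builder. This is a minimal sketch: the field names and limits come from the tables, but the overall dictionary layout, the function name `build_request`, and the choice to omit unset optional fields are assumptions. No endpoint or authentication is shown because neither is described in this document.

```python
from typing import Any, Dict, List, Optional

ASPECT_RATIOS = {"1:1", "3:2", "2:3", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
RESOLUTIONS = {"1k", "2k", "4k"}
THINKING_LEVELS = {"default", "high", "minimal"}
MAX_REFERENCE_IMAGES = 10


def build_request(
    prompt: str,
    video_url: str,
    start: float = 0,
    ends: float = 0,  # 0 means "use the whole video"
    fps: float = 1,
    images: Optional[List[str]] = None,
    aspect_ratio: Optional[str] = None,
    resolution: str = "1k",
    thinking_level: Optional[str] = None,
    enable_web_search: bool = False,
) -> Dict[str, Any]:
    """Validate inputs against the documented limits and return a request body.

    The dict layout is an assumption; only the field names come from the tables.
    """
    images = list(images or [])
    if not prompt:
        raise ValueError("prompt is required")
    if not video_url:
        raise ValueError("video_clips requires one clip URL")
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError("at most 10 reference images are allowed")
    if start < 0:
        raise ValueError("start must be >= 0")
    if not 0 <= fps <= 24:
        raise ValueError("fps must be within 0-24")
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if thinking_level is not None and thinking_level not in THINKING_LEVELS:
        raise ValueError(f"unsupported thinking_level: {thinking_level}")

    body: Dict[str, Any] = {
        "prompt": prompt,
        # Exactly one clip per request (the documented maximum is 1).
        "video_clips": [{"url": video_url, "start": start, "ends": ends, "fps": fps}],
        "resolution": resolution,
        "enable_web_search": enable_web_search,
    }
    # Optional fields are only included when the caller sets them.
    if images:
        body["images"] = images
    if aspect_ratio is not None:
        body["aspect_ratio"] = aspect_ratio
    if thinking_level is not None:
        body["thinking_level"] = thinking_level
    return body
```

Calling `build_request("Create a cinematic poster based on the key scenes in this video", "https://example.com/clip.mp4", start=5, ends=20, resolution="2k")` returns a body with a single trimmed clip; the URL here is a placeholder. Validating locally before submitting avoids paying for a request that would be rejected for exceeding a documented limit.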
How to Use
- Provide a video clip — enter the video URL (HTTP or YouTube) and set start/end times and FPS sampling rate.
- Write your prompt — describe the output image clearly (e.g., "Create a cinematic poster based on the key scenes in this video").
- Add reference images (optional) — upload additional images to guide composition or style.
- Choose aspect ratio (optional) — select a preset or leave empty for default.
- Select resolution — choose 1K, 2K, or 4K based on your quality needs.
- Adjust advanced settings (optional) — set thinking level or enable web search grounding.
- Run — submit and download your generated image.
Pricing
The total cost is determined by the output image resolution multiplied by the number of output images, plus optional per-request fees for video clip input and web search grounding.
SKU Prices
| SKU | Description | Unit Price |
|---|---|---|
| sku_1k | 1K resolution output image | $0.08 |
| sku_2k | 2K resolution output image | $0.12 |
| sku_4k | 4K resolution output image | $0.16 |
| sku_video_clip | Video clip input (per request) | $0.07 |
| sku_web_search | Web search grounding (per request) | $0.014 |
cost = (resolution == "2k" ? sku_2k : (resolution == "4k" ? sku_4k : sku_1k)) * images
+ (enable_web_search ? sku_web_search : 0)
+ (len(video_clips) > 0 ? sku_video_clip : 0)
Examples:
| Resolution | Video Clip | Web Search | Total Cost |
|---|---|---|---|
| 1K | Yes | No | $0.08 + $0.07 = $0.15 |
| 2K | Yes | No | $0.12 + $0.07 = $0.19 |
| 4K | Yes | No | $0.16 + $0.07 = $0.23 |
| 1K | Yes | Yes | $0.08 + $0.07 + $0.014 = $0.164 |
| 1K | No | No | $0.08 |
| 2K | No | No | $0.12 |
| 4K | No | No | $0.16 |
The video clip fee ($0.07) and web search fee ($0.014) are each charged once per request when the respective feature is enabled, regardless of content volume.
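The pricing formula can be checked with a short Python sketch. The SKU prices are taken from the table above; the function name `estimate_cost` and its argument names are hypothetical. Like the formula, it falls back to the 1K price for any resolution other than 2k or 4k, and it uses `Decimal` so that sums such as 0.08 + 0.07 do not pick up floating-point rounding noise.

```python
from decimal import Decimal

# Unit prices from the SKU table, in US dollars.
SKU_PRICES = {"1k": Decimal("0.08"), "2k": Decimal("0.12"), "4k": Decimal("0.16")}
SKU_VIDEO_CLIP = Decimal("0.07")
SKU_WEB_SEARCH = Decimal("0.014")


def estimate_cost(
    resolution: str = "1k",
    images: int = 1,
    has_video_clip: bool = True,
    enable_web_search: bool = False,
) -> Decimal:
    """Estimate the cost of one request; `images` is the number of output images."""
    # Mirrors the formula: anything other than 2k/4k is priced as 1k.
    total = SKU_PRICES.get(resolution, SKU_PRICES["1k"]) * images
    if has_video_clip:
        total += SKU_VIDEO_CLIP  # flat fee, charged once per request
    if enable_web_search:
        total += SKU_WEB_SEARCH  # flat fee, charged once per request
    return total
```

With these definitions, `estimate_cost("1k", enable_web_search=True)` gives `Decimal("0.164")` and `estimate_cost("4k")` gives `Decimal("0.23")`, matching the corresponding rows of the examples table.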
Best Use Cases
- Video Thumbnail Generation — Automatically create compelling thumbnails that reflect the video's content and mood.
- Promotional Posters — Generate movie-style or campaign posters grounded in actual video footage.
- Scene Summarization Art — Create visual summaries or highlight artwork from long-form video content.
- Brand Content Creation — Produce consistent image assets from brand video campaigns.
- Educational Infographics — Transform instructional videos into static visual materials.
- Social Media Assets — Generate platform-optimized images (vertical, square, landscape) from video content.
Pro Tips
- Use low FPS (0.5–1) for long videos to keep token usage within limits while still capturing key frames.
- Set precise start/end times to focus the model on the most relevant segment of your video.
- Combine specific text prompts with the video input — vague prompts may produce generic results.
- Add reference images alongside the video to guide composition style more precisely.
- Use thinking_level: high for complex scene interpretations or when visual fidelity matters most.
- YouTube URLs are supported directly — no need to download and re-upload public videos.
- 2K offers excellent quality at a reasonable price — only $0.04 more than 1K per image.
- If you need output_format (PNG/JPEG) or media_resolution control, use the standard Reference to Image model instead.
Notes
- Both prompt and video_clips are required fields.
- Maximum video clips: 1 per request.
- HTTP video URLs are limited to 15MB; use YouTube URLs for larger videos.
- Maximum additional reference images: 10.
- FPS range: 0–24. Higher FPS captures more frames but consumes more tokens.
- The video clip fee ($0.07) is a flat per-request charge, not per frame or per second.
- Output format is not configurable in this variant; use the standard model if PNG/JPEG selection is required.
- Ensure your content and prompts comply with Google's Safety Guidelines.