Midjourney V8.1 Image-to-Video API by MIDJOURNEY

Midjourney V8.1 animates an input image into four 5-second videos at 480p or 720p.

1. Introduction

Midjourney V8.1 is a text-to-image generation model developed by Midjourney, Inc., representing the latest iteration in the company's image synthesis research. This README applies to the following API model identifiers:

  • midjourney/v8.1/text-to-image
  • midjourney/v8.1/image-to-video

Midjourney V8.1 is designed to produce high-aesthetic, prompt-faithful imagery at native 2K resolution with substantially faster generation than prior versions. It is built by Midjourney, an independent, self-funded San Francisco research lab (~11–50 staff) founded in August 2021 by David Holz, and is positioned as a speed- and quality-focused evolution of the company's image pipeline rather than a full feature replacement for its predecessor.

The V8 line is a full from-scratch rewrite of Midjourney's image model, accompanied by a migration from TPU-based to GPU-native PyTorch infrastructure. The model's defining methodology is a human-preference aesthetic tuning loop combined with per-user personalization, prioritizing visually compelling output over raw fidelity to a reference dataset. V8.1 entered alpha on April 14, 2026, and became generally available on web and Discord on April 30, 2026, but it remains in early testing; the prior V7 model continues to serve as Midjourney's documented default because of the feature gaps described below.


2. Key Features & Innovations

  • Native 2K HD output without a separate upscaler: V8.1 generates directly at 2048px resolution, eliminating the dedicated upscaling step required by earlier versions. HD renders take roughly 1.33 GPU-minutes and standard-definition renders under 1 GPU-minute, with HD running approximately 3× faster and cheaper than in V8.

  • ~5× faster generation: The GPU-native PyTorch rewrite delivers an estimated fivefold speedup in generation time over previous Midjourney versions, improving iteration speed for creative workflows.

  • Improved text rendering: V8.1 renders in-image text more reliably. Placing the intended text in quotation marks inside the prompt tells the model exactly which string to render, narrowing a long-standing weakness relative to text-specialized competitors.

  • Stronger prompt-following: The model adheres more closely to prompt instructions, improving controllability and reducing the prompt-engineering effort needed to achieve a target composition.

  • Restored image conditioning: Image prompts and image weights return in V8.1, alongside backward compatibility with V7 style references (srefs), moodboards, and personalization profiles.

  • Workflow tooling: V8.1 ships with a Prompt Shortener and an updated /describe command, and its aesthetic has been re-tuned "in the spirit of V7" to preserve the look users prefer.

  • Personalized aesthetic tuning: A human-preference (RLHF-style) aesthetic tuning loop combined with per-user personalization shapes outputs toward individually preferred visual styles.
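
The quoted-text and image-conditioning bullets above can be illustrated with a small prompt builder. This is a minimal sketch, not Midjourney's API: `build_prompt` and its argument names are hypothetical, and it assumes V8.1 accepts the `--iw` (image weight) and `--sref` (style reference) parameter spellings used by earlier Midjourney versions.

```python
def build_prompt(description, text=None, image_url=None, image_weight=None, sref=None):
    """Assemble a Midjourney-style prompt string (illustrative helper, hypothetical name)."""
    parts = []
    if image_url:
        # Image prompts are placed at the start of the prompt.
        parts.append(image_url)
    body = description.strip()
    if text:
        # Quoted strings mark the exact text the model should render in the image.
        body += f' with the text "{text}"'
    parts.append(body)
    if image_weight is not None:
        if not image_url:
            raise ValueError("image_weight only makes sense together with image_url")
        # Assumed parameter spelling: --iw
        parts.append(f"--iw {image_weight}")
    if sref:
        # Assumed parameter spelling: --sref
        parts.append(f"--sref {sref}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_prompt("a neon diner sign at night", text="OPEN LATE"))
    print(build_prompt("a travel poster", image_url="https://example.com/ref.png",
                       image_weight=1.5, sref="12345"))
```

Keeping the quoting logic in one place avoids stray or unbalanced quotation marks, which would otherwise change which words the model treats as text to render.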


3. Model Architecture & Technical Details

Midjourney V8.1 is a complete from-scratch rewrite of the company's image model. As part of the V8 program, Midjourney migrated from TPU-based infrastructure to a GPU-native PyTorch stack; David Holz has publicly stated that the original TPU choice "set research back a year." The underlying generative approach is understood to be latent diffusion, though Midjourney has not published a technical paper or model card, and the specific backbone, parameter count, and text encoder remain undisclosed.

Training details are not publicly documented. The dataset has never been disclosed and is currently contested in copyright litigation brought by Disney, NBCUniversal, and DreamWorks (filed June 2025, amended October 2025 to also target video generation). The defining training methodology is a human-preference aesthetic tuning loop (an RLHF-style process) layered with per-user personalization, which together steer the model toward high-aesthetic, user-aligned outputs rather than optimizing for a single fixed objective.

Because V8.1 is still in early testing, several capabilities present in V7 are not yet available, which is why V7 remains the documented default. The missing features are Omni Reference (--oref), Character Reference, negative prompts (--no), multi-prompts, Quality values, the Niji model, Draft Mode, and Turbo mode.
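
One way to act on this feature gap is to scan a prompt for V7-only features before choosing a model version. The sketch below is illustrative: the function names are hypothetical, only `--oref` and `--no` are spelled out in the text above, and the remaining flag spellings (`--cref`, `--q`, `--quality`, `--niji`, `--draft`, `--turbo`, and `::` for multi-prompts) are assumptions carried over from earlier Midjourney releases.

```python
# Flags treated as unavailable in V8.1. Only --oref and --no come from the text;
# the other spellings are assumptions based on earlier Midjourney versions.
V7_ONLY_FLAGS = {"--oref", "--no", "--cref", "--q", "--quality",
                 "--niji", "--draft", "--turbo"}


def v7_only_features(prompt):
    """Return a sorted list of V7-only features that the prompt relies on."""
    found = {token for token in prompt.split() if token in V7_ONLY_FLAGS}
    if "::" in prompt:
        # Assumed multi-prompt separator.
        found.add("multi-prompt (::)")
    return sorted(found)


def choose_version(prompt):
    """Fall back to V7 when the prompt needs a feature V8.1 lacks."""
    return "7" if v7_only_features(prompt) else "8.1"


if __name__ == "__main__":
    for p in ["a red fox in snow --ar 16:9", "a red fox in snow --no trees"]:
        print(choose_version(p), v7_only_features(p))
```

A check like this keeps the fallback rule in one place, so the flag set can be trimmed as features arrive in V8.1.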

Regarding the midjourney/v8.1/image-to-video identifier: Midjourney's video capability is separately branded V1, launched June 18, 2025, and is image-to-video only (no text-to-video). It produces 5-second base clips at 24fps, extendable to roughly 21 seconds, with a 480p base resolution and 720p plus premium HD available on higher tiers. It offers Low/High Motion, Auto/Manual settings, and looping with end-frame control (added July 2025). No V8-native or "V8.1" video model has been confirmed, so a video endpoint tagged at "v8.1" likely reflects aggregator mislabeling.
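
The clip figures above translate into frame counts with simple arithmetic. The sketch below uses the stated 5-second base, 24fps, and roughly 21-second ceiling, plus the four videos per job mentioned in the page summary. The 4-second extension step is an assumption that does not appear in this document, and the function names are hypothetical.

```python
import math

FPS = 24                 # stated frame rate
BASE_SECONDS = 5         # stated base clip length
MAX_SECONDS = 21         # "roughly 21 seconds" after extensions
VIDEOS_PER_JOB = 4       # page summary: four videos per input image
EXTENSION_SECONDS = 4    # ASSUMPTION: length of one extension step


def frame_count(seconds, fps=FPS):
    """Number of frames in a clip of the given duration."""
    return seconds * fps


def extensions_needed(target_seconds, base=BASE_SECONDS, step=EXTENSION_SECONDS):
    """How many extension steps are needed to reach target_seconds."""
    if target_seconds > MAX_SECONDS:
        raise ValueError("target exceeds the stated maximum length")
    return math.ceil(max(0, target_seconds - base) / step)


if __name__ == "__main__":
    print(frame_count(BASE_SECONDS))                   # 120 frames per base clip
    print(frame_count(BASE_SECONDS) * VIDEOS_PER_JOB)  # 480 frames per job
    print(frame_count(MAX_SECONDS))                    # 504 frames at full length
    print(extensions_needed(MAX_SECONDS))              # 4 steps under the assumed step size
```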


4. Performance Highlights

Midjourney has not published quantitative benchmarks, ELO scores, or arena rankings for V8.1, and the absence of an official public API from Midjourney limits the model's presence in third-party evaluation arenas. Performance is therefore best described qualitatively:

  • Speed and efficiency: Approximately 5× faster generation overall, with native 2K HD rendering at ~1.33 GPU-minutes and SD under 1 GPU-minute.
  • Resolution: Direct 2048px output with no separate upscaling pass.
  • Text fidelity: Materially improved in-image text rendering versus prior Midjourney versions.
  • Prompt adherence: Stronger instruction-following and controllability.
  • Aesthetics: Re-tuned to preserve the visual character of V7 while improving fidelity.
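
The GPU-minute figures above can be turned into a rough batch budget. The sketch below is back-of-the-envelope arithmetic only: it assumes the ~1.33 GPU-minute figure applies per HD job, takes the roughly 3× HD speedup over V8 quoted in section 2 at face value, and uses hypothetical function names.

```python
HD_GPU_MINUTES = 1.33       # stated: roughly 1.33 GPU-minutes per HD render
HD_SPEEDUP_OVER_V8 = 3.0    # stated: HD is approximately 3x faster than in V8


def batch_gpu_minutes(n_jobs, per_job=HD_GPU_MINUTES):
    """Total GPU-minutes for n_jobs renders at a fixed per-job cost."""
    return n_jobs * per_job


def implied_v8_hd_minutes():
    """Per-job HD cost in V8 implied by the stated speedup (an estimate)."""
    return HD_GPU_MINUTES * HD_SPEEDUP_OVER_V8


if __name__ == "__main__":
    print(round(batch_gpu_minutes(100), 2))   # 133.0 GPU-minutes for 100 HD jobs
    print(round(implied_v8_hd_minutes(), 2))  # 3.99 GPU-minutes per HD job in V8
```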

The table below summarizes the competitive landscape for context. No directly comparable arena scores are available across these systems.

Category       | Model                 | Developer         | Notable Strength
---------------|-----------------------|-------------------|--------------------------------
Text-to-image  | Midjourney V8.1       | Midjourney        | Aesthetics, native 2K HD, speed
Text-to-image  | Flux 2                | Black Forest Labs | Photorealism, open weights
Text-to-image  | Imagen 4              | Google            | In-image text
Text-to-image  | Ideogram v3           | Ideogram          | In-image text
Text-to-image  | GPT Image / DALL·E    | OpenAI            | Instruction-following
Text-to-image  | Firefly 3             | Adobe             | Commercial licensing
Video          | Sora                  | OpenAI            | Text-to-video
Video          | Veo                   | Google            | High-fidelity video
Video          | Runway / Kling / Luma | Various           | Motion control, length

As a rule of thumb, V8.1 is preferred for speed, HD resolution, and text rendering, while V7 remains the choice for full feature coverage.


5. Intended Use & Applications

  • Concept art & pre-production: Rapid generation of high-resolution concept imagery for games, film, and product design, accelerating early ideation with fast 2K output.

  • Marketing & social content: Production of on-brand visuals and social media assets at scale, leveraging improved text rendering for graphics that include words and short phrases.

  • Film storyboarding & previsualization: Creation of storyboard frames and previs imagery, optionally animated into short clips via Midjourney's separate V1 image-to-video pipeline.

  • Brand & graphic design: Exploration of visual identities, typography-inclusive layouts, and stylistic directions using image prompts, style references, and moodboards.

  • Personalized creative iteration: Per-user aesthetic personalization tailors outputs to an individual's preferred visual style, supporting consistent look-and-feel across a body of work.

For workflows requiring features not yet in V8.1 — such as Omni Reference, Character Reference, negative prompts, or the Niji model — the V7 default remains the recommended option.
