Z.ai Models on AtlasCloud | GLM

Atlas Cloud hosts the full GLM series via the Z-AI API, from GLM-4.6 to GLM-5.1. All models are bilingual and available pay-as-you-go with a 202K context window.

Large Language Models by Z.ai

Power chat, reasoning, and agents at scale with leading large language models, served fast and affordably on Atlas Cloud.

View all models

Z.ai Models API Pricing Details

Compare standard vs. our pricing across every Z.ai model.

ModelStandard Price (USD)Our Price (USD)Discount
GLM 5.1
$1.4/$4.4per 1M tokens202.8K context
$1.26/$3.96M in/outper 1M tokens202.8K context
-10%View
GLM 5 Turbo
$1.2/$4per 1M tokens262.1K context
$1.2/$4M in/outper 1M tokens262.1K context
View
GLM 5
$1/$3.2per 1M tokens202.8K context
$0.95/$3.15M in/outper 1M tokens202.8K context
View
GLM 4.7
$0.6/$2.2per 1M tokens202.8K context
$0.52/$1.85M in/outper 1M tokens202.8K context
View
GLM 4.6
$0.6/$2.2per 1M tokens202.8K context
$0.6/$2.2M in/outper 1M tokens202.8K context
View

Explore models from other providers

Instantly explore and experiment with 300+ production-ready models in the Atlas Playground. Start customizing with one click.

Z-AI API Use Cases You Can Build on Atlas Cloud

GLM's model tiers cover everything from fast bilingual chat tasks to multi-hour autonomous coding agents. Teams use GLM-5.1 for long-horizon engineering work and GLM-4.7 or GLM-5 Turbo where cost efficiency and speed take priority.

Long-horizon Database Performance Optimization

Engineering teams use GLM-5.1 to run autonomous optimization agents that iterate on production systems over hundreds of rounds. In a documented run, GLM-5.1 improved a vector database through 600 iterations and 6,000 tool calls, reaching 21,500 queries per second — six times the result achievable in a single 50-turn session. Atlas Cloud's pay-as-you-go pricing makes it practical to run these extended sessions without pre-purchasing capacity.

Autonomous Repo-scale Code Refactoring

Development teams use GLM-5.1 to execute full codebase transformations over multi-hour sessions without human checkpoints. The model plans, writes, tests, and iterates on changes continuously for up to 8 hours, handling 655 iterations in a demonstrated Linux system build from scratch. This replaces weeks of manual refactoring work on large, legacy codebases.

IDE Coding Agent Integration

Developer tools teams integrate GLM-5.1 and GLM-5 Turbo as the underlying model for AI coding workflows in Claude Code, Kilo Code, Cline, Roo Code, and OpenCode. The Z-AI API on Atlas Cloud is OpenAI-compatible, so the base URL swap is the only change required to route any of these tools through GLM. GLM-5 Turbo's 262K context window makes it especially suited for large file context in IDE workflows.

Tier-1 Support Query Automation

Operations teams build support agents using GLM-5 that combine ticket database access, knowledge base search, and escalation tooling to handle repetitive queries without human intervention. The model's multi-tool calling and streaming support make it practical for real-time customer-facing deployments. Bilingual support means the same agent handles Chinese and English tickets from a single model endpoint on Atlas Cloud.

Bilingual Document Generation at Scale

Content and business teams use GLM-4.7 to generate Word documents, PowerPoint presentations, PDFs, and Excel reports in both Chinese and English from structured prompts. At $0.52 per million input tokens, it is the most cost-efficient GLM tier for high-volume document workflows that do not require frontier-level reasoning. The 202K context window is sufficient to hold full document outlines and source material in a single call.

ML Workload Kernel Optimization

AI infrastructure teams use GLM-5.1 to run benchmark-driven optimization pipelines on machine learning workloads. On KernelBench-style tasks, GLM-5.1 performs thousands of tool-driven optimization cycles and achieves a 3.6x geometric mean speedup. The 8-hour continuous execution capability means the agent runs the full optimization loop without requiring manual restarts between sessions.

Render your enterprise vision into reality with Atlas Cloud AI.

Contact Sales

Frequently Asked Questions about Z.ai Models

Z-AI (also written as Z.ai) is the developer behind the GLM series of large language models, also known as ZhipuAI. GLM stands for General Language Model, a family spanning from GLM-4.6 to the current flagship GLM-5.1. The series is built for coding, agentic workflows, and bilingual Chinese-English production use.

GLM-5.1 reached first place on SWE-Bench Pro with a score of 58.4 on April 7, 2026, outperforming GPT-5.4 (57.7) and Claude Opus 4.6 (57.3). It also leads CyberGym at 68.7. This makes it the top-ranked open-source model for production coding as of Q2 2026.

Yes. GLM-5.1 supports continuous autonomous execution for up to 8 hours on a single task without human input. It handles the full loop of planning, execution, iterative optimization, and delivery. This is designed specifically for long-horizon coding agent workflows in environments like Claude Code and OpenClaw-compatible setups.

GLM-5 is the base foundation model built on a 744-billion parameter MoE architecture, trained on 28.5 trillion tokens, and reached #1 Elo on Chatbot Arena for open-source models. GLM-5.1 is a post-training upgrade of the same base with significantly stronger coding, tool use, and autonomous execution. GLM-5 is priced at $0.95 per million input tokens on Atlas Cloud; GLM-5.1 is $1.26 per million input tokens.

Yes. GLM-5.1 is released under an MIT license, which permits commercial use, fine-tuning, and redistribution without restriction. Open weights are available for self-hosted deployment. Atlas Cloud provides GLM-5.1 via API for teams that prefer managed access without infrastructure overhead.

GLM-4.6, GLM-4.7, GLM-5, and GLM-5.1 all support a 202,750-token context window on Atlas Cloud. GLM-5 Turbo is the exception with a larger 262,144-token context window and a 131,072-token maximum output length. GLM-5.1 is suited for generating long code files and extended execution traces within its context limit.

Yes. All GLM models are optimized for Chinese and English with equal proficiency in both languages. You can write prompts in either language and receive consistent quality output in return. This makes GLM practical for teams building products that serve both Chinese and international markets from a single model.

GLM-4.7 starts at $0.52 per million input tokens and is the most cost-efficient tier. GLM-4.6 is $0.60, GLM-5 is $0.95, and GLM-5 Turbo is $1.20 per million input tokens. GLM-5.1, the flagship, is $1.26 per million input tokens and $3.96 per million output tokens. All models are pay-as-you-go with no monthly commitment.

Explore More Families

Seedance 2.0 Models

Seedance 2.0(by Bytedance) is a multimodal video generation model that redefines "controllable creation," moving beyond the limitations of text or start/end frames. It supports quad-modal inputs—text, image, video, and audio—and introduces an industry-leading "Universal Reference" system. By precisely replicating the composition, camera movement, and character actions from reference assets, Seedance 2.0 solves critical issues with character consistency and physical coherence, empowering creators to act as true "directors" with deep control over their output.

View Family

Grok-Imagine Models

Grok Imagine Image Quality is xAI's latest AI image generation model, delivering studio-grade visuals with up to 2K resolution and razor-sharp detail. It offers best-in-class text rendering across multiple languages, photorealistic outputs with natural lighting, rich textures, and believable physics, plus tighter prompt following and image editing with reference inputs for precise creative control. Ideal for hero images, ad creatives, product renders, and brand-grade visuals.

View Family

Gemini Omni

Gemini Omni (by Google DeepMind) is a video generation and editing model launched on May 20, 2026 at Google I/O that redefines the standard for "reasoning-driven creation," built specifically to solve the core challenge of AI video: making output that actually understands what you mean, not just what you type. It fuses Gemini's reasoning engine with generative capability, accepting any mix of images, text, video, and audio to produce consistent, knowledge-grounded output. Unlike models that start from scratch each time, Omni lets you edit through natural conversation — swapping objects, rewriting scenes, shifting styles — while keeping physics, characters, and continuity intact across every turn.

View Family

GPT Image 2 Models

GPT Image 2 is a state-of-the-art multimodal foundation model engineered for exceptional text-to-image generation with unprecedented photorealism and creative versatility. Developed by OpenAI as the evolution of the DALL-E lineage, it transforms detailed natural language descriptions into hyper-realistic imagery at up to 4K resolution. With proprietary "Neural Rendering Engine" technology for precise visual control, GPT Image 2 delivers studio-quality results with accurate anatomy, lighting, and composition—making it the premier AI tool for professional creators, enterprises, and developers demanding production-ready visual assets.

View Family

Google Models on Atlas Cloud | Gemini, Nano Bananas & Veo

Google's most powerful creative models are all available on Atlas Cloud. Veo 3.1 delivers cinematic video generation, Nano Banana 2 powers high-fidelity image creation, and Gemini brings multimodal intelligence to every workflow. Access the full Google model suite through one API key with Day-0 availability and pay-as-you-go pricing.

View Family

ByteDance Models on Atlas Cloud | Seedance & Seedream

From cinematic video generation to high-fidelity image creation, ByteDance's most powerful models are live on Atlas Cloud. Run Seedance and Seedream at scale with the lowest inference pricing and zero infrastructure overhead.

View Family

Alibaba Models on Atlas Cloud | Wan & Qwen

Atlas Cloud brings together Alibaba's full model lineup under one API: Qwen for language and image tasks, Wan for video generation up to 1080p. Access every model pay-as-you-go with no subscriptions. The Alibaba API is available via a single base URL using your existing OpenAI-compatible client.

View Family

MAI Image 2.5 Models

MAI-Image-2.5 is Microsoft's latest photorealistic image generation and editing model family, built for commercial design, product photography, and brand-ready content creation. Available in standard and Flash variants for both text-to-image and image editing, it delivers best-in-class Arena ELO scores at competitive pricing — starting from $0.03 per image. With precise text rendering, surgical editing capability, and natural portrait generation, MAI-Image-2.5 is designed for teams that need production-quality visuals without post-processing overhead.

View Family

Wan2.7 Models

Launching this March, Wan2.7 is the latest powerhouse in the Qwen ecosystem, delivering a massive upgrade in visual fidelity, audio synchronization, and motion consistency over version 2.6. This all-in-one AI video generator supports advanced features like first-and-last frame control, 3x3 grid synthesis, and instruction-based video editing. Outperforming competitors like Jimeng, Wan2.7 offers superior flexibility with support for real-person image inputs, up to five video references, and 1080P high-definition outputs spanning 2 to 15 seconds, making it the premier choice for professional digital storytelling and high-end content marketing.

View Family

Nano Banana2 Models

Nano Banana 2 (by Google), is a generative image model that perfectly balances lightning-fast rendering with exceptional visual quality. With an improved price-performance ratio, it achieves breakthrough micro-detail depiction, accurate native text rendering, and complex physical structure reconstruction. It serves as a highly efficient, commercial-grade visual production tool for developers, marketing teams, and content creators.

View Family

Doubao Models

Doubao is ByteDance's family of large language models, engineered for production-grade reasoning, coding, and high-volume agentic workloads. Spanning flagship Seed 2.0 Pro, a dedicated Code Preview variant, cost-efficient Lite and Mini tiers, plus the proven Seed 1.8 and Seed 1.6 generations, the lineup gives developers a single, OpenAI-compatible interface to scale from frontier reasoning down to latency-sensitive, high-throughput tasks. Every Doubao model on Atlas Cloud ships with a 256K-token context window, streaming, and drop-in SDK compatibility — so you can match the right model to each job without rewriting your stack.

View Family

Hunyuan 3D Generation Models

Hunyuan3D is a state-of-the-art 3D generative foundation model from Tencent that turns text prompts and single images into high-quality, textured 3D meshes. Built on a two-stage pipeline—Hunyuan3D-DiT for shape generation via flow-matching diffusion and Hunyuan3D-Paint for multi-view texture synthesis—it produces clean geometry with full PBR materials ready for game engines, AR/VR, 3D printing, and DCC tools. Available in Pro (up to 1.5M faces, 4K PBR textures) and Rapid (2–3 minute lightweight generation) tiers, with both Text-to-3D and Image-to-3D entry points, Hunyuan3D is the premier AI 3D toolkit for game developers, e-commerce teams, and 3D content studios. Generations start at $0.02 each.

View Family

Recommended Articles

Guides, tutorials, and product updates to help you get the most out of Atlas Cloud.

Join our Discord community

Join the Discord community for the latest model updates, prompts, and support.