Z.ai API for Top Open-Source GLM Coding

The Z.ai API brings ZhipuAI's full GLM series to your stack, from GLM-4.6 to the flagship GLM-5.1, which ranks first among open-source models on SWE-Bench Pro and runs autonomous coding agents for hours at a time. GLM pairs a 202K token context with balanced Chinese and English output under a permissive MIT license. Atlas Cloud serves each model through one OpenAI-compatible key with Day-0 access and transparent per-call pricing. Start today.

Large Language Models by Z.ai

Power chat, reasoning, and agents at scale with leading large language models, served fast and affordably on Atlas Cloud.

View all models

LLM

GLM

GLM is Z.ai's flagship LLM series from Zhipu AI, and the GLM API spans everything from the agentic GLM-5 to the efficient 357B MoE GLM-4.6. These models specialize in autonomous task execution, complex agent orchestration, and production-grade programming. On Atlas Cloud, a single unified endpoint gives you Day-0 access to the entire GLM family with usage-based pricing and dependable production uptime. Start building today.

6 modelsExplore GLM

Z.ai Models API Pricing Details

Compare standard vs. our pricing across every Z.ai model.

Model	Standard Price (USD)	Our Price (USD)	Discount
GLM 5.2	$1.4/$4.4per 1M tokens1048.6K context	$1.26/$3.96M in/outper 1M tokens1048.6K context	-10%	View
GLM 5.1	$1.4/$4.4per 1M tokens202.8K context	$1.26/$3.96M in/outper 1M tokens202.8K context	-10%	View
GLM 5v Turbo	$1.2/$4per 1M tokens202.8K context	$1.2/$4M in/outper 1M tokens202.8K context	—	View
GLM 5	$1/$3.2per 1M tokens202.8K context	$0.95/$3.15M in/outper 1M tokens202.8K context	—	View
GLM 4.7	$0.6/$2.2per 1M tokens202.8K context	$0.52/$1.85M in/outper 1M tokens202.8K context	—	View
GLM 4.6	$0.6/$2.2per 1M tokens202.8K context	$0.6/$2.2M in/outper 1M tokens202.8K context	—	View

Explore models from other providers

Instantly explore and experiment with 400+ production-ready models in the Atlas Playground. Start customizing with one click.

xAI

Z-AI API Use Cases You Can Build on Atlas Cloud

GLM's model tiers cover everything from fast bilingual chat tasks to multi-hour autonomous coding agents. Teams use GLM-5.1 for long-horizon engineering work and GLM-4.7 or GLM-5 Turbo where cost efficiency and speed take priority.

Long-horizon Database Performance Optimization

Engineering teams use GLM-5.1 to run autonomous optimization agents that iterate on production systems over hundreds of rounds. In a documented run, GLM-5.1 improved a vector database through 600 iterations and 6,000 tool calls, reaching 21,500 queries per second — six times the result achievable in a single 50-turn session. Atlas Cloud's pay-as-you-go pricing makes it practical to run these extended sessions without pre-purchasing capacity.

Autonomous Repo-scale Code Refactoring

Development teams use GLM-5.1 to execute full codebase transformations over multi-hour sessions without human checkpoints. The model plans, writes, tests, and iterates on changes continuously for up to 8 hours, handling 655 iterations in a demonstrated Linux system build from scratch. This replaces weeks of manual refactoring work on large, legacy codebases.

IDE Coding Agent Integration

Developer tools teams integrate GLM-5.1 and GLM-5 Turbo as the underlying model for AI coding workflows in Claude Code, Kilo Code, Cline, Roo Code, and OpenCode. The Z-AI API on Atlas Cloud is OpenAI-compatible, so the base URL swap is the only change required to route any of these tools through GLM. GLM-5 Turbo's 262K context window makes it especially suited for large file context in IDE workflows.

Tier-1 Support Query Automation

Operations teams build support agents using GLM-5 that combine ticket database access, knowledge base search, and escalation tooling to handle repetitive queries without human intervention. The model's multi-tool calling and streaming support make it practical for real-time customer-facing deployments. Bilingual support means the same agent handles Chinese and English tickets from a single model endpoint on Atlas Cloud.

Bilingual Document Generation at Scale

Content and business teams use GLM-4.7 to generate Word documents, PowerPoint presentations, PDFs, and Excel reports in both Chinese and English from structured prompts. At $0.52 per million input tokens, it is the most cost-efficient GLM tier for high-volume document workflows that do not require frontier-level reasoning. The 202K context window is sufficient to hold full document outlines and source material in a single call.

ML Workload Kernel Optimization

AI infrastructure teams use GLM-5.1 to run benchmark-driven optimization pipelines on machine learning workloads. On KernelBench-style tasks, GLM-5.1 performs thousands of tool-driven optimization cycles and achieves a 3.6x geometric mean speedup. The 8-hour continuous execution capability means the agent runs the full optimization loop without requiring manual restarts between sessions.

Render your enterprise vision into reality with Atlas Cloud AI.

Contact Sales

What Developers Ask About the Z.ai API

The Z.ai API gives developers programmatic access to the GLM series of large language models built by Z.ai, the company also known as Zhipu AI. GLM stands for General Language Model and spans releases from GLM-4.6 to the GLM-5.1 flagship, tuned for coding, agentic workflows, and bilingual Chinese and English production use. On Atlas Cloud you reach the full lineup through one OpenAI-compatible endpoint.

Atlas Cloud hosts the GLM series from GLM-4.6 up to the GLM-5.1 flagship, with GLM-4.7 and GLM-5 in between. Lighter tiers handle high-volume everyday tasks at lower cost, while GLM-5.1 targets the most demanding coding and agentic work. Every model runs pay-as-you-go through the same key.

Yes. GLM open weights, including GLM-5.1, are released under the MIT license, which permits commercial use, fine-tuning, and redistribution without restriction. If you would rather skip infrastructure overhead, Atlas Cloud serves the same models by API for managed access instead of self-hosting.

Point your existing OpenAI SDK at the Atlas Cloud base URL, set your key, and pass the GLM model name you want. Because the Z.ai API is OpenAI-compatible, most projects migrate by changing only the base URL and model string, and the models plug directly into agent tools such as Claude Code, Cline, and Roo Code. Start building today.

Both Chinese and English are first-class for GLM, which is trained for strong proficiency in each. Prompt in either language and you get consistent quality back, which makes the lineup practical for teams serving Chinese and international users from a single model rather than maintaining separate stacks.

GLM-4.6 through GLM-5.1 support a 200K token context window, enough to hold large codebases, long documents, or extended agent traces in a single request. Should your workflow produce long outputs, the same window covers big code files and multi-step execution logs without early truncation.

GLM-5.1 topped SWE-Bench Pro with a score of 58.4 in April 2026, placing it among the strongest open-source models for real-world coding. It also supports continuous autonomous execution for up to eight hours on a single task, running planning, iteration, and delivery in one loop, which suits long-horizon agent workflows in environments like Claude Code.

Every GLM model on the Z.ai API runs on transparent pay-as-you-go pricing, billed per token with no subscription or monthly commitment. Input and output tokens are metered separately, and lighter tiers such as GLM-4.7 cost less per token than the GLM-5.1 flagship, so you can match model choice to budget. Check the current per-token rate on each model card in Atlas Cloud.

Explore More Families

Seedance 2.0

The Seedance 2.0 API gives you production access to ByteDance's multimodal video model — quad-modal inputs (text, image, video, audio) and an industry-leading "Universal Reference" system that locks composition, camera movement, and character actions across shots. Integrate director-level control with one API call, a flat $0.09/s, instant key, and no waitlist — backed by enterprise-grade uptime and compliance. Seedance 2.0 Native 4K is now live!

View Family

GPT Image 2

The GPT Image 2 API gives developers access to OpenAI's latest image model, the successor to GPT Image 1.5. It generates and edits images with accurate text rendering across Latin and CJK scripts, plus strong composition for posters, mockups, and infographics. On Atlas Cloud you reach it through one unified API alongside 300+ models, with free credits, 99.99% uptime, and no OpenAI organization verification required.

View Family

Seedream 5.0 Pro

Seedream 5.0 Pro API gives developers ByteDance's controllable image editing model on Atlas Cloud. It places edits precisely with anchors and coordinates, separates images into editable layers, fuses multiple references, and matches exact colors and materials, with multilingual text at 2K and 3K. On Atlas Cloud you reach it through one key!

View Family

Gemini Omni Flash

The Gemini Omni API brings Google DeepMind's multimodal video generation and editing model, introduced at Google I/O 2026, to your stack. Gemini Omni fuses Gemini's reasoning engine with generative media, accepting any mix of text, images, video, and audio to produce consistent, knowledge-grounded output. Refine results through natural conversation, swapping objects, rewriting scenes, and shifting styles while physics, characters, and continuity stay intact. Atlas Cloud serves the full Gemini Omni Flash lineup, text-to-video, image-to-video with up to 7 reference images, and reference-to-video, through one unified API with transparent per-second pricing from $0.112 and no subscription. Start building today.

View Family

Grok Imagine

The Grok Imagine API gives developers xAI's image, video, and audio generation in one suite. It produces up to 2K images with multilingual text rendering, plus video up to 15 seconds with native, synchronized audio and reference-based editing. On Atlas Cloud one key runs every Grok Imagine mode, so you move between image, video, and audio without separate setups, from $0.02 per image and $0.05 per second.

View Family

Google

Google's most powerful creative models are all available on Atlas Cloud. Veo 3.1 delivers cinematic video generation, Nano Banana 2 powers high-fidelity image creation, and Gemini brings multimodal intelligence to every workflow. Access the full Google model suite through one API key with Day-0 availability and pay-as-you-go pricing.

View Family

Seedance 2.0 Mini

The Seedance 2.0 Mini API is the lightest, lowest-cost tier of ByteDance's Seedance video line, built for teams where throughput and unit cost matter more than maximum polish. Use it for batch generation, rapid prototyping, and draft passes, all through one OpenAI-compatible key on Atlas Cloud.

View Family

ByteDance

From cinematic video generation to high-fidelity image creation, ByteDance's most powerful models are live on Atlas Cloud. Run Seedance and Seedream at scale with the lowest inference pricing and zero infrastructure overhead.

View Family

Alibaba

Atlas Cloud brings together Alibaba's full model lineup under one API: Qwen for language and image tasks, Wan for video generation up to 1080p. Access every model pay-as-you-go with no subscriptions. The Alibaba API is available via a single base URL using your existing OpenAI-compatible client.

View Family

OpenAI

Atlas Cloud gives you access to the full OpenAI API lineup, from GPT Image 2 for image generation to Sora 2 for video. Every model is available pay-as-you-go with no monthly commitment. Plug in with a single base URL swap using the OpenAI-compatible API.

View Family

xAI

Build complete image and video pipelines using the xAI API on Atlas Cloud. Generate at 2K, edit with reference images, and animate images into audio-synced clips.

View Family

Kwaivgi

The Kwaivgi API at 15% off standard rates. Day-0 access to every new Kling release, pay-as-you-go, no seat limits. One account covers the full Kling lineup.

View Family