
Atlas Cloud hosts the full GLM series via the Z-AI API, from GLM-4.6 to GLM-5.1. All models are bilingual and available pay-as-you-go with a 202K context window.
Power chat, reasoning, and agents at scale with leading large language models, served fast and affordably on Atlas Cloud.
Compare standard vs. our pricing across every Z.ai model.
| Model | Standard Price (USD) | Our Price (USD) | Discount | |
|---|---|---|---|---|
| GLM 5.1 | $1.4/$4.4per 1M tokens202.8K context | $1.26/$3.96M in/outper 1M tokens202.8K context | -10% | View |
| GLM 5 Turbo | $1.2/$4per 1M tokens262.1K context | $1.2/$4M in/outper 1M tokens262.1K context | — | View |
| GLM 5 | $1/$3.2per 1M tokens202.8K context | $0.95/$3.15M in/outper 1M tokens202.8K context | — | View |
| GLM 4.7 | $0.6/$2.2per 1M tokens202.8K context | $0.52/$1.85M in/outper 1M tokens202.8K context | — | View |
| GLM 4.6 | $0.6/$2.2per 1M tokens202.8K context | $0.6/$2.2M in/outper 1M tokens202.8K context | — | View |
Instantly explore and experiment with 300+ production-ready models in the Atlas Playground. Start customizing with one click.
GLM's model tiers cover everything from fast bilingual chat tasks to multi-hour autonomous coding agents. Teams use GLM-5.1 for long-horizon engineering work and GLM-4.7 or GLM-5 Turbo where cost efficiency and speed take priority.
Engineering teams use GLM-5.1 to run autonomous optimization agents that iterate on production systems over hundreds of rounds. In a documented run, GLM-5.1 improved a vector database through 600 iterations and 6,000 tool calls, reaching 21,500 queries per second — six times the result achievable in a single 50-turn session. Atlas Cloud's pay-as-you-go pricing makes it practical to run these extended sessions without pre-purchasing capacity.
Development teams use GLM-5.1 to execute full codebase transformations over multi-hour sessions without human checkpoints. The model plans, writes, tests, and iterates on changes continuously for up to 8 hours, handling 655 iterations in a demonstrated Linux system build from scratch. This replaces weeks of manual refactoring work on large, legacy codebases.
Developer tools teams integrate GLM-5.1 and GLM-5 Turbo as the underlying model for AI coding workflows in Claude Code, Kilo Code, Cline, Roo Code, and OpenCode. The Z-AI API on Atlas Cloud is OpenAI-compatible, so the base URL swap is the only change required to route any of these tools through GLM. GLM-5 Turbo's 262K context window makes it especially suited for large file context in IDE workflows.
Operations teams build support agents using GLM-5 that combine ticket database access, knowledge base search, and escalation tooling to handle repetitive queries without human intervention. The model's multi-tool calling and streaming support make it practical for real-time customer-facing deployments. Bilingual support means the same agent handles Chinese and English tickets from a single model endpoint on Atlas Cloud.
Content and business teams use GLM-4.7 to generate Word documents, PowerPoint presentations, PDFs, and Excel reports in both Chinese and English from structured prompts. At $0.52 per million input tokens, it is the most cost-efficient GLM tier for high-volume document workflows that do not require frontier-level reasoning. The 202K context window is sufficient to hold full document outlines and source material in a single call.
AI infrastructure teams use GLM-5.1 to run benchmark-driven optimization pipelines on machine learning workloads. On KernelBench-style tasks, GLM-5.1 performs thousands of tool-driven optimization cycles and achieves a 3.6x geometric mean speedup. The 8-hour continuous execution capability means the agent runs the full optimization loop without requiring manual restarts between sessions.
Z-AI (also written as Z.ai) is the developer behind the GLM series of large language models, also known as ZhipuAI. GLM stands for General Language Model, a family spanning from GLM-4.6 to the current flagship GLM-5.1. The series is built for coding, agentic workflows, and bilingual Chinese-English production use.
GLM-5.1 reached first place on SWE-Bench Pro with a score of 58.4 on April 7, 2026, outperforming GPT-5.4 (57.7) and Claude Opus 4.6 (57.3). It also leads CyberGym at 68.7. This makes it the top-ranked open-source model for production coding as of Q2 2026.
Yes. GLM-5.1 supports continuous autonomous execution for up to 8 hours on a single task without human input. It handles the full loop of planning, execution, iterative optimization, and delivery. This is designed specifically for long-horizon coding agent workflows in environments like Claude Code and OpenClaw-compatible setups.
GLM-5 is the base foundation model built on a 744-billion parameter MoE architecture, trained on 28.5 trillion tokens, and reached #1 Elo on Chatbot Arena for open-source models. GLM-5.1 is a post-training upgrade of the same base with significantly stronger coding, tool use, and autonomous execution. GLM-5 is priced at $0.95 per million input tokens on Atlas Cloud; GLM-5.1 is $1.26 per million input tokens.
Yes. GLM-5.1 is released under an MIT license, which permits commercial use, fine-tuning, and redistribution without restriction. Open weights are available for self-hosted deployment. Atlas Cloud provides GLM-5.1 via API for teams that prefer managed access without infrastructure overhead.
GLM-4.6, GLM-4.7, GLM-5, and GLM-5.1 all support a 202,750-token context window on Atlas Cloud. GLM-5 Turbo is the exception with a larger 262,144-token context window and a 131,072-token maximum output length. GLM-5.1 is suited for generating long code files and extended execution traces within its context limit.
Yes. All GLM models are optimized for Chinese and English with equal proficiency in both languages. You can write prompts in either language and receive consistent quality output in return. This makes GLM practical for teams building products that serve both Chinese and international markets from a single model.
GLM-4.7 starts at $0.52 per million input tokens and is the most cost-efficient tier. GLM-4.6 is $0.60, GLM-5 is $0.95, and GLM-5 Turbo is $1.20 per million input tokens. GLM-5.1, the flagship, is $1.26 per million input tokens and $3.96 per million output tokens. All models are pay-as-you-go with no monthly commitment.
Guides, tutorials, and product updates to help you get the most out of Atlas Cloud.
Join the Discord community for the latest model updates, prompts, and support.