Introduction
If you’ve been following along with our previous post on using Claude Code with local LLMs, you’ll know just how powerful it is to run AI-assisted coding entirely on your own machine. Now, with the release of GLM 5.2 from Zhipu AI, we have one of the most capable open-weight models available — and it’s a fantastic pairing with Claude Code.
In this guide, we’ll walk you through everything you need to know: what GLM 5.2 brings to the table, how to get it running locally, and how to wire it up to Claude Code so you can code smarter without sending a single line of your code to the cloud.
Why GLM 5.2?
GLM 5.2 is the latest in Zhipu AI’s General Language Model series, and it represents a significant leap forward for open-weight models. Here’s why developers are excited about it:
- Strong code understanding: GLM 5.2 has been trained with a heavy emphasis on code, making it highly capable for tasks like refactoring, debugging, and generating boilerplate.
- Long context window: With support for extended context lengths, it can handle large codebases and multi-file projects far more effectively than many alternatives.
- Instruction following: GLM 5.2 is highly responsive to system prompts and structured instructions — which is exactly what Claude Code relies on.
- Open weights: You can download and run it yourself, meaning full data privacy and zero per-token cost after setup.
- Multilingual capability: Particularly strong in both English and Chinese, useful for teams working across those languages.
For developers who want the power of a frontier-class coding assistant without the recurring API bills or data privacy concerns, GLM 5.2 is a compelling choice.
Prerequisites
Before we dive in, make sure you have the following ready:
- A machine with at least 16GB of RAM (32GB recommended for larger quantisations)
- A GPU with 8GB+ VRAM, or a powerful CPU for CPU-only inference
- Ollama installed (we’ll use this to serve GLM 5.2 locally)
- Claude Code installed via npm (npm install -g @anthropic-ai/claude-code)
- Node.js 18+ and a terminal you’re comfortable with
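If you would like to confirm the command-line prerequisites in one go, a small script can look each tool up on your PATH. This is only a sketch: the tool list is an assumption drawn from the prerequisites above, and the function names are made up for illustration.

```python
import shutil

# Commands this guide relies on; the list is an assumption based on
# the prerequisites above, so adjust it to your own setup.
REQUIRED_TOOLS = ["ollama", "claude", "node", "npm"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def report(tools=REQUIRED_TOOLS):
    """Print one line per tool saying whether it was found."""
    missing = set(missing_tools(tools))
    for tool in tools:
        status = "missing" if tool in missing else "found"
        print(f"{tool}: {status}")
```

Calling report() prints a found/missing line for each tool. The which parameter exists only so the lookup can be swapped out when testing.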
Step 1: Installing Ollama
Ollama is the easiest way to get local LLMs running on macOS, Linux, or Windows. If you don’t already have it, head to ollama.com and follow the installation instructions for your platform.
Once installed, verify it’s working:
ollama --version
You should see a version number printed. If not, consult the Ollama documentation for troubleshooting.
Step 2: Pulling GLM 5.2
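With Ollama installed, pulling the model is a single command. Zhipu AI’s models are available directly through Ollama’s model library. This guide uses the tag glm4 throughout; if the library lists GLM 5.2 under a different tag, substitute it here and in every later command: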
ollama pull glm4
Depending on the quantisation variant available and your internet speed, this may take a few minutes. Ollama will download the model weights and store them locally. Once downloaded, you can verify the model is available:
ollama list
You should see GLM listed among your local models.
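The same check can be done programmatically. Ollama's HTTP API has an /api/tags endpoint that returns the locally installed models as JSON, and the sketch below looks for a given tag in that response. It assumes a server is already running on the default address, and the helper names are illustrative rather than part of any library.

```python
import json
import urllib.request


def installed_model_names(tags_payload):
    """Extract model names from a parsed /api/tags response."""
    return [model["name"] for model in tags_payload.get("models", [])]


def is_installed(tags_payload, tag):
    """True if `tag` equals an installed name or its base name before ':'."""
    for name in installed_model_names(tags_payload):
        if name == tag or name.split(":", 1)[0] == tag:
            return True
    return False


def fetch_tags(base_url="http://localhost:11434"):
    """Ask a running Ollama server which models it has."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With the server up, is_installed(fetch_tags(), "glm4") tells you whether the pull succeeded.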
Choosing the Right Quantisation
GLM 5.2 may be available in several quantised variants (Q4, Q5, Q8, etc.). As a rule of thumb:
- Q4_K_M — Best balance of speed and quality for most consumer hardware
- Q5_K_M — Slightly higher quality, slightly more VRAM needed
- Q8_0 — Near full-precision quality, requires significant VRAM (16GB+)
If you’re on a machine with 8–12GB VRAM, start with Q4_K_M. If you have 16GB or more, try Q5 or Q8 for noticeably better output.
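The rule of thumb above is simple enough to write down as a function. The sketch below is one possible reading of it: the text does not say what to do between 12GB and 16GB, so treating that range as Q4_K_M is an assumption, as is returning nothing below the 8GB minimum from the prerequisites.

```python
def suggest_quantisation(vram_gb):
    """Map available VRAM (in GB) to the variant suggested in the text.

    The text covers 8-12 GB and 16 GB+; treating everything below 16 GB
    as Q4_K_M is an assumption made here to fill the gap.
    """
    if vram_gb < 8:
        return None  # below the guide's stated minimum for GPU inference
    if vram_gb < 16:
        return "Q4_K_M"
    return "Q5_K_M"  # Q8_0 is the other option the text offers at 16 GB+
```

Treat the result as a starting point only; the right choice also depends on how much context you allocate and what else is using the GPU.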
Step 3: Starting the Ollama Server
Ollama runs as a local HTTP server, which is how Claude Code (and other tools) will communicate with it. Start it with:
ollama serve
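By default, Ollama listens on http://localhost:11434. Keep this terminal window open; you’ll need the server running throughout your coding session. If the command reports that the address is already in use, an Ollama server (for example the desktop app) is already running and you can skip this step.
You can test that it’s working by running a quick query: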
ollama run glm4 "Say hello in three words."
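Because other tools talk to Ollama over HTTP rather than through the CLI, it can be worth testing that path too. The sketch below sends the same prompt to Ollama's /api/generate endpoint using only the Python standard library. The model tag and address are the ones used in this guide; the function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address from the text


def build_generate_request(prompt, model="glm4", base_url=OLLAMA_URL):
    """Build a POST request for Ollama's /api/generate endpoint."""
    # stream=False asks for one JSON object instead of a stream of chunks.
    body = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="glm4", base_url=OLLAMA_URL, timeout=120):
    """Send the prompt to a running server and return the reply text."""
    request = build_generate_request(prompt, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

With the server running, print(generate("Say hello in three words.")) should print the model's reply. A connection error here usually means the server is not up.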
Step 4: Configuring Claude Code to Use a Local Model
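Claude Code supports pointing to a custom base URL for its API calls, which means you can redirect it from Anthropic’s servers to your local Ollama instance. One caveat: Claude Code sends requests in Anthropic’s Messages API format, so this only works if the endpoint you point it at accepts that format. Ollama exposes an OpenAI-compatible API under /v1; whether your Ollama version also accepts Anthropic-style requests is worth confirming in its documentation before you continue. If it does not, you will need a translating proxy in between.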
Setting Environment Variables
In your terminal (or your shell profile such as ~/.zshrc or ~/.bashrc), set the following:
export ANTHROPIC_BASE_URL=http://localhost:11434/v1
export ANTHROPIC_API_KEY=ollama
The ANTHROPIC_API_KEY can be set to any string when using a local model — Ollama doesn’t validate it, but Claude Code expects the variable to be present.
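A missing or mistyped variable is an easy mistake to make, especially when switching between shells. The sketch below checks that both variables from this section are present and that the base URL has an http or https scheme. It does not contact the server, and the helper names are made up for illustration.

```python
import os
from urllib.parse import urlparse

REQUIRED_VARS = ("ANTHROPIC_BASE_URL", "ANTHROPIC_API_KEY")


def env_problems(env):
    """Return a list of human-readable problems found in `env`."""
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")
    base_url = env.get("ANTHROPIC_BASE_URL", "")
    if base_url and urlparse(base_url).scheme not in ("http", "https"):
        problems.append("ANTHROPIC_BASE_URL should start with http:// or https://")
    return problems


def check_current_shell():
    """Print the problems (if any) for the current process environment."""
    for problem in env_problems(os.environ) or ["environment looks complete"]:
        print(problem)
```

Run check_current_shell() from the same terminal you will launch Claude Code in, since exported variables only apply to that shell and its children.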
Specifying the Model
You’ll also want to tell Claude Code which model to use. You can do this via the --model flag when launching, or by setting it in your Claude Code configuration:
claude --model glm4
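Or, if you want this to be your default, add it to your Claude Code user settings file. In current releases this is ~/.claude/settings.json, where environment variables go under an env key; check the Claude Code settings documentation if your version differs:
{
  "model": "glm4",
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:11434/v1"
  }
}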
Step 5: Your First Local Coding Session
With everything configured, launch Claude Code in your project directory:
cd ~/your-project
claude
You should see Claude Code start up and begin communicating with your local GLM 5.2 instance. Try a few tasks to verify it’s working correctly:
- “Explain what this file does” — point it at a complex file in your project
- “Write unit tests for this function” — test its code generation
- “Refactor this to use async/await” — check its understanding of modern patterns
You should find GLM 5.2 handles these tasks competently, with strong reasoning about your codebase.
Performance Tips and Optimisations
Adjust Context Length
GLM 5.2’s long context window is one of its strengths, but very long contexts slow inference and use more memory. For most coding tasks, a context of 8k–16k tokens is more than sufficient. In Ollama, the context length is set with the num_ctx parameter, for example in a Modelfile as shown in the next section.
Use a Modelfile for Tuning
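To get a feel for how much context a task needs, you can estimate the token count of the files you expect to discuss. The sketch below uses the common "about four characters per token" heuristic, which is an assumption and not GLM's actual tokenizer, so treat the result as a ballpark figure. The candidate sizes are also just illustrative.

```python
CHARS_PER_TOKEN = 4  # rough heuristic, not GLM's real tokenizer


def estimate_tokens(text):
    """Very rough token estimate from character count."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def smallest_fitting_context(token_count, sizes=(8192, 16384, 32768)):
    """Return the smallest context size that holds `token_count`, or None."""
    for size in sorted(sizes):
        if token_count <= size:
            return size
    return None
```

Remember to leave headroom for the model's reply and for any system prompt, both of which share the same window.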
Ollama allows you to create a custom Modelfile to tweak GLM 5.2’s behaviour — adjusting the system prompt, temperature, and other parameters. For coding tasks, a lower temperature (around 0.1–0.3) generally produces more consistent, less hallucinated output:
FROM glm4
PARAMETER temperature 0.2
PARAMETER num_ctx 16384
SYSTEM """
You are an expert software engineer. When writing code, always follow best practices, write clean and readable code, and include comments where helpful.
"""
Save this as Modelfile and build it:
ollama create glm4-coder -f Modelfile
Then point Claude Code at glm4-coder instead, for example with claude --model glm4-coder.
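If you end up experimenting with several temperature and context settings, generating the Modelfile text from a small function keeps the variants consistent. The sketch below reproduces the layout of the example above; render_modelfile is a hypothetical helper, not something Ollama provides.

```python
def render_modelfile(base, temperature, num_ctx, system_prompt):
    """Return Modelfile text using the same directives as the example above."""
    return (
        f"FROM {base}\n"
        f"PARAMETER temperature {temperature}\n"
        f"PARAMETER num_ctx {num_ctx}\n"
        f'SYSTEM """\n{system_prompt}\n"""\n'
    )
```

Write the returned text to a file named Modelfile and run the ollama create command shown above to build each variant under its own name.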
GPU Acceleration
Ollama automatically detects and uses your GPU if available. If you’re on macOS with Apple Silicon, Metal acceleration is used by default. On Linux or Windows with an NVIDIA GPU, ensure your CUDA drivers are up to date for the best performance.
GLM 5.2 vs Other Local Models for Coding
How does GLM 5.2 stack up against other popular local coding models?
| Model | Code Quality | Speed (on consumer GPU) | Context Window | Best For |
|---|---|---|---|---|
| GLM 5.2 | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 128k+ | Full-stack, multilingual teams |
| Qwen2.5-Coder | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 128k | Pure coding tasks |
| DeepSeek-Coder V2 | ⭐⭐⭐⭐ | ⭐⭐⭐ | 128k | Large project navigation |
| Llama 3.1 70B | ⭐⭐⭐⭐ | ⭐⭐ | 128k | General reasoning + code |
| Mistral Nemo | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 128k | Fast iteration, lighter hardware |
GLM 5.2 is particularly competitive on instruction-following and tool-use tasks, which matters a lot for Claude Code’s agentic workflows.
Limitations to Be Aware Of
Running a local model is powerful, but it’s not without trade-offs:
- Speed: Even with a good GPU, GLM 5.2 will be slower than Claude’s hosted API for very large tasks.
- Agentic reliability: Frontier hosted models like Claude Sonnet are still ahead on complex multi-step agentic tasks. Local models can occasionally lose track of context in very long sessions.
- Tool use: Some of Claude Code’s more advanced tool-use features may behave differently with a local model that wasn’t specifically trained for those interfaces.
- No internet access: Your local GLM instance can’t browse the web or access live documentation — it’s working purely from its training data.
For most day-to-day development tasks — writing functions, refactoring, generating tests, explaining code — these limitations are rarely a problem. For complex orchestrated agentic workflows, you may want to keep a hosted model as a fallback.
Privacy and Security Benefits
One of the biggest reasons to run local is privacy. When you use Claude Code with a local GLM 5.2 model:
- Your code never leaves your machine. Nothing is sent to Anthropic’s servers, Zhipu AI’s servers, or anywhere else.
- No API keys are exposed to third-party services during inference.
- Proprietary or sensitive codebases can be worked on freely, without compliance concerns about data egress.
- Air-gapped environments are possible — once the model is downloaded, you can work with no network connection at all.
For developers working in regulated industries, on client code with strict NDAs, or simply those who value their privacy, this is a compelling advantage over any hosted solution.
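The privacy argument depends on the base URL really pointing at your own machine, so it is worth checking that before opening a sensitive project. The sketch below tests whether a URL's host is localhost or a loopback IP address. It only inspects configuration and cannot prove that no other process sends data elsewhere; the function name is illustrative.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url):
    """True if the URL's host is 'localhost' or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a non-local hostname such as api.example.com
```

For example, passing in the value of ANTHROPIC_BASE_URL from your shell gives a quick yes/no answer on whether requests are configured to stay local.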
Troubleshooting Common Issues
Claude Code Can’t Connect to Ollama
Make sure ollama serve is running in a separate terminal. Double-check that your ANTHROPIC_BASE_URL is set to http://localhost:11434/v1 — note the /v1 at the end, which is required for the OpenAI-compatible endpoint.
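A quick way to separate "the server is not running" from "the URL is wrong" is to test whether anything is listening on the port at all. The sketch below attempts a plain TCP connection to the default Ollama port mentioned earlier; it says nothing about whether the service behind the port is actually Ollama.

```python
import socket


def is_port_open(host="127.0.0.1", port=11434, timeout=2.0):
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If is_port_open() returns False, start the server first. If it returns True and Claude Code still cannot connect, the base URL or request format is the more likely culprit.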
Model Responses Feel Slow
Try a smaller quantisation (Q4 instead of Q8), reduce the context window in your Modelfile, or close other GPU-intensive applications to free up VRAM.
Unexpected or Repetitive Output
Lower the temperature in your Modelfile (try 0.1 or 0.2 for coding). Also try adding a stronger system prompt that instructs the model to be concise and focused.
Claude Code Crashes on Startup
Ensure you have the latest version of Claude Code installed (npm update -g @anthropic-ai/claude-code) and that your Node.js version is 18 or above (node --version).
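If you want to script the Node.js check, for instance in a team setup script, the version string is easy to parse. The sketch below extracts the major version from output in the form node --version prints (such as v18.19.0) and compares it with the minimum of 18 stated above. The helper names are hypothetical.

```python
import re


def node_major(version_output):
    """Extract the major version from output like 'v18.19.0'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def meets_minimum(version_output, minimum=18):
    """True if the reported Node.js major version is at least `minimum`."""
    major = node_major(version_output)
    return major is not None and major >= minimum
```

You can feed it the text captured from running node --version, for example via subprocess.run(["node", "--version"], capture_output=True, text=True).stdout.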
Conclusion
GLM 5.2 is one of the most capable open-weight models available today, and pairing it with Claude Code gives you a genuinely impressive local AI coding assistant. You get the power of agentic, context-aware coding help with complete privacy, no ongoing costs, and no dependency on cloud services.
Whether you’re a developer working with sensitive code, someone keen to avoid cloud API bills, or just an enthusiast who likes running things locally, this setup is well worth the initial configuration effort. Once it’s running, it becomes a natural part of your development workflow.
If you found this guide helpful, check out our other posts on local LLM tooling, and drop a comment below with your experience running GLM 5.2 — I’d love to hear how it’s working for you.
Have questions, or run into a problem not covered here? Leave a comment below or reach out; I’m happy to help.
