Run Codex Locally with LM Studio
Want to use Codex with maximum privacy and control? Running it against a local model is the answer. By connecting Codex to a model like Qwen 2.5 Coder hosted by LM Studio on your own machine, you gain several advantages:
- Privacy: Your code and prompts never leave your computer.
- Offline Capability: Code anywhere, even without an internet connection.
- No Rate Limits: You’re not subject to API quotas or rate limits.
- Cost-Effective: Avoid paying for API calls, especially during heavy development.
- Customization: Easily experiment with different models and context settings.
This guide will walk you through the entire process, from installation to a fully functional local setup.
🤔 Problem
By default, Codex is configured to use cloud-based models. This is great for getting started quickly, but it’s not ideal for all scenarios. If you’re working with proprietary code or sensitive data, you need a solution that keeps everything on your local machine.
While powerful tools like LM Studio make it possible to run state-of-the-art coding models like qwen/qwen2.5-coder-14b locally, connecting them to Codex isn’t a one-click process. You need to:
- Install and configure both Codex and LM Studio.
- Download a multi-gigabyte model.
- Manually configure context window settings.
- Tell Codex to talk to your local server instead of a remote API.
This guide solves that by providing a clear, step-by-step path.
🛠️ Solution
Follow these four steps to move Codex fully on-device.
Prerequisites
Before starting, ensure your hardware can handle running a 14B parameter model locally:
- RAM: At least 16GB of system RAM (32GB recommended).
- GPU (Optional but Recommended): An NVIDIA GPU with 12GB+ VRAM (e.g., RTX 3060 12GB, 4070, or better) or an Apple Silicon Mac (M1/M2/M3 Pro or Max) for best performance.
- Storage: ~15GB of free space for the model and software.
1. Install Codex CLI
First, get the Codex binary set up on your machine.
- Download the latest Codex release for macOS, Linux, or Windows from your private portal.
- Extract it into a folder on your PATH, for example `~/bin/codex` (a command sketch follows this list).
- Run `codex --version` to confirm the binary is executable.
- Run `codex auth login` if your organization requires an initial handshake (even for local usage, some versions validate the license key).
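If you prefer to work in the terminal, the steps above translate roughly into the commands below; the archive name is a placeholder for whatever your download portal actually provides:

```bash
# Extract the downloaded release into a folder (archive name is a placeholder)
mkdir -p ~/bin/codex
tar -xzf codex-cli-release.tar.gz -C ~/bin/codex

# Add that folder to PATH for this session; add this line to your shell profile to persist it
export PATH="$HOME/bin/codex:$PATH"

# Confirm the binary is on PATH and executable
codex --version
```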
2. Install LM Studio
LM Studio is the easiest way to run local LLMs with an OpenAI-compatible server.
- Head to lmstudio.ai and download the installer for your OS.
- Run the installer.
- Launch LM Studio. It will ask you to choose a folder to store models.
- Tip: Models are large (often 10GB+). Choose a location on a fast SSD to ensure models load quickly.
3. Configure LM Studio with Qwen 2.5 Coder 14B
We’ll use Qwen 2.5 Coder 14B, widely regarded as one of the strongest open-weight coding models in its size class.
- Search: Click the magnifying glass icon in LM Studio and type `qwen 2.5 coder 14b`.
- Download: Look for the `qwen/qwen2.5-coder-14b-instruct` repository (usually on the right). Choose the Q4_K_M quantization level.
  - Why Q4_K_M? It balances size (~9GB) and performance well. Higher-precision quantizations (Q6, Q8) offer diminishing returns for coding tasks while consuming significantly more RAM.
- Load: Navigate to the Local Server tab (the `<->` icon).
- Select Model: Use the dropdown at the very top to select the Qwen model you just downloaded.
- Set Context: In the right-hand sidebar, find Context Length. Set this to `8192`, or `16384` if your hardware supports it. Coding tasks require seeing multiple files, so a larger context window is critical.
- Start Server: Click the green Start Server button, then verify it’s running by checking the logs for `Server listening on http://localhost:1234/v1` (a quick curl check follows this list).
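Before touching Codex, you can confirm the server is answering on its OpenAI-compatible endpoint; this assumes the default port 1234 shown in the logs:

```bash
# List the models LM Studio is currently serving; the Qwen model should appear in the JSON response
curl http://localhost:1234/v1/models
```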
4. Point Codex at LM Studio
Now, we update Codex’s configuration to ignore the cloud and talk to localhost.
- Locate your configuration file:
  - macOS/Linux: `~/.codex/config.toml`
  - Windows: `%USERPROFILE%\.codex\config.toml`
- Open the file in your favorite editor and add the following configuration:
# Set the default model and provider
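# Note: this should match the model identifier LM Studio shows for the loaded model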
model = "qwen/qwen2.5-coder-14b"
model_provider = "lmstudio"
# Define the LM Studio provider
[model_providers.lmstudio]
name = "LM Studio"
# The default LM Studio server address
base_url = "http://localhost:1234/v1"
# Tell Codex this endpoint behaves like OpenAI's chat API
wire_api = "chat"
# Retries help if the local model is still busy with a previous request
request_max_retries = 4
stream_max_retries = 10
# Important: Local generation can be slower than cloud.
# 5 minutes (300,000ms) prevents timeouts on long code blocks.
stream_idle_timeout_ms = 300000
- Save the file. Codex is now re-wired.
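If you want to sanity-check the endpoint independently of Codex first, a minimal chat request against the OpenAI-compatible API looks like this (the model name should match whatever identifier LM Studio reports):

```bash
# Send a single chat message to the local server and print the JSON response
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen/qwen2.5-coder-14b",
        "messages": [{"role": "user", "content": "Write a one-line Python hello world."}]
      }'
```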
🧪 Example
Let’s verify everything is working.
- Check LM Studio: Ensure the server is running in the LM Studio window.
- Run a Command: Open your terminal and ask Codex to write some code:
  `codex "Write a Python function to parse a CSV file and return a dictionary of records."`
- Observe:
  - Terminal: You should see code streaming into your terminal.
  - LM Studio: Watch the “Server Logs” area. You will see a `POST /v1/chat/completions` request, followed by streaming token generation.
If the response is coherent and pertains to CSV parsing, congratulations! You are running on local silicon.
❌ Troubleshooting
“Connection Refused” Error
- Is LM Studio running?
- Did you click “Start Server”?
- Is the port `1234` correct? Check the port in LM Studio’s server tab (a quick terminal check follows this list).
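To confirm something is actually listening on that port, you can check from the terminal (macOS/Linux); this is a general-purpose check, not an LM Studio feature:

```bash
# Show any process listening on TCP port 1234
lsof -nP -iTCP:1234 -sTCP:LISTEN

# Or hit the models endpoint directly; a JSON response means the server is up
curl http://localhost:1234/v1/models
```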
Model Hallucinations / Gibberish
- Ensure you downloaded the Instruct version of Qwen 2.5 Coder, not the Base version. The Base version completes text but doesn’t follow instructions well.
Slow Generation
- If token generation is extremely slow (< 5 tokens/sec), your model might be offloading to system RAM instead of GPU VRAM. Try a smaller model (like Qwen 2.5 Coder 7B) or a lower quantization (Q3_K_S). On NVIDIA hardware, the check below can confirm whether the model fits in VRAM.
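A minimal sketch of that check, assuming the standard nvidia-smi tool is installed; run it while the model is loaded in LM Studio:

```bash
# Report how much GPU memory is in use versus the card's total
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```

If memory.used is far below the model size, most layers are sitting in system RAM and generation will crawl.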
🚀 Take it further
Your local setup is ready, but you can optimize it for a true power-user workflow:
- Enable GPU Offload: In LM Studio, ensure the “GPU Offload” slider is set to Max. This moves the entire model to VRAM for significantly faster generation.
- Local Tools: If your Codex version supports tool use, enable “unsafe” mode to allow the local model to read files, run shell commands, and edit code directly. Since it’s local, the security risk is contained to your user environment.
- Multi-Model Architecture: Run two instances of LM Studio on different ports (e.g., `1234` and `1235`). Configure Codex to use a small model (7B) for quick chat and a large model (32B) for complex architecture planning (see the config sketch after this list).
- Backup: Back up your `~/.codex` folder. It’s the brain of your operation now.
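A rough sketch of the multi-model idea, reusing the provider format from step 4. The provider names and the second port are assumptions; you would point the top-level model and model_provider settings at whichever provider fits the task at hand:

```toml
# Small, fast model for quick questions and chat
[model_providers.lmstudio_small]
name = "LM Studio (7B)"
base_url = "http://localhost:1234/v1"
wire_api = "chat"

# Larger model for complex planning, served by a second LM Studio instance
[model_providers.lmstudio_large]
name = "LM Studio (32B)"
base_url = "http://localhost:1235/v1"
wire_api = "chat"
```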
✅ Done
That’s it! You’ve successfully decoupled Codex from the cloud and are now running a powerful, private AI development environment right on your desktop. Enjoy the privacy, control, and freedom of local-first AI.