Local · Open source · MIT

The local LLM proxy with real failover.

One 11 MB binary sits between your IDE and every LLM provider you use — OpenAI, OpenRouter, Groq, Gemini, Azure, and even your ChatGPT Plus/Pro subscription. Stack multiple keys behind one model name. Secrets stay in your OS keyring.


Single 11 MB binary • No Docker • No Python • MIT licensed

OperatorLM tray app and embedded admin UI demo showing multi-provider failover

Works out of the box with anything that speaks OpenAI — or already points at Ollama (same port 11434).

Cursor · Continue · Cline · Zed · Claude Code · Aider · Open WebUI · Ollama (compatible)

Your LLM workflow is too fragile.

You hit a 429 on OpenAI and your IDE just… stops. You paste API keys into a config file. You pay for credits even though you're already paying for ChatGPT Plus. You spin up a Python service in Docker just to route between two providers.

It shouldn't be this hard.

Multi-account aliasing

Stack three OpenAI keys, two OpenRouter accounts, and a free Groq backup behind a single model name. OperatorLM walks the list until one succeeds.

Production-grade failover

Per-target 3-state circuit breaker. Retries with exponential backoff and jitter. Sliding-window RPM limiter. Different cooldowns for 429s, 5xx, and network errors.

Zero secrets on disk

Keys live in Windows Credential Manager, macOS Keychain, or the Linux Secret Service. The TOML config file holds only references — never the keys themselves.

One model name.
Many backends.

Most local proxies route one model to one upstream. OperatorLM lets one model name fan out across N keys and providers — in priority order, with per-target rate limits.

Your IDE just says model: "gpt-5.2". OperatorLM walks the keys until one succeeds. Hit a 429 on key #1? Circuit-broken for 15s — key #2 takes over instantly.

[[aliases]]
name = "gpt-5.2"
targets = [
  { provider = "openai", target_model = "gpt-5.2", priority = 1 },
  { provider = "openrouter", target_model = "openai/gpt-5.2", priority = 2 },
  { provider = "azure", target_model = "gpt-5.2", priority = 3 }
]
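The priority walk described above can be sketched in a few lines of Python. This is only an illustration of the idea, not OperatorLM's actual code (the proxy itself is written in Go): the `Target` class, the `ALIASES` dict, the `UpstreamError` exception, and the `fake_send` stand-in are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Target:
    provider: str
    target_model: str
    priority: int


# Mirrors the "gpt-5.2" alias above; this dict layout is illustrative only.
ALIASES = {
    "gpt-5.2": [
        Target("azure", "gpt-5.2", 3),
        Target("openai", "gpt-5.2", 1),
        Target("openrouter", "openai/gpt-5.2", 2),
    ],
}


class UpstreamError(Exception):
    """Hypothetical error raised when a target rejects or fails a request."""


def route(model: str, send: Callable[[Target], str]) -> str:
    """Try each target for `model` in ascending priority until one succeeds."""
    targets = sorted(ALIASES[model], key=lambda t: t.priority)
    last_error = None
    for target in targets:
        try:
            return send(target)
        except UpstreamError as err:
            last_error = err  # remember the failure and move to the next target
    raise UpstreamError(f"all targets failed for {model!r}") from last_error


def fake_send(target: Target) -> str:
    # Stand-in for a real HTTP call: pretend the OpenAI key is rate limited.
    if target.provider == "openai":
        raise UpstreamError("429 Too Many Requests")
    return f"answered by {target.provider} ({target.target_model})"


result = route("gpt-5.2", fake_send)
print(result)
```

The caller only ever names the alias. Which provider answered is decided inside `route`, which is why the IDE configuration never has to change when a key runs dry.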

Failover that actually fails over.

Mechanism	What it does	Default
Circuit Breaker	Trips after consecutive failures to stop hammering a dead target. Recovers to half-open after a cooldown.	3 failures → 15s cooldown
Retry with Jitter	Automatically retries 5xx and network errors with exponential backoff and randomized jitter.	2 retries (100ms, 200ms)
RPM Limiter	Sliding-window rate limit to prevent hitting 429s on targets with known quotas.	Unlimited
Smart 429 Handling	Instant failover on 429 Too Many Requests without exhausting retries.	Enabled

All of these are tunable live from the admin UI — no restart needed.
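To make the three breaker states concrete, here is a minimal Python sketch of a closed / open / half-open circuit breaker using the defaults listed above (3 failures, 15s cooldown). It is an assumption-laden illustration, not OperatorLM's implementation: the class, its method names, and the injectable `clock` parameter are hypothetical, and this simplified version lets every request through while half-open rather than a single probe.

```python
import time


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_s=15.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable so the demo does not have to sleep
        self.failures = 0
        self.opened_at = None       # None means the breaker is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_s:
            return "half-open"
        return "open"

    def allow(self) -> bool:
        """Closed and half-open let a request through; open rejects it."""
        return self.state != "open"

    def record_success(self) -> None:
        # Any success closes the breaker and clears the failure streak.
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        if self.state == "half-open":
            # The probe request failed: reopen and restart the cooldown.
            self.opened_at = self.clock()
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


# Demo with a fake clock (a one-element list we can advance by hand).
now = [0.0]
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(3):
    breaker.record_failure()
states = [breaker.state]       # three consecutive failures trip the breaker
now[0] = 15.0
states.append(breaker.state)   # cooldown elapsed
breaker.record_success()
states.append(breaker.state)   # a successful probe closes it again
print(states)
```

Keeping one breaker per target means a dead key is skipped without slowing down the healthy ones behind the same alias.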

Already paying for ChatGPT Plus? Use it as a backend.

Sign in once via OAuth. Get your Plus/Pro quota wired into the same OpenAI-compatible API your tools already speak.

Experimental Feature

The chatgpt-codex provider is unofficial and not endorsed by OpenAI. It reuses the public OAuth client ID from OpenAI's Codex CLI. OpenAI can rotate or revoke it at any time. Usage may violate OpenAI's Terms of Service. Only /v1/responses is supported. Use at your own risk.

Five steps. No magic.

01 Receive

Your IDE sends a standard OpenAI POST /v1/chat/completions request to 127.0.0.1:11434.

02 Resolve

OperatorLM looks up the requested model in your aliases and expands it into a prioritized list of target keys/providers.

03 Inject

For the current target, it retrieves the API key from your OS keyring and injects it into the upstream request.

04 Try-retry-break

It attempts the request. On a 429, it fails over to the next-priority target immediately. On a 5xx or network error, it retries with backoff, and repeated failures trip that target's circuit breaker.

05 Audit

The final response is streamed back to your IDE, and a redacted event is written to the local audit log.

OperatorLM request flow: IDE → local proxy → provider with circuit-breaker failover
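The try-retry-break step is the one with real branching, so a short Python sketch may help. It assumes the defaults from the failover table (2 retries, 100ms base delay doubling each time) and adds up to 50% random jitter, which is a guess at the jitter range. `HTTPStatusError`, `backoff_delay`, `try_targets`, and the `key-N` target labels are hypothetical names for this example, and re-raising non-429 4xx errors is likewise an assumption about sensible behavior rather than documented OperatorLM behavior.

```python
import random
import time


class HTTPStatusError(Exception):
    """Hypothetical error carrying an upstream HTTP status code."""

    def __init__(self, status):
        super().__init__(f"upstream returned {status}")
        self.status = status


def backoff_delay(attempt, base_s=0.1, jitter=0.5, rng=random.random):
    """Delay before the next retry: base * 2**attempt, plus up to 50% jitter."""
    delay = base_s * (2 ** attempt)
    return delay * (1 + jitter * rng())


def try_targets(targets, send, max_retries=2, sleep=time.sleep):
    """Walk targets in order; retry 5xx and network errors, skip ahead on 429."""
    for target in targets:
        for attempt in range(max_retries + 1):
            try:
                return send(target)
            except HTTPStatusError as err:
                if err.status == 429:
                    break  # rate limited: fail over without spending retries
                if err.status < 500:
                    raise  # assumption: other 4xx errors are not retryable
            except ConnectionError:
                pass  # network error: retry like a 5xx
            if attempt < max_retries:
                sleep(backoff_delay(attempt))
    raise RuntimeError("all targets exhausted")


# Demo: key-1 is rate limited, key-2 keeps returning 500, key-3 works.
calls = []


def fake_send(target):
    calls.append(target)
    if target == "key-1":
        raise HTTPStatusError(429)
    if target == "key-2":
        raise HTTPStatusError(500)
    return f"ok from {target}"


delays = []  # record the delays instead of actually sleeping
result = try_targets(["key-1", "key-2", "key-3"], fake_send, sleep=delays.append)
print(result, calls, len(delays))
```

In the demo, the rate-limited target is tried once, the failing target is tried three times (one attempt plus two retries), and the third target answers.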

How OperatorLM compares.

Feature	OperatorLM	LiteLLM	OmniRoute
Language	Go (native)	Python	TypeScript
Distribution	Single binary (11 MB)	Docker / pip	npm / Docker
Tray App / GUI	✅ Built-in	❌ Server only	❌ Server only
Key Storage	OS Keyring	.env / DB	Environment
Multi-key aliasing	✅ First-class	✅ Supported	✅ Supported
Circuit Breaker	✅ Yes	⚠️ Basic	❌ No

Verdict: Pick OperatorLM if you want a desktop-first, single-binary proxy that handles failover and multi-account routing like a production service — without a Python service or sending your keys through someone else's cloud.

Install in 30 seconds.

Select your operating system to download v0.1.0.

Windows

Download OperatorLM-windows-amd64.exe. Double-click. Tray icon appears (no console flash). On first launch, SmartScreen may flag it: click More info → Run anyway.

macOS (Apple Silicon)

Download OperatorLM-darwin-arm64, chmod +x, run. On first launch, right-click the binary in Finder and choose Open.

macOS (Intel)

Download OperatorLM-darwin-amd64, chmod +x, run. On first launch, right-click the binary in Finder and choose Open.

Linux

Download OperatorLM-linux-amd64, chmod +x, run. Tray icon appears.

Linux (no tray)

Run with OPERATORLM_NO_TRAY=1 ./OperatorLM-linux-amd64. Stop it with SIGINT or SIGTERM.

✓ Browse to http://127.0.0.1:11434/admin/ → add a provider → paste an API key → send a test request from the Try It tab.

Use it from anything.

Drop-in replacement for OpenAI SDKs. Just change the base URL.

curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

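from openai import OpenAI

# Point the official SDK at the local proxy instead of api.openai.com.
client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",
    # The SDK requires a non-empty value; it only matters if you enabled the optional local API key.
    api_key="not-needed",
)

response = client.chat.completions.create(
    model="gpt-5.2",  # an alias name; OperatorLM picks the actual target
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)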

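import OpenAI from 'openai';

// Point the official SDK at the local proxy instead of api.openai.com.
const openai = new OpenAI({
  baseURL: 'http://127.0.0.1:11434/v1',
  apiKey: 'not-needed', // placeholder unless the optional local API key is enabled
});

const response = await openai.chat.completions.create({
  model: 'gpt-5.2',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);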

Security model.

Zero telemetry. Zero phone-home. Your keys stay on your machine.

Loopback-only

Binds exclusively to 127.0.0.1. The admin UI is unreachable from the network unless you proxy it yourself.

Host-header validation

Rejects DNS-rebinding attempts out of the box.

Optional local API key

Gates /admin/* and /v1/* if you need local authorization.

OS Keyring integration

API keys live in Windows Credential Manager, macOS Keychain, or Linux Secret Service via libsecret. The TOML config holds references, never the keys.

Redacted audit logs

Every request lands in a JSONL audit log. Authorization headers are stripped by default.
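As an illustration of the redacted audit log idea, the Python sketch below strips sensitive headers before writing one JSON object per line. The event fields, the `SENSITIVE_HEADERS` set, and both function names are assumptions made for this example; the actual OperatorLM log schema is not documented here.

```python
import io
import json

# Assumed list of headers to drop; the source only names Authorization.
SENSITIVE_HEADERS = {"authorization", "api-key", "x-api-key"}


def redact_headers(headers):
    """Return a copy of headers without sensitive entries (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}


def write_audit_event(stream, event):
    """Append one event as a single JSON line, with its headers redacted."""
    safe = dict(event, headers=redact_headers(event.get("headers", {})))
    stream.write(json.dumps(safe, sort_keys=True) + "\n")


log = io.StringIO()  # stands in for the on-disk audit log file
write_audit_event(log, {
    "model": "gpt-5.2",
    "provider": "openai",
    "status": 200,
    "headers": {
        "Authorization": "Bearer sk-example",
        "Content-Type": "application/json",
    },
})
line = log.getvalue()
print(line, end="")
```

Redacting at write time, rather than when the log is read, means a leaked or shared log file never contained the key in the first place.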

Frequently asked.

Is this a fork of Ollama?

No. Ollama runs models locally. OperatorLM routes requests to remote providers. We happen to bind on the same port (11434) so anything pointed at Ollama works unchanged.

Do you see my API keys?

No. There is no server, no telemetry. Keys live in your OS keyring on your machine.

Does the ChatGPT Plus backend work for chat/completions?

No — only /v1/responses, and only Codex/GPT-5.x models. See the experimental disclaimer.

Can I run it on a server?

Yes — OPERATORLM_NO_TRAY=1 skips the tray and runs as a plain HTTP server. Reach it over WireGuard/Tailscale.

Why not just use LiteLLM?

LiteLLM is great if you want a Python service. OperatorLM is a single native binary, has a real tray app, stores keys in the OS keyring, and ships an embedded admin UI.

What providers are supported?

OpenAI, OpenRouter, Groq, Google Gemini, Azure OpenAI, ChatGPT Plus/Pro (experimental), plus a generic 'custom' type for any OpenAI-compatible endpoint.

How big is the binary?

About 11 MB on disk, and roughly 50 MB of RAM at idle.

What's the license?

MIT.
