Local · Open source · MIT

The local LLM proxy with real failover.

One 11 MB binary sits between your IDE and every LLM provider you use — OpenAI, OpenRouter, Groq, Gemini, Azure, and even your ChatGPT Plus/Pro subscription. Stack multiple keys behind one model name. Secrets stay in your OS keyring.

Single 11 MB binary · No Docker · No Python · MIT licensed

[Demo: OperatorLM tray app and embedded admin UI showing multi-provider failover]

Works out of the box with anything that speaks OpenAI — or already points at Ollama (same port 11434).

Cursor · Continue · Cline · Zed · Claude Code · Aider · Open WebUI · Ollama (compatible)

Your LLM workflow is too fragile.

You hit a 429 on OpenAI and your IDE just… stops. You paste API keys into a config file. You pay for credits even though you're already paying for ChatGPT Plus. You spin up a Python service in Docker just to route between two providers.

It shouldn't be this hard.

Multi-account aliasing

Stack three OpenAI keys, two OpenRouter accounts, and a free Groq backup behind a single model name. OperatorLM walks the list until one succeeds.

Production-grade failover

Per-target 3-state circuit breaker. Retries with exponential backoff and jitter. Sliding-window RPM limiter. Different cooldowns for 429s, 5xx, and network errors.
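To make the RPM limiter concrete, here is a minimal Python sketch of a sliding-window limiter. It is an illustration only, not OperatorLM's code: the class name, the injectable clock, and the use of `None` for "unlimited" are all assumptions.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `rpm` requests in any rolling window (60 s by default)."""

    def __init__(self, rpm=None, window_seconds=60.0, clock=time.monotonic):
        self.rpm = rpm                # None means unlimited (assumed convention)
        self.window = window_seconds
        self.clock = clock            # injectable so the behavior is testable
        self.stamps = deque()         # timestamps of admitted requests

    def allow(self):
        if self.rpm is None:
            return True
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.rpm:
            self.stamps.append(now)
            return True
        return False
```

A router could call `allow()` before each attempt and skip to the next target when it returns False, so a key with a known quota never gets pushed into a 429.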

Zero secrets on disk

Keys live in Windows Credential Manager, macOS Keychain, or the Linux Secret Service. The TOML config file holds only references — never the keys themselves.

One model name. Many backends.

Most local proxies route one model to one upstream. OperatorLM lets one model name fan out across N keys and providers — in priority order, with per-target rate limits.

Your IDE just says model: "gpt-5.2". OperatorLM walks the keys until one succeeds. Hit a 429 on key #1? Circuit-broken for 15s — key #2 takes over instantly.

[[aliases]]
name = "gpt-5.2"
targets = [
  { provider = "openai", target_model = "gpt-5.2", priority = 1 },
  { provider = "openrouter", target_model = "openai/gpt-5.2", priority = 2 },
  { provider = "azure", target_model = "gpt-5.2", priority = 3 }
]
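The alias above can be read as "try these targets in priority order". A rough Python sketch of that walk follows. The `Target` class, the `route` function, and the `send` callable are hypothetical names for illustration, and passing other 4xx responses straight back to the caller is an assumption rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class Target:
    provider: str
    target_model: str
    priority: int


# Mirrors the [[aliases]] entry above (deliberately listed out of order).
ALIASES = {
    "gpt-5.2": [
        Target("openrouter", "openai/gpt-5.2", 2),
        Target("openai", "gpt-5.2", 1),
        Target("azure", "gpt-5.2", 3),
    ]
}


def route(model, send):
    """Try each target for `model` in priority order.

    `send(target)` is a hypothetical callable returning (status, body).
    """
    tried = []
    for target in sorted(ALIASES[model], key=lambda t: t.priority):
        status, body = send(target)
        tried.append((target.provider, status))
        if status == 429 or status >= 500:
            continue  # fail over to the next target (retries omitted here)
        # Assumption: success and other client errors go back to the caller.
        return target.provider, status, body, tried
    raise RuntimeError(f"all targets failed for {model}: {tried}")
```

With a `send` that returns 429 for `openai`, the call lands on `openrouter`, and `tried` records both attempts.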

Failover that actually fails over.

| Mechanism | What it does | Default |
| --- | --- | --- |
| Circuit Breaker | Trips after consecutive failures to stop hammering a dead target. Recovers to half-open after a cooldown. | 3 failures → 15s cooldown |
| Retry with Jitter | Automatically retries 5xx and network errors with exponential backoff and randomized jitter. | 2 retries (100ms, 200ms) |
| RPM Limiter | Sliding-window rate limit to prevent hitting 429s on targets with known quotas. | Unlimited |
| Smart 429 Handling | Instant failover on 429 Too Many Requests without exhausting retries. | Enabled |
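For readers who have not met a 3-state circuit breaker before, the Python sketch below shows the closed, open, and half-open transitions using the defaults listed above (3 failures, 15s cooldown). It is a simplified illustration with assumed names. In particular, it does not limit half-open to a single probe request, which a real implementation might.

```python
import time


class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"

    def __init__(self, threshold=3, cooldown=15.0, clock=time.monotonic):
        self.threshold = threshold    # consecutive failures before tripping
        self.cooldown = cooldown      # seconds to stay open
        self.clock = clock            # injectable for testing
        self.state = self.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def allow(self):
        """Return True if a request may be sent to this target."""
        if self.state == self.OPEN:
            if self.clock() - self.opened_at >= self.cooldown:
                self.state = self.HALF_OPEN  # cooldown over: admit trial traffic
                return True
            return False
        return True

    def record_success(self):
        self.failures = 0
        self.state = self.CLOSED

    def record_failure(self):
        self.failures += 1
        # A failed trial in half-open re-opens immediately.
        if self.state == self.HALF_OPEN or self.failures >= self.threshold:
            self.state = self.OPEN
            self.opened_at = self.clock()
```

One breaker per target keeps a dead key from slowing down every request: while it is open, the router skips that target without waiting on a timeout.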

All of these are tunable live from the admin UI — no restart needed.
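The "2 retries (100ms, 200ms)" default corresponds to a base delay that doubles on each attempt. A small Python sketch of that schedule is below. The function name, the `jitter` fraction, and the choice to add jitter on top of the nominal delay are assumptions, since the page does not say how the jitter is applied.

```python
import random


def backoff_delays(retries=2, base=0.1, factor=2.0, jitter=0.5, rng=random.random):
    """Return one delay in seconds per retry: base * factor**attempt, plus jitter.

    `jitter` is the largest fraction of the nominal delay added at random.
    """
    delays = []
    for attempt in range(retries):
        nominal = base * factor ** attempt
        delays.append(nominal * (1.0 + jitter * rng()))
    return delays
```

With `jitter=0.0` the schedule is exactly 0.1s then 0.2s. The random component spreads retries out so that several clients failing at once do not all retry at the same instant.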

Already paying for ChatGPT Plus? Use it as a backend.

Sign in once via OAuth. Get your Plus/Pro quota wired into the same OpenAI-compatible API your tools already speak.

Experimental Feature

The chatgpt-codex provider is unofficial and not endorsed by OpenAI. It reuses the public OAuth client ID from OpenAI's Codex CLI. OpenAI can rotate or revoke it at any time. Usage may violate OpenAI's Terms of Service. Only /v1/responses is supported. Use at your own risk.

Five steps. No magic.

01

Receive

Your IDE sends a standard OpenAI POST /v1/chat/completions request to 127.0.0.1:11434.

02

Resolve

OperatorLM looks up the requested model in your aliases and expands it into a prioritized list of target keys/providers.

03

Inject

For the current target, it retrieves the API key securely from your OS keyring and injects it.

04

Try-retry-break

It attempts the request. On a 429, it fails over to the next-priority target immediately. On a 5xx or network error, it retries with backoff first, and the circuit breaker trips the target after repeated failures.

05

Audit

The final response is streamed back to your IDE, and a redacted event is written to the local audit log.
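The redaction in the audit step can be pictured with a short Python sketch. The event shape, the function name, and the extra `api-key` header (the header Azure OpenAI uses) are assumptions for illustration. The page itself only states that Authorization headers are stripped.

```python
import json

# Header names compared case-insensitively. "api-key" is an assumed extra.
SENSITIVE_HEADERS = {"authorization", "api-key"}


def audit_line(event):
    """Serialize one audit event as a single JSONL line with secrets removed."""
    headers = {
        name: value
        for name, value in event.get("headers", {}).items()
        if name.lower() not in SENSITIVE_HEADERS
    }
    redacted = dict(event, headers=headers)  # copy, so the caller's event is untouched
    return json.dumps(redacted, sort_keys=True)
```

Each call yields one line, so the log can be appended to and read back record by record.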

[Diagram: OperatorLM request flow: IDE → local proxy → provider with circuit-breaker failover]

How OperatorLM compares.

| Feature | OperatorLM | LiteLLM | OmniRoute |
| --- | --- | --- | --- |
| Language | Go (native) | Python | TypeScript |
| Distribution | Single binary (11 MB) | Docker / pip | npm / Docker |
| Tray App / GUI | Built-in | Server only | Server only |
| Key Storage | OS Keyring | .env / DB | Environment |
| Multi-key aliasing | First-class | Supported | Supported |
| Circuit Breaker | Yes | ⚠️ Basic | No |

Verdict: Pick OperatorLM if you want a desktop-first, single-binary proxy that handles failover and multi-account routing like a production service, without running a Python service or sending your keys through someone else's cloud.

Install in 30 seconds.

Download v0.1.0 for your operating system. The steps below cover Windows.

Download Binary
Download OperatorLM-windows-amd64.exe. Double-click. Tray icon appears (no console flash). On first launch, SmartScreen may flag it — More info → Run anyway.

Browse to http://127.0.0.1:11434/admin/ → add a provider → paste an API key → send a test request from the Try It tab.

Use it from anything.

Drop-in replacement for OpenAI SDKs. Just change the base URL.

curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
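The same request can be made from Python with only the standard library. This sketch mirrors the curl call above. The function names are illustrative, and `chat` assumes OperatorLM is running locally and returns a standard, non-streaming OpenAI chat completion body.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:11434/v1"


def build_chat_request(prompt, model="gpt-5.2", base_url=BASE_URL):
    """Build (but do not send) a chat completion request for the local proxy."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the request and return the reply text. Needs the proxy running."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With an OpenAI SDK, the equivalent change is pointing its base URL at `http://127.0.0.1:11434/v1`.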

Security model.

Zero telemetry. Zero phone-home. Your keys stay on your machine.

  • Loopback-only

    Binds exclusively to 127.0.0.1. The admin UI is unreachable from the network unless you proxy it yourself.

  • Host-header validation

    Rejects DNS-rebinding attempts out of the box.

  • Optional local API key

    Gates /admin/* and /v1/* if you need local authorization.

  • OS Keyring integration

    API keys live in Windows Credential Manager, macOS Keychain, or Linux Secret Service via libsecret. The TOML config holds references, never the keys.

  • Redacted audit logs

    Every request lands in a JSONL audit log. Authorization headers are stripped by default.
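Host-header validation is worth a quick illustration. In a DNS-rebinding attack, a hostile page's domain is re-pointed at 127.0.0.1, so the browser reaches the local listener but still sends the attacker's domain in the Host header. A check along these lines rejects it. This is a sketch under assumptions, not OperatorLM's actual rule: the allowed names, the handling of a missing port, and the function name are all hypothetical.

```python
ALLOWED_HOSTS = {"127.0.0.1", "localhost"}  # assumed allow-list


def host_allowed(host_header, port=11434):
    """Return True only if the Host header names the loopback listener."""
    if not host_header:
        return False
    host, sep, port_text = host_header.strip().lower().rpartition(":")
    if not sep:
        # No port in the header: treat the whole value as the host (assumption).
        host, port_text = port_text, str(port)
    return host in ALLOWED_HOSTS and port_text == str(port)
```

A request carrying `Host: evil.example.com:11434` fails the check even though it arrived on the loopback interface.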

Frequently asked.

Is this a fork of Ollama?

No. Ollama runs models locally. OperatorLM routes requests to remote providers. We happen to bind to the same port (11434), so anything pointed at Ollama works unchanged.

Do you see my API keys?

No. There is no server, no telemetry. Keys live in your OS keyring on your machine.

Does the ChatGPT Plus backend work for chat/completions?

No — only /v1/responses, and only Codex/GPT-5.x models. See the experimental disclaimer.

Can I run it on a server?

Yes — OPERATORLM_NO_TRAY=1 skips the tray and runs as a plain HTTP server. Reach it over WireGuard/Tailscale.

Why not just use LiteLLM?

LiteLLM is great if you want a Python service. OperatorLM is a single native binary, has a real tray app, stores keys in the OS keyring, and ships an embedded admin UI.

What providers are supported?

OpenAI, OpenRouter, Groq, Google Gemini, Azure OpenAI, ChatGPT Plus/Pro (experimental), plus a generic 'custom' type for any OpenAI-compatible endpoint.

How big is the binary?

About 11 MB on disk, with roughly 50 MB of RAM at idle.

What's the license?

MIT.