The one-click
hybrid LLM solution
Automatically route your LLM coding tasks between frontier and local models while working directly in the coding tool you already use. Cut costs, keep data private, and build faster.
Free · v0.1.9 · Requires Apple Silicon · 18 GB+ RAM
The problem
AI coding costs are spiraling
Token bills are climbing and usage caps are tightening as teams burn frontier-model credits on routine work. A growing consensus points to the fix: route everyday tasks to capable local models and save the cloud for the hard problems, cutting costs without giving up quality.
Read moreGetting started
Up and running in minutes
Drag to Applications
One DMG, everything bundled for your Mac. No Homebrew, no Python, no dependencies to manage.
Download your models
Pick the local models you want. Glass Slipper downloads and tunes them for your Apple Silicon hardware, with no configuration required.
Connect and go
Point your coding tool's MCP at Glass Slipper. The router classifies every task and delegates automatically in the background.
Live demo
See it in action
The router classifies each task and sends the cheap ones to a local model automatically, in the background. You keep working as usual.
Real output from Claude Code with Glass Slipper installed
Features
The best of cloud and local, automatically
Frontier models handle the hard thinking. Local models handle the rest. Glass Slipper routes between them so you don't have to.
Intelligent task routing
The router classifies each incoming task and decides which model should handle it: frontier models for hard thinking, local models for the grunt work. It learns how to delegate better over time.
Tokenmax without the limits
Stop burning credits and hitting usage caps. Local models take on most of your tasks at the same quality, keeping your costs down while you keep shipping.
Stay in your tool
Glass Slipper works in the background with Claude, Codex, or Cursor. There's no new interface to learn, so you keep working exactly where you already are.
Privacy when you need it
Some data should never leave your machine. Tell your tool to route sensitive work through Glass Slipper and it stays on device.
Zero configuration
A single DMG with everything bundled: a Rust MCP server, a tuned harness, and a vendored llama-server for Apple Silicon. No tinkering with temperature, quantization, or top_k.
Fast on-device inference
Built and tuned for Apple Silicon, local inference runs fast, speeding up routine tasks instead of waiting on a round trip to the cloud.
Initial public release
- +Intelligent router that classifies tasks and delegates between cloud and local models automatically
- +MCP server with built-in local_summarize, local_explain, and local_review tools
- +One-click model download and llama-server management from the menu bar
- +Local inference tuned for Apple Silicon, so there's no tinkering with temperature, quantization, or top_k
- +No telemetry, no auto-update, fully self-contained bundle (no Homebrew or Python required)
Further reading
Resources
The industry is waking up to runaway inference costs and the case for routing. Here's what we're reading.

The token bill comes due: inside the industry scramble to manage AI's runaway costs
Read article
Can tech companies learn to love cheaper AI models?
Read article
Model routing is a fix for AI overspending. That's a problem for OpenAI and Anthropic
Read articleStay in the loop
Monthly release notes. No spam, unsubscribe any time.