The one-click hybrid LLM solution

Automatically route your LLM coding tasks between frontier and local models while working directly in the coding tool you already use. Cut costs, keep data private, and build faster.

Download for Mac · View on GitHub

Free · v0.2.5 · Requires Apple Silicon · 18 GB+ RAM

The problem

AI coding costs are spiraling

Token bills are climbing and usage caps are tightening as teams burn frontier-model credits on routine work. A growing consensus points to the fix: route everyday tasks to capable local models and save the cloud for the hard problems, cutting costs without giving up quality.

Getting started

Up and running in minutes

Drag to Applications

One DMG, everything bundled for your Mac. No Homebrew, no Python, no dependencies to manage.

Download your models

Pick the local models you want. Glass Slipper downloads and tunes them for your Apple Silicon hardware, with no configuration required.

Connect and go

Point your coding tool's MCP configuration at Glass Slipper. The router classifies every task and delegates automatically in the background.
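As a rough sketch of what that connection might look like: several MCP clients, Claude Code among them, read a JSON file containing an `mcpServers` map that names a command to launch. The snippet below generates one such entry. The server name `glass-slipper` matches the demo output on this page, but the binary path and the `build_mcp_config` helper are assumptions for illustration only; check the app for the real values.

```python
import json

# Hypothetical install location; the real path inside the app bundle may differ.
ASSUMED_SERVER_PATH = "/Applications/Glass Slipper.app/Contents/MacOS/glass-slipper-mcp"


def build_mcp_config(server_path: str = ASSUMED_SERVER_PATH) -> dict:
    """Return an MCP client config with a single stdio server entry."""
    return {
        "mcpServers": {
            "glass-slipper": {
                "command": server_path,  # executable the client launches
                "args": [],              # no extra arguments assumed
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the client's MCP config file.
    print(json.dumps(build_mcp_config(), indent=2))
```

Generating the entry from code rather than hand-editing JSON is just a convenience here; the resulting file is what the coding tool actually reads.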

Live demo

See it in action

The router classifies each task and sends the cheap ones to a local model automatically, in the background. You keep working as usual.

claude-code

❯ summarize Code/cpython/objects/bytesobject.c using the local model

⏺ glass-slipper · local_summarize (MCP)

command: "cat ~/Code/cpython/Objects/bytesobject.c"

context_tokens: 15000

⏺ Here's the summary from your local model:

bytesobject.c implements Python's bytes type in C:

· Core operations: creation, concatenation, slicing

· Formatting: sprintf-style (%) format syntax

· String methods: split, strip, translate, hex

· Internal utilities: _PyBytesWriter buffer management

✻ Saved ~12,400 tokens · Completed in 2m 11s

Real output from Claude Code with Glass Slipper installed
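For readers curious what sits behind the `local_summarize (MCP)` line in the demo: MCP tool invocations travel as JSON-RPC 2.0 `tools/call` requests. The sketch below builds one that mirrors the demo. The argument names `command` and `context_tokens` are copied from the transcript; their exact schema, and the `build_tool_call` helper itself, are assumptions rather than documented API.

```python
import json


def build_tool_call(request_id: int, command: str, context_tokens: int) -> dict:
    """Build an MCP `tools/call` request for the local_summarize tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "local_summarize",
            # Argument names come from the demo transcript; the schema is assumed.
            "arguments": {
                "command": command,
                "context_tokens": context_tokens,
            },
        },
    }


request = build_tool_call(1, "cat ~/Code/cpython/Objects/bytesobject.c", 15000)
print(json.dumps(request, indent=2))
```

In normal use the coding tool constructs this request for you; nothing here needs to be written by hand.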

Features

The best of cloud and local, automatically

Frontier models handle the hard thinking. Local models handle the rest. Glass Slipper routes between them so you don't have to.

Intelligent task routing

The router classifies each incoming task and decides which model should handle it: frontier models for hard thinking, local models for the grunt work. It learns how to delegate better over time.
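The router's actual classifier is not described on this page. Purely as an illustration of the idea, a rule-based version might look like the sketch below. Every name in it (`Task`, `Target`, `route`, the set of routine task kinds, and the 32,000-token threshold) is hypothetical, and a learned router would replace the fixed rules.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"
    FRONTIER = "frontier"


# Hypothetical routine task kinds, loosely mirroring the local_* tool names.
ROUTINE_KINDS = {"summarize", "explain", "review"}


@dataclass
class Task:
    kind: str                # e.g. "summarize", "refactor", "design"
    context_tokens: int      # size of the input the model must read
    sensitive: bool = False  # True if the data must stay on device


def route(task: Task, max_local_tokens: int = 32_000) -> Target:
    """Pick a target model; the token threshold is an assumed placeholder."""
    if task.sensitive:
        return Target.LOCAL  # private work never leaves the machine
    if task.kind in ROUTINE_KINDS and task.context_tokens <= max_local_tokens:
        return Target.LOCAL  # routine and small enough for a local model
    return Target.FRONTIER   # everything else goes to the cloud


print(route(Task("summarize", 15_000)))
print(route(Task("design", 2_000)))
```

Checking the privacy flag first means sensitive work stays local even when the task would otherwise be sent to a frontier model.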

Tokenmax without the limits

Stop burning credits and hitting usage caps. Local models take on most of your tasks at the same quality, keeping your costs down while you keep shipping.

Stay in your tool

Glass Slipper works in the background with Claude Code, Codex, or Cursor. There's no new interface to learn, so you keep working exactly where you already are.

Privacy when you need it

Some data should never leave your machine. Tell your tool to route sensitive work through Glass Slipper and it stays on device.

Zero configuration

A single DMG with everything bundled: a Rust MCP server, a tuned harness, and a vendored llama-server for Apple Silicon. No tinkering with temperature, quantization, or top_k.

Fast on-device inference

Local inference is built and tuned for Apple Silicon, so routine tasks run on your machine instead of waiting on a round trip to the cloud.

Changelog

What's new

Glass Slipper is in early development. Sign up for release notes.

v0.1.9

Jun 1, 2025

Initial public release

+ Intelligent router that classifies tasks and delegates between cloud and local models automatically
+ MCP server with built-in local_summarize, local_explain, and local_review tools
+ One-click model download and llama-server management from the menu bar
+ Local inference tuned for Apple Silicon, so there's no tinkering with temperature, quantization, or top_k
+ No telemetry, no auto-update, fully self-contained bundle (no Homebrew or Python required)