
Houtini LM
Routes routine coding tasks from Claude to your local LLM to save on API costs while keeping Claude for complex reasoning and architecture work.
Provides expert prompt engineering through LM Studio integration: 35+ specialized functions for code analysis, generation, security audits, documentation creation, and creative tasks, with intelligent context-window management and caching optimization.
What it does
- Generate boilerplate code and test stubs
- Create commit messages and documentation
- Perform code reviews and explanations
- Convert between data formats
- Generate mock data and type definitions
- Audit code for security issues
About Houtini LM
Houtini LM is a community-built MCP server published by houtini-ai that provides AI assistants with tools and capabilities via the Model Context Protocol. Houtini LM delivers advanced prompt engineering with 35+ functions for code analysis, generation, security audits, and documentation creation. It is categorized under auth & security and developer tools.
How to install
You can install Houtini LM in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.
License
Houtini LM is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.
@houtini/lm
I built this because I kept leaving Claude Code running overnight on big refactors and the token bill was painful. A huge chunk of that spend goes on bounded tasks any decent model handles fine - generating boilerplate, explaining code, drafting commit messages, converting formats. Stuff that doesn't need Claude's reasoning or tool access.
Houtini LM connects Claude Code to a local LLM on your network. Claude keeps doing the hard work - architecture, planning, multi-file changes - and offloads the grunt work to your local model. Free. No rate limits. Private.
The session footer tracks everything Claude offloads, so you can watch the savings stack up.
How it works
Claude Code (orchestrator)
│
├─ Complex reasoning, planning, architecture → Claude API (your tokens)
│
└─ Bounded grunt work → houtini-lm ──HTTP/SSE──> Your local LLM (free)
       • Boilerplate & test stubs          Qwen, Llama, Mistral, DeepSeek...
       • Code review & explanations        LM Studio, Ollama, vLLM, llama.cpp
       • Commit messages & docs
       • Format conversion
       • Mock data & type definitions
Claude's the architect. Your local model's the drafter. Claude QAs everything.
Quick start
Claude Code
claude mcp add houtini-lm -- npx -y @houtini/lm
That's it. If LM Studio's running on localhost:1234 (the default), Claude can start delegating straight away.
LLM on a different machine
I've got a GPU box on my local network running Qwen 3 Coder Next in LM Studio. If you've got a similar setup, point the URL at it:
claude mcp add houtini-lm -e LM_STUDIO_URL=http://192.168.1.50:1234 -- npx -y @houtini/lm
Claude Desktop
Drop this into your claude_desktop_config.json:
{
  "mcpServers": {
    "houtini-lm": {
      "command": "npx",
      "args": ["-y", "@houtini/lm"],
      "env": {
        "LM_STUDIO_URL": "http://localhost:1234"
      }
    }
  }
}
What gets offloaded
Delegate to the local model - bounded, well-defined tasks:
| Task | Why it works locally |
|---|---|
| Generate test stubs | Clear input (source), clear output (tests) |
| Explain a function | Summarisation doesn't need tool access |
| Draft commit messages | Diff in, message out |
| Code review | Paste full source, ask for bugs |
| Convert formats | JSON↔YAML, snake_case↔camelCase |
| Generate mock data | Schema in, data out |
| Write type definitions | Source in, types out |
| Brainstorm approaches | Doesn't commit to anything |
Keep on Claude - anything that needs reasoning, tool access, or multi-step orchestration:
- Architectural decisions
- Reading/writing files
- Running tests and interpreting results
- Multi-file refactoring plans
- Anything that needs to call other tools
The tool descriptions are written to nudge Claude into planning delegation at the start of large tasks, not just using it when it happens to think of it.
Token tracking
Every response includes a session footer:
Model: qwen/qwen3-coder-next | This call: 145→248 tokens | Session: 12,450 tokens offloaded across 23 calls
The discover tool reports cumulative session stats too. Claude sees this data and (I've found) it reinforces the delegation habit throughout long-running tasks. The more it sees it's saving tokens, the more it looks for things to offload.
Tools
chat
The workhorse. Send a task, get an answer. The description includes planning triggers that nudge Claude to identify offloadable work when it's starting a big task.
| Parameter | Required | Default | What it does |
|---|---|---|---|
| message | yes | - | The task. Be specific about output format. |
| system | no | - | Persona - "Senior TypeScript dev" not "helpful assistant" |
| temperature | no | 0.3 | 0.1 for code, 0.3 for analysis, 0.7 for creative |
| max_tokens | no | 2048 | Lower for quick answers, higher for generation |
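For illustration, a delegated chat call for commit-message drafting might carry arguments like these - the values are examples, not defaults, and the diff placeholder is just that:

{
  "message": "Draft a conventional commit message for the diff below. Return only the message, no preamble.\n\n<diff goes here>",
  "system": "Senior developer who writes terse, conventional commit messages",
  "temperature": 0.3,
  "max_tokens": 256
}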
custom_prompt
Three-part prompt: system, context, instruction. Keeping them separate prevents context bleed - consistently outperforms stuffing everything into one message, especially with local models.
| Parameter | Required | Default | What it does |
|---|---|---|---|
| instruction | yes | - | What to produce. Under 50 words works best. |
| system | no | - | Persona + constraints, under 30 words |
| context | no | - | Complete data to analyse. Never truncate. |
| temperature | no | 0.3 | 0.1 for review, 0.3 for analysis |
| max_tokens | no | 2048 | Match to expected output length |
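As a sketch of the three parts kept separate - here a security-focused review, with illustrative values and a placeholder where the full source goes:

{
  "system": "Senior security engineer. Be blunt, cite line numbers.",
  "context": "<complete source of the file under review>",
  "instruction": "List any injection, auth, or input-validation issues as bullet points, max 5.",
  "temperature": 0.1,
  "max_tokens": 1024
}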
code_task
Built for code analysis. Pre-configured system prompt, locked to temperature 0.2 for focused output.
| Parameter | Required | Default | What it does |
|---|---|---|---|
| code | yes | - | Complete source code. Never truncate. |
| task | yes | - | "Find bugs", "Explain this", "Write tests" |
| language | no | - | "typescript", "python", "rust", etc. |
| max_tokens | no | 2048 | Match to expected output length |
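For example, a code_task call asking for tests might look like this (the source is a placeholder; the parameters are the ones in the table):

{
  "code": "<complete source of the module, untruncated>",
  "task": "Write unit tests covering the exported functions, including edge cases",
  "language": "typescript",
  "max_tokens": 2048
}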
discover
Health check. Returns model name, context window, latency, and cumulative session stats. Call before delegating if you're not sure the LLM's available.
list_models
Lists everything loaded on the LLM server with context window sizes.
Getting good results from local models
Qwen, Llama, DeepSeek - they score brilliantly on coding benchmarks now. The gap between a good and bad result is almost always prompt quality, not model capability. I've spent a fair bit of time on this.
Send complete code. Local models hallucinate details when you give them truncated input. If a file's too large, send the relevant function - not a snippet with ... in the middle.
Be explicit about output format. "Return a JSON array" or "respond in bullet points" - don't leave it open-ended. Smaller models need this.
Set a specific persona. "Expert Rust developer who cares about memory safety" gets noticeably better results than "helpful assistant."
State constraints. "No preamble", "reference line numbers", "max 5 bullet points" - tell the model what not to do as well as what to do.
Include surrounding context. For code generation, send imports, types, and function signatures - not just the function body.
One call at a time. If your LLM server runs a single model, parallel calls queue up and stack timeouts. Send them sequentially.
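Put together, a delegated review request that follows those rules might look something like this - the persona and constraints are illustrative, adjust them to your stack:

{
  "message": "Review the complete function below for bugs and edge cases. Respond with max 5 bullet points, reference line numbers, no preamble.\n\n<complete function source, including imports and type signatures>",
  "system": "Expert Rust developer who cares about memory safety",
  "temperature": 0.1,
  "max_tokens": 1024
}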
Configuration
| Variable | Default | What it does |
|---|---|---|
| LM_STUDIO_URL | http://localhost:1234 | Base URL of the OpenAI-compatible API |
| LM_STUDIO_MODEL | (auto-detect) | Model identifier - leave blank to use whatever's loaded |
| LM_STUDIO_PASSWORD | (none) | Bearer token for authenticated endpoints |
| LM_CONTEXT_WINDOW | 100000 | Fallback context window if the API doesn't report it |
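All four go in the env block of the server entry. For example, pointing at an authenticated LM Studio instance on another machine - the address, model name, and token here are placeholders:

{
  "mcpServers": {
    "houtini-lm": {
      "command": "npx",
      "args": ["-y", "@houtini/lm"],
      "env": {
        "LM_STUDIO_URL": "http://192.168.1.50:1234",
        "LM_STUDIO_MODEL": "qwen/qwen3-coder-next",
        "LM_STUDIO_PASSWORD": "<token>",
        "LM_CONTEXT_WINDOW": "100000"
      }
    }
  }
}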
Compatible endpoints
Works with anything that speaks the OpenAI /v1/chat/completions API:
| What | URL | Notes |
|---|---|---|
| LM Studio | http://localhost:1234 | Default, zero config |
| Ollama | http://localhost:11434 | Set LM_STUDIO_URL |
| vLLM | http://localhost:8000 | Native OpenAI API |
| llama.cpp | http://localhost:8080 | Server mode |
| Any OpenAI-compatible API | Any URL | Set URL + password |
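For example, to delegate to Ollama instead of LM Studio, point LM_STUDIO_URL at Ollama's port using the same command shown earlier:

claude mcp add houtini-lm -e LM_STUDIO_URL=http://localhost:11434 -- npx -y @houtini/lm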
Streaming and timeouts
All inference uses Server-Sent Events streaming. Tokens arrive incrementally, keeping the connection alive. If generation takes longer than 55 seconds, you get a partial result instead of a timeout error - the footer shows ⚠ TRUNCATED when this happens.
The 55-second soft timeout exists because the MCP SDK has a hard ~60s client-side timeout. Without streaming, any response that took longer than 60 seconds just vanished. Not ideal.
Development
git clone https://github.com/houtini-ai/lm.git
cd lm
npm install
npm run build
Licence
MIT