One command to go from bare hardware to a fully working local AI API and management dashboard. No cloud required. No API keys. No data leaving your network.
Three steps from bare hardware to a working AI API. No PhD required.
Run the one-line installer. WarpHost detects Docker, checks for NVIDIA GPUs, and sets up everything automatically.
WarpHost scans your hardware — GPU model, VRAM, CPU, RAM — and recommends the best models for your setup.
Pull a model and start serving. You get an OpenAI-compatible API and a management dashboard instantly.
Drop-in replacement for OpenAI's API. Point any client at localhost:8811/v1 and it just works.
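Any OpenAI-style client or a plain HTTP POST will do. Here is a minimal sketch using only the Python standard library; the model name is a placeholder, so substitute one you have actually pulled in WarpHost.

```python
import json
import urllib.request

# WarpHost's OpenAI-compatible endpoint on the local machine.
BASE_URL = "http://localhost:8811/v1"

# "llama3.1:8b" is a hypothetical model identifier; use one from your catalog.
payload = {
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
}

# Build a POST request in the standard chat-completions shape.
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment with WarpHost running to send the request:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Existing tooling that takes a `base_url` (the official OpenAI SDKs, LangChain, and similar) can be pointed at the same address with any dummy API key, since no key is checked locally.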
Automatically detects your NVIDIA GPU, VRAM, and system specs. Recommends the best models for your hardware.
Clean web UI to monitor your system, manage models, and test with a built-in chat playground.
Browse a curated catalog, pull models with one click, switch between them instantly.
Runs in Docker with the NVIDIA Container Toolkit for GPU passthrough. Clean, isolated, and easy to update.
No data leaves your network. No API keys. No cloud dependency. Your hardware, your models, your data.
18 curated models from 3B to 70B. From laptop-friendly to datacenter-grade.
Meta's edge model. Runs on any GPU or CPU-only. Great starting point.
Punches way above its weight. Thinking modes, coding, multilingual.
Best all-rounder at 8B. The sweet spot for most hardware.
Google's standout. 128K context and 140+ languages.
Outperforms OpenAI o1-mini. Best reasoning you can run locally.
Top-ranked open-source model. Rivals models 10x its size.
Matches GPT-4o on coding benchmarks. State of the art for code.
Meta's best. Rivals Llama 3.1 405B. Top-tier local quality.
WarpHost is free, open source, and ready to run on your hardware today.