Open Source · AGPLv3

Run open-source AI on your own hardware.

One command to go from bare hardware to a fully working local AI API and management dashboard. No cloud required. No API keys. No data leaving your network.

curl -fsSL https://warphost.io/install | bash

How it works

Three steps from bare hardware to a working AI API. No PhD required.

1

Install

Run the one-line installer. WarpHost detects Docker, checks for NVIDIA GPUs, and sets up everything automatically.

2

Detect & Recommend

WarpHost scans your hardware — GPU model, VRAM, CPU, RAM — and recommends the best models for your setup.

3

Run

Pull a model and start serving. You get an OpenAI-compatible API and a management dashboard instantly.

Everything you need to run AI locally

OpenAI-Compatible API

Drop-in replacement for OpenAI's API. Point any client at localhost:8811/v1 and it just works.
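As a quick sketch of what "it just works" means in practice, here is a minimal stdlib-only Python client against the endpoint above. The port and `/v1` path come from this page; the model name and helper names are illustrative, not part of WarpHost's documented API.

```python
import json
import urllib.request

# Endpoint from the docs above; the model name is illustrative.
BASE_URL = "http://localhost:8811/v1"

def build_chat_request(prompt, model="llama3.2-3b"):
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, model="llama3.2-3b"):
    """POST the payload to the local endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the request and response shapes follow the OpenAI chat completions format, official OpenAI SDKs that accept a custom base URL should also work by pointing them at `http://localhost:8811/v1`.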

Hardware Auto-Detection

Automatically detects your NVIDIA GPU, VRAM, and system specs. Recommends the best models for your hardware.
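WarpHost's own detection code isn't shown here, but as a rough illustration of the idea, a common approach is to query `nvidia-smi` and parse its CSV output. Everything below is an assumption-laden sketch, not WarpHost's implementation:

```python
import subprocess

def parse_gpu_line(line):
    """Parse one CSV line like 'NVIDIA GeForce RTX 4090, 24564' into a dict."""
    name, mem_mib = (part.strip() for part in line.split(","))
    return {"name": name, "vram_gb": round(int(mem_mib) / 1024, 1)}

def detect_gpus():
    """List NVIDIA GPUs via nvidia-smi; returns [] when no driver is present."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            text=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [parse_gpu_line(line) for line in out.strip().splitlines() if line]
```

From a list like this, recommending models is a matter of filtering the catalog by each GPU's `vram_gb`.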

Management Dashboard

Clean web UI to monitor your system, manage models, and test with a built-in chat playground.

One-Click Model Management

Browse a curated catalog, pull models with one click, switch between them instantly.

Docker Native

Runs in Docker with NVIDIA Container Toolkit for GPU passthrough. Clean, isolated, easy to update.

100% Local & Private

No data leaves your network. No API keys. No cloud dependency. Your hardware, your models, your data.

Supported Models

18 curated models from 3B to 70B parameters, with approximate VRAM requirements. From laptop-friendly to datacenter-grade.

Llama 3.2 3B

4 GB

Meta's edge model. Runs on any GPU, or on CPU alone. Great starting point.

Qwen3 4B

4 GB

Punches way above its weight. Thinking modes, coding, multilingual.

Qwen3 8B

8 GB

Best all-rounder at 8B. The sweet spot for most hardware.

Gemma 3 12B

10 GB

Google's standout. 128K context and 140+ languages.

DeepSeek R1 32B

24 GB

Outperforms OpenAI o1-mini. Best reasoning you can run locally.

Qwen3 32B

24 GB

Top-ranked open-source model. Rivals models 10x its size.

Qwen2.5 Coder 32B

24 GB

Matches GPT-4o on coding benchmarks. State of the art for code.

Llama 3.3 70B

48 GB

Meta's best. Rivals Llama 3.1 405B. Top-tier local quality.

View all 18 models →
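The VRAM figures above roughly track a common rule of thumb for 4- to 5-bit quantized models: weights take about params × bits ÷ 8 bytes, plus headroom for the KV cache and runtime. A back-of-the-envelope estimator (the bit width and overhead factor are illustrative assumptions, and it only lands in the same ballpark as the catalog, not on the exact numbers):

```python
def estimate_vram_gb(params_billions, bits_per_weight=5, overhead=1.2):
    """Rough VRAM estimate: quantized weight size plus ~20% headroom.

    Illustrative rule of thumb only; real requirements vary with context
    length, quantization scheme, and serving stack.
    """
    weights_gb = params_billions * bits_per_weight / 8
    return round(weights_gb * overhead, 1)
```

For example, `estimate_vram_gb(32)` gives 24.0 GB, in line with the 32B entries above, while small models like the 3B carry proportionally more fixed overhead than this formula captures.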

Ready to get started?

WarpHost is free, open source, and ready to run on your hardware today.