
raketenkater/llm-server

by raketenkater · Quantization & Formats · updated 1d ago

Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuning (AI Tune), hardware-matched HuggingFace downloads, and crash recovery. An Ollama alternative for multi-GPU rigs.
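
As a rough illustration of the multi-GPU tensor-split mentioned above: llama.cpp's `--tensor-split` flag takes a comma-separated list of per-GPU proportions (for example `3,1`). The Go sketch below derives such a value from each GPU's free VRAM, reduced by the greatest common divisor. Weighting by free VRAM alone, and the helper name `tensorSplit`, are assumptions for illustration. This is not llm-server's actual placement logic.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gcd returns the greatest common divisor of a and b (Euclid's algorithm).
func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// tensorSplit is a hypothetical helper: it turns per-GPU free VRAM (MiB)
// into a --tensor-split value by dividing every entry by their common GCD.
// Assumption: the split is proportional to free VRAM only.
func tensorSplit(freeMiB []int) (string, error) {
	g := 0
	for _, m := range freeMiB {
		if m < 0 {
			return "", fmt.Errorf("negative VRAM value: %d", m)
		}
		g = gcd(g, m)
	}
	if g == 0 {
		return "", fmt.Errorf("no free VRAM reported")
	}
	parts := make([]string, len(freeMiB))
	for i, m := range freeMiB {
		parts[i] = strconv.Itoa(m / g)
	}
	return strings.Join(parts, ","), nil
}

func main() {
	// Example: a 24 GiB card next to an 8 GiB card.
	s, err := tensorSplit([]int{24576, 8192})
	if err != nil {
		panic(err)
	}
	fmt.Println("--tensor-split", s) // prints "--tensor-split 3,1"
}
```

A real launcher would also have to reserve room for the KV cache and compute buffers on each device, so a pure VRAM ratio is only a starting point.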

Momentum: 61 · Stars: 223 · Forks: 11 · Rank: #90
Topics: cuda · gguf · golang · inference-server · llama-cpp · llamacpp · llm · local-llm · localllama · metal · moe · multi-gpu