The Local LLM Index / Quantization & Formats / #90
raketenkater/llm-server
by raketenkater · Quantization & Formats · updated 1d ago
Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuning (AI Tune), hardware-matched HuggingFace downloads, and crash recovery. An Ollama alternative for multi-GPU rigs.
61
momentum
223
stars
11
forks
#90
rank
cudaggufgolanginference-serverllama-cppllamacppllmlocal-llmlocalllamametalmoemulti-gpu
View on GitHub →