Michael-A-Kuykendall/shimmy
by Michael-A-Kuykendall · Quantization & Formats · updated 1d ago
⚡ Pure-Rust WebGPU inference engine — OpenAI-API compatible, GGUF native, runs on any GPU. No Python. No llama.cpp. Single binary.
74 momentum · 5,417 stars · 515 forks · rank #14
api-server · command-line-tool · developer-tools · gguf · huggingface · huggingface-models · huggingface-transformers · inference-server · llama · llamacpp · llm-inference · local-ai