
Michael-A-Kuykendall/shimmy

by Michael-A-Kuykendall · Quantization & Formats · updated 3d ago

⚡ Pure-Rust WebGPU inference engine — OpenAI-API compatible, GGUF native, runs on any GPU. No Python. No llama.cpp. Single binary.

5,715 stars · 549 forks · rank #13

api-server · command-line-tool · developer-tools · gguf · huggingface · huggingface-models · huggingface-transformers · inference-server · llama · llamacpp · llm-inference · local-ai
