The Local LLM Index / Quantization & Formats / #136

SharpAI/SwiftLM

by SharpAI · Quantization & Formats · updated 2mo ago

⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, MACOS + iOS iPhone app.

momentum

727

stars

forks

#136

rank

apple-siliinferenceiosllmmetalmlxmoeon-device-aiopenai-apiswift

View on GitHub →

SharpAI/SwiftLM

More in Quantization & Formats