ferrumox/fox

by ferrumox · Inference Engines · updated 7d ago

High-performance LLM inference engine: a drop-in replacement for Ollama with faster multi-turn inference, lower time to first token (TTFT), and higher throughput through prefix caching and continuous batching.
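To illustrate why prefix caching can help multi-turn inference, here is a minimal Python sketch of the general idea. It is not fox's implementation: the `PrefixCache` class, its methods, and the token ids are all hypothetical, and a real engine caches attention key/value state rather than a placeholder integer. The sketch only counts how many prompt tokens would need fresh computation when a later turn extends an earlier one.

```python
from typing import Dict, List, Tuple


class PrefixCache:
    """Toy prefix cache (hypothetical): tracks which token prefixes were processed."""

    def __init__(self) -> None:
        # Maps a token prefix (as a tuple) to a placeholder for its cached state.
        self._store: Dict[Tuple[int, ...], int] = {}

    def longest_cached_prefix(self, tokens: List[int]) -> int:
        """Return the length of the longest prefix of `tokens` already cached."""
        for end in range(len(tokens), 0, -1):
            if tuple(tokens[:end]) in self._store:
                return end
        return 0

    def process(self, tokens: List[int]) -> int:
        """Simulate prompt processing; return how many tokens were computed fresh."""
        reused = self.longest_cached_prefix(tokens)
        # Cache every new prefix so later, longer prompts can reuse it.
        for end in range(reused + 1, len(tokens) + 1):
            self._store[tuple(tokens[:end])] = end
        return len(tokens) - reused


cache = PrefixCache()
turn_1 = [1, 2, 3, 4]                # hypothetical token ids for the first prompt
turn_2 = turn_1 + [5, 6]             # second turn extends the same conversation
first_cost = cache.process(turn_1)   # nothing cached yet, so all tokens are computed
second_cost = cache.process(turn_2)  # only the new suffix is computed
print(first_cost, second_cost)
```

In a chat workload each turn resends the whole conversation so far, so reusing the shared prefix means only the newest tokens are processed before generation starts. That is the usual reason prefix caching is associated with lower TTFT on follow-up turns.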

170 stars · rank #116
