psmarter/mini-infer
by psmarter · Quantization & Formats · updated 1mo ago
LLM inference engine from scratch — paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, OpenAI-compatible serving
Momentum: 50 · Stars: 270 · Forks: 16 · Rank: #131
Tags: continuous-batching · cuda · inference · inference-engine · kv-cache · language-model · llm · machine-learning · moe · pagedattention · pytorch · quantization