SqueezeAILab/KVQuant

by SqueezeAILab · Quantization & Formats · updated 1y ago

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Momentum: 30 · Stars: 427 · Forks: 46 · Rank: #196
Tags: compression, efficient-inference, efficient-model, large-language-models, llama, llm, localllama, localllm, mistral, model-compression, natural-language-processing, quantization