SqueezeAILab/KVQuant

by SqueezeAILab · Quantization & Formats · updated 1y ago

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

431 stars · rank #203

Tags: compression, efficient-inference, efficient-model, large-language-models, llama, llm, localllama, localllm, mistral, model-compression, natural-language-processing, quantization
