
ModelCloud/GPTQModel

by ModelCloud · Quantization & Formats · updated today

LLM quantization (compression) toolkit with hardware acceleration support for Nvidia, AMD, and Intel GPUs and for Intel, AMD, and Apple CPUs, via Hugging Face, vLLM, and SGLang.

Momentum: 66 · Stars: 1,177 · Forks: 187 · Rank: #59
gptq · optimum · peft · quantization · sglang · transformers · vllm