intel/auto-round
by intel · Quantization & Formats · updated today
A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
68 momentum · 1,451 stars · 140 forks · rank #45
Tags: diffusers · gguf · int4 · llms · mxfp4 · nvfp4 · omni · quantization · rounding · sglang · transformers · vllm