intel/auto-round

by intel · Quantization & Formats · updated today

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

68 momentum · 1,451 stars · 140 forks · rank #45
diffusers · gguf · int4 · llms · mxfp4 · nvfp4 · omni · quantization · rounding · sglang · transformers · vllm