intel/auto-round

by intel · Quantization & Formats · updated today

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

momentum

1,542 stars · 156 forks · rank #43

diffusers · gguf · int4 · llms · mxfp4 · nvfp4 · omni · quantization · rounding · sglang · transformers · vllm
