jmaczan/tiny-vllm

by jmaczan · Inference Engines · updated 2mo ago

Build your own high-performance LLM inference engine in C++ and CUDA - a smaller version of vLLM

momentum: 54 · stars: 792 · forks: 51 · rank: #118
ai · attention · batching · course · cpp · cuda · hpc · inference · llm · llm-inference · pagedattention · tiny-vllm