The Local LLM Index / Inference Engines / #118
jmaczan/tiny-vllm
by jmaczan · Inference Engines · updated 2mo ago
Build your own high-performance LLM inference engine in C++ and CUDA: a smaller version of vLLM.
54 momentum · 792 stars · 51 forks · #118 rank
ai · attention · batching · course · cpp · cuda · hpc · inference · llm · llm-inference · pagedattention · tiny-vllm