jmaczan/tiny-vllm

by jmaczan · Inference Engines · updated 25d ago

Build your own high-performance LLM inference engine in C++ and CUDA: a smaller version of vLLM.

958 stars · rank #90

Tags: ai, attention, batching, course, cpp, cuda, hpc, inference, llm, llm-inference, pagedattention, tiny-vllm
