xaskasdf/ntransformer
by xaskasdf · Inference Engines · updated 3mo ago
High-efficiency LLM inference engine written in C++/CUDA. Runs Llama 70B on an RTX 3090.
43 momentum · 461 stars · 20 forks · rank #147