The Local LLM Index

AI-Hypercomputer/JetStream

by AI-Hypercomputer · Inference Engines · updated 5mo ago

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPU support is planned for the future; PRs are welcome).

34 momentum · 445 stars · 66 forks · #182 rank
gemma · gpt · gpu · inference · jax · large-language-models · llama · llama2 · llm · llm-inference · llmops · mlops