GPULlama3: GPU-accelerated Llama3.java inference in pure Java using TornadoVM.

(May 2025 - now)

GPULlama3 is a GPU-accelerated Llama3 inference engine implemented in pure Java using TornadoVM.

The models are written in plain Java, and TornadoVM automatically accelerates them on GPUs. The engine currently supports Llama3, Mistral, Qwen2.5, Qwen3, Phi3, and IBM Granite models in the GGUF format.

The project's code is open source.

Thanos Stratikopoulos