High-performance LLM inference engine in Go, optimized with SIMD (AVX2/AVX512).

Note: this is an experimental project and is no longer maintained. The implementation and algorithms are heavily inspired by llama.cpp and vLLM.
Build with Makefile:

```sh
make build
```

This produces binaries in `bin/`: `makarna`, `quantize`, and `convert`.
Build with CUDA:

```sh
make build-cuda
```

This produces `bin/makarna-cuda`.
Alternatively, use `go install`:

```sh
go install ./cmd/...
```

Convert HuggingFace models (`.safetensors`) to the `.mak` format:
```sh
convert <hf_dir> <output.mak> [flags]
```

Flags:

- `--quant <type>`: quantization type: `q2_k`, `q3_k`, `q4_k`, `q5_k`, `q6_k`, `q8_k`.
- `--mix`: enable smart mix quantization.
- `--workers <n>`: number of parallel workers.
- `--max-inflight-mb <n>`: memory limit (in MB) during conversion.
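As a sketch of how the documented flags combine for a large checkpoint (the model path and values below are hypothetical):

```sh
# Convert with 8 parallel workers, capping in-flight memory at 4096 MB,
# quantizing to 4-bit K-quants with smart mix enabled.
# The input path is a placeholder; point it at a real HuggingFace directory.
convert /models/Llama-3-8B model-q4k.mak --quant q4_k --mix --workers 8 --max-inflight-mb 4096
```

Lowering `--max-inflight-mb` trades conversion speed for a smaller peak memory footprint.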
Quantize an existing .mak file to a K-quant format.
```sh
quantize <input.mak> <output.mak> <type> [flags]
```

Flags:

- `--mix`: enable smart mix mode.
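For example, re-quantizing an 8-bit model down to 4-bit with smart mix (filenames are hypothetical):

```sh
# Requantize an existing .mak file from q8_k to q4_k.
quantize model-q8k.mak model-q4k.mak q4_k --mix
```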
Inference CLI.
```sh
run-model -model <file.mak> -prompt "text" [flags]
```

Common flags:

- `-steps <n>`: max tokens to generate (default 10).
- `-temp <f>`: temperature (default 0.7).
- `-top-k <n>`: top-K sampling (default 40).
- `-top-p <f>`: top-P sampling (default 0.9).
- `-rep-penalty <f>`: repetition penalty (default 1.1).
- `-chat`: use chat formatting.
- `-threads <n>`: CPU threads (`-1` = 90% of cores).
- `-n-gpu-layers <n>`: layers to offload to GPU (`-1` = auto).
- `-gpu-budget <f>`: GPU memory fraction (0.0-1.0).
- `-mmap`: use mmap for weights.
- `-profile-log <val>`: profile output (`true`, `report`, or ).
- `-listen <addr>`: start an OpenAI-compatible server on `<addr>`.
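The flags above can be combined for GPU-offloaded chat generation; a sketch (model filename hypothetical, flags as documented):

```sh
# Chat-formatted generation with conservative sampling,
# automatic GPU layer offload, and mmap'd weights.
run-model -model model-q4k.mak -chat \
  -prompt "Write a haiku about Go" \
  -steps 64 -temp 0.2 -top-k 20 \
  -n-gpu-layers -1 -gpu-budget 0.9 -mmap
```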
Dedicated OpenAI-compatible API server.
```sh
openai -model <file.mak> [flags]
```

Flags:

- `-listen <addr>`: listen address (default `:8080`).
- `-max-seq-len <n>`: max context length.
- `-n-gpu-layers <n>`: number of GPU layers.
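Assuming the server exposes the standard OpenAI chat completions route (`/v1/chat/completions` is inferred from "OpenAI-compatible", not confirmed by this document), a request might look like:

```sh
# Query a locally running server on the default :8080 address.
# Model name and route are assumptions; check the server's actual routes.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "model-q4k",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```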
MAK v2 supports K-quants (block size 256):
- `q8_k`: 8-bit.
- `q6_k`: 6-bit.
- `q5_k`: 5-bit.
- `q4_k`: 4-bit (recommended).
- `q3_k`: 3-bit.
- `q2_k`: 2-bit.
Convert and quantize:
```sh
convert /models/Qwen3-1.7B-Instruct model-q4k.mak --quant q4_k --mix
```

Run inference:

```sh
run-model -model model-q4k.mak -prompt "Explaining quantum physics" -steps 100
```

Start the API server:

```sh
run-model -model model-q4k.mak -listen :8080 -chat
```

Tests:
```sh
go test ./...
go test -tags cuda ./...  # requires a GPU
```

Benchmarks:

```sh
go test -bench=. ./pkg/tensor/...
```