cturan/makarna

Experimental Project (Archived)

This is an experimental project and is no longer maintained.

The implementation and algorithms are heavily inspired by llama.cpp and vLLM.

Makarna Engine

High-performance LLM inference engine in Go, optimized with SIMD (AVX2/AVX512).

Installation

Build with Makefile:

make build

This produces binaries in bin/: makarna, quantize, convert.

Build with CUDA:

make build-cuda

Produces bin/makarna-cuda.

Alternatively, use Go install:

go install ./cmd/...

Commands

convert

Convert HuggingFace models (.safetensors) to .mak format.

convert <hf_dir> <output.mak> [flags]

Flags:

  • --quant <type> Options: q2_k, q3_k, q4_k, q5_k, q6_k, q8_k.
  • --mix Enable smart mix quantization.
  • --workers <n> Number of parallel workers.
  • --max-inflight-mb <n> Memory limit during conversion.

quantize

Quantize an existing .mak file to a K-quant format.

quantize <input.mak> <output.mak> <type> [flags]

Flags:

  • --mix Enable smart mix mode.

run-model

Inference CLI.

run-model -model <file.mak> -prompt "text" [flags]

Common Flags:

  • -steps <n> Max tokens (default 10).
  • -temp <f> Temperature (default 0.7).
  • -top-k <n> Top-K (default 40).
  • -top-p <f> Top-P (default 0.9).
  • -rep-penalty <f> Repetition penalty (default 1.1).
  • -chat Use chat formatting.
  • -threads <n> CPU threads (-1 = 90% of cores).
  • -n-gpu-layers <n> Layers to offload to GPU (-1=auto).
  • -gpu-budget <f> GPU memory fraction (0.0-1.0).
  • -mmap Use mmap for weights.
  • -profile-log <val> Profile output (true, report, or …).
  • -listen <addr> Start an OpenAI-compatible server on the given address.
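Taken together, -temp, -top-k, and -top-p implement standard LLM sampling. As a rough illustration of how temperature scaling and top-k filtering compose (this is not the engine's actual sampler; all names are made up, and top-p and repetition penalty are omitted):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sampleProbs applies temperature scaling and top-k filtering to raw
// logits and returns a normalized probability distribution over the
// surviving token indices. Illustrative only.
func sampleProbs(logits []float64, temp float64, topK int) map[int]float64 {
	type scored struct {
		idx   int
		logit float64
	}
	s := make([]scored, len(logits))
	for i, l := range logits {
		s[i] = scored{i, l / temp} // temperature scaling
	}
	sort.Slice(s, func(a, b int) bool { return s[a].logit > s[b].logit })
	if topK < len(s) {
		s = s[:topK] // keep only the k most likely tokens
	}
	// softmax over the survivors
	var sum float64
	for _, t := range s {
		sum += math.Exp(t.logit)
	}
	probs := make(map[int]float64, len(s))
	for _, t := range s {
		probs[t.idx] = math.Exp(t.logit) / sum
	}
	return probs
}

func main() {
	// With -temp 0.7 -top-k 2, only the two strongest logits survive.
	probs := sampleProbs([]float64{2.0, 1.0, 0.5, -1.0}, 0.7, 2)
	fmt.Printf("%d tokens survive; p(token 0) = %.3f\n", len(probs), probs[0])
}
```

Lower -temp sharpens the distribution before the cutoff, which is why the two flags interact.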

openai

Dedicated OpenAI-compatible API server.

openai -model <file.mak> [flags]

Flags:

  • -listen <addr> Default is :8080.
  • -max-seq-len <n> Max context length.
  • -n-gpu-layers <n> Number of GPU layers.

Quantization Types

MAK v2 supports K-quants (block size 256):

  • q8_k: 8-bit.
  • q6_k: 6-bit.
  • q5_k: 5-bit.
  • q4_k: 4-bit (recommended).
  • q3_k: 3-bit.
  • q2_k: 2-bit.
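The general shape of a 256-value quantization block can be sketched as follows. This is only an illustration of block quantization with one shared scale, assuming a plain symmetric 8-bit scheme; the actual .mak q8_k layout (sub-block scales, mins, packing) may differ:

```go
package main

import (
	"fmt"
	"math"
)

const blockSize = 256 // K-quants operate on blocks of 256 values

// q8Block holds one quantized block: a single float32 scale plus 256
// signed 8-bit values. Illustrative layout, not the on-disk format.
type q8Block struct {
	scale float32
	data  [blockSize]int8
}

// quantizeQ8 maps 256 floats to int8 using one shared scale derived
// from the block's largest absolute value.
func quantizeQ8(src [blockSize]float32) q8Block {
	var amax float32
	for _, v := range src {
		if a := float32(math.Abs(float64(v))); a > amax {
			amax = a
		}
	}
	b := q8Block{scale: amax / 127}
	if b.scale == 0 {
		return b // all-zero block
	}
	for i, v := range src {
		b.data[i] = int8(math.Round(float64(v / b.scale)))
	}
	return b
}

// dequantize reverses the mapping (lossily).
func (b q8Block) dequantize(i int) float32 {
	return float32(b.data[i]) * b.scale
}

func main() {
	var src [blockSize]float32
	for i := range src {
		src[i] = float32(i) / 100 // 0.00 .. 2.55
	}
	q := quantizeQ8(src)
	fmt.Printf("scale=%.5f round-trip err=%.5f\n",
		q.scale, src[100]-q.dequantize(100))
}
```

Fewer bits per value (q6_k down to q2_k) shrink the file further at the cost of larger round-trip error, which is the trade-off behind the "recommended" q4_k default.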

Examples

Convert and quantize:

convert /models/Qwen3-1.7B-Instruct model-q4k.mak --quant q4_k --mix

Run inference:

run-model -model model-q4k.mak -prompt "Explain quantum physics" -steps 100

Start API server:

run-model -model model-q4k.mak -listen :8080 -chat
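Since the server is OpenAI-compatible, any OpenAI-style client should work against it. A minimal Go client sketch (the /v1/chat/completions path follows the OpenAI API convention; the model name and port here are assumptions, not confirmed from this repo):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest mirror the minimal OpenAI
// chat-completions request payload.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// buildChatRequest marshals a single-turn chat request body.
func buildChatRequest(model, prompt string) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
}

func main() {
	body, err := buildChatRequest("model-q4k", "Explain quantum physics")
	if err != nil {
		panic(err)
	}
	// POST to the server started above; fails harmlessly if it isn't running.
	resp, err := http.Post("http://localhost:8080/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("server not reachable:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```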

Development

Tests:

go test ./...
go test -tags cuda ./... # Requires GPU

Benchmarks:

go test -bench=. ./pkg/tensor/...
