Community recipes for serving LLMs on RTX 3090. Multi-engine (vLLM, llama.cpp, SGLang) and model-agnostic. Currently shipping Qwen3.6-27B configs for 1× and 2× cards.
Orchestrate an entire AI dev team on as little as 5GB VRAM. An AI coding agent built like a systems engineer. Ephemeral context, zero token bloat, exact-match diffs. Stop wasting money on 10k token…
A pi extension that replaces the default footer with a live observability bar and provides a full dashboard command.
Skills for Real Engineers. Straight from my .claude directory.
🪨 why use many token when few token do trick — Claude Code skill that cuts 65% of tokens by talking like caveman
An experiment: what if Gemma had a desktop app tuned for the model and for offline scenarios?
Autonomous experiment loop extension for pi
An open-source alternative to Claude Cowork (powered by opencode)
Search-based optimizer for MLX/Metal on Apple Silicon.
vLLM Metal plugin powered by mlx-swift — high-performance LLM inference on Apple Silicon
Community-maintained hardware plugin for vLLM on Apple Silicon
First public benchmark of llama.cpp speculative decoding on Qwen3.6-35B-A3B with a single RTX 3090 (post PR #19493 merge, 2026-04-19). 19 configurations covering ngram-cache, ngram-mod, and classic…
The original local LLM interface. Text, vision, tool-calling, training. UI + API, 100% offline and private.
A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
A native macOS app to manage skills across coding agents — Claude Code, Cursor, Copilot CLI, Codex, Gemini CLI
Pi extension for async subagent delegation with truncation, artifacts, and session sharing
🤗 ml-intern: an open-source ML engineer that reads papers, trains models, and ships ML models
A curated list of autonomous improvement loops, research agents, and autoresearch-style systems inspired by Karpathy's autoresearch.
Fast GPU OCR server. 270 img/s on FUNSD. TensorRT FP16, PP-OCRv5, HTTP + gRPC.
I set out to implement TurboQuant (PolarQuant + QJL) for Gemma 4 31B's KV cache — a 31 billion parameter model running on a single Mac. It doesn't work on this model. What I built instead is faster.
A novel Metal kernel that implements TurboQuant so that Llama 3.1 70B can run on a consumer MacBook
a recursive self-improving harness designed to help your agents (and future iterations of those agents) succeed on any task
control your applications using pi-coding-agent. fully invisible.
Local AI runtime for training & running small LLMs directly on Apple Neural Engine (ANE). No CoreML. No Metal. Offline, on-device fine-tuning & inference on M-series silicon.
AI You Control: Choose your models. Own your data. Eliminate vendor lock-in.
Ask the oracle when you're stuck. Invoke GPT-5 Pro with a custom context and files.





