ichim-david
  • Romania

Sponsors

@sourcegraph-community

Sponsoring

@plone

Organizations

@plone @eaudeweb @collective @eea


Community recipes for serving LLMs on RTX 3090. Multi-engine (vLLM, llama.cpp, SGLang) and model-agnostic. Currently shipping Qwen3.6-27B configs for 1× and 2× cards.

Shell 285 12 Updated May 2, 2026

Minimal CLI coding agent by Mistral

Python 4,067 470 Updated Apr 30, 2026

Orchestrate an entire AI dev team on as little as 5GB VRAM. An AI coding agent built like a systems engineer. Ephemeral context, zero token bloat, exact-match diffs. Stop wasting money on 10k token…

Go 253 21 Updated May 1, 2026

A pi extension that replaces the default footer with a live observability bar and provides a full dashboard command.

TypeScript 7 Updated Apr 29, 2026

Skills for Real Engineers. Straight from my .claude directory.

Shell 52,871 4,438 Updated Apr 30, 2026

🪨 why use many token when few token do trick — Claude Code skill that cuts 65% of tokens by talking like caveman

Python 52,079 2,791 Updated May 1, 2026

An experiment: what if Gemma had a Desktop app tuned for the model and offline scenarios?

TypeScript 62 5 Updated May 1, 2026

Autonomous experiment loop extension for pi

TypeScript 6,335 366 Updated Apr 29, 2026

An open-source alternative to Claude Cowork (powered by opencode)

TypeScript 14,595 1,425 Updated May 1, 2026

Python 3 Updated Apr 30, 2026

Search-based optimizer for MLX/Metal on Apple Silicon.

Python 8 Updated Apr 30, 2026

vLLM Metal plugin powered by mlx-swift — high-performance LLM inference on Apple Silicon

Python 234 15 Updated May 2, 2026

Community maintained hardware plugin for vLLM on Apple Silicon

Python 1,059 116 Updated May 2, 2026

First public benchmark of llama.cpp speculative decoding on Qwen3.6-35B-A3B with a single RTX 3090 (post PR #19493 merge, 2026-04-19). 19 configurations covering ngram-cache, ngram-mod, and classic…

Python 16 Updated Apr 26, 2026

The original local LLM interface. Text, vision, tool-calling, training. UI + API, 100% offline and private.

Python 46,915 5,970 Updated Apr 27, 2026

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

Python 1,229 124 Updated Apr 30, 2026

A native macOS app to manage skills across coding agents — Claude Code, Cursor, Copilot CLI, Codex, Gemini CLI

Swift 138 13 Updated Apr 27, 2026

Pi extension for async subagent delegation with truncation, artifacts, and session sharing

TypeScript 1,127 149 Updated May 2, 2026

🤗 ml-intern: an open-source ML engineer that reads papers, trains models, and ships ML models

Python 7,948 768 Updated May 1, 2026

A curated list of autonomous improvement loops, research agents, and autoresearch-style systems inspired by Karpathy's autoresearch.

1,685 127 Updated Apr 24, 2026

Fast GPU OCR server. 270 img/s on FUNSD. TensorRT FP16, PP-OCRv5, HTTP + gRPC.

C++ 255 25 Updated May 1, 2026

I set out to implement TurboQuant (PolarQuant + QJL) for Gemma 4 31B's KV cache — a 31 billion parameter model running on a single Mac. It doesn't work on this model. What I built instead is faster.

C++ 5 Updated Apr 13, 2026

A novel Metal kernel that implements TurboQuant so that Llama 3.1 70B can run on a consumer MacBook

C++ 8 Updated Apr 14, 2026

a recursive self-improving harness designed to help your agents (and future iterations of those agents) succeed on any task

Python 953 69 Updated May 2, 2026

control your applications using pi-coding-agent. fully invisible.

TypeScript 490 39 Updated Apr 28, 2026

Local AI runtime for training & running small LLMs directly on Apple Neural Engine (ANE). No CoreML. No Metal. Offline, on-device fine-tuning & inference on M-series silicon.

Objective-C 85 6 Updated Mar 6, 2026

AI You Control: Choose your models. Own your data. Eliminate vendor lock-in.

TypeScript 4,449 295 Updated May 1, 2026

Ask the oracle when you're stuck. Invoke GPT-5 Pro with a custom context and files.

TypeScript 2,166 208 Updated May 1, 2026

vMLX Swift Engine.

Swift 2 3 Updated May 1, 2026