Stars
A framework for efficient model inference with omni-modality models
Zotero MCP: Connects your Zotero research library with Claude and other AI assistants via the Model Context Protocol to discuss papers, get summaries, analyze citations, and more.
The awesome collection of OpenClaw skills. 5,400+ skills filtered and categorized from the official OpenClaw Skills Registry.🦞
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashInfer: Kernel Library for LLM Serving
Official inference repo for FLUX.2 models
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
Ongoing research training transformer models at scale
A curated list of research papers, resources, and advancements on Diffusion Cache and related efficient diffusion model acceleration techniques.
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
A curated list of materials on AI efficiency
ValueCell is a community-driven, multi-agent platform for financial applications.
[ICCV 2025] CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high …
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
How to optimize some algorithms in CUDA.
Official inference repo for FLUX.1 models
📝 A simple and elegant Markdown editor, available for Linux, macOS and Windows.