Stars
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
Minimalistic large language model 3D-parallelism training
A framework for few-shot evaluation of language models.
A lightweight, standalone C++ inference engine for Google's Gemma models.
PyTorch native quantization and sparsity for training and inference
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Python toolbox for optimization on Riemannian manifolds with support for automatic differentiation
Train transformer language models with reinforcement learning.
Fast and memory-efficient exact attention
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit and 4-bit floating-point (FP8 and FP4) precision on Hopper, Ada, and Blackwell GPUs, to provide better performance…
PyTorch extensions for high performance and large scale training.
Ongoing research training transformer models at scale
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
A fast, clean, responsive Hugo theme.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Hugging Face-compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf), including parallel, recurrent, and chunkwise forward.
Generative Models by Stability AI
Accessible large language models via k-bit quantization for PyTorch.
Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models".
Open source code for paper "On the Learning and Learnability of Quasimetrics".

