- Vancouver, BC, Canada
- rwightman.com
- @wightmanr
Stars
HeadTTS: Free neural text-to-speech (Kokoro) with timestamps and visemes for lip-sync. Runs in-browser (WebGPU/WASM) or on local Node.js WebSocket/REST server (CPU).
Talking Head (3D): A JavaScript class for real-time lip-sync using full-body 3D avatars.
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
Copper is an operating system for robots - build, run, and replay your entire robot deterministically.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
An implementation of PSGD Kron in JAX for distributed training in JAX or Flax
PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioners, low-rank approximation preconditioner, and more)
An implementation of PSGD Kron second-order optimizer for PyTorch
Janus-Series: Unified Multimodal Understanding and Generation Models
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, an…
[TIP2024] MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers
[ICLR 2026] When it comes to optimizers, it's always better to be safe than sorry
Experimental CUDA kernel framework unifying typed dimensions, NVRTC JIT specialization, and ML‑guided tuning.
Supercharge Your PyTorch Image Models: Bag of Tricks to 8x Faster Inference with ONNX Runtime & Optimizations
Scenic: A JAX Library for Computer Vision Research and Beyond
A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor.
Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
Robot Utility Models are trained on a diverse set of environments and objects, and then can be deployed in novel environments with novel objects without any further data or training.
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
MambaOut: Do We Really Need Mamba for Vision? (CVPR 2025)
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
A PyTorch native platform for training generative AI models