- Santa Clara
- https://www.linkedin.com/in/rdspring1
- @ryanspring13
-
cutlass Public
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
C++ Other Updated Dec 8, 2025 -
lightning-thunder Public
Forked from Lightning-AI/lightning-thunder
Source to source compiler for PyTorch. It makes PyTorch programs faster on single accelerators and distributed.
Python Apache License 2.0 Updated Dec 6, 2025 -
NvFuser Public
Forked from NVIDIA/Fuser
A Fusion Code Generator for NVIDIA GPUs
C++ Other Updated Oct 15, 2025 -
vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python Apache License 2.0 Updated Jul 21, 2025 -
pytorch Public
Forked from pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Python Other Updated Jul 10, 2025 -
AITemplate Public
Forked from facebookincubator/AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Python Apache License 2.0 Updated Jul 27, 2023 -
vector-search-class-notes Public
Forked from edoliberty/vector-search-class-notes
Class notes for the course "Long Term Memory in AI - Vector Search and Databases" COS 495 @ Princeton Fall 2023
TeX MIT License Updated Jun 14, 2023 -
Auto-GPT Public
Forked from Significant-Gravitas/AutoGPT
An experimental open-source attempt to make GPT-4 fully autonomous.
Python MIT License Updated Apr 3, 2023 -
twitter-algorithm-ml Public
Forked from twitter/the-algorithm-ml
Source code for Twitter's Recommendation Algorithm
Python GNU Affero General Public License v3.0 Updated Apr 1, 2023 -
nvprims-torchdynamo Public
Forked from pytorch/torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
Python BSD 3-Clause "New" or "Revised" License Updated Nov 9, 2022 -
Autodiff-Puzzles Public
Forked from srush/Autodiff-Puzzles
Jupyter Notebook MIT License Updated Oct 31, 2022 -
Autopilot-TensorFlow Public
Forked from SullyChen/Autopilot-TensorFlow
A TensorFlow implementation of this Nvidia paper: https://arxiv.org/pdf/1604.07316.pdf with some changes
Jupyter Notebook MIT License Updated Oct 10, 2022 -
tutel Public
Forked from microsoft/Tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
Python MIT License Updated Sep 3, 2022 -
micrograd Public
Forked from karpathy/micrograd
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
Jupyter Notebook MIT License Updated Aug 29, 2022 -
minGPT Public
Forked from karpathy/minGPT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Python MIT License Updated Aug 5, 2022 -
Optimizing-SGEMM-on-NVIDIA-Turing-GPUs Public
Forked from yzhaiustc/Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
Python GNU General Public License v3.0 Updated Jun 18, 2022 -
RzLinear Public
Forked from apd10/RzLinear
A compressed alternative to matrix multiplication using state-of-the-art compression ROBE-Z
-
LSH_DeepLearning Public
Scalable and Sustainable Deep Learning via Randomized Hashing
-
cuda-training-series Public
Forked from olcf/cuda-training-series
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
Cuda Updated Apr 4, 2022 -
Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F Public
Forked from yzhaiustc/Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading.
C GNU General Public License v3.0 Updated Feb 3, 2022 -
mongoose Public
Forked from HazyResearch/mongoose
A Learnable LSH Framework for Efficient NN Training
Python MIT License Updated Jul 22, 2021 -
Optimizing-DGEMV-on-Intel-CPUs Public
Forked from yzhaiustc/Optimizing-DGEMV-on-Intel-CPUs
Highly optimized DGEMV on CPU with both serial and parallel performance better than MKL and OpenBLAS.
C GNU General Public License v3.0 Updated May 24, 2021 -
cs231n Public
Forked from AutomanHan/standford-cs231n-2018
Solutions to Stanford CS231n Spring 2018 Course Assignments.
Jupyter Notebook Updated Nov 18, 2020 -
Count-Sketch-Optimizers Public
A compressed adaptive optimizer for training large-scale deep learning models using PyTorch
-
MISSION Public
MISSION: Ultra Large-Scale Feature Selection using Count-Sketches
-
PyTorch_GBW_LM Public
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset
-
LSH-Mutual-Information Public
Use LSH Sampling for Mutual Information Estimation