-
transformer-llm-LeetCUDA Public
Forked from xlite-dev/LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥
Cuda GNU General Public License v3.0 Updated May 28, 2025 -
cccl-nvidia-logical-algorithm Public
Forked from NVIDIA/cccl
CUDA Core Compute Libraries
C++ Other Updated May 14, 2025 -
rvv-examples Public
Forked from nibrunie/rvv-examples
Example of RISC-V Vector programming
C Updated May 11, 2025 -
sparse_conv_triton Public
Forked from l1351868270/ld_triton
triton ops
Python MIT License Updated Apr 29, 2025 -
mlir-gpu-gpu-offload Public
Forked from sdiehl/gpu-offload
Compile MLIR to PTX and execute it on NVIDIA GPUs
Jupyter Notebook MIT License Updated Apr 16, 2025 -
bevGaussianLSS Public
Forked from HCIS-Lab/GaussianLSS
Official PyTorch implementation of "GaussianLSS - Toward Real-world BEV Perception via Depth Uncertainty Estimation using Gaussian Splatting" (CVPR 2025).
Jupyter Notebook MIT License Updated Apr 3, 2025 -
hvx_hexagon_examples Public
Forked from Geeloon/hexagon_examples
Some Hexagon intrinsic examples based on Qualcomm Hexagon
Updated Mar 7, 2025 -
FlagGems-torch-triton-mlir Public
Forked from flagos-ai/FlagGems
FlagGems is an operator library for large language models implemented in Triton Language.
Python Apache License 2.0 Updated Feb 27, 2025 -
sort-gpu-layer-GPUSorting Public
Forked from b0nes164/GPUSorting
State of the art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
Cuda Other Updated Dec 14, 2024 -
Awesome-Autonomous-Driving Public
Forked from autodriving-heart/Awesome-Autonomous-Driving
awesome-autonomous-driving-bev-occ-lidar
Updated Aug 19, 2024 -
awesome-cuda-and-hpc Public
Forked from coderonion/awesome-cuda-and-hpc
hpc🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.
Updated Jul 16, 2024 -
Polygeist Public
Forked from llvm/Polygeist
mlir add new dialect according to paper optimization. C/C++ frontend for MLIR. Also features polyhedral optimizations, parallel optimizations, and more!
C++ Other Updated Jul 3, 2024 -
hackhackawesome-web-hacking Public
Forked from infoslack/awesome-web-hacking
A list of web application security
MIT License Updated Jun 2, 2024 -
hackhackAwesome-Hacking Public
Forked from Hack-with-Github/Awesome-Hacking
A collection of various awesome lists for hackers, pentesters and security researchers
Creative Commons Zero v1.0 Universal Updated May 26, 2024 -
AISystem Public
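Forked from Infrasys-AI/AISystem
AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks
Jupyter Notebook Apache License 2.0 Updated May 21, 2024 -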
ao Public
Forked from pytorch/ao
Native PyTorch library for quantization and sparsity
Python BSD 3-Clause "New" or "Revised" License Updated May 16, 2024 -
scalehls_mlir_papers Public
Forked from UIUC-ChenLab/scalehls
mlir_papers A scalable High-Level Synthesis framework on MLIR
MLIR Other Updated May 15, 2024 -
only_train_once Public
Forked from tianyic/only_train_once_personal_footprint
OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM
Python MIT License Updated May 6, 2024 -
Cute-Gemm-Optimization Public
Forked from DD-DuDa/Cute-Learning
Jupyter Notebook MIT License Updated May 5, 2024 -
Occupancy-research-paper-benchmark Public
Forked from keithAND2020/awesome-Occupancy-research
Papers on occupancy, including monocular and multi-view in autonomous driving scenarios
Updated Apr 24, 2024 -
resource-stream Public
Forked from gpu-mode/resource-stream
CUDA related news and material links
MIT License Updated Apr 17, 2024 -
llm.c Public
Forked from karpathy/llm.c
LLM training in simple, raw C/CUDA
Cuda MIT License Updated Apr 14, 2024 -
SHARK-Turbine Public
Forked from nod-ai/AMD-SHARK-ModelDev
Unified compiler/runtime for interfacing with PyTorch Dynamo.
Python Apache License 2.0 Updated Apr 11, 2024 -
google-research Public
Forked from google-research/google-research
Google Research
Jupyter Notebook Apache License 2.0 Updated Apr 5, 2024 -
torch-xla-SPMD Public
Forked from HeegyuKim/torch-xla-SPMD
PyTorch/XLA SPMD test code on Google TPU
Python MIT License Updated Apr 3, 2024 -
gpu-toolkit Public
Forked from nasa03/CUDA-Hackers-Toolkit
🦚 🧰 Collection of basic GPU algorithms implemented in CUDA C++.
Cuda MIT License Updated Mar 30, 2024 -
cuda-beginner-course-cpp-version Public
Forked from coderonion/cuda-beginner-course-cpp-version
Companion code for the bilibili video series "CUDA 12.1 Parallel Programming for Beginners (C++ Edition)"
Cuda MIT License Updated Mar 24, 2024 -
riscv-v-spec Public
Forked from riscvarchive/riscv-v-spec
Working draft of the proposed RISC-V V vector extension
Assembly Creative Commons Attribution 4.0 International Updated Mar 17, 2024 -