Stars
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Development repository for the Triton language and compiler
Ship correct and fast LLM kernels to PyTorch
[BETA] The Official Neurosity Python SDK 🤯
NVIDIA curated collection of educational resources related to general purpose GPU programming.
Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
CMake Tools provides a robust, convenient workflow for CMake projects in VS Code. It simplifies configurations with CMake presets, supports IntelliSense and built-in debugging for CMake scripts, an…
A GPU-accelerated cross-platform terminal emulator and multiplexer written by @wez and implemented in Rust
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
CUDA and OpenMP implementations of C2R/R2C in-place transposition
Templight is a Clang-based tool to profile the time and memory consumption of template instantiations and to perform interactive debugging sessions to gain introspection into the template instantia…
A personal experimental C++ Syntax 2 -> Syntax 1 compiler
Run compilers and inspect assembly directly from Neovim with the help of https://godbolt.org
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
A modern, C++-native, test framework for unit-tests, TDD and BDD - using C++14, C++17 and later (C++11 support is in v2.x branch, and C++03 on the Catch1.x branch)
Node.js extension host for Vim & Neovim; loads extensions like VS Code and hosts language servers.
Extended Vim syntax highlighting for C and C++ (C++11..26)






