[VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that leverages sparsity—by treating the KV cache as a vector storage system.

Python 134 29 Updated Feb 22, 2026

[CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inputs, making it easy to integrate both visual understanding an…

Python 106 5 Updated Apr 23, 2025

The Intelligent GUI Agent for Mobile Phones

Python 1,788 222 Updated Apr 2, 2026

A fine-grained remote paging system for memory disaggregation.

C 6 2 Updated May 28, 2025

Open-source release for HDTX (ATC'25)

C++ 3 3 Updated Jun 3, 2025

learning how CUDA works

Cuda 384 47 Updated Mar 3, 2025

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,614 639 Updated Feb 15, 2025

Modified version of PyTorch able to work with changes to GPGPU-Sim

C++ 56 29 Updated Nov 18, 2022

Modifications and enhancements to the original FlexGen.

Python 6 3 Updated Oct 10, 2024

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 798 92 Updated Apr 6, 2025
Jupyter Notebook 132 15 Updated Nov 11, 2024

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,368 183 Updated Mar 12, 2026

A user level library for applications to transparently use Intel DSA.

C 42 10 Updated Jan 23, 2026

Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks

Makefile 101 21 Updated Sep 2, 2021

Wapplique: Testing WebAssembly Runtime via Execution Context-aware Bytecode Mutation

WebAssembly 4 1 Updated Nov 4, 2024

Fast OS-level support for GPU checkpoint and restore

C++ 280 30 Updated Sep 28, 2025

example code for using DC QP for providing RDMA READ and WRITE operations to remote GPU memory

C 153 36 Updated Jul 30, 2024

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)

Python 182 37 Updated Jul 10, 2024

This is the implementation repository of our SOSP'24 paper: Aceso: Achieving Efficient Fault Tolerance in Memory-Disaggregated Key-Value Stores.

C++ 24 2 Updated Oct 20, 2024

Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory

C++ 112 35 Updated Oct 5, 2024

This is the implementation repository of our SOSP'23 paper: Ditto: An Elastic and Adaptive Memory-Disaggregated Caching System.

C++ 38 12 Updated Sep 24, 2023

A list of awesome academic research and industrial materials on Large Language Models (LLMs) and Artificial Intelligence for IT Operations (AIOps).

428 37 Updated Feb 21, 2026

A collection of awesome researchers and papers about disaggregated memory.

182 17 Updated Apr 3, 2026

AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection [ASE'23]

Java 39 3 Updated Feb 20, 2024

This is the implementation repository of our FAST'23 paper: FUSEE: A Fully Memory-Disaggregated Key-Value Store.

C++ 62 14 Updated Feb 14, 2023

✍️ A static blog writing client

TypeScript 10,277 833 Updated Jul 26, 2023

[Android] A custom Loading View library. Updates are paused.

Java 595 107 Updated Mar 11, 2020

CodeHub is an iOS application written using Xamarin

C# 22,640 609 Updated Jun 22, 2022