- Fudan University
- Shanghai, China
- https://river861.github.io
- https://orcid.org/0009-0005-0712-2195
Stars
[VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that leverages sparsity—by treating the KV cache as a vector storage system.
[CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inputs, making it easy to integrate both visual understanding an…
The Intelligent GUI Agent for Mobile Phones
A fine-grained remote paging system for memory disaggregation.
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
Modified version of PyTorch able to work with changes to GPGPU-Sim
Modifications and enhancements to the original FlexGen.
Disaggregated serving system for Large Language Models (LLMs).
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
A user level library for applications to transparently use Intel DSA.
Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks
Wapplique: Testing WebAssembly Runtime via Execution Context-aware Bytecode Mutation
Fast OS-level support for GPU checkpoint and restore
Example code for using DC QP to provide RDMA READ and WRITE operations to remote GPU memory
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
This is the implementation repository of our SOSP'24 paper: Aceso: Achieving Efficient Fault Tolerance in Memory-Disaggregated Key-Value Stores.
Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory
This is the implementation repository of our SOSP'23 paper: Ditto: An Elastic and Adaptive Memory-Disaggregated Caching System.
A list of awesome academic research and industrial materials about Large Language Models (LLMs) and Artificial Intelligence for IT Operations (AIOps).
A collection of awesome researchers and papers about disaggregated memory.
AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection [ASE'23]
This is the implementation repository of our FAST'23 paper: FUSEE: A Fully Memory-Disaggregated Key-Value Store.
✍️ A static blog writing client
CodeHub is an iOS application written using Xamarin




