- Shanghai Jiao Tong University
- Shanghai, China
- https://zzp1012.github.io
- @zhanpeng_zhou
Starred repositories
🧸 Lobe Vidol - Making Virtual Idols Accessible for Everyone
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Implementation for paper CauScientist: Teaching LLMs to Respect Data for Causal Discovery.
[ICML 2024] SIRFShampoo: Structured inverse- and root-free Shampoo in PyTorch (https://arxiv.org/abs/2402.03496)
mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations
implementations and experimentation on mHC by deepseek - https://arxiv.org/abs/2512.24880
shehper / scaling_laws
Forked from karpathy/nanoGPT
An open-source implementation of Scaling Laws for Neural Language Models using nanoGPT
[AAAI2026 (oral)] On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD (Open Source Code)
A Minimalist Optimizer Design for LLM Pretraining
🚀🚀 [LLM] Train a 26M-parameter GPT completely from scratch in just 2 hours!
Code to generate figures of paper "When do spectral gradient updates help in deep learning?"
🚀 [LLM] Train a 26M-parameter vision multimodal VLM from scratch in just 1 hour!
MiniMax-M2, a model built for Max coding & agentic workflows.
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
LLM-Merging: Building LLMs Efficiently through Merging
The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Process" (arxiv 2407.20311) and "Physics of Language Models Part 2…
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
sii-research / predictive-consistency-learning
Forked from Thinklab-SJTU/predictive-consistency-learning
[ICML 2025] Generative Modeling Reinvents Supervised Learning: Label Repurposing with Predictive Consistency Learning
Physics of Language Models: Part 4.2, Canon Layers at Scale where Synthetic Pretraining Resonates in Reality
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it with Adam.

