View zzp1012's full-sized avatar


Starred repositories

🧸 Lobe Vidol - Making Virtual Idols Accessible for EveryOne

TypeScript 899 123 Updated Mar 8, 2025

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Python 3,414 233 Updated Jan 14, 2026

Implementation for paper CauScientist: Teaching LLMs to Respect Data for Causal Discovery.

4 Updated Jan 18, 2026

Spectral Sphere Optimizer

Python 90 1 Updated Jan 14, 2026

[ICML 2024] SIRFShampoo: Structured inverse- and root-free Shampoo in PyTorch (https://arxiv.org/abs/2402.03496)

Python 15 1 Updated Nov 4, 2024

mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations

Python 58 1 Updated Jan 12, 2026

implementations and experimentation on mHC by deepseek - https://arxiv.org/abs/2512.24880

Python 274 22 Updated Jan 4, 2026

An open-source implementation of Scaling Laws for Neural Language Models using nanoGPT

Python 53 7 Updated Dec 8, 2023
Jupyter Notebook 204 3 Updated Dec 19, 2025

[AAAI2026 (oral)] On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD (Open Source Code)

Jupyter Notebook 4 Updated Nov 17, 2025

A Minimalist Optimizer Design for LLM Pretraining

Python 14 Updated Aug 7, 2025

🚀🚀 [LLM] Train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python 38,226 4,583 Updated Jan 18, 2026

Code to generate figures of paper "When do spectral gradient updates help in deep learning?"

Python 13 Updated Dec 3, 2025
Python 4 Updated Jan 2, 2026

Nano vLLM

Python 11,197 1,463 Updated Nov 3, 2025

🚀 [LLM] Train a 26M-parameter vision multimodal VLM from scratch in just 1 hour! 🌏

Python 6,195 665 Updated Jan 18, 2026

MiniMax-M2, a model built for Max coding & agentic workflows.

2,317 183 Updated Nov 13, 2025

[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Python 5,808 596 Updated Jan 16, 2025

LLM-Merging: Building LLMs Efficiently through Merging

Jupyter Notebook 209 43 Updated Sep 24, 2024

The best ChatGPT that $100 can buy.

Python 40,999 5,314 Updated Jan 29, 2026

dLLM: Simple Diffusion Language Modeling

Python 1,673 166 Updated Jan 6, 2026

The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Process" (arxiv 2407.20311) and "Physics of Language Models Part 2…

Python 84 8 Updated Jan 12, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,699 2,029 Updated Jan 13, 2026

[ICML 2025] Generative Modeling Reinvents Supervised Learning: Label Repurposing with Predictive Consistency Learning

Python 2 Updated Jul 14, 2025

Physics of Language Models: Part 4.2, Canon Layers at Scale where Synthetic Pretraining Resonates in Reality

HTML 315 16 Updated Jan 5, 2026

Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training

Python 132 5 Updated Apr 17, 2024

Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam.

Python 85 4 Updated Jul 28, 2024
Python 30 Updated Dec 2, 2024