
Add Metal GPU backend with MNIST training (98.2% accuracy) #64

Closed
alok wants to merge 1 commit into lecopivo:master from alok:pr/gpu-metal-clean

Conversation

Contributor

@alok alok commented Dec 18, 2025

Summary

This PR adds a Metal GPU backend for accelerated ML on Apple Silicon.

Features

  • GPU-resident buffers with type-safe CpuBuffer/GpuBuffer API
  • Optimized GEMM kernels (simdgroup tiling, double-buffered)
  • Fused ML ops: biasRelu, biasGelu, biasAdd, softmax, layerNorm
  • Flash attention kernels (causal and non-causal)
  • Conv2D/MaxPool2D/AvgPool2D for CNN inference
  • Mini-batch training with GPU buffer slicing
  • Command buffer batching for reduced dispatch overhead
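
The fused ops listed above can be sketched in NumPy to pin down their semantics. This is a minimal, illustrative reference only: the function names, shapes, and API here are assumptions for exposition, not the actual SciLean/Metal bindings.

```python
import numpy as np

# Reference semantics (NumPy sketch) for two of the fused kernels above.
# Names and shapes are illustrative, not the actual SciLean/Metal API.

def bias_relu(x, w, b):
    # One fused pass: GEMM + bias add + ReLU, avoiding intermediate buffers.
    return np.maximum(x @ w + b, 0.0)

def softmax(x, axis=-1):
    # Numerically stable row-wise softmax (subtract the row max first).
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # batch of 4 activations
w = rng.standard_normal((8, 3))   # weight matrix
b = np.zeros(3)                   # bias

h = bias_relu(x, w, b)            # shape (4, 3), all entries >= 0
p = softmax(h)                    # each row sums to 1
```

The point of fusing on the GPU is that `x @ w`, the bias add, and the activation happen in one kernel launch, so the intermediate matrix never round-trips through device memory.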

MNIST Results

  • 98.2% accuracy on full 60k training set
  • 10 epochs, 256-sample mini-batches
  • ~230ms per epoch (234 batches)
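
A quick back-of-envelope check of the batch count reported above (illustrative arithmetic only; whether the final partial batch is dropped or padded is not stated in the PR):

```python
# 60k samples at 256 per mini-batch, per the figures above.
train_size = 60_000   # full MNIST training set
batch_size = 256      # mini-batch size

full_batches = train_size // batch_size   # full mini-batches per epoch
leftover = train_size % batch_size        # samples beyond the last full batch

print(full_batches, leftover)  # 234 96
```

This matches the reported 234 batches per epoch, consistent with only full 256-sample batches being dispatched.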

Files

  • Metal/kmeans.metal - Metal shader kernels
  • Metal/metal_backend.mm - C++ dispatch layer
  • SciLean/FFI/Metal.lean - Lean bindings
  • examples/GpuMNIST.lean - Training example

@alok alok closed this Dec 18, 2025
@alok alok deleted the pr/gpu-metal-clean branch December 18, 2025 00:41