hugovk commented Jan 10, 2026

We can apply @henryiii's speedup to packaging from pypa/packaging#1030 (see also https://iscinumpy.dev/post/packaging-faster/) to distlib's normalize_name, making it about 3.4 times faster on Python 3.14.
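For context, here is a minimal sketch of the kind of change involved. It assumes distlib's current normalize_name is the regex-based PEP 503 normalization, and that the speedup replaces the regex with plain str methods (one translate() pass plus a loop that collapses repeated dashes). The function and table names are hypothetical; this is not the code from this PR or from packaging.

```python
import re

# Regex baseline, assumed to match distlib's current normalize_name:
# PEP 503 collapses each run of "-", "_" or "." into "-", then lowercases.
_SEPARATOR_RUNS = re.compile(r"[-_.]+")


def normalize_name_regex(name: str) -> str:
    return _SEPARATOR_RUNS.sub("-", name).lower()


# Hypothetical str-method version: map "_" and "." to "-" in one
# translate() call, then collapse any runs of "-" that remain.
_SEPARATORS_TO_DASH = str.maketrans("_.", "--")


def normalize_name_str_methods(name: str) -> str:
    value = name.lower().translate(_SEPARATORS_TO_DASH)
    # Each replace() pass roughly halves every run of dashes, so a run of
    # n separators needs only about log2(n) passes.
    while "--" in value:
        value = value.replace("--", "-")
    return value


if __name__ == "__main__":
    for name in ["Django", "zope.interface", "Foo__Bar", "a-_.-b"]:
        # Both versions must agree before any speedup matters.
        assert normalize_name_regex(name) == normalize_name_str_methods(name)
        print(name, "->", normalize_name_str_methods(name))
```

Lowercasing before translating keeps the result identical to the regex version, because lowercasing never produces "-", "_" or ".".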

Benchmark

Run normalize_name(n) on every name in PyPI:

# benchmark_names_distlib.py
import sqlite3
import timeit
from distlib.util import normalize_name

# Get data with:
# curl -L https://github.com/pypi-data/pypi-json-data/releases/download/latest/pypi-data.sqlite.gz | gzip -d > pypi-data.sqlite
# Or use the pre-cached files from:
# https://gist.github.com/hugovk/efdbee0620cc64df7b405b52cf0b6e42

CACHE_FILE = "/tmp/bench/names.txt"
DB_FILE = "/tmp/bench/pypi-data.sqlite"

try:
    with open(CACHE_FILE) as f:
        TEST_ALL_NAMES = [line.rstrip("\n") for line in f]
except FileNotFoundError:
    TEST_ALL_NAMES = []
    with sqlite3.connect(DB_FILE) as conn:
        with open(CACHE_FILE, "w") as cache:
            for (name,) in conn.execute("SELECT name FROM projects"):
                if name:
                    TEST_ALL_NAMES.append(name)
                    cache.write(name + "\n")


def bench():
    for n in TEST_ALL_NAMES:
        normalize_name(n)


if __name__ == "__main__":
    print(f"Loaded {len(TEST_ALL_NAMES):,} names")
    t = timeit.timeit("bench()", globals=globals(), number=1)
    print(f"Time: {t:.4f} seconds")

Benchmark data can be found at https://gist.github.com/hugovk/efdbee0620cc64df7b405b52cf0b6e42

Before

With Python 3.14 on macOS:

python benchmark_names_distlib.py
Loaded 8,344,947 names
Time: 4.6224 seconds

After

python benchmark_names_distlib.py
Loaded 8,344,947 names
Time: 1.3598 seconds

3.4 times faster.

hugovk commented Jan 19, 2026

Following on from pypa/packaging#1064: this change is slower on Python 3.12 and 3.13, so I'm marking this PR as a draft for now.

Tested with Python 3.8 to 3.14 on macOS (python.org builds) using hyperfine, with one warmup run and three timed runs per version (mean ± σ wall-clock time):

Python  master (s)      PR (s)          Result
3.8     7.699 ± 0.620   4.620 ± 0.040   PR 1.67x faster
3.9     7.463 ± 0.480   4.775 ± 0.131   PR 1.56x faster
3.10    6.042 ± 0.019   3.947 ± 0.213   PR 1.53x faster
3.11    5.437 ± 0.144   3.598 ± 0.144   PR 1.51x faster
3.12    5.707 ± 0.358   6.907 ± 0.059   ⚠️ master 1.21x faster
3.13    5.248 ± 0.067   6.479 ± 0.163   ⚠️ master 1.23x faster
3.14    5.784 ± 0.605   2.391 ± 0.061   PR 2.42x faster
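The "± … times faster" figures in hyperfine's summaries are consistent with standard first-order error propagation for a ratio of two independent means, σ_r = r · sqrt((σ_a/a)² + (σ_b/b)²). A small sketch that recomputes them from the table, assuming that is how hyperfine derives them (the helper name is hypothetical). Because the table's means and σ are already rounded, the printed uncertainty can differ from hyperfine's in the last digit (e.g. 0.13 vs 0.14 for 3.8).

```python
import math


def ratio_with_uncertainty(slow_mean, slow_sd, fast_mean, fast_sd):
    """Return (slow/fast, sigma) via first-order propagation, independent means."""
    ratio = slow_mean / fast_mean
    # Relative standard deviations add in quadrature.
    rel_sd = math.hypot(slow_sd / slow_mean, fast_sd / fast_mean)
    return ratio, ratio * rel_sd


# (master mean, master sigma, PR mean, PR sigma) in seconds, from the table.
RESULTS = {
    "3.8": (7.699, 0.620, 4.620, 0.040),
    "3.12": (5.707, 0.358, 6.907, 0.059),
    "3.14": (5.784, 0.605, 2.391, 0.061),
}

if __name__ == "__main__":
    for version, (m_mean, m_sd, pr_mean, pr_sd) in RESULTS.items():
        # Always report the faster side, as hyperfine's summary does.
        if pr_mean <= m_mean:
            r, sd = ratio_with_uncertainty(m_mean, m_sd, pr_mean, pr_sd)
            faster = "PR"
        else:
            r, sd = ratio_with_uncertainty(pr_mean, pr_sd, m_mean, m_sd)
            faster = "master"
        print(f"Python {version}: {faster} {r:.2f} ± {sd:.2f} times faster")
```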
Details
hyperfine --warmup 1 -r 3 \
    -n master --prepare 'git checkout master' 'python3.8 benchmark_names_distlib.py' \
    -n PR --prepare 'git checkout speedup-canonicalize_name -q' 'python3.8 benchmark_names_distlib.py'
Benchmark 1: master
  Time (mean ± σ):      7.699 s ±  0.620 s    [User: 6.703 s, System: 0.455 s]
  Range (min … max):    7.054 s …  8.290 s    3 runs

Benchmark 2: PR
  Time (mean ± σ):      4.620 s ±  0.040 s    [User: 4.381 s, System: 0.199 s]
  Range (min … max):    4.574 s …  4.650 s    3 runs

Summary
  PR ran
    1.67 ± 0.14 times faster than master

hyperfine --warmup 1 -r 3 \
    -n master --prepare 'git checkout master' 'python3.9 benchmark_names_distlib.py' \
    -n PR --prepare 'git checkout speedup-canonicalize_name -q' 'python3.9 benchmark_names_distlib.py'
Benchmark 1: master
  Time (mean ± σ):      7.463 s ±  0.480 s    [User: 6.627 s, System: 0.416 s]
  Range (min … max):    7.111 s …  8.009 s    3 runs

Benchmark 2: PR
  Time (mean ± σ):      4.775 s ±  0.131 s    [User: 4.379 s, System: 0.296 s]
  Range (min … max):    4.679 s …  4.924 s    3 runs

Summary
  PR ran
    1.56 ± 0.11 times faster than master

hyperfine --warmup 1 -r 3 \
    -n master --prepare 'git checkout master' 'python3.10 benchmark_names_distlib.py' \
    -n PR --prepare 'git checkout speedup-canonicalize_name -q' 'python3.10 benchmark_names_distlib.py'
Benchmark 1: master
  Time (mean ± σ):      6.042 s ±  0.019 s    [User: 5.561 s, System: 0.329 s]
  Range (min … max):    6.021 s …  6.054 s    3 runs

Benchmark 2: PR
  Time (mean ± σ):      3.947 s ±  0.213 s    [User: 3.503 s, System: 0.311 s]
  Range (min … max):    3.751 s …  4.174 s    3 runs

Summary
  PR ran
    1.53 ± 0.08 times faster than master

hyperfine --warmup 1 -r 3 \
    -n master --prepare 'git checkout master' 'python3.11 benchmark_names_distlib.py' \
    -n PR --prepare 'git checkout speedup-canonicalize_name -q' 'python3.11 benchmark_names_distlib.py'
Benchmark 1: master
  Time (mean ± σ):      5.437 s ±  0.144 s    [User: 4.972 s, System: 0.274 s]
  Range (min … max):    5.279 s …  5.563 s    3 runs

Benchmark 2: PR
  Time (mean ± σ):      3.598 s ±  0.144 s    [User: 3.198 s, System: 0.265 s]
  Range (min … max):    3.463 s …  3.750 s    3 runs

Summary
  PR ran
    1.51 ± 0.07 times faster than master

hyperfine --warmup 1 -r 3 \
    -n master --prepare 'git checkout master' 'python3.12 benchmark_names_distlib.py' \
    -n PR --prepare 'git checkout speedup-canonicalize_name -q' 'python3.12 benchmark_names_distlib.py'
Benchmark 1: master
  Time (mean ± σ):      5.707 s ±  0.358 s    [User: 5.005 s, System: 0.369 s]
  Range (min … max):    5.439 s …  6.113 s    3 runs

Benchmark 2: PR
  Time (mean ± σ):      6.907 s ±  0.059 s    [User: 6.439 s, System: 0.331 s]
  Range (min … max):    6.846 s …  6.963 s    3 runs

Summary
  master ran
    1.21 ± 0.08 times faster than PR

hyperfine --warmup 1 -r 3 \
    -n master --prepare 'git checkout master' 'python3.13 benchmark_names_distlib.py' \
    -n PR --prepare 'git checkout speedup-canonicalize_name -q' 'python3.13 benchmark_names_distlib.py'
Benchmark 1: master
  Time (mean ± σ):      5.248 s ±  0.067 s    [User: 4.906 s, System: 0.226 s]
  Range (min … max):    5.183 s …  5.316 s    3 runs

Benchmark 2: PR
  Time (mean ± σ):      6.479 s ±  0.163 s    [User: 6.002 s, System: 0.315 s]
  Range (min … max):    6.328 s …  6.652 s    3 runs

Summary
  master ran
    1.23 ± 0.03 times faster than PR

hyperfine --warmup 1 -r 3 \
    -n master --prepare 'git checkout master' 'python3.14 benchmark_names_distlib.py' \
    -n PR --prepare 'git checkout speedup-canonicalize_name -q' 'python3.14 benchmark_names_distlib.py'
Benchmark 1: master
  Time (mean ± σ):      5.784 s ±  0.605 s    [User: 5.174 s, System: 0.275 s]
  Range (min … max):    5.370 s …  6.478 s    3 runs

Benchmark 2: PR
  Time (mean ± σ):      2.391 s ±  0.061 s    [User: 2.067 s, System: 0.223 s]
  Range (min … max):    2.321 s …  2.431 s    3 runs

Summary
  PR ran
    2.42 ± 0.26 times faster than master
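Since the outcome depends on the interpreter version, a timeit micro-benchmark can compare the two approaches on whichever Python runs it, without switching branches. This is a sketch only: it assumes a regex-based baseline and a str.translate-based alternative, and both functions and the sample names are hypothetical stand-ins rather than the code in this PR, so its numbers are not expected to match the hyperfine results above. Run it with each python3.X to compare.

```python
import re
import sys
import timeit

# Hypothetical stand-ins for the two implementations being compared.
_SEPARATOR_RUNS = re.compile(r"[-_.]+")
_SEPARATORS_TO_DASH = str.maketrans("_.", "--")


def normalize_name_regex(name):
    return _SEPARATOR_RUNS.sub("-", name).lower()


def normalize_name_str_methods(name):
    value = name.lower().translate(_SEPARATORS_TO_DASH)
    while "--" in value:
        value = value.replace("--", "-")
    return value


# Illustrative sample only; load the PyPI name list for meaningful numbers.
SAMPLE = ["Django", "zope.interface", "Foo__Bar", "requests", "a-_.-b"] * 2000


def compare(names, repeat=5):
    """Return the best-of-`repeat` time (s) for one pass over `names`."""
    for n in names:
        # Timing only matters if both versions give identical results.
        if normalize_name_regex(n) != normalize_name_str_methods(n):
            raise ValueError(f"mismatch for {n!r}")
    timings = {}
    for func in (normalize_name_regex, normalize_name_str_methods):
        runs = timeit.repeat(lambda: [func(n) for n in names], number=1, repeat=repeat)
        timings[func.__name__] = min(runs)
    return timings


if __name__ == "__main__":
    t = compare(SAMPLE)
    ratio = t["normalize_name_regex"] / t["normalize_name_str_methods"]
    # A ratio below 1.0 means the str-method version is slower here.
    print(f"Python {sys.version.split()[0]}: str-method speedup {ratio:.2f}x")
```

Taking the minimum of several repeats reduces noise from other processes, which matters when the differences between versions are around 20%.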

hugovk marked this pull request as draft on January 19, 2026 at 18:34