@ArthurZucker ArthurZucker commented Nov 28, 2025

Update the stub generator for better typing.
Supersedes #1865.

@ArthurZucker ArthurZucker changed the base branch from austinleedavis/main to main November 28, 2025 09:30
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker added the python Issue related to the python binding label Nov 28, 2025
@ArthurZucker ArthurZucker requested a review from Copilot November 28, 2025 18:53
Copilot finished reviewing on behalf of ArthurZucker November 28, 2025 18:55
Copilot AI left a comment

Pull request overview

This PR enhances type checking support for the tokenizers library by improving stub file generation and adding comprehensive type annotations. It supersedes PR #1865 with a more complete solution for static type checking.

Key Changes:

  • Enhanced stub.py to generate better type stubs with property setters, magic methods, and improved signature handling
  • Updated PyO3 Rust bindings with explicit text_signature annotations for better Python type visibility
  • Added type ignore comments throughout test and example files to suppress expected type checker warnings
  • Introduced type checking workflow integration (using "ty" tool)
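
As a rough illustration of the first bullet, the sketch below shows the kind of surface a stub with a property setter and pickling magic methods describes. `ExampleNormalizer` and its `lowercase` property are hypothetical names chosen for this example and are not taken from the PR; a real `.pyi` file would carry only the signatures, with `...` as each body.

```python
from typing import Any, Dict


class ExampleNormalizer:
    """Hypothetical class, used only to illustrate the stub shape."""

    def __init__(self, lowercase: bool = True) -> None:
        self._lowercase = lowercase

    @property
    def lowercase(self) -> bool:
        return self._lowercase

    # Without a setter in the stub, a type checker would reject
    # `norm.lowercase = False` as an assignment to a read-only property.
    @lowercase.setter
    def lowercase(self, value: bool) -> None:
        self._lowercase = value

    # Magic methods declared explicitly so checkers can see them.
    def __getstate__(self) -> Dict[str, Any]:
        return {"lowercase": self._lowercase}

    def __setstate__(self, state: Dict[str, Any]) -> None:
        self._lowercase = state["lowercase"]


norm = ExampleNormalizer()
norm.lowercase = False

# Round-trip the state by hand instead of going through pickle.
clone = ExampleNormalizer.__new__(ExampleNormalizer)
clone.__setstate__(norm.__getstate__())
```

With the setter present in the stub, the assignment type-checks; with only the getter, it would be flagged even though it works at runtime.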

Reviewed changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `bindings/python/stub.py` | Major refactoring: added `OVERRIDES` dict, improved signature extraction, property setter generation, and better error handling for ruff formatting |
| `bindings/python/src/*.rs` | Added/updated `text_signature` attributes on PyO3 constructors and methods to expose proper signatures to Python type checkers |
| `bindings/python/py_src/**/*.pyi` | Generated stub files with complete property setters, magic methods (`__getstate__`, `__setstate__`, `__getitem__`, etc.), and improved type hints |
| `bindings/python/tests/**/*.py` | Added `# type: ignore` comments for intentional type violations in tests (e.g., testing error handling with invalid inputs) |
| `bindings/python/scripts/*.py` | Added `# type: ignore[import]` for optional/external dependencies like sentencepiece, transformers, jieba, tiktoken |
| `bindings/python/pyproject.toml` | Added "ty" to testing dependencies, removed deprecated black configuration |
| `bindings/python/Makefile` | Integrated "ty check" commands into style checking targets |
| `.github/workflows/python.yml` | Added type checking step in CI workflow |
| `bindings/python/docs/pyo3.md` | New documentation explaining PyO3 usage patterns for Python bindings |
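
To make the link between `text_signature` and stub generation concrete: PyO3's `text_signature` populates the `__text_signature__` attribute of the compiled callable, which Python's `inspect.signature` can parse. The sketch below is a minimal illustration of that mechanism and is not the code in `stub.py`; `stub_line` is a hypothetical helper, and the built-in `len` stands in for a compiled function.

```python
import inspect


def stub_line(obj, name=None):
    """Render a one-line stub for `obj` (hypothetical helper, not from stub.py)."""
    name = name or obj.__name__
    try:
        # For compiled callables, inspect.signature parses __text_signature__.
        sig = str(inspect.signature(obj))
    except (ValueError, TypeError):
        # No usable signature: fall back to a permissive placeholder.
        sig = "(*args, **kwargs)"
    return f"def {name}{sig}: ..."


# `len` is a built-in that carries a __text_signature__.
len_stub = stub_line(len)

# A non-callable object has no signature, so the fallback is used.
opaque_stub = stub_line(object(), name="opaque")
```

The fallback branch shows why explicit `text_signature` attributes matter: without them, a generator like this can only emit `(*args, **kwargs)`, which tells a type checker nothing about the parameters.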

impl PySequence {
#[new]
#[pyo3(text_signature = None)]
#[pyo3(signature= (normalizers) ,text_signature = "(self, normalizers)")]
Copilot AI Nov 28, 2025

Spacing inconsistency: line 401 has `signature= (normalizers)` with a space before the opening parenthesis, and a stray space before the following comma, while other lines use `signature=(...)`. This should be `signature=(normalizers)` to maintain consistency.

Suggested change
- #[pyo3(signature= (normalizers) ,text_signature = "(self, normalizers)")]
+ #[pyo3(signature=(normalizers),text_signature = "(self, normalizers)")]

