

@jagot commented Nov 5, 2025

Purpose

I have fixed an issue in which valid C++ identifiers with Unicode characters were rejected. To this end, I have

  • added the dependency regex, a drop-in replacement for the standard Python module re with extended capabilities (most pertinently the Unicode property classes \p{XID_Start} and \p{XID_Continue}), and updated CHANGES.rst accordingly;
  • added a test that triggers the error without the fix;
  • ensured all test suites pass locally.
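For context, the rejected names are ordinary identifiers under Unicode's XID_Start/XID_Continue rules, which the standard re module cannot express as \p{...} classes (hence the regex dependency). The sketch below contrasts an ASCII-only pattern with Python's built-in str.isidentifier(), which follows a closely related rule (an xid_start character or underscore, then xid_continue characters). It is an illustration under that assumption, not the code in this PR: the helper names are hypothetical, Python's rule is only a stand-in for the C++ one, and isidentifier() does not reject keywords.

```python
import re

# Hypothetical ASCII-only identifier pattern, for comparison.
ASCII_IDENTIFIER = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def is_ascii_identifier(name: str) -> bool:
    """True if `name` is an identifier made only of ASCII letters, digits and '_'."""
    return ASCII_IDENTIFIER.fullmatch(name) is not None


def is_unicode_identifier(name: str) -> bool:
    """Approximate an XID_Start/XID_Continue check via Python's own identifier rule.

    Keywords are not excluded; Python's rule stands in for the C++ one.
    """
    return name.isidentifier()


for name in ['my_fancy_function', 'ξ', 'ΨStruct', 'ψcompute', '9lives']:
    print(f'{name}: ascii={is_ascii_identifier(name)}, unicode={is_unicode_identifier(name)}')
```

Only my_fancy_function passes the ASCII check, while every name except 9lives passes the Unicode-aware one.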

References

@jakobandersen (Contributor)

Could you add a test where the Unicode identifier matters for name mangling, e.g., where it is part of a type name or is the function name?
I think the mangling scheme should just work, but I don't currently know if there are any pitfalls.

@jagot (Author) commented Nov 5, 2025

That's a good catch; if I modify my initial example to

.. Unicode Identifiers documentation master file, created by
   sphinx-quickstart on Wed Nov  5 10:18:18 2025.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Unicode Identifiers documentation
=================================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

.. cpp:function:: void my_fancy_function(int ξ)

                  Print the value of ξ.

.. cpp:struct:: template<typename T> MyStruct
                
                My very own struct
                
                .. cpp:member:: T α

                                A magic number

.. cpp:function:: template<typename Ξ> Ξ ξcompute(Ξ ξ)

.. cpp:struct:: ΨStruct
                  
.. cpp:function:: ΨStruct ψcompute(double i)
                  
.. cpp:function:: PsiStruct psicompute(double i)

then Sphinx prints the following warnings:

/Users/jagot/prog/sphinx/.idea/index.rst:21: WARNING: Index id generation for C++ object "T α" failed, please report as bug (id=_CPPv4N8MyStruct1αE).
/Users/jagot/prog/sphinx/.idea/index.rst:25: WARNING: Index id generation for C++ object "template<typename Ξ> Ξ ξcompute(Ξ ξ)" failed, please report as bug (id=_CPPv4I0E8ξcompute1Ξ1Ξ).
/Users/jagot/prog/sphinx/.idea/index.rst:27: WARNING: Index id generation for C++ object "ΨStruct" failed, please report as bug (id=_CPPv47ΨStruct).
/Users/jagot/prog/sphinx/.idea/index.rst:29: WARNING: Index id generation for C++ object "ΨStruct ψcompute(double i)" failed, please report as bug (id=_CPPv48ψcomputed).
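Judging from these ids, each name is emitted with a decimal length prefix (as in the Itanium C++ ABI's source-name), and the length counts characters rather than UTF-8 bytes: ΨStruct becomes 7ΨStruct, although it is 8 bytes in UTF-8. Whether code points or bytes are the right unit is one possible pitfall for the mangling question raised earlier. The helpers below are hypothetical and only reproduce the pattern visible in the warnings (the _CPPv4 prefix, N...E around nested names, d for double); they are not Sphinx's implementation.

```python
# Version prefix observed in the warning ids above.
PREFIX = '_CPPv4'


def source_name(identifier: str) -> str:
    """Length-prefix an identifier, counting code points as the observed ids do."""
    return f'{len(identifier)}{identifier}'


def nested_name(*parts: str) -> str:
    """Wrap length-prefixed components in N...E, as in '_CPPv4N8MyStruct1αE'."""
    return 'N' + ''.join(source_name(part) for part in parts) + 'E'


def utf8_length(identifier: str) -> int:
    """Length in UTF-8 bytes, for comparison with the code-point count."""
    return len(identifier.encode('utf-8'))


print(PREFIX + nested_name('MyStruct', 'α'))   # member MyStruct::α
print(PREFIX + source_name('ΨStruct'))         # struct ΨStruct
print(PREFIX + source_name('ψcompute') + 'd')  # ψcompute(double)
for ident in ['α', 'ΨStruct', 'ψcompute']:
    print(ident, len(ident), utf8_length(ident))
```

The first three lines print exactly the ids from the warnings; the loop shows the code-point and byte counts diverging (1 vs 2, 7 vs 8, 8 vs 9).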

The resulting documentation looks reasonable, though:

[Screenshot of the rendered documentation]

@jakobandersen (Contributor)

Right, I forgot about that check. It's an old sanity check, and I suggest deleting it:

if not re.compile(r'^[a-zA-Z0-9_]*$').match(newest_id):
    logger.warning(
        'Index id generation for C++ object "%s" failed, please '
        'report as bug (id=%s).',
        ast,
        newest_id,
        location=self.get_location(),
    )
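The check fires because [a-zA-Z0-9_] is ASCII-only, while the ids in the warnings above contain Greek letters. If keeping some sanity check were preferred over deleting it, one hypothetical alternative is \w, which for str patterns in Python 3's re matches Unicode word characters by default. It only approximates XID_Continue, so this would be a looser sanity check rather than a validator; the constant names are illustrative.

```python
import re

# The ASCII-only pattern from the sanity check above.
OLD_CHECK = re.compile(r'^[a-zA-Z0-9_]*$')
# Hypothetical relaxed check: for str patterns, \w matches Unicode word characters.
RELAXED_CHECK = re.compile(r'^\w*$')

# Ids copied from the warnings earlier in this thread.
ids = [
    '_CPPv4N8MyStruct1αE',
    '_CPPv4I0E8ξcompute1Ξ1Ξ',
    '_CPPv47ΨStruct',
    '_CPPv48ψcomputed',
]

for newest_id in ids:
    old_ok = OLD_CHECK.match(newest_id) is not None
    new_ok = RELAXED_CHECK.match(newest_id) is not None
    print(f'{newest_id}: old={old_ok}, relaxed={new_ok}')
```

Every id fails the old check and passes the relaxed one, while an id containing, say, a space would still be rejected.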

@jagot (Author) commented Nov 5, 2025

Removing that check indeed silences the warnings. However, I am not clever enough to figure out what the id_dict should be for the following test cases:

check('function', 'template<typename Ξ> Ξ ξcompute(Ξ ξ)', {???})
check('struct', 'struct ΨStruct', {???})
check('function', 'ΨStruct ψcompute(double i)', {???})

Development

Successfully merging this pull request may close these issues.

Using Unicode identifiers in C++ fails
