Conversation

jklukas
Contributor

@jklukas jklukas commented Sep 18, 2025

Fixes #3022. To quote @rickeylev:

Under bzlmod, the repo mapping can become quite large (i.e. tens of megabytes) because its size scales as a factor of the number of repos in the transitive dependencies.

To address this, the --incompatible_compact_repo_mapping_manifest flag was introduced. This changes the repo mapping format to use prefixes (instead of exact repo names) for mapping things.

To make this work with the runfiles library, the code has to be updated to handle these prefixes instead of just exact strings.
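
As a rough illustration of what "handling prefixes" means, the sketch below resolves an apparent repo name by trying exact entries first and then prefix entries. The function name `resolve`, the sample repo names, and the exact-before-prefix order are assumptions made for this example, not the code in this PR.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sample data; the repo names are made up for illustration.
EXACT: Dict[Tuple[str, str], str] = {
    ("", "my_module"): "_main",
}
# Keys hold the source prefix with the trailing '*' already removed.
PREFIXED: Dict[Tuple[str, str], str] = {
    ("deps+", "lib"): "lib+1.0",
}


def resolve(source_repo: str, target_apparent: str) -> Optional[str]:
    """Return the canonical repo name, or None if nothing matches."""
    exact = EXACT.get((source_repo, target_apparent))
    if exact is not None:
        return exact
    # Fall back to prefix entries: the source repo only has to start
    # with the stored prefix, the apparent name must match exactly.
    for (prefix, apparent), canonical in PREFIXED.items():
        if apparent == target_apparent and source_repo.startswith(prefix):
            return canonical
    return None
```

With this sample data, any source repo whose canonical name starts with `deps+` resolves `lib` through the single prefix entry, which is what keeps the manifest small.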

Contributor

Summary of Changes

Hello @jklukas, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates support for Bazel's --incompatible_compact_repo_mapping_manifest flag into the Python runfiles library. This is a critical enhancement for Bazel users leveraging bzlmod, as it addresses the issue of excessively large repository mapping manifests by introducing prefix-based mapping resolution. The change ensures that the runfiles library can correctly locate files even when the repository mapping is in its compact, memory-optimized format, thereby improving the efficiency and scalability of Bazel builds with extensive dependency trees.

Highlights

  • Compact Repository Mapping Support: The Python runfiles library now supports Bazel's --incompatible_compact_repo_mapping_manifest flag.
  • Memory Optimization: This feature enables prefix-based repository mappings, significantly reducing the size of the repo mapping manifest (from tens of megabytes to much less) for large bzlmod dependency graphs.
  • New _RepositoryMapping Class: A new internal class _RepositoryMapping was introduced to intelligently parse and resolve both exact and prefix-based repository mappings, ensuring correct runfiles resolution.
  • Enhanced Rlocation Method: The Rlocation method in the Runfiles class has been updated to utilize the new _RepositoryMapping logic, handling the complexities of prefix-based lookups with proper precedence.

Contributor Author

jklukas commented Sep 18, 2025

cc @fmeum for review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for Bazel's --incompatible_compact_repo_mapping_manifest flag, which is a great feature for improving performance with bzlmod. The implementation is solid and includes a comprehensive set of tests. My review includes suggestions for a performance optimization in the new mapping logic, a fix for an outdated docstring, and some refactoring opportunities in both the implementation and the tests to improve maintainability and readability.

jklukas and others added 4 commits September 18, 2025 17:09
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

assert (
    source_repo is not None
), "BUG: if the `source_repo` is None, we should never go past the `if` statement above"
Contributor Author

TODO: This assertion may still be important. I need to reassess above whether we need to match the previous conditional.

Contributor Author

I don't think an equivalent assertion is needed with the new logic, but I'm open to more thoughts on it.

Collaborator

Please keep the assert. I'm not entirely sure if we have test coverage of all the cases.

Looking at the code, I don't see why the comment wouldn't still apply. source_repo can end up None here if the repo mapping is empty (which, according to comments, occurs for workspace mode).

Contributor Author

Restored.
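
For readers following the thread, here is a simplified sketch of where such an assertion sits relative to the early return. It is a stand-in, not the library's `Rlocation`: the function name `map_path` is hypothetical, a plain dict replaces the PR's mapping class, and the assertion message is shortened.

```python
from typing import Dict, Optional, Tuple


def map_path(
    path: str,
    source_repo: Optional[str],
    repo_mapping: Dict[Tuple[str, str], str],
) -> str:
    """Rewrite the leading repo component of `path` if a mapping applies."""
    target_repo, _, remainder = path.partition("/")
    if not remainder or (source_repo, target_repo) not in repo_mapping:
        # Empty mapping, an already-canonical name, or a path without a
        # slash (root symlink): return the path unchanged.
        return path

    # A mapping entry matched, so a source repo must have been supplied.
    # The assert records that invariant and fails loudly if a later
    # refactoring of the condition above breaks it.
    assert source_repo is not None, "BUG: mapped lookup without a source repo"

    return repo_mapping[(source_repo, target_repo)] + "/" + remainder
```

In this sketch an empty mapping always takes the early return, so `source_repo` may legitimately be None there; the assert only guards the mapped branch.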

Collaborator

@rickeylev rickeylev left a comment

I was going to push some changes, but noticed that "maintainer can modify" isn't set.

Mostly LGTM, but please remove all the Mapping API compatibility code. As an internal class, there's no need for it to support the wide range of dict operations.

        return _RepositoryMapping(exact_mappings, prefixed_mappings)

    # Mapping protocol implementation
    def __getitem__(self, key: Tuple[str, str]) -> str:
Collaborator

Please rename this to e.g. get_canonical or similar, just not __getitem__. This is an internal custom class, so there's no need to add the abstraction or complications of making it look dict-like.


@jklukas jklukas requested a review from rickeylev September 22, 2025 13:53
Contributor Author

jklukas commented Sep 22, 2025

Thank you for the quick review! I believe all comments are now addressed. I removed the Mapping compat; now we just have a lookup method and an is_empty method as needed for the rest of the logic.

Contributor Author

jklukas commented Sep 22, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for Bazel's --incompatible_compact_repo_mapping_manifest flag, which uses prefix-based repository mappings. The implementation introduces a _RepositoryMapping class to handle both exact and new prefix-based mappings, which is a clean way to encapsulate the logic. The accompanying tests are thorough and cover various scenarios including precedence and ordering.

I've found a couple of areas for improvement. First, in Rlocation, there's an opportunity to refactor the logic to avoid a redundant lookup and remove some unreachable code, making it more efficient and maintainable. Second, in _RepositoryMapping.create_from_file, adding a check for empty lines in the manifest file would make the parsing more robust.

Overall, this is a great addition that addresses an important performance issue with large dependency graphs in Bazel.

Comment on lines +89 to +98
for line in content.splitlines():
    source_canonical, target_apparent, target_canonical = line.split(",")
    if source_canonical.endswith("*"):
        # This is a prefixed mapping - remove the '*' for prefix matching
        prefix = source_canonical[:-1]
        prefixed_mappings[(prefix, target_apparent)] = target_canonical
    else:
        # This is an exact mapping
        exact_mappings[(source_canonical, target_apparent)] = target_canonical

Contributor

medium

The loop here doesn't account for empty or whitespace-only lines that content.splitlines() might produce. This could lead to a ValueError when trying to unpack the result of line.split(',') on the next line. It's safer to strip each line and skip it if it's empty.

Suggested change
for line in content.splitlines():
    line = line.strip()
    if not line:
        continue
    source_canonical, target_apparent, target_canonical = line.split(",")
    if source_canonical.endswith("*"):
        # This is a prefixed mapping - remove the '*' for prefix matching
        prefix = source_canonical[:-1]
        prefixed_mappings[(prefix, target_apparent)] = target_canonical
    else:
        # This is an exact mapping
        exact_mappings[(source_canonical, target_apparent)] = target_canonical

Contributor Author

It may be best to error out anyway if repo mappings diverged from expected format in the ways described here.
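
To make the "error out" option concrete, here is a minimal sketch of a strict parser that raises on any line that does not have exactly three comma-separated fields, blank lines included. The function name `parse_repo_mapping`, the `PairMap` alias, and the error message are hypothetical; the PR itself relies on the tuple unpacking failing.

```python
from typing import Dict, Tuple

PairMap = Dict[Tuple[str, str], str]


def parse_repo_mapping(content: str) -> Tuple[PairMap, PairMap]:
    """Split manifest text into (exact, prefixed) mappings, failing loudly."""
    exact: PairMap = {}
    prefixed: PairMap = {}
    for lineno, line in enumerate(content.splitlines(), start=1):
        fields = line.split(",")
        if len(fields) != 3:
            # Blank or malformed lines are treated as a corrupt manifest
            # rather than silently skipped.
            raise ValueError(
                f"repo mapping line {lineno}: expected 3 comma-separated "
                f"fields, got {len(fields)}: {line!r}"
            )
        source_canonical, target_apparent, target_canonical = fields
        if source_canonical.endswith("*"):
            # Prefixed mapping: store the prefix without the trailing '*'.
            prefixed[(source_canonical[:-1], target_apparent)] = target_canonical
        else:
            exact[(source_canonical, target_apparent)] = target_canonical
    return exact, prefixed
```

Compared with relying on the unpacking error, the explicit check mainly buys a message that names the offending line.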

Successfully merging this pull request may close these issues.

Support incompatible_compact_repo_mapping_manifest in runfiles library
2 participants