Skip to content

feat: add /check-golf bot verifying statements are unchanged#1355

Open
Vilin97 wants to merge 5 commits into
leanprover-community:masterfrom
Vilin97:check-golf-bot
Open

feat: add /check-golf bot verifying statements are unchanged#1355
Vilin97 wants to merge 5 commits into
leanprover-community:masterfrom
Vilin97:check-golf-bot

Conversation

@Vilin97

@Vilin97 Vilin97 commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

What

Adds a /check-golf bot that verifies a PR only golfs proofs: that no declaration statement (its signature/type) changed and only proofs and definition bodies changed. Comment /check-golf on any PR and the bot posts its findings as a single comment, editing that same comment on subsequent runs.

Files

  • scripts/check_golf.py — compares two revisions of the repo. For every named declaration in the changed .lean files it decides whether the statement was preserved and only the proof/body changed. It also renders the Markdown report and upserts it as a PR comment (POST first, PATCH the marked comment afterwards).
  • .github/workflows/check-golf.yml — triggers on an issue_comment containing /check-golf on a PR, resolves the merge-base and head, runs the script, and comments.
  • scripts/README.md — documents the tool and how to run it locally.

How it works

The Lean sources are parsed textually (no build required, so the check is fast):

  • comments and string literals are blanked, whitespace is ignored;
  • top-level declarations are segmented on column-0 command starts (robust even when a proof body contains brackets that cannot be balanced textually, e.g. a by block closed by indentation);
  • each statement is split from its body at the top-level := / where / equation | arm, so := in default binders and by blocks inside a type (⟨x, by omega⟩) do not confuse the split;
  • by proof terms embedded inside a type are treated as proof-irrelevant, so golfing them counts as a proof change, not a statement change.

Anonymous instances and examples are not tracked (they have no stable name).

Testing

Run locally against two revisions:

scripts/check_golf.py --base <merge-base> --head <head-sha>

Validated against a large real golf PR: the checker reports 0 statement changes across 312 changed files, and a name-level cross-check confirms no named declaration is dropped by the parser. The post→edit comment logic is covered by a mocked-API test.

Add scripts/check_golf.py and a check-golf GitHub workflow. When a
`/check-golf` comment is posted on a PR, the bot compares the PR's base
and head, checks that no declaration statement (signature/type) changed
and only proofs/bodies changed, and upserts a single findings comment
(posting it once, editing it on later runs).

The Lean sources are parsed textually (no build): comments and
whitespace are ignored, declarations are segmented on column-0 command
starts, each statement is split from its body at the top-level `:=`,
`where`, or equation `|` arm, and `by` proof terms embedded inside a
type are treated as proof-irrelevant.

Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Thank you for this PR, which will now be reviewed. If submitting to ./Physlib or ./QuantumInfo, please see our review guidelines if you are not familiar with the process. You should expect a back and forth with a reviewer before your PR is merged. See also that link for how to add appropriate labels to your PR. The PR will also go through a number of automated checks. You can learn more about these here, including how to run them locally.

If you are submitting to ./PhyslibAlpha there will be a lighter review process, though your PR must still pass the automated checks.

If you want to bring attention to this PR, please write a message on this thread of the Lean Zulip.

Important: If a reviewer adds an awaiting-author label to your PR, once you have addressed the review comments, please remove that label by adding a comment with -awaiting-author. This helps us keep track of reviews.

Vilin97 and others added 2 commits July 3, 2026 00:24
The report now also counts, among statement-preserving changes:
- proofs where only a newline was removed (identical tokens, fewer lines),
  which were previously not surfaced at all; and
- proofs where tactics were crammed onto one line with a `;` (excluding the
  `<;>` combinator), with the number of joins introduced.

Also relabel "definition bodies changed" to "definition values changed" and
spell out in the footer that it means a def/instance/abbrev whose type is
unchanged but whose defining term changed.

Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
A `def`/`instance`/`abbrev` often carries proof obligations discharged by
`by` blocks. Golfing only those does not change the definition's data, so it
should not read as "the definition changed". The report now masks a
definition's `by` proof blocks and splits its changes into:
- "definition proofs golfed" (data unchanged), and
- "definition values changed" (the defining term changed outside proofs).

On PR leanprover-community#1332 this moves 76 of the 110 former "definition body" changes into
the proof-only bucket, leaving 34 genuine value changes.

Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
@jstoobysmith

Copy link
Copy Markdown
Member

Just checking: This python file is generated from scratch or is based on something that already exists (if so we should probably reference it)? Either way it looks great - and I think it could act as the base of a lot of cool automated commands we could include. Going through it now to understand it better

Vilin97 and others added 2 commits July 3, 2026 10:16
A `def foo : T := by refine …` builds its *value* with tactics, so a
change inside that `by` block can change the definition itself, not just
a discharged proof obligation. Masking it as a proof made such edits
look like harmless proof golfs.

Only classify a changed def body as "proof golfed" when the term outside
`by` proof blocks is identical AND the body is not itself a single `by`
block (mask != "<by>"). Otherwise report it as a definition-value change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
…ve changed)"

The bucket mixes genuine defining-term changes with defs where only proof
parts changed but that can't be proven textually — e.g. a `def := by
refine … ?_` whose data is untouched, or a structure-instance whose proof
fields were rewritten from tactic to term mode. Asserting "data changed"
over-claims those. Relabel to "Definition bodies changed (data *may* have
changed, type unchanged)" and add an explanatory note; the high-confidence
"Definition proofs golfed (data unchanged)" bucket is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
@Vilin97

Vilin97 commented Jul 4, 2026

Copy link
Copy Markdown
Contributor Author

The script was produced by Opus 4.8 from this prompt:

Details /goal make a script such that when the comment /check-golf is posted checks that the statements did not change, and only the proofs changed. it should post a comment to the PR with its findings, and if the comment is posted again, it should edit the message. test your script first, then open a PR, and make it pass CI. After that, open the second PR where you modify the script to also change the compile time and heartbeats. For both PRs, report the corresponding findings for PR https://github.com//pull/1332. Work in a new worktree.

Just making sure that you saw the second PR adding heartbeat computation to /check-golf.

@Vilin97

Vilin97 commented Jul 4, 2026

Copy link
Copy Markdown
Contributor Author

Lmk when it's good to merge or if you want me to change anything. And likewise for the PR measuring heartbeats: #1356

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants