feat: add /check-golf bot verifying statements are unchanged#1355
feat: add /check-golf bot verifying statements are unchanged#1355Vilin97 wants to merge 5 commits into
Conversation
Add scripts/check_golf.py and a check-golf GitHub workflow. When a `/check-golf` comment is posted on a PR, the bot compares the PR's base and head, checks that no declaration statement (signature/type) changed and only proofs/bodies changed, and upserts a single findings comment (posting it once, editing it on later runs). The Lean sources are parsed textually (no build): comments and whitespace are ignored, declarations are segmented on column-0 command starts, each statement is split from its body at the top-level `:=`, `where`, or equation `|` arm, and `by` proof terms embedded inside a type are treated as proof-irrelevant. Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
|
Thank you for this PR, which will now be reviewed. If submitting to ./Physlib or ./QuantumInfo, please see our review guidelines if you are not familiar with the process. You should expect a back and forth with a reviewer before your PR is merged. See also that link for how to add appropriate labels to your PR. The PR will also go through a number of automated checks. You can learn more about these here, including how to run them locally. If you are submitting to ./PhyslibAlpha there will be a lighter review process, though your PR must still pass the automated checks. If you want to bring attention to this PR, please write a message on this thread of the Lean Zulip. Important: If a reviewer adds an |
The report now also counts, among statement-preserving changes: - proofs where only a newline was removed (identical tokens, fewer lines), which were previously not surfaced at all; and - proofs where tactics were crammed onto one line with a `;` (excluding the `<;>` combinator), with the number of joins introduced. Also relabel "definition bodies changed" to "definition values changed" and spell out in the footer that it means a def/instance/abbrev whose type is unchanged but whose defining term changed. Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
A `def`/`instance`/`abbrev` often carries proof obligations discharged by `by` blocks. Golfing only those does not change the definition's data, so it should not read as "the definition changed". The report now masks a definition's `by` proof blocks and splits its changes into: - "definition proofs golfed" (data unchanged), and - "definition values changed" (the defining term changed outside proofs). On PR leanprover-community#1332 this moves 76 of the 110 former "definition body" changes into the proof-only bucket, leaving 34 genuine value changes. Co-authored-by: Claude Opus 4.8 <no-reply+claude-opus-4-8@anthropic.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
|
Just checking: This python file is generated from scratch or is based on something that already exists (if so we should probably reference it)? Either way it looks great - and I think it could act as the base of a lot of cool automated commands we could include. Going through it now to understand it better |
A `def foo : T := by refine …` builds its *value* with tactics, so a change inside that `by` block can change the definition itself, not just a discharged proof obligation. Masking it as a proof made such edits look like harmless proof golfs. Only classify a changed def body as "proof golfed" when the term outside `by` proof blocks is identical AND the body is not itself a single `by` block (mask != "<by>"). Otherwise report it as a definition-value change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
…ve changed)" The bucket mixes genuine defining-term changes with defs where only proof parts changed but that can't be proven textually — e.g. a `def := by refine … ?_` whose data is untouched, or a structure-instance whose proof fields were rewritten from tactic to term mode. Asserting "data changed" over-claims those. Relabel to "Definition bodies changed (data *may* have changed, type unchanged)" and add an explanatory note; the high-confidence "Definition proofs golfed (data unchanged)" bucket is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vjiv6nckrp5cJT9rpgZaKT
|
The script was produced by Opus 4.8 from this prompt: Details/goal make a script such that when the comment /check-golf is posted checks that the statements did not change, and only the proofs changed. it should post a comment to the PR with its findings, and if the comment is posted again, it should edit the message. test your script first, then open a PR, and make it pass CI. After that, open the second PR where you modify the script to also change the compile time and heartbeats. For both PRs, report the corresponding findings for PR https://github.com//pull/1332. Work in a new worktree.Just making sure that you saw the second PR adding heartbeat computation to |
|
Lmk when it's good to merge or if you want me to change anything. And likewise for the PR measuring heartbeats: #1356 |
What
Adds a
/check-golfbot that verifies a PR only golfs proofs: that no declaration statement (its signature/type) changed and only proofs and definition bodies changed. Comment/check-golfon any PR and the bot posts its findings as a single comment, editing that same comment on subsequent runs.Files
scripts/check_golf.py— compares two revisions of the repo. For every named declaration in the changed.leanfiles it decides whether the statement was preserved and only the proof/body changed. It also renders the Markdown report and upserts it as a PR comment (POST first, PATCH the marked comment afterwards)..github/workflows/check-golf.yml— triggers on anissue_commentcontaining/check-golfon a PR, resolves the merge-base and head, runs the script, and comments.scripts/README.md— documents the tool and how to run it locally.How it works
The Lean sources are parsed textually (no build required, so the check is fast):
byblock closed by indentation);:=/where/ equation|arm, so:=in default binders andbyblocks inside a type (⟨x, by omega⟩) do not confuse the split;byproof terms embedded inside a type are treated as proof-irrelevant, so golfing them counts as a proof change, not a statement change.Anonymous instances and
examples are not tracked (they have no stable name).Testing
Run locally against two revisions:
Validated against a large real golf PR: the checker reports 0 statement changes across 312 changed files, and a name-level cross-check confirms no named declaration is dropped by the parser. The post→edit comment logic is covered by a mocked-API test.