fix(grammar): drop unit self-loop rules to prevent LALR/CYK infinite loop #1594
Open
JamBalaya56562 wants to merge 1 commit into
Fixes lark-parser#1585. EBNF expansion of `X*` inside an `X` rule produces a bare `X : X` alternative once the empty branch is folded in. That rule never consumes input, so when LALR's priority-based conflict resolution selects it the parser's reduce/shift loop spins forever; CYK's unit-rule elimination loops for the same reason. Without a priority annotation the same grammar surfaced as a Reduce/Reduce GrammarError. Filter such non-progressing self-references at rule construction so no backend ever sees them.
Fixes #1585.
Summary
`Lark('start.1: "a" | start start*', parser='lalr').parse('aa')` (and the same with `parser='cyk'`) hangs in an infinite loop. EBNF expansion of `X*` inside an `X` rule produces a bare `X : X` alternative once the empty branch is folded in. That rule never consumes input: when LALR's priority-based conflict resolution selects it, the reduce/shift loop spins forever; CYK's unit-rule elimination loops for the same reason. Without the priority annotation the same grammar surfaces as a Reduce/Reduce `GrammarError` instead.

This PR filters such non-progressing self-references at rule construction so no backend ever sees them.
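To make the "never consumes input" point concrete, here is a toy reduce loop in plain Python. It is only an illustration under simplifying assumptions: `count_reductions`, the list-based stack, and the step budget are hypothetical and are not part of Lark or of a real LALR driver.

```python
def count_reductions(stack, rule, max_steps=1000):
    """Repeatedly reduce the top of `stack` by `rule`; return how many reductions ran.

    `rule` is (origin, expansion) with a non-empty expansion tuple.
    The loop stops when the rule no longer matches or the step budget is spent.
    """
    origin, expansion = rule
    steps = 0
    while steps < max_steps and tuple(stack[-len(expansion):]) == expansion:
        del stack[-len(expansion):]  # pop the matched symbols
        stack.append(origin)         # push the rule's origin
        steps += 1
    return steps


# A progressing rule shrinks the stack, so it stops matching on its own.
progressing = count_reductions(["start", "start"], ("start", ("start", "start")))

# The unit self-loop pops `start` and pushes `start`: the stack never changes,
# so only the artificial step budget ends the loop.
looping = count_reductions(["start"], ("start", ("start",)))

print(progressing, looping)
```

The first call performs a single reduction and stops; the second runs until the budget is exhausted, which is the toy analogue of the hang described above.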
Reproduction / verification
Reproduced and verified fixed via vivarium.
Test plan
- `test_issue_1585_priority_recursive_star` (Earley + LALR matrix in `test_parser.py`)
- `test_issue_1585_cyk` in `test_grammar.py`
- `test_alias_in_terminal` updated: `a: a | "a"` now parses cleanly under `lalr` and `earley` instead of raising `GrammarError`; the pure non-progressing case `a: a` is preserved as the `GrammarError` example.

Implementation notes
The single-line fix lives in `lark/load_grammar.py`, inside `_compile_rule` (around the expansion loop).

We skip the alternative entirely rather than try to rewrite it, because such an alternative is semantically equivalent to "match nothing extra and recurse," which adds nothing to the language. Dropping it preserves all other alternatives, including the recursive cases produced by `X*` once the empty branch is removed.

Why this surfaces only for LALR and CYK:
- LALR: with `start.1`, priority-based conflict resolution picks the unit self-rule on shift/reduce ambiguity and the parser loops. Without the priority, the conflict is unresolved and the grammar is rejected at construction time (the old `GrammarError`).
- CYK: unit-rule elimination sees `X → X` as a self-edge and never terminates.

🤖 Generated with Claude Code
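For readers who want a feel for the filter described in the implementation notes, the following is a minimal standalone sketch, not the actual patch. The helper name `drop_unit_self_loops`, the list-of-lists representation, and the symbol names `A` and `start_star` are assumptions made for illustration; Lark's internal data structures differ.

```python
def drop_unit_self_loops(origin, expansions):
    """Return `expansions` without alternatives that consist solely of `origin`.

    Each expansion is a sequence of symbol names. An alternative equal to
    [origin] is the non-progressing unit self-loop `X : X`.
    """
    return [list(exp) for exp in expansions if list(exp) != [origin]]


# Hypothetical flattened alternatives for a rule shaped like
# `start: "a" | start start*` after the empty branch of `start*` is folded in.
alternatives = [["A"], ["start"], ["start", "start_star"]]
kept = drop_unit_self_loops("start", alternatives)

# `a: a | "a"` keeps its terminal alternative; `a: a` keeps nothing.
mixed = drop_unit_self_loops("a", [["a"], ["A"]])
pure = drop_unit_self_loops("a", [["a"]])

print(kept, mixed, pure)
```

In this sketch the recursive alternative `["start", "start_star"]` survives because it has more than one symbol, while the pure `a: a` case is left with no alternatives at all, which is consistent with the test plan keeping it as the error example.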