fix(grammar): drop unit self-loop rules to prevent LALR/CYK infinite loop #1594

Open
JamBalaya56562 wants to merge 1 commit into lark-parser:master from JamBalaya56562:fix/issue-1585-unit-self-loop

@JamBalaya56562

Fixes #1585.

Summary

Lark('start.1: "a" | start start*', parser='lalr').parse('aa') (and the same with parser='cyk') hangs in an infinite loop. EBNF expansion of X* inside an X rule produces a bare X : X alternative once the empty branch is folded in. That rule never consumes input: when LALR's priority-based conflict resolution selects it, the reduce/shift loop spins forever, and CYK's unit-rule elimination loops for the same reason. Without the priority annotation, the same grammar surfaces as a Reduce/Reduce GrammarError instead.

This PR filters such non-progressing self-references at rule construction so no backend ever sees them.
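To make the origin of the bare X : X alternative concrete, here is a minimal toy model of star expansion. It is a sketch only, not Lark's actual EBNF-to-BNF code: alternatives are plain tuples of strings, and the function `expand_star` and the helper name `__start_star` are hypothetical names chosen for illustration. Each starred item is either dropped (the empty branch) or replaced by a helper nonterminal.

```python
from typing import List, Tuple

# An alternative is modelled as a tuple of symbol names (toy representation).
Alternative = Tuple[str, ...]


def expand_star(alternative: Alternative) -> List[Alternative]:
    """Toy expansion: each `sym*` is either absent or a helper nonterminal."""
    results: List[Alternative] = [()]
    for sym in alternative:
        if sym.endswith("*"):
            helper = f"__{sym[:-1]}_star"  # hypothetical helper rule name
            # Branch 1: the starred item matches nothing, so it is dropped.
            # Branch 2: the starred item is replaced by the helper rule.
            results = [r + extra for r in results for extra in ((), (helper,))]
        else:
            results = [r + (sym,) for r in results]
    return results


# The second alternative of `start: "a" | start start*`
alts = expand_star(("start", "start*"))
print(alts)  # [('start',), ('start', '__start_star')]
```

The first resulting alternative, `('start',)`, is the unit self-loop `start : start` described above: it appears as soon as the empty branch of `start*` is folded into the parent rule.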

Reproduction / verification

Reproduced and verified fixed via vivarium.

Test plan

  • New regression test test_issue_1585_priority_recursive_star (Earley + LALR matrix in test_parser.py)
  • New regression test test_issue_1585_cyk in test_grammar.py
  • Existing test_alias_in_terminal updated: a: a | "a" now parses cleanly under lalr and earley instead of raising GrammarError; the pure non-progressing case a: a is preserved as the GrammarError example.

Implementation notes

The fix is a small guard in lark/load_grammar.py, inside _compile_rule (around the expansion loop):

if len(expansion) == 1 and expansion[0] == NonTerminal(name):
    continue

We skip the alternative entirely rather than trying to rewrite it, because such an alternative amounts to "match nothing extra and recurse" and therefore adds nothing to the language. Dropping it preserves all other alternatives, including the recursive cases produced by X* once the empty branch is removed.
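The effect of the guard can be sketched on a toy grammar. This is an illustration under assumptions, not the code in lark/load_grammar.py: rules are a plain dict of tuples rather than Lark's NonTerminal objects, and `drop_unit_self_loops` and `__start_star` are hypothetical names.

```python
from typing import Dict, List, Tuple

Alternative = Tuple[str, ...]
Grammar = Dict[str, List[Alternative]]


def drop_unit_self_loops(rules: Grammar) -> Grammar:
    """Remove every alternative that consists only of the rule's own name."""
    cleaned: Grammar = {}
    for name, alternatives in rules.items():
        # Keep everything except the bare `name : name` alternative.
        cleaned[name] = [alt for alt in alternatives if alt != (name,)]
    return cleaned


# Toy version of `start: "a" | start start*` after star expansion.
grammar: Grammar = {
    "start": [('"a"',), ("start",), ("start", "__start_star")],
    "__start_star": [("start",), ("__start_star", "start")],
}

cleaned = drop_unit_self_loops(grammar)
print(cleaned["start"])  # [('"a"',), ('start', '__start_star')]
```

Note that `__start_star : start` is a unit rule but not a self-loop, so it is kept; only the alternative whose single symbol is the rule's own name is dropped.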

Why this surfaces only for LALR and CYK:

  • LALR: with start.1, priority-based conflict resolution picks the unit self-rule on shift/reduce ambiguity and the parser loops. Without the priority, the conflict is unresolved and the grammar is rejected at construction time (the old GrammarError).
  • CYK: the unit-production elimination pass treats X → X as a self-edge and never terminates.
  • Earley happens to tolerate it because its chart bookkeeping deduplicates items, so the issue was previously masked there.
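The non-termination can be illustrated with a deliberately naive unit-rule expansion. This is a sketch of the general failure mode, not Lark's CYK implementation: `naive_expand_units`, the step `budget`, and the toy grammars are all assumptions made for the example. The budget exists only so that the demonstration stops instead of hanging.

```python
from typing import Dict, List, Tuple

Alternative = Tuple[str, ...]
Grammar = Dict[str, List[Alternative]]


def is_unit(alt: Alternative, grammar: Grammar) -> bool:
    """A unit alternative is a single symbol that names another rule."""
    return len(alt) == 1 and alt[0] in grammar


def naive_expand_units(grammar: Grammar, name: str, budget: int = 1000) -> List[Alternative]:
    """Inline unit alternatives repeatedly, with no cycle detection.

    Raises RuntimeError once `budget` steps are used up, standing in for
    the infinite loop a real implementation would fall into.
    """
    pending = list(grammar[name])
    done: List[Alternative] = []
    steps = 0
    while pending:
        steps += 1
        if steps > budget:
            raise RuntimeError("unit expansion did not terminate")
        alt = pending.pop()
        if is_unit(alt, grammar):
            # For X -> X this re-queues X's own alternatives, including X -> X.
            pending.extend(grammar[alt[0]])
        else:
            done.append(alt)
    return done


looping: Grammar = {"start": [('"a"',), ("start",), ("start", "helper")],
                    "helper": [("start",), ("helper", "start")]}
# Same grammar with the unit self-loop removed.
filtered: Grammar = {"start": [('"a"',), ("start", "helper")],
                     "helper": [("start",), ("helper", "start")]}

try:
    naive_expand_units(looping, "start")
    hit_budget = False
except RuntimeError:
    hit_budget = True

print(hit_budget)  # True
print(naive_expand_units(filtered, "start"))
```

With the self-loop present, every visit to `start : start` puts the same alternative back on the work list, so the list never empties. Once that alternative is filtered out, the same naive procedure finishes after a handful of steps.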

🤖 Generated with Claude Code

Fixes lark-parser#1585. EBNF expansion of `X*` inside an `X` rule produces a bare
`X : X` alternative once the empty branch is folded in. That rule never
consumes input, so when LALR's priority-based conflict resolution
selects it the parser's reduce/shift loop spins forever; CYK's
unit-rule elimination loops for the same reason. Without a priority
annotation the same grammar surfaced as a Reduce/Reduce GrammarError.

Filter such non-progressing self-references at rule construction so no
backend ever sees them.

Development

Successfully merging this pull request may close these issues.

Infinite loop in lalr and cyk
