Fix extraction of reference links in Markdown #1841

mre · 2025-09-05T22:45:14Z

Summary

Fixes #1657 by properly handling different types of Markdown reference links that were not being extracted by lychee:

Reference links: [text][ref]
Collapsed links: [text][]
Shortcut links: [text]

Root Cause

The issue was in the Markdown extraction logic: reference link destinations were processed using extract_raw_uri_from_plaintext(), which relies on the linkify crate. Linkify only recognizes URLs with schemes (http://, https://, etc.) and ignores relative file paths like "target.md", so reference links pointing at such paths were never extracted.
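
To make the failure mode concrete, here is a minimal sketch in plain Rust. Both functions are hypothetical stand-ins written for illustration: `find_urls_with_scheme` is not linkify's implementation and `destination_verbatim` is not lychee's code. They only model the behaviour described above, where a scanner that requires a scheme reports nothing for a relative path.

```rust
/// Hypothetical stand-in for a scheme-based plaintext scanner.
/// This is not linkify's implementation; it only models the behaviour
/// described above: tokens without a URL scheme are not reported.
fn find_urls_with_scheme(text: &str) -> Vec<&str> {
    text.split_whitespace()
        .filter(|token| token.starts_with("http://") || token.starts_with("https://"))
        .collect()
}

/// Hypothetical helper: a reference link destination already comes from
/// the Markdown parser, so it can be taken verbatim instead of being
/// re-scanned as plaintext. Empty destinations yield `None`.
fn destination_verbatim(dest: &str) -> Option<&str> {
    let trimmed = dest.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed)
    }
}
```

In this model, `find_urls_with_scheme("target.md")` returns an empty vector, while `destination_verbatim("target.md")` returns `Some("target.md")`.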

Solution

Enable footnote support (ENABLE_FOOTNOTES) to properly differentiate footnotes from reference links in pulldown-cmark
Add explicit handling to skip footnote events (FootnoteReference, FootnoteDefinition) since they're not links to check
Create RawUri directly for reference links to handle relative file paths that linkify doesn't recognize

This approach leverages pulldown-cmark's built-in semantic distinction between footnotes and reference links rather than using heuristics.
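
The three steps above can be sketched with a self-contained model. The `MdEvent` and `LinkKind` enums below are simplified, hypothetical stand-ins for the parser's events (pulldown-cmark's real `Event` and `Tag` types carry more data and are shaped differently), and `collect_link_destinations` is an illustrative name, not a function from lychee. The sketch assumes the parser has already resolved each reference to its destination and has already classified footnotes as separate events.

```rust
/// Simplified link kinds, named after the four forms listed in this PR.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LinkKind {
    Inline,
    Reference,
    Collapsed,
    Shortcut,
}

/// Hypothetical, pared-down parser events. With footnote support enabled,
/// footnotes arrive as their own events instead of looking like links.
#[derive(Debug, Clone, PartialEq)]
enum MdEvent {
    Link { kind: LinkKind, dest: String },
    FootnoteReference(String),
    FootnoteDefinition(String),
}

/// Collects link destinations verbatim and skips footnote events.
/// The real code builds a RawUri at this point; plain strings keep the
/// sketch independent of lychee's types.
fn collect_link_destinations(events: &[MdEvent]) -> Vec<String> {
    let mut destinations = Vec::new();
    for event in events {
        match event {
            // Every link form yields its destination directly,
            // with no scheme-based re-scan of the text.
            MdEvent::Link {
                kind:
                    LinkKind::Inline
                    | LinkKind::Reference
                    | LinkKind::Collapsed
                    | LinkKind::Shortcut,
                dest,
            } => destinations.push(dest.clone()),
            // Footnotes are not links to check, so they are skipped explicitly.
            MdEvent::FootnoteReference(_) | MdEvent::FootnoteDefinition(_) => {}
        }
    }
    destinations
}
```

Listing the footnote variants in their own match arm, rather than hiding them behind a wildcard, keeps the decision to ignore them visible to readers of the code.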

Test Plan

Added comprehensive test case covering all reference link types
Verified footnotes are properly ignored (existing test still passes)
All existing markdown extraction tests pass
Manual testing confirms all four link types (inline, reference, collapsed, shortcut) are now extracted correctly

Verification

Before:

[link1](target1.md) ✅ extracted
[link2][ref2] ❌ not extracted  
[link3][] ❌ not extracted
[link4] ❌ not extracted

After:

[link1](target1.md) ✅ extracted
[link2][ref2] ✅ extracted
[link3][] ✅ extracted  
[link4] ✅ extracted

The fix ensures lychee can now properly validate all types of Markdown reference links while maintaining backward compatibility.

Resolves #1657 by properly handling different types of Markdown reference links:

- Reference links: [text][ref]
- Collapsed links: [text][]
- Shortcut links: [text]

The issue was that reference link destinations were processed using extract_raw_uri_from_plaintext(), which relies on the linkify crate that only recognizes URLs with schemes (http://, https://, etc.) and ignores relative file paths like "target.md".

Solution:

1. Enable footnote support (ENABLE_FOOTNOTES) to properly differentiate footnotes from reference links in pulldown-cmark
2. Add explicit handling to skip footnote events (FootnoteReference, FootnoteDefinition) since they're not links to check
3. Create RawUri directly for reference links to handle relative file paths that linkify doesn't recognize

This approach is semantically correct - it leverages pulldown-cmark's built-in distinction between footnotes and reference links rather than using heuristics.

Tests added to verify all reference link types are extracted correctly while footnotes are properly ignored.

thomas-zahner

That's great!

mre force-pushed the issue-1657 branch from 00af8fd to e81155e on September 5, 2025 22:46

Fix clippy warnings

5cf292d

- Remove unnecessary hashes from raw string literal
- Add allow annotations for explicit footnote event handling
- Add allow annotation for function length (102 lines vs 100 limit)

mre mentioned this pull request Sep 6, 2025

Cannot extract relative reference links in Markdown #1657

Closed

thomas-zahner approved these changes Sep 11, 2025

mre merged commit 67a1571 into master Sep 11, 2025
6 checks passed

mre deleted the issue-1657 branch September 11, 2025 22:35

mre mentioned this pull request Sep 11, 2025

chore: release v0.21.0 #1825

Closed

This was referenced Oct 21, 2025

chore: release v0.21.0 #1878

Closed

chore: release v0.21.0 #1879

Merged
