Merged
Changes from 1 commit
fix: add support for name attributes in HTML fragment extraction
Fixes fragment checking for JavaDoc-generated HTML, which uses
<a name="anchor"> instead of id attributes for anchors.

This resolves a regression in which lychee v0.20.1 failed to find
fragments that v0.18.1 found, particularly in JavaDoc URLs such as:
- https://example.com/javadoc/Class.html#method--
- https://example.com/javadoc/Class.html#skip.navbar.top

The fix maintains backward compatibility by checking both 'id' and
'name' attributes when extracting fragments from HTML documents.

Resolves #1838
mre committed Sep 5, 2025
commit 7572c09beb7282db3c862d941c60517b70194b7d
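The commit message describes checking both id and name attributes when extracting fragments. A minimal, std-only sketch of that idea, assuming each element's attributes have already been parsed into a map (much like current_attributes in the diff). The helpers collect_fragments and fragment_exists are hypothetical and not part of lychee's API, and details the real checker may handle (such as percent-encoded fragments) are left out.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: gather every fragment target from a list of elements,
/// where each element is represented only by its attribute map.
fn collect_fragments(elements: &[HashMap<&str, &str>]) -> HashSet<String> {
    let mut fragments = HashSet::new();
    for attributes in elements {
        // Modern anchors, e.g. <h2 id="section">
        if let Some(id) = attributes.get("id") {
            fragments.insert(id.to_string());
        }
        // Legacy anchors, e.g. JavaDoc's <a name="skip.navbar.top">
        if let Some(name) = attributes.get("name") {
            fragments.insert(name.to_string());
        }
    }
    fragments
}

/// Hypothetical helper: is the URL's fragment (the part after '#') among the
/// extracted targets? URLs with no fragment, or an empty one, trivially pass.
fn fragment_exists(url: &str, fragments: &HashSet<String>) -> bool {
    match url.split_once('#') {
        Some((_, fragment)) if !fragment.is_empty() => fragments.contains(fragment),
        _ => true,
    }
}

fn main() {
    let elements = vec![
        HashMap::from([("id", "title")]),
        HashMap::from([("name", "skip.navbar.top")]),
        HashMap::from([("name", "method--")]),
    ];
    let fragments = collect_fragments(&elements);

    assert!(fragment_exists("https://example.com/javadoc/Class.html#method--", &fragments));
    assert!(fragment_exists("https://example.com/javadoc/Class.html#skip.navbar.top", &fragments));
    assert!(!fragment_exists("https://example.com/javadoc/Class.html#missing", &fragments));
    println!("all fragment checks passed");
}
```

Before such a change, only the id branch existed, so both JavaDoc URLs above would have been reported as missing fragments.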
38 changes: 38 additions & 0 deletions lychee-lib/src/extract/html/html5gum.rs
@@ -235,6 +235,12 @@ impl LinkExtractor {
self.fragments.insert(id.to_string());
}

// Also check for 'name' attributes for backward compatibility with older HTML
// standards and JavaDoc-generated HTML which uses <a name="anchor"> instead of id
Member

Maybe you could add: https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/a#name or https://stackoverflow.com/a/484781

So apparently in HTML 4.01 both id and name could be used as anchors. In theory it's no longer valid in HTML5, but it's still used by some tools and sites for historical reasons.

Member Author

Ah, neat. I've extended the documentation accordingly.

if let Some(name) = self.current_attributes.get("name") {
self.fragments.insert(name.to_string());
}

self.current_attributes.clear();
}
}
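Some context for the review thread above: the WHATWG HTML standard still honours legacy anchors during fragment navigation. Its "find a potential indicated element" steps first look for the first element (in tree order) whose id equals the fragment and, only if none exists, for the first a element whose name equals it. The sketch below models that lookup order over a flattened list of elements. The Element struct and indicated_element function are hypothetical simplifications, and the spec's percent-decoding retry is omitted.

```rust
/// Hypothetical, simplified element: tag name plus the two attributes that matter here.
struct Element {
    tag: &'static str,
    id: Option<&'static str>,
    name: Option<&'static str>,
}

/// Returns the tree-order index of the element a fragment would indicate,
/// following the spec's order: a matching id first, then a matching <a name>.
fn indicated_element(elements: &[Element], fragment: &str) -> Option<usize> {
    // Step 1: the first element whose id matches the fragment.
    if let Some(i) = elements.iter().position(|e| e.id == Some(fragment)) {
        return Some(i);
    }
    // Step 2: only if no id matched, the first <a> element whose name matches.
    elements
        .iter()
        .position(|e| e.tag == "a" && e.name == Some(fragment))
}

fn main() {
    let doc = [
        Element { tag: "a", id: None, name: Some("method.summary") },
        Element { tag: "div", id: None, name: Some("sidebar") },
        Element { tag: "h2", id: Some("section"), name: None },
        Element { tag: "a", id: None, name: Some("section") },
    ];
    // An id wins over a later <a name> with the same value.
    assert_eq!(indicated_element(&doc, "section"), Some(2));
    // A legacy <a name> anchor is still found.
    assert_eq!(indicated_element(&doc, "method.summary"), Some(0));
    // In this model, as in the spec, name on a non-<a> element is not a target.
    assert_eq!(indicated_element(&doc, "sidebar"), None);
}
```

For a yes/no existence check, the precedence between the two steps does not change the answer; only the restriction of the name lookup to a elements does.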
@@ -705,4 +711,36 @@ mod tests {
        let uris = extract_html(input, false);
        assert!(uris.is_empty());
    }

    #[test]
    fn test_extract_fragments_with_name_attributes() {
        // Test for JavaDoc-style name attributes used for anchors
        let input = r#"
        <html>
            <body>
                <h1 id="title">Title</h1>
                <a name="skip.navbar.top"></a>
                <a name="method.summary"></a>
                <div>
                    <a name="clear--"></a>
                    <h2 id="section">Section</h2>
                    <a name="method.detail"></a>
                </div>
                <a name="skip.navbar.bottom"></a>
            </body>
        </html>
        "#;

        let expected = HashSet::from([
            "title".to_string(),
            "section".to_string(),
            "skip.navbar.top".to_string(),
            "method.summary".to_string(),
            "clear--".to_string(),
            "method.detail".to_string(),
            "skip.navbar.bottom".to_string(),
        ]);
        let actual = extract_html_fragments(input);
        assert_eq!(actual, expected);
    }
}