Fix raw html reference issue #585
Preserve the line which a reference was on to prevent raw HTML indexing issue. Ref #584.
It just occurred to me that abbreviations and footnotes should be checked for this kind of breakage and fixed if it occurs. I will verify and patch those if necessary in this pull request as well.
Preserve the abbreviation line when stripping, and preserve a line for each footnote block. Footnotes should also accumulate the extraneous padding.
Abbreviations and footnotes are now handled. Footnotes were a little special, as we had to preserve a line for each block. We also had to account for unnecessary trailing empty lines. Now that I say it out loud, I should get a test in for trailing empty lines as well.
Looks like the footnote output in the tests changed slightly. Ugh, I forgot to run the tests one more time.
When processing footnotes, we don't need to process the extra whitespace at the end of a footnote, but we still need to count it when calculating how many lines to preserve.
Tests are now passing.
This looks good. One question though: why not just modify the raw HTML processor so that this didn’t matter? To be clear, I haven’t looked into it myself. It just seems like that might have been the first approach I would have explored. Maybe there’s a good reason?
I am open to suggestions, but the reason this issue occurs is that the raw HTML preprocessor is not aware of references. It parses the blocks first, populating the tag_data. Then the footnote, abbr, and link reference preprocessors run and remove entire blocks from the preprocessed file. After that, the raw HTML parser refers to indexes it thought were good but that are no longer valid. With abbr and link references, at least, maybe you could post-process the tag data after the references get stripped and rebuild the indexes properly. With footnote references, the extension actually uses the tag_data when constructing the footnote, which adds even more complications. Honestly, it's a messy situation, and this was the easiest way to solve the issue. Maybe there is a "better" way to approach it, but this seemed like the least invasive approach. Maybe if I spent more time getting to understand the raw HTML parser, a cleaner (more involved) approach would present itself. Any ideas?
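To make the index problem concrete, here is a minimal, self-contained sketch. It is not the library's actual code: `REF_RE`, `strip_references`, and `raw_html_index` are hypothetical names, and the regex is a simplified stand-in for a link reference pattern. It only illustrates why removing a reference line invalidates an index recorded by an earlier pass, and why leaving a blank line in its place keeps that index valid.

```python
import re

# Hypothetical, simplified link-reference pattern (an assumption for
# illustration; not the pattern the library actually uses).
REF_RE = re.compile(r'^\[([^\]]+)\]:\s*(\S+)\s*$')


def strip_references(lines, preserve_line=False):
    """Collect reference definitions and remove them from the line list.

    With preserve_line=True, an empty line is left where each reference
    was, so line indexes recorded by an earlier pass stay valid.
    """
    out, refs = [], {}
    for line in lines:
        match = REF_RE.match(line)
        if match:
            refs[match.group(1)] = match.group(2)
            if preserve_line:
                out.append('')
        else:
            out.append(line)
    return out, refs


lines = [
    'Some text with a [link][ref].',
    '',
    '[ref]: http://example.com',
    '',
    '<div>raw</div>',
]

# Stand-in for an index that an earlier pass recorded for a raw HTML block.
raw_html_index = lines.index('<div>raw</div>')

stripped, _ = strip_references(lines)
preserved, refs = strip_references(lines, preserve_line=True)

# Without preservation the list shrinks and the recorded index is stale.
print(raw_html_index < len(stripped))
# With preservation the recorded index still points at the raw HTML line.
print(preserved[raw_html_index])
```

The trade-off in this sketch is the one described above: preserving a line is cheap and leaves the earlier pass untouched, while the alternative would be to rewrite every recorded index after each removal.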
I'm not in a hurry to get this merged. I can mull this over some more and see if there is a better way. We can consider this a first draft. If we can't come up with something better, this may be a decent stopgap solution.