facelessuser
Collaborator

Preserve the line which a reference was on to prevent raw HTML indexing issue. Ref #584.

@facelessuser
Collaborator Author

It just occurred to me that abbreviations and footnotes should be checked for this kind of breakage and fixed if it occurs. I will verify and, if necessary, patch those in this pull request as well.

Preserve the abbreviation line when stripping, and preserve a line for each footnote block. Footnotes should also accumulate the extraneous padding.
@facelessuser
Collaborator Author

facelessuser commented Oct 7, 2017

Abbreviations and footnotes are now handled. Footnotes were a little special, as we had to preserve a line for each block. We also had to account for unnecessary trailing empty lines. Now that I say it out loud, I should get a test in for trailing empty lines as well...

@facelessuser
Collaborator Author

Looks like footnotes changed in the tests slightly. Ugh, forgot to run the tests one more time...

When processing footnotes, we don't need to process the extra whitespace at the end of a footnote, but we still need to count it when calculating how many lines to preserve.
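
The idea of counting trailing whitespace without processing it can be sketched as follows. This is only an illustration under assumed names: `split_trailing_blanks` is a hypothetical helper, not a function from the footnotes extension, and the sample block is made up.

```python
def split_trailing_blanks(block_lines):
    """Split one footnote block into its content and a count of trailing blank lines.

    Hypothetical helper for illustration only; not part of Python-Markdown.
    """
    end = len(block_lines)
    # Walk backwards past lines that are empty or whitespace-only.
    while end > 0 and not block_lines[end - 1].strip():
        end -= 1
    return block_lines[:end], len(block_lines) - end


# Made-up footnote block with two trailing empty lines.
block = ['[^1]: First paragraph.', '', '    Second paragraph.', '', '']
content, padding = split_trailing_blanks(block)
# `content` is what would be processed as the footnote body;
# `padding` is only counted, so the removed lines can be accounted for.
```

Only `content` would be handed to footnote processing, while `padding` feeds into the number of lines left behind as placeholders.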
@facelessuser
Collaborator Author

Tests are now passing.

@waylan
Member

waylan commented Oct 7, 2017

This looks good. One question though: why not just modify the raw HTML processor so that this didn’t matter? To be clear, I haven’t looked into it myself. It just seems like that might have been the first approach I would have explored. Maybe there’s a good reason?

@facelessuser
Collaborator Author

One question though: why not just modify the raw HTML processor so that this didn’t matter?

I am open to suggestions, but the reason this issue occurs is that the raw HTML preprocessor is not aware of references. It parses the blocks first, populating the tag_data. Then the footnote, abbr, and link reference preprocessors run and remove entire blocks from the preprocessed text. After that, the raw HTML parser references indexes that it thought were good but are no longer valid.
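
The stale index problem can be reproduced with a toy model. This is a minimal sketch, not Python-Markdown's actual code: `REF_RE`, `record_html_lines`, and `strip_references` are hypothetical stand-ins, and the real preprocessors and tag_data are more involved.

```python
import re

# Hypothetical, simplified pattern for a link reference definition.
REF_RE = re.compile(r'^\[([^\]]+)\]:\s*(\S+)\s*$')


def record_html_lines(lines):
    """Toy "raw HTML" pass: remember the index of every line starting with '<'."""
    return [i for i, line in enumerate(lines) if line.startswith('<')]


def strip_references(lines, preserve_line):
    """Toy reference pass: remove reference definitions from the text.

    With preserve_line=True an empty line is left in place of each removed
    definition, so indexes recorded earlier still point at the same content.
    """
    out = []
    for line in lines:
        if REF_RE.match(line):
            if preserve_line:
                out.append('')
            continue
        out.append(line)
    return out


source = ['[link]: http://example.com', '', '<div>raw</div>', '', 'Text with [link].']
html_index = record_html_lines(source)                  # recorded before stripping
dropped = strip_references(source, preserve_line=False)
kept = strip_references(source, preserve_line=True)
```

In this model, `html_index[0]` no longer points at the `<div>` line in `dropped`, because removing the definition shifted every later line up by one. In `kept` the same index is still valid, which is the effect of preserving the line.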

At least with abbr and link references, maybe you could post-process the tag data after the references get stripped and rebuild it properly. With footnote references, the extension actually uses the tag_data when constructing the footnote, which complicates things even more.

Honestly, it's a messy situation, and this was the easiest way to solve the issue. Maybe there is a "better" way to approach it, but this seemed like the least invasive approach. Maybe if I spent more time getting to understand the raw HTML parser, a cleaner (more involved) approach would present itself.

Any ideas?

@facelessuser
Collaborator Author

I'm not in a hurry to get this merged. I can mull this over some more and see if there is a better way. We can consider this a first draft. If we can't come up with something better, this may be a decent stopgap solution.
