Description
Given this HTML:
<a title="This is a "test" of double quotes" href="http://www.example.com">Hello</a>
When passed into Dom::load(), the parser ends up correctly finding the element, but misparses the attributes and body text. The attributes (from var_dump($element->getAttributes())) appear like so:
array(1) { ["title"]=> string(10) "This is a " }
and the body appears like so (from var_dump($element->text())):
string(58) "est" of double quotes" href="http://www.example.com">Hello"
I realize that putting double quotes inside an attribute value is nonconforming HTML, but ideally PHPHtmlParser should be tolerant of such things and parse the element anyway, much in the way web browsers do. While it may be impossible to accurately determine what the intended value of the title attribute is, it should be possible to ensure that the element text does not include content from before the > marker.
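To illustrate the kind of tolerance I mean, here is a rough sketch of a lenient opening-tag scanner. It is not PHPHtmlParser code: the function name parseOpeningTagLeniently() and its return shape are made up for this example, and it only loosely imitates what browsers do (after a quoted value closes, whatever follows is treated as the start of a new attribute name, stray quote included). It uses only standard PHP string functions.

```php
<?php
// Hypothetical helper for illustration only; not part of PHPHtmlParser.
// Scans one opening tag and returns its name, attributes, and the text
// that follows the closing '>' of that tag, or null if no '>' is found.
function parseOpeningTagLeniently(string $html): ?array
{
    if (!preg_match('/^<([A-Za-z][A-Za-z0-9]*)/', $html, $m)) {
        return null; // input does not start with an opening tag
    }
    $tag = strtolower($m[1]);
    $pos = strlen($m[0]);
    $len = strlen($html);
    $ws = " \t\r\n\f";
    $attributes = [];

    while ($pos < $len) {
        // Skip whitespace and stray slashes between attributes.
        $pos += strspn($html, $ws . '/', $pos);
        if ($pos >= $len) {
            break;
        }
        if ($html[$pos] === '>') {
            // Only what comes after this '>' can be element content.
            return [
                'tag' => $tag,
                'attributes' => $attributes,
                'rest' => substr($html, $pos + 1),
            ];
        }

        // Attribute name: runs until whitespace, '=', '/' or '>'.
        // A stray quote simply becomes part of the name.
        $nameLen = max(1, strcspn($html, $ws . '=/>', $pos));
        $name = strtolower(substr($html, $pos, $nameLen));
        $pos += $nameLen;
        $pos += strspn($html, $ws, $pos);

        $value = '';
        if ($pos < $len && $html[$pos] === '=') {
            $pos++;
            $pos += strspn($html, $ws, $pos);
            if ($pos < $len && ($html[$pos] === '"' || $html[$pos] === "'")) {
                // Quoted value: ends at the next matching quote,
                // or at end of input if the quote is never closed.
                $quote = $html[$pos];
                $end = strpos($html, $quote, $pos + 1);
                if ($end === false) {
                    $end = $len;
                }
                $value = substr($html, $pos + 1, $end - $pos - 1);
                $pos = min($end + 1, $len);
            } else {
                // Unquoted value: ends at whitespace or '>'.
                $valueLen = strcspn($html, $ws . '>', $pos);
                $value = substr($html, $pos, $valueLen);
                $pos += $valueLen;
            }
        }

        // Keep the first occurrence of a duplicated attribute name.
        if (!array_key_exists($name, $attributes)) {
            $attributes[$name] = $value;
        }
    }

    return null; // ran out of input before finding '>'
}

$parsed = parseOpeningTagLeniently(
    '<a title="This is a "test" of double quotes" href="http://www.example.com">Hello</a>'
);
var_dump($parsed['attributes']['href'], $parsed['rest']);
```

With this approach the title value is still cut short at "This is a " and the leftover words turn into junk attributes, but href survives and the remaining text starts at Hello, which is the behaviour I would hope for from the real parser.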