HTML attributes with double quotes in them break parsing #37

@dirtside

Given this HTML:

<a title="This is a "test" of double quotes" href="http://www.example.com">Hello</a>

When this is passed into Dom::load(), the parser finds the element correctly but misparses its attributes and body text. The attributes (from var_dump($element->getAttributes())) look like this:

array(1) { ["title"]=> string(10) "This is a " }

and the body appears like so (from var_dump($element->text())):

string(58) "est" of double quotes" href="http://www.example.com">Hello"

I realize that putting unescaped double quotes inside a double-quoted attribute value is nonconformant HTML, but ideally PHPHtmlParser should tolerate such input and parse the element anyway, much as web browsers do. While it may be impossible to determine accurately what value was intended for the title attribute, it should be possible to ensure that the element text does not include content that appears before the tag's closing > character.
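To make the request concrete, here is a minimal sketch of one possible tolerance heuristic. It is not PHPHtmlParser's actual code, and the function names quoteLooksClosing() and parseOpeningTag() are hypothetical. The assumption is that a quote ends an attribute value only when what follows looks like the rest of a tag: either the tag's closing >, or one or more bare names followed by =. A quote followed by anything else is kept as part of the value.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of PHPHtmlParser): decides whether the quote
 * at $pos plausibly ends an attribute value, based on what follows it.
 */
function quoteLooksClosing(string $html, int $pos): bool
{
    $name = '[^\s"\'=<>\/]+';
    // Either optional bare names then the end of the tag, or names then '='.
    $pattern = '/\G(?:(?:\s+' . $name . ')*\s*\/?>|(?:\s+' . $name . ')+\s*=)/';

    return $pos + 1 >= strlen($html)
        || preg_match($pattern, $html, $m, 0, $pos + 1) === 1;
}

/**
 * Hypothetical sketch: parses one opening tag at the start of $html and
 * returns the tag name, its attributes, and the input left after '>'.
 */
function parseOpeningTag(string $html): array
{
    if (preg_match('/\G<([a-zA-Z][a-zA-Z0-9]*)/', $html, $m) !== 1) {
        throw new InvalidArgumentException('Input must start with an opening tag.');
    }
    $pos = strlen($m[0]);
    $len = strlen($html);
    $attributes = [];

    while ($pos < $len) {
        // Skip whitespace and any self-closing slash between attributes.
        $pos += strspn($html, " \t\r\n/", $pos);
        if ($pos >= $len) {
            break;
        }
        if ($html[$pos] === '>') {
            $pos++; // end of the opening tag
            break;
        }

        $nameLen = strcspn($html, " \t\r\n=>/", $pos);
        if ($nameLen === 0) {
            $pos++; // stray character such as a lone '='; skip it
            continue;
        }
        $name = strtolower(substr($html, $pos, $nameLen));
        $pos += $nameLen;
        $pos += strspn($html, " \t\r\n", $pos);

        if ($pos >= $len || $html[$pos] !== '=') {
            $attributes[$name] = null; // attribute without a value
            continue;
        }
        $pos++; // consume '='
        $pos += strspn($html, " \t\r\n", $pos);

        if ($pos < $len && ($html[$pos] === '"' || $html[$pos] === "'")) {
            $quote = $html[$pos];
            $start = $pos + 1;
            $scan = $start;
            $close = false;
            // Keep stray quotes in the value until one looks like a real close.
            while (($q = strpos($html, $quote, $scan)) !== false) {
                if (quoteLooksClosing($html, $q)) {
                    $close = $q;
                    break;
                }
                $scan = $q + 1;
            }
            if ($close === false) {
                // No plausible close: fall back to the first quote, else end of input.
                $first = strpos($html, $quote, $start);
                $close = $first === false ? $len : $first;
            }
            $attributes[$name] = substr($html, $start, $close - $start);
            $pos = min($close + 1, $len);
        } else {
            // Unquoted value: runs until whitespace or '>'.
            $valLen = strcspn($html, " \t\r\n>", $pos);
            $attributes[$name] = substr($html, $pos, $valLen);
            $pos += $valLen;
        }
    }

    return [
        'tag' => strtolower($m[1]),
        'attributes' => $attributes,
        'rest' => substr($html, $pos),
    ];
}
```

Applied to the anchor tag above, this sketch gives title the value This is a "test" of double quotes, gives href the value http://www.example.com, and leaves Hello</a> as the remaining input, so the body text no longer picks up content from inside the tag. The heuristic can still guess wrong, for example when a value itself contains a quote followed by something like name=, so treat it as one possible recovery strategy rather than a definitive fix.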
