Skip to content

$dom->load's automatic behaviour is a dangerous vulnerability #155

@donatj

Description

@donatj

The behaviour of Dom's load method is a serious security risk - particularly when accepting any sort of input from end users.

Nothing from an untrusted source should ever be loaded with Dom's load method as it can very easily be abused to expose sensitive file system information or trigger remote server calls.

The given examples do not do a good enough job of making this fact clear.

Examples

For example if one is using the parser to clean up rich text editor output, the abusive end user need only modify their POST field to be /etc/hosts and suddenly the content of their post is your hosts file.

Likewise the user could post things like http://192.168.1.1 to poke around the local network the machine itself is hosted in.

Searching GitHub I have found at least three cases of POST or GET data being fed directly into load. There are likely more as the library is quite popular and the danger is not made clear.

Resolution

I believe there should at the very least be a very strongly worded warning in the README as well as the method itselfs phpdocs noting this danger.

The only warning I see right now is about how load might give you trouble if your content is too long.

If the string is to long, depending on your file system, the load() method will throw a warning. If this happens you can just call the above method to bypass the is_file() check in the load() method.

I think that in and of itself is a codesmell.

While I can somewhat see the convenience of the load method, if it was my library I would argue the danger outweighs the convenience.

You should default to using a specific load method unless you have good reason the type might vary at runtime - I feel like is exceedingly rare.

Ideally I'd argue for removal or at the very least deprecation of load - I don't think it serves a valid purpose. That said however, at the very least you need a strongly worded warning about the dangers of using load on user provided input.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions