Skip to content
Open
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
109 changes: 97 additions & 12 deletions index.bs
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,11 @@ text: report a warning to the console; type: dfn; url: https://console.spec.what
"href": "https://www.bucksch.org/1/projects/mozilla/108153/",
"title": "HTML Sanitizer",
"publisher": "Ben Bucksch"
},
"XSS": {
"href": "https://developer.mozilla.org/de/docs/Web/Security/Attacks/XSS",
"title": "Cross-Site Scripting",
"publisher": "MDN Web Docs"
}
}
</pre>
Expand Down Expand Up @@ -1452,11 +1457,17 @@ URLs, is as follows:

# Security Considerations # {#security-considerations}

The Sanitizer API is intended to prevent DOM-based Cross-Site Scripting
by traversing a supplied HTML content and removing elements and attributes
according to a configuration. The specified API must not support
the construction of a Sanitizer object that leaves script-capable markup in
and doing so would be a bug in the threat model.
The Sanitizer API is intended to prevent DOM-based Cross-Site Scripting [[XSS]]
by traversing supplied HTML content and removing elements and attributes
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe clarify that sanitzation happens before inserting. E.g.:

[...] by traversing supplied HTML content, after parsing but before insertion into any document, and removing [...]

I think this is obvious to us, but I think a lot of people haven't entirely thought through the "when" aspect of sanitiaztion.

according to a configuration. The Sanitizer API ships a strict baseline,
such that scripting can never be allowed (cf. [[#never-allowed]]).
In addition to this, a default safe list includes further restrictions that may help
prevent a wide range of undesirable effects. These include, elements that can override
site-specific settings (e.g., `<meta>`), embed third-party content or change the
layout of the page (e.g., `<img>`, `<style>`, `aria-` attributes).
Comment thread
mozfreddyb marked this conversation as resolved.
Outdated
In addition, we disallow element and attribute configurations that lead to outgoing HTTP
requests, and form controls. A detailed list is given in [[#default-disallowed-elements]].
Comment thread
mozfreddyb marked this conversation as resolved.


That being said, there are security issues which the correct usage of the
Sanitizer API will not be able to protect against and the scenarios will be
Expand All @@ -1466,8 +1477,7 @@ laid out in the following sections.

<em>This section is not normative.</em>

The Sanitizer API operates solely in the DOM and adds a capability to traverse
and filter an existing DocumentFragment. The Sanitizer does not address
The Sanitizer API operates solely in the browser. The Sanitizer does not address
server-side reflected or stored XSS.

## DOM clobbering ## {#dom-clobbering}
Expand All @@ -1479,8 +1489,8 @@ application by naming elements through `id` or `name` attributes such that
properties like `children` of an HTML element in the DOM are overshadowed by
the malicious content.

The Sanitizer API does not protect DOM clobbering attacks in its
default state, but can be configured to remove `id` and `name` attributes.
The Sanitizer API disallows the `id` and `name` attributes by default, but
Comment thread
mozfreddyb marked this conversation as resolved.
can be configured to allow them if desired.

## XSS with Script gadgets ## {#script-gadgets}

Expand Down Expand Up @@ -1512,13 +1522,13 @@ into a different parent element. An example for carrying out such an attack
is by relying on the change of parsing behavior for foreign content or
mis-nested tags.

The Sanitizer API offers only functions that turn a string into a node tree.
The context is supplied implicitly by all sanitizer functions:
The Sanitizer API offers functions that combine parsing, sanitization and insertion.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very true. I wonder if it bears saying at the top of this section, in the paragraph "The Sanitizer API is intended to [...]". Maybe instead of the comment I've proposed there.

As such, the parsing context is supplied implicitly by all sanitizer functions:
`Element.setHTML()` uses the current element; `Document.parseHTML()` creates a
new document. Therefore Sanitizer API is not directly affected by mutated XSS.

If a developer were to retrieve a sanitized node tree as a string, e.g. via
`.innerHTML`, and to then parse it again then mutated XSS may occur.
`.innerHTML`, to only parse or insert it again, mutated XSS may still occur.
We discourage this practice. If processing or passing of HTML as a
string should be necessary after all, then any string should be considered
untrusted and should be sanitized (again) when inserting it into the DOM. In
Expand All @@ -1527,6 +1537,81 @@ longer be considered as sanitized.

A more complete treatment of mXSS can be found in [[MXSS]].


# Index

<em>This section is not normative.</em>

## Elements and attributes not allowed in the default config ## {#default-disallowed-elements}

While the Sanitizer aims to disallow Cross-Site Scripting attacks [[XSS]], the default
settings are a bit stricter and do not include elements that may change page
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a bit" feels a bit off here. Maybe "the default settings are purposefully stricter"

settings, layout or fetch resources from other origins. The following elements
Copy link
Copy Markdown
Collaborator

@evilpie evilpie Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't/can't really differentiate between fetching resources from same vs. other origins, so this seems too specific.

can be allowed with a configuration, but the individual reasoning for disallowing
them by default is given below.
Comment thread
mozfreddyb marked this conversation as resolved.

See also: [[#security-considerations]].


## Elements that are **never** allowed ## {#never-allowed}

<table>
<caption>List of unsafe and disallowed elements</caption>
<thead>
<tr>
<th> Element
<th> Namespace
<th> Description
<tbody>

<tr>
<td><code>embed</code></td>
<td>http://www.w3.org/1999/xhtml</td>
<td>This element is disallowed because it can include other, untrusted documents and including scripts.</td>
</tr>

<tr>
<td><code>frame</code></td>
<td>http://www.w3.org/1999/xhtml</td>
<td>This element is disallowed because it can include other, untrusted documents and including scripts.</td>
</tr>

<tr>
<td><code>iframe</code></td>
<td>http://www.w3.org/1999/xhtml</td>
<td>This element is disallowed because it can include other, untrusted documents and including scripts.</td>
</tr>

<tr>
<td><code>object</code></td>
<td>http://www.w3.org/1999/xhtml</td>
<td>This element is disallowed because it can include other, untrusted documents and including scripts.</td>
Copy link
Copy Markdown
Collaborator

@evilpie evilpie Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"including scripts" doesn't seem grammatically correct to me.

</tr>

<tr>
<td><code>script</code></td>
<td>http://www.w3.org/1999/xhtml</td>
<td>This element is disallowed because it executes scripts.</td>
</tr>

<tr>
<td><code>script</code></td>
<td>http://www.w3.org/2000/svg</td>
<td>This element is disallowed because it executes scripts.</td>
</tr>

<tr>
<td><code>use</code></td>
<td>http://www.w3.org/2000/svg</td>
<td>This element is disallowed because it can include other, untrusted documents and including scripts.</td>
</tr>
</table>


## Event Handler Content Attributes

All known [=event handler content attributes=] are disallowed, as they execute script.

# Acknowledgements # {#ack}

This work is informed and inspired by [[DOMPURIFY]] from cure53,
Expand Down