Platform/HTML5 sanitizer
From MozillaWiki
Gecko Requirements
- Allow a setting for enabling styles.
- Allow a setting for enabling comments. See bug 572642.
- Or always enable comments? (What about "--" in comments?)
- Have three element white lists: HTML, SVG and MathML.
- This turns out to lead to a lot of complexity without clear benefit.
- Have three attribute white lists: HTML, SVG and MathML. Whether an attribute is allowed depends only on the namespace of the element it is on, not on the specific element.
- XXX: Figure out what the requirements are for attributes starting with data- or _.
- Have three lists of attributes that take URLs. Drop the attributes when they have prohibited URLs (after trimming whitespace from the value).
- Resolve relative URLs into absolute ones using a per-fragment base URL. (Is this correct for the Gecko requirements? The current code uses the node's base URI. Is that right?)
- However, allow any URL in the src attribute on the img element, because img elements are safe. See bug 572637.
- Have a list of SVG attributes that take different-document references.
- Have a list of SVG attributes that are allowed to have same-document references only.
- If styles are allowed, sanitize style attribute values. If styles aren't allowed, drop the style attribute.
- Always drop script and title elements and their contents.
- If styles are disabled, drop style elements and their contents.
- If styles are enabled, sanitize the content of style elements.
- Add the controls attribute to the video and audio elements (if it isn't there already).
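The URL-handling requirements above (trim whitespace, resolve against a per-fragment base URL, drop the attribute on a prohibited URL, exempt img src) can be sketched roughly as follows. This is only an illustration in Python, not the Gecko code: the function name, the attribute list and the set of permitted schemes are all assumptions, since this page does not say which schemes count as prohibited. Leaving the img src value unresolved is also an assumption.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: an illustrative set of permitted schemes (not specified on this page).
ALLOWED_SCHEMES = {"http", "https", "mailto"}
# Assumption: a small illustrative subset of HTML attributes that take URLs.
URL_ATTRIBUTES = {"href", "src", "cite", "poster"}
# Whitespace characters trimmed from the attribute value before checking it.
HTML_WHITESPACE = " \t\n\f\r"


def sanitize_url_attribute(element, name, value, base_url):
    """Return the attribute value to keep, or None to drop the attribute."""
    if name not in URL_ATTRIBUTES:
        return value  # not a URL attribute; nothing to check here
    if element == "img" and name == "src":
        return value  # exception: any URL is allowed in img src
    trimmed = value.strip(HTML_WHITESPACE)
    # Resolve relative URLs against the per-fragment base URL.
    absolute = urljoin(base_url, trimmed)
    scheme = urlsplit(absolute).scheme.lower()
    return absolute if scheme in ALLOWED_SCHEMES else None
```

With a base URL of `http://example.com/dir/`, a value such as `" page.html "` on `a href` would be kept as an absolute URL, while a `javascript:` value would cause the attribute to be dropped.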
Open Questions
- Can stylistic SVG attributes have values that need to be sanitized?
- Should Semantic MathML be on the white list for clipboard round-tripping? (Mainly a footprint issue.)
- Is it dangerous for SVG fragment id references to be able to refer to an id in the document the untrusted fragment gets inserted into?
- What to do about microdata?
Non-Gecko Requirements
These are features for the HTML5 parser when it is used outside Gecko.
- Allow form-related elements to be toggled on and off in the white list.
- Allow using the sanitizer in non-fragment mode (in which case, the title element should be allowed).
- Are there compelling use cases for non-fragment mode sanitization?
- Have a configurable white list of permitted URL schemes in attributes that take URLs.
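One possible shape for these toggles is a small configuration object that the sanitizer consults. The sketch below is in Python for brevity and every name in it (`SanitizerConfig`, `allow_forms`, `fragment_mode`, `allowed_schemes`, and the element sets) is hypothetical; the page does not define a configuration API, and the element lists are tiny illustrative subsets rather than the real white lists.

```python
from dataclasses import dataclass

# Assumption: illustrative subsets only, not the actual white lists.
BASE_ELEMENTS = frozenset({"p", "a", "div", "span", "img"})
FORM_ELEMENTS = frozenset({"form", "input", "button", "select", "option", "textarea"})


@dataclass(frozen=True)
class SanitizerConfig:
    allow_forms: bool = False       # toggle form-related elements
    fragment_mode: bool = True      # False means whole-document sanitization
    allowed_schemes: frozenset = frozenset({"http", "https"})

    def allowed_elements(self):
        """Compute the effective element white list for this configuration."""
        allowed = set(BASE_ELEMENTS)
        if self.allow_forms:
            allowed |= FORM_ELEMENTS
        if not self.fragment_mode:
            allowed.add("title")    # title is only allowed in non-fragment mode
        return allowed

    def scheme_permitted(self, scheme):
        """Check a URL scheme against the configurable white list."""
        return scheme.lower() in self.allowed_schemes
```

For example, `SanitizerConfig(allow_forms=True, fragment_mode=False)` would permit both `input` and `title`, whereas the default configuration would drop them.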