SGML

HTML200129 DTD Reference

W3C HTML5 (http://sgmljs.net/schemas/sgml-cms/w3c/html5.dtd)
DTD for W3C HTML 5 (deprecated); while the DTD itself is deprecated, the text describes the construction of the HTML 5 DTD in detail; see later versions for important revisions
W3C HTML5.1 (http://sgmljs.net/schemas/sgml-cms/w3c/html51.dtd) (http://sgmljs.net/schemas/sgml-cms/w3c/html51mini.dtd)
Full DTD and Minimal DTD for W3C HTML 5.1 (superseded by HTML 5.2)
W3C HTML5.2 (http://sgmljs.net/schemas/sgml-cms/w3c/html52.dtd) (http://sgmljs.net/schemas/sgml-cms/w3c/html52mini.dtd)
Full DTD and Minimal DTD for W3C HTML 5.2 (superseded by HTML RD 200129)
HTML Review Draft 200129 (http://sgmljs.net/schemas/sgml-cms/w3c/html200129.dtd) (http://sgmljs.net/schemas/sgml-cms/w3c/html200129mini.dtd)
Full DTD and Minimal DTD for HTML Review Draft 200129; note the Minimal DTD is the declaration set resolved via the 'about:legacy-compat' system identifier in sgmljs.net SGML
HTML Review Draft 230116 (http://sgmljs.net/schemas/sgml-cms/w3c/html230116.dtd) (http://sgmljs.net/schemas/sgml-cms/w3c/html230116mini.dtd)
Full DTD and Minimal DTD for HTML Review Draft 230116 (experimental)
Note: the W3C HTML 5 series DTDs are deprecated and superseded by the HTML Review Draft (200129 and newer) DTDs. The current HTML RD 200129 Minimal DTD assumes SGML IMPLYDEF ELEMENT ANYOTHER behaviour with respect to undeclared elements, as defined in ISO/IEC 8879:1986/Cor.2:1999(E), in support of SVG and MathML foreign vocabularies and custom elements. While IMPLYDEF ELEMENT ANYOTHER is supported by sgmljs.net SGML, it might not be supported by other SGML software such as OpenSP. If a minimal DTD for use with OpenSP is desired, use the legacy Minimal HTML 5.1 DTD. Note this affects only the Minimal DTD variants, not the Full ones.
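
For orientation, IMPLYDEF is a switch in the FEATURES MINIMIZE part of the SGML declaration as amended by the corrigendum cited above. The fragment below is only a sketch of what enabling ANYOTHER looks like. Apart from ELEMENT ANYOTHER, the values shown are illustrative assumptions and are not taken from the SGML declaration actually used by sgmljs.net.

```sgml
  IMPLYDEF
    ATTLIST  NO        -- illustrative value --
    DOCTYPE  NO        -- illustrative value --
    ELEMENT  ANYOTHER  -- undeclared element types are admitted --
    ENTITY   NO        -- illustrative value --
    NOTATION NO        -- illustrative value --
```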

Overview

The Full WHATWG HTML RD2001 DTD, like former versions, is a transcription of WHATWG's HTML Review Draft specification prose published January 29th, 2020, into an SGML DTD. The Full DTD covers all elements of HTML, SVG, MathML, and the ARIA attributes, and its construction is described in the reference for the W3C HTML 5 DTD, with only modifications for the current version described in this document.

The Minimal WHATWG HTML RD2001 DTD, also like former versions, is a compact DTD containing only essential parsing rules for HTML. As only HTML's special rules for void elements and enumerated attributes are included (others being admitted freely), the usefulness of the Minimal WHATWG HTML DTD for validation purposes is limited. Instead, the purpose of the Minimal HTML DTD is to provide a minimal bundled declaration set for content parsing and production tasks for modern and idiomatic HTML in sgmljs.net and other SGML software with support for resolving declaration sets via catalog resolution (in sgmljs.net, the Minimal HTML DTD is resolved and accessed by the about:legacy-compat system identifier).
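
As a minimal sketch, a document can reference the Minimal DTD through the about:legacy-compat system identifier as shown below. The document content is hypothetical and only chosen to exercise a void element (br) and a minimized enumerated attribute (checked). End-element tags are written out for all other elements, since this sketch does not assume any tag omission rules beyond those named above.

```sgml
<!DOCTYPE html SYSTEM "about:legacy-compat">
<html>
  <head>
    <title>Minimal DTD example</title>
  </head>
  <body>
    <p>First line<br>second line</p>
    <p><input type="checkbox" checked> remember me</p>
  </body>
</html>
```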

This DTD is based on HTML review draft 20-01 published as a W3C recommendation on January 28, 2021, which is the first (and, so far, only) W3C recommendation based on a WHATWG HTML review draft, and the first W3C recommendation since 2017.

Update for HTML Review Draft 200129

Apart from a larger set of small changes, to be expected for the first revision in years as explained below, Review Draft 200129, accepted as a W3C HTML recommendation, is also the first W3C HTML specification published under the Memorandum of Understanding between WHATWG and W3C, which prevents W3C from directly editing specification text. As such, HTML Review Draft 200129 sees notable changes in two long-standing issues where upstream (WHATWG) HTML specification text was accepted even though it had been explicitly rejected in previous W3C versions, despite lack of material change:

  • multiple main elements are allowed, reflected by the nav, article, and aside content models now not forbidding main descendant content

  • hgroup was included in W3C HTML for the first time; note that in WHATWG HTML, hgroup, as originally introduced for hiding headings having multiple ranks from the so-called HTML 5 outlining algorithm to prevent inference of undesired sections, had been deprecated for many years, even though its content model specification hasn't changed (which has been the W3C editors' reason for not including it); hgroup's content model is only changed in the upcoming Review Draft 2023

The changes are detailed in the following sections.

Elements

Added the hgroup and slot elements (complementing the template element already part of previous HTML specifications).

The hgroup, meta, and slot elements were added to the flow content category and parameter entity; meta and slot were also added to the phrasing category and parameter entity, resp., while hgroup was added to the flow_only and the heading parameter entities.
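
The following hypothetical fragment illustrates the new category memberships: hgroup appears where flow content is expected and groups headings, while slot and meta (the latter carrying an itemprop attribute, as the HTML specification requires for meta in body content) appear where phrasing content is expected. It is a sketch of markup expected to be admitted, not an excerpt from a test suite.

```sgml
<hgroup>
  <h1>Annual report</h1>
  <h2>Fiscal year 2020</h2>
</hgroup>
<p>Hello, <slot name="greeting">world</slot>
  <meta itemprop="datePublished" content="2020-01-29">
</p>
```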

A menu element has been re-introduced with changed content rules and semantics; it is now listed under grouping elements and no longer as a legacy element. Note the menuitem element that used to be part of the original menu content model is no longer used at all, but remains present as a legacy element since it admits end-tag omission.
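
As a brief sketch of the changed content rules, assuming the review draft's definition of menu as a list of li elements, a toolbar might be written as follows (the button labels are hypothetical):

```sgml
<menu>
  <li><button type="button">Copy</button></li>
  <li><button type="button">Paste</button></li>
</menu>
```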

The style element has been removed from the flow content category, reflecting the final abandonment of the scoped CSS concept in HTML specs.

img and object, and the legacy keygen element, have been made members of the interactive content category.

Changed content models or inclusion or exclusion constraints of the article, nav, aside, header, footer, p, figure, ruby, legend, and canvas elements.

Note that heading elements as content of legend elements were valid before Review Draft 200129, and are valid again in current WHATWG specifications, hence their disallowance in Review Draft 200129 can be considered erroneous. To use the declaration of e.g. Review Draft 230116 instead, you can place the following markup declarations into the internal subset:

<!ENTITY html.legend.element "IGNORE">
<!ELEMENT legend - - (#PCDATA|%phrasing;|%heading;)* -(main)>

Retained the rb and rtc elements (removed from the specification but allowing tag omission) as legacy elements.

The address element now appears under sectioning, where it was formerly listed under grouping content.

Global attributes

Added event handler attributes onformdata, oncopy, oncontextmenu, oncut, onpaste, onsecuritypolicyviolation, onslotchange, and onscrollend.

Removed event handler attributes onabort, onloadend, and onshow.

Added global attributes enterkeyhint, inputmode, is, itemid, itemprop, itemref, itemscope, itemtype, and slot. Also added nonce as a global attribute where it used to be declared for specific elements in previous DTDs.

Added body event handler attribute onmessageerror.

Note that the contenteditable, hidden, spellcheck, and translate global attributes can have the empty string as value, even though the HTML spec advises against specifying the attribute at all in these cases. This is not reflected in the SGML DTD.

The same is true of the Fetch API destination (as) attribute (cf. section 4.2.4) and the CORS settings (crossorigin) attribute (both defined by the Fetch spec), and of the referrer policy (referrerpolicy) attribute (defined by the Referrer Policy spec). These two specifications have no versioning (not even an equivalent to a Public Review Draft), nor other formal alignment with the HTML specification, and also contain wildly non-normative language. Thus, while their snapshot values at the time of publication can be conditionally included via parameter entities, they aren't included in the HTML DTD by default.

Removed the rev attribute on the link and a elements.

Removed the longdesc attribute on the img element.

Removed the typemustmatch attribute on the object element.

Removed the hreflang attribute on the area element.

The autofocus attribute has been formally made applicable to all HTML elements in WHATWG HTML (section 6.6.7), where it was defined only in the context of form controls in previous revisions; this is reflected by promoting autofocus to a global attribute.

Removed the border attribute on the table element.

Removed the charset attribute on the script element.

Added the usemap attribute on the object element; note the usemap attribute is removed again in the next review draft (see object-usemap) along with content model changes.

Added the sizes, integrity, imagesrcset, imagesizes, as, and color attributes on the link element.

Added the ping attribute (as a CDATA attribute) on the a and area elements.

Added the decoding attribute to the img element.

Added the loading attribute to the iframe element.

Added the playsinline attribute to the video element.

Added the rel attribute to the form element.

Added the nomodule attribute to the script element.

Errata

The enumerated values for the http-equiv attribute (section 4.2.5.3) are now represented in the DTD.

Changed the width and height attributes (on the img, iframe, embed, object, video, and canvas elements, and the width attribute on the input element) to have the declared value NUMBER.

The sandbox attribute on the iframe element allows multiple space-separated values and hence has been remodelled as having the declared value NMTOKENS.
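
The two declared-value changes above can be sketched with the following simplified attribute definition lists. They are illustrations only: the actual DTD declares many more attributes per element and may organize them through parameter entities.

```sgml
<!-- Simplified sketch; the attribute lists in the actual DTD are longer. -->
<!ATTLIST canvas
  width    NUMBER    #IMPLIED
  height   NUMBER    #IMPLIED>
<!ATTLIST iframe
  sandbox  NMTOKENS  #IMPLIED>
```

With declarations of this kind, markup such as the following (file name hypothetical) is admitted, whereas a width of "100%" is not, since a NUMBER value consists of digits only:

```sgml
<canvas width="300" height="150"></canvas>
<iframe src="widget.html" sandbox="allow-scripts allow-forms"></iframe>
```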

The enumerated values for the autocomplete attribute (section 4.10.3) and the type attribute on the input and button elements are now represented in the DTD.

In previous DTDs, the ARIA role attribute wasn't actually declared (only attributes for ARIA states and properties were). This has been fixed. Note unlike role, the tabindex attribute is, and has always been, declared as part of HTML. Note this was fixed in the W3C HTML 5.2 DTD as well.

Moreover, the integration of ARIA has been changed such that declared attribute defaults for ARIA state and property attributes are customized to become #IMPLIED, i.e. have no material default value specified. This is in line with what's done with HTML attribute defaults where applicable. It is motivated by the expectation that an SGML processor adds default values for attributes where those are declared, which conflicts with HTML's and ARIA's expectation that an attribute taking on its default value should be left unspecified. While this change isn't a fix per se, it has been applied to the previous HTML DTD (W3C HTML 5.2, but no prior versions) as well.
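
The difference between a declared default and #IMPLIED can be sketched as follows, using aria-live (whose default per ARIA is off) as an example. The element names before and after are hypothetical and merely keep the two variants apart; neither declaration is quoted from the DTD.

```sgml
<!-- Hypothetical element "before": the ARIA default is transcribed as a
     declared default, so an SGML processor supplies aria-live="off" even
     where the attribute is absent from the markup. -->
<!ATTLIST before aria-live (assertive|off|polite) off>

<!-- Hypothetical element "after": with #IMPLIED, an absent attribute
     stays absent. -->
<!ATTLIST after  aria-live (assertive|off|polite) #IMPLIED>
```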

In previous versions, exclusion exceptions for the main element had been placed on div and legend elements when they should only apply to sectioning elements with explicit exclusion of main such as article, nav, and aside. Note main itself doesn't exclude main descendants in its content model. Note this fix has been applied to the previous HTML DTD (W3C HTML 5.2, but no prior versions) as well.

SVG2

The HTML Review Draft specification states that

User agents that implement SVG must implement the SVG 2 specification, and not any earlier revisions.

The SVG working group at W3C hasn't published a formal specification for SVG 2 as a language in the form of a DTD or RelaxNG grammar, as was done for previous versions. Moreover, the SVG 2 specification is at candidate recommendation stage at this time, and has been since 2018. This reflects uncertainty about whether proposed recommendation or recommendation status can eventually be reached: browser vendors have voiced interest in supporting very few conservative SVG 2 additions (such as for streamlining SVG/CSS integration) but have not committed to new SVG 2 features as a whole, while the continued existence of the SVG working group per its charter, and even of W3C as its hosting organization, isn't guaranteed.

In keeping with previous HTML 5.x DTDs, the (extremely modular) SVG 1.1 DTD is further extended for SVG 2, but only with those features that are also accepted and implemented for the SVG subset recognized by W3C's nu validator (the SVG RelaxNG grammar used internally by the nu validator is also derived from the SVG 1.1 DTD we're customizing here), up to changes made until May 25th, 2021. Specifically, the following customizations are applied:

  • add feDropShadow as element and filter primitive (in line with section 12.2.6.5's listing of feDropShadow among mapped camel-case element names for SVG; note feDropShadow technically was defined as part of SVG Filter Effect Module Level 1, hence as part of SVG 1.* rather than SVG 2)

  • additional enumerated values for the operator attribute on feComposite elements, the mode attribute on feBlend elements, and declaration of the x, y, width, and height attributes on symbol elements (note the nu validator only adds width and height)

  • note the SVG desc element remains unchanged (isn't changed to allow any child content)

Moreover, the HTML specification makes the specific requirements that

  • the content model for the SVG title element inside HTML documents is phrasing content (this further constrains the requirements given in SVG 2) (section 4.8.17)

  • the svg element falls into the embedded content, phrasing content, flow content [and palpable content] categories for the purposes of the content models in this specification (section 4.8.17)

  • when the SVG foreignObject element contains elements from the HTML namespace, such elements must all be flow content

  • HTML defines the nonce attribute applying to SVG and other foreign elements (section 2.6.6)

  • script elements are allowed in SVG anywhere

which have been applied as well.
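
The requirements listed above can be illustrated by a hypothetical fragment in which the SVG title element contains phrasing content and foreignObject contains flow content. It is a sketch of markup these rules are meant to admit and has not been checked against the DTD itself.

```sgml
<div>
  <svg width="120" height="40" viewBox="0 0 120 40">
    <title>Revenue by <em>quarter</em></title>
    <rect x="0" y="10" width="80" height="20"></rect>
    <foreignObject x="85" y="0" width="35" height="40">
      <div>Q1</div>
    </foreignObject>
  </svg>
</div>
```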

Finally, generic XML attributes in need of declaration within an SGML context (xml:lang, xml:space, and id, including their no-namespace HTML variants if applicable) are declared (see section 3.2.6.2).

Note XLink attributes are declared by the SVG 1.1 DTD (see also section 12.1.2.3).

MathML

Customization of MathML 3 DTD for embedding into HTML includes the following specific requirements:

When the MathML annotation-xml element contains elements from the HTML namespace, such elements must all be flow content (section 4.8.16)

When the MathML token elements (mi, mo, mn, ms, and mtext) are descendants of HTML elements, they may contain phrasing content elements from the HTML namespace (section 4.8.16)
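
Both requirements can be illustrated by the hypothetical fragment below: the mtext token element contains an HTML phrasing element (em), and annotation-xml contains HTML flow content (p). As with any such sketch, it shows markup these rules are meant to admit and is not an excerpt from the DTD's own tests.

```sgml
<p>The result is
  <math>
    <mi>x</mi><mo>=</mo><mn>5</mn>
    <mtext>(an <em>approximate</em> value)</mtext>
  </math>
</p>
<math>
  <semantics>
    <mi>x</mi>
    <annotation-xml encoding="text/html">
      <p>The unknown quantity.</p>
    </annotation-xml>
  </semantics>
</math>
```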

Finally, as with SVG, generic XML attributes in need of declaration (xml:lang and xml:space, including their no-namespace HTML variants) are declared.