The Full WHATWG HTML RD2001 DTD, like former versions, is a transcription of WHATWG's HTML Review Draft specification prose published January 29th, 2020, into an SGML DTD. The Full DTD covers all elements of HTML, SVG, MathML, and the ARIA attributes. Its construction is described in the reference for the W3C HTML 5 DTD; this document describes only the modifications made for the current version.
The Minimal WHATWG HTML RD2001 DTD,
also like former versions, is a compact DTD containing
only essential parsing rules for HTML.
As only HTML's special rules for void elements and
enumerated attributes are included (others being admitted
freely), the usefulness of the Minimal HTML DTD
for validation purposes is limited. Instead, the
purpose of the Minimal HTML DTD is to provide a
minimal bundled declaration set for content parsing and
production tasks for modern and idiomatic HTML in sgmljs.net
and other SGML software with support for resolving
declaration sets via catalog resolution (in sgmljs.net,
the Minimal HTML DTD is resolved and accessed by
the about:legacy-compat system identifier).
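As a minimal sketch of how a document might reference the Minimal DTD through that system identifier: the document content below is purely illustrative, and whether the identifier resolves depends on the catalog setup of the SGML software in use.

```sgml
<!-- the system identifier is expected to be resolved to the
     bundled Minimal HTML DTD (in sgmljs.net) -->
<!DOCTYPE html SYSTEM "about:legacy-compat">
<html>
  <head>
    <title>Minimal example</title>
  </head>
  <body>
    <!-- br is a void element: no end-element tag is written -->
    <p>First line<br>second line</p>
    <!-- checked is an enumerated attribute in minimized form -->
    <p><input type="checkbox" checked></p>
  </body>
</html>
```

The two commented constructs are the kinds of rules the Minimal DTD is said to cover: void elements and enumerated attributes.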
This DTD is based on HTML review draft 20-01 published as a W3C recommendation on January 28, 2021, which is the first (and, so far, only) W3C recommendation based on a WHATWG HTML review draft, and the first W3C recommendation since 2017.
Apart from a larger set of small changes, to be expected for the first revision in years and explained below, Review Draft 200129, accepted as W3C HTML recommendation, is also the first W3C HTML specification published under the Memorandum of Understanding between WHATWG and W3C, which prevents W3C from directly editing specification text. As such, HTML Review Draft 200129 sees notable change in two long-standing issues where upstream (WHATWG) HTML specification text was accepted although it had been explicitly rejected in previous W3C versions, despite lack of material change:
multiple main elements are allowed, reflected by the nav,
article, and aside content models no longer forbidding main
descendant content
hgroup was included in W3C HTML for the first time; note that
in WHATWG HTML, hgroup, as originally introduced for hiding
headings having multiple ranks from the so-called HTML 5 outlining
algorithm to prevent inference of undesired sections, had been
deprecated for many years, even though its content model
specification hasn't changed (which has been the W3C editors'
reason for not including it); hgroup's content model is only
changed in the upcoming Review Draft 2023
The changes are detailed in the following sections.
Added the hgroup and slot elements (complementing the template
element already part of previous HTML specifications).
The hgroup, meta, and slot elements were added to the flow
content category and parameter entity; meta and slot were also
added to the phrasing category and its parameter entity, while
hgroup was added to the flow_only and the heading parameter
entities.
A menu element has been re-introduced with changed content
rules and semantics; it is now listed under grouping elements
and has been removed as a legacy element. Note that the menuitem
element that used to be part of the original menu content
model is no longer used at all but remains present as a
legacy element since it admits end-tag omission.
The style element has been removed from the flow content
category, reflecting the final abandonment of the scoped CSS
concept in HTML specs.
img and object, and the legacy keygen element, have
been made members of the interactive content category.
Changed content models or inclusion or exclusion constraints
of the article, nav, aside, header, footer, p, figure, ruby,
legend, and canvas elements.
Note that heading elements as content of legend elements were
valid before Review Draft 200129, and are valid again in current
WHATWG specifications, hence their disallowance in Review Draft
200129 can be considered an error. To use the declaration of
e.g. Review Draft 230116 instead, you can place the following
markup declarations into the internal subset:
<!ENTITY % html.legend.element "IGNORE">
<!ELEMENT legend - - (#PCDATA|%phrasing;|%heading;)* -(main)>
Retained the rb and rtc elements (removed from the specification
but allowing tag omission) as legacy elements.
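To illustrate why tag omission makes such declarations worth retaining, a hedged sketch of what legacy declarations of this kind could look like follows. The content models shown are assumptions made for illustration, not copied from the DTD.

```sgml
<!-- "- O": start-element tag required, end-element tag omissible;
     content models are illustrative assumptions -->
<!ELEMENT rb  - O (#PCDATA|%phrasing;)*>
<!ELEMENT rtc - O (#PCDATA|%phrasing;|rt|rp)*>
```

Without declarations carrying the omissible end-tag indicator, an SGML parser could not infer where an rb or rtc element written without its end-element tag is supposed to end.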
The address element now appears under sectioning, whereas it
was formerly listed under grouping content.
Added event handler attributes onformdata, oncopy, oncontextmenu,
oncut, onpaste, onsecuritypolicyviolation, onslotchange, and
onscrollend.
Removed event handler attributes onabort, onloadend, and onshow.
Added global attributes enterkeyhint, inputmode, is, itemid,
itemprop, itemref, itemscope, itemtype, and slot. Also added
nonce as a global attribute where it used to be declared for
specific elements in previous DTDs.
Added body event handler attribute onmessageerror.
Note that the contenteditable, the hidden, the spellcheck, and
the translate global attributes can have the empty string as
value even though the HTML spec advises against specifying the
attribute in these cases in the first place.
This is not reflected in the SGML DTD.
The same is true of the Fetch API destination (as) attribute
(cf. section 4.2.4) and the CORS settings (crossorigin)
attribute (both defined by the Fetch spec), and of the referrer
policy (referrerpolicy) attribute (defined by the Referrer Policy
spec). These two specifications have no versioning (not even an
equivalent to a Public Review Draft), nor other formal alignment
with the HTML specification, and also contain wildly
non-normative language; thus, while their snapshot values at the
time of publication can be conditionally included via parameter
entities, they aren't included in the HTML DTD by default.
Removed the rev attribute on the link and a elements.
Removed the longdesc attribute on the img element.
Removed the typemustmatch attribute on the object element.
Removed the hreflang attribute on the area element.
The autofocus attribute has been formally made applicable to all
HTML elements in WHATWG HTML (section 6.6.7) where it was defined
only in the context of form controls in previous revisions; this
is reflected by promoting autofocus to a global attribute.
Removed the border attribute on the table element.
Removed the charset attribute on the script element.
Added the usemap attribute on the object element; note the
usemap attribute is removed again in the next review draft
(see object-usemap) along with content model changes.
Added the sizes, integrity, imagesrcset, imagesizes, as, and
color attributes on the link element.
Added the ping attribute (as a CDATA attribute) on the a and
area elements.
Added the decoding attribute to the img element.
Added the loading attribute to the iframe element.
Added the playsinline attribute to the video element.
Added the rel attribute to the form element.
Added the nomodule attribute to the script element.
The enumerated values for the http-equiv attribute
(section 4.2.5.3) are now represented in the DTD.
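As a rough sketch of what representing enumerated values means in SGML terms, an attribute can be declared with a name token group instead of CDATA. The keyword list below is illustrative and should be checked against the specification section cited above; the declaration is an excerpt showing only this one attribute, not the DTD's actual text.

```sgml
<!-- excerpt: only http-equiv is shown; the token list is an
     illustrative assumption, not a copy of the DTD -->
<!ATTLIST meta
  http-equiv (content-language|content-type|default-style|
              refresh|set-cookie|x-ua-compatible|
              content-security-policy) #IMPLIED>
```

With such a declaration, a validating parser reports any http-equiv value outside the group as an error rather than accepting arbitrary text.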
Changed the width and height attributes (on the img, iframe,
embed, object, video, and canvas elements, and the width
attribute on the input element) to have NUMBER declared value.
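A minimal sketch of such a declaration for img follows. It is an excerpt for illustration only: the DTD declares many more attributes for the element in the same attribute definition list.

```sgml
<!-- excerpt: NUMBER admits name tokens consisting of digits only -->
<!ATTLIST img
  width  NUMBER #IMPLIED
  height NUMBER #IMPLIED>
```

Under this declaration a value such as width="640" is accepted, while width="100%" or width="640px" is reported as an error because those values contain non-digit characters.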
The sandbox attribute on the iframe element allows multiple
space-separated values, hence has been remodelled as having
declared value NMTOKENS.
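For illustration, a declaration along these lines and a matching instance might look as follows. The declaration is an excerpt showing only sandbox, and the src value is a hypothetical placeholder.

```sgml
<!-- excerpt: NMTOKENS admits a space-separated list of name tokens -->
<!ATTLIST iframe
  sandbox NMTOKENS #IMPLIED>

<!-- instance markup carrying two sandbox tokens -->
<iframe sandbox="allow-scripts allow-forms" src="widget.html"></iframe>
```

Each sandbox keyword consists of letters and hyphens only, so it is a valid SGML name token.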
The enumerated values for the autocomplete attribute
(section 4.10.3) and the type attribute on the input and
button elements are now represented in the DTD.
In previous DTDs, the ARIA role attribute wasn't actually declared
(only attributes for ARIA states and properties were). This has been
fixed. Note that unlike role, the tabindex attribute is, and has
always been, declared as part of HTML. This fix has been applied to
the W3C HTML 5.2 DTD as well.
Moreover, the integration of ARIA has been changed such that
declared attribute defaults for ARIA state and property attributes
are customized to become #IMPLIED, i.e. to have no material default
value specified. This is in line with what's done with HTML attribute
defaults where applicable. The reason is that an SGML processor is
expected to add default values for attributes where those are
declared, which conflicts with HTML's and ARIA's expectation that
an attribute taking on its default value should be left unspecified.
While this change isn't a fix per se, it has been applied to the
previous HTML DTD (W3C HTML 5.2, but no prior versions) as well.
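The difference can be sketched with two alternative declarations of a single ARIA state attribute. The element and attribute are chosen for illustration only, and the two declarations are alternatives, not meant to appear together in one DTD.

```sgml
<!-- alternative 1: material default; an SGML processor would
     supply aria-busy="false" for every div lacking the attribute -->
<!ATTLIST div aria-busy (true|false) "false">

<!-- alternative 2: customized form; no value is supplied when the
     attribute is left unspecified -->
<!ATTLIST div aria-busy (true|false) #IMPLIED>
```

With the second form, output produced by an SGML processor doesn't gain attributes the author never wrote, which matches the HTML and ARIA expectation described above.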
In previous versions, exclusion exceptions for the main element
had been placed on div and legend elements when they should only
apply to sectioning elements with explicit exclusion of main, such
as article, nav, and aside. Note that main itself doesn't exclude
main descendants in its content model. This fix has been applied to
the previous HTML DTD (W3C HTML 5.2, but no prior versions) as well.
The HTML Review Draft specification states that
User agents that implement SVG must implement the SVG 2 specification, and not any earlier revisions.
The SVG working group at W3C hasn't published a formal specification for SVG 2 as a language in the form of a DTD or RelaxNG grammar, as was done for previous versions. Moreover, the SVG 2 specification is at candidate recommendation stage at this time, and has been since 2018. This reflects uncertainty as to whether proposed recommendation or recommendation status can eventually be reached: browser vendors have voiced interest in supporting very few, conservative SVG 2 additions (such as for streamlining SVG/CSS integration) but have not committed to new SVG 2 features as a whole, while the continued existence of the SVG working group per its charter, and even of W3C as its hosting organization, isn't guaranteed.
In keeping with previous HTML 5.x DTDs, the (extremely modular) SVG 1.1 DTD is further extended for SVG 2, but only with those features that are also accepted and implemented for the SVG subset recognized by W3C's nu validator (the SVG RelaxNG grammar used internally by the nu validator is also derived from the SVG 1.1 DTD we're customizing here), up to changes made until May 25th, 2021. Specifically, the following customizations are applied:
add feDropShadow as element and filter primitive
(in line with section 12.2.6.5's listing of feDropShadow among
mapped camel-case element names for SVG; note feDropShadow
technically was defined as part of SVG Filter Effect Module
Level 1, hence as part of SVG 1.* rather than SVG 2)
additional enumerated values for the operator attribute on
feComposite elements, the mode attribute on feBlend elements,
and declaration of the x, y, width, and height attributes on
symbol elements (note the nu validator only adds width and height)
note the SVG desc element remains unchanged (isn't changed
to allow any child content)
Moreover, the HTML specification makes the specific requirements that
the content model for the SVG title element inside HTML documents is
phrasing content (this further constrains the requirements given in
SVG 2) (section 4.8.17)
the svg element falls into the embedded content, phrasing content,
flow content [and palpable content] categories for the purposes of
the content models in this specification (section 4.8.17)
when the SVG foreignObject element contains elements from the HTML
namespace, such elements must all be flow content
the nonce attribute applies to SVG and other foreign
elements (section 2.6.6)
which have been applied as well.
Finally, generic XML attributes in need of declaration within an SGML
context (xml:lang, xml:space, and id, including their no-namespace
HTML variants if applicable) are declared (see section 3.2.6.2).
Note that XLink attributes are declared by the SVG 1.1 DTD (see also section 12.1.2.3).
Customization of the MathML 3 DTD for embedding into HTML reflects the following specific requirements:
When the MathML annotation-xml element contains elements from the HTML namespace, such elements must all be flow content (section 4.8.16)
When the MathML token elements (mi, mo, mn, ms, and mtext) are descendants of HTML elements, they may contain phrasing content elements from the HTML namespace (section 4.8.16)
Finally, as with SVG, no-namespace HTML variants of the generic
XML attributes xml:lang and xml:space are declared.