Is this HTML 6?
Last year's motion for a new HTML candidate recommendation based on the HTML Review Draft published January, 2022 was rejected by W3C's HTML working group. No new HTML review draft has been widely reviewed or proposed for recommendation status by W3C since, even though the part of the specification that was the major reason for last year's rejection has been prominently changed.
The issue of the Reporting API being unfinished, yet normatively referenced, had already been brought up during wide review of Review Draft January, 2021. Setting aside the more serious privacy concerns also raised against the Reporting API, it wasn't clear where to add a notice hinting at the preliminary status of the Reporting API, as there was no consensus on whether W3C could add normative text to upstream WHATWG HTML specifications under the Memorandum of Understanding Between W3C and WHATWG.
A change that was brought forward through WHATWG's editorial process was a resolution to the long-standing issue of the so-called HTML 5 outlining algorithm, where Steve Faulkner took it upon himself to edit the WHATWG specification text. The reasons for removing, rewriting, or at least warning about the outlining algorithm are well known: it isn't, wasn't, and won't be implemented in user agents, nor was it ever used in assistive technologies.
However, the concept of sectioning roots and the semantic sectioning
elements such as article and aside, introduced by Ian Hickson
into the HTML 5 specification as we know it today, are pretty fundamental
and a key innovation of HTML 5, along with allowing hyperlink anchoring
around any content rather than just phrasing content.
HTML outlining is a beautiful idea with a distinct SGML flavor
using section element inference to bridge HTML "flat-earth markup"
with XHTML-era ideas for a rank-less <h> element and for <section>
elements. The Producing HTML tutorial explains outlining using SGML in detail. Its applicability
isn't affected by HTML outlining changes, because the
tutorial is using outlines only as an intermediate vehicle for
generating navigational tables of contents. Unlike in SGML proper,
in (legacy) HTML outlining, no material DOM elements are
inferred, but the outlining algorithm is merely used as a device
for establishing equivalence of DOMs with and without section elements
and of the desired accessibility semantics.
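The idea of inferring sections from heading ranks can be illustrated with a minimal sketch. This is not the specification's outlining algorithm: it ignores explicit sectioning elements and sectioning roots, and the names Section, infer_sections, and render are hypothetical helpers introduced only for this illustration. It merely shows how a flat sequence of ranked headings implies a nested structure from which a table of contents could be generated.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """An implied section, opened by a heading of the given rank."""
    rank: int
    title: str
    children: list = field(default_factory=list)


def infer_sections(headings):
    """Nest a flat list of (rank, title) pairs into implied sections."""
    root = Section(rank=0, title="(root)")
    stack = [root]
    for rank, title in headings:
        # A heading of equal or higher rank (smaller or equal number)
        # implicitly closes the sections that are currently open.
        while stack[-1].rank >= rank:
            stack.pop()
        node = Section(rank, title)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def render(section, depth=0):
    """Return the implied outline as indented lines of text."""
    lines = []
    for child in section.children:
        lines.append("  " * depth + child.title)
        lines.extend(render(child, depth + 1))
    return lines


# Flat "h1 h2 h3 h2 h1" markup, reduced to ranks and titles
flat = [(1, "Intro"), (2, "Background"), (3, "Details"),
        (2, "Scope"), (1, "Summary")]
toc = render(infer_sections(flat))
print("\n".join(toc))
```

Note that, as in (legacy) HTML outlining, nothing is added to the input here; the nesting exists only in the derived outline.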
Obviously, Ian Hickson knew his SGML well, as is also
evident from his initial HTML 5 parsing algorithm description
capturing SGML tag inference with remarkable precision. Unfortunately, the hard-coded
formulation of HTML parsing he left behind has languished because its
presentation is littered with explicitly enumerated elements
stating where, e.g., parent content models are closed, rather than referencing
the underlying generic SGML parsing rules (or at least the element categories
which the specification itself goes to great lengths to define).
As a consequence, subsequent additions of HTML elements weren't
reflected in the tag inference rules. This was already the case in the
HTML version published as W3C HTML 5.1 right after the initial release
with respect to the details, figcaption, figure, and menu
elements. Basically, HTML's parsing rules were already ossified
at that point.
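The maintenance problem with enumerated element lists can be sketched as follows. This is a simplified illustration, not the HTML specification's actual tables or parsing algorithm: the element lists are abbreviated, the category registry is hypothetical, and "flow but not phrasing" is only a rough stand-in for the SGML rule that a start tag not admitted by the open element's content model ends that element.

```python
# Illustrative only: abbreviated lists and a hypothetical category registry,
# not the HTML specification's actual tables.

# Hard-coded rule: start tags that implicitly end an open <p>.
ENUMERATED_P_CLOSERS = {"div", "ul", "ol", "table", "section", "h1", "h2"}

# Category-based rule: each element is registered once with its categories.
CATEGORIES = {
    "div": {"flow"},
    "ul": {"flow"},
    "section": {"flow", "sectioning"},
    "em": {"flow", "phrasing"},
    "span": {"flow", "phrasing"},
}


def closes_p_enumerated(tag: str) -> bool:
    """True if the tag appears in the hard-coded list."""
    return tag in ENUMERATED_P_CLOSERS


def closes_p_by_category(tag: str) -> bool:
    """True if the tag is flow content that <p> (phrasing only) can't contain."""
    cats = CATEGORIES.get(tag, set())
    return "flow" in cats and "phrasing" not in cats


# A later addition only has to be registered with its category ...
CATEGORIES["details"] = {"flow"}

print(closes_p_by_category("details"))  # the category rule picks it up
print(closes_p_enumerated("details"))   # the hard-coded list is silently stale
```

With the generic formulation, registering a new element is enough; with the enumerated one, every inference rule mentioning element names has to be revisited by hand.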
We also note that Review Draft January, 2020 now accepts
multiple main sections, whereas in previous versions W3C
editors made a point of enforcing a singular mapping to a
main ARIA landmark role (main was even introduced
for that purpose). The wider discussion generally revolves around design
options available to markup languages since the beginning, namely
whether to represent some piece of information as element name,
ID value, ARIA role attribute, CSS class, or other attribute value.
This is why SGML has link process definitions (LPDs), providing the necessary built-in support for exactly the kind of attribute-to-element mappings required for outlining and ARIA role mapping, and for presenting the results as logical document "views" as part of the core language. Oddly, in the HTML context, CSS isn't even considered for producing ARIA mappings, even though, according to its proponents, it is supposed to bridge presentational and structural gaps.
Considering the HTML outlining algorithm and the concept of sectioning roots had been in the HTML specification for nearly two decades, their unceremonious removal or deprecation is problematic. The reasoning behind the change seems sound enough, but of course doesn't change the large corpus of existing content produced under the assumption of section inference implied by heading elements. Their departure would thus seem to warrant a prominent major-version bump or renaming, as the whole point of the WHATWG HTML initiative was backwards compatibility with existing content, with HTML 5 now being in use long enough to have created its own legacy.
To complicate matters, the HTML Review Draft published January, 2020
and accepted as W3C recommendation includes the hgroup element.
hgroup had been part of WHATWG HTML specifications for the longest time,
but was deliberately not included in previous W3C recommendations.
Its silent inclusion in W3C's 2021 recommendation as an element
allowing heading elements with multiple, different ranks, the
only purpose of which is to stop the (legacy) outlining algorithm
from inferring sections from their presence, is inconsistent with
all previous W3C recommendations. In fact, hgroup is changed again in
Review Draft January, 2023, to allow only a single heading
element (with alternate titles going into p elements) as part of
the major change to HTML outlining already described above.
At the time of this writing, the HTML working group
hasn't started wide review of WHATWG's Review Draft January, 2023, on which the
next HTML DTD would be based (including the upstream change to hgroup),
nor of other review drafts, as the HTML working group charter would imply and as was done in
previous years. The status of the HTML working group, and that of
W3C, Inc. in general, considering its change in legal form, and its
ability and commitment to bring broad consensus behind future HTML
recommendations must be questioned at this point.
For these reasons, HTML Review Draft published 2020, January 29th, endorsed as W3C recommendation, remains the preferred DTD for HTML 5; the DTD for HTML Review Draft published 2023, January 16th is provided here purely for completeness.
Apart from the hgroup-related outlining changes discussed,
modifications in Review Draft January, 2023 relative to Review Draft
January, 2020 are rather modest, however, and include just the
following items:
- changed tag omission rules for body, not allowing end-tag omission
  on noscript child content, which seems arbitrary
- summary now allowing any mix of heading and phrasing child content
  rather than either a single heading element or phrasing content
  as before, when the motivation for either choice is really unclear;
  considering WHATWG issues 2272 and 8864, one can expect further change
  to the effect that certain types of interactive content are prohibited
  (similar concerns are also raised with respect to hyperlink anchors
  within anchors)
- menuitem being reintroduced, albeit with another content model
- the newly formulated constraint that <a> (anchor) elements disallow
  descendant elements having their tabindex attribute specified,
  which can't be represented in an SGML DTD anyway
- contrast the handling of the rt and rtc ruby-supporting elements
  (removed because only supported by Firefox) with dialog, which is
  unchanged in Review Draft 2023 but was already included in Review
  Draft 2020 even though only implemented by Chrome (cf. issue 4937),
  a fact that WHATWG editors aren't happy with either
Hence, there's almost no progress to show for three years, and the changes that made it in are controversial at best and will, without exception, require further adaptations in the future.
As demonstrated by these changes, a process where a snapshot at a
certain point in time is mechanically taken without quality assurance
or any other editorial workflow whatsoever, and without W3C or anyone else
reviewing, tends to result in highly volatile specification releases
with changes either incidental in nature (such as for the hgroup
and legend elements), or merely an artifact of bureaucracy
(such as dialog, rt, or menuitem).
The impression emerging from the presence of hgroup in HTML
Review Draft 2020 (and the W3C recommendation based on it, no less)
is that it was merely smuggled in while Steve Faulkner wasn't looking,
which doesn't reflect favorably on W3C's process and scrutiny.
Consensus is also increasingly difficult to reach due to the HTML specification aggregating additional functionality and sheer volume, as can be seen by last year's rejection of the Reporting API, with the awkward W3C/WHATWG Memorandum Of Understanding seemingly not helping progress either.
For all these reasons, it's expected that HTML Review Draft 2020 will be the final HTML 5 version published and endorsed by W3C. Even if W3C is able to organize consensus and publish a new version based on current WHATWG work, that version, due to the changes already discussed, will have to represent a new major version of the HTML markup language, which in a way it already does by simply not being called HTML 5 anymore.
Pre-production release for sgmljs.net SGML.
sgmljs.net 0.2.5-beta has been released with the
HTML Review Draft 200129 mini-DTD as
embedded HTML DTD, resolvable via the about:legacy-compat system
identifier.
See Parsing HTML for a tutorial on parsing HTML and checking out the updated embedded Mini-DTD.
See also release notes for 0.0.10-alpha.