Today marks the 30th anniversary of the publication of the SGML standard.
As anyone who has worked through ISO 8879 and other ISO specification texts will confirm, it's not an easy read, nor are the other specifications that build on SGML. Steven R. Newcomb, one of the editors of the HyTime standard, later even apologized for it:
In short, despite having great things to say, even the deathless prose of the HyTime standard tends to be unreadable and, quite frankly, to suck as informative literature (I'm a co-editor of it; may God have mercy on us.)
(From material about HyTime architectural forms that seems to have vanished from the web, though Robin Cover's authoritative collection of markup-related specifications, also started in 1986 by the way, probably has it covered somewhere.)
Yet for HTML5 it was found necessary to specify its parsing rules and content grammar in informal prose and in a procedural fashion. While perhaps appealing to casual readers, this was a step back in terms of quality expectations towards specification work, and may be the reason that few HTML5 parsers, if any, claim HTML5 conformance. To be fair, though, HTML5 was designed to deal with existing tag soup content on the web, so it had to be extremely permissive, basically accepting anything as HTML.
Given HTML's trajectory towards becoming a universal markup meta-language and its scorched-earth attitude towards existing markup languages, as a proposal for the HTML 6 and HTML 7 roadmaps, here's a (non-exhaustive) list of features SGML has had for thirty years now (and in fact, much longer):
In modern web development, ad-hoc syntaxes for template "engines" and configuration files abound. At the time of this writing, curly braces seem to be in fashion for expressing variable substitution in text documents. Even in XML applications it is common to use curly brace syntax for this purpose, rather than XML's (and SGML's) built-in entity mechanism. Almost all template engines are prone to injection attacks because, unlike entity references, they can't assess the syntactic context, and hence the escaping rules, of the place where the text is substituted.
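To make the injection problem concrete, here is a minimal Python sketch of a curly-brace template "engine". The function names (`naive_render`, `escaped_render`), the template, and the values are all hypothetical and are not taken from any particular library. The naive version pastes values in verbatim. The second version only stays safe because the caller happens to know the target is HTML and applies `html.escape` by hand, which is exactly the knowledge a context-unaware engine lacks.

```python
import html
import re

# Matches placeholders such as {title}; the syntax is an assumption for this sketch.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def naive_render(template: str, values: dict[str, str]) -> str:
    """Substitute values verbatim, with no knowledge of the surrounding markup."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


def escaped_render(template: str, values: dict[str, str]) -> str:
    """Substitute values after HTML-escaping them (the caller must know the target is HTML)."""
    return PLACEHOLDER.sub(
        lambda m: html.escape(values[m.group(1)], quote=True), template
    )


template = '<p title="{title}">{body}</p>'
values = {
    "title": '"><script>alert(1)</script>',  # attacker-controlled input
    "body": "Tom & Jerry",
}

naive = naive_render(template, values)      # the script element ends up in the output
escaped = escaped_render(template, values)  # markup characters are neutralized

print(naive)
print(escaped)
```

Even the escaped variant is only correct for HTML text and attribute values; substituting into a URL, a script block, or a JSON string would need different rules, which the engine has no way of inferring from the placeholder alone.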
SGML can parse user-defined Wiki syntaxes and other shorthand notations using the SHORTREF feature, which allows context-dependent replacement of text tokens with markup tags or other text. It can even parse JSON.
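The idea of context-dependent token replacement can be illustrated with a small Python analogy. This is not SGML and not how an SGML parser is implemented: the map table `SHORTREF_MAPS`, the function `expand_shortrefs`, and the choice of `*` as an emphasis token are all assumptions made for the sketch. The point it shows is that the same token expands to different markup depending on which element is currently open.

```python
# Hypothetical short-reference maps, keyed by the currently open element.
# Each entry maps a text token to (replacement markup, element opened or None).
SHORTREF_MAPS = {
    "p": {"*": ("<em>", "em")},    # inside a paragraph, "*" opens emphasis
    "em": {"*": ("</em>", None)},  # inside emphasis, "*" closes it again
}


def expand_shortrefs(text: str, root: str = "p") -> str:
    """Replace single-character tokens according to the map of the open element."""
    stack = [root]          # currently open elements, innermost last
    out = [f"<{root}>"]
    for ch in text:
        active = SHORTREF_MAPS.get(stack[-1], {})
        if ch in active:
            markup, opened = active[ch]
            out.append(markup)
            if opened is None:
                stack.pop()           # token acted as an end tag
            else:
                stack.append(opened)  # token acted as a start tag
        else:
            out.append(ch)
    # Close whatever is still open, innermost first.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)


print(expand_shortrefs("a *b* c"))
```

The sketch handles only single-character tokens and a fixed two-level nesting; it is meant to convey the mechanism of switching maps by context, not to reproduce the feature's actual declarations or scope.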
For many years now, web developers have striven to implement the idea of semantic markup, never mind the fact that in, e.g., classic computer science, the term semantic is used in opposition to syntactic, as a dichotomy. Now a markup language is very much a syntactic construct; hence, by that definition of the word, it can't be semantic.
As a rational basis for semantic markup, perhaps the term content reusability expresses more readily what web developers aim for here. Looking at the web today, it cannot be said that this has been achieved (just look at this website's pitiful page source, which is littered with presentation artifacts required for Bootstrap's CSS).
The "semantic markup" discussion basically is a consequence of the fact that web developers are confronted with HTML and CSS as two separate languages, and of a developer's mindset attempting to rationalize this situation.
In SGML, on the other hand, attributes were originally introduced to capture rendering properties, much like CSS properties today, whereas content was put into element child text. In SGML, the notion that markup shouldn't contain presentation details is solved by using "link types", which, like CSS, can define an automaton for selecting applicable rendering properties in a context-dependent way, without having to specify those in main text copy. Unlike with CSS, this task is solved within the SGML language itself, and isn't shifted into separate ad-hoc syntax.
Today, after many years of achievements in the vibrant markup community, it seems that standards fatigue has set in, and that a whole generation of domain experts who believed in markup language technology have lost their voice or have retired silently.
Overusing XML for problems unrelated to semistructured data in the 2010s, the failure of XHTML, and other failures of the W3C haven't helped either. Standards development for markup languages has largely stalled or been abandoned.
Fortunately, SGML still stands as a practical bona fide language for tackling large-scale content authoring and preservation tasks, which are the kinds of problems SGML was designed to solve almost 50 years ago. Its specification process took almost a decade, and it then took another decade to become an ISO standard. As it stands, we're not going to see an international standard of SGML's calibre anytime soon.
SGML can parse and process most text content formats being used today, including HTML, XML, Wiki syntaxes, and JSON; it can formalize HTML's notion of omitting elements and tag inference, and it has practical integrated content organization, templating, and other authoring mechanisms.