node_modules/.bin/sgmlproc on the command line)
This tutorial gives an introduction to building basic websites from HTML and markdown text fragments with the help of light content extraction and transformation techniques for generating page navigation.
As a simple example of organizing web content around shared content, we're going to add header and footer boilerplate to an SGML file that we intend to publish as a static HTML site. The expectation here is that we're going to have multiple pages, each sharing common head metadata, header content (with eg. a menu), footer content (with eg. legal notices), and similar shared content.
So this is what our produced content file(s) should look like:
where we want to have boilerplate for
footer populated by using SGML, and
keep actual content files free from
elements. Instead, we want our content files
to look as follows:
A simple way to do this (with just header and footer
content for now) is storing header and footer
in separate files and using general entities to
pull content in from those files into our main
content file(s) (
header.sgm contains HTML text such as
footer.sgm contains eg.
to produce the expected, completely assembled HTML file.
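The entity-based variant described above might be sketched as follows. This is a minimal illustration, not the tutorial's original listing: the file names `header.sgm` and `footer.sgm` come from the text, while the entity names `header` and `footer`, the page content, and the bare `<!DOCTYPE html [...]>` form (which assumes the processor supplies HTML element declarations on its own) are assumptions.

```sgml
<!-- Main content file: declares two external general entities in the
     internal subset and references them where the boilerplate belongs.
     Entity names and page text are illustrative assumptions. -->
<!DOCTYPE html [
  <!ENTITY header SYSTEM "header.sgm">
  <!ENTITY footer SYSTEM "footer.sgm">
]>
<html>
<head>
<title>Example page</title>
</head>
<body>
&header;
<main>
<p>Actual page content goes here.</p>
</main>
&footer;
</body>
</html>
```

When the parser reaches `&header;`, it replaces the reference with the text of `header.sgm`, and likewise for `&footer;`, so the shared markup lives in exactly one place.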
With sgmljs.net SGML, there's another,
more sophisticated way to do this, and one
that helps in reducing further redundancies
specifies replacement text files via
ref attribute to have
semantics, such that SGML expects the element to
have no syntactical content
has entities for header and footer declared
in both the base document type declaration, as well
as preempted (overriding) entity declarations
for these in the
web link process declarations
as SGML entities,
has system identifiers (filenames) of entities
.sgm (so different files are being addressed
from those used in the first variant), with
footer.sgm meant to
include a document type declaration.
header.sgm looks as follows:
<!DOCTYPE #IMPLIED ...> means that
the document element is the first element
actually encountered in the file.
SYSTEM in this context means that the
content of the external declaration set is expected in
a file named
HEADER.dtd (on the
which is created by
sgmlproc when a template
is applied on the
header element just before processing
of the template.
To invoke production of output HTML equivalent to what the first variant produces (eg. with header and footer replaced by the respective content):
where we activate the
WEB link process to make
apply template expansion.
#CONREF attribute semantics by itself means
just that SGML parses an element on which a
attribute is specified in content as if it were declared
In classical SGML, this would mean that end-element
tags for the respective element must not be specified.
However, sgmljs.net SGML infers
FEATURES MINIMIZE EMPTYNRM YES as default SGML
declaration setting, which means that end-element tags
are tolerated, and can be omitted according to the
respective tag omission indicator for end-element tags.
sgmlproc, we could alternatively use/enforce
classic expectations by SGML using the following main
content file instead (
where the end-element tags for
omitted, as per classical SGML defaults.
Note the custom declarations for the
elements here have
- O as tag omission indicators, meaning
these elements can have their end-element tags omitted, when
they normally (in HTML 5 and in the HTML 5.2 DTD)
must have end-element tags specified explicitly.
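For illustration, declarations along the following lines would give an element both content-reference semantics and an omissible end-element tag. The element names `header` and `footer`, the `ANY` content model, and the `ENTITY` declared value for `ref` are assumptions made for this sketch, not necessarily the declarations used by the tutorial's own files.

```sgml
<!-- "- O": start-element tag required, end-element tag omissible.
     Element names and content model are illustrative assumptions. -->
<!ELEMENT header - O ANY>
<!ELEMENT footer - O ANY>

<!-- #CONREF: when ref is specified on an element, the element is parsed
     as if it had no syntactical content. -->
<!ATTLIST header ref ENTITY #CONREF>
<!ATTLIST footer ref ENTITY #CONREF>
```

With declarations like these, an instance such as `<header ref=header>` would need no `</header>` tag, assuming an entity named `header` is declared.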
Custom Wiki syntaxes such as markdown are as old as digital
text processing itself. SGML lets you define element context-specific
token replacement rules for this purpose. For example, to make
SGML format a simplistic markdown fragment into HTML, you could
use an SGML prolog like this (
If processed with
SGML will produce canonical syntax as follows:
This works by declaring, via
SHORTREF short reference
in-em) associating tokens (the
asterisk token in both rules) to replacement entities,
and then making those maps active via
short reference use declarations in a given element context.
If the context (top-most) element is
map is current (as per the second USEMAP declaration), which
defines the replacement text for
* to be
</em>, ending the
emphasized text span. Whereas within
an emphasized text span, and making
em the context element.
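A minimal prolog of this kind might look like the sketch below. The map name `in-em` appears in the text; the map name `in-p`, the entity names, the content models, and the sample sentence are assumptions. The sketch also assumes that `*` is available as a short reference delimiter, which the reference concrete syntax does not provide by default.

```sgml
<!DOCTYPE p [
  <!ELEMENT p  - - (#PCDATA|em)*>
  <!ELEMENT em - - (#PCDATA)>

  <!-- Replacement text for the asterisk token. -->
  <!ENTITY start-em "<em>">
  <!ENTITY end-em   "</em>">

  <!-- Inside p, "*" opens an emphasized span; inside em, it closes it. -->
  <!SHORTREF in-p  "*" start-em>
  <!SHORTREF in-em "*" end-em>

  <!-- Activate each map in its element context. -->
  <!USEMAP in-p  p>
  <!USEMAP in-em em>
]>
<p>The *quick* brown fox</p>
```

Under those assumptions, the first asterisk should be replaced by `<em>` and the second by `</em>`, so the paragraph is parsed as if it had been written `The <em>quick</em> brown fox`.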
As a slight variation,
h2 heading elements can be produced
from text enclosed in double-hashmark (
as used in markdown syntax, with
p paragraph elements being
added by markdown formatting:
Note that full markdown formatting can't be implemented using short references alone. For example, markdown's reference links feature allows link details to be populated from data placed elsewhere in the document, such that eg. link URLs and titles can be forward- or backward-referenced within a document. Short references can't handle this, since they only act locally in a given element context.
For full markdown formatting, sgmljs.net SGML has built-in "virtual" short reference rules that, when referenced (included) in the base document type declaration, will make sgmljs.net SGML recognize and format markdown into HTML as expected:
The former example, rewritten to make use of built-in
shortref rules for markdown, looks as follows
The first line is an SGML declaration reference we need to
include such that
sgmlproc assumes availability of short
reference delimiters needed for markdown and HTML naming rules
in a way that is compatible with third-party SGML software.
md_shortref_maps will enable comprehensive
markdown formatting. Note there's no actual short reference
declaration set being resolved by the
+//IDN sgmljs.net//SHORTREF Markdown//EN public identifier;
these declarations are resolved/recognized specially
by sgmljs.net SGML and are implemented using an internal
markdown-to-HTML converter. Markdown formatting is presented
as a short reference application for
uniformity and compatibility with third-party SGML
sgmlproc includes these definitions by
default when processing files having an
.md file name
suffix. We can omit including an SGML declaration if we rename
our file to process such that it has an
.md file suffix,
in which case the necessary SGML declaration settings will
be automatically assumed by
markdown-headings-builtin.md looks like
and when processed via
will be formatted into:
This example demonstrates how to automatically create a table of contents from basic HTML sectioning and/or heading elements using link processing and templating.
An outline is useful for generating a table of contents, for assistive technologies, and for generation of page navigation elements.
Specifically, given a source HTML document similar to the following HTML markup not using HTML5 sectioning elements
we want to create a
<nav> element as follows
Moreover, we want to compose the result
<nav> element with the
source content into a compound HTML document such that source content
appears as main content, and generated
<nav> content as side-navigation
(or top-navigation) content.
HTML 5 has introduced sectioning elements as a means to hierarchically structure documents, where earlier HTML versions had only ranked heading elements for representing hierarchy ("flat-earth markup").
When sectioning elements are used, the markup
for a heading element and its associated body text, as well
as potential subsections, have a common ancestor element,
the sectioning root (a
other element acting as sectioning root).
Traditional "flat-earth HTML markup" doesn't require a common (sectioning or other) element structurally enclosing the heading and its associated section content:
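To make the contrast concrete, here is a hypothetical fragment in both styles. The heading and paragraph text is invented for illustration.

```sgml
<!-- With sectioning elements: heading, body text, and subsections
     share a common ancestor. -->
<section>
  <h2>Installation</h2>
  <p>How to install.</p>
  <section>
    <h3>Prerequisites</h3>
    <p>What is needed first.</p>
  </section>
</section>

<!-- Flat markup: hierarchy is implied only by heading rank. -->
<h2>Installation</h2>
<p>How to install.</p>
<h3>Prerequisites</h3>
<p>What is needed first.</p>
```

In the flat form, nothing in the markup itself marks where the "Installation" section ends; a consumer has to derive that from the rank of the next heading.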
sgmljs.net SGML is designed to be used with Markdown
text. Markdown doesn't have Wiki markup for sectioning
as such, but, like earlier versions of HTML, for heading
elements only. To impose sectioning structure onto markdown
section (or other sectioning root) elements
would have to be specified as HTML block elements within
markdown text such as in the following example:
This is however redundant and rarely seen in practice.
SGML can infer (ranked)
section tags from "flat-earth markup"
by parsing HTML with a custom DTD as straightforward as
The parsing result contains inferred
section3 elements as follows:
Note in order to obtain HTML, the rank suffixes for
section3 would have to be removed (using
straightforward renaming into plain
in a link process). This isn't shown here in detail however,
since for the hierarchy text template we don't want to
produce sectioning elements as such, but want to
use sectioning elements for producing navigation
link markup, as shown next.
nav links into an
ul container element
involves inferring sectioning from heading elements
as shown above in a first step, followed by transforming
sectioning structure into
toc link process transforms
section2 elements into
in-section2 link set is made active
which will generate
<a> anchors from
h2 headings, and
put the heading text as hyperlink text for the anchor.
section3 subsection elements, a
ul list is opened, and then the
link set immediately generates a
li element on
#IMPLIED element (according to sgmljs.net SGML's
handling of link rules with
#IMPLIED source elements
before proceeding to transform headings into