Make sure the node and npm command-line apps that come with Node.js
can be run by typing eg. npm in a terminal. Then create a fresh
directory, change into it, and install SGML by invoking
npm install -g sgml on the command line.
This tutorial gives an introduction to building basic websites from HTML and markdown text fragments with the help of light content extraction and other SGML transformation techniques for generating page navigation.
As a simple example of organizing web content around shared content, we're going to add header and footer boilerplate to an SGML file that we intend to publish as HTML on the web. The expectation is that we'll have multiple pages, each sharing common head metadata, header content (with eg. a menu), footer content (with eg. legal notices), and similar shared content.
So this is what our produced content file(s) should roughly look like:
<html>
<head>
<title> ... </title>
</head>
<body>
<header> ... </header>
<main> ... </main>
<footer> ... </footer>
</body>
</html>
where we want to have boilerplate for head, header, and footer
populated by using SGML, and keep actual content files free from
redundant head, header, and footer elements. Instead, we want our
content files to look as follows:
<title>The title</title>
<p>Body text</p>
A simple way to do this (with just header and footer
content for now) is storing header and footer content
in separate files and using SGML general entities to
pull content in from those files into our main
content file(s) (content-using-entity-references.sgm
):
<!DOCTYPE html [
<!ENTITY header SYSTEM "header.ent">
<!ENTITY footer SYSTEM "footer.ent">
]>
<html>
<head>
<title>The title</title>
</head>
<body>
&header
<p>Body text</p>
&footer
</body>
</html>
where header.ent contains HTML text such as
<header>
<h1>My Site</h1>
</header>
and footer.ent contains eg.
<footer>
<p>Copyright by me;</p>
<p>Contact: abuse@mysite.com</p>
</footer>
To preview the result HTML, use
sgmlproc content-using-entity-references.sgm
to produce the expected, completely assembled HTML file in a terminal on the command-line.
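To make the mechanism concrete, here is a minimal JavaScript sketch of what general entity expansion amounts to conceptually. It is not how sgmljs.net SGML is implemented: the function name expandEntities and the in-memory entities map (standing in for the contents of header.ent and footer.ent) are assumptions made for illustration only.

```javascript
// Conceptual sketch only: replaces entity references such as &header or
// &header; by replacement text looked up in a map. A real SGML parser
// parses the replacement text as markup and handles many more cases.
function expandEntities(text, entities) {
  return text.replace(/&([A-Za-z][A-Za-z0-9._-]*);?/g, (ref, name) =>
    // leave references to undeclared entities untouched
    Object.prototype.hasOwnProperty.call(entities, name) ? entities[name] : ref
  );
}

// hypothetical stand-ins for the contents of header.ent and footer.ent
const entities = {
  header: '<header><h1>My Site</h1></header>',
  footer: '<footer><p>Copyright by me</p></footer>',
};

const page = expandEntities(
  '<body>\n&header\n<p>Body text</p>\n&footer\n</body>',
  entities
);
console.log(page);
```

The point of the sketch is only that the main content file stays small while the shared fragments live in one place each.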
To serve SGML files in our directory on the web, start sgmljs.net SGML's default web app:
sgmlweb-app
You'll then be able to open a web browser and point
it at http://localhost:8080/content-using-entity-references
to see our humble first web page, as assembled dynamically
at request processing time.
Taking a closer look at node_modules/sgml/sgmlweb-app.js
shows
it just acts as an express.js "middleware":
/**
* SGML web server app using expressjs.
*/
var express = require('express')
var sgml = require('sgml')
var app = express()
// content rendering
app.use(function(req, res, next) {
sgml.middleware()(null, req, res, next)
})
// error page rendering
app.use(function(err, req, res, next) {
// only render error page on error status codes;
// (eg. neither on 304 Not Modified, redirects, or when
// primary middleware rendering has encountered an error during
// content parsing/processing and has already set
// 200 status/sent headers; in the latter case,
// we want expressjs' finalhandler to just close the socket
// instead)
if (res.statusCode < 400 || res.statusCode > 599) {
next(err)
return
}
req.method = 'GET'
req.pathTranslated = ''
req.pathInfo = ''
req.scriptName = process.cwd() + '/error.sgm'
req.url = ''
req.queryString = ''
req.queryStringDecoded = ''
sgml.middleware()(err, req, res)
})
app.listen(8080)
Actually, it mounts sgml.middleware
twice into the rendering pipeline,
the first call sgml.middleware()(null, req, res, next)
being executed in response to a regular content request,
and the second one only executed to render an error page
if regular rendering failed. Note we don't have error.sgm
in place in our directory, so that second attempt will
always fail, and just send an empty page to the browser.
We can test this by requesting a nonexistent page. For example,
let's open http://localhost:8080/nonexistant, which will make
sgmlweb-app respond with an empty page. Now if we create
an error page, error.sgm, in the tutorial directory from where
we're running sgmlweb-app, with the following content:
<!DOCTYPE html SYSTEM "about:legacy-compat" [
<!ENTITY STATUS SYSTEM>
]>
<html>
<head>
<title>Error &STATUS</title>
</head>
<body>
Error serving requested page
</body>
</html>
and attempt to reload http://localhost:8080/nonexistant,
we'll receive a proper error page having "Error 404" in the
title. We could elaborate our basic error page to include
whatever content we wish; right now, it only receives STATUS,
containing the numerical HTTP status 404 (for "Not Found"),
as a system-specific entity.
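The status check in the error handler of sgmlweb-app.js shown earlier can be isolated into a small predicate. The following is a sketch for illustration; the helper name shouldRenderErrorPage is hypothetical and not part of sgmlweb-app.js.

```javascript
// Mirrors the condition used in the error handler of sgmlweb-app.js:
// an error page is only rendered for status codes in the 400..599 range.
// For anything else, the error is passed on to the next handler.
function shouldRenderErrorPage(statusCode) {
  return statusCode >= 400 && statusCode <= 599;
}

console.log(shouldRenderErrorPage(404)); // true
console.log(shouldRenderErrorPage(304)); // false
```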
As you can see, sgmlweb-app is just a demo app using
Node.js and express.js in the most straightforward way; for
running production websites on sgmljs.net SGML and Node.js, you
may want to refer to express.js' documentation for configuring
security settings such as SSL/https keys, etc.
Custom Wiki syntaxes such as markdown are as old as digital
text processing itself. SGML lets you define element context-specific
token replacement rules for this purpose. For example, to make
SGML format a simplistic markdown fragment into HTML, you could
use an SGML prolog like this (markdown-emph.sgm
):
<!DOCTYPE p [
<!ELEMENT p - - ANY>
<!ELEMENT em - - (#PCDATA)>
<!ENTITY start-em '<em>'>
<!ENTITY end-em '</em>'>
<!SHORTREF in-p '*' start-em>
<!SHORTREF in-em '*' end-em>
<!USEMAP in-p p>
<!USEMAP in-em em>
]>
<p>The following text:
*this*
will be put into EM
element tags</p>
If processed with sgmlproc
eg.
sgmlproc markdown-emph.sgm
SGML will produce canonical syntax as follows:
<p>The following text:
<em>this</em>
will be put into EM
element tags</p>
This works by declaring short reference maps (in-p and in-em)
via SHORTREF declarations, which associate tokens (the * asterisk
token in both maps) with replacement entities, and then making
those maps active in a given element context via USEMAP short
reference use declarations. If the context (top-most) element
is em, the in-em shortref map is current (as per the second
USEMAP declaration), which defines the replacement text for *
to be </em>, ending the emphasized text span. Within p, on the
other hand, the replacement text is <em>, starting an emphasized
text span and making em the context element.
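The context dependence can be pictured with a small JavaScript sketch. This is a conceptual model only, not sgmljs.net SGML's implementation; the function name applyEmphasisMaps is made up for this example, and it handles just the single * token from the maps above.

```javascript
// Conceptual sketch of context-dependent token replacement: the same
// '*' token maps to a different replacement depending on which element
// is currently open, mimicking the in-p and in-em short reference maps.
function applyEmphasisMaps(text) {
  let context = 'p'; // current (top-most open) element
  let out = '';
  for (const ch of text) {
    if (ch !== '*') {
      out += ch;
    } else if (context === 'p') {
      out += '<em>'; // in-p map: '*' starts emphasis
      context = 'em';
    } else {
      out += '</em>'; // in-em map: '*' ends emphasis
      context = 'p';
    }
  }
  return out;
}

console.log(applyEmphasisMaps('The following text: *this* will be put into EM element tags'));
```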
As a slight variation, h2
heading elements can be produced
from text enclosed in double-hashmark (##
) characters,
as used in markdown syntax, with p
paragraph elements being
added by markdown formatting:
<!DOCTYPE body [
<!ELEMENT body O O ((h2,p)+)>
<!ELEMENT p O O (#PCDATA)>
<!ELEMENT h2 - - (#PCDATA)>
<!ENTITY start-h2 '<h2>'>
<!ENTITY end-h2 '</h2>'>
<!SHORTREF in-body '##' start-h2>
<!SHORTREF in-h2 '##' end-h2>
<!USEMAP in-body body>
<!USEMAP in-h2 h2>
]>
<body>
## Heading 1 ##
Body text of first section.
</body>
For formatting full markdown with all bells and whistles as known from popular sites such as github.com, sgmljs.net SGML has built-in short reference rules that, when referenced (included) in the base document type declaration via a parameter entity, will make sgmljs.net SGML recognize and format unrestricted markdown into HTML as expected:
<!ENTITY % md_shortref_maps
PUBLIC "+//IDN sgmljs.net//SHORTREF Markdown//EN">
%md_shortref_maps;
This declares the md_shortref_maps
entity to contain
short references rules for full markdown via the public identifier
(symbolic name) +//IDN sgmljs.net//SHORTREF Markdown//EN
,
and then references the entity such that it becomes part of
the markup declarations in which it is referenced, acting
as if it were declared in place of the reference much like
general entities for content we've used above.
The former example, rewritten to make use of built-in
shortref rules for markdown, looks as follows
(markdown-headings-builtin.sgm
):
<!SGML MARKDOWN PUBLIC "+//IDN sgmljs.net//SD Markdown//EN">
<!DOCTYPE body [
<!ELEMENT body O O ((h2,p)+)>
<!ELEMENT p O O (#PCDATA)>
<!ELEMENT h2 - - (#PCDATA)>
<!ENTITY % md_shortref_maps
PUBLIC "+//IDN sgmljs.net//SHORTREF Markdown//EN">
%md_shortref_maps;
]>
<body>
## Heading 1 ##
Body text of first section.
</body>
The first line is an SGML declaration reference we need to
include such that sgmlproc
assumes availability of short
reference delimiters needed for markdown and HTML naming rules
in a way that is compatible with third-party SGML software.
Pulling in md_shortref_maps will enable comprehensive
markdown formatting. Note there's no actual short reference
declaration set being resolved by the
+//IDN sgmljs.net//SHORTREF Markdown//EN public identifier;
these declarations are recognized specially by sgmljs.net SGML
and are implemented using an internal markdown-to-HTML converter.
Markdown formatting is presented as a short reference application
for uniformity and compatibility with third-party SGML software.
Note sgmlproc
includes these definitions by
default when processing files having an .md
file name
suffix. We can omit including an SGML declaration if we rename
our file to process such that it has an .md
file suffix,
in which case the necessary SGML declaration settings will
be automatically assumed by sgmlproc
.
For example, markdown-headings-builtin.md
looks like
this:
## Heading 1 ##
Body text of first section.
and when processed via
sgmlproc markdown-headings-builtin.md
will be formatted into:
<h2 id="heading-1">Heading 1
</h2><p>Body text of first section.
</p>
This example demonstrates how to automatically create an outline from basic HTML sectioning and/or heading elements using link processing and templating.
An outline is useful for generating a table of contents, for assistive technologies, and for generating page navigation elements. Specifically, given a source HTML document similar to the following markup, which doesn't make use of HTML5's sectioning elements:
<h2 id="heading-a">A Level Two Heading</h2>
<p>Level Two Content</p>
<p>Other Level Two Content</p>
<h2 id="heading-b">Another Level Two Heading</h2>
<p>Yet other Level Two Content</p>
we want to create a <nav>
element as follows
<nav>
<ul>
<li><a href="#heading-a">A Level Two Heading</a></li>
<li><a href="#heading-b">Another Level Two Heading</a></li>
</ul>
</nav>
Later on, we also want to compose the result <nav>
element with the
source content into a compound HTML document such that source content
appears as main content, and generated <nav>
content as side-navigation
(or top-navigation) content.
HTML 5 has introduced sectioning elements as a means to hierarchically structure documents, where earlier HTML versions had only ranked heading elements for representing hierarchy ("flat-earth markup").
When sectioning elements are used, the markup
for a heading element and the belonging body text, as well
as potential subsections, have a common ancestor element,
the sectioning root (a section
, main
, article
or
other element acting as sectioning root).
<section>
<h2>Section heading</h2>
<p>Section content text</p>
<!-- potential subsections here ... -->
</section>
<section>
<h2>Next section heading</h2>
<p>Other content</p>
<!-- potential subsections here ... -->
</section>
Traditional "flat-earth HTML markup" doesn't require a common (sectioning or other) element structurally enclosing the heading and its belonging section content:
<h2>Section heading</h2>
<p>Section content text</p>
<!-- ... -->
<h2>Next section heading</h2>
<p>Other content</p>
<!-- ... -->
sgmljs.net SGML is designed to be used with markdown
text. Markdown doesn't have Wiki markup for sectioning
as such, but, like earlier versions of HTML, for heading
elements only. To impose sectioning structure onto markdown
text explicitly, section
(or other sectioning root) elements
would have to be specified as HTML blocks within markdown
text such as in the following example:
<section>
# Heading #
Markdown text with enclosing sectioning root
as markup block
</section>
This is however redundant and rarely seen in practice.
Therefore, for producing outlines from markdown or other
HTML source without sectioning structure, we're using SGML
to infer (ranked) section
tags by parsing HTML with a
custom DTD as straightforward as (outlining1.sgm
):
<!DOCTYPE html [
<!ELEMENT html O O (section2+)>
<!ELEMENT section 2 O O (h2,p*,section3*)>
<!ELEMENT section 3 O O (h3,p*)>
<!ELEMENT h 2 - - (#PCDATA)>
<!ELEMENT h 3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
<!ELEMENT li - - (ul*)>
<!ELEMENT ul - - (#PCDATA)>
]>
<html>
<h2>Section One Heading</h2>
<p>Section One Body Text</p>
<h2>Section Two Heading</h2>
<p>Section Two Body Text</p>
<h3>Subsection Two Dot Two Heading</h3>
<p>Subsection Two Dot Two Body Text</p>
</html>
The parsing result contains inferred section2
and section3
elements as follows:
<html>
<section2>
<h2>Section One Heading</h2>
<p>Section One Body Text</p>
</section2>
<section2>
<h2>Section Two Heading</h2>
<p>Section Two Body Text</p>
<section3>
<h3>Subsection Two Dot Two Heading</h3>
<p>Subsection Two Dot Two Body Text</p>
</section3>
</section2>
</html>
This works because sgmljs.net SGML infers start-element
tags for section2
and section3
section markers when
seeing h2
and h3
elements, respectively, as directed
by html
's and section2
content models.
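The effect of this tag inference can be imitated in a few lines of JavaScript. The sketch below is a conceptual model, not what the SGML parser does internally: it hard-codes the two ranks used here, and the function name inferSections and the { tag, text } record shape are assumptions made for this example. It also doesn't validate input the way the content models do (eg. an h3 before any h2 is not rejected).

```javascript
// Conceptual sketch of rank-based section inference: a flat sequence of
// h2/h3/p elements is wrapped into section2 and section3 elements.
function inferSections(elements) {
  const out = [];
  let open2 = false; // is a section2 currently open?
  let open3 = false; // is a section3 currently open?
  const close3 = () => {
    if (open3) { out.push('</section3>'); open3 = false; }
  };
  const close2 = () => {
    close3();
    if (open2) { out.push('</section2>'); open2 = false; }
  };

  for (const { tag, text } of elements) {
    if (tag === 'h2') {
      close2(); // an h2 ends any open section and subsection
      out.push('<section2>');
      open2 = true;
    } else if (tag === 'h3') {
      close3(); // an h3 only ends an open subsection
      out.push('<section3>');
      open3 = true;
    }
    out.push(`<${tag}>${text}</${tag}>`);
  }
  close2();
  return ['<html>', ...out, '</html>'].join('\n');
}

const result = inferSections([
  { tag: 'h2', text: 'Section One Heading' },
  { tag: 'p', text: 'Section One Body Text' },
  { tag: 'h2', text: 'Section Two Heading' },
  { tag: 'p', text: 'Section Two Body Text' },
  { tag: 'h3', text: 'Subsection Two Dot Two Heading' },
  { tag: 'p', text: 'Subsection Two Dot Two Body Text' },
]);
console.log(result);
```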
Note in order to obtain proper HTML, the rank suffixes for
section2
and section3
would have to be removed (using
straightforward renaming into plain section
elements
in a link process). This isn't shown here in detail however,
since for our use case we don't want to produce sectioning
elements as such, but want to use sectioning elements
only as intermediate markup for producing navigation
link markup from it, as shown next.
Generation of nav-links into an ul
container element
involves inferring sectioning from heading elements
as shown above in a first step, followed by transforming
sectioning structure into nested li
and ul
elements
(outlining2.sgm
):
<!DOCTYPE html [
<!ELEMENT html O O (section2+)>
<!ELEMENT section 2 O O (h2,p*,section3*)>
<!ELEMENT section 3 O O (h3,p*)>
<!ELEMENT h 2 - - (#PCDATA)>
<!ELEMENT h 3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
<!ELEMENT li - - (ul*)>
<!ELEMENT ul - - (#PCDATA)>
]>
<!DOCTYPE ul [
<!ELEMENT nav O O (ul)>
<!ELEMENT ul (li+)>
<!ELEMENT li (a,ul*)>
<!ELEMENT a (#PCDATA)>
]>
<!LINKTYPE toc html ul [
<!LINK #INITIAL
html ul
section2 #USELINK in-section2 li>
<!LINK in-section2
h2 a
section3 #USELINK before-section3 ul>
<!LINK before-section3 #IMPLIED #USELINK in-section3 li>
<!LINK in-section3 h3 a>
]>
<html>
<h2>Section One Heading</h2>
<p>Section One Body Text</p>
<h2>Section Two Heading</h2>
<p>Section Two Body Text</p>
<h3>Subsection Two Dot Two Heading</h3>
<p>Subsection Two Dot Two Body Text</p>
</html>
The toc
link process transforms html
into ul
,
and (inferred) section2
elements into li
elements.
On section2
, the in-section2
link set is made active
which will generate <a>
anchors from h2
headings, and
produce content text for the anchor from heading text.
Furthermore, on section3 subsection elements, a nested ul
list is opened, and then the before-section3 link set
immediately generates a li element on a virtual #IMPLIED
element (according to sgmljs.net SGML's handling of link
rules with #IMPLIED source elements) before proceeding to
transform headings into <a> anchors.
We can test our example document on the command line by invoking
sgmlproc -v active_lpd_names=TOC outlining2.sgm
and will see an HTML list containing the heading texts as list items, preserving the hierarchical nesting structure:
<ul>
<li><a>Section One Heading</a></li>
<li><a>Section Two Heading</a>
<ul>
<li><a>Subsection Two Dot Two Heading</a></li>
</ul>
</li>
</ul>
Note while we have created <a>
anchor elements, we haven't
yet created href
attributes for those anchor elements to
link to the respective section in body text. We'll come
back to this later to keep example code text small for now.
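The transformation from sectioning structure into a nested list can also be pictured in plain JavaScript. This is a sketch for illustration and not a replacement for the link process: the function name renderOutline and the { heading, subsections } tree shape are assumptions made for this example.

```javascript
// Sketch: render a section tree as nested <ul>/<li>/<a> markup, which
// is the shape of result shown above for the toc link process.
function renderOutline(sections) {
  const items = sections.map((s) => {
    // a nested list is only emitted when there are subsections
    const nested = s.subsections && s.subsections.length
      ? '\n' + renderOutline(s.subsections) + '\n'
      : '';
    return `<li><a>${s.heading}</a>${nested}</li>`;
  });
  return ['<ul>', ...items, '</ul>'].join('\n');
}

const outline = renderOutline([
  { heading: 'Section One Heading', subsections: [] },
  {
    heading: 'Section Two Heading',
    subsections: [
      { heading: 'Subsection Two Dot Two Heading', subsections: [] },
    ],
  },
]);
console.log(outline);
```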
Now it's nice that SGML can produce an HTML nav-list from a document's outline, but we want to have the produced nav-list and the document body from which it was produced in the same document. To do so, our document must essentially contain the source markup twice: once as the source from which the nav-list is generated, and once as the actual main content (assuming we want our rendered HTML to have an in-page document outline before the actual main content). We're going to literally include the content twice in the following example, but will soon turn to entity references to avoid this redundancy.
So that we can still apply rank-based tag inference,
we're using different markup declaration sets for the
source and result markup, respectively: the declarations
of htmlsource apply the content model rules we used before
for tag inference below nav elements; the declarations of
the result document type html, on the other hand, admit
HTML elements being used freely (tocdoc1.sgm):
<!DOCTYPE htmlsource [
<!ELEMENT htmlsource O O ANY>
<!ELEMENT nav - - (section2+)>
<!ELEMENT section 2 O O (h2,p*,section3*)>
<!ELEMENT section 3 O O (h3,p*)>
<!ELEMENT h 2 - - (#PCDATA)>
<!ELEMENT h 3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
]>
<!DOCTYPE html [
<!ELEMENT html - - ANY>
<!ELEMENT nav - - ANY -(p)>
<!ELEMENT h2 - - (#PCDATA)>
<!ELEMENT h3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
]>
<!LINKTYPE toc htmlsource html [
<!LINK #INITIAL
htmlsource html
nav #USELINK in-nav nav
h2 h2
h3 h3
p p>
<!LINK in-nav
#IMPLIED #USELINK in-nav2 ul>
<!LINK in-nav2
section2 #USELINK in-section2 li>
<!LINK in-section2
h2 a
p #USELINK #EMPTY #IMPLIED
section3 #USELINK before-section3 ul>
<!LINK before-section3
#IMPLIED #USELINK in-section3 li>
<!LINK in-section3
h3 a
p #USELINK #EMPTY #IMPLIED>
]>
<htmlsource>
<nav>
<h2>Section One Heading</h2>
<p>Section One Body Text</p>
<h2>Section Two Heading</h2>
<p>Section Two Body Text</p>
<h3>Subsection Two Dot Two Heading</h3>
</nav>
<h2>Section One Heading</h2>
<p>Section One Body Text</p>
<h2>Section Two Heading</h2>
<p>Section Two Body Text</p>
<h3>Subsection Two Dot Two Heading</h3>
<p>Subsection Two Dot Two Body Text</p>
</htmlsource>
The toc
link process adapts our link rules
explained above within top-level nav
content
in the in-nav
link rules and rules reached
from it.
There are two additional link rules of the form
p #USELINK #EMPTY #IMPLIED, in the in-section2 and
in-section3 link sets, respectively, which are necessary
here to filter out paragraph elements from result content.
Moreover, we add an exclusion exception -(p) to the nav
element declaration. Together, these changes make the link
process skip p paragraph elements within nav content:
the result element of the rule is #IMPLIED, meaning the
element is only copied over to result markup if allowed at
the context position, which paragraph elements are not,
because they're excluded via the -(p) content exception
for nav.
To now eliminate having to redundantly specify our
<h2>Section One Heading</h2><p>... content text twice
in the document, we're replacing each occurrence with
an entity reference &content, storing the content in the
file tocdoc-content.ent, and declaring the content entity
accordingly (tocdoc2.sgm):
<!DOCTYPE htmlsource [
<!ELEMENT htmlsource O O ANY>
<!ELEMENT nav - - (section2+)>
<!ELEMENT section 2 O O (h2,p*,section3*)>
<!ELEMENT section 3 O O (h3,p*)>
<!ELEMENT h 2 - - (#PCDATA)>
<!ELEMENT h 3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
<!ENTITY content SYSTEM "tocdoc-content.ent">
]>
<!DOCTYPE html [
<!ELEMENT html - - ANY>
<!ELEMENT nav - - ANY -(p)>
<!ELEMENT h2 - - (#PCDATA)>
<!ELEMENT h3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
]>
<!LINKTYPE toc htmlsource html [
<!LINK #INITIAL
htmlsource html
nav #USELINK in-nav nav
h2 h2
h3 h3
p p>
<!LINK in-nav
#IMPLIED #USELINK in-nav2 ul>
<!LINK in-nav2
section2 #USELINK in-section2 li>
<!LINK in-section2
h2 a
p #USELINK #EMPTY #IMPLIED
section3 #USELINK before-section3 ul>
<!LINK before-section3
#IMPLIED #USELINK in-section3 li>
<!LINK in-section3
p #USELINK #EMPTY #IMPLIED
h3 a>
]>
<htmlsource>
<nav>
&content
</nav>
&content
</htmlsource>
We may want to confirm on the command line that processing
tocdoc1.sgm and tocdoc2.sgm produces the same result
by invoking
sgmlproc -v active_lpd_names=TOC tocdoc1.sgm
and
sgmlproc -v active_lpd_names=TOC tocdoc2.sgm
respectively.
Note we now basically have a page template which will work
with multiple individual content documents to be used in
place of content. We only have to change the declaration
of the content entity into this:
<!ENTITY content SYSTEM>
(ie. without a filename) to have SGML treat it as a
system-specific entity resolved to a file named content
by default, or to whatever other value we supply for
content as a customized system-specific entity.
sgmlweb has built-in support for resolving files from HTTP request URLs as follows: if
http://[template]/[content]
is requested, where [template]
and [content]
refer
to existing files [template].sgm
and a directory/file
in [template]/[content].sgm
, then it will process the
request by producing HTML from the [template].sgm
file, with [template]/[content].sgm
(as filename) being
supplied as value of the PATH_TRANSLATED
system-specific
entity, and the content of [template]/[content].sgm
supplied as PATH_TRANSLATED_CONTENT
.
Note PATH_TRANSLATED
is the name of a meta-variable
supplied by web servers to CGI web modules
according to the CGI specification, and is also used by
JSGI/connect/express.js (as pathTranslated
) for
portable JavaScript web middleware modules.
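The URL mapping described above can be summarized as a small string transformation. The following JavaScript sketch only illustrates the naming scheme; it doesn't check that the files exist, and the function name mapRequestPath and its return shape are assumptions, not part of sgmlweb.

```javascript
// Sketch of the [template]/[content] mapping: a two-segment request
// path is split into the template file to process and the content file
// name that would be supplied as PATH_TRANSLATED.
function mapRequestPath(urlPath) {
  const parts = urlPath.split('/').filter(Boolean);
  // anything else is not a [template]/[content] request in this sketch
  if (parts.length !== 2) return null;
  const [template, content] = parts;
  return {
    templateFile: `${template}.sgm`,
    pathTranslated: `${template}/${content}.sgm`,
  };
}

console.log(mapRequestPath('/doc/markdowncontent1'));
```

With this naming scheme, adding a new page only requires dropping another content file into the directory named after the template.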
So to make our page template fit for direct use
in web templating, we choose to rename our content
entity into PATH_TRANSLATED_CONTENT, ie. we're changing
<!ENTITY content SYSTEM>
...
&content
...
into
<!ENTITY PATH_TRANSLATED_CONTENT SYSTEM>
...
&PATH_TRANSLATED_CONTENT
...
Moreover, we add
<!ENTITY PATH_TRANSLATED SYSTEM>
to have access to the requested file name (the portion
following http://localhost:8080/doc/
in our request URL).
The Download all files link (see above) links to a file archive where all files are put into the proper places according to sgmlweb's URL mapping rules to make our template run directly as page template.
To be able to use markdown syntax instead of our
explicit <h2>Section One Heading</h2><p>...
text,
as already explained above, we just have to enable
markdown processing in the template file by adding
the markdown SGML declaration reference:
<!SGML MARKDOWN PUBLIC "+//IDN sgmljs.net//SD Markdown//EN">
and by referencing markdown shortref declarations in the base document type:
<!ENTITY % md_shortref_maps PUBLIC "+//IDN sgmljs.net//SHORTREF Markdown//EN">
%md_shortref_maps;
We also must respect markdown delimiter recognition
by adding blank lines before and after references
to &PATH_TRANSLATED_CONTENT, by placing &PATH_TRANSLATED_CONTENT
at the beginning of lines, and also by placing the </nav>
end-element tag at the beginning of a line.
With these changes, this is what our page template
looks like at this point (doc1.sgm
):
<!SGML MARKDOWN PUBLIC "+//IDN sgmljs.net//SD Markdown//EN">
<!DOCTYPE htmlsource [
<!ELEMENT htmlsource O O ANY>
<!ELEMENT nav - - (section2+)>
<!ELEMENT section 2 O O (h2,p*,section3*)>
<!ELEMENT section 3 O O (h3,p*)>
<!ELEMENT h 2 - - (#PCDATA)>
<!ELEMENT h 3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
<!ENTITY PATH_TRANSLATED SYSTEM>
<!ENTITY PATH_TRANSLATED_CONTENT SYSTEM>
<!ENTITY % md_shortref_maps PUBLIC "+//IDN sgmljs.net//SHORTREF Markdown//EN">
%md_shortref_maps;
]>
<!DOCTYPE html [
<!ELEMENT html - - ANY>
<!ELEMENT nav - - ANY -(p)>
<!ELEMENT h2 - - (#PCDATA)>
<!ELEMENT h3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
]>
<!LINKTYPE toc htmlsource html [
<!LINK #INITIAL
htmlsource html
nav #USELINK in-nav nav
h2 h2
h3 h3
p p>
<!LINK in-nav
#IMPLIED #USELINK in-nav2 ul>
<!LINK in-nav2
section2 #USELINK in-section2 li>
<!LINK in-section2
h2 a
p #USELINK #EMPTY #IMPLIED
section3 #USELINK before-section3 ul>
<!LINK before-section3
#IMPLIED #USELINK in-section3 li>
<!LINK in-section3
h3 a
p #USELINK #EMPTY #IMPLIED>
]>
<htmlsource>
<nav>
&PATH_TRANSLATED_CONTENT
</nav>
&PATH_TRANSLATED_CONTENT
</htmlsource>
We may, again, verify our template works as expected
on the command line by invoking sgmlproc
on doc1.sgm
while also supplying a value for PATH_TRANSLATED_CONTENT
:
sgmlproc -v active_lpd_names=TOC -- -e 'PATH_TRANSLATED_CONTENT=<osfile>tocdoc-content.ent' doc1.sgm
Note the <osfile>...
syntax we're using for specifying
the content of tocdoc-content.ent
as value for resolving
PATH_TRANSLATED_CONTENT
as system-specific entity is a
Formal System Identifier (specified by the HyTime extensions
to SGML). Note also we don't need to specify a value
for PATH_TRANSLATED
here since we're not actually referencing
it in content; merely declaring it as general entity won't
in itself make sgmljs.net SGML de-reference and open it.
As promised further above, we now also want to implement
our outline to actually contain nav-links to the sections
of our body text, since our <a>
anchor elements don't
contain any href
attributes at all yet. To do this, we
first must use HTML id
attributes on our headings as target links,
and then somehow grab those attributes and place them into
our <a>
anchor links.
For the first problem, note that sgmljs.net SGML markdown,
like many other markdown implementations such as pandoc,
generates id
attributes from heading text. For example,
markdown text such as
# My Heading #
markdown body text
gets converted into the following HTML fragment:
<h1 id="my-heading">My Heading</h1>
<p>markdown body text</p>
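How such an id might be derived from heading text can be sketched as follows. The exact rules used by sgmljs.net SGML aren't spelled out here, so this JavaScript function (with the made-up name headingId) merely assumes lowercasing and replacing runs of non-alphanumeric characters with hyphens, which is consistent with the examples shown in this tutorial.

```javascript
// Sketch of deriving an id from heading text. Assumption: lowercase the
// text and replace runs of non-alphanumeric characters with a hyphen;
// the actual rules of a given markdown implementation may differ.
function headingId(text) {
  return text
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, ''); // drop leading/trailing hyphens
}

console.log(headingId('My Heading')); // my-heading
console.log(headingId('Heading 1'));  // heading-1
```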
For the second problem of forwarding id
attribute values
into href
attributes, we're going to use templating
with sgmljs.net SGML as a more sophisticated technique
for pulling-in content from external files.
Recall that in our initial, basic HTML composition example,
we've used SGML general entities to supply replacement
text for header
and footer
content.
Rewriting our basic composition example to make use
of templating looks as follows
(content-using-conref-templating.sgm
):
<!DOCTYPE html [
<!ATTLIST header ref ENTITY #CONREF>
<!ATTLIST footer ref ENTITY #CONREF>
<!ENTITY header SYSTEM "header.sgm">
<!ENTITY footer SYSTEM "footer.sgm">
]>
<!LINKTYPE web html #IMPLIED [
<!NOTATION sgml
PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN">
<!ENTITY header SYSTEM "header.sgm" NDATA sgml>
<!ENTITY footer SYSTEM "footer.sgm" NDATA sgml>
<!LINK #INITIAL [ ]>
]>
<html>
<head>
<title>The title</title>
</head>
<body>
<header ref=header></header>
<p>Body text</p>
<footer ref=footer></footer>
</body>
</html>
This variant specifies replacement text files via ENTITY
attributes; declares the ref attribute to have #CONREF
semantics, such that SGML expects the element to have no
syntactical content; has entities for header and footer
declared both in the base document type declaration and,
as preempting (overriding) entity declarations, in the web
link process declaration, where they're declared as SGML
entities (ie. as data entities with the SGML public
identifier); and has system identifiers (filenames) of
entities ending in .sgm (so different files are being
addressed from those used in the first variant), with
header.sgm and footer.sgm meant to include a document type
declaration.
For example, header.sgm
can look as follows:
<!DOCTYPE #IMPLIED SYSTEM>
<header>
<h1>My Site</h1>
</header>
(and similarly, footer.sgm
has <!DOCTYPE #IMPLIED SYSTEM>
as well, extending our former header.ent
and footer.ent
files into stand-alone SGML files).
<!DOCTYPE #IMPLIED ...>
means that
the document element is the first element
actually encountered in the file.
Moreover, SYSTEM
in this context means that the
content of the external declaration set is expected in
a file named HEADER.dtd
(on the header
/HEADER
element),
which is created by sgmlproc
when a template
is applied on the header
element just before processing
of the template.
To produce output HTML equivalent to what we've produced in the first variant (ie. with header and footer replaced by the respective content), invoke:
sgmlproc \
-v active_lpd_names=WEB \
content-using-conref-templating.sgm
where we activate the WEB
link process to make sgmlproc
apply template expansion.
SGML's #CONREF
attribute semantics by itself means
just that SGML parses an element on which a #CONREF
attribute is specified in content as if it were declared EMPTY
.
In classical SGML, this would mean that end-element
tags for the respective element must not be specified.
However, sgmljs.net SGML infers
FEATURES MINIMIZE EMPTYNRM YES
as default SGML
declaration setting, which means that end-element tags
are tolerated, and can be omitted according to the
respective tag omission indicator for end-element tags.
With sgmlproc
, we could alternatively use/enforce
classic expectations by SGML using the following main
content file instead (content-using-conref-templating-emptynrm.sgm
):
<!DOCTYPE html [
<!ELEMENT header - O ANY>
<!ELEMENT footer - O ANY>
<!ELEMENT p - - (#PCDATA)>
<!ATTLIST header ref NAME #CONREF>
<!ATTLIST footer ref NAME #CONREF>
<!ENTITY header SYSTEM "header.sgm">
<!ENTITY footer SYSTEM "footer.sgm">
]>
<!LINKTYPE web html #IMPLIED [
<!NOTATION sgml
PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN">
<!ENTITY header SYSTEM "header.sgm" NDATA sgml>
<!ENTITY footer SYSTEM "footer.sgm" NDATA sgml>
<!LINK #INITIAL [ ]>
]>
<html>
<head>
<title>The title</title>
</head>
<body>
<header ref=header>
<p>Body text</p>
<footer ref=footer>
</body>
</html>
where the end-element tags for header
and footer
are
omitted, as per classical SGML defaults.
Note the custom declarations for the header
and footer
elements here have - O
as tag omission indicators, meaning
these elements can have their end-element tags omitted, when
they normally (in HTML 5 and in the HTML 5.2 DTD)
must have end-element tags specified explicitly.
sgmlproc \
-v sgmldecl_features_minimize_emptynrm="NO" \
-v active_lpd_names=WEB \
content-using-conref-templating-emptynrm.sgm
sgmljs.net SGML can also apply templating on elements
without using #CONREF entities, by specifying a template
as a notation attribute in a link process on an element.
More interestingly, this variant allows grabbing attributes
from source markup and supplying those as system-specific
entities to the template, which is what we want to do in our
doc page template to supply id values from sections to
href values in our nav-links.
We're returning to our running example for outlining/nav-link
generation from two sections earlier (doc1.sgm), and
just add templating on <a> anchor elements within nav
elements (doc.sgm):
<!SGML MARKDOWN PUBLIC "+//IDN sgmljs.net//SD Markdown//EN">
<!DOCTYPE htmlsource [
<!ELEMENT htmlsource O O ANY>
<!ELEMENT nav - - (section2+)>
<!ELEMENT section 2 O O (h2,p*,section3*)>
<!ELEMENT section 3 O O (h3,p*)>
<!ELEMENT h 2 - - (#PCDATA)>
<!ELEMENT h 3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
<!ENTITY PATH_TRANSLATED SYSTEM>
<!ENTITY PATH_TRANSLATED_CONTENT SYSTEM>
<!ENTITY % md_shortref_maps PUBLIC "+//IDN sgmljs.net//SHORTREF Markdown//EN">
%md_shortref_maps;
]>
<!DOCTYPE html [
<!ELEMENT html - - ANY>
<!ELEMENT nav - - ANY>
<!ELEMENT h2 - - (#PCDATA)>
<!ELEMENT h3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
]>
<!LINKTYPE toc htmlsource html [
<!NOTATION anchor
PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN"
"anchor.sgm">
<!ATTLIST #NOTATION anchor
id CDATA #IMPLIED>
<!ATTLIST (h2|h3)
id CDATA #IMPLIED
template NOTATION (anchor) #IMPLIED>
<!LINK #INITIAL
htmlsource html
nav #USELINK in-nav nav
h2 h2
h3 h3
p p>
<!LINK in-nav
#IMPLIED #USELINK in-nav2 ul>
<!LINK in-nav2
section2 #USELINK in-section2 li>
<!LINK in-section2
h2 [ template=anchor ] a
section3 #USELINK before-section3 ul>
<!LINK before-section3 #IMPLIED #USELINK in-section3 li>
<!LINK in-section3 h3 [ template=anchor ] a>
]>
<htmlsource>
<nav>
&PATH_TRANSLATED_CONTENT
</nav>
&PATH_TRANSLATED_CONTENT
</htmlsource>
We've added an SGML notation declaration for anchor here,
and declared that the notation's content is found in anchor.sgm,
which has the following content (anchor.sgm):
<!DOCTYPE #IMPLIED SYSTEM [
<!ENTITY id SYSTEM>
<!ENTITY content SYSTEM "<osfd>0">
]>
<a href="#&id">&content</a>
We also edited our link rules to apply on h2 and h3
elements when appearing as descendants of nav elements
such that the template link attribute is set to anchor.
When sgmljs.net SGML sees an SGML notation associated
with an element being targeted in a link rule, it doesn't
just rename the source element into the result element
specified in the link rule, but creates the result
element content by applying the anchor template.
Since we've declared the id attribute on h2 and h3
elements in the link process declaration, and also
declared the id attribute as a data attribute (an attribute
of the anchor notation), sgmljs.net SGML will now
make the value of the id attribute available to the template
as a system-specific entity of the same name,
in addition to providing the child content on which the
template is applied via <osfd>0 as explained in
Parsing HTML.
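To make the substitution step more tangible, here is a minimal Python sketch that mimics what happens to the body of anchor.sgm when id and content are supplied. It is only a simulation for illustration, not how sgmljs.net SGML is implemented; the helper names apply_template and render_nav_link are hypothetical, and real SGML entity resolution involves far more than a regular expression.

```python
import re

# Document instance of anchor.sgm as shown above.
ANCHOR_TEMPLATE = '<a href="#&id">&content</a>'


def apply_template(template, entities):
    """Replace &name (optionally followed by ';') with values from `entities`.

    Hypothetical helper: a rough stand-in for entity expansion.
    """
    def substitute(match):
        name = match.group(1)
        # Unknown references are left untouched rather than guessed.
        return entities.get(name, match.group(0))

    return re.sub(r"&([A-Za-z][A-Za-z0-9_.-]*);?", substitute, template)


def render_nav_link(heading_id, heading_text):
    """Simulate applying the anchor template to one h2/h3 heading."""
    return apply_template(
        ANCHOR_TEMPLATE,
        {"id": heading_id, "content": heading_text},
    )


print(render_nav_link("intro", "Introduction"))
# prints: <a href="#intro">Introduction</a>
```

The point of the sketch is just the data flow: the attribute value captured from the source heading ends up in href, and the heading's child content ends up as the link text.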
If we now arrange for a subdirectory named after the
template file (eg. doc
), and have a content file
(markdowncontent1.sgm
, say) in that directory, we can open
http://localhost:8080/doc/markdowncontent1
in a web browser
to make sgmlweb apply our template to add a basic
outline using the process described above. This works
of course for any document of the form used above stored
in the doc
subdirectory.
To run the example on the command line:
sgmlproc -v active_lpd_names=TOC -- -e 'PATH_TRANSLATED_CONTENT=<osfile>doc/markdowncontent1.sgm' doc.sgm
and we'll see exactly the same markup the browser is receiving as rendered result: a markdown file rendered as HTML with an automatically added navigation list displayed on top of it.
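As a rough mental model of the generated navigation list, the following Python sketch builds a nested list of links from a flat sequence of level-2 and level-3 headings. It is not part of sgmljs.net SGML; the function name build_nav and the tuple layout are assumptions made for illustration, and the exact nesting that the link rules above emit may differ from the shape produced here.

```python
def build_nav(headings):
    """Build a nested <ul> outline from (level, id, text) tuples.

    `headings` is in document order; level is 2 or 3 (hypothetical input format).
    """
    lines = ["<nav>", "<ul>"]
    open_item = False  # inside an <li> for a level-2 section
    open_sub = False   # inside the nested <ul> holding level-3 entries
    for level, hid, text in headings:
        link = f'<a href="#{hid}">{text}</a>'
        if level == 2:
            if open_sub:
                lines.append("</ul>")
                open_sub = False
            if open_item:
                lines.append("</li>")
            lines.append(f"<li>{link}")
            open_item = True
        elif level == 3:
            if not open_item:
                raise ValueError("level-3 heading without enclosing level-2 heading")
            if not open_sub:
                lines.append("<ul>")
                open_sub = True
            lines.append(f"<li>{link}</li>")
        else:
            raise ValueError(f"unsupported heading level: {level}")
    if open_sub:
        lines.append("</ul>")
    if open_item:
        lines.append("</li>")
    lines += ["</ul>", "</nav>"]
    return "\n".join(lines)


print(build_nav([(2, "a", "A"), (3, "a1", "A1"), (2, "b", "B")]))
```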
We'll now add another, larger content file (doc/markdowncontent2.sgm)
containing lorem ipsum placeholder text instead of one-line body
content, but otherwise structurally equivalent to doc/markdowncontent1.sgm.
We'll also add a header element containing a
primitive site menu with links to both
http://localhost:8080/doc/markdowncontent1
and
http://localhost:8080/doc/markdowncontent2
(for simplicity, we're not using the header templating
discussed above). Note that the changes to doc.sgm in this
section are not part of the download archive for
the tutorial but must be copied and pasted manually into doc.sgm.
<header>
<h1>My Site</h1>
<ul>
<li><a href="/doc/markdowncontent1">markdowncontent1</a></li>
<li><a href="/doc/markdowncontent2">markdowncontent2</a></li>
</ul>
</header>
So that the header, ul, and li elements make it to
the result markup, we need to edit the #INITIAL link
set such that it contains mappings for these elements:
<!LINK #INITIAL
...
header header
ul ul
li li
...
For <a>
anchor elements, once again, we're using
the following mapping rule
<!LINK #INITIAL
...
a #IMPLIED
...
with the intent that the href attribute is
carried over to the result <a> anchor element
(which it wouldn't be if we chose a simple
a a
mapping rule instead).
Moreover, we instruct the browser to load sgml-ua.min.js
(the JavaScript code for SGML user agent) by adding
<script src="/scripts/sgml-ua.min.js" async="async"></script>
along with a link rule script #IMPLIED
to doc.sgm
such that it reads as follows with our changes:
<!SGML MARKDOWN PUBLIC "+//IDN sgmljs.net//SD Markdown//EN">
<!DOCTYPE htmlsource [
<!ELEMENT htmlsource O O ANY>
<!ELEMENT nav - - (section2+)>
<!ELEMENT section 2 O O (h2,p*,section3*)>
<!ELEMENT section 3 O O (h3,p*)>
<!ELEMENT h 2 - - (#PCDATA)>
<!ELEMENT h 3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
<!ENTITY PATH_TRANSLATED SYSTEM>
<!ENTITY PATH_TRANSLATED_CONTENT SYSTEM>
<!ENTITY % md_shortref_maps PUBLIC "+//IDN sgmljs.net//SHORTREF Markdown//EN">
%md_shortref_maps;
]>
<!DOCTYPE html [
<!ELEMENT html - - ANY>
<!ELEMENT nav - - ANY>
<!ELEMENT h2 - - (#PCDATA)>
<!ELEMENT h3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
]>
<!LINKTYPE toc htmlsource html [
<!NOTATION anchor
PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN"
"anchor.sgm">
<!ATTLIST #NOTATION anchor
id CDATA #IMPLIED>
<!ATTLIST (h2|h3)
id CDATA #IMPLIED
template NOTATION (anchor) #IMPLIED>
<!LINK #INITIAL
htmlsource html
header header
ul ul
li li
a #IMPLIED
nav #USELINK in-nav nav
h2 h2
h3 h3
p p
script #IMPLIED>
<!LINK in-nav
#IMPLIED #USELINK in-nav2 ul>
<!LINK in-nav2
section2 #USELINK in-section2 li>
<!LINK in-section2
h2 [ template=anchor ] a
section3 #USELINK before-section3 ul>
<!LINK before-section3
#IMPLIED #USELINK in-section3 li>
<!LINK in-section3
h3 [ template=anchor ] a>
]>
<htmlsource>
<header>
<ul>
<li><a href="/doc/markdowncontent1">markdowncontent1</a></li>
<li><a href="/doc/markdowncontent2">markdowncontent2</a></li>
</ul>
</header>
<nav>
&PATH_TRANSLATED_CONTENT
</nav>
&PATH_TRANSLATED_CONTENT
<script src="/scripts/sgml-ua.min.js"></script>
</htmlsource>
If we refresh our browser for http://localhost:8080/doc/markdowncontent1
and click on the link for http://localhost:8080/doc/markdowncontent2
,
we'll see that markdowncontent2
was produced in-browser, without
sgmlweb processing on the web backend/Node.js. To see this, we need
to open Web Developer Tools in our browser (Firefox or Chrome) and
open the Network tab before refreshing. It will tell us that the browser
has fetched /doc.sgm
, and /doc/markdowncontent2
as individual
static files, and composed the HTML document/DOM for the
markdowncontent2
HTML document in the browser.
As you can see, the SGML user agent is designed as a drop-in replacement for server-side HTML composition, offloading SGML processing from the server to the browser. Upon load, it changes the click behaviour of links to URLs on the origin host so that they trigger browser-side SGML processing, executing the exact same JavaScript code as used on the server. It is envisioned that server-side SGML composition is only performed on the initial page load for a given web site, with subsequent page loads being processed entirely in the browser and the server only sending the static files used for composition.
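The decision of which links are handled in the browser hinges on whether a link target is on the origin host. The following Python sketch illustrates one way such a same-origin test can be expressed. It is not taken from sgml-ua.min.js; the function names and the default-port table are assumptions made for illustration.

```python
from urllib.parse import urljoin, urlsplit

# Default ports so that http://host/ and http://host:80/ compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple identifying a URL's origin."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def is_same_origin(page_url, href):
    """True if `href`, resolved against `page_url`, points to the same origin."""
    return origin(urljoin(page_url, href)) == origin(page_url)


page = "http://localhost:8080/doc/markdowncontent1"
print(is_same_origin(page, "/doc/markdowncontent2"))  # menu link from the header
print(is_same_origin(page, "https://example.org/"))   # external link
```

Under this model, the menu link to /doc/markdowncontent2 would be a candidate for in-browser composition, while a link to another host would be left to regular browser navigation.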
Now there's still something wrong with our browser-rendered
document, which is immediately visible when we navigate from
our landing page to the in-browser composed page: the page
margins are missing on the in-browser composed page. The reason
is simply that we're lacking a proper HTML body element,
and also a head element with at least a page title,
as is required for valid HTML. While the browser adds these
automatically, SGML only adds head, body, and similar
elements if it is instructed to do so by specifying
an HTML DTD with tag inference and other content rules for HTML.
Using an HTML DTD is discussed at great length in the
Parsing HTML Tutorial;
here we just want to conclude our tutorial by specifying the
required head, body, and title elements manually.
For the title
element, we populate its text content from
PATH_TRANSLATED
which expands to the client file name,
and which we're already declaring in our SGML prolog anyway:
<!SGML MARKDOWN PUBLIC "+//IDN sgmljs.net//SD Markdown//EN">
<!DOCTYPE htmlsource [
<!ELEMENT htmlsource O O ANY>
<!ELEMENT nav - - (section2+)>
<!ELEMENT section 2 O O (h2,p*,section3*)>
<!ELEMENT section 3 O O (h3,p*)>
<!ELEMENT h 2 - - (#PCDATA)>
<!ELEMENT h 3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
<!ENTITY PATH_TRANSLATED SYSTEM>
<!ENTITY PATH_TRANSLATED_CONTENT SYSTEM>
<!ENTITY % md_shortref_maps PUBLIC "+//IDN sgmljs.net//SHORTREF Markdown//EN">
%md_shortref_maps;
]>
<!DOCTYPE html [
<!ELEMENT html - - ANY>
<!ELEMENT nav - - ANY>
<!ELEMENT h2 - - (#PCDATA)>
<!ELEMENT h3 - - (#PCDATA)>
<!ELEMENT p - - (#PCDATA)>
<!ELEMENT a - - (#PCDATA)>
]>
<!LINKTYPE toc htmlsource html [
<!NOTATION anchor
PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN"
"anchor.sgm">
<!ATTLIST #NOTATION anchor
id CDATA #IMPLIED>
<!ATTLIST (h2|h3)
id CDATA #IMPLIED
template NOTATION (anchor) #IMPLIED>
<!LINK #INITIAL
htmlsource html
header header
ul ul
li li
a #IMPLIED
nav #USELINK in-nav nav
h2 h2
h3 h3
p p
head head
title title
body body
script #IMPLIED>
<!LINK in-nav
#IMPLIED #USELINK in-nav2 ul>
<!LINK in-nav2
section2 #USELINK in-section2 li>
<!LINK in-section2
h2 [ template=anchor ] a
section3 #USELINK before-section3 ul>
<!LINK before-section3
#IMPLIED #USELINK in-section3 li>
<!LINK in-section3
h3 [ template=anchor ] a>
]>
<htmlsource>
<head>
<title>&PATH_TRANSLATED</title>
</head>
<body>
<header>
<ul>
<li><a href="/doc/markdowncontent1">markdowncontent1</a></li>
<li><a href="/doc/markdowncontent2">markdowncontent2</a></li>
</ul>
</header>
<nav>
&PATH_TRANSLATED_CONTENT
</nav>
&PATH_TRANSLATED_CONTENT
<script src="/scripts/sgml-ua.min.js"></script>
</body>
</htmlsource>
If we want to test our modified template on the command
line, we now have to actually supply PATH_TRANSLATED as
a system-specific entity:
sgmlproc -v active_lpd_names=TOC -- -e 'PATH_TRANSLATED=doc/markdowncontent1.sgm' -e 'PATH_TRANSLATED_CONTENT=<osfile>doc/markdowncontent2.sgm' doc.sgm
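When running this for several content files, it can be convenient to assemble the argument list programmatically. The following Python sketch only builds the command line, using the options that appear in the invocations in this tutorial (-v, --, -e); the helper name sgmlproc_args and its parameters are hypothetical, and the sketch doesn't execute sgmlproc itself.

```python
import shlex


def sgmlproc_args(template, content_file, link_type="TOC", path_translated=None):
    """Assemble an sgmlproc argument list like the ones shown in this tutorial.

    Hypothetical helper; it only mirrors the options used above.
    """
    args = ["sgmlproc", "-v", f"active_lpd_names={link_type}", "--"]
    if path_translated is not None:
        # Only needed once the template references &PATH_TRANSLATED.
        args += ["-e", f"PATH_TRANSLATED={path_translated}"]
    args += ["-e", f"PATH_TRANSLATED_CONTENT=<osfile>{content_file}", template]
    return args


args = sgmlproc_args(
    "doc.sgm",
    "doc/markdowncontent1.sgm",
    path_translated="doc/markdowncontent1.sgm",
)
# shlex.join quotes arguments containing shell metacharacters such as < and >.
print(shlex.join(args))
```

The resulting list can be handed to subprocess.run(args) if sgmlproc is installed and on the PATH.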