SGML Templating

SGML templating builds on SGML's NOTATION and LINK concepts (explained below) and allows parametric inclusion of other SGML files/resources (templates), such that entity references in templates are expanded into values provided from the invocation context.

Parameters can be provided either

  • from data attributes (attributes specified for data entities)
  • from content attributes (attributes specified on elements in the main file which are expanded by templates)
  • from external resources such as SQL or SPARQL queries, web services, etc.

Link Processes

A link process, in SGML, is a general term for any kind of processing performed by code "linked" (as in dynamic link library) to a SGML parser library. The term "link process" as used in SGML has nothing to do with hyperlinks, and is better understood as a kind of stylesheet similar to a CSS stylesheet.

A link process declaration (LPD) is a declaration set (a syntactical construct that can appear in a document prolog along with a DTD) declaring properties for link processes and their values. These properties are declared as link attributes, re-using standard attribute list declaration syntax.

Link attributes are associated with content elements in the same way as plain content attributes, but aren't serialized to output. Values for link attributes are determined either by #FIXED default values declared for the respective attribute in ATTLIST declarations in the link process declaration, or can be determined in a state-dependent way based on link sets declared in the link process declaration.

A link set is a rule that associates link attribute values to a particular element appearing in the content, potentially with filtering rules that further restrict the elements to which the rule(s) apply. Link attributes determined via link sets, unlike link attributes merely defaulted by a #FIXED link attribute declaration, can be given values depending on where the element to apply to appears within content.

For example, link sets can specify that list items with odd ordinal numbers should be rendered differently from those with even ordinals. This is achieved by using two link sets, each associating different link attribute values to the same list item content element, and by telling the processor to use the other link set as applicable link set ("current link set") once a link attribute has been determined.
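
The alternation described above could be sketched as follows. This is an illustrative fragment only: the ul/li DTD, the link type name zebra, the link set name even, and the stripe link attribute are assumptions made up for this example, and the link rule syntax is the one given in the implicit LPD section below. Each rule uses #POSTLINK to make the other link set current for the following sibling.

```sgml
<!doctype ul [
	<!element ul (li+)>
	<!element li (#pcdata)>
]>
<!linktype zebra ul #implied [
	<!attlist li stripe cdata #implied>
	<!-- 1st, 3rd, ... item: mark as odd, then switch to the "even" link set -->
	<!link #initial li #postlink even [ stripe="odd" ]>
	<!-- 2nd, 4th, ... item: mark as even, then switch back to #INITIAL -->
	<!link even li #postlink #initial [ stripe="even" ]>
]>
<ul>
	<li>one</li>
	<li>two</li>
	<li>three</li>
</ul>
```

Under these assumptions, the first and third li elements would be expected to receive stripe="odd" and the second stripe="even".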

Link attributes are not included in the output document of SGML processing; instead, in sgmljs.net, link attributes determine templates and template parameters to apply on the respective elements to which the link attribute is assigned by link processing, and the SGML processing output for those elements is determined by the template (note there are additional ways for a template to influence output, such as via result elements on explicit link rules that don't select a template, and via entity preemption; these are explained in detail further down).

While implicit link processes are restricted to associating link attributes with content elements as described, explicit link processes can be used to produce entirely new SGML documents driven by source elements. Explicit link processes can also omit source elements (filter source content), which implicit link processes can't do.

Explicit link processes can be chained to form a pipeline of link processes where the output of one explicit link process serves as the source content for the next chained explicit link process. In this way, link processing can perform fairly rich document preparation and extraction tasks such as generating an outline for navigation, paging, formatting search results, etc.

Apart from implicit and explicit link processes, SGML also offers simple link processes. A simple link process contains global properties of a link process, rather than properties of individual elements; syntactically, a simple link process is declared as a special kind of implicit link process where link attributes are only declared on the document element (the root element of the document).

For a LPD to be processed and have any effect, it must be active. A link process is activated by passing its name in the active_lpds parameter in command-line processing. In web templating, link processes named web and http are activated by default if present in the document being served.

Implicit LPDs

An implicit LPD has the following form

<!LINKTYPE lpd-name dtd-name #IMPLIED [
          [entity-declarations]
          link-attribute-declarations
          <!LINK #INITIAL link-rule [link-rule] ...>
         [<!LINK link-name link-rule [link-rule] ...>
          <!LINK link-name link-rule [link-rule] ...> ...]
]>

where link-rule is an item of the form

element [#POSTLINK target] [#USELINK target] [ attr=value ... ]

or

(element1|element2|...) [#POSTLINK target] [#USELINK target] [ attr=value ... ]

Note that in this syntax description, the square brackets around attribute specifications are literal SGML text, whereas the square brackets around the #POSTLINK and #USELINK clauses merely mark those clauses as optional.

#POSTLINK and #USELINK targets and attribute specifications are optional.

lpd-name
is the name of the link process to declare; the name must be distinct from every other declaration set name (LPD or DTD) in the document
dtd-name
is the declaration set name of the DTD containing the elements on which link attributes are being declared
entity-declarations (optional)

is one or more (general, parameter, or data) entity declarations; see entity preemption

An (implicit or explicit, but not simple) link process declaration must include a single #INITIAL link set declaration, and can optionally contain additional named link set declarations.

Link attributes don't get output as content attributes, but are used to associate processing properties with source elements; in particular, if a link process assigns a template to a link attribute, then the respective element on which the template is assigned is replaced by the template content for output, possibly with further link attributes as template parameters.

Otherwise, an implicit link process passes its input sequence of elements and other markup constructs to the output unchanged. But note that, as for all link processes, entity preemption can also influence the output of an implicit link process.

For example, the following (simplified HTML) document contains an LPD that will assign "body value" to the html content element, and "p value" to the p content element:

<!doctype html [
	<!element html (head?,body)>
	<!element body (p*)>
]>
<!linktype l html #implied [
	<!attlist (html|p) linkattribute cdata #implied>
	<!link #initial html #uselink inbody [ linkattribute="body value" ]>
	<!link inbody p [ linkattribute="p value" ]>
]>
<html>
	<body>
		<p>Hello</p>
	</body>
</html>

Starting in the #INITIAL link set (where "body value" is determined as link attribute), the #USELINK rule takes the link processor to the inbody link set, where "p value" is determined as link attribute on the p element.

Note that #USELINK is pointless here, and only used for demonstrating the syntax and general operation of implicit link processing; in the example, the rule for p could have as well been included in the #INITIAL link set with the same result.
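
For comparison, a sketch of the equivalent LPD without #USELINK, with both rules placed in the #INITIAL link set (the linktype declaration is written out in the full form given in the syntax description above):

```sgml
<!linktype l html #implied [
	<!attlist (html|p) linkattribute cdata #implied>
	<!link #initial
		html [ linkattribute="body value" ]
		p    [ linkattribute="p value" ]>
]>
```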

Explicit LPDs

An explicit LPD declares a link process that transforms source markup declared in a source DTD to result markup declared in a result DTD. The result DTD can be the same as the source DTD, or can be different.

If it is different and multiple DTDs are involved in an explicit link process, the SGML prolog contains multiple DTDs such as in the following example:

<!doctype A [
	...
]>
<!doctype B [
	...
]>
<!linktype atob A B [
	...
]>

In the example, the declaration for the explicit LPD atob specifies that it transforms source markup declared in declaration set A to result markup declared in declaration set B.

If an explicit LPD's source and result DTD is the same, no additional DTD for the result markup needs to be declared in the document prolog, and the declaration of the explicit LPD takes the following form:

<!linktype name A A [
	...
]>

ie. source and result declaration set name are the same in the link process declaration.

As in an implicit LPD, link attributes are used to associate processing properties with source elements; in particular, if a link process assigns a template to a link attribute, then the respective element on which the template is assigned is replaced by the template content for output, possibly with further link attributes as template parameters. If on the other hand the source element is matched by a link rule, but no template is determined for the source element, then the content is replaced by the matching link rule's result element and result attribute specification.

Note that if a template is assigned, the result attributes for the element specified in the link rule, if any, are ignored, and any output is determined by the template instead.

An explicit LPD performs tag inference and validation of its produced result markup against the result DTD, just as if the result markup were parsed, rather than generated. Note that other SGML processing systems might not perform inference and validation on the result markup of an explicit LPD.

The syntax of an explicit LPD is very similar to that of implicit LPDs, except that a source and a result declaration set must be specified (instead of just a declaration set and #IMPLIED, as in implicit LPDs), and that a link rule maps a source to a result element specification, rather than containing just a single element and link attribute specification as in implicit LPDs:

<!LINKTYPE lpd-name source-dtd-name result-dtd-name [
          [entity-declarations]
          link-attribute-declarations
          <!LINK #INITIAL link-rule [link-rule] ...>
         [<!LINK link-name link-rule [link-rule] ...>
          <!LINK link-name link-rule [link-rule] ...> ...]
]>

where link rule is an item of the form

element [#POSTLINK target] [#USELINK target] [ attr=value ... ]
	result-element [ result-attr=value ... ]

or

(element1|element2|...) [#POSTLINK target] [#USELINK target] [ attr=value ... ]
	result-element [ result-attr=value ... ]

lpd-name
is the name of the link process to declare; the name must be distinct from every other declaration set name (LPD or DTD) in the document
source-dtd-name
is the declaration set name of the DTD containing the elements which appear as source elements in link rules, and on which link attributes are being declared
result-dtd-name
is the declaration set name of the DTD containing the elements which appear as result elements in link rules, and which will be produced from link rule applications
entity-declarations (optional)

is one or more (general, parameter, or data) entity declarations; see entity preemption

element (and element1, element2)

is the name of an element in the source-dtd-name declaration set, which will be matched against a source content element

result-element

is the name of an element in the result-dtd-name declaration set which will be produced as result element for every matching (source) element; the result attributes and child content for the produced element are determined either by a template, if a template is implied by the link attributes on the link rule, or, if no template is implied by the link attributes, by the result attributes, if any, specified on the link rule

result-attr

is a result attribute to produce unless a template is applied to produce the result element's attributes and child content

Unlike an implicit link process, an explicit link process only produces result elements to its output for those source elements which have matching context-dependent link rules applied; if the current link set in an explicit link process has no matching rule for the source content at the context position, no result element is produced.

When no element of a source document can be matched at all by link rules, an explicit link process produces a document containing just an empty result document element (having the result DTD name as element name).
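
As a minimal sketch of the syntax above, the following hypothetical document maps elements of a made-up article DTD to elements of a simplified html DTD. The DTDs, the link type name tohtml, and the content are assumptions for illustration; no link attributes or templates are used, so result elements come directly from the link rules. The exact result markup depends on tag inference against the result DTD and isn't shown here.

```sgml
<!doctype article [
	<!element article (title,para*)>
	<!element (title|para) (#pcdata)>
]>
<!doctype html [
	<!element html (h1|p)*>
	<!element (h1|p) (#pcdata)>
]>
<!linktype tohtml article html [
	<!-- each rule pairs a source element with a result element -->
	<!link #initial
		title h1
		para  p>
]>
<article>
	<title>Release notes</title>
	<para>First paragraph</para>
</article>
```

Since article itself has no link rule, no result element is produced for it, as described above; title and para are the only source elements mapped.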

A link rule in an explicit link set can also use the special token #IMPLIED instead of either the source or result element (but not both); see link rules with #IMPLIED source element and link rules with #IMPLIED result element.

Explicit Link rules having the form

#IMPLIED [#POSTLINK target] [#USELINK target] [ attr=value ... ]
	result-element [ result-attr=value ... ]

(using #IMPLIED as source element) will create an additional element with the respective result element name and result attributes when it is in the current link set, and any start-element event is encountered.

A link rule having #IMPLIED as its source element must be the only link rule in its containing named link set, and must have a #USELINK target declared.

Contained in a named link set, it is only reached from a preceding sibling or parent element's link rule application transitioning to it by referencing it in its #USELINK or #POSTLINK target.

Once the result element for a rule is produced, the link state immediately transitions to the link set referenced in its #USELINK target. If the transitioned-to link set, in turn, contains a link rule having #IMPLIED as its source element, then the result element of that link rule is also produced to the output, and so on, until a link set is reached which doesn't contain a link rule that has #IMPLIED as its source element.

In this way, rules with #IMPLIED source elements can be used to produce arbitrarily many nested elements.

It's an error if, when transitioning to a #USELINK target of a link set having a rule with #IMPLIED source element, the transitioned-to target has already been transitioned to before in the chain of followed #USELINK targets; ie. the chain of #USELINK targets traversed must not form a cycle.

Explicit Link rules having the following form

element [#POSTLINK target] [#USELINK target] [ attr=value ... ] #IMPLIED

or

(element1|element2|...) [#POSTLINK target] [#USELINK target] [ attr=value ... ] #IMPLIED

(using #IMPLIED as result element) will produce the source element as result element, if it is allowed to occur directly at the context position in the result document according to the result DTD. Unlike with regular result elements, tag inference is not performed. If the element is not allowed to occur at the context position (or if the element isn't allowed to occur anywhere) in the result markup, it is silently ignored and not produced to the output.

If the result element is produced to the output, any source attributes allowed to occur in the result element are copied over from the source element to the result element, too. To assess if a source attribute is allowed to occur in the result element, the link processor considers only the name but not the declared value of the attribute. It's an error if, when a link rule with an implied result element matches, an attribute is specified or implied in content, and an attribute with the same name is declared in the result DTD, and the value specified or implied for the content attribute isn't a valid attribute value for the result attribute (for example, if the result element declares an attribute with declared value NUMBER, and the corresponding source attribute with the same name is declared CDATA and doesn't contain a string that can be parsed as NUMBER).

A result element is allowed to occur in the result markup according to the standard rules for tag validation; ie. if either

  • it is accepted as regular content token at the result context position, or
  • it is accepted as an included element at the result context position, or
  • the result context position has declared content ANY, and the element isn't in the set of excluded elements at the result context position, or

  • the result context position refers to an implied-ANY element (ie. an element not declared in the result DTD, and treated as if it were declared ANY because IMPLYDEF ELEMENT YES is specified or implied in/by the SGML declaration)

Simple LPDs

Simple LPDs contain only a single link attribute declaration and/or entity declarations:

<!LINKTYPE lpd-name #SIMPLE #IMPLIED [
          [entity-declarations]
          [link-attribute-declaration]
]>

(having #SIMPLE #IMPLIED in place of the source and result DTD name, respectively).

Link attributes must be declared on the document element (the root element) of the base DTD (the first DTD) in the containing document. Template notations in simple LPDs aren't supported; instead, simple LPDs are used to provide document-wide properties in a simple item-value format for auxiliary purposes such as HTTP delivery parameters.

Simple links can also be used for entity preemption.

Entity preemption

As an additional feature of LPDs, any LPD (including simple LPDs) can declare entities. Entities declared in a LPD override ("preempt") those declared in the document's DTD having the same name (but LPDs can also contain private entities that don't preempt DTD entities).

For example, the following document contains a very basic simple LPD in addition to a DTD:

<!doctype html [
	<!entity visitor-name "Unknown">
]>
<!linktype myformat #simple #implied [
	<!entity visitor-name "Chef">
]>
<html>
<body>
<p>Hello, &visitor-name</p>
</body>
</html>

The effective value of the visitor-name entity is Chef, when the myformat LPD is active.

More generally, the effective value for a given entity is declared by the entity declaration that gets processed first. When processing a document, SGML first processes all active LPDs, in the order they appear in the document prolog; then, SGML processes the base (first) DTD.
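
To illustrate the processing order, the following sketch extends the example with a second simple LPD. The LPD names staff and guest and the entity values are hypothetical.

```sgml
<!doctype html [
	<!entity visitor-name "Unknown">
]>
<!linktype staff #simple #implied [
	<!entity visitor-name "Chef">
]>
<!linktype guest #simple #implied [
	<!entity visitor-name "Guest">
]>
<html>
<body>
<p>Hello, &visitor-name;</p>
</body>
</html>
```

By the rule above, with both LPDs active the declaration in staff is processed first, so visitor-name would be Chef; with only guest active it would be Guest; with neither active, the DTD declaration applies and the value is Unknown.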

Implicit and explicit LPDs can contain a single IDLINK link set declared as follows

<!IDLINK link-rule [link-rule] ...>

where link rule is an item of the form

id element [#POSTLINK target] [#USELINK target] [ attr=value ... ]

(for implicit LPDs), or

id element [#POSTLINK target] [#USELINK target] [ attr=value ... ]
	result-element [ result-attr=value ... ]

(for explicit LPDs).

Link rules in IDLINK sets are matched on elements of the respective type when the element has an ID attribute declared in the DTD, and the ID value specified on the element matches the id declared in the link rule.

Elements matching a link rule in the IDLINK link set have their link and result attributes always determined by the matching link rule in the IDLINK link set in preference to any matching link rules in the current link set.
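
A minimal sketch of an IDLINK set in an implicit LPD follows. The simplified DTD, the ID value intro, and the link attribute values are assumptions chosen for illustration.

```sgml
<!doctype html [
	<!element html (body)>
	<!element body (p*)>
	<!element p (#pcdata)>
	<!attlist p id id #implied>
]>
<!linktype l html #implied [
	<!attlist p linkattribute cdata #implied>
	<!link #initial p [ linkattribute="regular" ]>
	<!-- applies only to the p element whose ID value is "intro" -->
	<!idlink intro p [ linkattribute="highlighted" ]>
]>
<html>
	<body>
		<p id="intro">Welcome</p>
		<p>Details</p>
	</body>
</html>
```

Here the first p would be expected to receive linkattribute="highlighted" from the IDLINK rule, in preference to the rule in the #INITIAL link set, which applies to the second p.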

At any point in time when processing a SGML document, a link process has a current link set; the initial current link set at the start of the document is the #INITIAL link set.

When processing an element, the link process inspects the current link set to check if it contains one or more link rules for the element being processed, ie. if the name of the element to process occurs as the element (or as one of the elements in the name group) of a link rule. If so, it proceeds with link rule selection and link attribute determination as described below.

Otherwise, if an element isn't matched by any link rule in the current link set, no link attributes are inferred for that element. Note that the SGML standard describes a fallback mechanism where link attributes are determined by a parent or predecessor link set, but neither sgmljs.net nor other SGML software implements this behaviour.

Once a link processor has established that a link set contains link rules for the element being processed, it determines the specific link rule (of potentially multiple link rules) to apply as follows:

  • if the current link set has a single link rule for the element to process, then that single link rule is selected
  • if it has multiple link rules for the element to process (ie. if the element to process appears as element or in the name group of more than one link rule), the link attribute specifications of the applicable link rules are checked to select a single rule; if any link rule specifies a link attribute with the same name as a content attribute on the respective element, and the content attribute value for the current element has the same value as specified in the link rule, then the first such rule, in the order specified in the link set, is selected

To ensure that a single link rule can always be selected by the above algorithm, sgmljs.net checks that any link rule specifying a link attribute having the same name as a DTD attribute is part of a set of multiple rules for the respective element in the link set, and that for any link set with multiple link rules for the same element,

  • each link rule, of the multiple link rules applying to the same element, has one or more link attributes specified (as required by standard SGML syntax), and, moreover,
  • each link rule, of the multiple link rules applying to the same element except the last, specifies at least one attribute that has the same name as a content attribute declared on the respective element in the DTD, and

  • the last link rule (in the order appearing in the link set) doesn't have link attributes specified that have the same name as a DTD attribute.
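
As a sketch of a link set satisfying these conditions, consider two rules for the same p element. The class content attribute, the style link attribute, and their values are hypothetical; the link attribute class deliberately shares its name with the content attribute declared in the DTD.

```sgml
<!doctype html [
	<!element html (body)>
	<!element body (p*)>
	<!element p (#pcdata)>
	<!attlist p class cdata #implied>
]>
<!linktype l html #implied [
	<!attlist p
		class cdata #implied
		style cdata #implied>
	<!link #initial
		p [ class="note" style="boxed" ]
		p [ style="plain" ]>
]>
```

Under the selection algorithm described above, a p element specified as class="note" in content would select the first rule, while any other p element would fall through to the last rule, which specifies no link attribute named like a DTD attribute.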

Link attribute value(s) specified in the selected rule, if any, are implied as link attribute value(s) for the element being processed.

If no value for a link attribute (ie. of those declared as attributes for the element in the LPD) is specified in the link rule, then the attribute default declared in the attribute declaration in the LPD, if any, is implied as value for that link attribute.

If the selected link rule contains a #POSTLINK clause, the target link set name of the #POSTLINK clause is set as the current link set for any elements after the end-element event of the current element has been processed; ie. the #POSTLINK target applies to immediately following sibling elements, and their descendant elements (until another #USELINK or #POSTLINK target is set as the current link set). On the end-element event for the parent element of the current element, the current link set is reset to that of the parent element, or the parent element's #POSTLINK target, if it has one.

If the selected link rule contains a #USELINK clause, the target link set name of the #USELINK clause is set as the current link set for any child element of the current element; otherwise, the current link set name isn't changed for child element content.

A #USELINK or #POSTLINK target may specify #EMPTY in place of a link set name, in which case no output markup events are generated for child content or following-sibling content, respectively, of the element being processed.

When #EMPTY is the current link set, neither start-element events nor character data events are produced to the output; in all other cases, character data is always delivered to the output, even when the containing element wasn't covered by a link rule in an explicit link process, and hence didn't produce a result markup element.

A #USELINK or #POSTLINK target may also specify #INITIAL in place of a link set name, in which case the current link set of the child content, or following-sibling content, respectively, is set to the #INITIAL link set containing the link rules effective at the document element of the document being processed.

Determining and applying a template

In an LPD declaring a NOTATION link attribute, when the "Link attribute determination" step above assigns the name of a notation having the SGML public identifier as value to the NOTATION link attribute, then the assigned notation's system identifier is used as a template.

When the assigned notation has data attributes with the same name as link attributes declared on the element on which the notation is assigned as template, then the link attribute values are supplied as data attributes for the notation/template application. Moreover, the processor passes child content of the element on which templating is invoked to the template processing context.

See the next chapter for template processing details.

Templates

This chapter introduces SGML templates. SGML templates provide a more advanced technique of transcluding content from foreign documents or other resources (in addition to plain parsed references) with support for supplying template parameters from the document where the template is transcluded.

SGML templates do not introduce new SGML syntax, but make use of SGML notations and other SGML constructs.

Any plain SGML document can be used as SGML template. A template SGML document typically (but not necessarily) will use system-specific entities to refer to values obtained from the environment/document where the template is transcluded.

For example, the following hypothetical SGML document, greeting.sgm, can be used as a template receiving the guest-name and number-of-new-messages parameters (using #CONREF entity expansion as explained above):

<!doctype div system [
  <!entity guest-name system>
  <!entity number-of-new-messages system>
]>
<div>Hello, &guest-name;. You have &number-of-new-messages new messages</div>

A document, master.sgm, making use of the above template can look like this

<!doctype html [
  <!attlist div template entity #conref>
  <!entity msg "placeholder message">
]>
<!linktype web #simple #implied [
  <!notation sgml
    public "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)">
  <!entity msg system "greeting.sgm" ndata sgml [ guest-name="Max" number-of-new-messages=3 ]>
]>
<html>
  <body>
    <div template=msg>
  </body>
</html>

The SGML processor, when processing the master.sgm file with the web link process activated (and when told to output HTML), will produce the following output:

<!doctype html>
<html>
  <body>
    <div>Hello, Max. You have 3 new messages</div>
  </body>
</html>

Using a SGML template as an entity

In the example shown, the SGML processor performs the following steps to produce the output (making use of #CONREF entity preemption):

  • the DTD for the file being processed, master.sgm, declares a content reference (#CONREF) attribute with ENTITY declared value for the template attribute on the div element; this tells the processor to treat div elements as EMPTY when the template attribute is used

  • the declaration for the msg entity in the LPD preempts (overrides) the declaration in the DTD so the dummy declaration for msg in the DTD gets ignored; the msg entity is declared in the LPD as an entity in the sgml notation, which, in turn, by reference to the SGML public identifier ISO//..., indicates to the processor to handle the notation as SGML template

  • the processor, when encountering the div element with the template attribute, will start a SGML processing sub-context using msg's system identifier greeting.sgm as file to process; the guest-name and number-of-new-messages processing parameters, supplied via data attributes on the msg entity declaration, are declared as the guest-name and number-of-new-messages system-specific entities, respectively, in the template's DTD, for reference from template content

Using a SGML template as a notation

The same template as in the previous example, greeting.sgm, can alternatively be invoked as a template notation (instead of as an entity as shown in master.sgm) like this:

<!doctype html [
  <!-- can be left empty -->
]>
<!linktype web html #implied [
  <!notation sgml
    public "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)"
    "greeting.sgm">
  <!attlist #notation sgml
    guest-name cdata #implied
    number-of-new-messages cdata #implied>
  <!attlist div
    n notation (sgml) #fixed sgml
    guest-name cdata #fixed "Max"
    number-of-new-messages cdata #fixed "3">
]>
<html>
  <body>
    <div guest-name="Max" number-of-new-messages="3"></div>
  </body>
</html>

The SGML processor will process the document as follows, producing the same output as in the previous example:

  • the processor handles notations declared by reference to the SGML public identifier as template notation and uses the system identifier greeting.sgm of the template notation as template file name; moreover, template parameters are represented as data attributes declared on the template notation

  • the LPD assigns the sgml notation to the n link attribute, informing the processor that it should apply the greeting.sgm template on every div element

  • the LPD also assigns the guest-name and number-of-new-messages link attributes with the #FIXED values from the link attribute declarations

  • the processor extracts data attributes as template parameters from link attributes having the same name as data attributes; ie. the template parameter guest-name is populated from the link attribute guest-name

Alternatively, and more usefully, template parameters can be extracted from content attributes as follows:

<!doctype html [
  <!attlist div
    guest-name cdata #implied
    number-of-new-messages cdata #implied>
]>
<!linktype web html #implied [
  <!notation sgml
    public "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)"
        "greeting.sgm">
  <!attlist #notation sgml
    guest-name cdata #implied
    number-of-new-messages cdata #implied>
  <!attlist div
    n notation (sgml) #fixed sgml
    guest-name cdata #implied
    number-of-new-messages cdata #implied>
]>
<html>
  <body>
    <div guest-name="Max" number-of-new-messages="3"></div>
  </body>
</html>

As an extension of the example before, in particular, the document body may contain multiple div elements with different attributes, which will result in multiple template applications being produced to the output. For example,

...
<html>
  <body>
    <div guest-name="Max" number-of-new-messages="3"></div>
    <div guest-name="Maria" number-of-new-messages="5"></div>
  </body>
</html>

will produce

...
<html>
  <body>
    <div>Hello, Max. You have 3 new messages</div>
    <div>Hello, Maria. You have 5 new messages</div>
  </body>
</html>

So that the SGML processor can supply a template parameter from a content attribute,

  • the content attribute (DTD attribute) must be declared on the element on which the template is applied, and
  • a data attribute having the same name as the content attribute must be declared on the template notation,

and (as already shown in the notation examples before)

  • a NOTATION link attribute must be declared on the element, and assigned a notation that is declared using the SGML public identifier, or a notation that derives from a notation using the SGML public identifier (see below)

  • the template notation must declare a system identifier for locating the template.

Declaring template notations

As shown in the example before, sgmljs.net recognizes notations as template notations when they are declared with the SGML public identifier

ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)

When declaring more than one template notation, multiple notation names and declarations must be used so that the system identifiers containing the file names for each template can contain paths to the different locations of the respective template notations, and so that different sets of data attributes can be declared as required.

There is an alternative to declaring the SGML public identifier directly on a notation that should be treated as template notation:

<!notation sgml
  public "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)">
<!notation mytemplate system "my-template.sgm">
<!attlist #notation mytemplate superdcn name #fixed sgml>

This declares the sgml notation by just using the SGML public identifier, ie. omitting the system identifier, then declares a template notation, mytemplate, in a separate declaration. The mytemplate notation is recognized by sgmljs.net as an SGML template notation because it has the superdcn attribute declared, and the value of the superdcn attribute is set (as #FIXED value) to the name of a notation declared with the SGML public identifier.

Capturing and supplying child content as template parameter

When a template notation is used in content, as in the example where the template extracts content attributes, the template can also receive the child content of the element on which it is invoked. The template receives the child content from the master context via its standard input, and can access it using the <osfd>0 system identifier.

For example, when using the following fragment in the same context as above

<div guest-name="Max" number-of-new-messages="3">
  <p><h3>Message 1</h3><p>Bla bla blah ... </p>
  <p><h3>Message 2</h3><p>...</p>
  <p><h3>Message 3</h3><p>...</p>
</div>

the following template references the child content of the div element from the master context via the content entity (adding an h2 before it):

<!doctype #implied system [
  <!entity content system "<osfd>0">
  <!entity guest-name system>
  <!entity number-of-new-messages system>
]>
<div>Hello, &guest-name. You have &number-of-new-messages new messages
  <h2>Details</h2>
  &content
</div>

Processing templates

The output of a template application is produced by a separate SGML processing context, invoked with the template notation or entity as its main file, in which each data attribute declared and supplied in the master context can be accessed as a system-specific general or parameter entity.

Child content of the element on which the template is applied is captured in the master context as the sequence of normalized markup events (generated according to the master context's parsing rules declared in markup and SGML declarations), and is serialized to a character stream which is supplied to the template via <osfd>0.

A template produces an element with the same name as that on which the template is applied. Moreover, generated template child content must be valid in the master context into which it is included.

Strict templates

The SGML processor enforces validity of the generated content by requiring that the template declares its DTD using <!DOCTYPE ... SYSTEM ..., ie. the template must include the markup declarations of the master context by referencing a system-specific external subset that the master context sets up for the template context.

The template may declare its DTD either using

  • <!DOCTYPE element SYSTEM ..., where element is the element on which the template is applied, or

  • <!DOCTYPE #IMPLIED SYSTEM ..., which makes the template processing context determine the document element from the first content element in the template document (which must be the element on which the template is applied)

By the resolution rules for system-specific entities, the template processing context picks up the prescribed markup declarations that the master context has created via a file name generated from the template document element.

For example, if the template document element is div, the template processing context accesses the external subset by reading the file div.dtd. The file is looked up in a directory that the master context created and supplied for the template invocation, and in which it has placed the div.dtd file containing the markup declarations that the template is parsed with and has to conform to.

As a consequence, if the template doesn't use the expected element for its document element (the element name on which the template is applied), it will fail to locate the file for the external subset in the directory provided by the invoking master context.

Markup declarations that the template processing context receives from the invoking master context don't contain entity declarations, and any parameter entity references are pre-expanded into the replacement values used by the master context.

The following restrictions apply to documents included as strict templates:

  • LPDs in documents included as strict templates cannot be activated -- a template is always parsed using the base (the first) DTD only; by extension, nested templating (where a template is applied that itself can apply templates) isn't supported
  • documents included as strict templates cannot declare and/or use data entities; normally, data entities (other than CDATA data entities) are serialized as-is (as reference, rather than as expanded text) to the output, but a template's data entities aren't declared in the master's DTD, nor are the data entities declared in a master's DTD visible to the template so references to data entities are rejected in template content

  • documents included as strict templates cannot use referential attributes (those with #CURRENT default, or with ID, IDREF, IDREFS, ENTITY, ENTITIES, or NOTATION declared value) because these interfere with the master's context such that invalid markup could be produced

  • an element on which a strict template is applied must have declared exclusion exceptions that are at least as strict as those effective on the element in the master context where the template is applied, if any; that is, the exclusion exceptions declared in the master context's DTD on the element on which a strict template is applied must exclude all elements which are contextually excluded at the position where the template is applied in the invoking master context (ie. not only on the element directly but also on its parent elements)

  • documents included as strict templates shouldn't use element declarations; declaring and using new elements from within a template will produce invalid markup if the master context doesn't have the IMPLYDEF ELEMENT YES feature enabled; note the master context is expected to enforce this by declaring e.g. a content model on the element on which the template is applied (rather than allowing ANY content on it); with a content model declared on the template element, template processing can't place elements that are undeclared in the master as template child content; note that because this constraint can be expressed by the master DTD, template processing itself (when not constrained by the master DTD) does not reject the use of elements that are declared in the template

Note that documents included as strict templates can (by default) declare and introduce new attributes, since the IMPLYDEF ATTLIST YES feature is set by default, and output serialization will always produce a normalized attribute specification output (such that e.g. short forms of attribute specification in a template are always serialized in canonical form to the output, irrespective of what short forms are supported in the master context output) (TODO: re-check this).

Note that it is generally advisable to use the same settings for SYNTAX NAMECASE ENTITY and SYNTAX NAMECASE GENERAL, because the templating mechanism transports values via attributes (which are subject to SYNTAX NAMECASE GENERAL) into entities (which are subject to SYNTAX NAMECASE ENTITY) of the same name.
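The effect of mismatched NAMECASE settings can be illustrated with a small Python model. It only assumes that a YES setting folds names to upper case and a NO setting leaves them as written; fold and lookup_parameter are hypothetical names, and the actual matching performed by sgmljs.net may differ in detail.

```python
from typing import Optional


def fold(name: str, namecase: bool) -> str:
    """Upper-case a name when the corresponding NAMECASE setting is YES."""
    return name.upper() if namecase else name


def lookup_parameter(entity_name: str, attributes: dict[str, str],
                     namecase_general: bool, namecase_entity: bool) -> Optional[str]:
    """Find the attribute value for an entity reference, or None if the names don't match."""
    # Attribute names are folded according to NAMECASE GENERAL ...
    supplied = {fold(name, namecase_general): value for name, value in attributes.items()}
    # ... while the entity name used in the template follows NAMECASE ENTITY.
    return supplied.get(fold(entity_name, namecase_entity))


attrs = {"guest-name": "Max"}
# GENERAL YES, ENTITY NO: the attribute becomes GUEST-NAME, the entity stays guest-name.
print(lookup_parameter("guest-name", attrs, True, False))
# Same setting for both: the names agree and the value is found.
print(lookup_parameter("guest-name", attrs, True, True))
```

With differing settings the folded attribute name and the entity name no longer agree, which is the situation the advice above is meant to avoid.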

Lax templates

In "lax" templates, the template doesn't have to declare its DTD using <!DOCTYPE ... SYSTEM ..., but must use an external subset with the same system identifier as the external subset that the invoking master context uses.

The template accesses any markup declarations that it shares with the invoking master context via regular processing of markup declarations in the shared external subset. Parameter entity references in the external subset are not pre-expanded, but are expanded in the template processing context. Note this means that expanded values for parameter entities in the template processing context could be different from those of the master processing context.

Lax templates, in general, aren't guaranteed to produce content that is valid according to the DTD of the invoking master process; instead, this must be ensured by proper DTD and/or template authoring.

Exceptions use case

A driving use case for lax templates is to propagate context-sensitive exclusion exceptions into template processing contexts. In order to make use of this, the used DTD must play along (have a placeholder where exceptions are to be inserted), and we need a way to materialize, into a data attribute, the exclusions effective at a given template expansion place.

The intent of the following hypothetical markup declarations is to allow <script> anywhere inside <sub> except when descending from <usercontent>. The template is invoked on <sub>, which by itself allows <script>; we want <script> to be disallowed in the markup declarations the template is using:

<!-- doc.dtd -->
<!element doc - - (sub|p|usercontent)+>
<!element sub - - (p+) +(script)>
<!element p O O (b|i|#pcdata)+>
<!element b - - (#pcdata)>
<!element i - - (#pcdata)>
<!element script - - cdata>
<!attlist sub ref entity #conref>
<!element usercontent - - (sub+) -(script)>

<!-- master.sgm -->
<!doctype test system "doc.dtd" [
  <!entity template "doesn't matter">
]>
<!linktype lnk [
  <!entity template system "template.sgm" ndata sgml>
]>
<doc><sub>blabla</sub><usercontent><sub ref=template></usercontent></doc>

<!-- (lax) template.sgm -->
<!doctype test system "doc.dtd">
<sub>formatted user contributed content</sub>

To implement enforcement of the contextual constraints on sub, as used in child content of usercontent elements, we change the DTD to use the script_or_noscript parameter entity for holding the exception string used in the DTD, ie. +(script) in the master's view of the DTD, and -(script) in the template's view, supplying the exception string -(script) as template parameter in the latter case:

<!-- doc.dtd -->
<!element doc - - (sub|p|usercontent)+>
<!entity % script_or_noscript "+(script)">
<!element sub - - (p+) %script_or_noscript>
<!element p O O (b|i|#pcdata)+>
<!element b - - (#pcdata)>
<!element i - - (#pcdata)>
<!element script - - cdata>
<!attlist sub ref entity #conref>
<!element usercontent - - (sub+) -(script)>

<!-- master.sgm -->
<!doctype test system "doc.dtd" [
  <!entity template "doesn't matter">
]>
<!linktype lnk [
  <!notation template_notation public "... SGML pubid ...">
  <!attlist #notation template_notation script_or_noscript cdata #required>
  <!entity template system "template.sgm" ndata template_notation [ script_or_noscript="-(script)" ]>
]>
<doc><sub>blabla</sub><usercontent><sub ref=template></usercontent></doc>

<!-- (lax) template.sgm -->
<!doctype test system "doc.dtd" [
  <!entity % script_or_noscript system>
]>
<sub>formatted user contributed content</sub>

Note that if we had instead placed +(script) on doc, the template wouldn't be allowed to use <script> either: the inclusion would only be introduced on doc, and since the template doesn't start at <doc>, it can't use <script>.

Similarly, other limitations of strict templates can be solved using parameter entity techniques such as changing IDREF attributes (in the master's DTD view) into enumerated value attributes (in the template's view), or changing ID attributes into #FIXED attributes.

Inline templates

Strict templates using an empty document prolog (which is interpreted as <!doctype #implied system> when the IMPLYDEF DOCTYPE YES feature is active) can be specified inline, rather than in an external file, like this (using a literal storage manager FSI):

<!doctype doc [
  <!element doc - - (sub+)>
  <!element sub - - (#pcdata)>
  <!attlist sub ref entity #conref>
  <!entity e system "doesn't matter">
]>
<!linktype lnk #simple #implied [
  <!notation sgml public "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)">
  <!entity e system "<literal><sub>text from referenced entity</sub>" ndata sgml>
]>
<doc><sub ref=e></doc>

Inline templates can reference template parameters as general entity references: when IMPLYDEF ENTITY YES is effective, any undeclared entity is implicitly declared system-specific, ie. treated as if it were declared <!ENTITY ent SYSTEM>. So if a template doesn't make use of parameter entities or other markup declarations, it can omit the document prolog and still reference template parameters. Note that parameter entities are not subject to IMPLYDEF ENTITY and must always be explicitly declared.

Use of templates in multi-stage processing pipelines

Since a template is processed in a separate SGML processing invocation, the result of a template application is not visible to the master context from where the template is applied. As a consequence:

  • link rules descending from a state where a template is applied (e.g. link rules reached via a #USELINK target from a link rule that determines a template) aren't reached; if a template is applied, no further link state propagation is performed on child content of the element on which the template is applied; for these reasons, link rules that assign templates should declare #USELINK #EMPTY

  • in a pipeline of multiple explicit link processes, or a pipeline consisting of an explicit link process followed by the single allowed implicit link process, a template can only be applied on the last link process; produced template content isn't visible to further link processing stages

Query result set formatting

Templates can also be used for formatting query results from SQL or SPARQL databases.

SQL and SPARQL endpoints deliver query results in a tabular fashion, ie. as ordered result set of tuples (or rows), where a tuple is a set of item-value pairs mapping query column names to result value strings.

In query result formatting, the SGML processor invokes template processing on every row of a result set, and concatenates the produced markup of the (potentially more than one) template invocations into result markup.
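The per-row processing described above can be pictured with the following Python sketch. It is a simplified model rather than the sgmljs.net implementation: the tab-separated input format, the header row, and the rows_from_tsv, render_name, and format_result_set helpers are assumptions made for illustration.

```python
import csv
import io


def rows_from_tsv(tsv_text: str) -> list[dict[str, str]]:
    """Parse a tab-separated result set; the header row supplies the column names."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


def render_name(row: dict[str, str]) -> str:
    """Stand-in for one template invocation: bind column values into markup."""
    return "<div><span>{name}</span></div>\n".format(**row)


def format_result_set(rows, render_row) -> str:
    """Invoke the row template once per tuple and concatenate the produced markup."""
    return "".join(render_row(row) for row in rows)


# Hypothetical result set with a single column called "name".
result = "name\nAda\nGrace\n"
print(format_result_set(rows_from_tsv(result), render_name), end="")
```

An empty result set yields an empty string here, ie. no markup is produced when the query returns no rows.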

Formatting query results involves

  • a query as a CDATA data entity in a query notation, ie. either the SQL or the SPARQL notation (declared using either the +//IDN sgml-cms.net//NOTATION sql-query or the +//IDN sgml-cms.net//NOTATION sparql-query public identifier, respectively), and

  • a formatting template which receives and accesses bound column values as (system-specific) entities, and which is specified on the query template as value of a NOTATION attribute.

For example, the following file, biblio.sparql, contains a SPARQL query for fetching bibliography data from an external data source (in the vocabulary used by the Zotero bibliography management application):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX res: <http://purl.org/vocab/resourcelist/schema#>
PREFIX z:   <http://www.zotero.org/namespaces/export#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT $item $resource $resourcetype $title $givenname $surname $name
FROM <zotero-bibliography.rdf>
WHERE {
  $item rdf:type z:UserItem.
  $item res:resource $resource.
  $resource rdf:type $resourcetype.
  OPTIONAL {$resource dcterms:title $title}
  OPTIONAL {$resource dcterms:creator $creator}
  OPTIONAL {$creator rdf:type foaf:Person}
  OPTIONAL {$creator foaf:givenname $givenname}
  OPTIONAL {$creator foaf:surname $surname}
  OPTIONAL {$creator foaf:name $name}
}

The following SGML document executes the query against the zotero-bibliography.rdf RDF file containing bibliography data exported from Zotero, and formats the results via an inline template:

<!DOCTYPE HTML [
  <!ELEMENT HTML O O (DIV|P)+>
  <!ELEMENT DIV - O (#PCDATA|P|SPAN)+>
  <!ELEMENT P O O (#PCDATA|A)+>
  <!ELEMENT SPAN - - (#PCDATA)>
  <!ATTLIST SPAN PROPERTY CDATA #IMPLIED>
  <!ELEMENT A - - (#PCDATA)>
  <!ATTLIST A HREF CDATA #IMPLIED TITLE CDATA #IMPLIED>
  <!ENTITY bibliography_rdf_graph_ns "http://zotero.org/users/local/Rk1rxAG1/items/">
  <!ENTITY bibliography_rdf_graph_location "zotero-bibliography.rdf">
  <!ATTLIST DIV REF ENTITY #CONREF PROPERTY CDATA #IMPLIED>
  <!ENTITY bibliography "placeholder">
]>
<!LINKTYPE BIBLIOGRAPHY #SIMPLE #IMPLIED [
  <!NOTATION SGML PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)">
  <!NOTATION BIBLIO PUBLIC "+//IDN sgml-cms.net//NOTATION sparql-query">
  <!ATTLIST #NOTATION BIBLIO
      SUPERDCN NAME #FIXED SGML
      TEMPLATE NAME #REQUIRED
      ENDPOINT CDATA #FIXED "roqet --quiet --results tsv --exec">
  <!NOTATION BIBLIOENTRY
      SYSTEM '<literal>
            <div property="dcterms:creator">
              <span property="foaf:givenname">&givenname</span>
              <span property="foaf:surname">&surname</span>
              <span property="foaf:name">&name</span></div>'>
  <!ATTLIST #NOTATION BIBLIOENTRY
      SUPERDCN NAME #FIXED SGML
      GIVENNAME CDATA #IMPLIED
      SURNAME CDATA #IMPLIED
      NAME CDATA #IMPLIED>
  <!ENTITY BIBLIOGRAPHY SYSTEM "biblio.sparql" NDATA BIBLIO [ TEMPLATE=BIBLIOENTRY ]>
]>
<html>
...
<h2>Bibliography</h2>
<div ref=bibliography>
</html>

As shown in the following sections, SPARQL and SQL queries can also be supplied inline as notational content (rather than via an external file).

Query parameters

Data attributes of a query notation (other than the template data attribute, which is treated specially as described) are supplied as substitution variables to the SPARQL or SQL query processor and can be referenced in the query notation code text in a similar way as entity references in SQL code text.

For example, the following declaration of a SPARQL query notation takes the color query parameter and substitutes the supplied value in place of the &color parameter reference:

<!notation sgml ...>
<!notation query public "+//IDN sgml-cms.net//NOTATION sparql-query">
<!attlist #notation query
	superdcn name #fixed sgml
	template name #required
	color cdata #required>

<!entity flower-by-color-query system "<literal>
	PREFIX plants: <http://plants.org/plants-vocabulary-1.0/>
	SELECT ?flower
	WHERE { ?flower plants:color &color }"
	ndata query [ template=... color="yellow" ]>
]>

Note that even though the & (ampersand) character is used in SGML, SPARQL, and SQL in a similar way, recognition and interpretation of "entity references" in query notation content is still handled in a query-notation specific way.

Note that query parameters are supplied to the query processor only, and are not automatically supplied to the query result formatting template.

The SPARQL processor used by sgmljs.net will evaluate SPARQL queries against a local default tuple store (ie. database); see SPARQL examples on how to supply a specific SPARQL endpoint (IP address or DNS name or a particular RDF file) to query.

SQL queries

The same technique used for SPARQL can also be used with SQL. For example, the following document formats the results of an inline SQL query that selects all stored names with a GENDER_CD of 0, supplied as query parameter (it also demonstrates use of connection parameters):

<!DOCTYPE HTML [
	<!ELEMENT HTML O O (DIV|P)+>
	<!ELEMENT DIV - O (#PCDATA|P|SPAN)+>
	<!ELEMENT P O O (#PCDATA|A)+>
	<!ELEMENT SPAN - - (#PCDATA)>
	<!ELEMENT A - - (#PCDATA)>
	<!ATTLIST DIV REF ENTITY #CONREF PROPERTY CDATA #IMPLIED>
	<!ENTITY femalenames "placeholder">
]>
<!LINKTYPE LISTBOOKS #SIMPLE #IMPLIED [
	<!NOTATION SGML PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)">
	<!NOTATION SQLQUERY PUBLIC "+//IDN sgml-cms.net//NOTATION sql-query">
	<!ATTLIST #NOTATION SQLQUERY
		SUPERDCN NAME #FIXED SGML
		TEMPLATE NAME #REQUIRED
		GENDER_CD CDATA #REQUIRED
		SQLITE_DB_FILE CDATA #REQUIRED>
	<!NOTATION NAMEENTRY SYSTEM '<literal><div><span>&name</span></div>'>
	<!ATTLIST #NOTATION NAMEENTRY
		SUPERDCN NAME #FIXED SGML
		NAME CDATA #IMPLIED>
	<!ENTITY femalenames SYSTEM "<literal>
set colsep '	'
set underline off
connect 'Driver=SQLite;Database=&sqlite_db_file'
select name from names_tbl where gender_cd = cast('&gender_cd' as decimal);"
		NDATA SQLQUERY [ TEMPLATE=NAMEENTRY GENDER_CD="0" SQLITE_DB_FILE="/Users/guesswho/Repositories/markdown-awk/test/test.db" ]>
]>
<html>
<div ref=femalenames>
</html>

When supplying values to SQL, the SGML processor will always escape single quotes occurring in supplied values into double quotes (TODO: OWASP link).

SQL queries should always use received values as string literals, ie. in single quotes.
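Why received values belong inside single-quoted literals can be seen in the following Python sketch. It assumes that the escaping mentioned above follows the standard SQL convention of doubling each single quote; that reading is an assumption not confirmed by this document, and escape_sql_value and substitute_parameters are hypothetical names rather than part of sgmljs.net.

```python
import re

# Simplified parameter reference syntax (an assumption): '&' followed by a name.
PARAM_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*)")


def escape_sql_value(value: str) -> str:
    """Double every single quote, as in standard SQL string literals (assumed behavior)."""
    return value.replace("'", "''")


def substitute_parameters(query: str, params: dict[str, str]) -> str:
    """Replace each &name reference with the escaped value of the same-named parameter."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            # Unknown references are left as they are.
            return match.group(0)
        return escape_sql_value(params[name])
    return PARAM_REF.sub(replace, query)


query = "select name from names_tbl where gender_cd = cast('&gender_cd' as decimal);"
print(substitute_parameters(query, {"gender_cd": "0"}))
# A hostile value stays inside the string literal because its quotes are doubled.
print(substitute_parameters(query, {"gender_cd": "0' or '1'='1"}))
```

The escaping only protects values that end up inside a quoted literal; a reference used outside of single quotes would receive the value unquoted, which is why the recommendation above matters.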