As a straightforward application of SGML
basic entity substitution
and templating on the web,
page.sgm, shows a very simple
example of a SGML document, that, when accessed
as web page at
name entity reference substituted into
Tom, and returned as response to a browser
(or can be rendered entirely within a web browser
running SGML User Agent):
    <!doctype html [
      <!element html - - any>
      <!element p - - (#pcdata)>
      <!entity param system>
    ]>
    <html>
    <head>
    <title>SGML page</title>
    </head>
    <body>
    <p conref=e>
    </body>
    </html>
Provisioning SGML on the Web is based on interpreting HTTP request
URLs as file names according to terms and concepts in use for the
longest time on the Internet (
For the purpose of resolving a URL to a file or other resource, a
web host passes a request URL and other request parameters as value
PATH_TRANSLATED and other system-specific entities
to SGML processing. The SGML processor then either prepares HTML from SGML,
or just serves static files, depending on what media type the user agent
has indicated to accept in the request, and on what files are found
to exist on the server's file system at the resolved location.
Interpretation and modification of
PATH_TRANSLATED is analogous to
what a classic CGI script receives via the
the classic scenario assumes there's a common document root
(or web root) directory wherein a CGI program is looked up in
a designated script directory. The program, if found,
is then executed with the trailing part of the request URL
(everything following the portion used to locate the program/script
PATH_TRANSLATED is derived from
PATH_INFO by resolution against the document root,
resulting in an absolute path name.
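The resolution of PATH_INFO against the document root can be sketched as follows. This is a minimal illustration in Python, not sgmlweb's actual code: the function name `path_translated` and the check that rejects paths escaping the document root are assumptions added for the example.

```python
import posixpath


def path_translated(document_root: str, path_info: str) -> str:
    """Resolve PATH_INFO against the document root (hypothetical helper)."""
    # PATH_INFO carries a leading slash; strip it so the join stays relative
    relative = path_info.lstrip("/")
    resolved = posixpath.normpath(posixpath.join(document_root, relative))
    # Assumed precaution: refuse results that leave the document root via ".."
    root = posixpath.normpath(document_root)
    if resolved != root and not resolved.startswith(root.rstrip("/") + "/"):
        raise ValueError("PATH_INFO escapes the document root")
    return resolved
```

With a document root of `/var/www`, a PATH_INFO of `/docs/intro` resolves to the absolute path `/var/www/docs/intro`.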
PATH_TRANSLATED can alternatively be computed without
knowledge of a document root directory (and provided
SCRIPT_NAME is absolute against the web server's file
system root, which it is typically not, however) by starting
with the directory where
SCRIPT_NAME resides and going
back as many parent directories as there are path components
SCRIPT_NAME, then appending
If a static file is requested, as determined by requesting
a resource name (in
PATH_TRANSLATED) having a dot in its last
it is served "as-is" (as a static file), with a media type
Content-Type) derived from the file extension
note the HTTP
Accept-header isn't checked in this case
commonly requested static file types include prerendered
.html files as well as
.js, and image files
otherwise, if a static resource by the name of
name doesn't exist, a
404 NOT FOUND HTTP response is generated
PATH_TRANSLATED doesn't have a dot), if
can be resolved as a SGML file by appending the
text/sgml is accepted by the request
the SGML gateway determines
scriptName as master SGML file/template
and sends the static master file
otherwise, and by default (and if either no
Accept header is
present in the request or its value is
text/html or a wildcard)
PATH_TRANSLATED is processed for producing HTML,
and the output is served as a text/html response
PATH_TRANSLATED cannot be resolved to a SGML file)
if the first path component of
PATH_TRANSLATED (or the longest
sequence of consecutive path steps contained twice in
PATH_TRANSLATED) can be resolved as an SGML file by appending
sgm file extension,
text/sgml is accepted for the response, the resolved file
is served statically from the resolved file as
otherwise, the resolved file is processed for producing HTML,
and the output is served as
the remaining part of
PATH_TRANSLATED (not including the
initial part up to and including the resolved SGML file)
is resolved against the web root directory to an absolute path,
and supplied as the
PATH_TRANSLATED system-specific entity to
Otherwise (when the request's
PATH_TRANSLATED value couldn't be
interpreted in any of the ways explained) a
404 NOT FOUND HTTP
response is generated.
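The routing branches described above can be summarized in a simplified Python sketch. It is an illustration under stated assumptions rather than sgmlweb's implementation: the names `route` and `accepts_sgml`, the action labels, and the `exists` callback are hypothetical, paths are taken relative to the web root, and the variant that looks for the longest repeated sequence of path steps is left out.

```python
from typing import Callable, Optional, Tuple

Route = Tuple[str, Optional[str], Optional[str]]


def accepts_sgml(accept: Optional[str]) -> bool:
    """True only if text/sgml is listed explicitly in the Accept header."""
    if not accept:
        return False
    media_types = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    return "text/sgml" in media_types


def route(path: str, accept: Optional[str], exists: Callable[[str], bool]) -> Route:
    """Return (action, resolved file, remaining path) for a request path."""
    steps = [step for step in path.split("/") if step]
    if not steps:
        return ("not-found", None, None)
    # A dot in the last path component marks a static file request
    if "." in steps[-1]:
        return ("static", path, None) if exists(path) else ("not-found", None, None)
    # An absent, text/html or wildcard Accept header selects HTML rendering
    action = "serve-sgml" if accepts_sgml(accept) else "render-html"
    # The whole path may name an SGML file once .sgm is appended
    whole = "/" + "/".join(steps) + ".sgm"
    if exists(whole):
        return (action, whole, None)
    # Otherwise try the first path component as master file;
    # the remaining path then names the client document
    first = "/" + steps[0] + ".sgm"
    if len(steps) > 1 and exists(first):
        return (action, first, "/" + "/".join(steps[1:]))
    return ("not-found", None, None)
```

Passing the existence check as a callback keeps the sketch independent of a real file system, so the decision logic can be exercised with a plain set of file names.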
Whenever a SGML file is selected for processing, the file's
modification date is checked against the value of the
HTTP header, if present in the request. If the file is older than
last-modified value, it's not processed, and a
304 NOT MODIFIED
HTTP response is returned to the client instead. In case the processed
SGML is determined from the initial portion of
modification date of the file name denoted by the remaining part,
is checked as well.
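The modification check can be sketched in Python as below. The sketch assumes that the request header in question is `If-Modified-Since`, that modification times are passed as timezone-aware `datetime` values, and that a file whose modification time equals the header value counts as unchanged; the function name `is_not_modified` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Iterable, Optional


def is_not_modified(mtimes: Iterable[datetime], header_value: Optional[str]) -> bool:
    """Decide whether a 304 response applies to the files involved in a request."""
    if not header_value:
        return False
    try:
        since = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return False  # unparseable date: process the request normally
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # Master file and client file (if any) are both checked: the newest one decides
    newest = max(mtimes)
    # HTTP dates have one-second resolution, so drop sub-second precision
    return newest.replace(microsecond=0) <= since
```

Passing the modification times of both the master file and the file denoted by the remaining path covers the case where the processed SGML is determined from only the initial portion of the path.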
sgmlweb produces HTTP GET responses for
locating an SGML file with a
.sgm file suffix in the
web document root directory matching the request URI
in the most obvious way
invoking SGML processing on the located SGML with activating
HTML target document type name
WEB link process
HTTP link process
returning the produced SGML processing output to the requestor, with HTTP response headers populated as discussed below.
This means that on SGML files in the web root directory
that don't have
HTML as base document type name,
a link pipeline is inferred from the link process
declarations present in the prolog of the SGML file
such that the ultimate result document type name
produced by the link pipelining is
an error if neither the base document is
nor a target document type name of
HTML can be
produced by forming a valid sequence of link processes
from those declared in the link process.
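Inferring a link pipeline amounts to finding a chain of link processes leading from the base document type to HTML. The following Python sketch shows one way to search for such a chain with a breadth-first search. The data model (a mapping from link process name to a source and result document type name) and the function name `infer_pipeline` are assumptions made for illustration; how sgmlweb performs the inference internally is not described here.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def infer_pipeline(
    base: str,
    links: Dict[str, Tuple[str, str]],
    target: str = "HTML",
) -> Optional[List[str]]:
    """Return link process names leading from base to target, or None."""
    if base == target:
        return []  # the base document type is already the target
    queue = deque([(base, [])])
    seen = {base}
    while queue:
        doctype, path = queue.popleft()
        for name, (source, result) in links.items():
            if source != doctype or result in seen:
                continue
            if result == target:
                return path + [name]
            seen.add(result)
            queue.append((result, path + [name]))
    return None  # corresponds to the error case: no valid sequence exists
```

For example, with link processes declared from `ARTICLE` to `BOOK` and from `BOOK` to `HTML`, a document with base document type `ARTICLE` yields a two-step pipeline, while a base document type with no path to `HTML` yields `None`.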
Moreover, sgmlweb activates the
link processes. If
HTTP are declared in
the document prolog, any inferred link process pipeline
will always contain the
HTTP link process,
respectively (but either
HTTP can be omitted
as described below).
The following system-specific entities are exposed to SGML and can be declared.
Absolute path to master SGML file (primary SGML being accessed)
URL portion following the part identifying
SCRIPT_NAME, if any
includes a leading slash character
Absolute path to the file corresponding to
is set/if there's an URL portion following the part identifying the
master SGML file)
Content of the file corresponding to
PATH_TRANSLATED, if any
HTTP method used for the request (eg.
The following system-specific parameter entities/CGI meta-variables
are additionally made available (see
for an explanation) when either declared manually or conditionally
declared via referencing a parameter entity for the
+//IDN sgmljs.net//ENTITIES CGI 1.1//EN public identifier:
are not necessarily (or even typically) passed internally from a web server
to SGML, but are what SGML passes to an SGML document processing context.
If a request URL consists of just a path name identifying an SGML resource
PATH_INFO, and hence no
PATH_TRANSLATED etc. system-specific entities
are exposed and accessing (or even declaring) those is treated as error.
To be able to process requests both with and without
PATH_TRANSLATED using the same master document,
CGI meta-variables can be declared using the
+//IDN sgmljs.net//ENTITIES CGI 1.1//EN public text like this
    <!DOCTYPE html ... [
      <!ENTITY % cgivars "+//IDN sgmljs.net//ENTITIES CGI 1.1//EN">
      %cgivars;
      ...
    ]>
+//IDN sgmljs.net//ENTITIES CGI 1.1//EN public text
in a declaration set as shown is equivalent to declaring the
CGI meta-variables as both system-specific general and parameter
entities manually. However, the
entities (and the
PATH_TRANSLATED_CONTENT as a general entity)
are only declared if actually supplied in the processing context.
In particular, fallback values for those can be specified in the document itself. For example, the following declaration set
    <!DOCTYPE html ... [
      <!ENTITY % cgivars "+//IDN sgmljs.net//ENTITIES CGI 1.1//EN">
      %cgivars;
      <!ENTITY PATH_TRANSLATED "somefile">
    ]>
assumes values for
PATH_TRANSLATED as obtained from the trailing
part of a request URL. However, if the request URI doesn't contain
a trailing part following after the part that identifies the master
%cgivars will leave
hence the subsequent entity declaration for
supply the effective value for it.
In this way, a master document can assign a fallback value for an absent file name (eg. as derived from an absent secondary path step in a HTTP request URI) for a client document such as the file name of the latest or otherwise most relevant client document of a document collection sharing a common path prefix.
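The fallback mechanism relies on the SGML rule that the first declaration of an entity is binding and later declarations of the same name are ignored. A minimal Python model of that rule is shown below; the function names and the dictionary representation are hypothetical and only mirror the order of declarations in the example declaration set.

```python
def declare(entities: dict, name: str, value: str) -> None:
    """Bind an entity only if it is not declared yet (first declaration wins)."""
    entities.setdefault(name, value)


def effective_entities(request_entities: dict, fallbacks: dict) -> dict:
    """Apply request-supplied declarations first, then the document's fallbacks."""
    entities: dict = {}
    # Declarations supplied by the processing context come first
    for name, value in request_entities.items():
        declare(entities, name, value)
    # Later declarations in the document only take effect for undeclared names
    for name, value in fallbacks.items():
        declare(entities, name, value)
    return entities
```

When the request supplies PATH_TRANSLATED, the fallback `somefile` is ignored; when it does not, the fallback becomes the effective value.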
While CGI meta-variables represent data handed by the web server to SGML, HTTP response meta-variables (such as the HTTP response status) are data returned from SGML processing to the web server along with result markup as response body.
Conceptually, HTTP response meta-variables are represented
as link attributes of a simple link process. A simple
link process declares link attributes on the document element
of the response body carrying HTTP response meta-variables.
HTTP response link attributes are declared in a distinguished
link process declaration identified by the
+//IDN sgmljs.net//LPD HTTP 1.1//EN and
+//IDN sgmljs.net//LPD HTTP 2.0//EN public text identifiers.
These LPDs behave as if declared as follows:
    <!ENTITY % HTTP_RESPONSE_STATUS "200">
    <!ENTITY % HTTP_RESPONSE_CONTENT_TYPE "text/html">
    <!ENTITY % HTTP_RESPONSE_LOCATION "">
    <!ENTITY % HTTP_CACHE_VALIDATION_ENTITIES "">
    <!ATTLIST html
      status NUMBER #FIXED %HTTP_RESPONSE_STATUS
      location CDATA #FIXED "%HTTP_RESPONSE_LOCATION"
      content-type CDATA "%HTTP_RESPONSE_CONTENT_TYPE"
      cache-validation-entities ENTITIES "%HTTP_CACHE_VALIDATION_ENTITIES"
      ...>
To make sgmlweb return values for response meta-variables to user agents other than the defaults, a master document declares an LPD with one of the distinguished LPDs as external subset, and then preempts one or more of the parameter entities used as default values for link attributes. For example, the following master document makes sgmlweb send a 404 HTTP status to a web browser:
    <!DOCTYPE html ... [ ]>
    <!LINKTYPE http PUBLIC "+//IDN sgmljs.net//LPD HTTP 1.1//EN" [
      <!ENTITY % HTTP_RESPONSE_STATUS "404">
    ]>
    ...
Note that the name of the link process must be
other LPDs referencing the public identifier for HTTP response meta-variables
won't get activated (and are hence ignored) by sgmlweb.
HTTP_RESPONSE_STATUS and other parameter entities
can be preempted from any link process, not just from the
link process, subject to declaration set preemption.
Note since the LPD determines the names of parameter entities
it accepts as
#FIXED values, there's no need to have link
processing determine link attributes; all that has
to happen is that a respective LPD ("deriving" from a distinguished
response LPD) is declared and activated; the effective values
of the respective parameter can be queried from entity management
just as request parameters.
In effect, specifying values for response meta-variables is syntactically very similar to declaring request parameters.
Note the LPD is tied to the
html response document element,
The following parameter entities, when declared/preempted as described, have the following meanings:
numeric HTTP status to respond
the valid HTTP response status codes are 100, 101, 200, 201, 202, 203, 204, 205, 206, 300, 301, 302, 303, 304, 305, 307, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 426, 451, 500, 501, 502, 503, 504, and 505
of those, all requests with non-2xx or with 204 NO CONTENT status get terminated after prolog processing, without producing a response body, and a generic response body for 4xx and 5xx responses, if applicable, is populated by the web server instead
the reason phrase (such as
NOT MODIFIED for 304 responses)
is also generated by the web server
on a 301 MOVED PERMANENTLY, 302 FOUND, 303 SEE OTHER (on POST),
and 307 TEMPORARY REDIRECT response status, a redirect URL
is configured in the
HTTP_RESPONSE_LOCATION parameter entity;
it is an error if the
HTTP_RESPONSE_LOCATION isn't declared/preempted
(and will lead to a 500 INTERNAL SERVER ERROR response)
Redirect-URL for 301 MOVED PERMANENTLY, 302 FOUND, 303 SEE OTHER and 307 TEMPORARY REDIRECT; see above
response media type
preempted values must match the syntax of an RFC 1521 media type
application/xhtml+xml with optional parameters)
application as (main) type; other values
will lead to a 500 INTERNAL SERVER ERROR generic error response
space-separated list of entity names naming file names
declared in the document prolog, the youngest of which
is used to populate the value of the
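The rules for the response status and redirect location listed above can be sketched in Python as follows. The function names are hypothetical, and raising an exception for a status code outside the listed set is an assumption of this sketch, since the text doesn't state how such values are handled.

```python
VALID_STATUS = (
    {100, 101, 307, 426, 451}
    | set(range(200, 207))
    | set(range(300, 306))
    | set(range(400, 418))
    | set(range(500, 506))
)
REDIRECTS = {301, 302, 303, 307}


def effective_status(status: int, location: str = "") -> int:
    """Return the status to send, given the preempted status and location."""
    if status not in VALID_STATUS:
        # Assumption: the handling of unlisted codes is not specified
        raise ValueError(f"unsupported HTTP status {status}")
    # A redirect without a configured location is an error
    if status in REDIRECTS and not location:
        return 500
    return status


def produces_body(status: int) -> bool:
    """Only 2xx responses other than 204 continue past prolog processing."""
    return 200 <= status < 300 and status != 204
```

A 302 status without HTTP_RESPONSE_LOCATION thus degrades to a 500 response, and a 404 status ends processing after the prolog, leaving the body to the web server.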
QUERY_STRING system-specific entity contains the
query part of an URL if the request URL contains a query
such as in
Moreover, the value of
QUERY_STRING is parsed according
to the rules for HTML form-encoding.
and any individual query parameters (such as
in the above example) are made available to SGML
processing and can be accessed by declaring
a system-specific (general or parameter) entity
with the respective name.
Note that case-folding of system-specific entities is applied according to the SGML declaration or out-of-band settings, while URI query parameters are not made uppercase when supplied as values for system-specific entities. This means that when NAMECASE ENTITY YES is in effect for SGML processing, query parameters must be supplied in uppercase letters in the request URI; otherwise, they must be supplied in the request URI in the exact sequence of upper- and lowercase letters used for the system-specific entity declaration.
Note mapping query parameter to system-specific
entities is only useful if the query parameters in an
URI are unique; when the URI contains multiple
key=value pairs for the same key (such as
produced by browsers when submitting an HTML form
with multiple same-named fields), this model of
accessing and processing query parameters isn't
useful, and the technique of parsing the raw
QUERY_STRING value via SGML short references
as explained below is used instead.
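The mapping of unique query parameters to entities, including the case-folding behavior, can be illustrated with Python's standard `urllib.parse.parse_qs`, which decodes form-encoded query strings. The function names and the choice to raise an error on repeated keys are assumptions of this sketch; sgmlweb's own handling of repeated keys is the short reference technique mentioned above.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs


def query_entities(query_string: str) -> Dict[str, str]:
    """Map unique query parameters to entity values."""
    parsed = parse_qs(query_string, keep_blank_values=True)
    repeated = sorted(key for key, values in parsed.items() if len(values) > 1)
    if repeated:
        # Repeated keys don't fit the one-entity-per-parameter model
        raise ValueError(f"repeated query parameters: {repeated}")
    return {key: values[0] for key, values in parsed.items()}


def lookup(declared_name: str, params: Dict[str, str], namecase_entity: bool) -> Optional[str]:
    """Find the value for a declared entity name, if supplied."""
    # Entity names are folded to uppercase under NAMECASE ENTITY YES,
    # whereas query parameter names are used exactly as sent
    name = declared_name.upper() if namecase_entity else declared_name
    return params.get(name)
```

Under NAMECASE ENTITY YES, a declared entity `name` is therefore only matched by a query parameter sent as `NAME`.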
System-specific entities received from HTML
query parameters in HTTP GET or POST request URIs can be
escaped by declaring those as external data text entities
    <!NOTATION some-notation SYSTEM "...">
    <!ENTITY uri_parameter1 SYSTEM CDATA some-notation>
Characters used for markup delimiters such as
replacement text for data entity references get replaced by
numeric character entity references when expanded in content
and are not interpreted as markup delimiters by SGML.
System-specific entities not declared as
data entities don't receive HTML escaping, thus
can potentially contain malicious markup such as
script elements, that, when expanded into a context
without further constraints (such as element exclusion
exceptions) generally represent a security threat.
Therefore, HTML form-like parameters supplied in
URLs or transferred otherwise should
generally be declared as data entities, unless
these transferred entities should explicitly
represent markup as explained below.
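The effect of escaping on a data entity's replacement text can be sketched in Python as follows. The set of characters treated as markup delimiters here (`<`, `>`, `&` and `"`) is an assumption for illustration, as is the function name; the exact set replaced by sgmlweb may differ.

```python
def escape_data_entity(value: str, delimiters: str = "<>&\"") -> str:
    """Replace markup delimiter characters by numeric character references."""
    return "".join(
        f"&#{ord(ch)};" if ch in delimiters else ch
        for ch in value
    )
```

An injected `<script>` start-tag then arrives in the output as `&#60;script&#62;`, which a browser displays as text instead of interpreting it as an element.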
While declaring an entity as data entity as shown already ensures escaping of HTML markup delimiters, sgmlweb provides a distinguished notation with the public identifier
+//IDN www.w3c.org/TR/html5//NOTATION HTML 5 Form Input Types//EN
representing lexical value spaces of HTML form input values.
The notation representing form input value types is declared as follows:
    <!NOTATION html5-form-input PUBLIC "+//IDN www.w3c.org/TR/html5//NOTATION HTML 5 Form Input Types//EN">
    <!ATTLIST #NOTATION html5-form-input
      TYPE (text|email|url|number|date|time|datetime) text
      PATTERN CDATA #IMPLIED>
A system-specific entity can make use of this notation to obtain extended lexical value checks. For example, the declaration
<!ENTITY formparam SYSTEM CDATA html5-form-input [ type="email" ]>
declares the form-like input parameter
formparam as having
me@home but not
Likewise, entities declared as form input value data entities with
are checked against the respective validation rules of the
HTML 5 specification.
Any HTML form input value type will have the effect that value
normalization is performed on the respective entity replacement
value. The most general
text input value accepts any input
text, and performs value normalization by removing all newline
characters from the entity value. Other types perform value
checks and normalizations in addition to the value normalization
text type values. Of the supported input types
for validation, only the
number input type performs additional
value normalization (namely, any
+ characters are removed,
an uppercase letter
E exponent separator is changed to
lowercase, and a leading zero is added where a number value
begins with a decimal point/dot character).
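The normalization steps for text and number values described above can be written down as a short Python sketch. The function names are hypothetical, treating both carriage return and line feed as newline characters is an assumption, and validation of the number syntax itself is not shown.

```python
def normalize_text(value: str) -> str:
    """Remove all newline characters (CR and LF assumed) from the value."""
    return value.replace("\r", "").replace("\n", "")


def normalize_number(value: str) -> str:
    """Apply the text normalization plus the number-specific steps."""
    value = normalize_text(value)
    value = value.replace("+", "")   # remove any plus characters
    value = value.replace("E", "e")  # lowercase the exponent separator
    if value.startswith("."):
        value = "0" + value          # add a leading zero before a decimal point
    return value
```

For example, the input `+.5E+3` is normalized to `0.5e3`.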
pattern data attribute can contain a regular
is checked against in addition to the check and normalizations
implied by the
are supported. In particular, lookahead operators, Unicode
code points, and other PCRE-specific constructs other than
\s special symbols are not available.
HTML form input lexical types are also available for WebSGML attribute data specifications. A declaration such as
<!ATTLIST elmt attr DATA html5-input [ type="email" ]>
will make sgmljs.net SGML check and value-normalize the
value against the rules for email addresses.
Likewise, data attributes can also be declared for notations, such as used for template notations. For example, the following declaration
    <!NOTATION sgml ...>
    <!ATTLIST #NOTATION sgml attr DATA html5-input [ type="email" ]>
attr data attribute of the
as having type
sgml notation template.
A 400 BAD REQUEST HTTP response status is emitted by
sgmlweb when form input validation fails on one or more
system-specific entities supplied via HTML form-like GET variables
(or, equivalently, variables POSTed in
request bodies). A 4xx status is only generated when the
validation is performed on a data entity declared
as having HTML form input lexical value, rather than as
a data attribute declared as having a HTML form input
lexical value (a
DATA attribute), for which an
unspecific 500 HTTP status is reported instead.
The latter restriction is because form input validation
might not be tightly traceable to an input value (eg. because the input value is composed of expanded general entity replacement text)
is performed lazily as part of content parsing rather than prolog parsing (hence can't be reported as HTTP status in a HTTP header which must be determined before content parsing).
A 400 BAD REQUEST signifies to the client/web browser that a request is malformed (whereas a 5xx status can be interpreted as advice to retry a request at a later time), so is generally the more appropriate status to return on form input value errors.
sgmlweb's built-in support for HTML form-like GET requests with query parameters contained in the request URI is also applied automatically to HTML form POST requests having application/x-www-form-urlencoded media type (HTML form-like GET requests include static requests whose URI query part is indistinguishable from form-like URI query parameters).
Values transferred in HTML form POST request bodies with application/x-www-form-urlencoded media type are exposed exactly the same as request URI parameters. As far as query parameters are concerned, POST request bodies differ from form GET requests only in that they transfer query parameters in the request body rather than as part of the request URI. Therefore, the request URI for form POST requests with application/x-www-form-urlencoded request bodies must not contain a query part, since the query part is assumed to be contained in the request body.
For HTTP POST queries (TODO: what about GET?) with
application/x-www-form-urlencoded media type, the CGI-meta variable
QUERY_STRING is exposed as system-specific entity to
the SGML processing context. In addition,
is provided, containing a variant encoding for
& (ampersand) characters are replaced by
characters so as to be more useful for interpretation as SGML.
Specifically, raw query string (
processing via SGML short references is necessary when
the URI query string contains multiple key=value pairs
for the same key, such as is commonly emitted from HTML
forms containing tabular data or multiple repeated field
Request bodies other than those with
content type (which are read and processed as described above)
can be accessed via
    <!NOTATION some-notation SYSTEM "...">
    <!ENTITY raw_reqest_uri_query_part SYSTEM "<osfd>0" CDATA some-notation>
If processing results in a 4xx or 5xx effective HTTP status, either by regular sgmlweb processing or via setting a custom HTTP response status as described above, sgmlweb attempts to render an error response body. Rendering error responses is no different from rendering regular SGML pages, but any request parameters are cleared and not available. This is because the request parameters might be erroneous or malicious, and hence might make rendering an error response body fail again for the same reason that regular sgmlweb rendering has already failed for the requested URL and request parameters.
When rendering an error response, the system-specific
STATUS is available (as a read-only entity)
in the processing context.
A simple error page might look like the following example:
    <!DOCTYPE html SYSTEM "about:legacy-compat" [
      <!ENTITY STATUS SYSTEM>
    ]>
    <html>
    <head>
    <title>Error &STATUS</title>
    </head>
    <body>
    <p>Error serving requested page</p>
    </body>
    </html>
Likewise, the name of the error page (
/error.sgm) is hard-coded
in some sgmlweb builds, but can be chosen freely as part
of the configuration of regular request processing handler
chains in other sgmlweb builds (such as for Node.js).
Files resolved by the SGML Web Server Gateway itself are passed as
open file descriptors to SGML processing, such
that those can be accessed using
<osfd> FSIs. The processing
environment can access up to five file descriptors:
POSTed body content, when used and supported in the request
stdout): output of SGML processing; can be a file or a buffer
stderr); error output and log destination
3 main input; contains character data from master file
(resolved using either the complete path as of the initial value of
PATH_TRANSLATED, or just the first path component of
4 file descriptor containing character data resolved using the
remainder portion of
PATH_TRANSLATED if file descriptor #3 was
resolved using only the first path component
Before passing control to core SGML processing, the SGML gateway
(on select execution environments) pre-opens the
PATH_TRANSLATED, if relevant, as
Accessing open file descriptors rather than opening files by path name from main SGML processing as needed avoids race conditions and has generally desirable properties wrt. exploiting POSIX file system guarantees for atomic/continued/high-available content delivery in the presence of concurrent content change and maintenance.
Specifically, this is done to be able to guarantee that, after SGML prolog parsing, no request processing will fail due to missing template and/or client document files, and that the content of determined files remains accessible to SGML processing even if it is subject to concurrent change or deletion during processing.
the template and client files (if any) are held open by the SGML Web Server Gateway process, so those files can be atomically changed while request processing on previous content is underway (due to Unix file system guarantees)
note that these guarantees do not hold for further external entities
other than the content of
PATH_TRANSLATED from the main template file itself
that might be referenced from the template file (or the client file if it
is atypically transcluded as template and can have entity declarations)
404 NOT FOUND HTTP status can be sent, rather than beginning
the response with a
200 OK status and then detecting non-existence
during content processing and having e.g. to include error message
character data along with user content
to this aim, SGML processing takes advantage of being designed such
that no output character data is written before the first content arrives
in output buffer handling, ie. prolog data is buffered until actual
content begins, hence HTTP
404 NOT FOUND or other non-default status
can be set at the end of SGML prolog processing
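The file system guarantee that the points above rely on can be demonstrated with a small Python sketch: a file descriptor opened before a file is atomically replaced keeps referring to the previous content. The sketch assumes a POSIX file system; the file name `page.sgm` and the function name are purely illustrative.

```python
import os
import tempfile


def read_after_replace() -> tuple:
    """Show that an open descriptor keeps the content it was opened on."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "page.sgm")
        with open(path, "w") as f:
            f.write("old content")
        # The gateway pre-opens the file before processing starts
        fd = os.open(path, os.O_RDONLY)
        try:
            # Meanwhile, a maintainer atomically replaces the file
            new_path = path + ".new"
            with open(new_path, "w") as f:
                f.write("new content")
            os.replace(new_path, path)
            # Reading via the descriptor still yields the previous content
            held = os.read(fd, 1024).decode()
        finally:
            os.close(fd)
        # Opening by path name yields the replaced content
        with open(path) as f:
            current = f.read()
    return held, current
```

A request that started on the old content can therefore complete consistently, while requests arriving after the replacement see the new content.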
Request processing is performed under the assumption of sending
a default HTTP result of
200 OK unless set explicitly to
another status, and assuming that, in general, the HTTP result
status can't be changed once any output has been emitted.
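The constraint that the status can only be set before output begins can be modelled with a small Python class. This is a conceptual sketch with hypothetical names, not the gateway's output handling: the status line is held back until the first content is written, after which status changes are rejected.

```python
class BufferedResponse:
    """Holds the status line back until the first content is written."""

    def __init__(self) -> None:
        self.status = 200          # default result unless set explicitly
        self.headers_sent = False
        self.sent = []             # what has gone out to the client so far

    def set_status(self, status: int) -> None:
        if self.headers_sent:
            raise RuntimeError("status can't change once output has been emitted")
        self.status = status

    def write(self, data: str) -> None:
        if not self.headers_sent:
            # The first content write commits the status line
            self.sent.append(f"HTTP/1.1 {self.status}")
            self.headers_sent = True
        self.sent.append(data)
```

Setting a 404 status at the end of prolog processing works because nothing has been written by then; the same call after the first `write` raises an error.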
Request processing is performed such that the complete SGML
prolog of the document instance to process is validated before
emitting any output. On any prolog parsing errors (including when
system-specific parameter entities couldn't be resolved),
processing is aborted, and a proper
404 NOT FOUND or
500 INTERNAL SERVER ERROR HTTP status, depending on whether
eg. an operating system error of
ENOENT or non-
was encountered) is generated.
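The choice between the two error statuses can be expressed as a mapping from operating system error codes. The Python sketch below uses hypothetical function names and only covers errors raised while opening a file during prolog processing.

```python
import errno


def status_for_prolog_error(error: OSError) -> int:
    """Map an operating system error to the HTTP status to report."""
    # A missing file is reported as 404, anything else as 500
    return 404 if error.errno == errno.ENOENT else 500


def open_or_status(path: str):
    """Try to open a file needed by the prolog; return (file, status)."""
    try:
        return open(path, "rb"), 200
    except OSError as error:
        return None, status_for_prolog_error(error)
```

Because this decision is taken during prolog processing, the resulting status can still be placed in the response headers.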
Parsing, resolution, or other errors during content parsing, on the other hand, can't typically be reported via HTTP error status codes because response headers will have been sent to the client already by the time a content error is encountered.
As already explained for the individual routing branches,
based on the above sketched file name resolution for
before actually accessing and sending content, up-to-datedness of the
client document is checked; if it hasn't changed since the date and time
of the last modification, a
304 NOT MODIFIED HTTP response is generated.
Note that access policies etc. don't come into play here, as the content body
isn't transferred with 304 responses.
The SGML User Agent (
program for web browsers designed to produce HTML from SGML
in the same way that HTML is produced from SGML on a
SGML Web Server, thereby transparently
offloading SGML processing to the browser, and at the same
time saving network bandwidth by avoiding redundant network
transfer of repeated partial page content.
While the SGML User Agent is designed to run against a SGML Web Server, it can also run against any other (e.g. simple static) web server lacking SGML support, in browser-only mode, with reduced user agent functionality. More generally, a SGML web setup can involve:
browser-only processing: SGML files are accessed as static files from the web server and then rendered into a displayed HTML DOM on the browser
The SGML User Agent, when started, determines if the web page
is running from a server with server-side SGML support by inspecting
page metadata in the HTML
head element. If the
does not contain
<link rel="alternate" type="text/sgml" ...>
then the SGML User Agent assumes it is running off a web server without server-side SGML rendering support.
Server-side SGML support is required for proper session history, whereas when server-side SGML support isn't advertised via the link element as shown, browser-refresh and back-navigation from an external site linked to from the SGML webpage will take the user to the initial landing page (the static or prerendered HTML page carrying the SGML User Agent script). Moreover, bookmarking works only with server-side SGML support.
The basic functionality of the SGML User Agent is to, on
click handlers to the current
document's local (same-domain) links performing SGML page
rendering (transforming SGML content to HTML/DOM).
Specifically, this is enabled on anchors that have the same
effective protocol/host/port as the invoking page, and that
either have no
type attribute specified, or have
specified as its value.
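The condition for attaching SGML click handling to an anchor can be sketched as a same-origin check plus a check of the `type` attribute. The sketch is written in Python for illustration only (the user agent itself runs as a browser script), the function names are hypothetical, and `text/sgml` as the accepted `type` value is an assumption.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the effective (protocol, host, port) triple of a URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def is_sgml_link(page_url: str, href: str, type_attr: Optional[str] = None,
                 sgml_type: str = "text/sgml") -> bool:
    """Decide whether an anchor should be handled by SGML rendering."""
    # Relative links are resolved against the invoking page first
    target = urljoin(page_url, href)
    if origin(target) != origin(page_url):
        return False  # external link: standard browser behavior applies
    # Either no type attribute, or the assumed SGML media type
    return type_attr is None or type_attr.strip().lower() == sgml_type
```

Comparing effective ports means that `https://example.org/a` and `https://example.org:443/a` count as the same origin, while a link to the same host over `http` does not.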
Once a page is rendered using SGML, its anchors get captured by SGML event handling for further navigation within the domain name, in turn.
popState() works in a natural way for basic
forward and backward navigation): if we're about
to navigate to another page (on the same domain so rendered
via SGML), we're just storing the previous page via pushState(),
with the URL used to fetch SGML (or HTML on the initial page).
When we return to this state via backward navigation, the
popstate event handler will pop the state and start re-rendering the
HTML from the pushed
href URL in the same way the SGML page was rendered
when first visited.
On a page (browser) refresh, the browser reloads
window.location using the regular browser page loading algorithm.
When the server can render SGML to HTML (as triggered by an
HTTP Accept header favouring HTML over SGML) there's nothing special
to do here, since the re-visited page gets rendered server-side
(and carries the
sgml-ua.js script to attach to link handlers for
further browser-local SGML processing, just like on an initial
On the other hand, when working against a static web server,
window.location will fetch SGML text, and browsers will render
the SGML code text as either plain text (Chrome) or possibly
broken HTML (FF, IE). Therefore, support for static servers
involves further browser history manipulation.
Blocking browser refresh isn't possible in general. There exist various attempts/scripts to accomplish this by either
intercepting key events (but these techniques won't
handle clicking on a browser refresh icon button); what
can be achieved here (by either returning a non-void value
beforeunload event handling, by setting the
returnValue, or by calling
is to bring up a "do you really want to leave?"
warning, but the reload action as such can't be prevented.
establishing a new browser context in an
or a HTML4
frame (see e.g.
Disabling the Back Button),
but these techniques are generally considered user-hostile
not creating history entries in the first place; ie.
Back Button Behavior on a Page With an iframe
(although being about iframes mostly)
can be used to suppress creating history entries; namely,
if, on a
click event, the navigated-to
replaceState()d into the same as the top-most one
then no history entry is created; this could be used
to block any backward (and forward) navigation, but
(if it actually works) is overreaching since it will
disable plain backward navigation between SGML rendered
pages, which isn't a problem even against servers
without server-side SGML rendering.
So what SGML User Agent does is to ensure that, while a page is up,
window.location points to the landing page of the current
page, ie. the page through which the site was entered,
which in many cases will be the site's home page, but could
be any page carrying the
As final part of SGML rendering (after rendered link
target URLs in the generated page have been changed into
window.location is set to the
landing page URL. When the page is left, the history entry
for the page view is then restored to the original SGML
resource URL, rather than the landing page URL, so that
on plain back navigation, regular
handler execution will render the SGML resource.
The original resource URL is stored in the history entry's data field (for as long as it is shadowed by the landing page URL).
Executing history restoration globally on the
unload/beforeunload (or even
pagehide/pageshow) events isn't
possible, since Ajax page loads don't trigger those
events. Therefore, history restoration is executed
on individual outgoing link click events, along with
SGML processing for the new page.
Note that no history restoration takes place (the
landing page history entry is kept) on outgoing external
links since those don't get a click handler for SGML
processing attached and hence exhibit standard browser
behavior on link activation. For external links we
can't register handlers; when navigating back
from an external page we must therefore enter
a HTML (not SGML) page carrying the
SGML User Agent starts out on an initial landing page carrying this script
following a link on the page in the same domain will result in rendering the link target using SGML
backward-navigation (to either an earlier rendered SGML page or the landing page) is also performed using SGML
navigation/following links to external sites will end SGML UA execution and continue with standard browser HTML loading/rendering; on return to a SGML-rendered site, if the site is running on static web server without server-side SGML rendering, the landing page, not the page through which the site was left, is reloaded; only if using server-side SGML can the proper page of departure be loaded
page refreshes take the user to the landing page when not served from a web server with support for server-side SGML; only if using server-side SGML will the current page be reloaded (it's not possible to intercept browser behavior on refresh)
when running off a static web server without server-side SGML rendering, context menus (as activated by right-click or long-click/hold) are blocked so that Open Link in new tab and bookmarking are not offered (neither of which works against a static server)
This section describes facilities for publishing tab-separated value data streams produced from SQL queries or other sources into HTML markup via bundled functions for dynamically generating required markup declarations.
Moreover, a technique for implementing an endpoint for HTML forms submission with support for tabular data insertion (where data is presented to SGML as query string with possibly multiple repeating field groups) is explained.
As explained in the context of short reference parsing, tab-separated values pulled-in from an external source such as a file or SQL query can be made available as markup elements using short reference use and map declarations specific to a particular tab-separated data source stream.
sgmljs.net SGML provides, via custom storage manager notations, bundled functions for automatically generating the required short reference declarations for TSV parsing, given the names of attributes provided as data attributes. Moreover, further markup declaration generators are provided to extend basic TSV parsing to a generic mechanism for formatting tab-separated values, by feeding the tabular data rows obtained from TSV parsing into SGML templating.
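To make the effect of TSV parsing concrete, the following Python sketch emulates (outside of SGML, and without using short references at all) what the generated declarations achieve: each tab-separated line becomes a record element with one child element per field. The function name `tsv_to_markup` and its parameters are hypothetical and chosen only for illustration; they are not part of sgmljs.net SGML.

```python
from html import escape


def tsv_to_markup(tsv_text, fields, container="tsv", record="record"):
    """Wrap each tab-separated line in record/field element markup.

    Illustrative emulation only; the real mechanism uses SGML short
    reference maps rather than string splitting.
    """
    out = [f"<{container}>"]
    for line in tsv_text.splitlines():
        if not line:
            continue  # ignore blank lines between records
        values = line.split("\t")
        if len(values) != len(fields):
            raise ValueError(
                f"expected {len(fields)} fields, got {len(values)}")
        cells = "".join(
            f"<{name}>{escape(value)}</{name}>"
            for name, value in zip(fields, values))
        out.append(f"<{record}>{cells}</{record}>")
    out.append(f"</{container}>")
    return "\n".join(out)


markup = tsv_to_markup(
    "first value\tsecond value\tthird value\n",
    ["field1", "field2", "field3"])
print(markup)
```

The field names passed in play the same role as the attribute names supplied to the declaration generators.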
For supplying parsed TSV data values to templating, input data must be provided as markup attributes rather than elements. To collect a sequence of (text contents of) elements produced from TSV parsing into attributes, sgmljs.net SGML uses the techniques of
re-mapping of element content into attributes provided
NotNames attribute (part of ISO 10744 DAFE
support) when a template notation is applied on an element
in a link process
propagating attribute values to preceding sibling
#CURRENT link attribute default
To demonstrate these techniques, consider the following example input document representing possible output of TSV parsing set up as discussed in record boundary insertion:
<!DOCTYPE tsv [
  <!ELEMENT tsv - - (record+)>
  <!ELEMENT record - - (field1,field2,field3)>
  <!ELEMENT field1 - - (#PCDATA)>
  <!ELEMENT field2 - - (#PCDATA)>
  <!ELEMENT field3 - - (#PCDATA)>
]>
<!DOCTYPE table [
  <!ELEMENT table - - (tr+)>
  <!ELEMENT tr - - (td+)>
  <!ELEMENT td - - (#PCDATA)>
]>
<!LINKTYPE lpd tsv table [
  <!NOTATION field1-template PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN" "template.sgm">
  <!NOTATION field2-template PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN" "template.sgm">
  <!NOTATION field3-template PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN" "template.sgm">
  <!ATTLIST #NOTATION (field1-template|field2-template|field3-template)
    field1 CDATA #CURRENT>
  <!ATTLIST #NOTATION (field2-template|field3-template)
    field2 CDATA #CURRENT>
  <!ATTLIST #NOTATION field3-template
    field3 CDATA #IMPLIED>
  <!ATTLIST (field1|field2|field3)
    field1 CDATA #IMPLIED
    field2 CDATA #IMPLIED
    field3 CDATA #IMPLIED
    template NOTATION (field1-template|field2-template|field3-template) #IMPLIED
    NotNames CDATA #IMPLIED>
  <!LINK #INITIAL
    tsv table
    record tr
    field1 [ template=field1-template NotNames="field1 #CONTENT" ] #IMPLIED
    field2 [ template=field2-template NotNames="field2 #CONTENT" ] #IMPLIED
    field3 [ template=field3-template NotNames="field3 #CONTENT" ] td>
]>
<tsv>
  <record>
    <field1>first value</field1>
    <field2>second value</field2>
    <field3>third value</field3>
  </record>
  <!-- further data following here:
  <record>
    <field1>...</field1>
    ...
  </record>
  -->
</tsv>
For this example,
template.sgm is expected to contain:
<!DOCTYPE #IMPLIED SYSTEM [
  <!ENTITY field1 SYSTEM>
  <!ENTITY field2 SYSTEM>
  <!ENTITY field3 SYSTEM>
]>
<tr>
  <td>&field1</td>
  <td>&field2</td>
  <td>&field3</td>
</tr>
The result markup of processing the example document with the
lpd link process activated is as follows
(omitting commented text):
<table>
  <tr>
    <td>first value</td>
    <td>second value</td>
    <td>third value</td>
  </tr>
</table>
The link process contains template notation declarations
for the individual
fieldN elements, and link rules applying
template notation on the respective
Crucially, the result element of all the link rules except that for the
field3 element (the last element of a record) is
#IMPLIED, meaning that the template is applied
if the source element (e.g. field1) can
be placed into the result context. Since neither field1 nor field2
can appear anywhere in the result HTML-like
content, no template will apply on the field1 and field2
elements. The link rules placed on these fieldN
elements exist solely for their
NotNames rules, which collect the
text content of the element on which the template is
placed into the field1 and
field2 values, respectively
(and also according on
for updating the current values for field1 and
field2, respectively, in the link attribute processing
field3 elements shares the declarations
#CURRENT link attributes,
the value for
field2 is transported to
the context for the
field3 element, where the
is applied as regular template, since the result element
tr is expected/admitted at the result context position.
In this way, the content of the
originally in the input source are propagated to the
field3 data (DAFE) attributes, and hence
available as entities in the sub-processing context for
sgmljs.net SGML provides the built-in storage manager
to generate markup declarations
from fields and
params data attributes as required for
the above declaration fragments.
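The attribute collection and propagation described for the first example can be modelled in a few lines of Python. This is only a sketch of the data flow, under the assumption that each record arrives as a mapping from field name to text content; the names `format_records` and `TEMPLATE` are hypothetical and do not correspond to anything in sgmljs.net SGML. A dictionary stands in for the current link attribute values, and the template is instantiated only when the last field of a record is reached.

```python
from html import escape

# Stand-in for template.sgm: placeholders take the place of the
# &field1, &field2 and &field3 entity references.
TEMPLATE = "<tr><td>{field1}</td><td>{field2}</td><td>{field3}</td></tr>"


def format_records(records, fields=("field1", "field2", "field3")):
    """Emulate collecting field content into attributes and applying
    the template once per record, on the last field."""
    rows = []
    for record in records:
        current = {}  # plays the role of the #CURRENT attribute values
        for name in fields:
            # NotNames="<name> #CONTENT": content becomes an attribute value
            current[name] = escape(record[name])
            if name == fields[-1]:
                # Only the last field's rule yields a result element, so
                # the template is instantiated here with all values.
                rows.append(TEMPLATE.format(**current))
    return "<table>" + "".join(rows) + "</table>"


result = format_records([
    {"field1": "first value",
     "field2": "second value",
     "field3": "third value"},
])
print(result)
```

Earlier fields contribute nothing to the output on their own; they only update the values that the last field's template instantiation then sees.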
The above example rewritten to use markup declaration generators for fetching TSV records and applying a formatting template looks as follows:
<!DOCTYPE tsv [
  <!NOTATION sql SYSTEM>
  <?IS10744 FSIDR sql tsv_element_decl tsv_entity_decl tsv_shortref_decl tsv_usemap_decl tsv_notation_decl FSIDefDoc="+//IDN sgmljs.net//DTD FSISM TSV parsing declaration utilities//EN">
  <!ELEMENT tsv - - (record+)>
  <!ELEMENT record - - (name,gender_cd)>
  <!ENTITY % element-decls SYSTEM '<tsv_element_decl container="tsv" record="record" fields="name gender_cd">'>
  <!ENTITY % entity-decls SYSTEM '<tsv_entity_decl container="tsv" record="record" fields="name gender_cd">'>
  <!ENTITY % shortref-decls SYSTEM '<tsv_shortref_decl container="tsv" record="record" fields="name gender_cd">'>
  <!ENTITY % usemap-decls SYSTEM '<tsv_usemap_decl container="tsv" record="record" fields="name gender_cd">'>
  %element-decls;
  %entity-decls;
  %shortref-decls;
  %usemap-decls;
  <!ENTITY % query-results SYSTEM "<sql>connect 'Driver=SQLite;Database=test.db' set headings off set colsep ' ' select name, gender_cd from names_tbl where gender_cd = 0 order by name;">
  <!ENTITY query-results "%query-results">
]>
<!DOCTYPE table SYSTEM [
  <!--
  <!ELEMENT table - - (tr+)>
  <!ELEMENT tr O O (td+)>
  <!ELEMENT td - - (#PCDATA)>
  -->
]>
<!LINKTYPE lnk tsv table [
  <?IS10744 FSIDR tsv_notation_decl tsv_linkattr_decl tsv_linkrule_decl FSIDefDoc="+//IDN sgmljs.net//DTD FSISM TSV parsing declaration utilities//EN">
  <!ENTITY % notation-decls SYSTEM '<tsv_notation_decl container="tsv" record="record" fields="name gender_cd" template_sysid="sql-names-gendercd-query-with-aggregation-into-last-field2-referenced-template.sgm">'>
  %notation-decls
  <!ENTITY % linkattr-decls SYSTEM '<tsv_linkattr_decl container="tsv" record="record" fields="name gender_cd">'>
  %linkattr-decls
  <!ENTITY % linkrule-decls SYSTEM '<tsv_linkrule_decl container="table" record="tr" fields="name gender_cd">'>
  <!LINK #INITIAL tsv table %linkrule-decls>
]>
<tsv> &query-results</tsv>
With the given values for the
data attributes as supplied in the example prolog, the respective
storage manager notation FSIs generate markup declaration text
corresponding to fragments of the initial example.
See the Bundled Modules API documentation
for the detailed description of the
functions of the bundled
As a convenience, an SGML document for generic SQL selection
as just described can be constructed by just using the
+//IDN sgmljs.net//NOTATION SQL query formatting template for HTML table element//EN public identifier (or some of its variants such
as for producing a
tbody element instead). The following
example shows a complete (simplified) HTML-like document
where SQL data is rendered into an HTML table, with
tr elements as record (row) container, and td
as data cell element:
<!DOCTYPE HTML [
  <!ELEMENT HTML O O (TABLE|P)+>
  <!ELEMENT TABLE - - (TR+)>
  <!ELEMENT TR - - (TD+)>
  <!ELEMENT TD - - (#PCDATA)>
  <!ELEMENT P O O (#PCDATA|A)+>
  <!ELEMENT SPAN - - (#PCDATA)>
  <!ATTLIST SPAN PROPERTY CDATA #IMPLIED>
  <!ELEMENT A - - (#PCDATA)>
  <!ATTLIST A
    HREF CDATA #IMPLIED
    TITLE CDATA #IMPLIED>
  <!ATTLIST TABLE
    REF ENTITY #CONREF
    PROPERTY CDATA #IMPLIED>
]>
<!LINKTYPE LISTBOOKS #SIMPLE #IMPLIED [
  <!NOTATION SGML PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN">
  <!NOTATION SQLQUERY PUBLIC "+//IDN sgmljs.net//NOTATION SQL query formatting template for HTML table element//EN">
  <!ATTLIST #NOTATION SQLQUERY
    SUPERDCN NAME #FIXED SGML
    FIELDS NAMES #FIXED "NAME"
    PARAMS NAMES #FIXED "GENDER_CD"
    GENDER_CD CDATA #REQUIRED
    TEMPLATE_SYSID CDATA #FIXED '<literal><tr><td>&name</td></tr>'>
  <!ENTITY FEMALENAMES SYSTEM "<literal> set colsep ' ' set underline off connect 'Driver=SQLite;Database=/tmp/test.db' select name from names_tbl where gender_cd = cast('&gender_cd' as decimal);" NDATA SQLQUERY [ GENDER_CD="0"]>
]>
<html>
  <table ref=femalenames>
</html>
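For comparison, the same query-and-format step can be sketched in plain Python using the standard sqlite3 module. This is not how sgmljs.net SGML executes the query; it only illustrates the expected result of rendering each selected name through a row template. The sample rows are made up for the illustration, an in-memory database replaces /tmp/test.db, an order by clause is added so the output is deterministic, and a bound parameter is used in place of the gender_cd entity reference. The function name `render_names` is hypothetical.

```python
import sqlite3
from html import escape


def render_names(conn, gender_cd):
    """Select names for one gender code and render them as table rows."""
    cursor = conn.execute(
        "select name from names_tbl where gender_cd = ? order by name",
        (gender_cd,))
    # Each result row is formatted like the <tr><td>&name</td></tr> template.
    rows = "".join(f"<tr><td>{escape(name)}</td></tr>" for (name,) in cursor)
    return f"<table>{rows}</table>"


# Hypothetical sample data, created in memory for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table names_tbl (name text, gender_cd integer)")
conn.executemany(
    "insert into names_tbl values (?, ?)",
    [("Tom", 1), ("Carol", 0), ("Alice", 0)])

html_table = render_names(conn, 0)
print(html_table)
```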
For executing SQL INSERT or other statements from POSTed
URL-encoded data as would be produced from a HTML form
with potentially multiple repeating groups of query keys/fields,
SGML similar to the following boilerplate can be used
(where, instead of actual SQL invocation using a storage
manager notation, a literal template for rendering the
supplied values as
tr elements is used):
<!DOCTYPE doc [
  <!ELEMENT doc - - (sub+)>
  <!ELEMENT sub O O (key,value,key,value)>
  <!ELEMENT key - - (#PCDATA)>
  <!ELEMENT value - O (#PCDATA)>
  <!ENTITY start-key "<key>">
  <!ENTITY end-key-start-value "</key><value>">
  <!ENTITY end-value-start-key "</value><key>">
  <!SHORTREF in-doc ";" start-key>
  <!SHORTREF in-key "=" end-key-start-value>
  <!SHORTREF in-value ";" end-value-start-key>
  <!USEMAP in-doc doc>
  <!USEMAP in-key key>
  <!USEMAP in-value value>
]>
<!DOCTYPE html [
  <!ELEMENT html - - (table)>
  <!ELEMENT table O O (tr+)>
  <!ELEMENT tr - - (#PCDATA)>
  <!ATTLIST tr
    f_attr CDATA #IMPLIED
    g_attr CDATA #IMPLIED>
]>
<!LINKTYPE lnk doc html [
  <!-- entity supplied by sgmlweb containing a semicolon-separated
       (rather than ampersand-separated) URI query string -->
  <!ENTITY QUERY_STRING_DECODED SYSTEM>
  <!-- dummy notation(s) just for enjoying #CURRENT attribute
       propagation; not actually executed -->
  <!NOTATION aggregate-current-values PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN" "non-existant.sgm">
  <!NOTATION check-current-key-is-f PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN" "non-existant.sgm">
  <!NOTATION check-current-key-is-g PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN" "non-existant.sgm">
  <!NOTATION formatting PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN" "<literal><tr f_attr='&f_attr' g_attr='&g_attr'></tr>">
  <!ATTLIST #NOTATION (formatting|aggregate-current-values) f_attr NUMBER #CURRENT>
  <!ATTLIST #NOTATION check-current-key-is-f key CDATA #FIXED "f">
  <!ATTLIST #NOTATION check-current-key-is-g key CDATA #FIXED "g">
  <!ATTLIST #NOTATION formatting g_attr CDATA #CURRENT>
  <!ATTLIST (key|value|sub)
    f_attr CDATA #IMPLIED
    g_attr CDATA #IMPLIED
    NotNames CDATA #IMPLIED
    key CDATA #IMPLIED
    template NOTATION (check-current-key-is-f|check-current-key-is-g|aggregate-current-values|formatting) #IMPLIED>
  <!LINK #INITIAL
    doc html
    key #POSTLINK after-key-f [ template=check-current-key-is-f NotNames="key #CONTENT" ] #IMPLIED>
  <!LINK after-key-f
    value [ template=aggregate-current-values NotNames="f_attr #CONTENT" ] #IMPLIED
    key #POSTLINK after-key-g [ template=check-current-key-is-g NotNames="key #CONTENT" ] #IMPLIED>
  <!LINK after-key-g
    value [ template=formatting NotNames="g_attr #CONTENT" ] tr>
]>
<doc> <key>&QUERY_STRING_DECODED</sub> </doc>
If the value for
QUERY_STRING_DECODED were supplied as
f=1;g=value1g;f=2;g=value2g, such as when the SGML document
were accessed via POSTing to
this document, when activating the
lnk link process, would apply the
formatting template on each logical data row represented
record element with the
corresponding to the respective database column.
To collect URL query parameters in repeating groups into elements, the document
makes use of short references to rewrite equals (=) and
semicolon (;) characters into key and value tags
then inserts, via SGML tag inference and constraining
the respective content model, an enclosing sub element
acting as record container element
then uses link processing with
NotNames to collect
element content of key and
value elements into attributes
Moreover, the expected sequence of values for the
key element content (e.g.
f on every odd, and g
on every even element) is enforced.
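The grouping and key-order check performed by the document above can be approximated in Python as follows. This is a rough emulation for illustration, not the SGML mechanism itself: it splits the semicolon-separated query string into key/value pairs, groups them into records of one f and one g each, rejects any other key sequence, and renders each record like the literal formatting template. The function names `parse_groups` and `render_rows` are hypothetical.

```python
from html import escape


def parse_groups(query, keys=("f", "g")):
    """Split 'f=1;g=a;f=2;g=b' into one dict per repeating field group,
    enforcing that keys occur in the expected order within each group."""
    pairs = [item.split("=", 1) for item in query.split(";") if item]
    if len(pairs) % len(keys) != 0:
        raise ValueError("incomplete field group")
    groups = []
    for start in range(0, len(pairs), len(keys)):
        group = {}
        for expected, pair in zip(keys, pairs[start:start + len(keys)]):
            if len(pair) != 2 or pair[0] != expected:
                raise ValueError(
                    f"expected key {expected!r}, got {pair[0]!r}")
            group[expected] = pair[1]
        groups.append(group)
    return groups


def render_rows(groups):
    """Render each group like the literal 'formatting' template."""
    return "".join(
        f"<tr f_attr='{escape(g['f'])}' g_attr='{escape(g['g'])}'></tr>"
        for g in groups)


groups = parse_groups("f=1;g=value1g;f=2;g=value2g")
rows = render_rows(groups)
print(rows)
```

A query string such as g=1;f=2 raises ValueError in this sketch, which corresponds loosely to the SGML document rejecting an unexpected key sequence.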