As a straightforward application of basic SGML entity substitution and templating on the web, this file, page.sgm, shows a very simple example of an SGML document that, when accessed as a web page at http://.../page?name=Tom, gets the name entity reference replaced by Tom and is returned as the response to a browser (or can be rendered entirely within a web browser running the SGML User Agent):
<!doctype html [
<!element html - - any>
<!element p - - (#pcdata)>
<!entity name system>
]>
<html>
<head>
<title>SGML page</title>
</head>
<body>
<p>&name;</p>
</body>
</html>
Serving SGML on the web is based on interpreting HTTP request URLs as file names, following the long-established terms and concepts of the Common Gateway Interface (CGI, https://tools.ietf.org/html/rfc3875).
For the purpose of resolving a URL to a file or other resource, a web host passes the request URL and other request parameters to SGML processing as values of the PATH_INFO, PATH_TRANSLATED and other system-specific entities. The SGML processor then either prepares HTML from SGML or just serves static files, depending on which media types the user agent has indicated it accepts in the request, and on which files are found to exist on the server's file system at the resolved location.
Interpretation and modification of PATH_TRANSLATED is analogous to what a classic CGI script receives via the SCRIPT_NAME and PATH_INFO/PATH_TRANSLATED variables: the classic scenario assumes there's a common document root (or web root) directory wherein a CGI program is looked up in a designated script directory. The program, if found, is then executed with the trailing part of the request URL (everything following the portion used to locate the program/script itself) as the PATH_INFO parameter. PATH_TRANSLATED is derived from PATH_INFO by resolution against the document root, resulting in an absolute path name.
PATH_TRANSLATED can alternatively be computed without knowledge of a document root directory (provided SCRIPT_NAME is absolute against the web server's file system root, which it typically is not, however) by starting with the directory where SCRIPT_NAME resides, going back as many parent directories as there are path components in SCRIPT_NAME, and then appending PATH_INFO.
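The derivation of PATH_TRANSLATED from PATH_INFO can be sketched as follows. This is a minimal illustration in Python, not sgmlweb's actual code: the helper name translate_path and the rejection of paths that leave the document root are assumptions made for the example.

```python
import posixpath

def translate_path(document_root, path_info):
    """Resolve PATH_INFO against the document root (hypothetical helper)."""
    if not path_info:
        return None  # no PATH_INFO, hence no PATH_TRANSLATED
    root = posixpath.normpath(document_root)
    # Normalize "." and ".." steps after joining with the root.
    candidate = posixpath.normpath(posixpath.join(root, path_info.lstrip("/")))
    # Assumption: a result outside the document root is rejected.
    if candidate != root and not candidate.startswith(root.rstrip("/") + "/"):
        raise ValueError("PATH_INFO escapes the document root")
    return candidate

print(translate_path("/var/www", "/docs/intro"))
```

The result is always an absolute path name, matching the description of PATH_TRANSLATED above.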
- If a static file is requested, as determined by the requested resource name (in PATH_TRANSLATED) having a dot in its last path component:
  - if the file exists, it is served "as-is" (as a static file), with a media type (HTTP Content-Type) derived from the file extension
    - note the HTTP Accept header isn't checked in this case
    - commonly requested static file types include prerendered .html files as well as .css, .js, and image files
  - otherwise, if a static resource by the name of PATH_TRANSLATED doesn't exist, a 404 NOT FOUND HTTP response is generated
- Otherwise (PATH_TRANSLATED doesn't have a dot), if PATH_TRANSLATED can be resolved as an SGML file by appending the .sgm file extension:
  - if text/sgml is accepted by the request, the SGML gateway determines scriptName as the master SGML file/template and sends the static master file
  - otherwise, and by default (if either no Accept header is present in the request or its value is text/html or a wildcard), PATH_TRANSLATED is processed for producing HTML, and the output is served as a text/html response
- Otherwise (PATH_TRANSLATED cannot be resolved to an SGML file), if the first path component of PATH_TRANSLATED (or the longest sequence of consecutive path steps contained twice in PATH_TRANSLATED) can be resolved as an SGML file by appending the .sgm file extension:
  - if text/sgml is accepted for the response, the resolved file is served statically as text/sgml
  - otherwise, the resolved file is processed for producing HTML, and the output is served as a text/html response; the remaining part of PATH_TRANSLATED (not including the initial part up to and including the resolved SGML file) is resolved against the web root directory to an absolute path, and supplied as the PATH_TRANSLATED system-specific entity to SGML processing
- Otherwise (when the request's PATH_TRANSLATED value couldn't be interpreted in any of the ways explained), a 404 NOT FOUND HTTP response is generated.
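The routing branches above can be sketched as a small decision function. This is an illustrative Python sketch under simplifying assumptions, not the gateway's implementation: the names route and wants_sgml are hypothetical, paths are taken relative to the web root, Accept q-values are ignored, the "longest sequence of path steps contained twice" variant is left out, and a request for the bare root is treated as not found.

```python
def wants_sgml(accept):
    """True if the Accept header explicitly lists text/sgml (q-values ignored)."""
    if not accept:
        return False
    types = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    return "text/sgml" in types

def route(path, accept, exists):
    """Return (action, file, rest) for a root-relative request path.

    `exists` is a caller-supplied predicate telling whether a file exists.
    """
    steps = [s for s in path.split("/") if s]
    if not steps:
        return ("404", None, None)
    full = "/" + "/".join(steps)
    if "." in steps[-1]:                       # static file request
        return ("static", full, None) if exists(full) else ("404", None, None)
    action = "static-sgml" if wants_sgml(accept) else "render-html"
    if exists(full + ".sgm"):                  # whole path names the master file
        return (action, full + ".sgm", None)
    first = "/" + steps[0] + ".sgm"
    if len(steps) > 1 and exists(first):       # first step names the master file
        rest = "/" + "/".join(steps[1:])       # becomes the new PATH_TRANSLATED
        return (action, first, rest if action == "render-html" else None)
    return ("404", None, None)

files = {"/page.sgm", "/blog.sgm", "/style.css"}
print(route("/blog/post1", None, files.__contains__))
```

Passing the file-existence check in as a predicate keeps the sketch independent of any real file system.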
Whenever an SGML file is selected for processing, the file's modification date is checked against the value of the If-Modified-Since HTTP header, if present in the request. If the file hasn't been modified since that date, it's not processed, and a 304 NOT MODIFIED HTTP response is returned to the client instead. In case the processed SGML file is determined from the initial portion of PATH_TRANSLATED, the modification date of the file denoted by the remaining part is checked as well.
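The modification check can be sketched as below. This is a hedged Python illustration that assumes the client's date arrives in the standard If-Modified-Since request header; the function name not_modified and the choice to pass modification times as POSIX timestamps are assumptions of the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def not_modified(mtimes, if_modified_since):
    """True if none of the files changed after the date the client supplied.

    `mtimes` holds POSIX timestamps of the master file and, if present,
    of the file denoted by the remaining path part.
    """
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False                       # unparsable date: process normally
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so sub-second parts are dropped.
    return all(datetime.fromtimestamp(int(m), tz=timezone.utc) <= since
               for m in mtimes)
```

A caller would answer with 304 NOT MODIFIED when this returns True and process the SGML file otherwise.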
sgmlweb produces HTTP GET responses for text/html content by
- locating an SGML file with a .sgm file suffix in the web document root directory matching the request URI in the most obvious way
- invoking SGML processing on the located SGML file, activating
  - the HTML target document type name
  - the WEB link process
  - the HTTP link process
- returning the produced SGML processing output to the requester, with HTTP response headers populated as discussed below.
This means that for SGML files in the web root directory that don't have HTML as base document type name, a link pipeline is inferred from the link process declarations present in the prolog of the SGML file such that the ultimate result document type name produced by the link pipeline is HTML. It's an error if neither the base document type is HTML, nor a target document type name of HTML can be produced by forming a valid sequence of link processes from those declared in the prolog.
Moreover, sgmlweb activates the WEB and HTTP link processes. If WEB and/or HTTP are declared in the document prolog, any inferred link process pipeline will always contain the WEB and HTTP link process, respectively (but either WEB or HTTP can be omitted as described below).
The following system-specific entities are exposed to SGML and can be declared:
- DOCUMENT_ROOT: directory
- SCRIPT_NAME: absolute path to the master SGML file (the primary SGML file being accessed)
- PATH_INFO: URL portion following the part identifying SCRIPT_NAME, if any; includes a leading slash character
- PATH_TRANSLATED: absolute path to the file corresponding to PATH_INFO (if PATH_INFO is set, ie. if there's a URL portion following the part identifying the master SGML file)
- PATH_TRANSLATED_CONTENT: content of the file corresponding to PATH_TRANSLATED, if any
- REQUEST_METHOD: HTTP method used for the request (eg. GET or POST)
The following system-specific parameter entities/CGI meta-variables are additionally made available (see https://tools.ietf.org/html/rfc3875 for an explanation) when either declared manually or conditionally declared via referencing a parameter entity for the +//IDN sgml.net//ENTITIES CGI 1.1//EN public identifier:
REMOTE_ADDR
SERVER_NAME
SERVER_PROTOCOL
CONTENT_TYPE
(see https://tools.ietf.org/html/rfc3875#page-12
)
Note that PATH_INFO, PATH_TRANSLATED, and PATH_TRANSLATED_CONTENT are not necessarily (or even typically) passed internally from a web server to SGML, but are what SGML passes to an SGML document processing context.
If a request URL consists of just a path name identifying an SGML resource, no PATH_INFO, and hence no PATH_TRANSLATED etc., system-specific entities are exposed, and accessing (or even declaring) those is treated as an error.
To be able to process requests both with and without PATH_INFO/PATH_TRANSLATED using the same master document, CGI meta-variables can be declared using the +//IDN sgml.net//ENTITIES CGI 1.1//EN public text like this:
<!DOCTYPE html ... [
<!ENTITY % cgivars PUBLIC "+//IDN sgml.net//ENTITIES CGI 1.1//EN">
%cgivars;
...
]>
Referencing the +//IDN sgml.net//ENTITIES CGI 1.1//EN
public text
in a declaration set as shown is equivalent to declaring the
CGI meta-variables as both system-specific general and parameter
entities manually. However, the PATH_INFO
and PATH_TRANSLATED
entities (and the PATH_TRANSLATED_CONTENT
as a general entity)
are only declared if actually supplied in the processing context.
In particular, fallback values for those can be specified in the document itself. For example, the following declaration set
<!DOCTYPE html ... [
<!ENTITY % cgivars PUBLIC "+//IDN sgml.net//ENTITIES CGI 1.1//EN">
%cgivars;
<!ENTITY PATH_TRANSLATED "somefile">
]>
assumes values for PATH_TRANSLATED as obtained from the trailing part of a request URL. However, if the request URI doesn't contain a trailing part following the part that identifies the master document itself, %cgivars will leave PATH_TRANSLATED undeclared; hence the subsequent entity declaration for PATH_TRANSLATED will supply the effective value for it.
In this way, a master document can assign a fallback value for an absent client document file name (eg. one derived from an absent secondary path step in an HTTP request URI), such as the file name of the latest or otherwise most relevant client document of a document collection sharing a common path prefix.
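The fallback mechanism relies on the SGML rule that the first declaration of an entity is the one that counts and later declarations of the same name are ignored. A minimal Python sketch of that precedence follows; the helper names declare and effective_entities are hypothetical.

```python
def declare(entities, name, value):
    """Record an entity declaration unless the name is already declared."""
    entities.setdefault(name, value)

def effective_entities(path_translated=None):
    entities = {}
    # %cgivars; declares PATH_TRANSLATED only when the request supplies one.
    if path_translated is not None:
        declare(entities, "PATH_TRANSLATED", path_translated)
    # The document's own declaration comes later and so acts as a fallback.
    declare(entities, "PATH_TRANSLATED", "somefile")
    return entities

print(effective_entities("/var/www/post1"))
print(effective_entities())
```

This also shows why the order of the two declarations matters: placing the fallback before %cgivars; would make it win over the request-supplied value.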
While CGI meta-variables represent data handed by the web server to SGML, HTTP response meta-variables (such as the HTTP response status) are data returned from SGML processing to the web server along with result markup as response body.
Conceptually, HTTP response meta-variables are represented
as link attributes of a simple link process. A simple
link process declares link attributes on the document element
of the response body carrying HTTP response meta-variables.
HTTP response link attributes are declared in a distinguished
link process declaration identified by the
+//IDN sgml.net//LPD HTTP 1.1//EN
and
+//IDN sgml.net//LPD HTTP 2.0//EN
public text identifiers.
These LPDs behave as if declared as follows:
<!ENTITY % HTTP_RESPONSE_STATUS "200">
<!ENTITY % HTTP_RESPONSE_CONTENT_TYPE "text/html">
<!ENTITY % HTTP_RESPONSE_LOCATION "">
<!ENTITY % HTTP_CACHE_VALIDATION_ENTITIES "">
<!ATTLIST html
status NUMBER #FIXED %HTTP_RESPONSE_STATUS
location CDATA #FIXED "%HTTP_RESPONSE_LOCATION"
content-type CDATA "%HTTP_RESPONSE_CONTENT_TYPE"
cache-validation-entities ENTITIES "%HTTP_CACHE_VALIDATION_ENTITIES"
...>
To make sgmlweb return values for response meta-variables to user agents other than the defaults, a master document declares an LPD with one of the distinguished LPDs as external subset, and then preempts one or more of the parameter entities used as default values for link attributes. For example, the following master document makes sgmlweb send a 404 HTTP status to a web browser:
<!DOCTYPE html ... [
]>
<!LINKTYPE http PUBLIC "+//IDN sgml.net//LPD HTTP 1.1//EN" [
<!ENTITY % HTTP_RESPONSE_STATUS "404">
]>
...
Note that the name of the link process must be http; other LPDs referencing the public identifier for HTTP response meta-variables won't get activated by sgmlweb (and are hence ignored).
Note that HTTP_RESPONSE_STATUS and other parameter entities can be preempted from any link process, not just from the http link process, subject to declaration set preemption.
Note that, since the LPD determines the names of the parameter entities it accepts as #FIXED values, there's no need to have link processing determine link attributes; all that has to happen is that a respective LPD ("deriving" from a distinguished response LPD) is declared and activated; the effective values of the respective parameter entities can be queried from entity management just like request parameters.
In effect, specifying values for response meta-variables is syntactically very similar to declaring request parameters.
Note the LPD is tied to the html response document element, though.
The following parameter entities, when declared/preempted as described, have the following meanings:
- HTTP_RESPONSE_STATUS: numeric HTTP status to respond with
  - default: 200
  - the valid HTTP response status codes are 100, 101, 200, 201, 202, 203, 204, 205, 206, 300, 301, 302, 303, 304, 305, 307, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 426, 451, 500, 501, 502, 503, 504, and 505
  - of those, all requests with a non-2xx or 204 NO CONTENT status get terminated after prolog processing, without producing a response body; a generic response body for 4xx and 5xx responses, if applicable, is populated by the web server instead
  - the reason phrase (such as NOT MODIFIED for 304 responses) is also generated by the web server
  - on a 301 MOVED PERMANENTLY, 302 FOUND, 303 SEE OTHER (on POST), or 307 TEMPORARY REDIRECT response status, a redirect URL is configured in the HTTP_RESPONSE_LOCATION parameter entity; it is an error if HTTP_RESPONSE_LOCATION isn't declared/preempted (and will lead to a 500 INTERNAL SERVER ERROR response)
- HTTP_RESPONSE_LOCATION: redirect URL for 301 MOVED PERMANENTLY, 302 FOUND, 303 SEE OTHER and 307 TEMPORARY REDIRECT; see above
- HTTP_RESPONSE_CONTENT_TYPE: response media type
  - default: text/html
  - preempted values must match the syntax of an RFC 1521 media type (such as application/xhtml+xml, with optional parameters) and have text or application as (main) type; other values will lead to a 500 INTERNAL SERVER ERROR generic error response
- HTTP_CACHE_VALIDATION_ENTITIES: space-separated list of names of entities declared in the document prolog that denote files; the modification date of the youngest of these files is used to populate the Last-Modified HTTP header
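The rules for the response parameter entities can be summarized in a short sketch. The Python below is an illustration only: the function name response_meta, the simplified media type pattern, and the handling of a status outside the valid list as a 500 are assumptions, not documented sgmlweb behavior.

```python
import re

VALID_STATUS = {100, 101, 200, 201, 202, 203, 204, 205, 206, 300, 301, 302,
                303, 304, 305, 307, 400, 401, 402, 403, 404, 405, 406, 407,
                408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 426, 451,
                500, 501, 502, 503, 504, 505}
REDIRECTS = {301, 302, 303, 307}
# Simplified media type syntax: text/... or application/... plus parameters.
MEDIA_TYPE = re.compile(r"(text|application)/[\w.+-]+(\s*;\s*[\w-]+=[^;]+)*")

def response_meta(status=200, location="", content_type="text/html"):
    """Derive status, headers and whether SGML output becomes the body."""
    error = {"status": 500, "headers": {}, "body": False}
    if status not in VALID_STATUS:            # assumption: treated as an error
        return error
    if status in REDIRECTS and not location:  # redirect without a target
        return error
    if not MEDIA_TYPE.fullmatch(content_type):
        return error
    headers = {"Content-Type": content_type}
    if status in REDIRECTS:
        headers["Location"] = location
    # Only 2xx responses other than 204 carry a body produced from SGML.
    body = 200 <= status < 300 and status != 204
    return {"status": status, "headers": headers, "body": body}

print(response_meta(302, "/new"))
```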
If the request URL contains a query part, such as in http://example.com/page?param=value..., the QUERY_STRING system-specific entity contains it. Moreover, the value of QUERY_STRING is parsed according to the rules for HTML form-encoding, and any individual query parameters (such as param in the above example) are made available to SGML processing and can be accessed by declaring a system-specific (general or parameter) entity with the respective name.
Note that case-folding of system-specific entity names is applied according to the SGML declaration or out-of-band settings, while URI query parameter names are not made uppercase when supplied as values for system-specific entities. This means that when NAMECASE ENTITY YES is in effect for SGML processing, query parameters must be supplied in uppercase letters in the request URI; otherwise, they must be supplied in the request URI in the exact sequence of upper- and lowercase letters used in the system-specific entity declaration.
Note that mapping query parameters to system-specific entities is only useful if the query parameters in a URI are unique; when the URI contains multiple key=value pairs for the same key (such as produced by browsers when submitting an HTML form with multiple same-named fields), this model of accessing and processing query parameters isn't useful, and the technique of parsing the raw QUERY_STRING value via SGML short references as explained below is used instead.
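The mapping from query parameters to entity values, including the case-folding caveat and the restriction to unique keys, can be sketched in Python as follows. The names query_entities and lookup are hypothetical, and dropping repeated keys from the mapping is an assumption made to mirror the note above.

```python
from collections import Counter
from urllib.parse import parse_qsl

def query_entities(query_string):
    """Map form-encoded query parameters to candidate entity values.

    Keys occurring more than once are left out, since they can't be
    represented as a single entity.
    """
    pairs = parse_qsl(query_string, keep_blank_values=True)
    counts = Counter(name for name, _ in pairs)
    return {name: value for name, value in pairs if counts[name] == 1}

def lookup(declared_name, params, namecase_entity=False):
    """Find the value for a declared entity; parameter names are never folded."""
    name = declared_name.upper() if namecase_entity else declared_name
    return params.get(name)

params = query_entities("name=Tom&tag=a&tag=b&CITY=New+York")
print(lookup("name", params), lookup("city", params, namecase_entity=True))
```

With NAMECASE ENTITY YES in effect, the declared name is folded to uppercase, so only a parameter spelled in uppercase in the URI can match it.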
System-specific entities received from HTML form-like query parameters in HTTP GET or POST request URIs can be escaped by declaring them as external data entities, eg.
<!NOTATION some-notation SYSTEM "...">
<!ENTITY uri_parameter1 SYSTEM CDATA some-notation>
Characters used as markup delimiters, such as < and &, in replacement text for data entity references get replaced by numeric character references when expanded in content and are not interpreted as markup delimiters by SGML.
Note:
System-specific entities not declared as data entities don't receive HTML escaping, and thus can potentially contain malicious markup such as script elements that, when expanded into a context without further constraints (such as element exclusion exceptions), generally represent a security threat. Therefore, HTML form-like parameters in URLs or transferred otherwise should generally be declared as data entities, unless these transferred entities should explicitly represent markup as explained below.
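The effect of declaring a parameter as a data entity can be illustrated with a small Python sketch. The function name expand_data_entity is hypothetical, and only the two delimiters named in the text (< and &) are handled; which further characters sgmlweb escapes is not stated here.

```python
def expand_data_entity(value):
    """Replace markup delimiter characters by numeric character references."""
    # "&" must be replaced first so the references added for "<" stay intact.
    return value.replace("&", "&#38;").replace("<", "&#60;")

print(expand_data_entity("<script>alert(1)</script> & more"))
```

The script start tag no longer begins with a markup delimiter, so it arrives in the output as plain character data.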
While declaring an entity as data entity as shown already ensures escaping of HTML markup delimiters, sgmlweb provides a distinguished notation with the public identifier
+//IDN www.w3c.org/TR/html5//NOTATION HTML 5 Form Input Types//EN
representing lexical value spaces of HTML form input values.
The notation representing form input value types is declared as follows:
<!NOTATION html5-form-input
PUBLIC
"+//IDN www.w3c.org/TR/html5//NOTATION HTML 5 Form Input Types//EN">
<!ATTLIST #NOTATION html5-form-input
TYPE (text|email|url|number|date|time|datetime) text
PATTERN CDATA #IMPLIED>
A system-specific entity can make use of this notation to obtain extended lexical value checks. For example, the declaration
<!ENTITY formparam SYSTEM CDATA html5-form-input [ type="email" ]>
declares the form-like input parameter formparam as having type email; sgmlweb will check the received value against the lexical rules for email addresses described in the HTML 5 specification (eg. allow me@home but not me).
Likewise, entities declared as form input value data entities with text, url, number, date, time, or datetime-local type are checked against the respective validation rules of the HTML 5 specification.
Any HTML form input value type will have the effect that value normalization is performed on the respective entity replacement value. The most general text input value type accepts any input text, and performs value normalization by removing all newline characters from the entity value. Other types perform value checks and normalizations in addition to the value normalization performed for text type values. Of the supported input types for validation, only the number input type performs additional value normalization (namely, any + characters are removed, an uppercase letter E exponent separator is changed to lowercase, and a leading zero is added where a number value begins with a decimal point/dot character).
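The normalization steps described above can be written down directly. This Python sketch uses hypothetical function names and assumes that "newline characters" means both carriage return and line feed.

```python
def normalize_text(value):
    """Normalization for the text type: drop all newline characters."""
    return value.replace("\r", "").replace("\n", "")

def normalize_number(value):
    """Normalization for the number type, on top of the text rules."""
    value = normalize_text(value)
    value = value.replace("+", "")      # remove any + characters
    value = value.replace("E", "e")     # lowercase the exponent separator
    if value.startswith("."):           # add a leading zero before a dot
        value = "0" + value
    return value

print(normalize_number("+.5E+3"))
```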
The optional pattern data attribute can contain a regular expression in JavaScript syntax that the replacement value is checked against in addition to the checks and normalizations implied by the type attribute.
Note that not all features of JavaScript regular expressions are supported. In particular, lookahead operators, Unicode code points, and other PCRE-specific constructs other than the \d, \w, and \s special symbols are not available.
HTML form input lexical types are also available for WebSGML attribute data specifications. A declaration such as
<!ATTLIST elmt attr DATA html5-form-input [ type="email" ]>
will make sgmljs.net SGML check and value-normalize the attr value against the rules for email addresses.
Likewise, data attributes can also be declared for notations, such as those used for template notations. For example, the following declaration
<!NOTATION sgml ...>
<!ATTLIST #NOTATION sgml attr DATA html5-form-input [ type="email" ]>
declares the attr data attribute of the sgml notation as having type email, and enforces lexical checks for any value passed in to the sgml notation template.
A 400 BAD REQUEST HTTP response status is emitted by sgmlweb when form input validation fails on one or more system-specific entities supplied via HTML form-like GET variables (or, equivalently, variables POSTed in application/x-www-form-urlencoded request bodies). A 4xx status is only generated when the validation is performed on a data entity declared as having an HTML form input lexical value, rather than on a data attribute declared as having an HTML form input lexical value (a DATA attribute), for which an unspecific 500 HTTP status is reported instead.
The latter restriction is because form input validation on DATA attributes
- might not be tightly traceable to an input value (eg. because the input value is composed of expanded general entity replacement text)
- is performed lazily as part of content parsing rather than prolog parsing (hence can't be reported as HTTP status in an HTTP header, which must be determined before content parsing).
A 400 BAD REQUEST signifies to the client/web browser that a request is malformed (whereas a 5xx status can be interpreted as advice to retry a request at a later time), so is generally the more appropriate status to return on form input value errors.
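How validation of declared form input entities leads to a 400 status can be sketched as follows. The Python below is illustrative: the pattern is the email address pattern published in the HTML specification, while the names CHECKS and status_for, the restriction to the text and email types, and the choice to skip parameters absent from the request are assumptions of the example.

```python
import re

# Email address pattern as given in the HTML specification.
EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*")

CHECKS = {
    "text": lambda value: True,
    "email": lambda value: EMAIL.fullmatch(value) is not None,
}

def status_for(declared, params):
    """Return 400 if any received value fails its declared input type.

    `declared` maps entity names to input types, `params` maps entity
    names to the values received with the request.
    """
    for name, input_type in declared.items():
        if name in params and not CHECKS[input_type](params[name]):
            return 400
    return 200

print(status_for({"formparam": "email"}, {"formparam": "me"}))
```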
sgmlweb's built-in support for HTML form-like GET requests with query parameters contained in the request URI is also applied automatically to HTML form POST requests having the application/x-www-form-urlencoded media type (HTML form-like GET requests include static requests with a URI query part, whose parameters are indistinguishable from form-like URI query parameters).
Values transferred in HTML form POST request bodies with application/x-www-form-urlencoded media type are exposed exactly the same as request URI parameters. As far as query parameters are concerned, POST request bodies differ from form GET requests only in that they transfer query parameters in the request body rather than as part of the request URI. Therefore, the request URI for form POST requests with application/x-www-form-urlencoded request bodies must not contain a query part, since the query part is assumed to be contained in the request body.
For HTTP POST queries (TODO: what about GET?) with application/x-www-form-urlencoded media type, the CGI meta-variable QUERY_STRING is exposed as a system-specific entity to the SGML processing context. In addition, QUERY_STRING_DECODED is provided, containing a variant encoding of QUERY_STRING where the & (ampersand) characters are replaced by ; (semicolon) characters so as to be more useful for interpretation as SGML.
Specifically, raw query string (QUERY_STRING_DECODED) processing via SGML short references is necessary when the URI query string contains multiple key=value pairs for the same key, such as is commonly emitted from HTML forms containing tabular data or multiple repeated field groups.
Request bodies other than those with application/x-www-form-urlencoded content type (which are read and processed as described above) can be accessed via <osfd>0, eg.
<!NOTATION some-notation SYSTEM "...">
<!ENTITY raw_request_uri_query_part SYSTEM "<osfd>0" CDATA some-notation>
If processing results in a 4xx or 5xx effective HTTP status, either by regular sgmlweb processing or via setting a custom HTTP response status as described above, sgmlweb attempts to render an error response body. Rendering error responses is no different from rendering regular SGML pages, but any request parameters are cleared and not available, because the request parameters might be erroneous or malicious and hence might make rendering an error response body fail again for the same reason that regular sgmlweb rendering has already failed for the requested URL and request parameters.
When rendering an error response, the system-specific
entity STATUS
is available (as a read-only entity)
in the processing context.
A simple error page might look like the following example:
<!DOCTYPE html SYSTEM "about:legacy-compat" [
<!ENTITY STATUS SYSTEM>
]>
<html>
<head>
<title>Error &STATUS</title>
</head>
<body>
<p>Error serving requested page</p>
</body>
</html>
Error page rendering is implemented as a separate HTTP request cycle in Node.js implementations, where it is visible and customizable via JavaScript code, whereas it is hardcoded as part of sgmlweb request processing in other environments.
Likewise, the name of the error page (/error.sgm) is hard-coded in some sgmlweb builds, but can be chosen freely as part of the configuration of regular request processing handler chains in other sgmlweb builds (such as for Node.js).
Files resolved by the SGML Web Server Gateway itself are passed as open file descriptors to SGML processing, such that they can be accessed using <osfd> FSIs. The processing environment can access up to five file descriptors:
- 0 (stdin): contains POSTed body content, when used and supported in the request
- 1 (stdout): output of SGML processing; can be a file or a buffer
- 2 (stderr): error output and log destination
- 3: main input; contains character data from the master file (resolved using either the complete path as of the initial value of PATH_TRANSLATED, or just the first path component of PATH_TRANSLATED otherwise)
- 4: file descriptor containing character data resolved using the remainder portion of PATH_TRANSLATED if file descriptor #3 was resolved using only the first path component
Before passing control to core SGML processing, the SGML gateway (on select execution environments) pre-opens the scriptName filename as /dev/fd/N, and PATH_TRANSLATED, if relevant, as /dev/fd/N+1.
Accessing open file descriptors rather than opening files by path name from main SGML processing as needed avoids race conditions and has generally desirable properties wrt. exploiting POSIX file system guarantees for atomic/continued/high-available content delivery in the presence of concurrent content change and maintenance.
Specifically, this is done to be able to guarantee that, after SGML prolog parsing, no request processing will fail due to missing template and/or client document files, and that the content of determined files remains accessible to SGML processing even if it is subject to concurrent change or deletion during processing.
In particular
- the template and client files (if any) are held open by the SGML Web Server Gateway process, so those files can be atomically changed while request processing on previous content is underway (due to Unix file system guarantees)
  - note that these guarantees do not hold for further external entities, other than the content of PATH_TRANSLATED, that might be referenced from the main template file (or from the client file if it is atypically transcluded as template and can have entity declarations)
- a proper 404 NOT FOUND HTTP status can be sent, rather than beginning the response with a 200 OK status and then detecting non-existence during content processing and having e.g. to include error message character data along with user content
  - to this aim, SGML processing takes advantage of being designed such that no output character data is written before the first content arrives in output buffer handling, ie. prolog data is buffered until actual content begins, hence an HTTP 404 NOT FOUND or other non-default status can be set at the end of SGML prolog processing
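The file system guarantee relied on here can be demonstrated in isolation. The Python sketch below assumes a POSIX system; the function name and the file names are made up for the demonstration and say nothing about how the gateway names or opens its files.

```python
import os
import tempfile

def read_via_held_descriptor():
    """Show that an open descriptor keeps serving the content it was opened on."""
    with tempfile.TemporaryDirectory() as root:
        page = os.path.join(root, "page.sgm")
        with open(page, "w") as f:
            f.write("old content")
        fd = os.open(page, os.O_RDONLY)         # file is pre-opened up front
        try:
            update = os.path.join(root, "page.sgm.new")
            with open(update, "w") as f:
                f.write("new content")
            os.replace(update, page)            # atomic rename over the old name
            held = os.read(fd, 1024).decode()   # reads the file opened earlier
        finally:
            os.close(fd)
        with open(page) as f:
            current = f.read()                  # a fresh open sees the update
        return held, current

print(read_via_held_descriptor())
```

A request that already holds the descriptor finishes on the previous content, while requests arriving after the rename see the new file.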
Request processing is performed under the assumption of sending a default HTTP result of 200 OK unless set explicitly to another status, and assuming that, in general, the HTTP result status can't be changed once any output has been emitted.
Request processing is performed such that the complete SGML prolog of the document instance to process is validated before emitting any output. On any prolog parsing errors (including when system-specific parameter entities couldn't be resolved), processing is aborted, and a proper 404 NOT FOUND or 500 INTERNAL SERVER ERROR HTTP status is generated (depending on whether eg. an operating system error of ENOENT or non-ENOENT, respectively, was encountered).
Parsing, resolution, or other errors during content parsing, on the other hand, can't typically be reported via HTTP error status codes because response headers will already have been sent to the client by the time a content error is encountered.
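The constraint that a status can only be chosen before any content is emitted can be modelled with a tiny response object. This is a conceptual Python sketch with hypothetical names, not the output handling of sgmlweb.

```python
class BufferedResponse:
    """Holds the status open until the first content is written."""

    def __init__(self):
        self.status = 200            # default unless set explicitly
        self.headers_sent = False
        self.chunks = []

    def set_status(self, status):
        if self.headers_sent:
            raise RuntimeError("status can't change after output has started")
        self.status = status

    def write(self, data):
        self.headers_sent = True     # first content commits status and headers
        self.chunks.append(data)

response = BufferedResponse()
response.set_status(404)             # still possible: prolog processing only
response.write("<p>Not found</p>")
print(response.status, response.headers_sent)
```

Errors detected during prolog processing correspond to set_status calls before the first write; errors detected during content parsing come too late to change the status.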
As already explained for the individual routing branches, based on the file name resolution for PATH_TRANSLATED sketched above, the client document is checked for being up to date before actually accessing and sending content; if it hasn't changed since the date and time supplied by the client, a 304 NOT MODIFIED HTTP response is generated.
Note that access policies etc. don't come into play here, as the content body isn't transferred with 304 responses.
The SGML User Agent (sgml-ua.js) is a JavaScript (ES5) program for web browsers designed to produce HTML from SGML in the same way that HTML is produced from SGML on an SGML Web Server, thereby transparently offloading SGML processing to the browser, and at the same time saving network bandwidth by avoiding redundant network transfer of repeated partial page content.
While the SGML User Agent is designed to run against an SGML Web Server, it can also run against any other (e.g. simple static) web server lacking SGML support, in browser-only mode, with reduced user agent functionality. More generally, an SGML web setup can involve:
- both server and browser processing: SGML is rendered transparently on either, or both, the server (for the initial page load) and in the browser, depending on whether JavaScript is enabled/allowed in the browser
- browser-only processing: SGML files are accessed as static files from the web server and then rendered into a displayed HTML DOM in the browser
- server-only processing: SGML pages are sent as server-rendered HTML pages to browsers; JavaScript support in the browser isn't required
The SGML User Agent, when started, determines if the web page is running from a server with server-side SGML support by inspecting page metadata in the HTML head element. If the head element does not contain
<link rel="alternate" type="text/sgml" ...>
then the SGML User Agent assumes it is running off a web server without server-side SGML rendering support.
Server-side SGML support is required for proper session history, whereas when server-side SGML support isn't advertised via the link element as shown, browser-refresh and back-navigation from an external site linked to from the SGML webpage will take the user to the initial landing page (the static or prerendered HTML page carrying the SGML User Agent script). Moreover, bookmarking works only with server-side SGML support.
The basic functionality of the SGML User Agent is to, on window.onload, attach click handlers to the current document's local (same-domain) links performing SGML page rendering (transforming SGML content to HTML/DOM). Specifically, this is enabled on anchors that have the same effective protocol/host/port as the invoking page, and that either have no type attribute specified, or have text/sgml specified as its value.
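The test that decides whether an anchor gets an SGML click handler can be sketched as below. The real user agent is JavaScript running in the browser; this Python rendering only illustrates the rule, and the names origin and handled_by_user_agent as well as the default port table are assumptions of the example.

```python
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Effective protocol/host/port triple of an absolute URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def handled_by_user_agent(page_url, href, type_attr=None):
    """True if a link should be rendered via SGML in the browser."""
    target = urljoin(page_url, href)          # resolve relative links first
    same_origin = origin(target) == origin(page_url)
    return same_origin and type_attr in (None, "text/sgml")

page = "http://example.com/index"
print(handled_by_user_agent(page, "/about"))
print(handled_by_user_agent(page, "http://other.org/x"))
```

Comparing effective ports means that an explicit default port, as in http://example.com:80/about, still counts as the same origin.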
Once a page is rendered using SGML, its anchors get captured by SGML event handling for further navigation within the domain name, in turn.
History pushState()/popstate handling works in a natural way for basic forward and backward navigation: if we're about to navigate to another page (on the same domain, so rendered via SGML), we're just storing the previous page via pushState(), with the URL used to fetch SGML (or HTML on the initial page). When we return to this state via backward navigation, the popstate event handler will pop the state and start re-rendering the HTML from the pushed href URL in the same way the SGML page was rendered when first visited.
On a page (browser) refresh, the browser reloads
window.location
using the regular browser page loading algorithm.
When the server can render SGML to HTML (as triggered by an
HTTP Accept header favouring HTML over SGML) there's nothing special
to do here, since the re-visited page gets rendered server-side
(and carries the sgml-ua.js
script to attach to link handlers for
further browser-local SGML processing, just like on an initial
page load).
On the other hand, when working against a static web server,
window.location
will fetch SGML text, and browsers will render
the SGML code text as either plain text (Chrome) or possibly
broken HTML (FF, IE). Therefore, support for static servers
involves further browser history manipulation.
Blocking browser refresh isn't possible in general. There exist various attempts/scripts to accomplish this by either
- intercepting key events (but these techniques won't handle clicking on a browser refresh icon button); what can be achieved here (by either returning a non-void value from beforeunload event handling, by setting the event's returnValue, or by calling preventDefault) is to bring up a "do you really want to leave?" warning, but the reload action as such can't be prevented
- establishing a new browser context in an iframe or an HTML4 frame (see e.g. Disabling the Back Button), but these techniques are generally considered user-hostile
- not creating history entries in the first place; ie. according to Back Button Behavior on a Page With an iframe (although being about iframes mostly), replaceState() can be used to suppress creating history entries; namely, if, on a click event, the navigated-to href is replaceState()d into the same entry as the top-most one, then no history entry is created; this could be used to block any backward (and forward) navigation, but (if it actually works) is overreaching since it will disable plain backward navigation between SGML rendered pages, which isn't a problem even against servers without server-side SGML rendering.
So what SGML User Agent does is to ensure that, while a page is up, window.location points to the landing page of the current page, ie. the page through which the site was entered, which in many cases will be the site's home page, but could be any page carrying the sgml-ua.js script.
As final part of SGML rendering (after rendered link target URLs in the generated page have been changed into absolute/resolved form), window.location is set to the landing page URL. When the page is left, the history entry for the page view is then restored to the original SGML resource URL, rather than the landing page URL, so that on plain back navigation, regular popstate handler execution will render the SGML resource.
The original resource URL is stored in the history entry's data field (for as long as it is shadowed by the landing page URL).
Executing history restoration globally on the unload/beforeunload (or even pagehide/pageshow) events isn't possible, since Ajax page loads don't trigger those events. Therefore, history restoration is executed on individual outgoing link click events, along with SGML processing for the new page.
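The shadowing and restoration of history entries described above might be sketched as follows. The function names and the { href } state shape are assumptions made for illustration, not the actual sgml-ua.js implementation; createFakeHistory is a simplified stand-in for window.history (it only tracks the top-most entry) so that the sketch can be tried outside a browser.

```javascript
// Sketch only: names and the `{ href }` state shape are assumptions.

// After rendering: let the entry's URL be the landing page, while keeping the
// real resource URL in the history entry's state (data) field.
function shadowWithLandingPage(history, landingUrl, resourceUrl) {
  history.replaceState({ href: resourceUrl }, "", landingUrl);
}

// On an outgoing same-site link click: restore the current entry to its real
// resource URL, then create the entry for the new page.
function leaveFor(history, nextUrl) {
  const current = history.state;
  if (current && current.href) {
    history.replaceState(current, "", current.href);
  }
  history.pushState({ href: nextUrl }, "", nextUrl);
}

// Simplified stand-in for window.history (no back/forward position).
function createFakeHistory(initialUrl) {
  const entries = [{ state: null, url: initialUrl }];
  return {
    entries,
    get state() { return entries[entries.length - 1].state; },
    replaceState(state, _unused, url) { entries[entries.length - 1] = { state, url }; },
    pushState(state, _unused, url) { entries.push({ state, url }); },
  };
}
```

With this arrangement only the entry for the page currently on display carries the landing page URL; entries further down the stack carry their real resource URLs, which is what a popstate handler needs for re-rendering.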
Note that no history restoration takes place (the landing page history entry is kept) on outgoing external links since those don't get a click handler for SGML processing attached and hence exhibit standard browser behavior on link activation. For external links we're going to lose the JavaScript execution context and can't register handlers; when navigating back from an external page we must therefore enter an HTML (not SGML) page carrying the sgml-ua.js script.
SGML User Agent starts out on an initial landing page carrying this script
following a link on the page in the same domain will result in rendering the link target using SGML
backward navigation (to either an earlier rendered SGML page or the landing page) is also performed using SGML
navigation/following links to external sites will end SGML UA execution and continue with standard browser HTML loading/rendering; on return to an SGML-rendered site, if the site is running on a static web server without server-side SGML rendering, the landing page, not the page through which the site was left, is reloaded; only if using server-side SGML can the proper page of departure be loaded
page refreshes take the user to the landing page when not served from a web server with support for server-side SGML; only if using server-side SGML will the current page be reloaded (it's not possible to intercept browser behavior on refresh)
when running off a static web server without server-side SGML rendering, context menus (as activated by right-click or long-click/hold) are blocked to prevent Open Link in New Tab and bookmarking from being offered (neither of which works against a static server)
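Blocking the context menu, as mentioned in the last item, is commonly done by cancelling the contextmenu event. The following is a minimal sketch; blockContextMenu is a hypothetical helper name, and in a browser the target would typically be document.

```javascript
// Sketch only: `blockContextMenu` is a hypothetical helper, not part of
// sgml-ua.js. `target` is any object with add/removeEventListener.
function blockContextMenu(target) {
  const handler = (event) => event.preventDefault();
  target.addEventListener("contextmenu", handler);
  // Return a function that removes the handler again.
  return () => target.removeEventListener("contextmenu", handler);
}
```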
This section describes facilities for publishing tab-separated value data streams produced from SQL queries or other sources into HTML markup via bundled functions for dynamically generating required markup declarations.
Moreover, a technique for implementing an endpoint for HTML forms submission with support for tabular data insertion (where data is presented to SGML as query string with possibly multiple repeating field groups) is explained.
As explained in the context of short reference parsing, tab-separated values pulled-in from an external source such as a file or SQL query can be made available as markup elements using short reference use and map declarations specific to a particular tab-separated data source stream.
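To make the intended outcome concrete, the following sketch imitates, in plain JavaScript, the result of turning tab-separated rows into element markup. It does not use or reproduce the SGML short reference mechanism itself; tsvToMarkup is a hypothetical helper, the default element names tsv and record are taken from the examples in this section, and values are assumed not to contain markup delimiters such as < or &.

```javascript
// Sketch only: imitates the *result* of short reference based TSV parsing;
// `tsvToMarkup` is a hypothetical helper. Values are assumed not to contain
// markup delimiters such as "<" or "&".
function tsvToMarkup(tsv, fields, container = "tsv", record = "record") {
  // One record per non-empty line.
  const rows = tsv.split(/\r?\n/).filter((line) => line.length > 0);
  const records = rows.map((line) => {
    const values = line.split("\t");
    // One element per field name; missing trailing values become empty.
    const cells = fields.map((name, i) => `<${name}>${values[i] ?? ""}</${name}>`);
    return `<${record}>${cells.join("")}</${record}>`;
  });
  return `<${container}>${records.join("")}</${container}>`;
}
```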
sgmljs.net SGML provides, via custom storage manager notations, bundled functions for automatically generating the required short reference declarations for TSV parsing given the names of attributes provided as data attributes. Moreover, further provided markup declaration generators are used to extend basic TSV parsing to a generic mechanism for formatting tab-separated values, by feeding the tabular data rows obtained from TSV parsing into SGML templating.
For supplying parsed TSV data values to templating, input data must be provided as markup attributes rather than elements. To collect a sequence of (text contents of) elements produced from TSV parsing into attributes, sgmljs.net SGML uses the techniques of
re-mapping of element content into attributes provided via the NotNames attribute (part of ISO 10744 DAFE support) when a template notation is applied on an element in a link process
propagating attribute values from preceding sibling elements via #CURRENT link attribute default semantics.
To demonstrate these techniques, consider the following example input document representing possible output of TSV parsing setup as discussed in record boundary insertion:
<!DOCTYPE tsv [
<!ELEMENT tsv - - (record+)>
<!ELEMENT record - - (field1,field2,field3)>
<!ELEMENT field1 - - (#PCDATA)>
<!ELEMENT field2 - - (#PCDATA)>
<!ELEMENT field3 - - (#PCDATA)>
]>
<!DOCTYPE table [
<!ELEMENT table - - (tr+)>
<!ELEMENT tr - - (td+)>
<!ELEMENT td - - (#PCDATA)>
]>
<!LINKTYPE lpd tsv table [
<!NOTATION field1-template
PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN"
"template.sgm">
<!NOTATION field2-template
PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN"
"template.sgm">
<!NOTATION field3-template
PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN"
"template.sgm">
<!ATTLIST #NOTATION (field1-template|field2-template|field3-template)
field1 CDATA #CURRENT>
<!ATTLIST #NOTATION (field2-template|field3-template)
field2 CDATA #CURRENT>
<!ATTLIST #NOTATION field3-template
field3 CDATA #IMPLIED>
<!ATTLIST (field1|field2|field3)
field1 CDATA #IMPLIED
field2 CDATA #IMPLIED
field3 CDATA #IMPLIED
template
NOTATION (field1-template|field2-template|field3-template)
#IMPLIED
NotNames CDATA #IMPLIED>
<!LINK #INITIAL
tsv table
record tr
field1 [ template=field1-template NotNames="field1 #CONTENT" ] #IMPLIED
field2 [ template=field2-template NotNames="field2 #CONTENT" ] #IMPLIED
field3 [ template=field3-template NotNames="field3 #CONTENT" ] td>
]>
<tsv>
<record>
<field1>first value</field1>
<field2>second value</field2>
<field3>third value</field3>
</record>
<!-- further data following here:
<record>
<field1>...</field1>
...
</record>
-->
</tsv>
For this example, template.sgm
is expected to contain:
<!DOCTYPE #IMPLIED SYSTEM [
<!ENTITY field1 SYSTEM>
<!ENTITY field2 SYSTEM>
<!ENTITY field3 SYSTEM>
]>
<tr>
<td>&field1</td>
<td>&field2</td>
<td>&field3</td>
</tr>
The result markup of processing the example document
with the lpd
link process activated is as follows
(omitting commented text):
<table>
<tr>
<td>first value</td>
<td>second value</td>
<td>third value</td>
</tr>
</table>
The link process contains template notation declarations for the individual fieldN elements, and link rules applying template notation on the respective fieldN elements.
Crucially, the result element of all the link rules except the one for the field3 element (the last element of the record content model) is #IMPLIED, meaning that the template is applied if the source element (eg. field1 or field2) can be placed into the result context. Since neither field1 nor field2 can appear anywhere in the result HTML-like table content, no template will apply on the field1 and field2 elements. The link rules placed on these fieldN elements exist solely
for applying NotNames rules, which collect the text content of the element on which the template is placed into the field1 and field2 values, respectively (and also accordingly for field3)
for updating the current values for field1 and field2, respectively, in the link attribute processing context.
Since the field3 element shares the declarations for field1 and field2 as #CURRENT link attributes, the values for field1 and field2 are transported to the context for the field3 element, where the field3-template is applied as a regular template, since the result element tr is expected/admitted at the result context position. In this way, the contents of the field1, field2, and field3 elements originally in the input source are propagated to the field1, field2, and field3 data (DAFE) attributes, and hence are available as entities in the sub-processing context for applying template.sgm on field3.
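The net effect of the two techniques (content collected into attributes, values carried forward to the last field, template applied only there) can be mimicked in plain JavaScript as a rough mental model. This sketch does not reflect how sgmljs.net SGML implements link processing; formatRecords and applyTemplate are hypothetical names.

```javascript
// Sketch only: mimics the *effect* of NotNames content collection and #CURRENT
// attribute propagation. `formatRecords` and `applyTemplate` are hypothetical.
function applyTemplate(template, attrs) {
  // Replace entity-like references such as &field1 with attribute values;
  // references without a matching attribute are left untouched.
  return template.replace(/&([A-Za-z][A-Za-z0-9_]*);?/g, (ref, name) =>
    Object.prototype.hasOwnProperty.call(attrs, name) ? attrs[name] : ref);
}

function formatRecords(records, fieldNames, template) {
  const current = {}; // "current" attribute values carried between siblings
  const last = fieldNames[fieldNames.length - 1];
  const out = [];
  for (const record of records) {
    for (const name of fieldNames) {
      // Collect element content into the attribute of the same name; a
      // missing field keeps the previously seen ("current") value.
      if (record[name] !== undefined) current[name] = record[name];
      // Only the last field has a result element, so only it emits output.
      if (name === last) out.push(applyTemplate(template, current));
    }
  }
  return out.join("");
}
```

Calling formatRecords with the single record of the example and the tr/td template text from template.sgm yields the same tr element as shown in the result markup above.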
sgmljs.net SGML provides the built-in storage manager notations tsv_element_decl, tsv_entity_decl, tsv_shortref_decl, tsv_usemap_decl, tsv_notation_decl, tsv_linkattr_decl, and tsv_linkrule_decl to generate markup declarations from fields and params data attributes as required for the above declaration fragments.
The above example, rewritten to use markup declaration generators for fetching TSV records and applying a formatting template, looks as follows:
<!DOCTYPE tsv [
<!NOTATION sql SYSTEM>
<?IS10744 FSIDR sql tsv_element_decl tsv_entity_decl tsv_shortref_decl tsv_usemap_decl tsv_notation_decl
FSIDefDoc="+//IDN sgml.net//DTD FSISM TSV parsing declaration utilities//EN">
<!ELEMENT tsv - - (record+)>
<!ELEMENT record - - (name,gender_cd)>
<!ENTITY % element-decls
SYSTEM '<tsv_element_decl container="tsv" record="record" fields="name gender_cd">'>
<!ENTITY % entity-decls SYSTEM '<tsv_entity_decl container="tsv" record="record" fields="name gender_cd">'>
<!ENTITY % shortref-decls SYSTEM '<tsv_shortref_decl container="tsv" record="record" fields="name gender_cd">'>
<!ENTITY % usemap-decls SYSTEM '<tsv_usemap_decl container="tsv" record="record" fields="name gender_cd">'>
%element-decls;
%entity-decls;
%shortref-decls;
%usemap-decls;
<!ENTITY % query-results SYSTEM
"<sql>connect 'Driver=SQLite;Database=test.db'
set headings off
set colsep ' '
select name, gender_cd
from names_tbl
where gender_cd = 0 order by name;">
<!ENTITY query-results "%query-results">
]>
<!DOCTYPE table SYSTEM [
<!-- <!ELEMENT table - - (tr+)>
<!ELEMENT tr O O (td+)>
<!ELEMENT td - - (#PCDATA)> -->
]>
<!LINKTYPE lnk tsv table [
<?IS10744 FSIDR tsv_notation_decl tsv_linkattr_decl tsv_linkrule_decl
FSIDefDoc="+//IDN sgml.net//DTD FSISM TSV parsing declaration utilities//EN">
<!ENTITY % notation-decls SYSTEM '<tsv_notation_decl container="tsv" record="record" fields="name gender_cd" template_sysid="sql-names-gendercd-query-with-aggregation-into-last-field2-referenced-template.sgm">'>
%notation-decls
<!ENTITY % linkattr-decls SYSTEM '<tsv_linkattr_decl container="tsv" record="record" fields="name gender_cd">'>
%linkattr-decls
<!ENTITY % linkrule-decls SYSTEM '<tsv_linkrule_decl container="table" record="tr" fields="name gender_cd">'>
<!LINK #INITIAL
tsv table
%linkrule-decls>
]>
<tsv>
&query-results</tsv>
With the given values for the container, record, and fields data attributes as supplied in the example prolog, the respective storage manager notation FSIs generate markup declaration text corresponding to fragments of the initial example.
See the Bundled Modules API documentation for the detailed description of the tsv_element_decl, tsv_entity_decl, tsv_shortref_decl, tsv_usemap_decl, tsv_notation_decl, tsv_linkattr_decl, and tsv_linkrule_decl functions of the bundled tsvparsing module.
As a convenience, an SGML document for generic SQL selection as just described can be constructed by just using the +//IDN sgml.net//NOTATION SQL query formatting template for HTML table element//EN public identifier (or some of its variants such as for producing a tbody element instead). The following example shows a complete (simplified) HTML-like document where SQL data is rendered into an HTML table element with tr elements as record (row) container, and td as data cell element:
<!DOCTYPE HTML [
<!ELEMENT HTML O O (TABLE|P)+>
<!ELEMENT TABLE - - (TR+)>
<!ELEMENT TR - - (TD+)>
<!ELEMENT TD - - (#PCDATA)>
<!ELEMENT P O O (#PCDATA|A)+>
<!ELEMENT SPAN - - (#PCDATA)>
<!ATTLIST SPAN PROPERTY CDATA #IMPLIED>
<!ELEMENT A - - (#PCDATA)>
<!ATTLIST A HREF CDATA #IMPLIED TITLE CDATA #IMPLIED>
<!ATTLIST TABLE REF ENTITY #CONREF PROPERTY CDATA #IMPLIED>
]>
<!LINKTYPE LISTBOOKS #SIMPLE #IMPLIED [
<!NOTATION SGML PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN">
<!NOTATION SQLQUERY PUBLIC "+//IDN sgml.net//NOTATION SQL query formatting template for HTML table element//EN">
<!ATTLIST #NOTATION SQLQUERY
SUPERDCN NAME #FIXED SGML
FIELDS NAMES #FIXED "NAME"
PARAMS NAMES #FIXED "GENDER_CD"
GENDER_CD CDATA #REQUIRED
TEMPLATE_SYSID CDATA #FIXED '<literal><tr><td>&name</td></tr>'>
<!ENTITY FEMALENAMES SYSTEM "<literal>
set colsep ' '
set underline off
connect 'Driver=SQLite;Database=/tmp/test.db'
select name from names_tbl where gender_cd = cast('&gender_cd' as decimal);"
NDATA SQLQUERY [ GENDER_CD="0"]>
]>
<html>
<table ref=femalenames>
</html>
For executing SQL INSERT or other statements from POSTed URL-encoded data, as would be produced from an HTML form having an enctype of application/x-www-form-urlencoded with potentially multiple repeating groups of query keys/fields, SGML similar to the following boilerplate can be used (where, instead of actual SQL invocation using a sql storage manager notation, a literal template for rendering the supplied values as tr elements is used):
<!DOCTYPE doc [
<!ELEMENT doc - - (sub+)>
<!ELEMENT sub O O (key,value,key,value)>
<!ELEMENT key - - (#PCDATA)>
<!ELEMENT value - O (#PCDATA)>
<!ENTITY start-key "<key>">
<!ENTITY end-key-start-value "</key><value>">
<!ENTITY end-value-start-key "</value><key>">
<!SHORTREF in-doc ";" start-key>
<!SHORTREF in-key "=" end-key-start-value>
<!SHORTREF in-value ";" end-value-start-key>
<!USEMAP in-doc doc>
<!USEMAP in-key key>
<!USEMAP in-value value>
]>
<!DOCTYPE html [
<!ELEMENT html - - (table)>
<!ELEMENT table O O (tr+)>
<!ELEMENT tr - - (#PCDATA)>
<!ATTLIST tr f_attr CDATA #IMPLIED g_attr CDATA #IMPLIED>
]>
<!LINKTYPE lnk doc html [
<!-- entity supplied by sgmlweb containing a semicolon-separated
(rather than ampersand-separated) URI query string -->
<!ENTITY QUERY_STRING_DECODED SYSTEM>
<!-- dummy notation(s) just for enjoying #CURRENT attribute propagation;
not actually executed -->
<!NOTATION aggregate-current-values
PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN"
"non-existant.sgm">
<!NOTATION check-current-key-is-f
PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN"
"non-existant.sgm">
<!NOTATION check-current-key-is-g
PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN"
"non-existant.sgm">
<!NOTATION formatting
PUBLIC "ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)//EN"
"<literal><tr f_attr='&f_attr' g_attr='&g_attr'></tr>">
<!ATTLIST #NOTATION (formatting|aggregate-current-values)
f_attr NUMBER #CURRENT>
<!ATTLIST #NOTATION check-current-key-is-f
key CDATA #FIXED "f">
<!ATTLIST #NOTATION check-current-key-is-g
key CDATA #FIXED "g">
<!ATTLIST #NOTATION formatting
g_attr CDATA #CURRENT>
<!ATTLIST (key|value|sub)
f_attr CDATA #IMPLIED
g_attr CDATA #IMPLIED
NotNames CDATA #IMPLIED
key CDATA #IMPLIED
template NOTATION (check-current-key-is-f|check-current-key-is-g|aggregate-current-values|formatting) #IMPLIED>
<!LINK #INITIAL
doc html
key #POSTLINK after-key-f [ template=check-current-key-is-f NotNames="key #CONTENT" ] #IMPLIED>
<!LINK after-key-f
value [ template=aggregate-current-values NotNames="f_attr #CONTENT" ] #IMPLIED
key #POSTLINK after-key-g [ template=check-current-key-is-g NotNames="key #CONTENT" ] #IMPLIED>
<!LINK after-key-g
value [ template=formatting NotNames="g_attr #CONTENT" ] tr>
]>
<doc>
<key>&QUERY_STRING_DECODED</sub>
</doc>
If the value for QUERY_STRING_DECODED were supplied as f=1;g=value1g;f=2;g=value2g, such as when the SGML document is accessed via POSTing to http://../..?f=1&g=value1g&f=2&g=value2g, this document, when activating the lnk link process, would invoke the formatting template on each logical data row, represented by a sub element, with the f_attr and g_attr attributes corresponding to the respective database columns.
To collect URL query parameters in repeating groups into elements, the document
makes use of short references to rewrite equals (=) and semicolon (;) characters into <key>...</key><value>...</value><key>...</key>... element sequences
then inserts, via SGML tag inference and constraining the respective content model, an enclosing sub element, acting as record container element
then uses link processing with NotNames to collect element content of value elements into f_attr and g_attr attributes, respectively.
Moreover, the expected sequence of values for the key element content (eg. f on every odd, and g on every even element) is enforced.
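As a plain-JavaScript analogue of the steps above (not the SGML mechanism itself), grouping repeated query parameters into rows while enforcing the expected key order might look like the sketch below. collectRows is a hypothetical helper; it accepts both the ampersand-separated and the semicolon-separated form and percent-decodes each component, which is an assumption about how the decoded query string is produced.

```javascript
// Sketch only: `collectRows` is a hypothetical helper mirroring the grouping
// and key order check performed by the SGML document above.
function collectRows(query, keys) {
  const decode = (s) => decodeURIComponent(s.replace(/\+/g, " "));
  // Split into key/value pairs on either separator.
  const pairs = query
    .split(/[&;]/)
    .filter((p) => p.length > 0)
    .map((p) => {
      const i = p.indexOf("=");
      return i < 0 ? [decode(p), ""] : [decode(p.slice(0, i)), decode(p.slice(i + 1))];
    });
  if (pairs.length % keys.length !== 0) {
    throw new Error("incomplete field group");
  }
  const rows = [];
  for (let i = 0; i < pairs.length; i += keys.length) {
    const row = {};
    keys.forEach((expected, j) => {
      const [key, value] = pairs[i + j];
      // Enforce the expected key sequence within each group.
      if (key !== expected) {
        throw new Error(`expected key "${expected}" but found "${key}"`);
      }
      row[key] = value;
    });
    rows.push(row);
  }
  return rows;
}
```

For the query string of the example, collectRows(query, ["f", "g"]) produces one object per logical data row, each holding an f and a g value.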