SGML

sgmlproc Manual

Name

sgmlproc - normalize and process SGML documents

Synopsis

sgmlproc ( [ -v option=value ] [ -- [ -e entity=replacement text or sysid ] [ -o outfile ]] file ) | -- -h | -- -V

Description

sgmlproc reads SGML markup text from file and outputs SGML conforming to a specified target document type, or just the base document type of the input markup, if no particular target document type has been requested.

The output of sgmlproc is a document in which all markup minimization features used (such as tag omission, attribute name and value omission, attribute quoting, and short references) and other permitted variant syntax (such as whitespace or namecase usage) are transformed into their canonical form, and references to general entities are expanded into their replacement text.

sgmlproc parses and validates input markup according to the markup rules declared in the document's document type declaration(s), if any. Moreover, sgmlproc applies transformation and templating as declared in the link process declaration(s) of the input document, if any, and if instructed to by requesting a particular target document type and/or activating a link process via the target_document_type_name and active_lpd_names options, respectively.

Options

General options

-o outfile

Write output to the given outfile rather than standard output

-V

Print version and built-in features, and exit

-h

Print a short synopsis, and exit

-v output_format=FMT

Selects the output format, where FMT is one of sgml, html (the default), xml, or none:

sgml outputs markup according to the target document type's declaration, e.g. with omitted end tags for elements having declared or implied content EMPTY, and namecasing rules as specified by the applicable SGML declaration

html outputs markup with lowercase element and attribute names and name tokens

xml outputs end-element tags for all elements, and preserves the namecase of element, attribute and notation names and name tokens

none suppresses output

-v dtd_handling=VAL

Controls whether DTD and XML declarations are included in or suppressed from the output, where VAL is one of preserve (the default), omit, or force

preserve includes base declaration sets and/or XML declarations in the output when parsed from source markup or implied by the result markup document type of an active explicit link process

omit suppresses output of declaration sets and XML declarations

force outputs the fixed string <!doctype html> as DTD (only used in combination with -v output_format=html)
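As an illustration of combining output_format and dtd_handling, the following sketch prints (rather than runs) a command line requesting HTML output with the fixed <!doctype html> declaration. The helper name html_cmd and the file names mydoc.sgm and mydoc.html are assumptions made for this example only; the option placement follows the Synopsis.

```shell
# Hypothetical helper: print (not run) an sgmlproc command line that
# requests HTML output with the fixed <!doctype html> declaration.
# $1 is the input file; $2 is the output file passed to -o.
html_cmd() {
  printf 'sgmlproc -v output_format=html -v dtd_handling=force -- -o %s %s\n' "$2" "$1"
}

html_cmd mydoc.sgm mydoc.html
```

Printing the command first makes it easy to inspect before running it against a real document.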

-v forward_link_attributes=VAL

A non-empty, non-zero value causes sgmlproc to produce link attributes in output content

Normally, sgmlproc uses link attributes only for determining a template and any template parameters to apply, if a template is implied in a given element context of an active link process, and outputs attributes from template SGML documents

-v suppress_warnings=VAL

A non-empty, non-zero value causes sgmlproc to not print warnings (on the standard error output stream)

-v treat_recoverable_as_fatal_errors=YES

YES causes sgmlproc to abort processing on the first error, whereas by default, or when given any other value than YES, sgmlproc will print an error message and continue processing, and only abort processing on unrecoverable errors

-v strict_iso8879_compatibility=YES|NO

Specifying strict_iso8879_compatibility=YES switches on the following checks mandated by ISO 8879, but not enforced by default (or when specifying value strict_iso8879_compatibility=NO):

Link sets with multiple rules declared on the same (source) element must all have link attribute specifications

Declarations of #CURRENT default values for data attributes (attributes of notations) are rejected

-v system_specific_entity_path=DIR

Specifies the directory where sgmlproc looks for resolving system-specific entities

By default, sgmlproc looks in the main input file's directory for files, unless a replacement value or system identifier for a system-specific entity has been supplied on the command line via the -e option

File names within the directory for system-specific entities are resolved by interpreting the entity name as file name, honoring the effective settings for SYNTAX NAMECASE ENTITY

Entity options

-e ent=replacement text

Sets replacement text as the value of the ent system-specific entity

-e ent=<literal>replacement text

Sets replacement text as the value of the ent system-specific entity

This is a variant for the aforementioned option using the <literal> formal system identifier notation syntax to represent string literals

-e ent=<osfile>file name

Sets file name as the file to read the replacement value from for the ent system-specific entity

Note that, in addition, the special system-specific entity sgmlstdin can be used to supply the content of the <osfd>0 formal system identifier (in preference to reading content from the standard input for <osfd>0).
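When typing the -e forms above into a POSIX shell, the < and > characters of <literal> and <osfile> would be read as redirections unless quoted. The sketch below uses a hypothetical show_args helper to display how the shell splits a command line, so quoting can be checked without running sgmlproc; the entity name myent and the file names myent.txt and mydoc.sgm are placeholders.

```shell
# Hypothetical helper: print each argument on its own line, to check
# how the shell splits a command line before running it for real.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Single quotes keep "<" and ">" from being read as redirections and
# keep the space inside the replacement text within one argument.
show_args sgmlproc -- -e 'myent=<literal>some text' mydoc.sgm
show_args sgmlproc -- -e 'myent=<osfile>myent.txt' mydoc.sgm
```

Each -e value should appear as a single bracketed argument; if it is split across lines, the quoting needs to be adjusted.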

-v target_document_type_name=DOCTYPE

Specifies the document type name to produce from the source document.

DOCTYPE must be the name of a document type definition declared in the source document

-v active_lpd_names=LINKTYPE[,LINKTYPE,...]

Specifies one or more (comma-separated) link process name(s) to activate
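Since active_lpd_names takes a comma-separated list, a script that holds link process names as separate words needs to join them first. A minimal sketch follows, assuming two hypothetical link process names, toc and render, and a hypothetical helper join_commas; the resulting command is only printed.

```shell
# Hypothetical helper: join all arguments with commas. The function body
# runs in a subshell, so the caller's IFS is left unchanged.
join_commas() (
  IFS=,
  printf '%s\n' "$*"
)

# "toc" and "render" are placeholder link process names.
lpds=$(join_commas toc render)
printf 'sgmlproc -v active_lpd_names=%s mydoc.sgm\n' "$lpds"
```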

-v system_specific_implied_lpd_names=LINKTYPE[,LINKTYPE,...]

Specifies a single name or a comma-separated list of names of additional link process(es) treated as if declared as system-specific LPDs following actual link process declarations in the document prolog

When giving a name of a link process actually declared in the document prolog, the respective link process name parameter value is ignored (a link process declaration in the document prolog is always used as effective link process declaration in preference to one specified via system_specific_implied_lpd_names)

-v system_specific_implied_lpd_source_document_type_names=DOCTYPE[,DOCTYPE,...]

system_specific_implied_lpd_source_document_type_names and system_specific_implied_lpd_result_document_type_names can contain (comma- or space-separated) names of the source and result document type name, resp., of the link processes specified at the respective position in system_specific_implied_lpd_names (where all but the last link process must contain names of explicit link processes)

These parameters are only used internally in nested sgmlproc invocations for propagating source link processing context and state to sub processes, and are not supported (nor required) on basic sgmlproc execution where templates are only executed in the last, or only, link process of a link process pipeline

Only available for sgmljs.net SGML Pro

-v system_specific_implied_lpd_result_document_type_names=DOCTYPE[,DOCTYPE,...]

See above

-v enable_lax_templates=VAL

A non-empty, non-zero value allows a template document to declare a document type with an external declaration set (with the value of the expected_external_dtd_subset_identifier option as system identifier)

By default, a template document is required to receive markup declarations from its calling context by specifying <!DOCTYPE ... SYSTEM> as base document type, and/or as a target document type

-v expected_external_dtd_subset_identifier=sysid|#IMPLIED

Specifies the system identifier of an external DTD subset that is expected for the main document when "lax" templating is permitted

#IMPLIED indicates that the base DTD is expected to be <!DOCTYPE ... SYSTEM> (where the doctype can be #IMPLIED or specified), or that the prolog may be omitted altogether

-v disable_referential_attributes=VAL

A non-empty, non-zero value causes attributes with declared value ID, IDREF, IDREFS, ENTITY, ENTITIES, or NOTATION, or attributes with a #CURRENT default value, to be rejected as a recoverable error in content, irrespective of whether they are declared in the applicable document type definition

This option is used internally to enforce referential integrity when processing "strict" templates in recursive subcontext invocations

-v disable_data_entity_references=VAL

A non-empty, non-zero value causes parsing to produce a recoverable error on data entity references in content

This option is used internally to enforce referential integrity when processing "strict" templates in recursive subcontext invocations

Diagnostic options

-v sax_event_tracing=VAL

Specifying sax_event_tracing with any value, including an empty value, causes sgmlproc to print info about the declaration set from which the element originates (either a document type name for parsed elements, or a link process name for produced result element)

The info is printed in SGML comments in regular output, next to the produced element

Not available in all sgmlproc builds

-v sax_error_context_info_collection=VAL

Specifying sax_error_context_info_collection with any value, including an empty value, causes sgmlproc to print the context location (system identifier of document and line number) of not only the document where an error occurs, but also of the document(s) and place(s) where the erroneous document is included as entity in the running processing context

sax_error_context_info_collection is normally switched off to avoid processing overhead

Not available in all sgmlproc builds

-v disable_path_relativization=VAL

Specifying disable_path_relativization with any value, including an empty value, causes sgmlproc to print file names in error messages as absolute rather than relative paths

Used to produce location-independent error message output in internal sgmlproc testing

Markdown options

-v strict_markdown_pl_compatibility=YES|NO

Specifying strict_markdown_pl_compatibility=YES switches on emulation of Markdown_1.0.1.pl (John Gruber's original markdown formatter) in producing HTML from markdown

Specifically, two newlines (but not more) at the end of a code block are collapsed into a single newline (whereas with strict_markdown_pl_compatibility=NO, any number of trailing newlines at the end of a code block is collapsed into a single newline)

Moreover, three newlines are produced from a blank code line

-v keep_trailing_codeblock_newlines=VAL

A non-empty, non-zero value causes parsing to reproduce blank lines and newline characters at the end of codeblocks as parsed from source (unless strict_markdown_pl_compatibility is set to YES)

-v prune_singleton_html_paras_in_listitems=YES|NO

A value of YES causes sgmlproc to remove HTML p elements (making their content appear directly as child content of the parent li or dd element), if that p element is the sole child of the parent element

p elements specified in markdown HTML blocks are not pruned

Security options

These options are switched on by default for processing SGML on a web server or browser using sgmlweb to prevent markup injection and denial-of-service attacks, but aren't switched on for sgmlproc command-line SGML processing.

-v restrict_parameter_entity_expansion=YES|NO

A value of YES causes sgmlproc to abort an attempt to perform parameter entity expansion in entity declarations outside replacement text literals with an unrecoverable error condition, except if the value expands to (the expansion of) 'SYSTEM "%PATH_TRANSLATED"' or "SYSTEM '%PATH_TRANSLATED'"

-v disable_referential_attributes=VAL

See description of disable_referential_attributes above

-v disable_data_entity_references=VAL

See description of disable_data_entity_references above
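Because these options are not switched on for command-line processing, a script handling untrusted input may want to pass them explicitly. The following is a sketch only: the wrapper name run_hardened and the DRY_RUN variable are assumptions of this example, and the value 1 is used simply as one possible non-empty, non-zero value.

```shell
# Hypothetical wrapper: prepend the security-related options described
# above to an sgmlproc invocation. With DRY_RUN unset or set to 1, the
# command line is printed instead of executed.
run_hardened() {
  set -- sgmlproc \
    -v restrict_parameter_entity_expansion=YES \
    -v disable_referential_attributes=1 \
    -v disable_data_entity_references=1 \
    "$@"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run_hardened mydoc.sgm
```

Setting DRY_RUN=0 in the environment makes the wrapper execute the command rather than print it.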

SGML declaration options

These options set or override effective SGML declaration properties.

-v sgmldecl_syntax_namecase_general=YES|NO

Sets the effective value of the SYNTAX NAMECASE GENERAL property

-v sgmldecl_syntax_namecase_entity=YES|NO

Sets the effective value of the SYNTAX NAMECASE ENTITY property

-v sgmldecl_features_minimize_omittag=YES|NO

Sets the effective value of the FEATURES MINIMIZE OMITTAG property

-v sgmldecl_features_minimize_rank=YES|NO

Sets the effective value of the FEATURES MINIMIZE RANK property

-v sgmldecl_features_minimize_implydef_doctype=YES|NO

Sets the effective value of the FEATURES MINIMIZE IMPLYDEF DOCTYPE property

-v sgmldecl_features_minimize_implydef_element=YES|NO

Sets the effective value of the FEATURES MINIMIZE IMPLYDEF ELEMENT to either YES or NO

-v sgmldecl_features_minimize_implydef_element_anyother=YES|NO

If specified as YES, and specified in addition to -v sgmldecl_features_minimize_implydef_element=YES, this sets the effective value of the FEATURES MINIMIZE IMPLYDEF ELEMENT property to ANYOTHER

FEATURES MINIMIZE IMPLYDEF ELEMENT ANYOTHER is the default used by sgmlproc

-v sgmldecl_features_minimize_implydef_attlist=YES|NO

Sets the effective value of the FEATURES MINIMIZE IMPLYDEF ATTLIST property

-v sgmldecl_features_minimize_implydef_entity=YES|NO

Sets the effective value of the FEATURES MINIMIZE IMPLYDEF ENTITY property

-v sgmldecl_features_minimize_emptynrm=YES|NO

Sets the effective value of the FEATURES MINIMIZE EMPTYNRM property

-v sgmldecl_features_minimize_shorttag_attrib_omitname=YES|NO

Sets the effective value of the FEATURES MINIMIZE SHORTTAG ATTRIB OMITNAME property

-v sgmldecl_features_minimize_shorttag_starttag_empty=YES|NO

Sets the effective value of the FEATURES MINIMIZE SHORTTAG STARTTAG EMPTY property

-v sgmldecl_features_minimize_shorttag_starttag_netenabl=IMMEDNET

Sets the effective value of the FEATURES MINIMIZE SHORTTAG STARTTAG NETENABL property to the IMMEDNET value used in WebSGML (the Annex K revision to ISO 8879:1986) for supporting XML-style empty elements

-v sgmldecl_features_minimize_shorttag_endtag_empty=YES|NO

Sets the effective value of the FEATURES MINIMIZE SHORTTAG ENDTAG EMPTY property

-v sgmldecl_features_other_validity=TYPE|NOASSERT

Sets the effective value of the FEATURES OTHER VALIDITY property

-v sgmldecl_features_other_formal=YES|NO

Sets the effective value of the FEATURES OTHER FORMAL property

-v sgmldecl_features_other_urn=YES|NO

Sets the effective value of the FEATURES OTHER URN property

Only meaningful if -v sgmldecl_features_other_formal=YES is also specified

Diagnostics

sgmlproc leaves an exit status of 0 on successful completion, a value other than 0 otherwise.

sgmlproc prints error and warning messages with references to the file and line number of error locations and details to the standard error stream.
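In a script, the exit status and the standard error stream can be handled separately. The sketch below defines a hypothetical run_and_report helper that saves standard error to a log file and reports the exit status; the names sgmlproc.log, out.html, and mydoc.sgm in the usage comment are placeholders.

```shell
# Hypothetical helper: run a command, save its standard error to the
# log file named in $1, and report whether it exited with status 0.
run_and_report() {
  log=$1
  shift
  if "$@" 2>"$log"; then
    echo "ok"
  else
    status=$?
    echo "failed with exit status $status, see $log"
    return "$status"
  fi
}

# Intended use (requires sgmlproc and mydoc.sgm to exist):
#   run_and_report sgmlproc.log sgmlproc -- -o out.html mydoc.sgm
```

The helper returns the original exit status on failure, so callers can still branch on it.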

Note that the portable sgmlproc program implemented in the awk programming language may, in some builds, silently ignore misspelled options. This is an awk limitation (as is the required use of the -- end of arguments marker described below).

Examples

To create canonical markup from mydoc.sgm:

sgmlproc mydoc.sgm

To create XML markup (with end-element tags for elements declared EMPTY such as HTML's img element) from mydoc.sgm:

sgmlproc -v output_format=xml mydoc.sgm

To activate a link process pipeline for creating HTML markup (mydoc.sgm is expected to declare one or more link process declaration sets with html result markup in its document prolog), or just normalize input markup if mydoc.sgm already uses html as base document type:

sgmlproc -v target_document_type_name=html mydoc.sgm

To produce HTML as described above, using the text "some text" as replacement text for the myent system-specific entity:

sgmlproc -v target_document_type_name=html -- -e myent='some text' mydoc.sgm

Note that the -- end of arguments marker is required only for compatibility with the portable sgmlproc program implemented in the awk programming language. Other sgmlproc implementations, such as the ECMAScript implementation for Node.js, do not require it, but recognize and tolerate it.