sgmlproc - normalize and process SGML documents
sgmlproc ( [ -v option=value ] [ -- [ -e entity=replacement text or sysid ] [ -o outfile ]] file ) | -- -h | -- -V
sgmlproc
reads SGML markup text from file and
outputs SGML conforming to a specified target document
type, or just the base document type of the input
markup, if no particular target document type has
been requested.
The output of sgmlproc
is a document in which all markup minimization features used (such
as tag omission, attribute name and value omission,
attribute quoting, and short references) and other
permitted variant syntax (such as in whitespace or
namecase usage) are transformed into their
canonical form, and references to general entities are
expanded into their replacement text.
sgmlproc
parses and validates input markup according to
the markup rules declared in the document's document type
declaration(s), if any. Moreover, sgmlproc
applies
transformation and templating as declared in the link
process declaration(s) of the input document, if any,
and if instructed to by requesting a particular target
document type and/or activating a link process via the
target_document_type_name
and active_lpd_names
options, respectively.
-o
outfile
Write output to the given outfile
rather than standard output
-V
Print version and built-in features, and exit
-h
Print a short synopsis, and exit
-v output_format=
FMT
where FMT
is one of sgml
, html
(the default), xml
, or none
, and
sgml
outputs markup according to the target document type's
declaration, e.g. with omitted end tags for elements
having declared or implied content EMPTY
, and namecasing
rules as specified by the applicable SGML declaration
html
outputs markup with lowercase
element and attribute names and name tokens
xml
outputs end-element tags for all elements, and preserves
the namecase of element, attribute and notation names and name tokens
none
suppresses output
-v dtd_handling=
VAL
Option causing inclusion or suppression of output of DTD and XML
declarations where VAL
is one of
preserve
(the default), omit
, or force
preserve
includes base declaration sets and/or XML declarations
in the output when parsed from source markup or implied
by the result markup document type of an active explicit link process
omit
suppresses output of declaration sets and XML declarations
force
outputs the fixed string <!doctype html>
as DTD
(only used in combination with -v output_format=html
)
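As a minimal sketch of how the output_format and dtd_handling options combine, the following shell function requests HTML output with the fixed <!doctype html> declaration. The function name to_html5 is a hypothetical helper introduced here for illustration; only the two -v options come from the descriptions above.

```shell
# Hypothetical helper (not part of sgmlproc): emit HTML with the fixed
# "<!doctype html>" declaration on standard output.
to_html5() {
  # $1: input SGML file
  sgmlproc -v output_format=html -v dtd_handling=force "$1"
}
```

A call such as `to_html5 mydoc.sgm > mydoc.html` would then rely on ordinary shell redirection to capture the result.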
-v forward_link_attributes=
VAL
A non-empty, non-zero value causes sgmlproc
to produce
link attributes in output content
Normally, sgmlproc
uses link attributes only
for determining a template and any template parameters
to apply, if a template is implied in a given
element context of an active link process,
and outputs attributes from template SGML documents
-v suppress_warnings=
VAL
A non-empty, non-zero value causes sgmlproc
to
not print warnings (on the standard error output stream)
-v treat_recoverable_as_fatal_errors=YES
YES
causes sgmlproc
to abort processing on
the first error, whereas by default, or when given
any other value than YES
, sgmlproc
will print an error message and continue processing,
and only abort processing on unrecoverable
errors
-v strict_iso8879_compatibility=YES
|NO
Specifying strict_iso8879_compatibility=YES
switches on the following checks mandated by ISO 8879,
but not enforced by default (or when specifying value
strict_iso8879_compatibility=NO
):
Link sets with multiple rules declared on the same (source) element must all have link attribute specifications
Declarations of #CURRENT
default values for data
attributes (attributes of notations) are rejected
-v system_specific_entity_path=
DIR
Specifies the directory where sgmlproc
looks
for resolving system-specific entities
By default, sgmlproc
looks in the main input
file's directory for files, unless a replacement
value or system identifier for a system-specific
entity has been supplied on the command line
via the -e
option
File names within the directory for system-specific
entities are resolved by interpreting the entity name
as file name, honoring the effective settings for
SYNTAX NAMECASE ENTITY
-e
ent
=
replacement text
Sets replacement text
as the value of the ent
system-specific entity
-e
ent
=<literal>
replacement text
Sets replacement text
as the value of the ent
system-specific entity
This is a variant for the aforementioned option using
the <literal>
formal system identifier notation syntax
to represent string literals
-e
ent
=<osfile>
file name
Sets file name
as the file to read the replacement
value from for the ent
system-specific entity
Note that in addition, the special system-specific entity
sgmlstdin
can be used to supply the content of the
<osfd>0
formal system identifier (in preference to
reading content from the standard input for <osfd>0
).
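The -e variants above might be combined as in the following sketch. The entity names title and body, and the function name render_with_entities, are hypothetical and chosen only for illustration. Because < and > are shell redirection characters, each -e argument is quoted so that the formal system identifier prefix reaches sgmlproc unchanged. Following the synopsis, the -e options are placed after the -- marker.

```shell
# Hypothetical helper (not part of sgmlproc): supply two system-specific
# entities, one as a string literal and one read from a file.
render_with_entities() {
  # $1: replacement text for the (assumed) "title" entity
  # $2: file holding the replacement text for the (assumed) "body" entity
  # $3: input SGML file
  sgmlproc -- -e "title=<literal>$1" -e "body=<osfile>$2" "$3"
}
```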
-v target_document_type_name=
DOCTYPE
Specifies the document type name to produce from the source document;
DOCTYPE
must be the name of a document type
definition declared in the source document
-v active_lpd_names=
LINKTYPE[,LINKTYPE,...]
Specifies one or more (comma-separated) link process name(s) to activate
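As a sketch of building the comma-separated value that active_lpd_names expects, the following shell function joins its trailing arguments with commas before invoking sgmlproc. The function name activate_links and the link process names used in the usage note are hypothetical; the document's own prolog determines which link processes actually exist.

```shell
# Hypothetical helper (not part of sgmlproc): activate the named link
# processes for a single input file.
activate_links() {
  # $1: input SGML file; remaining arguments: link process names
  input=$1
  shift
  # Join the remaining arguments with commas inside a subshell so the
  # caller's IFS is left untouched.
  names=$(IFS=,; printf '%s' "$*")
  sgmlproc -v "active_lpd_names=$names" "$input"
}
```

For instance, `activate_links mydoc.sgm toc footnotes` would pass `active_lpd_names=toc,footnotes`, assuming mydoc.sgm declares link processes with those names.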
-v system_specific_implied_lpd_names=
LINKTYPE[,LINKTYPE,...]
Specifies a single name or a comma-separated list of names of additional link process(es) treated as if declared as system-specific LPDs following actual link process declarations in the document prolog
When giving a name of a link process actually declared in
the document prolog, the respective link process name
parameter value is ignored (a link process declaration in
the document prolog is always used as effective link process
declaration in preference to one specified via
system_specific_implied_lpd_names
)
-v system_specific_implied_lpd_source_document_type_names=
DOCTYPE
[
DOCTYPE,...
]
system_specific_implied_lpd_source_document_type_names
and
system_specific_implied_lpd_result_document_type_names
can
contain (comma- or space-separated) names of the source and
result document types, respectively, of the link processes specified
at the respective position in system_specific_implied_lpd_names
(where all but the last link process must contain
names of explicit link processes)
These parameters are only used internally in nested sgmlproc
invocations for propagating source link processing context and
state to sub processes, and are not supported (nor required)
on basic sgmlproc
execution where templates are only executed
in the last, or only, link process of a link process pipeline
Only available for sgmljs.net SGML Pro
-v system_specific_implied_lpd_result_document_type_names=
DOCTYPE
[
DOCTYPE, ...
]
See above
-v enable_lax_templates=
VAL
A non-empty, non-zero value allows a template document
to declare a document type with an external declaration
set (with the value of the expected_external_dtd_subset_identifier
option as system identifier)
By default, a template document is required to receive
markup declarations from its calling context by specifying
<!DOCTYPE ... SYSTEM>
as base document type, and/or as
a target document type
-v expected_external_dtd_subset_identifier=
sysid
|#IMPLIED
Specifies the system identifier of an external DTD subset that is expected for the main document when "lax" templating is permitted
#IMPLIED
indicates that the base DTD is expected to
be <!DOCTYPE ... SYSTEM>
(where the doctype
can be #IMPLIED
or specified), or that the prolog may
be omitted altogether
-v disable_referential_attributes=
VAL
A non-empty, non-zero value causes attributes with
declared value ID
, IDREF
, IDREFS
, ENTITY
, ENTITIES
,
or NOTATION
, or attributes with #CURRENT
default value
to be rejected as a recoverable error in content, irrespective
of whether declared in the applicable document type definition
This option is used internally to enforce referential integrity when processing "strict" templates in recursive subcontext invocations
-v disable_data_entity_references=
VAL
A non-empty, non-zero value causes parsing to produce a recoverable error on data entity references in content
This option is used internally to enforce referential integrity when processing "strict" templates in recursive subcontext invocations
-v sax_event_tracing=
VAL
Specifying sax_event_tracing
with any value, including an empty
value, causes sgmlproc
to print info about the declaration
set from which the element originates (either a document type
name for parsed elements, or a link process name for produced
result element)
The info is printed in SGML comments in regular output, next to the produced element
Not available in all sgmlproc
builds
-v sax_error_context_info_collection=
VAL
Specifying sax_error_context_info_collection
with any value,
including an empty value, causes sgmlproc
to print
the context location (system identifier of document and line
number) of not only the document where an error occurs,
but also of the document(s) and place(s) where the erroneous
document is included as entity in the running processing
context
sax_error_context_info_collection
is normally switched off
to avoid processing overhead
Not available in all sgmlproc
builds
-v disable_path_relativization=
VAL
Specifying disable_path_relativization
with any value,
including an empty value, causes sgmlproc
to print
file names in error messages as absolute rather than
relative paths
Used to produce location-independent error message output
in internal sgmlproc
testing
-v strict_markdown_pl_compatibility=YES
|NO
Specifying strict_markdown_pl_compatibility=YES
switches on emulation of Markdown_1.0.1.pl
(John Gruber's
original markdown formatter) in producing HTML from markdown
Specifically, two newlines (but not more) at the end of a
code block are collapsed into a single newline (whereas with
strict_markdown_pl_compatibility=NO
, any number
of trailing newlines at the end of a code block
is collapsed into a single newline)
Moreover, three newlines are produced from a blank code line
-v keep_trailing_codeblock_newlines
=VAL
A non-empty, non-zero value causes parsing to
reproduce blank lines and newline characters at
the end of codeblocks as parsed from source
(unless strict_markdown_pl_compatibility
is
set to YES
)
-v prune_singleton_html_paras_in_listitems=YES
|NO
A value of YES
causes sgmlproc
to remove HTML p
elements (making their content appear directly as child
content of the parent li
or dd
element), if that p
element is the sole child of the parent element
p
elements specified in markdown HTML blocks are not
pruned
These options are switched on by default for processing
SGML on a web server or browser using sgmlweb to
prevent markup injection and denial-of-service attacks,
but aren't switched on for sgmlproc
command-line SGML
processing.
-v restrict_parameter_entity_expansion=YES
|NO
A value of YES
causes sgmlproc
to abort
an attempt to perform parameter entity expansion in
entity declarations outside replacement text literals with
an unrecoverable error condition, except if the value expands
to (the expansion of) 'SYSTEM "%PATH_TRANSLATED"'
or
"SYSTEM '%PATH_TRANSLATED'"
-v disable_referential_attributes=
VAL
See description of disable_referential_attributes
above
-v disable_data_entity_references=
VAL
See description of disable_data_entity_references
above
These options set or override effective SGML declaration properties.
-v sgmldecl_syntax_namecase_general=YES
|NO
Sets the effective value of the SYNTAX NAMECASE GENERAL
property
-v sgmldecl_syntax_namecase_entity=YES
|NO
Sets the effective value of the SYNTAX NAMECASE ENTITY
property
-v sgmldecl_features_minimize_omittag=YES
|NO
Sets the effective value of the FEATURES MINIMIZE OMITTAG
property
-v sgmldecl_features_minimize_rank=YES
|NO
Sets the effective value of the FEATURES MINIMIZE RANK
property
-v sgmldecl_features_minimize_implydef_doctype=YES
|NO
Sets the effective value of the FEATURES MINIMIZE IMPLYDEF DOCTYPE
property
-v sgmldecl_features_minimize_implydef_element=YES
|NO
Sets the effective value of the FEATURES MINIMIZE IMPLYDEF ELEMENT
to either YES
or NO
-v sgmldecl_features_minimize_implydef_element_anyother=YES
|NO
If specified as YES
, and specified in addition to
-v sgmldecl_features_minimize_implydef_element=YES
,
this sets the effective value of the
FEATURES MINIMIZE IMPLYDEF ELEMENT
property to ANYOTHER
FEATURES MINIMIZE IMPLYDEF ELEMENT ANYOTHER
is the default
used by sgmlproc
-v sgmldecl_features_minimize_implydef_attlist=YES
|NO
Sets the effective value of the FEATURES MINIMIZE IMPLYDEF ATTLIST
property
-v sgmldecl_features_minimize_implydef_entity=YES
|NO
Sets the effective value of the FEATURES MINIMIZE IMPLYDEF ENTITY
property
-v sgmldecl_features_minimize_emptynrm=YES
|NO
Sets the effective value of the FEATURES MINIMIZE EMPTYNRM
property
-v sgmldecl_features_minimize_shorttag_attrib_omitname=YES
|NO
Sets the effective value of the FEATURES MINIMIZE SHORTTAG ATTRIB OMITNAME
property
-v sgmldecl_features_minimize_shorttag_starttag_empty=YES
|NO
Sets the effective value of the FEATURES MINIMIZE SHORTTAG STARTTAG EMPTY
property
-v sgmldecl_features_minimize_shorttag_starttag_netenabl=IMMEDNET
Sets the effective value of the FEATURES MINIMIZE SHORTTAG STARTTAG NETENABL
property to the IMMEDNET
value used in WebSGML (the Annex K revision
to ISO 8879:1986) for supporting XML-style empty elements
-v sgmldecl_features_minimize_shorttag_endtag_empty=YES
|NO
Sets the effective value of the FEATURES MINIMIZE SHORTTAG ENDTAG EMPTY
property
-v sgmldecl_features_other_validity=TYPE
|NOASSERT
Sets the effective value of the FEATURES OTHER VALIDITY
property
-v sgmldecl_features_other_formal=YES
|NO
Sets the effective value of the FEATURES OTHER FORMAL
property
-v sgmldecl_features_other_urn=YES
|NO
Sets the effective value of the FEATURES OTHER URN
property
Only meaningful if -v sgmldecl_features_other_formal=YES
is also specified
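Several of these overrides may be given in one invocation by repeating -v. The sketch below, with the hypothetical function name parse_strict_names, switches off general name case substitution and tag omission for one run. The choice of these two properties is an assumption made for illustration only.

```shell
# Hypothetical helper (not part of sgmlproc): override two SGML declaration
# properties for a single run.
parse_strict_names() {
  # $1: input SGML file
  # NAMECASE GENERAL NO: general names are not case-folded.
  # OMITTAG NO: tag omission is not permitted.
  sgmlproc -v sgmldecl_syntax_namecase_general=NO \
           -v sgmldecl_features_minimize_omittag=NO "$1"
}
```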
sgmlproc
leaves an exit status of 0 on successful
completion, a value other than 0 otherwise.
sgmlproc
prints error and warning messages
with references to the file and line number of error
locations and details to the standard error stream.
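The exit status and the standard error stream can be used together in scripts. The following sketch, with the hypothetical function name check_sgml, validates a file without producing output (output_format=none), aborts on the first error (treat_recoverable_as_fatal_errors=YES), and stores diagnostics in a log file chosen by the caller. The message wording is illustrative, not something sgmlproc prints.

```shell
# Hypothetical helper (not part of sgmlproc): validate a document and
# report the outcome based on the exit status.
check_sgml() {
  # $1: input SGML file; $2: file that receives error and warning messages
  if sgmlproc -v output_format=none \
              -v treat_recoverable_as_fatal_errors=YES "$1" 2>"$2"; then
    echo "ok: $1"
  else
    # In the else branch, $? still holds the exit status of sgmlproc.
    status=$?
    echo "failed: $1 (exit status $status, diagnostics in $2)"
    return "$status"
  fi
}
```

The function returns the non-zero status of sgmlproc unchanged, so it can be used directly in conditionals such as `check_sgml mydoc.sgm errors.log || exit 1`.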
Note that the portable sgmlproc
program implemented in
the awk
programming language may in some builds
silently ignore misspelled options. This is an awk
limitation (like the required use of the --
end of
arguments marker described below).
To create canonical markup from mydoc.sgm
:
sgmlproc mydoc.sgm
To create XML markup (with end-element tags for
elements declared EMPTY
such as HTML's img
element)
from mydoc.sgm
:
sgmlproc -v output_format=xml mydoc.sgm
To activate a link process pipeline for creating HTML markup
(mydoc.sgm
is expected to declare one or more link process
declaration sets with html
result markup in its document
prolog), or just normalize input markup if mydoc.sgm
already
uses html
as base document type:
sgmlproc -v target_document_type_name=html mydoc.sgm
To produce HTML as described above, using the
text some text
as replacement text for the
myent
system-specific entity:
sgmlproc -v target_document_type_name=html -- -e myent='some text' mydoc.sgm
Note that the --
end-of-arguments marker is required only for compatibility
with the portable sgmlproc
program implemented in the awk
programming
language. It isn't required, but is recognized and tolerated, by
other sgmlproc
implementations such as the ECMAScript implementation
for Node.js.
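Since the marker is tolerated by all implementations, a script intended to run against any sgmlproc build could simply always include it. The following sketch, with the hypothetical function name convert_to_file, places -v before the marker and -o after it, following the order given in the synopsis.

```shell
# Hypothetical helper (not part of sgmlproc): write XML output to a file,
# always passing the "--" marker so the call also works with the awk build.
convert_to_file() {
  # $1: input SGML file; $2: output file
  sgmlproc -v output_format=xml -- -o "$2" "$1"
}
```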