Container and parsing/manipulation routines for properties representing SGML/WebSGML declaration options.

This module consists of the following parts:

  • transcription of the ISO 8879 SGML declaration grammar with Annex K overrides and additions into a regular grammar, using C preprocessor and POSIX regexp syntax for expansion into a regexp constant (sgmldecl-regexpes.js)

  • predicates and functions for syntax-checking a string containing a SGML declaration against the above grammar, and for matching/classifying against HTML/XML/SGML standard declaration syntax subgrammars

  • member fields representing (significant) SGML declaration properties; only those declaration properties are represented which can be meaningfully changed; SGML properties that can't be changed are hard-coded in (variants of) grammar rules

Note that initialize_from_arguments() unconditionally overrides any preexisting properties, whereas initialize_defaults(), initialize_for_xml10() and initialize_from_decl() (sgmljs.net Pro only) do so only if the respective property hasn't been set before.
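
The difference between the two initialization styles can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Sgmldecl implementation: the class DeclSettingsSketch, its method names and the default values passed in are all hypothetical.

```javascript
// Hypothetical stand-in for illustration only; not the actual Sgmldecl class.
class DeclSettingsSketch {
  // Mirrors the documented behaviour of initialize_defaults():
  // a property is filled in only if it has not been set before.
  initializeDefaults(defaults) {
    for (const [name, value] of Object.entries(defaults)) {
      if (this[name] === undefined) this[name] = value;
    }
  }

  // Mirrors the documented behaviour of initialize_from_arguments():
  // supplied values always replace existing ones.
  initializeFromArguments(args) {
    for (const [name, value] of Object.entries(args)) {
      this[name] = value;
    }
  }
}

const decl = new DeclSettingsSketch();
decl.initializeFromArguments({ features_minimize_omittag: "NO" });
decl.initializeDefaults({
  features_minimize_omittag: "YES", // ignored: already set above
  features_minimize_rank: "NO",     // applied: not set before (value is made up)
});
console.log(decl.features_minimize_omittag, decl.features_minimize_rank);
```

Calling the two methods in the opposite order would let the explicitly supplied arguments replace the defaults, which is why the order of initialization calls matters.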

Constructor

new Sgmldecl(errorhandler, locator)

Parameters

Name Type Description
errorhandler Errorhandler
locator Locator

Name Description
added_requirement_public_ids Contains zero or more public identifiers representing "added requirements" as per ISO 8879 Annex K.
declared_shortref_delimiters Contains the reference concrete syntax short reference delimiters as a hash set without mapping to anything meaningful, unless configured for XML.
features_minimize_emptynrm WebSGML EMPTYNRM option allowing EMPTY elements to have end-tags.
features_minimize_implydef_attlist Contains WebSGML's option value related to whether attributes can be implied.
features_minimize_implydef_doctype Contains WebSGML's option value related to whether doctypes can be implied.
features_minimize_implydef_element Contains WebSGML's option value related to whether elements can be implied.
features_minimize_implydef_element_anyother Contains WebSGML's option value related to whether implied elements can be directly nested.
features_minimize_implydef_entity Contains WebSGML's option value related to whether references to undeclared entities can be implied as system-specific entities.
features_minimize_omittag Contains whether tag omission support is enabled.
features_minimize_rank Contains whether the RANK feature is enabled.
features_minimize_shorttag_attrib_default Contains whether attributes with default values may be omitted (YES/NO).
features_minimize_shorttag_attrib_omitname Contains whether attributes may be omitted if the value token is a unique token value among the token lists (enumerations) of all declared attributes (YES/NO).
features_minimize_shorttag_attrib_values Contains whether attribute values must be placed in (single or double) quotes (NO) or quotes around attribute values may be omitted (YES) (for certain attribute types).
features_minimize_shorttag_endtag_empty WebSGML ENDTAG EMPTY option controlling admissibility of empty end-tags.
features_minimize_shorttag_starttag_empty WebSGML STARTTAG EMPTY option controlling admissibility of empty start-tags.
features_minimize_shorttag_starttag_netenabl WebSGML NETENABL option controlling which element instances may have null-end tags.
features_other_formal Contains the option value to SGML's FEATURES OTHER FORMAL clause to control that public identifiers must be FPIs (or URNs if features_other_urn is set).
features_other_keeprsre Contains the option value to the WebSGML FEATURES OTHER KEEPRSRE clause to control that trailing and leading whitespace shouldn't be ignored in mixed content.
features_other_urn Contains the option value to WebSGML's FEATURES OTHER URN clause to control that public identifiers must be URNs (or FPIs if features_other_formal is set).
features_other_validity Contains the option value to the WebSGML FEATURES VALIDITY clause.
predefined_entity_replacement_text Maps predefined character entities to numerical character references.
public_declaration_reference Contains a WebSGML declaration reference public identifier, if any, supplied instead of a SGML declaration body.
syntax_namecase_entity Contains whether names of entities should be converted to uppercase (if YES).
syntax_namecase_general Contains whether names of elements and attributes should be converted to uppercase (if YES).

Name Description
initialize_defaults Populates settings from defaults unless already set.
initialize_for_html Sets sgmldecl features suitable for HTML, including HTML5.
initialize_for_markdown Sets sgmldecl features suitable for when the markdown public declaration reference has been used in an SGML declaration.
initialize_for_markdown_fragment Sets sgmldecl features suitable for when markdown is implied by a file name suffix.
initialize_for_xml10 Sets values for sgmldecl_ vars suited for XML 1.0 processing.
initialize_from_arguments Populates settings from same-named configuration properties in supplied map.
initialize_from_decl_or_decl_reference Initializes SGML decl from a (possibly multiline) string containing either a full standalone SGML declaration or a declaration reference.
initialize_predefined_entities_for_html Initializes predefined_entity_replacement_text for HTML.
initialize_predefined_entities_for_xml Initializes predefined_entity_replacement_text for XML.
is_html_sgmldecl_publicid Returns whether the supplied string represents the SGML declaration public identifier for use as HTML declaration reference by sgmljs.net system.
is_markdown_sgmldecl_publicid Returns whether the supplied string represents the markdown syntax public identifier for use as SGML declaration reference.
is_supported_xml_decl Returns whether the supplied string contains (just) a supported XML declaration.
is_xml_sgmldecl_publicid Returns whether the supplied string represents the XML 1.0 declaration syntax public identifier for use as SGML declaration reference.
save_to_arguments Populates a map with configuration properties representing declaration settings (the inverse of initialize_from_arguments()).

Member Details

added_requirement_public_ids :Object.<string, string>

Contains zero or more public identifiers representing "added requirements" as per ISO 8879 Annex K.

The values don't map to anything meaningful; the object is just used as a hash set.

In particular, this may contain the string

ISO 8879/NOTATION Extensible Markup Language (XML) 1.0//EN

(as per Annex L) to control that entity references should end with REFC (semicolon).

Note: "added_requirements" alone can't be used as the member name, because it would get macro-expanded by sgmldecl-regexpes.
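
As a rough sketch of the hash-set usage described above, a plain object can record public identifiers as keys and ignore the values. The helper names below are hypothetical and the stored value (an empty string) is an assumption; the identifier string is copied from the text above.

```javascript
// Illustrative only: a plain object used as a hash set of public identifiers.
const XML_ADDED_REQUIREMENT =
  "ISO 8879/NOTATION Extensible Markup Language (XML) 1.0//EN";

// Hypothetical helper: record an identifier; the value carries no meaning.
function addRequirement(set, publicId) {
  set[publicId] = "";
}

// Hypothetical helper: membership test that ignores inherited properties.
function hasRequirement(set, publicId) {
  return Object.prototype.hasOwnProperty.call(set, publicId);
}

const addedRequirementPublicIds = {};
addRequirement(addedRequirementPublicIds, XML_ADDED_REQUIREMENT);

// With the XML requirement present, entity references would be expected
// to end with REFC (semicolon), as described above.
const refcRequired = hasRequirement(addedRequirementPublicIds, XML_ADDED_REQUIREMENT);
console.log(refcRequired);
```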

declared_shortref_delimiters :Object.<string, undefined>

Contains the reference concrete syntax short reference delimiters as a hash set without mapping to anything meaningful, unless configured for XML.

features_minimize_emptynrm :string

WebSGML EMPTYNRM option allowing EMPTY elements to have end-tags.

YES : allows elements with declared content EMPTY to have end-tags (end tags are controlled by end-tag omission rules)

NO : disallows this (which is SGML's standard behaviour, but not ours)

Note that, in addition, VALIDITY TAG-TYPE makes EMPTY elements require end-tags (which is wanted for XML but not HTML).

Moreover, EMPTYNRM YES makes SGML accept end-tags for elements which are implied-EMPTY by having a value for a CONREF attribute specified.

features_minimize_implydef_attlist :string

Contains WebSGML's option value related to whether attributes can be implied.

Note this also (for lack of a better option value in WebSGML's extended SGML declaration) enables that a given element type name may occur in more than one attribute list declaration.

features_minimize_implydef_doctype :string

Contains WebSGML's option value related to whether doctypes can be implied.

features_minimize_implydef_element :string

Contains WebSGML's option value related to whether elements can be implied.

Possible values are YES or NO.

If SGMLDECL_FEATURES_MINIMIZE_IMPLYDEF_ELEMENT_ANYOTHER_SUPPORT has been enabled at build time: if IMPLYDEF ELEMENT ANYOTHER has been specified in the SGML declaration or otherwise, this must have the value YES and features_minimize_implydef_element_anyother must also have the value YES.

features_minimize_implydef_element_anyother :string

Contains WebSGML's option value related to whether implied elements can be directly nested.

If this doesn't contain YES (the default), then elements with implied element declarations can be directly nested; if this contains YES (and features_minimize_implydef_element also contains YES), then, upon encountering a start-element tag for an undeclared element, an end-element tag will be implied if an instance of the same element is open at the top of the output stack.
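
The rule above can be sketched as a small function operating on a stack of open element names. This is a simplified illustration, assuming a plain array as the output stack and made-up event objects; the function name and event shape are hypothetical and do not reflect the parser's internals.

```javascript
// Hypothetical sketch: returns the events produced for a start-tag of an
// undeclared element. `openElements` is the stack of open element names,
// with the top of the stack at the end of the array.
function startUndeclaredElement(openElements, name, settings) {
  const events = [];
  const implyEnd =
    settings.features_minimize_implydef_element === "YES" &&
    settings.features_minimize_implydef_element_anyother === "YES" &&
    openElements[openElements.length - 1] === name;
  if (implyEnd) {
    // The same element is open at the top of the stack: imply its end-tag.
    openElements.pop();
    events.push({ type: "end", name, implied: true });
  }
  openElements.push(name);
  events.push({ type: "start", name });
  return events;
}

const stack = ["doc", "item"];
const events = startUndeclaredElement(stack, "item", {
  features_minimize_implydef_element: "YES",
  features_minimize_implydef_element_anyother: "YES",
});
console.log(events.map((e) => e.type), stack);
```

Without ANYOTHER YES, the same call would simply push a second "item" onto the stack, i.e. the elements would be nested directly.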

features_minimize_implydef_entity :string

Contains WebSGML's option value related to whether references to undeclared entities can be implied as system-specific entities.

features_minimize_omittag :string

Contains whether tag omission support is enabled.

Note this also controls whether tag omission flags are expected in an element declaration.

features_minimize_rank :string

Contains whether the RANK feature is enabled.

features_minimize_shorttag_attrib_default :string

Contains whether attributes with default values may be omitted (YES/NO).

features_minimize_shorttag_attrib_omitname :string

Contains whether attributes may be omitted if the value token is a unique token value among the token lists (enumerations) of all declared attributes (YES/NO).

This also controls whether tokens in attlist declarations specifying token lists (enumerations) have to be unique to an element ("YES", as in SGML without the WebSGML adaptations), or not ("NO").
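
To illustrate why token uniqueness matters for name omission, the following sketch resolves a bare value token to the single attribute whose enumeration contains it. The attribute names, token lists and the function name are made up for the example and are not taken from the library.

```javascript
// Hypothetical enumerated attributes declared for a single element.
const declaredTokens = {
  align: ["left", "right", "center"],
  valign: ["top", "bottom"],
};

// Returns the attribute name owning `token`, or null if the token is
// unknown or appears in more than one token list (ambiguous).
function resolveOmittedName(enumerations, token) {
  const owners = Object.keys(enumerations).filter((attr) =>
    enumerations[attr].includes(token)
  );
  return owners.length === 1 ? owners[0] : null;
}

console.log(resolveOmittedName(declaredTokens, "top"));
console.log(resolveOmittedName(declaredTokens, "middle"));
```

If the same token were declared for two attributes of one element, the lookup would be ambiguous, which is the situation the uniqueness requirement rules out.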

features_minimize_shorttag_attrib_values :string

Contains whether attribute values must be placed in (single or double) quotes (NO) or quotes around attribute values may be omitted (YES) (for certain attribute types).

If this is NO, then features_minimize_shorttag_attrib_omitname must also be NO.
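
A consistency check for this constraint might look like the sketch below. The function name and the error message are hypothetical; treating an unset omitname value as acceptable is an assumption, since the text above does not say how an unset value is handled.

```javascript
// Hypothetical validation sketch for the constraint described above.
function checkShorttagAttribOptions(settings) {
  const errors = [];
  if (
    settings.features_minimize_shorttag_attrib_values === "NO" &&
    settings.features_minimize_shorttag_attrib_omitname === "YES"
  ) {
    // Assumption: only an explicit YES is flagged; unset values pass.
    errors.push("ATTRIB OMITNAME must be NO when ATTRIB VALUES is NO");
  }
  return errors;
}

console.log(
  checkShorttagAttribOptions({
    features_minimize_shorttag_attrib_values: "NO",
    features_minimize_shorttag_attrib_omitname: "YES",
  })
);
```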

features_minimize_shorttag_endtag_empty :string

WebSGML ENDTAG EMPTY option controlling admissibility of empty end-tags.

YES : allows end-element tags to be empty (eg. </>)

NO : disallows end-element tags to be empty

features_minimize_shorttag_starttag_empty :string

WebSGML STARTTAG EMPTY option controlling admissibility of empty start-tags.

YES : allows start-element tags to be empty (eg. <>)

NO : disallows start-element tags to be empty

features_minimize_shorttag_starttag_netenabl :string

WebSGML NETENABL option controlling which element instances may have null-end tags.

ALL : allows null end-tags on all elements (unsupported)

IMMEDNET : allows null end-tags on elements without content

NO : disallows any null end-tags (unsupported)

This and the features_minimize_emptynrm option, as well as the settings for the NESTC and NET function characters, are WebSGML adaptations for XML; with respect to XML's empty elements, the NESTC and NET delimiter settings and the NETENABL feature work in concert as follows:

  • NESTC / and NET > make SGML generate the expected parse events for XML empty elements like <bla/>: NESTC ends the start-element tag which SGML sees as <bla/ and enables the null-end delimiter >; > then ends the element
  • technically, it would then be possible to have text such as <bla/further stuff>, but NETENABL IMMEDNET allows null end-tags only for elements with empty content (basically, synthesizing XML parsing rules)

Currently, only IMMEDNET or an unset value is supported; the SGML null end-tag feature is always switched on, and supported only in combination with the net-enabling start-tag close feature as explained above for XML. Moreover, NET and NESTC cannot be redefined, but will always have the hard-coded values stated above.
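
The interplay of NESTC, NET and IMMEDNET described above can be illustrated with a deliberately simplified recognizer. It is only a sketch: it handles a bare tag name without attributes, hard-codes NESTC as / and NET as >, and its function name and result shape are hypothetical rather than part of the parser.

```javascript
// Simplified sketch: recognizes a net-enabling start-tag such as "<bla/"
// followed by content up to the NET delimiter ">".
function classifyNetStartTag(text) {
  const match = /^<([A-Za-z][A-Za-z0-9._-]*)\/([^>]*)>/.exec(text);
  if (match === null) return null; // not a net-enabling start-tag
  const [, name, content] = match;
  return {
    name,
    content,
    // IMMEDNET: the NET must follow the NESTC immediately (empty content).
    allowedByImmednet: content === "",
  };
}

console.log(classifyNetStartTag("<bla/>"));
console.log(classifyNetStartTag("<bla/further stuff>"));
```

The first input corresponds to an XML-style empty element; the second is the case that IMMEDNET excludes.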

features_other_formal :string

Contains the option value to SGML's FEATURES OTHER FORMAL clause to control that public identifiers must be FPIs (or URNs if features_other_urn is set).

features_other_keeprsre :string

Contains the option value to the WebSGML FEATURES OTHER KEEPRSRE clause to control that trailing and leading whitespace shouldn't be ignored in mixed content.

Only NO (or no value at all) is supported.

features_other_urn :string

Contains the option value to WebSGML's FEATURES OTHER URN clause to control that public identifiers must be URNs (or FPIs if features_other_formal is set).

features_other_validity :string

Contains the option value to the WebSGML FEATURES VALIDITY clause.

TYPE : (the default for SGML) performs full SGML DTD checking

NOASSERT : (the default for XML) doesn't type-check elements and requires MINIMIZE OMITTAG NO

When LEGACY_VALIDITY_SEMANTICS is enabled:

TAG : means that only wellformedness is checked/required; this checks that the instance is fully tagged and that empty elements have an end-tag (suited for XML); a DOCTYPE isn't required

TAG-TYPE : performs full SGML DTD checking and ensures that EMPTY elements have an end-tag (suited for XML with XML DTDs) (can be set on command line to have validation etc. turned on for backward compat with most test cases; see also EMPTYNRM)

Note: this appears syntactically as FEATURES OTHER VALIDITY in the SGML decl

Note: Annex K (official WebSGML std) defines validation differently from the above description: Annex K has only the NOASSERT and TYPE options here and performs validation according to IMPLYDEF options: in particular,

  • the document element of an instance may be different from the DOCTYPE (ie. if the doctype is implied)
  • WebSGML will always perform validation if a declaration is present; otherwise it will behave according to the IMPLYDEF options

With our implementation right now, the TAG option suppresses any type checking, even if an appropriate declaration is present.

WebSGML's values map into TYPE, TAG, or TAG-TYPE according to the following rules:

  • if VALIDITY is explicitly set to TYPE, this is taken as-is; otherwise
  • if MINIMIZE OMITTAG is set to YES, this forces TYPE; otherwise
  • if IMPLYDEF ELEMENT is set to YES, this forces TAG; otherwise
  • TAG-TYPE is set
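
The mapping rules listed above amount to a short decision chain, which can be sketched as follows. The function and parameter names are hypothetical; only the order of the checks and the resulting values are taken from the list.

```javascript
// Hypothetical sketch of the mapping from WebSGML settings to the
// validity values used here, following the rules in the order listed.
function mapWebsgmlValidity({ validity, omittag, implydefElement }) {
  if (validity === "TYPE") return "TYPE";     // explicit TYPE is taken as-is
  if (omittag === "YES") return "TYPE";       // MINIMIZE OMITTAG YES forces TYPE
  if (implydefElement === "YES") return "TAG"; // IMPLYDEF ELEMENT YES forces TAG
  return "TAG-TYPE";                          // otherwise
}

console.log(
  mapWebsgmlValidity({ validity: "NOASSERT", omittag: "NO", implydefElement: "YES" })
);
```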

initialize_defaults()

Populates settings from defaults unless already set.

Default values represent those for HTML parsing.

initialize_for_html()

Sets sgmldecl features suitable for HTML, including HTML5.

This just assumes SGML defaults plus MINIMIZE IMPLYDEF ANYOTHER YES, and enables predefined HTML entities.

initialize_for_markdown()

Sets sgmldecl features suitable for when the markdown public declaration reference has been used in an SGML declaration.

The instance being parsed is assumed to use a doctype for HTML or can rely on FEATURES MINIMIZE IMPLYDEF ELEMENT YES so that FEATURES OTHER VALIDITY TYPE makes sense.

This performs just the same as initialize_for_html().

initialize_for_markdown_fragment()

Sets sgmldecl features suitable for when markdown is implied by a file name suffix.

This performs just the same as initialize_for_markdown().

initialize_for_xml10()

Sets values for sgmldecl_ vars suited for XML 1.0 processing.

initialize_from_arguments(args)

Populates settings from same-named configuration properties in supplied map.

Parameters

Name Type Description
args Object.<string, string>

the map of configuration properties to populate from

initialize_from_decl_or_decl_reference(decl)

Initializes SGML decl from a (possibly multiline) string containing either a full standalone SGML declaration or a declaration reference.

Can be used (just) for declaration reference parsing when the SGMLDECL_PARSING_SUPPORT macro isn't set at build-time.

Parameters

Name Type Description
decl string

string gleaned from input stream and looking like a SGML declaration or declaration reference

initialize_predefined_entities_for_html()

Initializes predefined_entity_replacement_text for HTML.

Note that, for space reasons, this does not predefine the large set of MathML entities that made it into HTML5, but just the much smaller set of HTML4 entities.

initialize_predefined_entities_for_xml()

Initializes predefined_entity_replacement_text for XML.

is_html_sgmldecl_publicid(s): boolean

Returns whether the supplied string represents the SGML declaration public identifier for use as HTML declaration reference by sgmljs.net system.

Parameters

Name Type Description
s string

argument string

Returns

boolean

is_markdown_sgmldecl_publicid(s): boolean

Returns whether the supplied string represents the markdown syntax public identifier for use as SGML declaration reference.

Parameters

Name Type Description
s string

argument string

Returns

boolean

is_supported_xml_decl(decl): boolean

Returns whether the supplied string contains (just) a supported XML declaration.

Parameters

Name Type Description
decl string

argument string

Returns

boolean

is_xml_sgmldecl_publicid(s): boolean

Returns whether the supplied string represents the XML 1.0 declaration syntax public identifier for use as SGML declaration reference.

Parameters

Name Type Description
s string

argument string

Returns

boolean

predefined_entity_replacement_text :Object.<string, string>

Maps predefined character entities to numerical character references.
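
As an illustration of the shape of such a map, the sketch below associates a few entity names with numeric character references. The selection of entries and the lookup helper are assumptions made for the example; the exact replacement text stored by the library is not shown here.

```javascript
// Illustrative subset only: entity names mapped to numeric character
// references (code points: nbsp = 160, copy = 169, eacute = 233).
const predefinedEntityReplacementText = {
  nbsp: "&#160;",
  copy: "&#169;",
  eacute: "&#233;",
};

// Hypothetical lookup helper; returns undefined for unknown entity names.
function replacementFor(map, entityName) {
  return Object.prototype.hasOwnProperty.call(map, entityName)
    ? map[entityName]
    : undefined;
}

console.log(replacementFor(predefinedEntityReplacementText, "copy"));
```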

public_declaration_reference :string

Contains a WebSGML declaration reference public identifier, if any, supplied instead of a SGML declaration body.

This is currently used for assuring that the markdown shortref delimiters are usable, and thus that markdown processing is possible. In addition, an instance needs to have a DOCTYPE which includes markdown shortref maps via a parameter entity reference.

save_to_arguments(args)

Populates a map with configuration properties representing declaration settings (the inverse of initialize_from_arguments()).

Parameters

Name Type Description
args Object.<string, string>

the map to populate

syntax_namecase_entity :string

Contains whether names of entities should be converted to uppercase (if YES).

syntax_namecase_general :string

Contains whether names of elements and attributes should be converted to uppercase (if YES).
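
The effect of the two NAMECASE settings can be sketched with a small normalization helper. The function name is hypothetical, and the sketch assumes that any value other than YES leaves names unchanged.

```javascript
// Hypothetical sketch: converts a name to uppercase only if the
// corresponding NAMECASE setting is YES.
function normalizeName(name, namecase) {
  return namecase === "YES" ? name.toUpperCase() : name;
}

const settings = { syntax_namecase_general: "YES", syntax_namecase_entity: "NO" };

// Element and attribute names follow syntax_namecase_general ...
console.log(normalizeName("div", settings.syntax_namecase_general));
// ... whereas entity names follow syntax_namecase_entity.
console.log(normalizeName("nbsp", settings.syntax_namecase_entity));
```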