Container holding document prolog metadata and associated parsing/maintenance routines.

During prolog parsing, Tokenizer invokes parse_element_decl(), parse_attlist_decl() and parse_linkset_decl() to build up document metadata as the respective constructs are encountered in declaration sets.

During content parsing, metadata is accessed from Tokenizer and Validator via direct access to metadata maps. For automaton-related metadata, Validator performs state transitions via is_optional(), is_terminal(), get_marked_symbol(), get_all_marked_symbols(), and states_symbols().

For validating sequences of start- and end-element tags, and for inferring omitted tags, this also contains a straightforward implementation of the Glushkov automaton construction following "Regular Expressions into Finite Automata" by A. Brueggemann-Klein/D. Wood.

The construction is parser-driven: reduce() computes sets of positions and transition functions associated with nodes/subexpressions of the input regexp parse tree. Also as a result of reduce(), tokens in the input regex get marked by prepending the symbol position to the symbol name (for example, (a,b) yields positions 1a and 2b).

Sets are constructed as follows:

  • nullable[d, m, e]: if non-empty, indicates that expr/pos e is optional
  • firstpos[d, m, e]: set (list) of starting positions of expr/pos e
  • lastpos[d, m, e]: set (list) of ending positions of expr/pos e
  • allpos[d, m, e]: the set (list) of all positions in e
  • follow[d, m, e, p]: the set (list) of positions of e that can immediately follow position p
  • followset[d, m, e]: the set of p's such that follow[d, m, e, p] exists (this is only used during construction to guard against inadvertently creating entries in follow[])

where

  • d is the context doctype,
  • m is the name of the element for which the top-level expression represents the content model,
  • e is a canonicalized (fully braced) expression, and
  • e is stored as an expression in E in canonical (fully braced) form because reduce() relies on canonical form to keep the results of already reduced expressions out of further precedence parsing (shift/reduce).
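
The set definitions above can be illustrated with a minimal sketch of the Glushkov position sets. It is not the reduce() implementation described here: it walks a hypothetical, already-built AST (the node shapes `sym`, `op`, `left`, `right`, `arg` and the function name `glushkov` are assumptions made for this example) instead of working during precedence parsing, and it ignores the d and m key components.

```javascript
// Hypothetical AST node shapes (not the representation used by reduce()):
//   { sym: "a" }                          a symbol occurrence
//   { op: ",", left, right }              sequence
//   { op: "|", left, right }              alternation
//   { op: "*" | "+" | "?", arg }          occurrence indicators
function glushkov(ast) {
  const follow = new Map(); // marked position -> Set of positions that may follow it
  let counter = 0;

  function visit(node) {
    if (node.sym !== undefined) {
      // Mark the symbol by prepending its position, e.g. "1a".
      const pos = `${++counter}${node.sym}`;
      follow.set(pos, new Set());
      return { nullable: false, first: [pos], last: [pos] };
    }
    if (node.op === "," || node.op === "|") {
      const l = visit(node.left);
      const r = visit(node.right);
      if (node.op === "|") {
        return {
          nullable: l.nullable || r.nullable,
          first: [...l.first, ...r.first],
          last: [...l.last, ...r.last],
        };
      }
      // Sequence: every last position of the left part can be followed
      // by every first position of the right part.
      for (const p of l.last) for (const q of r.first) follow.get(p).add(q);
      return {
        nullable: l.nullable && r.nullable,
        first: l.nullable ? [...l.first, ...r.first] : l.first,
        last: r.nullable ? [...l.last, ...r.last] : r.last,
      };
    }
    // Occurrence indicators "?", "*" and "+".
    const a = visit(node.arg);
    if (node.op === "*" || node.op === "+") {
      // Repetition: the last positions loop back to the first positions.
      for (const p of a.last) for (const q of a.first) follow.get(p).add(q);
    }
    return {
      nullable: node.op === "+" ? a.nullable : true,
      first: a.first,
      last: a.last,
    };
  }

  return { ...visit(ast), follow };
}

// Content model (a, b*): positions are marked 1a and 2b.
const model = { op: ",", left: { sym: "a" }, right: { op: "*", arg: { sym: "b" } } };
const result = glushkov(model);
```

For this model, `result.first` holds the single position 1a, `result.last` holds 1a and 2b, and 2b may follow both 1a and itself.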

Note parse_attlist_decl() (as mentioned above) is refactored into Tokenizer since it, like parse_entity_decl(), needs to be able to parse WebSGML data specifications with data attributes, re-using declaration-driven attribute parsing shared with content parsing.

Constructor

new Markupdefinitions(sgmldecl, errorhandler, locator, resolver)

Parameters

Name Type Description
sgmldecl Sgmldecl
errorhandler Errorhandler
locator Locator
resolver SystemSpecificEntityResolver

Name Description
attribute_data_attribute_values Contains data specification attribute values keyed by declaration set name, element name, attribute name, and data attribute name.
attribute_data_attributes Contains a space-separated list of data specification attribute names, keyed by declaration set name, element name, and attribute name.
attribute_declared_data_notations Contains normalized representations of WebSGML's attribute data specification syntax using NOTATIONS.
attribute_declared_values Contains normalized representations of attribute type declarations.
attribute_default_semantics Contains normalized representations of reserved names associated with declared attribute default values.
attribute_default_values Contains values extracted from attribute default value specifications.
data_attribute_data_attribute_values Contains data specification attribute values keyed by declaration set name, notation name, attribute name, and data attribute name.
data_attribute_data_attributes Contains a space-separated list of data specification attribute names, keyed by declaration set name, notation name, and attribute name.
data_attribute_declared_data_notations Contains normalized representations of WebSGML's attribute data specification syntax using NOTATIONs for data attributes.
data_attribute_declared_values Like attribute_declared_values, but for data attributes.
data_attribute_default_semantics Like attribute_default_semantics, but for data attributes.
data_attribute_default_values Like attribute_default_values, but for data attributes.
data_attributes Contains, for a given (doctype, element) combination, the list of associated data attributes (attributes of notations).
dsdl9_has_prefix_bindings Whether dsdl9_ns_to_prefix_bindings isn't empty.
dsdl9_ns_to_prefix_bindings Maps a (doctype, namespace URI) combination to a canonical namespace prefix, as declared in a DSDL-9 bind-ns-to-prefix processing instruction.
element_attributes Contains, for a given (doctype, element) combination, the list of associated element attributes.
element_conref_attributes Contains, for a given (doctype, element) combination, the single attribute name with CONREF declared values, if any.
element_content_models Maps content models within a declaration set/doctype_name to the element name having the respective content model as the first element in document order.
element_declarations Maps (doctype_name, element_name) tuples to (partially) normalized representations of element content declarations.
element_end_tag_omittable Contains a non-empty value for any (doctype, element) having end-tag omission allowed.
element_exclusions Contains, for a given (doctype, element) combination, the list of associated exclusion elements, if any.
element_id_attributes Contains, for a given (doctype, element) combination, the single attribute name with ID declared values, if any.
element_inclusions Contains, for a given (doctype, element) combination, the list of associated inclusion elements, if any.
element_modelgroups Contains, for a given (doctype, element) combination, the automaton identification string returned by invoking prepare_modelgroup() on the element's content model, if any.
element_rank_groups Maps (doctype name, element name including suffix) tuples to a rank group.
element_start_tag_omittable Contains a non-empty value for any (doctype, element) having start-tag omission allowed.
element_stem_rank_groups Maps (doctype name, element name stem) to the rank group.
external_subset_pubids Contains the public identifier of the respective declaration set, if any, mapped by declaration set name.
external_subset_sysids Contains the external system identifier of the respective declaration set, if any, mapped by declaration set name.
link_attribute_specifications Maps tuples of (link process name, link set name (or `#INITIAL`/`#IDLINK`), element name, link rule number) to the attribute specification string, if any, preparsed from the link rule.
link_attributes Maps tuples of (link process name, link set name (or `#INITIAL`/`#IDLINK`), element name, link rule number, attribute name) to the normalized attribute value specified for the respective link process, link set, etc.
link_elements Contains the list of elements mapped/captured by a given link set keyed by tuples (link_process_name, link set name or `#INITIAL`/`#IDLINK`).
link_idlink_elements Contains the list of elements mapped by an IDLINK link rule in a respective link declaration set.
link_ids Contains the list of IDs mapped (covered/declared) by the singleton IDLINK link rule on a link process name, if any, indexed by namecase-normalized link process name.
link_postlink_targets Maps tuples of (link process name, link set name (or `#INITIAL`/`#IDLINK`), element name, link rule number) to the link set name specified as #POSTLINK parameter on the associated link rule, if any.
link_process_result_doctype_specs Contains normalized result doctype specs of an LPD, indexed by namecase-normalized link type name.
link_process_source_doctype_specs Contains normalized source doctype specs of an LPD, indexed by namecase-normalized link type name.
link_result_attribute_specifications Maps tuples of (link process name, link set name (or `#INITIAL`/`#IDLINK`), (source) element name, link rule number) to the attribute specification string (analogous to link_attribute_specifications) for a result element as parsed from an explicit link rule (on explicit link processes/sets only).
link_result_elements Maps tuples of (link process name, link set name (or `#INITIAL`/`#IDLINK`), (source) element name, link rule number) to a result element, as parsed from an explicit link rule (on explicit link processes/sets only).
link_rules Maps tuples of (link process name, link set name (or `#INITIAL`/`#IDLINK`), element name) to the number of link rules contained in the respective link set for the respective element name.
link_sets Contains a space-separated list of names of link sets (or just the singleton `#INITIAL`) declared in a link type declaration, indexed by namecase-normalized link process name, including #IDLINK link sets.
link_uselink_targets Maps tuples of (link process name, link set name (or `#INITIAL`/`#IDLINK`), element name, link rule number) to the link set name specified as #USELINK parameter on the associated link rule, if any.
markdown_enabled Flag indicating that markdown shortrefs were included via a parameter entity.
notation_names Maps a doctype to a space-separated string of (or a single) notation names declared in the respective declaration set.
notation_public_identifiers Maps, for a given (doctype, notation) combination, the public identifier, if any.
notation_system_identifiers Maps, for a given (doctype, notation) combination, the system identifier, if any.
shortref_map_delimiter_literal_numbers Maps a (declaration set name, short reference map name) combination to the number of mappings/literals used in the respective map.
shortref_map_delimiter_literals Maps a (doctype-name, map-name, literal-no) combination to the replacement entity name for the respective literal in the shortref map "map-name" of the respective declaration set name (doctype name).
shortref_map_delimiter_regexpes Maps a (doctype-name, map-name, literal-no) combination to the regexp used for matching an occurrence of the delimiter literal having literal-no in the shortref map "map-name" of the respective declaration set name (doctype name).
shortref_map_replacement_entities Maps a (doctype-name, map-name, literal-no) combination to the replacement entity name mapped for the delimiter literal having literal-no in the shortref map "map-name" of the respective declaration set name (doctype name).
shortref_maps Maps a declaration set name (must be a doctype) to the list of names of shortref maps declared in that declaration set, if any.
shortref_uses Maps a (doctype-name, element-name) combination to a shortref map name.
storage_manager_notation_names Maps a doctype to a space-separated string of (or a single) notation names declared as storage manager notation in a FSI PI in the respective declaration set.

Name Description
enable_markdown Sets the markdown_enabled flag.
get_all_marked_symbols Returns all the positions/states to which the supplied state can be transitioned.
get_ambigous_token Returns an ambiguous token in automaton e or the empty string if automaton e is deterministic.
get_marked_symbol Returns the position/state through which a transition over a specified symbol will be made, starting from the supplied state.
get_state_by_symbol Returns a position from a state list corresponding to a symbol.
is_optional Returns whether the supplied recognizer accepts an empty content token sequence.
is_pcdata_only_state Returns whether the supplied state list is a singleton list containing only a state with a #PCDATA symbol.
is_terminal Returns whether the single supplied state is a terminal state in the recognizer identified by the supplied parameters.
parse_element_decl Processes an element declaration from a string containing a markup declaration and calls store_element_decl() with the extracted details for each declared element.
parse_linkset_decl Processes a link set within a link process declaration.
parse_modelgroup Returns a fully parsed and "binary" parenthesized ("canonical") content model.
parse_shortref_map_decl Parses and processes a short reference map declaration within a document type declaration.
parse_shortref_use_decl Parses and processes a short reference use declaration within a document type declaration.
prepare_states Creates a recognizer for the argument model group expression and returns a token identifying the generated recognizer(s) for subsequent reference in get_marked_symbol()/get_all_marked_symbols() and is_terminal()/is_optional().
reset Resets internal state including parsed schema info, modelgroup tables, stacks.
state_symbol Returns the symbol of the single supplied state.
states_symbols Returns the set of elements in the supplied state/position set.
store_data_attribute_decl Performs the equivalent of store_element_attribute_decl for data attributes.
store_element_attribute_decl Subroutine used to store attribute list declaration details.
toJSON Custom JSON serialization exposing just markup metadata.

Member Details

attribute_data_attribute_values :Object.<string, string>

Contains data specification attribute values keyed by declaration set name, element name, attribute name, and data attribute name.

For a declaration

<!ATTLIST elmt a DATA ISO8879 [format="timeonly"]>

attribute_data_attribute_values maps the 4-tuple (declaration_set_name, elmt, a, format) to timeonly.

attribute_data_attributes :Object.<string, string>

Contains a space-separated list of data specification attribute names, keyed by declaration set name, element name, and attribute name.

Guards and allows enumeration over attribute_data_attribute_values.

For a declaration

<!ATTLIST elmt a DATA htmlforminputvalue [type="email" pattern=".*@xyz.com"]>

attribute_data_attributes maps (declaration_set_name, elmt, a) to the string type pattern.
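
The relationship between the name list and the value map can be sketched as follows. This is an illustration only: how the tuple components are actually combined into a map key is not documented here, so the `key()` helper and its separator, as well as the function names `storeDataAttributes` and `dataAttributeValues`, are assumptions.

```javascript
// Hypothetical key encoding; the separator actually used by the maps is an assumption.
const key = (...parts) => parts.join("\u0001");

const attribute_data_attributes = {};
const attribute_data_attribute_values = {};

// Roughly what storing
//   <!ATTLIST elmt a DATA htmlforminputvalue [type="email" pattern=".*@xyz.com"]>
// could look like.
function storeDataAttributes(dsn, elmt, attr, values) {
  attribute_data_attributes[key(dsn, elmt, attr)] = Object.keys(values).join(" ");
  for (const [name, value] of Object.entries(values)) {
    attribute_data_attribute_values[key(dsn, elmt, attr, name)] = value;
  }
}

// The name list guards the lookup and drives enumeration of the value map.
function dataAttributeValues(dsn, elmt, attr) {
  const names = attribute_data_attributes[key(dsn, elmt, attr)];
  if (!names) return {};
  const out = {};
  for (const name of names.split(" ")) {
    out[name] = attribute_data_attribute_values[key(dsn, elmt, attr, name)];
  }
  return out;
}

storeDataAttributes("doc", "elmt", "a", { type: "email", pattern: ".*@xyz.com" });
const looked = dataAttributeValues("doc", "elmt", "a");
```

Checking the name list first means the 4-tuple map is never probed for combinations that were not declared.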

attribute_declared_data_notations :Object.<string, string>

Contains normalized representations of WebSGML's attribute data specification syntax using NOTATIONS.

For a declaration

<!ATTLIST elmt a DATA ISO8879 [format="timeonly"]>

attribute_declared_data_notations maps (declaration_set_name, elmt, a) to the string ISO8879.

See also attribute_data_attributes.

attribute_declared_values :Object.<string, string>

Contains normalized representations of attribute type declarations.

An entry maps to one of the following strings:

  • a string starting with "(" representing a value group directly usable as a regexp against the actual attribute value
  • CDATA
  • ENTITY
  • ENTITIES
  • ID
  • IDREF
  • IDREFS
  • NAME
  • NAMES
  • NMTOKEN
  • NMTOKENS
  • NOTATION (n1|n2|...)
  • NUMBER
  • NUMBERS
  • NUTOKEN
  • NUTOKENS
  • DATA (for a data notation stored in attribute_declared_data_notations).

attribute_default_semantics

Contains normalized representations of reserved names associated with declared attribute default values.

An entry maps to one of the following strings:

  • #REQUIRED
  • #IMPLIED
  • #CURRENT
  • #CONREF
  • #FIXED

attribute_default_values :Object.<string, string>

Contains values extracted from attribute default value specifications.

If attribute_default_semantics is #FIXED for the respective attribute, an entry in attribute_default_values contains the fixed value; otherwise (no entry in attribute_default_semantics), the value contained is the default value.
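
The interplay of the two maps can be sketched with a small helper. It assumes the two entries for an attribute have already been looked up; the function name `describeDefault` and the shape of its result are hypothetical and not part of this class.

```javascript
// semantics: entry from attribute_default_semantics (or undefined if there is none)
// value:     entry from attribute_default_values (or undefined if there is none)
function describeDefault(semantics, value) {
  if (semantics === "#FIXED") return { kind: "fixed", value };
  // Other reserved names (#REQUIRED, #IMPLIED, #CURRENT, #CONREF) carry no value here.
  if (semantics) return { kind: semantics.slice(1).toLowerCase() };
  // No reserved name: the stored value is an ordinary default value.
  if (value !== undefined) return { kind: "default", value };
  return { kind: "none" };
}

const fixed = describeDefault("#FIXED", "1.0");
const plain = describeDefault(undefined, "left");
const required = describeDefault("#REQUIRED", undefined);
```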

data_attribute_data_attribute_values :Object.<string, string>

Contains data specification attribute values keyed by declaration set name, notation name, attribute name, and data attribute name.

(like attribute_data_attribute_values, but with a notation name in place of an element name)

data_attribute_data_attributes :Object.<string, string>

Contains a space-separated list of data specification attribute names, keyed by declaration set name, notation name, and attribute name.

(like attribute_data_attributes, but with a notation name in place of an element name)

data_attribute_declared_data_notations :Object.<string, string>

Contains normalized representations of WebSGML's attribute data specification syntax using NOTATIONs for data attributes.

(like attribute_declared_data_notations, but with a notation name in place of an element name)

data_attribute_declared_values :Object.<string, string>

Like attribute_declared_values, but for data attributes.

data_attribute_default_semantics :Object.<string, string>

Like attribute_default_semantics, but for data attributes.

data_attribute_default_values :Object.<string, string>

Like attribute_default_values, but for data attributes.

data_attributes :Object.<string, string>

Contains, for a given (doctype, element) combination, the list of associated data attributes (attributes of notations).

Guards key lookups into the per-data-attribute maps.

dsdl9_has_prefix_bindings :number

Whether dsdl9_ns_to_prefix_bindings isn't empty.

dsdl9_ns_to_prefix_bindings :Object.<string, string>

Maps a (doctype, namespace URI) combination to a canonical namespace prefix, as declared in a DSDL-9 bind-ns-to-prefix processing instruction.

The mappings are interpreted such that any declared element having the prefix (as a string prefix, followed by a colon, followed by a non-colonized name) as part of the element name in the element declaration, when specified in content with an XML namespace-like colon syntax, is treated as if it were specified using the canonical prefix, where the canonical prefix is resolved using XML-like xmlns-prefix attributes.

For example, given the declarations

    <?DSDL-9 bind-ns-to-prefix
             namespace-iri="http://www.w3.org/1999/xhtml"
             prefix="html"?>
    <!ELEMENT html:a ...>

the content fragment

    <a xmlns:x="http://www.w3.org/1999/xhtml">

is treated as if

    <html:a ...>

had been specified.

element_attributes :Object.<string, string>

Contains, for a given (doctype, element) combination, the list of associated element attributes.

Guards key lookups into the per-attribute maps.

element_conref_attributes :Object.<string, string>

Contains, for a given (doctype, element) combination, the single attribute name with CONREF declared values, if any.

Used for quick lookup from e.g. Linkhandler.

element_content_models :Object.<string, string>

Maps content models within a declaration set/doctype_name to the element name having the respective content model as the first element in document order.

This is a partial reverse map of element_declarations for quick lookup of an element name by a content model and used for automaton caching.

Only contains elements with content models (i.e. not EMPTY, CDATA, RCDATA, ANY).

element_declarations :Object.<string, string>

Maps (doctype_name, element_name) tuples to (partially) normalized representations of element content declarations.

The mapped-to element declaration can be either of the following:

  • the fixed string EMPTY
  • the fixed string CDATA
  • the fixed string RCDATA
  • the fixed string ANY
  • a model group starting with a ( character

In the latter case, an entry contains the content model string as specified in the source DTD, without leading or trailing whitespace, but with other whitespace preserved, and with parameter entities expanded.

element_end_tag_omittable :Object.<string, string>

Contains a non-empty value for any (doctype, element) having end-tag omission allowed.

element_exclusions :Object.<string, string>

Contains, for a given (doctype, element) combination, the list of associated exclusion elements, if any.

element_id_attributes :Object.<string, string>

Contains, for a given (doctype, element) combination, the single attribute name with ID declared values, if any.

Used for quick lookup from eg. Linkhandler.

element_inclusions :Object.<string, string>

Contains, for a given (doctype, element) combination, the list of associated inclusion elements, if any.

element_modelgroups :Object.<string, string>

Contains, for a given (doctype, element) combination, the automaton identification string returned by invoking prepare_modelgroup() on the element's content model, if any.

Populated for the respective element if its element_declarations entry starts with "(".

element_rank_groups :Object.<string, string>

Maps (doctype name, element name including suffix) tuples to a rank group.

The rank group is the space-separated list of elements appearing in the name group of the ranked element declaration.

element_start_tag_omittable :Object.<string, string>

Contains a non-empty value for any (doctype, element) having start-tag omission allowed.

element_stem_rank_groups :Object.<string, string>

Maps (doctype name, element name stem) to the rank group.

Like element_rank_groups, but without rank suffix.

enable_markdown()

Sets the markdown_enabled flag.

external_subset_pubids :Object.<string, string>

Contains the public identifier of the respective declaration set, if any, mapped by declaration set name.

external_subset_sysids :Object.<string, string>

Contains the external system identifier of the respective declaration set, if any, mapped by declaration set name.

Contains the string #IMPLIED if a system-specific external subset (as in <!DOCTYPE ... SYSTEM>) was specified on the declaration set.

get_all_marked_symbols()

Returns all the positions/states to which the supplied state can be transitioned.

get_ambigous_token()

Returns an ambiguous token in automaton e or the empty string if automaton e is deterministic.

Note that if there is more than one ambiguous token, it is undefined which of them will be returned.

Formally, this checks that no a_x, a_y E { (p, a) | a E follow[d, m, e, p] } exists such that chi(a_x) == chi(a_y) and phi(a_x) != phi(a_y).

Only accessed from unit tests?
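
One reading of the determinism condition above can be sketched as follows: within any single follow set (with firstpos treated as the follow set of the initial state), no two distinct positions may carry the same symbol. This is an interpretation, not the class's code; `chi`, `findAmbiguousToken` and the array-of-arrays input are assumptions, and `chi` assumes that symbol names never start with a digit.

```javascript
// chi strips the numeric position mark from a marked symbol: "2b" -> "b".
const chi = (pos) => pos.replace(/^\d+/, "");

// followSets: array of arrays of marked positions, one array per state,
// including firstpos as the follow set of the initial state.
function findAmbiguousToken(followSets) {
  for (const set of followSets) {
    const seen = new Map(); // symbol -> first position seen carrying it
    for (const pos of set) {
      const sym = chi(pos);
      if (seen.has(sym) && seen.get(sym) !== pos) return sym;
      seen.set(sym, pos);
    }
  }
  return "";
}

// ((a,b)|(a,c)) marks positions 1a 2b 3a 4c; firstpos is [1a, 3a].
const ambiguous = findAmbiguousToken([["1a", "3a"], ["2b"], ["4c"]]);
// (a,(b|c)) marks positions 1a 2b 3c; firstpos is [1a].
const deterministic = findAmbiguousToken([["1a"], ["2b", "3c"]]);
```

In the first model the token a can start either branch, so a is reported; the second model yields the empty string.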

get_marked_symbol()

Returns the position/state through which a transition over a specified symbol will be made, starting from the supplied state. If the supplied state/position is the empty string, the initial states of the specified modelgroup automaton will be transitioned over.
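
A transition of this kind can be sketched over plain tables. The `automaton` object layout, the function names and the convention of returning the empty string when no transition exists are assumptions for illustration; the real method is keyed by doctype, model and expression, and its signature is not shown in this documentation.

```javascript
// Hand-written tables for the content model (a, b*) with positions 1a and 2b.
const automaton = {
  nullable: false,
  firstpos: ["1a"],
  lastpos: ["1a", "2b"],
  follow: { "1a": ["2b"], "2b": ["2b"] },
};

// Returns the position reached from `state` over `symbol`, or "" if there is none.
// The empty string as `state` stands for the initial state.
function nextPosition(a, state, symbol) {
  const candidates = state === "" ? a.firstpos : a.follow[state] || [];
  return candidates.find((p) => p.replace(/^\d+/, "") === symbol) || "";
}

// Runs a whole token sequence through the tables.
function accepts(a, symbols) {
  let state = "";
  for (const s of symbols) {
    state = nextPosition(a, state, s);
    if (state === "") return false; // no transition over s
  }
  return state === "" ? a.nullable : a.lastpos.includes(state);
}

const ok = accepts(automaton, ["a", "b", "b"]);
const bad = accepts(automaton, ["b"]);
```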

get_state_by_symbol()

Returns a position from a state list corresponding to a symbol.

That is, returns the single state/position p in states, if any, such that chi(p) = x.

is_optional()

Returns whether the supplied recognizer accepts an empty content token sequence.

is_pcdata_only_state()

Returns whether the supplied state list is a singleton list containing only a state with a #PCDATA symbol.

This is equivalent to this.states_symbols(states) == "#PCDATA", but faster as it avoids list construction.

is_terminal()

Returns whether the single supplied state is a terminal state in the recognizer identified by the supplied parameters.

link_attribute_specifications

Maps tuples of (link process name, link set name (or #INITIAL/#IDLINK), element name, link rule number) to the attribute specification string, if any, preparsed from the link rule.

An empty entry indicates that the respective rule has an empty attribute specification. SGML/LPD validation rules ensure that a link rule without an attribute specification is the only link rule for the respective element in the link set, i.e. the link_rules value for it is 1.

For IDLINK, the mapped attribute specification also includes an attribute value specification for matching an ID (which is always interpreted as a "filtering" attribute); for example, <idlink i e [ a = x]> in link process p, when i is declared as ID-bearing attribute for element e, is represented as the entry (p, #IDLINK, e, 1) -> id=i a=x here.

link_attributes

Maps tuples of (link process name, link set name (or #INITIAL/#IDLINK), element name, link rule number, attribute name) to the normalized attribute value specified for the respective link process, link set, etc.

Entries can be accessed by first enumerating over declared link attributes via element_attributes.

Note this is populated by Tokenizer based on entries in link_attribute_specifications as preparsed by parse_linkset_decl() here.

link_elements

Contains the list of elements mapped/captured by a given link set keyed by tuples (link_process_name, link set name or #INITIAL/#IDLINK).

Guards and allows enumeration over link_rules.

link_idlink_elements

Contains the list of elements mapped by an IDLINK link rule in a respective link declaration set.

link_ids

Contains the list of IDs mapped (covered/declared) by the singleton IDLINK link rule on a link process name, if any, indexed by namecase-normalized link process name.

link_postlink_targets

Maps tuples of (link process name, link set name (or #INITIAL/#IDLINK), element name, link rule number) to the link set name specified as #POSTLINK parameter on the associated link rule, if any.

link_process_result_doctype_specs

Contains normalized result doctype specs of an LPD, indexed by namecase-normalized link type name.

The mapped-to value is either #IMPLIED or a namecase-normalized doctype name.

link_process_source_doctype_specs

Contains normalized source doctype specs of an LPD, indexed by namecase-normalized link type name.

The mapped-to value is either #SIMPLE or a namecase-normalized doctype name.

link_result_attribute_specifications

Maps tuples of (link process name, link set name (or #INITIAL/#IDLINK), (source) element name, link rule number) to the attribute specification string (analogous to link_attribute_specifications) for a result element as parsed from an explicit link rule (on explicit link processes/sets only).

link_result_elements

Maps tuples of (link process name, link set name (or #INITIAL/#IDLINK), (source) element name, link rule number) to a result element, as parsed from an explicit link rule (on explicit link processes/sets only).

link_rules

Maps tuples of (link process name, link set name (or #INITIAL/#IDLINK), element name) to the number of link rules contained in the respective link set for the respective element name.

link_sets

Contains a space-separated list of names of link sets (or just the singleton #INITIAL) declared in a link type declaration, indexed by namecase-normalized link process name, including #IDLINK link sets.

link_uselink_targets

Maps tuples of (link process name, link set name (or #INITIAL/#IDLINK), element name, link rule number) to the link set name specified as #USELINK parameter on the associated link rule, if any.

markdown_enabled :string

Flag indicating that markdown shortrefs were included via a parameter entity.

Set by Tokenizer during doctype processing to switch on markdown processing.

notation_names :Object.<string, string>

Maps a doctype to a space-separated string of (or a single) notation names declared in the respective declaration set.

Used to enumerate notations of a declaration set.

notation_public_identifiers :Object.<string, string>

Maps, for a given (doctype, notation) combination, the public identifier, if any.

Any declared notation has an entry in either, or both, of these notation_public_identifiers or notation_system_identifiers maps.

notation_system_identifiers :Object.<string, string>

Maps, for a given (doctype, notation) combination, the system identifier, if any.

Any declared notation has an entry in either, or both, of these notation_public_identifiers or notation_system_identifiers maps.

parse_element_decl(declaration_set_name, decl)

Processes an element declaration from a string containing a markup declaration and calls store_element_decl() with the extracted details for each declared element.

Parameters

Name Type Description
declaration_set_name string

name of the document type in which the element declaration occurs

decl string

declaration text to parse as element declaration

parse_linkset_decl(declaration_set_name, decl)

Processes a link set within a link process declaration.

The supplied declaration text is expected to take the form <!LINK ...> or <!IDLINK ...>.

Parameters

Name Type Description
declaration_set_name string

name of the link process in which the link set occurs

decl string

declaration text to parse as link set declaration

parse_modelgroup(doctype, model, s)

Returns a fully parsed and "binary" parenthesized ("canonical") content model.

This is an implementation of an operator precedence parser as discussed in "Compilers: Principles, Techniques, and Tools" by Aho et al.

Stack items are strings (either parsed/reassembled expressions or primitive tokens). Note that nothing significant is passed on the parse stack; reductions during parsing are executed in left-to-right, bottom-up, postorder fashion, so construction of automaton tables is performed as a side effect of calling reduce().

Parameters

Name Type Description
doctype string

declaration set name in which the element/model is declared

model string

name of associated element for the model

s string

content model expression
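
What a fully braced, binary form of a content model looks like can be sketched with a small stand-alone function. Note that this sketch uses plain recursive descent rather than the operator precedence parser described above, builds no automaton tables, and does not enforce that a group uses a single connector type. The name `canonicalize` and the choice to wrap occurrence indicators in their own parentheses are assumptions; the exact canonical form produced by parse_modelgroup() may differ.

```javascript
const CONNECTORS = new Set([",", "|", "&"]);
const OCCURRENCE = new Set(["?", "*", "+"]);

function canonicalize(model) {
  // Tokens: names (including #PCDATA), parentheses, connectors, occurrence indicators.
  const tokens = model.match(/#?[A-Za-z][\w.:-]*|[()|,&?*+]/g) || [];
  let i = 0;

  // Wraps the expression once per trailing occurrence indicator.
  function occurrence(expr) {
    while (OCCURRENCE.has(tokens[i])) expr = `(${expr}${tokens[i++]})`;
    return expr;
  }

  function item() {
    if (tokens[i] === "(") return occurrence(group());
    const name = tokens[i++];
    if (name === undefined || !/^#?[A-Za-z]/.test(name)) {
      throw new SyntaxError(`unexpected token: ${name}`);
    }
    return occurrence(name);
  }

  function group() {
    i++; // consume "("
    let expr = item();
    while (CONNECTORS.has(tokens[i])) {
      const connector = tokens[i++];
      expr = `(${expr}${connector}${item()})`; // left-associative binary grouping
    }
    if (tokens[i++] !== ")") throw new SyntaxError("missing closing parenthesis");
    return expr;
  }

  const result = item();
  if (i !== tokens.length) throw new SyntaxError("trailing input");
  return result;
}

const seq = canonicalize("(a,b,c)");
const rep = canonicalize("(a, b*)+");
```

Here `seq` is `((a,b),c)` and `rep` is `((a,(b*))+)`: every operator ends up with exactly one or two operands inside its own pair of parentheses.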

parse_shortref_map_decl(declaration_set_name, decl): string

Parses and processes a short reference map declaration within a document type declaration.

The supplied declaration text is expected to take the form <!SHORTREF ...>.

Parameters

Name Type Description
declaration_set_name string

name of the document type in which the short reference map declaration occurs

decl string

declaration text to parse as short reference map declaration

Returns

string

the empty string (indicating that no markup declaration should be recorded to the caller)

parse_shortref_use_decl(declaration_set_name, decl): string

Parses and processes a short reference use declaration within a document type declaration.

The supplied declaration text is expected to take the form <!USEMAP ...>.

Parameters

Name Type Description
declaration_set_name string

name of the document type in which the short reference use declaration occurs

decl string

declaration text to parse as short reference use declaration

Returns

string

the empty string (indicating that no markup declaration should be recorded to the caller)

prepare_states(dt, md): string

Creates a recognizer for the argument model group expression and returns a token identifying the generated recognizer(s) for subsequent reference in get_marked_symbol()/get_all_marked_symbols() and is_terminal()/is_optional().

Note that inclusions and exclusions, as well as tag omissions, have to be handled by the caller.

Parameters

Name Type Description
dt string

doctype the automaton is associated with (note that in this and all other functions doctype and model are merely identifiers for outside reference and aren't interpreted in any way; moreover, common subexpressions of content models in and across content models won't generate shared automata tables due to unique position marking requirements of the Glushkov construction)

md string

content model within dt identifying the automaton (this will usually be just the element name/generic identifier for which the modelgroup is the declared content, but there's no assumption about md being made other than uniqueness across calls)

Returns

string

a string that, together with dt and md, identifies the recognizer

reset()

Resets internal state including parsed schema info, modelgroup tables, stacks.

shortref_map_delimiter_literal_numbers :Object.<string, number>

Maps a (declaration set name, short reference map name) combination to the number of mappings/literals used in the respective map.

Used to identify and enumerate the literals of a short reference map.

The shortref_map_delimiter_literals map is maintained in such a way that longer literals have smaller indices than shorter ones, so that there's no ambiguity when a shorter literal is part of a longer literal for a given context.
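
The effect of this ordering can be sketched as follows. The helper names `orderLiterals` and `matchAt` are hypothetical, indices are 0-based here purely for convenience, and plain string matching stands in for the regexps actually kept per literal.

```javascript
// Orders delimiter literals so that longer ones get smaller indices.
// Array.prototype.sort is stable, so equal-length literals keep declaration order.
function orderLiterals(literals) {
  return [...literals].sort((a, b) => b.length - a.length);
}

// Returns the index of the first (and therefore longest) literal matching at offset, or -1.
function matchAt(text, offset, ordered) {
  for (let n = 0; n < ordered.length; n++) {
    if (text.startsWith(ordered[n], offset)) return n;
  }
  return -1;
}

const ordered = orderLiterals(["*", "**", "-"]);
const strong = matchAt("**bold**", 0, ordered);
const emphasis = matchAt("*em*", 0, ordered);
```

Because `**` is tried before `*`, the input `**bold**` is never misread as two consecutive single-asterisk delimiters.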

shortref_map_delimiter_literals :Object.<string, string>

Maps a (doctype-name, map-name, literal-no) combination to the replacement entity name for the respective literal in the shortref map "map-name" of the respective declaration set name (doctype name).

Used at declaration time.

shortref_map_delimiter_regexpes :Object.<string, string>

Maps a (doctype-name, map-name, literal-no) combination to the regexp used for matching an occurrence of the delimiter literal having literal-no in the shortref map "map-name" of the respective declaration set name (doctype name).

Used at declaration time.

shortref_map_replacement_entities :Object.<string, string>

Maps a (doctype-name, map-name, literal-no) combination to the replacement entity name mapped for the delimiter literal having literal-no in the shortref map "map-name" of the respective declaration set name (doctype name).

shortref_maps :Object.<string, string>

Maps a declaration set name (must be a doctype) to the list of names of shortref maps declared in that declaration set, if any.

shortref_uses :Object.<string, string>

Maps a (doctype-name, element-name) combination to a shortref map name. Will contain no entry for a given element-name if no short reference use was declared for the element, or if it was declared/cancelled by using the #EMPTY keyword in place of a map name in the effective (last in document order) declaration.

state_symbol()

Returns the symbol of the single supplied state.

states_symbols()

Returns the set of elements in the supplied state/position set.

Formally, returns the set { chi(p) | p E states }.
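
Assuming that positions are marked symbols such as 1a or 2#PCDATA and that states are passed as an array (the actual representation may be a different list format), the mapping chi and the two symbol helpers can be sketched like this. The function names are hypothetical stand-ins for states_symbols() and is_pcdata_only_state().

```javascript
// chi maps a marked position back to its symbol: "3a" -> "a".
// Assumes symbol names never start with a digit.
const chi = (pos) => pos.replace(/^\d+/, "");

// The set { chi(p) | p in states }, in first-seen order.
function statesSymbols(states) {
  return [...new Set(states.map(chi))];
}

// Singleton check that avoids building the symbol list first.
function isPcdataOnly(states) {
  return states.length === 1 && chi(states[0]) === "#PCDATA";
}

const symbols = statesSymbols(["1a", "2b", "3a"]);
const pcdataOnly = isPcdataOnly(["1#PCDATA"]);
```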

storage_manager_notation_names :Object.<string, string>

Maps a doctype to a space-separated string of (or a single) notation names declared as storage manager notation in a FSI PI in the respective declaration set.

Used to perform membership testing.

store_data_attribute_decl(declaration_set_name, notationname, attrname, declaredval, semantics, val, data_notation_name, data_notation_attribute_values)

Performs the equivalent of store_element_attribute_decl for data attributes.

Parameters

Name Type Description
declaration_set_name String

declaration set name

notationname String

notation name

attrname String

attribute name

declaredval String

the declared value (such as CDATA, etc.) of the attribute

semantics String

the semantics (such as #FIXED, #CURRENT, etc.) of the attribute

val String

the default (or fixed) value for the attribute, if any

data_notation_name String

for a WebSGML attribute with data specification, the data notation name

data_notation_attribute_values Object.<String, String>=

for a WebSGML attribute with data specification, the optional data notation data attributes as a map

store_element_attribute_decl(declaration_set_name, elmtname, attrname, declaredval, semantics, val, data_notation_name, data_notation_attribute_values)

Subroutine used to store attribute list declaration details.

Invoked by parse_attlist_decl().

This enters the attribute parameters into the attribute_declared_values, attribute_declared_data_notations, attribute_data_attribute_values, attribute_default_semantics, and attribute_default_values maps.

Parameters

Name Type Description
declaration_set_name String

declaration set name

elmtname String

element name

attrname String

attribute name

declaredval String

the declared value (such as CDATA, etc.) of the attribute

semantics String

the semantics (such as #FIXED, #CURRENT, etc.) of the attribute

val String

the default (or fixed) value for the attribute, if any

data_notation_name String

for a WebSGML attribute with data specification, the data notation name

data_notation_attribute_values Object.<String, String>=

for a WebSGML attribute with data specification, the optional data notation data attributes as a map

toJSON()

Custom JSON serialization exposing just markup metadata.

Can be used to serialize a parsed schema into JSON for caching/packaging it along with e.g. sgml-ua.

node.js-only as it won't be minified-away in browser target.
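
The general mechanism behind a custom toJSON() can be shown with a stand-in class. `MetadataHolder`, its fields and the key format are hypothetical and only mirror the idea of exposing metadata maps while leaving transient parser state out; JSON.stringify() calling an object's toJSON() method is standard JavaScript behaviour.

```javascript
// Hypothetical stand-in; not the actual Markupdefinitions class.
class MetadataHolder {
  constructor() {
    this.element_declarations = { "doc p": "(#PCDATA)" }; // assumed key format
    this.parse_stack = ["transient"]; // internal state that should not be serialized
  }

  // JSON.stringify() serializes the return value of toJSON() instead of the object itself.
  toJSON() {
    return { element_declarations: this.element_declarations };
  }
}

const json = JSON.stringify(new MetadataHolder());
const restored = JSON.parse(json);
```

The resulting string can be written to a cache file and parsed back later without repeating prolog parsing, provided the consumer knows how to reinstall the maps.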