Tokenizer - API Reference

Constructor

new Tokenizer(sgmldecl, encoder, errorhandler, locator, resolver, entitydefinitions, markupdefinitions, attributechecker, docinfo, outputstack, contenthandler, lexhandler, validationcontenthandler, validationlexhandler, dtdhandler, recordmanager, saxeventmanager, prologhandler)

Parameters

Name	Type	Description
sgmldecl	Sgmldecl
encoder	Markupencoder
errorhandler	Errorhandler
locator	Locator
resolver	SystemSpecificEntityResolver
entitydefinitions	Entitydefinitions
markupdefinitions	Markupdefinitions
attributechecker	AttributeChecker
docinfo	Docinfo
outputstack	Object
contenthandler	DocumentHandler
lexhandler	LexicalHandler	LexicalHandler to send startDtd()/endDtd() events to
validationcontenthandler	DocumentHandler
validationlexhandler	LexicalHandler
dtdhandler	DTDHandler	DTDHandler to send notations and declarations of unparsed entities (from declaration sets) to
recordmanager	Recordmanager
saxeventmanager	Saxeventmanager
prologhandler	Prologhandler

Property Index

Name	Description
active_lpd_names	Contains optional linktype names to activate.
bundledfunctions	Generated/prebuilt module containing functions configured at build time.
expected_external_dtd_subset_identifier	Contains a (system) identifier that the DTD is expected to reference.
ignore_clear_unresolved_entity_name	Flag indicating that calls to clear_unresolved_entity_name() (issued from markdown) should be ignored.
no_stalling_at_end_of_markup	Flag to indicate that `parse_markup()` should proceed, rather than postpone and reparse, on characters at the end of `markup_buffer`, and that a fatal error should be generated if incomplete markup is detected.
no_stalling_on_unresolved_entity	Public flag indicating that `parse_markup()` should proceed, rather than return if it is stalled on a link entity reference.
running_as_template_subprocessing_context	Flag to indicate that processing is performed on a template (in a template subprocessing context).
system_specific_implied_lpd_names	Contains the (comma-separated or single) name(s) of forced additional LPDs, treated as if declared as system-specific LPDs following the LPDs actually present in the document prolog, unless an LPD with that name is explicitly present in the actual prolog.
system_specific_implied_lpd_result_document_type_names	Contains the (comma-separated) names of result doctypes of implied LPDs, if any, such that a name represents the result doctype of the implied explicit link process at the corresponding position in system_specific_implied_lpd_names.
system_specific_implied_lpd_source_document_type_names	Contains the (comma-separated) names of source doctypes of implied LPDs, if any, such that a name represents the source doctype of the implied explicit link process at the corresponding position in system_specific_implied_lpd_names.

Method Index

Name	Description
append_markup	Appends text to markup_buf and calls parse_markup() to process markup_buf.
clear_unresolved_entity_name	Resets the unresolved entity name which causes `parse_markup()` to stall, if any.
configure	Configuration method.
derive_storage_manager_notation_metadata	Copies attribute declarations from super-storage manager notations to derived storage manager notations, traversing through the chain of superdcn-values of the argument storage manager notation name.
end_markup	Used after call(s) to `append_markup()` to indicate that no more text will be fed via `append_markup()`.
get_unresolved_entity_name	Returns the unresolved entity name which causes `parse_markup()` to stall, if any.
is_data_specification_attribute	Returns whether supplied named attribute is a DATA specification attribute of elementtype.
parse_attlist_decl	Parses an attribute declaration from a string containing a markup declaration and calls store_element_attribute_decl()/store_data_attribute_decl() with the extracted details for each declared element/attribute combination.
parse_notation_decl	Parses and stores a declaration from a string containing a notation declaration.
remove_linefeed_at_end_of_markup	Helper function to remove last character from markup_buf if it is a newline character.
reset	Resets internal state.
set_debug_emit_ctx_token	Sets the string printed as part of debugEmit messages.
set_unresolved_entity_name	Sets an arbitrary entity name as unresolved, which will cause the next call to `parse_markup()` to stall.
switchoff_stalling_at_end_of_markup	Sets no_stalling_at_end_of_markup.
switchoff_stalling_on_unresolved_entity	Sets no_stalling_on_unresolved_entity.

Member Details

active_lpd_names :string

Contains optional linktype names to activate.

If active_lpd_names is set to a string containing one or more (space- or comma-separated) link type names referring to simple or implicit links, then those links are activated (at most one implicit link is activated, though).

If target_document_type_name is also set, then the LPDs needed to result in that doctype name are activated as described above, and the simple links and a single implicit link named in active_lpd_names are activated on the ultimate output of the explicit link chain.

active_lpd_names may also contain explicit link names, in which case explicit link definitions with matching names are used as part of the link chain.

It's not an error if link processes named by tokens in active_lpd_names aren't actually declared. If a link process declared in the prolog matches a name in active_lpd_names, however, it is always activated, and it's an error if it can't be activated (such as when activation of multiple implicit links is attempted).
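The activation rules for active_lpd_names can be illustrated with a small sketch. The helper selectActiveLinks and the declaredLinks map (link name to link kind) are hypothetical names introduced only for this illustration; they are not part of the Tokenizer API, and the real implementation may differ.

```javascript
// Hypothetical helper, not part of the documented API.
// declaredLinks: Map of link process name -> "simple" | "implicit" | "explicit",
// standing in for the link processes declared in the prolog.
function selectActiveLinks(activeLpdNames, declaredLinks) {
  // Names may be separated by spaces and/or commas.
  const names = activeLpdNames.split(/[\s,]+/).filter((n) => n.length > 0);
  const selected = [];
  let implicitCount = 0;
  for (const name of names) {
    const kind = declaredLinks.get(name);
    if (kind === undefined) {
      continue; // naming an undeclared link process is not an error
    }
    if (kind === "implicit") {
      implicitCount += 1;
      if (implicitCount > 1) {
        // a declared link that cannot be activated is an error
        throw new Error("cannot activate more than one implicit link: " + name);
      }
    }
    selected.push({ name, kind });
  }
  return selected;
}

const declared = new Map([
  ["STYLE", "simple"],
  ["FMT", "implicit"],
]);
const active = selectActiveLinks("STYLE, FMT UNDECLARED", declared);
console.log(active.map((link) => link.name));
```

In this sketch the undeclared name is silently skipped, while a second implicit link would raise an error, mirroring the two cases described above.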

append_markup(text)

Appends text to markup_buf and calls parse_markup() to process markup_buf.

Parameters

Name	Type	Description
text	string

bundledfunctions

Generated/prebuilt module containing functions configured at build time.

clear_unresolved_entity_name()

Resets the unresolved entity name which causes parse_markup() to stall, if any.

configure(args)

Configuration method.

Parameters

Name	Type	Description
args	Object.<string, string>	Map of configuration properties

derive_storage_manager_notation_metadata(sm_notation_name)

Copies attribute declarations from super-storage manager notations to derived storage manager notations, traversing through the chain of superdcn-values of the argument storage manager notation name.

At most two levels of the superdcn chain are traversed.

Also sets the superdcn value of the argument notation to the ultimate storage manager notation if a three-step derivation is performed, so that the conditions expected by rewrite_custom_into_base_storage_manager_notation() (e.g. that the superdcn value contains a natively supported storage manager notation) are met.

Parameters

Name	Type	Description
sm_notation_name	string	derived storage manager notation to assert attribute declarations for
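As a rough illustration of the traversal described above, the following sketch walks a superdcn chain over a plain in-memory registry. The registry shape (a Map from notation name to an object with superdcn and attributes), the function name deriveNotationMetadata, and the rule that declarations already present on the derived notation take precedence are all assumptions made for this example, not documented behavior.

```javascript
// Hypothetical registry shape: name -> { superdcn: string|null, attributes: Map }.
function deriveNotationMetadata(notations, name) {
  const derived = notations.get(name);
  if (!derived) return;
  let ancestorName = derived.superdcn;
  let ultimate = null;
  // Walk at most two levels up the superdcn chain.
  for (let level = 0; level < 2 && ancestorName; level++) {
    const ancestor = notations.get(ancestorName);
    if (!ancestor) break;
    for (const [attr, decl] of ancestor.attributes) {
      // Assumption: declarations on the derived notation are not overwritten.
      if (!derived.attributes.has(attr)) derived.attributes.set(attr, decl);
    }
    ultimate = ancestorName;
    ancestorName = ancestor.superdcn;
  }
  // After a three-step derivation, point superdcn at the ultimate notation.
  if (ultimate && ultimate !== derived.superdcn) derived.superdcn = ultimate;
}

const notations = new Map([
  ["base", { superdcn: null, attributes: new Map([["a", "CDATA"]]) }],
  ["mid", { superdcn: "base", attributes: new Map([["b", "CDATA"]]) }],
  ["custom", { superdcn: "mid", attributes: new Map([["b", "NAME"]]) }],
]);
deriveNotationMetadata(notations, "custom");
console.log(notations.get("custom").superdcn);
```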

end_markup()

Used after call(s) to append_markup() to indicate that no more text will be fed via append_markup(). This drains the output context of any outstanding tag validation/inference or dispatch events, and also demarcates the end of an input entity.
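The calling sequence for append_markup() and end_markup() might look like the sketch below. RecordingTokenizer is a hypothetical stand-in that only records the calls it receives, so the sequence can be run without constructing a real Tokenizer and its collaborators; feedChunks is likewise an illustrative helper.

```javascript
// Hypothetical stand-in for a configured Tokenizer instance; it records calls
// instead of parsing anything.
class RecordingTokenizer {
  constructor() {
    this.calls = [];
  }
  append_markup(text) {
    this.calls.push(["append_markup", text]);
  }
  end_markup() {
    this.calls.push(["end_markup"]);
  }
}

// Feeds every chunk in order, then signals the end of input exactly once.
function feedChunks(tokenizer, chunks) {
  for (const chunk of chunks) {
    tokenizer.append_markup(chunk);
  }
  tokenizer.end_markup();
}

const tokenizer = new RecordingTokenizer();
feedChunks(tokenizer, ["<p>Hello, ", "world</p>\n"]);
console.log(tokenizer.calls.map((call) => call[0]));
```

The point of the sketch is the ordering: any number of append_markup() calls, followed by a single end_markup() once no more text will arrive.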

expected_external_dtd_subset_identifier :string

Contains a (system) identifier that the DTD is expected to reference. If this is #IMPLIED, this indicates that we're expecting the base DTD to be <!DOCTYPE ... SYSTEM> (where the doctype can be #IMPLIED or specified), or that we can omit the prolog altogether.

get_unresolved_entity_name()

Returns the unresolved entity name which causes parse_markup() to stall, if any.

ignore_clear_unresolved_entity_name

Flag indicating that calls to clear_unresolved_entity_name() (issued from markdown) should be ignored.

The purpose is to ensure that no asynchronous fetches are triggered from markdown cleanup() processing.

Used from Recordhandler.

is_data_specification_attribute()

Returns whether the supplied named attribute is a DATA specification attribute of elementtype.

no_stalling_at_end_of_markup :number

Flag to indicate that:

- parse_markup() should proceed, rather than postpone and reparse, on characters at the end of markup_buffer, and
- a fatal error should be generated if incomplete markup is detected.

Set by end_markup() before invoking the final call to parse_markup() to flush remaining characters in markup buffer, if any.

no_stalling_on_unresolved_entity

Public flag indicating that parse_markup() should proceed, rather than return if it is stalled on a link entity reference.

Used from end_markup() and other places to force-flush the markup buffer. Will cause markdown roundtrip text to be written out to the output.

Note: the outputhandler might receive more than one consecutive characters() event in this case.

parse_attlist_decl(declaration_set_name, decl)

Parses an attribute declaration from a string containing a markup declaration and calls store_element_attribute_decl()/store_data_attribute_decl() with the extracted details for each declared element/attribute combination.

Also used for attlists declared in LPDs.

Parameters

Name	Type	Description
declaration_set_name	string	name of the document type in which the attribute list declaration occurs
decl	string	declaration text to parse as attribute list declaration

parse_notation_decl(declaration_set_name, decl)

Parses and stores a declaration from a string containing a notation declaration.

Like parse_entity_decl(), this is called without expanded parameter entities and uses its own effective markup declarations recording code.

Parameters

Name	Type	Description
declaration_set_name	string	name of declaration set in which the notation decl occurs
decl	string	code text of the notation declaration to parse

Returns

the parameter-entity-expanded declaration that was processed

remove_linefeed_at_end_of_markup()

Helper function to remove the last character from markup_buf if it is a newline character. Factored out from end_markup() so that async and sync processing produce consistent results (i.e. it is used prior to the final call to parse_markup() which, in the async case, performs some of what end_markup() does in the sync case).

reset()

Resets internal state.

running_as_template_subprocessing_context :number

Flag to indicate that processing is performed on a template (in a template subprocessing context).

Used for warning about entity references to system-specific entities with lower or mixed case names if SYNTAX NAMECASE GENERAL and SYNTAX NAMECASE ENTITY don't match.

set_debug_emit_ctx_token()

Sets the string printed as part of debugEmit messages.

set_unresolved_entity_name()

Sets an arbitrary entity name as unresolved, which will cause the next call to parse_markup() to stall. Used by Recordhandler to force stalling parse_markup() during markdown cleanup.

switchoff_stalling_at_end_of_markup()

Sets no_stalling_at_end_of_markup.

switchoff_stalling_on_unresolved_entity()

Sets no_stalling_on_unresolved_entity.

system_specific_implied_lpd_names :String

Contains the (comma-separated or single) name(s) of forced additional LPDs, treated as if declared as system-specific LPDs following the LPDs actually present in the document prolog, unless an LPD with that name is explicitly present in the actual prolog. The names in system_specific_implied_lpd_source_document_type_names and system_specific_implied_lpd_result_document_type_names, resp., contain the corresponding source and result doctypes.

Implicit link processes don't have corresponding source or result doctypes, and are specified as the last (or only) name in system_specific_implied_lpd_names, without a corresponding name in either system_specific_implied_lpd_source_document_type_names or system_specific_implied_lpd_result_document_type_names.

For example, a set of consistent values for system_specific_implied_lpd_names, system_specific_implied_lpd_source_document_type_names, and system_specific_implied_lpd_result_document_type_names is

LNK1,LNK2,LNK
DOC,OUT
OUT,OUT2

which will be treated as though the following link process declarations were present in the SGML prolog (where the document type names are assumed to be present)

<!DOCTYPE DOC [ ... ]>
<!DOCTYPE OUT [ ... ]>
<!DOCTYPE OUT2 [ ... ]>
<!LINKTYPE LNK1 DOC OUT SYSTEM>
<!LINKTYPE LNK2 OUT OUT2 SYSTEM>
<!LINKTYPE LNK OUT2 #IMPLIED SYSTEM>
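The positional correspondence between the three properties can be sketched as a small function that expands their values into the implied LINKTYPE declarations. The function impliedLinktypeDeclarations and its baseDoctypeName parameter are hypothetical and exist only for this illustration; in particular, falling back to the base doctype when no explicit links are listed is an assumption, not documented behavior.

```javascript
// Hypothetical helper (not part of the documented Tokenizer API): expands the
// three comma-separated property values into the LINKTYPE declarations they imply.
function impliedLinktypeDeclarations(lpdNames, sourceNames, resultNames, baseDoctypeName) {
  const split = (value) =>
    (value || "").split(",").map((s) => s.trim()).filter((s) => s.length > 0);
  const names = split(lpdNames);
  const sources = split(sourceNames);
  const results = split(resultNames);

  return names.map((name, i) => {
    if (i < sources.length && i < results.length) {
      // Explicit link: source and result doctypes sit at the same position.
      return `<!LINKTYPE ${name} ${sources[i]} ${results[i]} SYSTEM>`;
    }
    // Assumption: a trailing name without doctypes is an implicit link applied
    // to the output of the explicit chain, or to the base doctype if there is none.
    const source = results.length > 0 ? results[results.length - 1] : baseDoctypeName;
    return `<!LINKTYPE ${name} ${source} #IMPLIED SYSTEM>`;
  });
}

const declarations = impliedLinktypeDeclarations("LNK1,LNK2,LNK", "DOC,OUT", "OUT,OUT2", "DOC");
console.log(declarations.join("\n"));
```

With the example values from above, the sketch yields the same three LINKTYPE declarations as listed in the prolog.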

system_specific_implied_lpd_result_document_type_names :String

Contains the (comma-separated) names of result doctypes of implied LPDs, if any, such that a name represents the result doctype of the implied explicit link process at the corresponding position in system_specific_implied_lpd_names.

system_specific_implied_lpd_source_document_type_names :String

Contains the (comma-separated) names of source doctypes of implied LPDs, if any, such that a name represents the source doctype of the implied explicit link process at the corresponding position in system_specific_implied_lpd_names.