SGML

API Reference

Name Description
Parser Implementation of the SAX Parser interface as a simplified public API for parsing SGML streams using callbacks.

Name Description
DocumentHandler Callback interface for receiving general markup events according to the SAX 1.
DTDHandler Callback interfaces for notation and entity declarations.
Errorhandler Interface for an errorhandler the SGML library user must implement and supply to processing components.
LexicalHandler Extension callback interfaces for SGML lexical events.
Recordmanager Interface used by Tokenizer and other code to start, stop, and resume receiving input records (input lines).
Saxeventmanager Interface with a pair of functions that an outputhandler is expected to call before and after materializing a data entity or template, so that no SAX events are submitted to it in the meantime.

SGML Parsing

Basic usage of SAX to parse and/or check SGML (or HTML or XML) and perform custom event processing as input markup is read:

var sgml = require('sgml')

var entitymanager = new sgml.NoopEntitymanager()
var resolver = new sgml.Resolver()
var parser = new sgml.Parser()

// implement handler functions according to your needs
parser.documentHandler = {
    startDocument: function() { /* ... */ },
    endDocument: function() { /* ... */ },
    characters: function(text) { /* ... */ },
    startElement: function(name, attributes) { /* ... */ },
    endElement: function(name) { /* ... */ },
}

// other handlers get initialized to no-op or default handlers
var errorhandler = new sgml.Errorhandler()
parser.dtdHandler = new sgml.DtdHandler()

parser.errorHandler = errorhandler
parser.lexicalHandler = new sgml.LexicalHandler()
parser.entityResolver = resolver

// we're going to parse from a string
var recordmanager = new PlatformStringRecordmanager(errorhandler, parser)
recordmanager.set_input(
  "<!doctype html [ <!element html - - (#pcdata)> ]><html>hello</html>"
)

parser.recordManager = recordmanager

recordmanager.start_records()

// ... Your handler functions will get called
//  as the input stream is parsed
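
As a rough illustration of what a document handler could look like, the following sketch records the element structure as an indented outline. Only the callback names (startDocument, endDocument, characters, startElement, endElement) are taken from the example above; createOutlineHandler is a hypothetical helper, the shape of the attributes argument is not specified here and is ignored, and the callbacks are invoked by hand to simulate the parser rather than by the library itself.

```javascript
// Hypothetical helper (not part of the sgml package): builds an object
// with the documentHandler callbacks used in the example above and
// collects an indented outline of the elements and text it sees.
function createOutlineHandler() {
  var lines = []
  var depth = 0

  // two spaces of indentation per nesting level
  function indent() { return new Array(depth + 1).join("  ") }

  return {
    lines: lines,
    startDocument: function() { depth = 0; lines.length = 0 },
    endDocument: function() {},
    characters: function(text) {
      var trimmed = String(text).trim()
      if (trimmed) lines.push(indent() + JSON.stringify(trimmed))
    },
    // the attributes argument is ignored; its shape is an unknown here
    startElement: function(name, attributes) {
      lines.push(indent() + "<" + name + ">")
      depth++
    },
    endElement: function(name) { depth-- }
  }
}

// Simulate, by hand, the callback sequence one would expect
// for the input "<html>hello</html>".
var handler = createOutlineHandler()
handler.startDocument()
handler.startElement("html", {})
handler.characters("hello")
handler.endElement("html")
handler.endDocument()

console.log(handler.lines.join("\n"))
```

Assuming the parser invokes the callbacks as described, such an object could be assigned to parser.documentHandler in place of the stub handler shown earlier.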

Parsing SGML and writing the normalized result to a stream

This can be used, e.g., to sanitize HTML into XML for further processing.

var sgml = require("sgml")

var entitymanager = new sgml.NoopEntitymanager()
var errorhandler = new sgml.Errorhandler()
var resolver = new sgml.Resolver()
var parser = new sgml.Parser()

var outputstream = process.stdout
var outputhandler = new sgml.Outputhandler(outputstream, entitymanager)
outputhandler.output_format = "html"

parser.documentHandler = outputhandler
parser.dtdHandler = outputhandler
parser.errorHandler = errorhandler
parser.lexicalHandler = outputhandler
parser.entityResolver = resolver

var recordmanager = new PlatformStringRecordmanager(errorhandler, parser)
recordmanager.set_input(
  "<!doctype html [ <!element html - - (#pcdata)> ]><html>hello</html>"
)

parser.recordManager = recordmanager

recordmanager.start_records()
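
To collect the normalized output in a string instead of printing it, one option is to pass a custom writable stream in place of process.stdout. The sketch below uses only Node's standard stream.Writable; createStringSink is a hypothetical helper, and it assumes (unconfirmed by this reference) that Outputhandler writes through the standard Node writable-stream interface. The writes at the end are made by hand to show the behavior.

```javascript
var stream = require("stream")

// Hypothetical helper (not part of the sgml package): a writable
// stream that accumulates everything written to it in memory.
function createStringSink() {
  var chunks = []
  var sink = new stream.Writable({
    decodeStrings: false, // pass strings through without converting to Buffer
    write: function(chunk, encoding, callback) {
      chunks.push(typeof chunk === "string" ? chunk : chunk.toString("utf8"))
      callback()
    }
  })
  // returns the concatenation of all chunks received so far
  sink.text = function() { return chunks.join("") }
  return sink
}

// Stand-in for what an output handler might write.
var sink = createStringSink()
sink.write("<html>")
sink.write("hello")
sink.end("</html>")

console.log(sink.text())

// Assumed usage with the example above:
//   var outputhandler = new sgml.Outputhandler(sink, entitymanager)
```

Keeping the sink a plain Writable means it can be swapped for a file stream or process.stdout without touching the rest of the setup.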