Markdown Reference

Introduction

The Markdown Wiki syntax was designed as a simplified syntax for producing HTML text based on widely used conventions for writing e-mails, forum posts, and other plain text pieces. Markdown has since become ubiquitous on the web, being used by services such as GitHub and Stack Exchange, and having close to a hundred independent implementations as of 2016.

Markdown defines a Wiki syntax for the most common structural, typographical, and hyperlinking features of HTML (as explained in the remainder of this text). Where markdown doesn't provide a syntax for a particular HTML construct, it's possible to just use HTML directly in markdown text, either as inline markup or as a markup block.

sgmljs.net Markdown is presented as an application of the SGML SHORTREF feature, which is an SGML mechanism to describe custom Wiki or other domain-specific syntax. Whereas in regular markdown it's possible to use HTML markup, in sgmljs.net Markdown it's also possible to use SGML markup and other constructs, bringing SGML's vast facilities for text organization and processing to markdown in a natural way.

sgmljs.net Markdown implements the original markdown syntax with the "fenced code blocks" and "tables" features of GitHub-flavored markdown, and with select pandoc markdown extension features. In the remainder of this text, markdown.pl is used as a reference to John Gruber's original Markdown when discussing differences between sgmljs.net Markdown and other markdown formatters.

Typography

Markdown performs the following conversions on special characters in body text, headers, and nearly everywhere else:

  • *Emphasized text* or _Emphasized text_ is converted to <em>Emphasized text</em>

  • **Strongly emphasized text** or __Strongly emphasized text__ is converted to <strong>Strongly emphasized text</strong>

  • `Text put in *backtick* characters` is produced to the output verbatim, within <code> tags: <code>Text put in *backtick* characters</code> (with any markup and markdown content escaped)

  • Backslash characters themselves (and other characters having special significance) can be escaped using backslashes

Rules

Balanced pairs of above character tokens get replaced by the respective start- and end-tags according to these rules:

  • token occurrences followed by blanks aren't considered start tokens, and token occurrences preceded by blanks aren't considered end tokens
  • additionally, underscore occurrences preceded by non-blanks aren't considered start tokens, and underscore occurrences followed by non-blanks aren't considered end tokens; this extended behaviour, introduced by pandoc and not performed by markdown.pl, is intended for underscore characters that are part of an "identifier"-like word (snake case) and shouldn't be treated as special characters
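
The start/end token rules can be sketched in a few lines of Python. This is only an illustration, not the sgmljs.net implementation: the function names can_open and can_close are hypothetical, the beginning and end of the text are assumed to count as blanks, and the underscore rule is only modelled for start tokens to keep the sketch short.

```python
def can_open(text: str, pos: int, length: int = 1) -> bool:
    """Return True if the token at text[pos:pos+length] may act as a start token."""
    before = text[pos - 1] if pos > 0 else ""
    after = text[pos + length] if pos + length < len(text) else ""
    if after == "" or after.isspace():
        return False  # a token followed by a blank never opens
    if text[pos] == "_" and before != "" and not before.isspace():
        return False  # an underscore inside a word (snake case) never opens
    return True


def can_close(text: str, pos: int, length: int = 1) -> bool:
    """Return True if the token at text[pos:pos+length] may act as an end token."""
    before = text[pos - 1] if pos > 0 else ""
    return before != "" and not before.isspace()  # preceded by a blank: never closes
```

With these helpers, the asterisk in "a * b" is neither a start nor an end token, and the underscores in "snake_case_name" are not start tokens.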

Markdown produces a hard line break (an HTML <br> tag) on a line ending in two or more space characters.

See also Typography Examples.

Lists

In markdown, a list is created by starting a line with a -, +, or * character:

*   This line starts a list item consisting of a single line of text
*   This list item is formatted similar to the one before,
    even though it is continued on the next line in markdown text.
*   This item has two paragraphs.

    As can be seen, a list item is continued even if its text lines
    are separated by a blank line. A continuation line only needs to
    be indented to the same level.
*   As this list item shows,

	only the first line following a blank line needs indentation;
subsequent lines in markup text may start at the line beginning,
and will still be treated as part of the list item

The following example shows how to produce nested lists:

- Top-level list item 1
- Top-level list item 2
	- Sublist item 21
	- Sublist item 22
- Top-level list item 3
	- Sublist item 31
- Top-level list item 4
	- Sublist item 41
	  Continuation line for sublist item 4
	    - Sub-sublist item 411
             Continuation line for sub-sublist item 411
	- 	- Sub-subsublist item 4111
		- Sub-subsublist item 4112
	- Sublist item 42

Rules

  • Lines starting with one of the characters -, +, or *, followed by one, two, or three space characters or a tab character, followed by a non-space character, create list items if they are placed at the same indentation level as previous content (or without indentation at the beginning of a file). The first such line following some content at equal or lower indentation level creates a new list; subsequent list item lines add further items to that list if no content at lower indentation level is placed in between.

  • List item text is continued in the lines following the list item start line, and may be placed at any indentation level.
  • After a blank line, however, a list item is continued by using one level of indentation more than the list item start line to which it belongs. If list item text is continued after a blank line, it will be put into a new paragraph.

  • If a list item contains paragraphs, sublists or other rich content apart from a single paragraph of text, any text content of that list item is put into paragraphs (HTML p elements). If a list item only contains a single paragraph of text, the text is put directly as the sole content of the HTML <li> element, without wrapping the item's content into <p> tags

    • sgmljs.net doesn't prune <p> elements on list item content which has span-level typography markup (see listitems-in-paras2 example)

  • List item text is also put into paragraphs if the item is separated by a blank line from preceding or following items, even if the item content is a single paragraph.

Note that a list marker without list item text (an "empty list item") will not start a list

Note also that a blank line is needed before a list start after paragraphs, but not after headers
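
The marker rule (one of -, +, or *, then one to three spaces or a tab, then a non-space character) can be sketched as a regular expression. This is a minimal illustration, not the actual sgmljs.net parser: the name parse_list_item is hypothetical, and the comparison of indentation levels against previous content is not modelled.

```python
import re

# marker, then 1-3 spaces or one tab, then text starting with a non-space character
LIST_ITEM = re.compile(r"^[ \t]*(?P<marker>[-+*])(?: {1,3}|\t)(?P<text>\S.*)$")


def parse_list_item(line: str):
    """Return (marker, text) for a list item start line, or None otherwise."""
    m = LIST_ITEM.match(line)
    if m is None:
        return None  # includes the "empty list item" case: a marker without text
    return m.group("marker"), m.group("text")
```

A bare marker such as "-" yields None, which mirrors the note that an empty list item does not start a list.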

See also List Examples.

Ordered Lists

Lists can also be started using numbers. Such numbered lists are rendered as HTML ordered lists:

1. First item
2. Second item
3. Third item

Rules

  • Indenting and nesting rules for ordered lists are the same as for plain (unordered) lists
  • markdown doesn't support list numbers with multiple number components such as "1.2.3."; lines starting with such list numbers aren't recognized as list item lines at all
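
As a rough sketch of the second rule, the following Python function recognizes a single number followed by a period and rejects multi-component numbers. The name parse_ordered_item is hypothetical, and the spacing after the period (one to three spaces or a tab) is an assumption carried over from unordered lists rather than something stated for ordered lists.

```python
import re

# one run of digits, a period, then 1-3 spaces or a tab, then the item text
ORDERED_ITEM = re.compile(r"^[ \t]*(\d+)\.(?: {1,3}|\t)(\S.*)$")


def parse_ordered_item(line: str):
    """Return (number, text) for an ordered list item line, or None otherwise."""
    m = ORDERED_ITEM.match(line)
    if m is None:
        return None  # e.g. "1.2.3. text" is not a list item line
    return int(m.group(1)), m.group(2)
```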

Definition Lists Extension

To produce HTML definition lists, the following syntax is used:

definition term
: definition
: further definition
: ...

See also Definition List Examples.

Headers

In markdown, section headers are commonly created in the atx-header style as follows:

# Header #
Body text

Alternatively, and less commonly, headers can also be created in the setext-header style:

Header
------

Body text

Fragment Id generation Extension

sgmljs.net, like pandoc, also generates fragment (link) IDs for subsequent (or prior) reference in reference links based on the header text as follows:

  1. Remove all punctuation, except underscores, hyphens, and periods
  2. Replace all spaces and newlines by hyphens
  3. Convert all alphabetic characters to lowercase
  4. Remove everything up to the first letter (remove any leading non-alpha chars since identifiers must not begin with a digit or punctuation character)
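
The four steps can be sketched in Python as follows. This is an illustrative reading of the steps, not the sgmljs.net code: the function name fragment_id is hypothetical, and treating any whitespace character like a space is an assumption.

```python
import re


def fragment_id(header: str) -> str:
    """Derive a fragment id from header text, following the four steps above."""
    # 1. remove punctuation, keeping letters, digits, underscores, hyphens, periods
    text = re.sub(r"[^\w\s.-]", "", header)
    # 2. replace spaces and newlines by hyphens
    text = re.sub(r"\s", "-", text)
    # 3. convert to lowercase
    text = text.lower()
    # 4. drop everything before the first letter
    for index, char in enumerate(text):
        if char.isalpha():
            return text[index:]
    return ""
```

For example, a header "1. Getting Started!" would give "getting-started" under this sketch.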

See also Header Examples, also including details of automatic link id creation.

Inline markup

Markdown doesn't provide syntax for every HTML markup construct. Instead, when needed, HTML can be used directly in a markdown file. For example, to create subscripted text using inline HTML, H<sub>2</sub>O, CO<sub>2</sub> can be used.

Likewise, to place an anchor around a portion of text, inline HTML <span> (or other) elements can be used like this:

this is <span id="mylink">some text that will scroll in view</span>

Rules

  • Inline HTML must be well-formed HTML or XML; in particular, any start-element tag must be matched by a corresponding end-element tag on the same line.
  • Multiple inline HTML fragments can be placed into a single line of markdown text.
  • Markdown syntax within inline HTML fragments is formatted using markdown rules, just like any other text outside of HTML markup.

Creating explicit link targets (fragment identifiers) like above isn't often used in markdown documents because link targets are most often placed on headers and similar structural HTML elements; such links are created automatically by sgmljs.net.

See also Inline HTML Examples.

Markup blocks

A more common use of link targets on elements other than headers is with text citations. Often the author will want cited text to stand out from the surrounding text. The following example uses an HTML block containing the <div> block-level HTML markup element within markdown text:

markdown text
...
<div>
  <a id="imitation-cite">
  <p>Imitation is the sincerest form of flattery.</p>
  <i align="right">-- attributed to Charles Caleb Colton</i></a>
</div>
...
markdown text

Since a link target element with an id is placed around the citation block, it may be linked to from other places in the document, or from outside of the document.

This example demonstrates HTML blocks. HTML blocks, as opposed to inline HTML, consist of HTML on one or more lines of its own, rather than within a running line of markdown text. HTML block-level elements such as <p> cannot be used within inline HTML in markdown.

Other possible uses include HTML image maps or HTML tables for tabular data requiring more sophisticated formatting than is possible using markdown alone.

Rules

  • Text lines starting with markup element, DOCTYPE, processing instruction, or marked section tags are considered the start of a markup block if preceded by a blank line, or if at the beginning of a file.

  • The contents of markup blocks are appended as-is to the output document, without markdown formatting. Markdown syntax within a markup block (or a code block) isn't formatted by sgmljs.net.

  • A markup block is ended by a blank line, and must be well-formed (balanced) HTML or XML markup.
  • To include a blank line (newline characters) in a markup block, use HTML character references, such as &#xA;. Note that this will be ignored by HTML renderers, unless it appears within a <pre> and/or <code> block.

See also HTML Block Examples.

Code Blocks

Preformatted text with spacing and line breaks that should be preserved in the output, such as software source code or verbatim HTML/XML text, can be put in Code Blocks to prevent markdown from formatting it.

A Code Block is created by indenting it one level more than the previous or surrounding paragraph or list item to which it belongs.

For example, the following is a code block:

	Newlines are preserved,
	and markdown syntax in code blocks is *NOT* formatted
	/* so that the syntax of a programming language being
	 * displayed is rendered as-is
	 */

Rules

  • A code block must be separated from preceding text by one or more blank lines.
  • A code block doesn't have to be terminated by a blank line; instead, a code block ends if indentation returns to a previous level, so, unlike with lists, omitting space characters for indentation on the second and subsequent line(s) of a code block won't continue a code block, with the following exception.

  • Like other code text, a blank line in code blocks is created by indenting it to the same level as the surrounding non-blank text lines in the code block. But this may become tedious to edit in text editing programs that don't indicate the presence of space characters visually. Therefore, blank lines in code blocks can also be created without any indentation at all, ie. just with a simple non-indented newline character ("lazy" blank lines)
  • in generated HTML, one level of indentation is subtracted from code block text, so a code block aligns horizontally at the hierarchy level it appears in (ie. with standard browser CSS rules)
  • over-indenting, ie. placing more tabs or quadspaces than necessary before code block text, has the effect that the additional whitespace becomes part of the code block text, and will be rendered as tabs or whitespace in the monospaced font used for code blocks; this makes code blocks convenient to use for copy & paste of code snippets written in programming languages with special indentation rules such as Python
  • if writing larger texts on programming with code snippets on different hierarchical levels, additional tabs or quadspaces can be inserted to the left of code text so that code snippets can be horizontally aligned across a larger text with multiple code blocks; however, if code blocks are placed at different hierarchy levels, output isn't necessarily aligned, as the initial indent is calculated in the font/rules for non-code block text whereas indentation within code blocks is calculated based on tab stop sizes in the chosen monospace font
  • markdown.pl outputs three newlines for any number of consecutive blank lines, whether the blank line is indented or not; this results in two blank lines in the rendered output being displayed for any single blank line in markdown text
  • sgmljs.net outputs as many newlines as present in the markdown text so that blank lines in the rendered output match those of the markdown text.
  • text from code blocks is put into <pre><code> HTML elements
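
The indentation handling can be sketched as follows for a top-level code block, where one level of indentation is one tab or four spaces. This is a simplified illustration: the function name render_code_block is hypothetical, nesting inside lists or block quotes is ignored, and escaping of markup characters with html.escape is an assumption about the output.

```python
import html


def render_code_block(lines: list[str]) -> str:
    """Render the lines of a top-level indented code block as <pre><code> HTML."""
    body = []
    for line in lines:
        if line.startswith("\t"):
            body.append(line[1:])  # subtract one level; further tabs are kept
        elif line.startswith("    "):
            body.append(line[4:])  # a quadspace counts as one level
        elif line.strip() == "":
            body.append("")  # "lazy" blank line without indentation
        else:
            raise ValueError("line is not part of the code block: " + repr(line))
    code = html.escape("\n".join(body), quote=False)
    return "<pre><code>" + code + "\n</code></pre>"
```

Over-indented lines keep their extra whitespace, so nested Python code such as an if body stays indented relative to the first line.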

Fenced Code Blocks Extension

In addition to standard markdown indented code blocks, sgmljs.net also supports GitHub-flavored markdown-style fenced code blocks:

Fenced code block example:

console.log('Code block goes here')

Rules

  • Fenced code blocks are started and ended using a line beginning with three tilde (~) or three backtick (`) characters

  • An optional string following the three tilde or backtick characters in a fenced code block start line is put (after sanitization) into the class attribute of the produced code element in HTML output

  • Fenced code blocks, like standard indented code blocks, can be nested in blockquotes and/or list items
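
A minimal Python sketch of these rules follows. It is not the sgmljs.net implementation: the name render_fenced_block is hypothetical, the sanitization step (keeping only letters, digits, hyphens, and underscores) is an assumption since the actual sanitization isn't described here, and the closing fence is assumed to use the same character as the opening fence.

```python
import html
import re

FENCE_MARKERS = ("~" * 3, "`" * 3)  # three tildes or three backticks


def render_fenced_block(lines: list[str]) -> str:
    """Render a fenced code block given as lines, including its fence lines."""
    opener = lines[0]
    marker = next((m for m in FENCE_MARKERS if opener.startswith(m)), None)
    if marker is None:
        raise ValueError("not a fenced code block")
    # assumed sanitization of the optional string after the fence characters
    info = re.sub(r"[^A-Za-z0-9_-]", "", opener[len(marker):].strip())
    body = []
    for line in lines[1:]:
        if line.startswith(marker):
            break  # closing fence
        body.append(line)
    class_attr = f' class="{info}"' if info else ""
    code = html.escape("\n".join(body), quote=False)
    return f"<pre><code{class_attr}>{code}\n</code></pre>"
```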

See also Code Block Examples.

Links

Inline links are span-level elements beginning with a left square bracket, followed by link text, followed by a right square bracket, followed by a locator in regular parentheses. The locator consists of a URL, optionally followed by a space character and a double-quoted link title; the link title is what goes into the title attribute of the produced a (anchor) element.

For example, the inline link [a link](#) gets formatted into <a href="#">a link</a>.
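
The inline link syntax can be approximated with a regular expression, as in the following Python sketch. The name render_inline_links is hypothetical, and the pattern is deliberately simplified: URLs containing spaces or parentheses and link text containing square brackets are not handled.

```python
import html
import re

# [link text](url "optional title"), but not when preceded by "!" (image syntax)
INLINE_LINK = re.compile(r'(?<!!)\[([^\]]*)\]\((\S+?)(?: "([^"]*)")?\)')


def render_inline_links(text: str) -> str:
    """Replace inline links in a line of text by HTML anchor elements."""
    def replace(match: re.Match) -> str:
        label, url, title = match.group(1), match.group(2), match.group(3)
        title_attr = f' title="{html.escape(title)}"' if title is not None else ""
        return f'<a href="{html.escape(url)}"{title_attr}>{label}</a>'

    return INLINE_LINK.sub(replace, text)
```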

See also Inline Link Examples.

An inline link having just a URL, without link text or link title, can be written in a shortcut auto-link form -- a URL in angle brackets -- like this: <https://daringfireball.net/projects/markdown/> (which will create a link to John Gruber's original markdown page, using the URL put into <code> tags as link text).

So that an auto-link at the beginning of a line isn't recognized as a markup block or as inline markup, the initial scheme: portion of the auto-link must be one of http:, https:, file:, or mailto:.

Note that, unlike some other markdown implementations, sgmljs.net does not mangle mailto: auto-links into a JavaScript snippet for email address harvesting protection. Such functionality must be implemented as a post-processing step using templating.

Additional shortcuts Extension

A mailto: auto-link may be written by just putting an email address into angle brackets -- the @ character is enough to make sgmljs.net recognize this syntax as auto-link.

In addition, sgmljs.net also supports auto-links with just a : (colon) as scheme: part. Such auto-links are generated to an <a> having the scheme part omitted, such that browsers take the scheme part from the document it is placed in. This cannot be expressed by merely leaving out the scheme part in standard markdown because an auto-link is syntactically required to have a scheme part to be recognized as such.
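
The three auto-link forms (explicit scheme, bare email address, and colon-only scheme) can be told apart from ordinary inline markup roughly as in the following Python sketch. The name render_auto_links is hypothetical, and using the target in <code> tags as link text for the email and colon-only forms is an assumption; it is only stated above for auto-links with a full URL.

```python
import html
import re

AUTO_LINK = re.compile(
    r"<(?:(?P<url>(?:https?|file|mailto):[^\s<>]+)"  # explicit scheme
    r"|:(?P<relative>[^\s<>]+)"                      # colon only: scheme omitted
    r"|(?P<email>[^\s<>:@]+@[^\s<>:@]+))>"           # bare email address
)


def render_auto_links(text: str) -> str:
    """Replace auto-links in a line of text by HTML anchor elements."""
    def replace(match: re.Match) -> str:
        if match.group("url"):
            href = label = match.group("url")
        elif match.group("relative"):
            href = label = match.group("relative")
        else:
            label = match.group("email")
            href = "mailto:" + label
        return f'<a href="{html.escape(href)}"><code>{html.escape(label)}</code></a>'

    return AUTO_LINK.sub(replace, text)
```

Ordinary tags such as <span id="x"> match none of the three alternatives and are left untouched.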

See also Auto-Link Examples.

Reference links are comprised of two parts: a link similar to an inline link (but without locator), and a link definition elsewhere in the document.

A reference link, like an inline link, begins with a left square bracket followed by link text followed by a right square bracket. After that, reference links take another form than inline links, and are completed by an optional space character, followed by a left square bracket, followed by the link id, followed by a right square bracket.

A reference link definition is placed on a line of its own, beginning with a pair of square brackets containing the link id that is being defined, followed by a colon, followed by a URI, optionally followed by a double-quoted link title. The URI may optionally be surrounded by '<' and '>' characters. Arbitrary whitespace may be placed between the colon and the URI, and between the URI and the title. The title may also be written on the next line.
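
A reference link definition line can be parsed roughly as in the following Python sketch. The name parse_reference_definition is hypothetical; the sketch treats the title as optional and only handles single-line definitions, so a title written on the following line is not covered.

```python
import re

# [id]: <uri> "title"   (angle brackets and title are optional here)
REFERENCE_DEFINITION = re.compile(
    r'^\[([^\]]+)\]:\s*<?([^\s<>]+)>?(?:\s+"([^"]*)")?\s*$'
)


def parse_reference_definition(line: str):
    """Return (link_id, uri, title) for a definition line, or None otherwise."""
    match = REFERENCE_DEFINITION.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)
```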

See also Reference Link Examples.

A short link (introduced by pandoc) has the same syntax as an external reference link, but with an empty, or omitted, second pair of square brackets. For a short link, the link id will act as the link text.

See also Short Link Examples.

Images

An HTML image link can be inserted using the same syntax as an inline link or as a reference link, with a ! character placed before the construct.

For example, this is an inline image "link":

An ![Image link](http://some.url/image.png)

And this is a reference-style image "link":

An ![image link][imagelink]

[imagelink]: <http://some.url>

See also Image Examples.

Block Quotes

Text on lines beginning with the > character, and any continuation lines following it until a blank line, will be produced into HTML block quotes. For example, in the following text two levels of block quoting are used:

Block-quoted text following:
> > Nested Block-quoted text
>
> Block-quoted text

Rules

  • block quoting nesting levels are generally determined by the number of > characters at the beginning of a line

  • with respect to spaces and tab characters allowed to follow > characters up to a subsequent > character in nested block quoted lines, so that these are recognized as block quote nesting rather than e.g. nested code block, the following steps are applied:

    1. a single space (not tab) following > isn't significant, and gets discarded before the next step (but is significant when preceding the first > on the line)

    2. in further processing of > characters, quadspace rules analogous to those for sublist nesting are applied; however, two spaces rather than four is taken as tab-equivalent (tabs themselves get recognized in the same way as in normal quadspace indenting)

    3. a single space (not tab) preceding '>' isn't significant, and gets discarded before the following step
  • there's an ambiguity in the case of four spaces, which can be interpreted as two doublespaces, or a trailing space plus a doublespace and a leading space; the latter is the interpretation that markdown.pl uses (see blockquote6 example)
  • there's an additional subtlety with two spaces and a tab (see blockquote9 example) which isn't recognized as two quadspaces, whereas a tab followed by two spaces (blockquote13) is; to make this consistent with the rules above, we treat a tab with a leading space as a single tab in the 2nd step
  • if there are tabs/quadspaces in place of a block quote character, markdown.pl and pandoc, like sgmljs.net, will treat this as a block quoted code block; however, markdown.pl and pandoc will change the first tab or quad-/doublespace after block quote characters (and only that) into two characters, whereas sgmljs.net leaves it as-is

    The reasoning behind markdown.pl's behavior seems to be an implementation detail: within a block quote, doublespaces rather than quadspaces are significant, and at the point in time markdown.pl recognizes the codeblock, it already has pruned away a doublespace in expectation of another block quote char; markdown.pl then compensates by prepending a doublespace before the code block, irrespective of what actual indentation marker was used

  • these rules also work at the first level (though a tab can't occur), ie. up to three chars before the initial '>' char get pruned, which is probably why these rules have been designed as they are

Note that whereas markdown.pl and pandoc change tabs into quadspaces in plain codeblocks, sgmljs.net also leaves those as-is; the reason is that some programming languages (e.g. Python) treat whitespace as significant tokens, and the assumption that a tab translates into four spaces (or two spaces in the case of block quoted code block), while it might work for Python, isn't appropriate for a markdown processor to make; for example, classic Makefile syntax wouldn't roundtrip through markdown.pl cleanly.

Note that a single space following a block quote character always gets discarded in the first step if present, even when no further block quote characters are following, so that subsequent processing of the block quoted content never gets to see the space (this is what markdown.pl, pandoc, and sgmljs.net do); markdown.pl and pandoc seem to remove even more than a single space (detailed rules aren't documented, though); sgmljs.net doesn't do this, for already stated reasons.

See Block Quote Examples for details.

Rulers

A line consisting entirely of asterisk, hyphen, or underscore characters and having one of the following forms

* * *
- - -
_ _ _

***
---
___

is formatted as HTML <hr> (horizontal ruler) element.
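
A ruler line can be recognized with a single regular expression, as in the following Python sketch. The name is_ruler is hypothetical, and accepting more than three characters (for example "*****") is an assumption based on common markdown practice; only the three-character forms are shown above.

```python
import re

# three or more of the same character, either adjacent or separated by single spaces
RULER = re.compile(r"^([*_-])( ?)\1(?:\2\1)+$")


def is_ruler(line: str) -> bool:
    """Return True if the line has one of the ruler forms listed above."""
    return RULER.match(line) is not None
```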

See Ruler Examples for details.

Tables Extension

As an extension to standard markdown syntax, sgmljs.net has limited support for producing tables in GitHub-flavored markdown style from markdown text such as the following:

Header 1     | Header 2
------------ | --------
Cell 11      | Cell 12
Cell 21      | Cell 22

This will produce the following HTML output:

<table><tr><th>Header 1</th><th>Header 2</th></tr>
<tr><td>Cell 11</td><td>Cell 12</td></tr>
<tr><td>Cell 21</td><td>Cell 22</td></tr></table>

Rules

  • sgmljs.net recognizes table cells when separated by the vertical bar character |; vertical bar characters aren't treated as cell separators in code blocks, markup blocks, backticked code spans, or when escaped by the backslash \ character

  • An optional table row with all cells containing just hyphen (-) characters (ignoring leading and trailing space characters in cells), separates preceding header rows from subsequent body rows; further separator lines aren't recognized and produced as-is to the result table

  • While backticked code spans can run across multiple lines, escaping of vertical bar characters in backticked code spans only considers single lines in isolation when determining whether to interpret vertical bar characters as table cell separators, as opposed to reproducing those verbatim to the output. Consequently, the markdown processor might produce unexpected results when a table is used in combination with a multi-line backticked code span. For this reason, a backticked code span in a table cell should always end on the same line on which it is started
  • If all cells of the first and the last table column are empty (such that all table lines begin with and end in |), then the first and last columns are ignored for output to HTML

  • Tables can also be nested in lists and block quotes

  • Multi-line table cells and table captions aren't supported in sgmljs.net
  • Note also that double-backticked code spans (such as this) aren't considered in escaping vertical bar characters

  • Horizontal column alignment markers as supported in GitHub-flavored markdown (using colons to indicate alignment positions) are recognized, but have no effect on rendered HTML
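
The main table rules (cell splitting, the header separator row, and dropping empty outer columns) can be sketched in Python as follows. This is a simplified illustration with hypothetical function names; it processes each line in isolation, treats single backticks as code span delimiters, and leaves cell content unformatted.

```python
import re


def split_cells(line: str) -> list[str]:
    """Split a table line on vertical bars that are neither escaped nor in a code span."""
    cells, current, in_code, i = [], [], False, 0
    while i < len(line):
        char = line[i]
        if char == "\\" and line[i + 1:i + 2] == "|":
            current.append("|")  # escaped bar becomes cell content
            i += 2
            continue
        if char == "`":
            in_code = not in_code  # single-line code spans only
        if char == "|" and not in_code:
            cells.append("".join(current).strip())
            current = []
        else:
            current.append(char)
        i += 1
    cells.append("".join(current).strip())
    return cells


def is_separator(cells: list[str]) -> bool:
    """True if every cell holds only hyphens (optional alignment colons are ignored)."""
    return all(re.fullmatch(r":?-+:?", cell) for cell in cells)


def render_table(lines: list[str]) -> str:
    rows = [split_cells(line) for line in lines]
    # drop empty first and last columns when all lines begin with and end in |
    if all(row[0] == "" and row[-1] == "" for row in rows):
        rows = [row[1:-1] for row in rows]
    has_header = any(is_separator(row) for row in rows)
    seen_separator = False
    out = []
    for row in rows:
        if has_header and not seen_separator and is_separator(row):
            seen_separator = True  # only the first separator row is special
            continue
        tag = "th" if has_header and not seen_separator else "td"
        out.append("<tr>" + "".join(f"<{tag}>{c}</{tag}>" for c in row) + "</tr>")
    return "<table>" + "\n".join(out) + "</table>"
```

Applied to the four lines of the example table, render_table yields a header row in th elements followed by two body rows in td elements.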

See also Table Examples.

Entities Extension

As an extension to base markdown, sgmljs.net supports full SGML in markup blocks and inline markup. Among many other features (described in detail in SGML Syntax Reference), this brings support for SGML named entities as a straightforward variable expansion and text reuse technique to markdown.

Named entities are symbolic names which can be assigned a piece of markdown or markup text; once declared this way, a variable may be referenced, ie. placed in running markdown text. sgmljs.net will replace a reference to a variable by the text assigned to it when producing output.

In SGML, entities are declared in a special piece of SGML at the beginning of the SGML file called a document type definition (DTD). We aren't going to discuss in detail what a DTD is here, but will just explain enough of it to understand entity declarations in a DTD and their use in markdown, so that this section is a self-contained description of SGML entities as used from markdown.

To declare an entity, a DTD must be placed as a markup block at the beginning of a file. For example, the following markdown text

<!doctype html [
	<!entity my_variable_name "my replacement text">
]>

  • declares the type name of the document to be html; the type name must match the document element of the generated markup file; since markdown is an abbreviated syntax for HTML, this will always be html for the use cases explained in this chapter

  • declares an entity named my_variable_name having the replacement text my replacement text; the replacement text contains everything in double quotes

With the above DTD at the beginning of the document, we can reference the variable in subsequent markdown using &my_variable_name;, and sgmljs.net will replace &my_variable_name; by "my replacement text".

The leading & (ampersand) character starts an entity reference; the trailing ; (semicolon) character is only necessary to delimit the entity reference name from subsequent text, if the subsequent text starts with characters that could be part of an entity reference name (such as letters, digits or the dot, hyphen, or underscore character).

There's no need to know what a DTD or document type is, if all that's desired is basic assignment and replacement of entities; it's sufficient to assume it has to be there and declares an html doctype as in the example above.

An entity reference on a line of its own, when preceded by a blank line, will be processed using markdown rules. In other words, if a line would be parsed as a paragraph (or list continuation paragraph after a blank line), and contains a single entity reference, markdown syntax present in the replacement text for the entity reference will be expanded into HTML according to the rules in this markdown reference.

On the other hand, inline entity references (entity references for which the above condition isn't met) are expanded into markdown-produced HTML as-is, and no processing of markdown syntax is performed on them.

Replacement text for an entity can contain arbitrary markup such as elements and attributes.

Entities are declared in a line starting with <!entity ... as shown above; the declaration can also be supplied in uppercase characters (<!ENTITY ...).

A DTD can contain any number of entity declarations.

For example, the following DTD declares two entities:

<!DOCTYPE html [
	<!ENTITY my_variable_name "my replacement text">
	<!ENTITY my_other_variable_name "another replacement text">
]>
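
To illustrate how such declarations map names to replacement text, the following Python sketch collects the internal entity declarations of a DTD into a dictionary. It is a toy reader, not an SGML parser: the name parse_internal_entities is hypothetical, only double-quoted replacement text is handled, and declarations using the SYSTEM keyword are skipped.

```python
import re

# <!ENTITY name "replacement text">, with the keyword in upper or lower case
ENTITY_DECLARATION = re.compile(r'<!ENTITY\s+(\S+)\s+"([^"]*)"\s*>', re.IGNORECASE)


def parse_internal_entities(dtd: str) -> dict[str, str]:
    """Return a mapping from entity name to replacement text."""
    return {name: text for name, text in ENTITY_DECLARATION.findall(dtd)}
```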

Entity declarations usable from markdown (those without using further SGML features) take either of the following forms in the DTD:

<!ENTITY varname1 "replacement text">

declares varname1 as an "internal entity" and assigns the string "replacement text" to it

<!ENTITY varname2 "replacement text referencing &varname1;">

declares varname2 as an "internal entity" and assigns the specified replacement text to it, where &varname1; is replaced by its respective replacement text in turn.

Entity replacement text may reference other variables arbitrarily in this way; however, variable references used in entity replacement text must not reference themselves in a circular fashion (neither directly nor indirectly).

<!ENTITY varname3 SYSTEM "filename.txt">

declares varname3 as an external entity; filename.txt may be any local file path; file names are resolved relative to the location of the document declaring the entity.

This may be useful for e.g. creating separate markdown text files for individual chapters of a larger text, and then include all the chapters into a master document; it may also be used to overcome markdown limitations with respect to nesting and indentation.
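
The replacement of internal entity references, including references nested in replacement text and the ban on circular references, can be sketched in Python as follows. The name expand_entities is hypothetical; the sketch works on a plain dictionary of internal entities, leaves unknown references untouched, and doesn't read external (SYSTEM) entities.

```python
import re

# "&name" with an optional trailing ";"; name characters as described above
ENTITY_REFERENCE = re.compile(r"&([A-Za-z][A-Za-z0-9._-]*);?")


def expand_entities(text: str, entities: dict[str, str], _active: tuple = ()) -> str:
    """Replace entity references in text, expanding nested references in turn."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in entities:
            return match.group(0)  # undeclared name: leave the reference as-is
        if name in _active:
            raise ValueError("circular entity reference: " + name)
        return expand_entities(entities[name], entities, _active + (name,))

    return ENTITY_REFERENCE.sub(replace, text)
```

With varname1 and varname2 declared as in the forms above, expanding "&varname2;" gives "replacement text referencing replacement text".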

See also Markdown Entity Examples.