5.1. Markup: A General Overview

A markup language is a system for marking or tagging a document to define its structure. You may add tags to your document to define which parts of it are paragraphs, titles, sections, glossary items (the list goes on!). There are many markup languages in use today. XHTML and HTML will be familiar to those who author web documents. The LDP uses a markup language known as DocBook. Each of these markup languages uses its own "controlled vocabulary" to describe documents. For example, in XHTML a paragraph would be marked up with the tagset <p></p>, while in DocBook a paragraph would be marked up with <para></para>. The tagsets are defined in a quasi-dictionary known as a Document Type Definition (DTD).
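To make the difference in vocabulary concrete, the short sketch below marks up the same sentence in XHTML and in DocBook and uses Python's standard xml.etree.ElementTree module to report which element name each vocabulary uses. The sample sentence and the helper name element_names are illustrative only. The parser checks only that the markup is well-formed; it does not validate either fragment against its DTD.

```python
import xml.etree.ElementTree as ET

# The same paragraph, tagged with each language's own vocabulary.
XHTML_FRAGMENT = "<p>Markup describes the structure of a document.</p>"
DOCBOOK_FRAGMENT = "<para>Markup describes the structure of a document.</para>"


def element_names(fragment: str) -> list[str]:
    """Return the name of every element in a fragment, in document order.

    Hypothetical helper for illustration: it only checks well-formedness
    and does not validate against a DTD.
    """
    root = ET.fromstring(fragment)
    # Element.iter() yields the root element itself, then all descendants.
    return [element.tag for element in root.iter()]


if __name__ == "__main__":
    print("XHTML:  ", element_names(XHTML_FRAGMENT))
    print("DocBook:", element_names(DOCBOOK_FRAGMENT))
```

The text content is identical in both fragments; only the tag that labels it as a paragraph changes.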

Markup languages also follow a set of rules on how a document can be assembled. These rules come from either SGML (Standard Generalized Markup Language) or XML (eXtensible Markup Language), and they are essentially the "grammar" of a document's markup. SGML and XML are very similar. XML is a subset of SGML, but it requires more precise use of tags when marking up a document: for example, every element must be explicitly closed. The LDP accepts both SGML and XML documents, but prefers XML.
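The stricter XML rules can be seen with any XML parser. In SGML, a DTD may declare that certain end tags can be omitted; HTML 4, an SGML application, lets authors leave out the </li> end tag, for instance. An XML parser rejects the same markup because every element must be closed. The minimal Python sketch below uses the standard xml.etree.ElementTree parser to check two list fragments; the function name is_well_formed is a hypothetical illustration, not part of any LDP tool.

```python
import xml.etree.ElementTree as ET

# HTML 4 (an SGML application) lets authors omit the </li> end tags.
SGML_STYLE = "<ul><li>First item<li>Second item</ul>"
# XML, and therefore XHTML, requires every element to be closed explicitly.
XML_STYLE = "<ul><li>First item</li><li>Second item</li></ul>"


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise.

    Hypothetical helper: it checks XML well-formedness only, not validity
    against any DTD.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("SGML-style markup is well-formed XML:", is_well_formed(SGML_STYLE))
    print("XML-style markup is well-formed XML: ", is_well_formed(XML_STYLE))
```

The first fragment fails because </ul> arrives while an <li> element is still open; the second parses cleanly.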

There are three components to an XML or SGML document that is meant to be read by a person: the content, the markup, and the transformations.

Note: Content, markup and transformations

Steve Champeon does a great job of explaining how content, markup languages, and transformations all fit together in his article "The Secret Life of Markup". Although he is writing from an HTML perspective, the ideas apply equally to DocBook, and the article includes an example of DocBook markup.