
Structured Authoring and XML: Part 2 of 3
2003, Q4 (February 21, 2007)
By Sarah O'Keefe

Editor's Note: Last issue, Sarah O'Keefe introduced the topic of structured authoring and XML (Extensible Markup Language). In this issue, Sarah covers the impact of structured authoring on a publishing workflow and how roles and responsibilities in a workgroup change when structured authoring is implemented. Next issue, she'll explain how to make a business case for XML, and help you decide whether your publishing workflow should be moved to a structured environment.

Read Part One and Part Three

Thirty years ago, technical writers began to make the transition from typewriters to computer-based writing. Initially, word processing programs allowed writers to store text, but formatting was done in a separate typesetting operation (Figure 1).
Figure 1. First-generation word-processing workflow.

Next, the transition from dedicated word processing equipment to personal computers led to word processing software with the added ability to control formatting with embedded formatting codes. Authors learned how to write and format their documents (Figure 2).
Figure 2. Text with formatting codes. Codes let you embed formatting information in a document.

Formatting codes were soon grouped into paragraph styles or tags. Instead of specifying font, font size, alignment, and the like, the author specified a style code, which contained a group of formatting settings (Figure 3).
Figure 3. Text with styles. Paragraph styles reference a style sheet instead of encoding style explicitly for each paragraph.

With paragraph style sheets, a template designer could define the look and feel of documents for an entire workgroup by setting up a formatting template.

In some environments, templates are enforced strictly; in others, individual authors are allowed to customize formatting to suit their document and their personal preferences. When formatting and content development are separated, this "on-the-fly" formatting becomes impossible.

In a structured authoring environment, authors create documents by assembling elements and text in an order permitted by the structure definition document (Figure 4).
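
As a rough illustration of what "an order permitted by the structure definition" means, the sketch below checks the children of each element against a small content model. The element names (Procedure, Title, Intro, Steps, Step) and the regular-expression notation are assumptions made for this example; a real structure definition, such as a DTD, is considerably richer.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical structure definition: for each parent element, a regular
# expression over the names of its children (each name followed by a space).
CONTENT_MODEL = {
    "Procedure": r"Title (Intro )?Steps ",  # Title, optional Intro, then Steps
    "Steps": r"(Step )+",                   # one or more Step elements
}


def structure_errors(element):
    """Return a message for every element whose children break the model."""
    errors = []
    pattern = CONTENT_MODEL.get(element.tag)
    if pattern is not None:
        children = "".join(child.tag + " " for child in element)
        if not re.fullmatch(pattern, children):
            found = children.strip() or "(nothing)"
            errors.append(f"<{element.tag}> may not contain: {found}")
    for child in element:
        errors.extend(structure_errors(child))
    return errors


valid = ET.fromstring(
    "<Procedure><Title>Install</Title>"
    "<Steps><Step>Unpack.</Step><Step>Plug in.</Step></Steps></Procedure>"
)
invalid = ET.fromstring(
    "<Procedure><Steps><Step>Unpack.</Step></Steps>"
    "<Title>Install</Title></Procedure>"
)

print(structure_errors(valid))
print(structure_errors(invalid))
```

The second document contains the right elements in the wrong order, which is exactly the kind of mistake a structured authoring tool prevents the author from making in the first place.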

You might think of structured authoring as being similar to template-based authoring with a strict template. Authors do not assign formatting; the formatting is automatically assigned based on the structure of the document. Formatting may differ for different output media.
Figure 4. Structured authoring from the author's point of view.
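
To make the idea of structure-driven formatting concrete, here is a minimal sketch in which the same structured source is rendered for two media. The element names and both rule sets are invented for illustration; production workflows would typically use a stylesheet language rather than Python dictionaries.

```python
import xml.etree.ElementTree as ET
from html import escape

SOURCE = "<Section><Title>Safety</Title><Para>Unplug the unit first.</Para></Section>"

# Hypothetical formatting rules, keyed by element name. The author never
# touches these; the template designer maintains one set per output medium.
HTML_RULES = {"Title": "<h1>{}</h1>", "Para": "<p>{}</p>"}
TEXT_RULES = {"Title": "== {} ==", "Para": "    {}"}


def render(xml_text, rules, escape_text=False):
    """Format each child of the root according to its element name."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root:
        text = element.text or ""
        if escape_text:
            text = escape(text)
        lines.append(rules[element.tag].format(text))
    return "\n".join(lines)


print(render(SOURCE, HTML_RULES, escape_text=True))
print(render(SOURCE, TEXT_RULES))
```

The source document is identical in both calls; only the rule set changes.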

Changing perceptions

XML and structured authoring result in a completely different way of looking at information. Instead of the familiar page- and paragraph-based metaphor, structured authoring requires that authors consider information as a hierarchy with a separate formatting layer.

A document's formatting can imply a certain structure—for example, a large, sans-serif font often indicates an important heading—but unstructured files do not describe how paragraphs are related to each other (Figure 5). XML makes it possible to encode structure into a document explicitly (Figure 6).
Figure 5. Formatting can imply structure.

Figure 6. XML captures structure explicitly.
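
Because the hierarchy is encoded in the file itself, software can read it back directly. The following sketch uses Python's standard `xml.etree.ElementTree` module to print the outline of a small document; the Chapter, Section, Title, and Para element names are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<Chapter>
  <Title>Getting started</Title>
  <Section>
    <Title>Installation</Title>
    <Para>Run the installer.</Para>
  </Section>
</Chapter>"""


def outline(element, depth=0):
    """Return the element names as an indented outline, one per line."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(DOC))))
```

Nothing in the code guesses at relationships from fonts or spacing; the nesting of the elements states which title belongs to which section.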

Adding metadata to documents

Metadata is information that describes or classifies other information. A word-processing document usually contains basic metadata, such as the document's title, author, and keywords.

Structured authoring supports metadata with elements and attributes. The element names themselves provide metadata; for example, GlossaryTerm and GlossaryDefinition elements identify content more precisely than Heading3 and Body paragraph tags do. Attributes provide a way to label elements with additional information. Once the attributes are set up, you can then include, exclude, or process information differently based on the value of the attributes.

In structured authoring, you can assign metadata to any element in a document. This allows you to label information with several identifiers, including:
  • Version
  • User level
  • Revision date
  • Author

Element attributes give you much finer control over metadata than the basic file-level information you can store in word-processing documents.
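
As a small sketch of including or excluding information by attribute value, the example below keeps only the paragraphs meant for one user level. The `userLevel` attribute name and its values are hypothetical, and the function looks only at direct children of the root to stay short.

```python
import xml.etree.ElementTree as ET

DOC = """<Topic>
  <Para userLevel="beginner">Click Save.</Para>
  <Para userLevel="expert">Or press Ctrl+S.</Para>
  <Para>Your work is stored on disk.</Para>
</Topic>"""


def filter_by_level(xml_text, level):
    """Drop elements labeled for another user level; keep unlabeled ones."""
    root = ET.fromstring(xml_text)
    for child in list(root):  # copy the list because we remove while looping
        if child.get("userLevel", level) != level:
            root.remove(child)
    return [para.text for para in root]


print(filter_by_level(DOC, "beginner"))
print(filter_by_level(DOC, "expert"))
```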

Workflow options

XML and structured authoring do not provide an actual workflow; they must be incorporated into a complete workflow. Before establishing a structured publishing workflow, you should consider the following tasks:
  • Defining content sources
  • Establishing content repositories
  • Implementing content reuse
  • Delivering formatted output

Defining content sources

It's important to examine existing content to establish how it is currently developed and how it will benefit from structure. Other questions include the following:
  • Can all of the content be stored in a single location?
  • Is it necessary to keep different versions of the same content, or can information come from a single source?
  • Who develops the content? How often is it updated?
  • Are there dependencies between different sets of content?

The following table shows a simplified audit of content:
Information product      | Current tool | Benefit from structure? | Dependencies               | Updated?
User guide               | FrameMaker   | Yes                     |                            | Twice a year
Training manuals         | PowerPoint   | Yes                     | Uses info from user guides | Quarterly
Online help              | Dreamweaver  | Yes                     | Uses info from user guides | Monthly
Release notes            | Word         | No                      |                            | No
Marketing white papers   | Word         | No                      |                            | No

Establishing content repositories

A content repository or database is not required to work in XML. However, a content repository makes it possible to manage content modules, which allows you to do the following:
  • Search content by element and attribute
  • Locate content created by a specific author
  • Locate content by topic
  • Identify content chunks that are being used in multiple locations
  • Extract chunks that match certain criteria
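
The searches listed above can be sketched with a toy in-memory "repository." The file names, the `author` attribute, and the `find` helper are all invented for this illustration; a real content repository would provide its own query interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository: file name -> XML content.
REPOSITORY = {
    "install.xml": '<Topic author="lee"><Title>Installing</Title>'
                   "<Warning>Unplug first.</Warning></Topic>",
    "backup.xml": '<Topic author="kim"><Title>Backing up</Title>'
                  "<Para>Copy the files.</Para></Topic>",
}


def find(element_name, **attributes):
    """Return the files that contain a matching element."""
    matches = []
    for name, xml_text in REPOSITORY.items():
        root = ET.fromstring(xml_text)
        if any(
            all(element.get(key) == value for key, value in attributes.items())
            for element in root.iter(element_name)
        ):
            matches.append(name)
    return matches


print(find("Warning"))              # files that contain a Warning element
print(find("Topic", author="kim"))  # topics written by a specific author
```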

XML works very well with content repositories because, as a text format, it is easier to manage than proprietary binary formats (for example, Word's .doc format or FrameMaker's .fm format). Structured authoring also improves consistency across documents, which makes them easier to manage in a content repository. Content can be automatically chunked at specified element levels, which makes content reuse easier.
Figure 7. Structured authoring with a content repository.
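
Chunking at a specified element level can be sketched in a few lines. Here a book is split at the Chapter level and each chunk is stored under its `id` attribute; the element and attribute names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

BOOK = (
    '<Book><Chapter id="intro"><Title>Intro</Title></Chapter>'
    '<Chapter id="setup"><Title>Setup</Title></Chapter></Book>'
)


def chunk(xml_text, element_name, key="id"):
    """Split a document into one XML string per matching element."""
    root = ET.fromstring(xml_text)
    return {
        element.get(key): ET.tostring(element, encoding="unicode")
        for element in root.iter(element_name)
    }


chunks = chunk(BOOK, "Chapter")
print(sorted(chunks))
print(chunks["setup"])
```

Changing the second argument changes the chunk size, which is the sense in which the chunking level is "specified."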

Implementing content reuse

Content reuse, or single sourcing, doesn't require an XML-based workflow. In XML, though, it's easier to enforce the consistency that's required to make content reuse work. Content reuse means that you develop a particular chunk of information once, and then use it wherever it's needed. Reuse can occur across media—for example, a chunk of content is used in both the printed manual and in the online help for a product. In other cases, you might write a chunk of information that's needed in several different printed books. Reusing that chunk minimizes maintenance and ensures consistency across all of the information products (Figure 8).
Figure 8. Content reuse minimizes the total amount of information being developed.
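
A minimal sketch of the reuse idea: each chunk is written once, and each deliverable is just a list of chunk identifiers. The chunk names, text, and deliverables below are made up for illustration.

```python
# Hypothetical chunk library: each piece of content exists exactly once.
CHUNKS = {
    "safety": "Unplug the unit before opening the case.",
    "install": "Run the installer and follow the prompts.",
    "shortcuts": "Press F1 to open help.",
}

# Each information product is assembled from chunk identifiers.
DELIVERABLES = {
    "printed manual": ["safety", "install"],
    "online help": ["safety", "shortcuts"],
}


def assemble(name):
    """Build a deliverable from the current text of its chunks."""
    return "\n".join(CHUNKS[chunk_id] for chunk_id in DELIVERABLES[name])


def where_used(chunk_id):
    """List every deliverable that includes the given chunk."""
    return [name for name, ids in DELIVERABLES.items() if chunk_id in ids]


print(where_used("safety"))
print(assemble("online help"))
```

Editing the "safety" entry once updates both the printed manual and the online help the next time they are assembled.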

Delivering formatted output

Structured authoring separates structure and formatting; this provides both the greatest advantage and the greatest challenge in the structured environment.

Authors are accustomed to working in a visual environment. Requiring them to work solely with tags is impractical, yet providing an approximation of the final output's appearance will bias them toward a particular medium. For example, an authoring environment that looks something like a printed page makes it more difficult to consider how content will function in an online help format.

In the SGML world, software and systems that handle formatting output are well established. They range from proprietary systems that cost $100,000 or more to structured FrameMaker (www.adobe.com), which runs about $800 per seat (it can handle both XML- and SGML-based publishing).

Commercial systems are also available for XML, but many XML developers are focused on using open technologies, such as Extensible Stylesheet Language (XSL) and formatting objects (FO). These are still quite new, and few people are comfortable with them. By 2005, we will certainly see mature, powerful, GUI-driven software that lets you create XSL files.

Roles and responsibilities

The roles and responsibilities in a typical publishing group change when structured authoring is implemented. This section explains how traditional roles change and describes the new role of the document architect. Note that in a small group, one person may hold any or all of these roles.

Document architect

The document architect defines and implements document structure. The document architect must identify information types and establish their required structure. For example, a document architect would build a structure for a company's training manuals.

Template designer

The template designer is responsible for establishing the look and feel of content deliverables, such as books, online help, e-learning, and so on. In traditional desktop publishing, the template designer is usually a tools expert who can create templates in the appropriate publishing tools. In a structured authoring environment, the designer might also be asked to learn XSL, which transforms XML into other formats, such as HTML or, by way of formatting objects, PDF.

Writers

In a structured workflow, writers, as always, create content. In the 1990s, writers often were asked to take on additional formatting and publishing responsibilities; in a structured workflow, these tasks are generally automated. The document architect establishes the overall structure of the documents; the template designer implements a look and feel that is automatically assigned based on the structure of the document.

Many writers who are new to structure are uncomfortable with the perceived lack of control over the final document. They have become accustomed to "tweaking" the final output to make it look right. Any implementation of a structured workflow must anticipate some resistance and perhaps even outright hostility from a minority of writers.

This resistance seems misplaced, though. Instead of wrestling with formatting problems, writers can focus on content and organization—typically a better fit for writers' skills and interests than desktop publishing. Working within a structure increases writer productivity and improves the quality and consistency of the final output.

Technical editors

By enforcing correct structure during content development, a structured workflow eliminates the need for editors to check a document for structure. Instead, editors can focus on word choice, grammar, and overall organization. By automating some of the most tedious parts of the editing job, a structured authoring environment makes it possible for editors to do a more thorough edit in the same amount of time.

Editors are also uniquely positioned to assist with structure implementation. Technical editors see more of a total document library than any other member of a publishing team (with the possible exception of production editors). Because of this familiarity with the overall documentation set, editors are excellent resources to assist in establishing an information architecture.

Editors may also have the skills to establish the needed taxonomy for metadata. Taxonomy is a classification system; in a structured workflow, this would mean defining which elements need attributes and what values those attributes could have.
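
A taxonomy of this kind can be written down as data and checked automatically. In the sketch below, the element names, attribute names, and allowed values are all hypothetical; the point is only that "which elements need attributes and what values those attributes could have" is something a program can enforce.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy: element -> attribute -> allowed values.
TAXONOMY = {
    "Topic": {"version": {"1.0", "2.0"}},
    "Para": {"userLevel": {"beginner", "expert"}},
}


def taxonomy_errors(xml_text):
    """Report attributes or values that the taxonomy does not allow."""
    errors = []
    for element in ET.fromstring(xml_text).iter():
        allowed = TAXONOMY.get(element.tag, {})
        for name, value in element.attrib.items():
            if name not in allowed:
                errors.append(f"{element.tag}: unexpected attribute {name}")
            elif value not in allowed[name]:
                errors.append(f"{element.tag}: {name}={value!r} is not an allowed value")
    return errors


sample = '<Topic version="2.0"><Para userLevel="novice">Hi</Para></Topic>'
print(taxonomy_errors(sample))
```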

Production editors

With formatting generated automatically by the structure of a document, the workload for production editors should decrease. Many production editors will refocus their efforts on the transformation part of the workflow. Instead of correcting formatting errors after the fact, they will get involved in defining the transformation files that assign formatting based on structure. Production editors will verify that output formatting is working correctly.

A difficult transition?

The transition from "free-form" writing to structure can be difficult. Just as some writers dislike working in a template-driven environment where formatting is constrained, some dislike the regimentation of structured authoring. Structured authoring offers the business organization compelling advantages, including improved consistency and increased productivity because manual editing and formatting time are decreased. The widespread implementation of structured workflows will likely result in structure being used to deliver information in ways we have not yet even anticipated. It is indisputable, though, that structured information is more valuable than unstructured information. These advantages must be weighed against the arguments from the writers that writing in a structured environment is "less interesting."

Sarah O'Keefe is President of Scriptorium. She can be reached at okeefe at scriptorium dot com.
