
Structured Authoring and XML: Part 3 of 3
2004, Q1 (February 21, 2007)
By Sarah O'Keefe

Editor's Note: Previously, Sarah O'Keefe introduced the topic of structured authoring and XML (Extensible Markup Language) and covered its impact on a publishing workflow. She also covered how roles and responsibilities in a workgroup change when structured authoring is implemented. In this final installment of her series, Sarah explains how to make a business case for XML. She gives tips on how to decide whether to move your publishing workflow to a structured environment.

Read Part One and Part Two

Not every content-creation group will benefit from structured authoring and XML. Sometimes, the expense of implementation outweighs the benefits realized, especially in smaller groups that produce fewer total pages.

There are a number of imperatives that lead to implementation of structured authoring and XML. The following are some of the most common scenarios:
  • Enabling content exchange between incompatible applications
  • Extracting information from databases for publication
  • Reducing content duplication and reusing information
  • Extracting information based on structure and metadata
  • Improving formatting consistency
  • Reducing author learning curve
  • Improving compliance with required document structure, especially in regulated industries

Enabling content exchange

XML is platform- and vendor-neutral, which makes it an excellent choice as an intermediate format. It is quite common in a single company for two departments to standardize on different, incompatible publishing tools. As a result, the information developed in one department cannot be reused in another department without extensive manual conversion and reformatting work. This leads to "content silos," where each department owns a separate, private set of information, often with significant amounts of content duplication.

Structured authoring and XML can eliminate this silo mentality without necessarily forcing either group to implement the preferred software tools of the other group. Each group authors in its preferred application, and then exports to XML for interchange. Intensive coordination is required to ensure that the structures used by each group are compatible.

To make this work, each group must use a publishing tool that supports XML import/export (Figure 1).
Figure 1. Breaking down content silos.
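
The coordination between groups can be pictured as an agreed mapping between each group's element names. The following is a minimal Python sketch, not a description of any particular publishing tool: the sample export, the element names, and the TAG_MAP table are all hypothetical, and real interchange usually involves more than renaming elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML exported from group A's authoring tool.
GROUP_A_XML = """<topic>
  <heading>Installing the widget</heading>
  <para>Unpack the box.</para>
</topic>"""

# Hypothetical mapping both groups have agreed on:
# group A element name -> group B element name.
TAG_MAP = {"topic": "section", "heading": "title", "para": "p"}

def convert(xml_text, tag_map):
    """Rename every element per tag_map; fail loudly on anything unmapped."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # iter() visits the root and all descendants
        if elem.tag not in tag_map:
            raise ValueError(f"no agreed mapping for <{elem.tag}>")
        elem.tag = tag_map[elem.tag]
    return ET.tostring(root, encoding="unicode")

converted = convert(GROUP_A_XML, TAG_MAP)
print(converted)
```

Raising an error on an unmapped element is a deliberate choice in this sketch: it surfaces structural mismatches between the groups instead of silently passing them through.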

Extracting information from databases for publication

XML provides a useful intermediate format for content that is exported from a database. Most commonly, database publishing is used for parts catalogs, directories, and similar large data sets. The records are extracted from the database and marked up as XML; the XML is then processed to produce the final output (Figure 2).
Figure 2. Database publishing with XML.
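
The two-stage flow in Figure 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production pipeline: the parts table, its columns, and the sample records are invented for the example, and an in-memory SQLite database stands in for whatever database holds the real data.

```python
import sqlite3
import xml.etree.ElementTree as ET

def records_to_xml(conn):
    """Stage 1: extract records and mark them up as XML."""
    catalog = ET.Element("catalog")
    rows = conn.execute(
        "SELECT part_no, name, price FROM parts ORDER BY part_no"
    )
    for part_no, name, price in rows:
        part = ET.SubElement(catalog, "part", {"number": part_no})
        ET.SubElement(part, "name").text = name
        ET.SubElement(part, "price").text = f"{price:.2f}"
    return catalog

def xml_to_text(catalog):
    """Stage 2: process the XML into one output format (plain text here)."""
    lines = []
    for part in catalog.findall("part"):
        number = part.get("number")
        name = part.findtext("name")
        price = part.findtext("price")
        lines.append(f"{number}  {name}  ${price}")
    return "\n".join(lines)

# Hypothetical sample data standing in for a real parts database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part_no TEXT, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [("A-100", "Bracket", 4.5), ("B-200", "Hinge", 12.0)],
)

catalog = records_to_xml(conn)
print(xml_to_text(catalog))
```

Because the second stage reads only the XML, a different output format could be produced by swapping in another formatting function without touching the database extraction.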

Traditionally, database publishing has required customized, application-specific solutions. XML offers a generalized and significantly less expensive approach, which better separates the data generation task from the output formatting task.

Reducing content duplication and reusing information

Imposing structure results in improved consistency of content. This, combined with an XML-based content repository, makes it easier to manage content. Once content is under control, you can search for particular chunks of information and reuse them. The alternative, in a disorganized environment, is that content is written several times. The first writer creates a piece of information. A second writer needs that same information but does not know that the first writer already created it. The second writer rewrites the information. Now, there is duplicated content, which is probably inconsistent. As the two information sources are maintained and updated, they diverge further.

Minimizing the total amount of content being created and modified is one of the most powerful ways to reduce the total cost of content development. Creating what is needed just once requires that all of the writers can locate content as necessary.
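
One way to picture "creating what is needed just once" is a repository that holds a single copy of each chunk, with documents referring to chunks by ID. The Python sketch below is only an illustration: the include element, its ref attribute, the REPOSITORY dictionary, and the sample documents are hypothetical and do not represent the reuse mechanism of any specific tool or standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository: one authoritative copy of each chunk, keyed by ID.
REPOSITORY = {
    "power-warning": "<note>Disconnect power before servicing.</note>",
}

# Two hypothetical documents that reference the same chunk instead of
# each containing its own rewritten copy.
INSTALL_GUIDE = (
    '<chapter><title>Installation</title>'
    '<include ref="power-warning"/></chapter>'
)
SERVICE_MANUAL = (
    '<chapter><title>Maintenance</title>'
    '<include ref="power-warning"/></chapter>'
)

def resolve_includes(doc_xml, repository):
    """Replace each <include ref="..."/> with the chunk stored under that ID."""
    root = ET.fromstring(doc_xml)
    # Snapshot the tree first so replacements do not disturb the traversal.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "include":
                chunk = ET.fromstring(repository[child.get("ref")])
                chunk.tail = child.tail  # keep any text that followed the include
                parent[index] = chunk
    return root

install = resolve_includes(INSTALL_GUIDE, REPOSITORY)
service = resolve_includes(SERVICE_MANUAL, REPOSITORY)
print(ET.tostring(install, encoding="unicode"))
print(ET.tostring(service, encoding="unicode"))
```

Updating the warning in the repository changes both documents the next time they are assembled, so the two copies cannot drift apart.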

Reusing content results in decreased costs, especially as documents are updated from one version to the next. If documents are also translated, significant cost savings will be realized in that effort. The cost savings from translation alone can justify the implementation of an entire structured workflow.

Extracting information based on structure and metadata

Once information is structured and stored with metadata attached to it, it becomes much easier to search for specific information. Consider a structured environment in which each major topic has the following attributes:
  • Author name
  • Revision date
  • Product/topic
  • User level
  • Platform
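
As a rough illustration, attributes like these could be stored on each topic and filtered programmatically. The Python sketch below makes several assumptions: the attribute names, the sample topics, and the fixed reference date are all invented for the example, and a real repository would likely offer its own query facility.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

# Hypothetical topics carrying the attributes listed above.
TOPICS_XML = """<topics>
  <topic id="t1" author="Lee" revised="2006-11-02" product="Server"
         level="administrator" platform="Windows"/>
  <topic id="t2" author="Kim" revised="2004-03-15" product="Server"
         level="administrator" platform="Windows"/>
  <topic id="t3" author="Lee" revised="2006-12-20" product="Client"
         level="end-user" platform="Windows"/>
  <topic id="t4" author="Ana" revised="2006-09-01" product="Server"
         level="administrator" platform="Linux"/>
</topics>"""

def find_topics(root, platform, level, since):
    """Return IDs of topics matching platform and level, revised on or after since."""
    matches = []
    for topic in root.iter("topic"):
        revised = date.fromisoformat(topic.get("revised"))
        if (
            topic.get("platform") == platform
            and topic.get("level") == level
            and revised >= since
        ):
            matches.append(topic.get("id"))
    return matches

# A fixed "today" keeps the example reproducible.
today = date(2007, 2, 21)
one_year_ago = today - timedelta(days=365)
recent = find_topics(
    ET.fromstring(TOPICS_XML), "Windows", "administrator", one_year_ago
)
print(recent)  # ['t1']
```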

Based on these attributes, you could perform a search that extracts all of the topics written in the past year that are Windows-specific and for administrators.

Improving formatting consistency

In a structured environment, formatting is handled automatically based on the structure. "Formatting by rule" greatly improves consistency across a document set: authors or production editors are not required to remember, for example, that in a list of bullets, the first bullet gets a special paragraph tag. Instead, the software applies these types of formatting rules automatically.
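
The first-bullet rule can be expressed as a short Python sketch. The element names (list, item) and the style names (BulletFirst, Bullet) are hypothetical; the point is only that the style is derived from an item's position in the structure, not chosen by the author.

```python
import xml.etree.ElementTree as ET

def assign_styles(list_elem):
    """Return (style, text) pairs; the rule, not the author, picks the style."""
    styled = []
    for position, item in enumerate(list_elem.findall("item")):
        # Hypothetical rule: the first item in a list gets a special style.
        style = "BulletFirst" if position == 0 else "Bullet"
        styled.append((style, item.text))
    return styled

doc = ET.fromstring(
    "<list><item>Unpack</item><item>Connect</item><item>Power on</item></list>"
)
styled = assign_styles(doc)
for style, text in styled:
    print(f"{style}: {text}")
```

If an author later inserts a new first item, the rule restyles the list on the next formatting pass with no manual retagging.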

Reducing author learning curve

Instead of learning to format documents using a specific publishing tool, writers focus on creating and organizing content. The process of formatting information is automated, which greatly reduces the need for writers to act as their own desktop publishers. However, writers do need to learn to assign useful metadata tagging to documents.

Improving compliance with required document structure

United States government contractors, especially those who work with the military, have long been required to deliver documents using specific standards. The aerospace industry also has specific rules for documents such as aircraft maintenance manuals. In these areas, XML's more complex parent, SGML, has been used heavily.

Industries that are heavily regulated and are required to create extensive reports can benefit greatly from the discipline and accountability created by structured authoring workflows. For example, pharmaceutical research companies and drug manufacturers create voluminous amounts of content, which they must provide to government regulatory agencies.

Does your organization need structure?

Armed with basic information about structured authoring, the next logical question is whether your publishing workflow should be moved to a structured environment. In some scenarios, the decision is simple:
  • Content interchange. XML provides an excellent medium for content interchange. If you need to move content from one format to another, structured content will allow you to automate and systematize the process.
  • Enforcing uniformity across a document set. Defining a structure lets you apply and enforce consistency across documents. Larger workgroups, higher turnover, and complex formatting requirements for output all make the automation provided by a structured workflow more appealing.
  • Content management. XML files are in text format, which lends itself to setting up a repository for storage. You can also divide files into small chunks and place them in the repository. The larger the volume of content being produced, the more useful and compelling content management becomes.

Structure is not a panacea for all content development workflows. In some environments, implementing structure will be more trouble than it is worth. The following are some examples where structure probably doesn't make sense:
  • Fiction and other creative writing. Fiction is unlikely to fit into a predefined structure, and it probably doesn't require the type of reuse and management that technical content does.
  • Low-value content. If you do not plan to reuse content, or if a document does not contain sufficient information, the effort of structuring it is probably not worth it. Day-to-day business communications, such as email and memos, generally fall in this category. Be on the lookout, though, for higher-value content, such as complex proposals, that could be reused.
  • Small sets of technical content. Organizations with thousands of pages of content need to consider structure. Organizations with tens of thousands or more pages almost certainly need both structure and content management. An organization that only manages 100 pages of content doesn't need elaborate structure and content management. Somewhere between 100 and 1,000 pages, there is a point where the value of structure outweighs the implementation cost.

Implementing a structured workflow

If you decide to establish a structured workflow, expect a lengthy and possibly painful transition. In an environment where formatting templates are already established and enforced consistently, the addition of structured templates should be relatively straightforward. A workgroup making a transition from a "free-form" authoring environment where templates aren't used to structured authoring should expect major disruption. Structured authoring will completely change the authoring experience.

A minimal implementation process requires that you do all of the following:
  • Analyze content and develop structure definitions
  • Design a new publishing workflow
  • Roll out the new workflow
  • Train users
  • Set up a maintenance process

Analyzing content and developing structure definitions

Document analysis requires different skills from template design. Instead of creating formatting tags based on a document's appearance, the document architect must identify content elements. Often, formatting is a visual indicator of structure (for example, headings are usually larger than surrounding text), but structure elements may be needed in areas where formatting does not provide a cue. The document architect begins by reviewing existing documents and analyzing their structure. Any structure that is developed must also take into account new document types that might be needed.
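
To make the idea of a structure definition concrete, the Python sketch below describes, for each element, which child elements it may and must contain, and then checks a document against those rules. It is a deliberately simplified stand-in for a real DTD or schema: the element names and the STRUCTURE table are hypothetical, and the check ignores element order, repetition limits, and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure definition: for each element, the children it may
# contain ("allowed") and the children it must contain ("required").
STRUCTURE = {
    "procedure": {"allowed": {"title", "intro", "step"},
                  "required": {"title", "step"}},
    "title": {"allowed": set(), "required": set()},
    "intro": {"allowed": set(), "required": set()},
    "step": {"allowed": {"note"}, "required": set()},
    "note": {"allowed": set(), "required": set()},
}

def check(elem, structure, errors=None):
    """Collect structure violations for elem and its descendants."""
    errors = [] if errors is None else errors
    rule = structure.get(elem.tag)
    if rule is None:
        errors.append(f"<{elem.tag}> is not defined")
        return errors
    present = set()
    for child in elem:
        present.add(child.tag)
        if child.tag not in rule["allowed"]:
            errors.append(f"<{child.tag}> is not allowed inside <{elem.tag}>")
        else:
            check(child, structure, errors)  # only descend into valid children
    for tag in sorted(rule["required"] - present):
        errors.append(f"<{elem.tag}> is missing required <{tag}>")
    return errors

good = ET.fromstring(
    "<procedure><title>Reset</title><step>Hold the button."
    "<note>Ten seconds.</note></step></procedure>"
)
bad = ET.fromstring("<procedure><intro>About</intro><para>Text</para></procedure>")

print(check(good, STRUCTURE))
print(check(bad, STRUCTURE))
```

In this sketch, a heading-like element such as title is required because of what it is, not because of how it looks, which mirrors the shift from formatting tags to content elements.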

Designing a new publishing workflow

Once a structure definition is established, it is time for the most controversial part of the implementation process: choosing tools. The following tools will be needed:
  • Authoring tool for creating structured documents
  • Formatting tool in which automated formatting definitions are set up
  • Content management system to keep track of content

A detailed discussion of tools is beyond the scope of these articles. The content management system is likely to be by far the most expensive component of a structured workflow and requires the most extensive analysis.

Rolling out the new workflow

A rollout will require two major tasks: notifying users about what is coming and installing the software, servers, and systems that make everything work. Larger numbers of users will add complexity, as will different location types. For example, rolling out a new system to users in two offices would be relatively simple. Integrating hundreds of users in remote home offices adds a degree of difficulty.

Training users

Users need training in several different knowledge areas:
  • Structured authoring concepts
  • Basic XML concepts
  • Creating usable metadata
  • Working with a content management system

If writers are not accustomed to creating content for multiple output formats, they may also need training on how to write modular, delivery-neutral information.

Setting up a maintenance process

Once the structured workflow is established, it is critical to set up a process that allows authors to request changes to the structure and the metadata framework.


Structured authoring offers the prospect of automated formatting and better management of information. New skills are required both to implement a structured workflow and to work within it. Treating content as complex data that can be managed and manipulated requires a significant shift in mindset from authors, editors, and other publishing professionals.

Sarah O'Keefe is the President of Scriptorium Publishing. She can be reached at okeefe at scriptorium dot com.
