
Managing Valuable Chapter Content
2006, Q2 (February 19, 2007)
By Meredith Kinder and Sheila Loring

One of the ways that STC chapters can improve their value proposition to present and prospective members is to make available some of the best content created and owned by chapter members. Our chapter, the STC Carolina Chapter, is a mid-sized chapter with members from many professional specialties, dispersed across central and eastern North Carolina. We have been offering our quarterly newsletter online, in PDF, for several years. This year we recognized that some of the feature articles in past issues contain valuable content that is still relevant and worth providing to our members in a medium more flexible than PDF. By providing that content in HTML on dedicated pages of our chapter Web site, we made it more accessible to members who might not want to dig through years' worth of PDFs.

While most chapters presently put issues of their newsletter in PDF on their Web site, few offer their best articles in any easy-to-find manner. Some chapters offer their newsletters in HTML but do not have a search engine for their site content, nor do they pull out the feature articles as a separately navigable resource. Other chapters have switched to an HTML newsletter, but do not offer those issues on their Web site. Since 2001, we have been publishing large and colorful newsletters with relevant articles that could stand on their own. Here is a brief sample list of some of the feature articles:
  • Web Design for Small Companies: Pretend that You Have a Programmer, by Kim Flint
  • Mentoring as a Two-Way Street, by Andy Smith
  • Wield the Power of the Written Word, by Michael Uhl
  • Improving Technical Reviews, by Alexia Idoura

There are also articles on structured authoring, human factors, medical writing, and quality metrics, as well as on balancing parenting with work, learning XML, and foretelling professional trends. All of these are as useful today as when they were written, whether that was last month or two years ago.

By scanning the list for a catchy title or a particular author, users can easily find these timeless articles. In addition, because the articles are in HTML, their content can be indexed by the search engine on the chapter's Web site. So when a user types in "mentoring" or "editing," the results can include links to the relevant articles. This content makes the chapter Web site more useful and is also another way of acknowledging the talent within your chapter membership.

Converting New Articles to HTML

We now have a process for generating and publishing the newsletter that includes saving some of the articles to HTML for posting on the Web site. Collaboration between the newsletter team and the webmaster has made the conversion process easy.

The articles are written and edited in Word. The production editor copies and pastes the articles into an Adobe InDesign template, refines the layout and formatting, and converts the files to PDF. While the print newsletter is being developed, the webmaster converts the original Word files to HTML. The conversion and cleanup involve the following steps:
  1. Convert the Word files to HTML using a shareware program called Word Cleaner, which converts Word files to HTML and strips Microsoft Office styles from the result. Microsoft Word can save a document as filtered HTML; however, some Office styles still remain in the files. For example, most paragraphs are wrapped in <p class=MsoNormal> elements, and empty paragraphs look like this in the HTML:
    <p class=MsoNormal>&nbsp;</p>
    Word also embeds a stylesheet in the head of the HTML file, even if you save the document as filtered HTML. Depending on the variations in formatting, the stylesheet can range from 10 to over 40 lines of rules for Word's own classes, such as MsoNormal.
    We delete this extraneous code so that all web pages are formatted consistently and file sizes are kept to a minimum.
  2. Scan the HTML files for structural and formatting issues that must be corrected manually. Copy the title of the article into the <title> and <h1> elements, wrap subheadings in <h2> and <h3> elements, and put bulleted and numbered lists in <ul> or <ol> elements with <li> items.
  3. Type in the server-side include (SSI) commands for the header and footer. We store header and footer navigation, CSS and RSS feed links, and other frequently used content in separate files. The commands look like this:
    <!--#include virtual="/includes/header1.shtml" -->
    <!--#include virtual="/includes/header2.shtml" -->
    <!--#include virtual="navigation.shtml" -->
    <!--#include virtual="/includes/footer.shtml" -->
    When a browser requests a page with the .shtml extension, the server replaces each include command with the contents of the named file before sending the page. This method enforces consistent navigation and formatting throughout the web site.
  4. If the images weren't embedded in the source file, insert them in the HTML. Add alternative text and the image width and height to the <img> elements.
  5. Add a link to the article on the Newsletter Archives web page and upload all files.
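To make steps 1 and 4 more concrete, here is a minimal Python sketch of the kind of cleanup involved. It is not how Word Cleaner works. It assumes the filtered HTML uses class=Mso... attributes, a single embedded <style> block, and &nbsp; in empty paragraphs. The helper names clean_word_html and find_incomplete_images are hypothetical. A regular-expression pass like this suits predictable Word output, but it is not a general HTML parser, so check the results by eye.

```python
import re
from html.parser import HTMLParser

# Word's embedded stylesheet: drop every <style>...</style> block.
STYLE_BLOCK = re.compile(r"<style\b.*?</style>", re.IGNORECASE | re.DOTALL)
# Office class attributes, quoted or not, e.g. class=MsoNormal or class="MsoTitle".
MSO_CLASS = re.compile(r"""\s+class=(?:"Mso\w*"|'Mso\w*'|Mso\w*)(?=[\s>])""", re.IGNORECASE)
# Paragraphs that contain nothing but whitespace or &nbsp;.
EMPTY_PARA = re.compile(r"<p>(?:\s|&nbsp;)*</p>\s*", re.IGNORECASE)


def clean_word_html(markup: str) -> str:
    """Strip Word's stylesheet, Mso classes, and empty paragraphs (hypothetical helper)."""
    markup = STYLE_BLOCK.sub("", markup)
    markup = MSO_CLASS.sub("", markup)  # runs first so EMPTY_PARA sees a bare <p>
    return EMPTY_PARA.sub("", markup)


class ImageChecker(HTMLParser):
    """Record <img> elements that lack alt, width, or height (hypothetical helper)."""

    REQUIRED = ("alt", "width", "height")

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[tuple[str, list[str]]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; <img ... /> also lands here.
        if tag != "img":
            return
        present = {name for name, _ in attrs}
        missing = [name for name in self.REQUIRED if name not in present]
        if missing:
            self.problems.append((dict(attrs).get("src") or "?", missing))


def find_incomplete_images(markup: str) -> list[tuple[str, list[str]]]:
    checker = ImageChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


sample = """<html><head><style>p.MsoNormal {margin:0in;}</style></head>
<body><p class=MsoNormal>Mentoring as a Two-Way Street</p>
<p class=MsoNormal>&nbsp;</p>
<img src="mentor.jpg" alt="Mentor and mentee"></body></html>"""

cleaned = clean_word_html(sample)
print(cleaned)
print(find_incomplete_images(cleaned))  # [('mentor.jpg', ['width', 'height'])]
```

The image check only reports problems. Choosing the alternative text and the real image dimensions is still a human step.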

Single sourcing workflow

That's our single-sourcing workflow, on a shoestring budget and with minimal volunteer resources. The STC Single-Sourcing SIG should be proud of us; we now publish a single set of content to PDF and to HTML on the Web.

Converting Archived PDFs to HTML

We had a backlog of over three years of legacy content that took some effort to transform into HTML, but one volunteer tackled it in about three working days spread over two weeks. Here's the general process:
  1. Open the PDF file in Adobe Reader, select and copy the text, and paste it into a text editor (in this case, Notepad). This ensures that you have just text, with no special characters or formatting.
  2. Do this for several articles at a time, say for an entire issue. Most issues had three or four good articles.
  3. Copy the text into a Microsoft Word document to remove the forced line returns. First, search for "^p^p" (two consecutive paragraph marks, which indicate a real paragraph break) and replace each occurrence with a unique, noticeable string, such as "&-&-&". This delineates the article's paragraphs with the unique string.
  4. Remove the remaining single paragraph marks, which are the forced line returns within paragraphs, by searching for "^p" and replacing them with a blank space.
  5. Replace the placeholder string "&-&-&" with "</p><p>". This encloses nearly all the text in <p> elements. Add the opening <p> before the first paragraph and the closing </p> after the last one by hand.
  6. Copy the text back into a text editor and save it as a web page.
  7. Clean up the HTML (see steps 2 and 3 in "Converting New Articles to HTML").
  8. Either export the graphics from Acrobat (if you have Acrobat 6 or higher) or screen capture them from the original PDF. Crop the graphics, if necessary, and save them as JPEGs.
  9. Send the files to the webmaster.
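Steps 3 through 5 amount to a three-pass search and replace, which can be sketched in Python as follows. The sketch assumes that the text copied from Adobe Reader ends every line with a line break and separates paragraphs with a blank line. The function name rejoin_paragraphs is hypothetical. Unlike the Word procedure, it also escapes &, <, and > so the output is valid HTML. This has a side benefit: the &-&-& marker can never clash with the article text. It also adds the opening and closing <p> tags at the ends.

```python
import html
import re

MARKER = "&-&-&"  # the placeholder string from step 3


def rejoin_paragraphs(text: str) -> str:
    """Turn hard-wrapped plain text into <p> elements (hypothetical helper)."""
    # Normalize Windows line endings and trim stray blank lines at either end.
    text = text.replace("\r\n", "\n").strip()
    # Escape &, <, and >. Every "&" becomes "&amp;", so MARKER cannot occur in the text.
    text = html.escape(text, quote=False)
    # Step 3: a blank line (two or more line breaks) is a real paragraph break.
    text = re.sub(r"\s*\n\s*\n\s*", MARKER, text)
    # Step 4: any remaining single line break is a forced line return.
    text = re.sub(r"[ \t]*\n[ \t]*", " ", text)
    # Step 5: each marker becomes a paragraph boundary; wrap both ends as well.
    return "<p>" + text.replace(MARKER, "</p><p>") + "</p>"


sample = "Mentoring is not a one-way\nstreet.\n\nBoth partners learn\nsomething new & useful.\n"
print(rejoin_paragraphs(sample))
# <p>Mentoring is not a one-way street.</p><p>Both partners learn something new &amp; useful.</p>
```

Headings and lists still come out as ordinary paragraphs, so the manual cleanup in step 7 is still needed.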

Although the process sounds labor intensive, it went quickly: a batch of six to eight articles usually took about an hour. If you have a web site management tool like Dreamweaver or HomeSite, or a development environment such as Visual Studio, you can search and replace across many HTML files at once, which makes the task even easier.
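If none of those tools is available, a short script can apply the same search-and-replace across a whole folder. This is a minimal Python sketch, not part of the chapter's process. The replacement pairs and file names are hypothetical examples. Because the script rewrites files in place, run it on a copy first.

```python
import tempfile
from pathlib import Path

# Hypothetical replacement pairs (find, replace), applied in order.
# The empty-paragraph pattern must come before the general MsoNormal one.
REPLACEMENTS = [
    ("<p class=MsoNormal>&nbsp;</p>", ""),
    ("<p class=MsoNormal>", "<p>"),
]


def replace_in_folder(folder: Path, pattern: str = "*.html") -> dict[str, int]:
    """Apply REPLACEMENTS to every matching file; return replacement counts per file."""
    counts: dict[str, int] = {}
    for path in sorted(folder.glob(pattern)):
        text = path.read_text(encoding="utf-8")
        total = 0
        for old, new in REPLACEMENTS:
            total += text.count(old)
            text = text.replace(old, new)
        if total:  # only rewrite files that actually changed
            path.write_text(text, encoding="utf-8")
            counts[path.name] = total
    return counts


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "mentoring.html").write_text(
        "<p class=MsoNormal>Hello</p><p class=MsoNormal>&nbsp;</p>", encoding="utf-8"
    )
    (folder / "notes.txt").write_text("<p class=MsoNormal>skip</p>", encoding="utf-8")
    result = replace_in_folder(folder)
    after = (folder / "mentoring.html").read_text(encoding="utf-8")

print(result)  # {'mentoring.html': 2}
print(after)   # <p>Hello</p>
```

Plain string replacement keeps the sketch predictable. Patterns that vary from file to file would need regular expressions instead.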

Final Thoughts

As technical communicators, we should practice what we preach. The value of content should not be underestimated, especially when the content can help your members and be offered as a valuable asset to all those in the profession. Imagine the wealth of content that could be available if all the chapters in STC made their best newsletter articles more readily available on Web sites.

In the future, this content could be stored in a database, with the HTML pages created dynamically as needed. Many businesses already do this with their technical content, and as the amount of content grows, a content management system becomes essential. For now, we'll start with a few manually created HTML pages. If nothing else, the chapter gives volunteers a place to practice single-sourcing ideas with a manageable amount of material in a friendly, deadline-free environment. Once you understand the requirements for posting on the Web and for publishing in PDF, you can refine your process to handle both.

We encourage other chapters to adopt this process of making content available to members. If you want details about how we output to HTML, how we handled our legacy content, or how we provide a search engine for our site content, feel free to contact us. If you have experience doing this for your chapter or ideas for improving the process, let us know.

For more information, visit our Web site, http://www.stc-carolina.org. To download our PDF newsletter and read the articles in HTML format, select Newsletter in the chapter information list.

Meredith can be reached at Meredith dot Kinder at sas dot com. Sheila can be reached at loring at scriptorium dot com.
