Preparing Word Documents For The Web.

 

When we learned word processing, most of us were told that a word processor was nothing but a glorified typewriter, so that is what we told ourselves. But it is NOT. In fact, it is very different, and understanding the difference will help you understand WHY your document looks different on the web, and HOW to fix it. Instead of blaming Word, we need to look at ourselves for preparing documents in a typewriter frame of mind.

 

At the core, we need to examine the very basic concept of what a paragraph is.

 

What is a paragraph?

 

How many paragraphs do you see in this:

 

 

This is of course a trick question. In terms of content, there are three paragraphs. BUT as far as the computer is concerned, there are more than you see. Many of us were trained to think of a word processor as nothing but a glorified typewriter. On a typewriter, we always pressed the carriage return twice to advance to the next paragraph, so of course we continue to do the same on the computer. Making the paragraph marks visible, we would see:

 

 

There are 7 paragraphs!

 

How is this relevant? Word 2003 defaulted to an inter-paragraph spacing of 0, to mimic the typewriter. Word 2007 now defaults to proper paragraph spacing.

 

HTML defaults to proper inter-paragraph spacing, so there is no need in HTML to press the Enter key a second time in order to separate the paragraphs. This is why, if you save your Word document as HTML, you may see a lot of spacing in between paragraphs. Each time the browser sees the paragraph tag, it adds the space. This is now a problem, since OLS messages need to be in HTML, and you have all those extra Returns. So long as you keep all those extra paragraph marks, you may continue to have the problem.
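To see why the extra Returns turn into extra space, here is a minimal sketch in Python. It is not how Word actually exports HTML; the function name `to_html_paragraphs` and the sample text are made up for illustration, and filling an empty paragraph with `&nbsp;` is an assumption about how a converter keeps a blank line visible. The point is only that every paragraph mark, including the "blank line" ones, becomes a paragraph of its own.

```python
def to_html_paragraphs(text: str) -> str:
    """Wrap every paragraph (one per paragraph mark) in <p> tags.

    Hypothetical helper: a simplified stand-in for a converter,
    not Word's real "Save as HTML" output.
    """
    paragraphs = text.split("\n")
    # Assumption: an empty paragraph is kept as a non-breaking space,
    # so it still takes up a line in the browser.
    return "\n".join(f"<p>{p or '&nbsp;'}</p>" for p in paragraphs)


# Typewriter habit: Enter pressed twice between two paragraphs.
typed = "First paragraph.\n\nSecond paragraph."

html = to_html_paragraphs(typed)
print(html)
print(html.count("<p>"), "paragraph tags for 2 paragraphs of content")
```

Two paragraphs of content come out as three paragraph tags, and the browser spaces each of them.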

 

We now know that we have to take all of our course materials and save them as HTML. We can prepare for this by “cleaning up” the old text formatting that was driven by the concept of the typewriter.

 

This is not the only problem in transitioning to HTML. Tabs won’t work; there is no HTML tag for the tab. First line indent does not work in HTML either. If you have been using either of these, the formatting may indeed get lost when converted to HTML. In a single-spaced environment, indenting the first line is not necessary to identify paragraphs. Spacing between paragraphs is normally used to visually separate them, and that is the normal HTML format for a paragraph: WITH inter-paragraph spacing.

 

Both of these problems can be fixed quite quickly with the use of Word’s interesting little-known Search feature. You can search for paragraph marks and tabs and replace them with whatever you want!

 

To fix the paragraphs, Replace ^p^p with ^p (note the lower case p):

 

 

Keep doing that until Word reports there are 0 replacements (though sometimes it gets stuck on 1).  In a few moments all those double Returns will be gone. If you see no spacing between paragraphs in Word, that is OK – you will see the spacing in HTML, and you can fix the paragraph spacing in Word to show that space we want in between paragraphs.
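Why do you have to repeat the Replace? A rough sketch in Python shows the idea, assuming that a newline character (`\n`) stands in for Word's `^p` and that the sample text is made up. The helper name `collapse_double_returns` is hypothetical. Where three Returns sit in a row, one pass only turns them into two, so a second pass is needed.

```python
def collapse_double_returns(text: str) -> tuple[str, int]:
    """Replace double newlines with single ones until none are left.

    Returns the cleaned text and the number of passes that changed it.
    Hypothetical helper; a newline here plays the role of Word's ^p.
    """
    passes = 0
    while "\n\n" in text:
        text = text.replace("\n\n", "\n")  # one "Replace All" pass
        passes += 1
    return text, passes


# Three Returns after the first paragraph, two after the second.
messy = "First paragraph.\n\n\nSecond paragraph.\n\nThird paragraph.\n"

clean, passes = collapse_double_returns(messy)
print(repr(clean))
print("passes needed:", passes)
```

With this sample, the first pass leaves a double Return behind after the first paragraph, and the second pass removes it.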

 

Fixing paragraph spacing may be a bit more difficult if you have done a lot of highlighting while formatting. Every time you highlight text and apply some formatting, Word will add a new Paragraph Style. Eventually, your document gets bloated with a confusing mess of styles. Assuming you have done NO highlighting, you can see your paragraph styles using Format>Styles and Formatting. A style-pure document will have the following styles:

 

Normal controls all paragraphs having the Normal style, which is all text as long as you have not been highlighting.

 

If you change the Normal style, you will fix all paragraphs having the Normal style – that is to say everything.

 

Click on Normal, then Modify

 

 

Then Select Paragraph

 

 

You can now adjust the Normal style, and that will globally fix the problem of paragraph spacing in this document. It is normal to use paragraph spacing above and below equal to half the font size. So if you are using a font size of 12 points, you can put 6 points above and 6 points below. This affects your whole document, adjusting the space between paragraphs without using the extra Returns.
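The half-the-font-size rule is simple arithmetic. As a small sketch (the function name `paragraph_spacing` is made up for illustration), in Python it looks like this:

```python
def paragraph_spacing(font_size_pt: float) -> tuple[float, float]:
    """Return (space_above, space_below) in points: half the font size each."""
    half = font_size_pt / 2
    return half, half


above, below = paragraph_spacing(12)
print(f"12 pt font: {above} pt above, {below} pt below")
```

These are the two values you would type into the Spacing boxes of the Paragraph dialog for the Normal style.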

 

Tabs also can be cleared quickly by searching for ^t (lower case t) and replacing with nothing!
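The same operation can be sketched in Python, assuming the tab character (`\t`) stands in for Word's `^t`. The helper name `strip_tabs` and the sample text are hypothetical.

```python
def strip_tabs(text: str) -> str:
    """Remove every tab character, like replacing ^t with nothing."""
    return text.replace("\t", "")


# Typewriter habit: a tab at the start of each paragraph as an indent.
indented = "\tFirst paragraph.\n\tSecond paragraph.\n"

print(repr(strip_tabs(indented)))
```

This works cleanly for tabs used as first-line indents. If a tab sits between two words, replacing it with nothing will run the words together, so look over any lines where tabs were used to line up columns.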

 

Special notes on math formatting in Word and in HTML.

 

Whenever we insert any kind of math symbol other than the normal linear functions, there are problems with vertical spacing within the paragraph (line spacing, as opposed to paragraph spacing). Inter-line spacing is called “leading,” after the strips of lead that typesetters placed between lines of type to hold the line spacing firm. As soon as you introduce something like a square term, the raised 2 goes up into the leading space. Radicals make it worse. Including a square inside of the radical, well…. For this reason, if you check our professionally prepared textbooks, a vast majority of math expressions get their own paragraph, centered.

 

We must therefore expect some problems when we have math expressions, especially if they are not separated out into their own centered paragraphs. The situation can get worse when you save as HTML. Sometimes you have to take steps manually to adjust vertical spacing in and around math expressions, even if you set the graphic property to be in the Middle of the baseline.

 

You also have to remember that the viewer can change the Window Width, so they may see something very different from you. On your screen, the math expression could be way over on the left, but on another screen, it may appear way over on the right.
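You can get a feel for this without a browser. The sketch below uses Python's standard `textwrap` module to wrap the same sentence at two widths and report where an inline expression lands. It is only an analogy: a real browser wraps by pixels with proportional fonts, and the helper name `locate`, the sample sentence, and the two widths are made up for illustration.

```python
import textwrap


def locate(text: str, needle: str, width: int):
    """Wrap text to the given width and return (line_number, column)
    of the first wrapped line containing needle, or None if not found."""
    lines = textwrap.wrap(text, width=width)
    for line_number, line in enumerate(lines, start=1):
        column = line.find(needle)
        if column != -1:
            return line_number, column
    return None


sentence = ("Every point on a circle of radius r satisfies "
            "x^2+y^2=r^2 and nothing else does.")
expression = "x^2+y^2=r^2"

for width in (40, 60):  # a narrow window and a wide window
    print(width, locate(sentence, expression, width))
```

In the narrow window the expression drops to the second line, near the left edge. In the wide window it stays on the first line, far to the right. The words are identical; only the window width changed.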

 

Consider centering your math expressions as their own paragraphs. If you do keep them inline, manually inspect the content, changing the window width to see it at different widths. Adjust the vertical spacing any way you can.

 

So… before you save your document as HTML

 

Consider clearing all formatting that resulted from highlighting text and applying things like font and color. Remember that many of your favorite fonts will not reside on the reader’s machine.  An easy way to clear all formatting is to use Format>Styles and formatting to view the paragraph styles in your document. Then use Ctrl-A to highlight all text, and select “Clear Formatting” in the styles window.  This alone can clear up a lot of problems!

 

Then….

 

  1. Replace all <Enter><Enter> with <Enter> as above
  2. Replace all Tabs with nothing
  3. Inspect your document for any obvious need for isolating the math expression and setting it up on its own centered line, just as most of our textbooks do.

 

Even then, you should inspect the resulting page in a browser before you decide you are satisfied with it.