Chapter 5

What You Need to Know About HTML


CONTENTS


As you can see from this outline, we have a lot of ground to cover. But please do not feel intimidated by the length of this chapter; you will soon see that HTML is really quite simple. In fact, if you can use a word processor or even Windows NotePad, you can write HTML. The topics in this chapter are ordered more or less by increasing difficulty and (roughly) by the evolution of HTML itself. If a topic begins to feel too advanced, just skip it and come back later. Because perhaps 90 percent of your HTML documents will consist of just the basic stuff, let's start with that.

Tip
Here is a tip that might help you learn HTML by seeing how the experts write it. The next time you are browsing the Web and come across an interesting page, it's easy to find out how the page is written in HTML. Use the View Source option of Netscape or Explorer on any page that could serve as an example for pages you want to create yourself. Then you can read through the source window line by line, referring to this chapter or Appendix A to learn about the HTML syntax.

Basic HTML

HTML files are plain ASCII text. HTML files contain two things: content (which is the message that you want to present) and tags (which are commands to the browser about how the content should appear). The content is completely up to you of course. The tags are simply an agreed-upon set of a few dozen character sequences which start with the less-than symbol (<) and end with the greater-than symbol (>). For example, one tag is <HTML>; another is <TITLE>.

Note
In order to help them stand out from regular text, I will try to write all of the HTML tags in this book using uppercase. However, HTML tags are not case-sensitive, so you will often see them written in lowercase too. It is a matter of your own personal taste how you choose to write them, but one suggestion is to be consistent however you do it.

In most cases, the content must reside between a pair of similar tags. In other words, tags come in pairs to mark the beginning and ending of the text (or content) in between them. The ending tag is the same as the beginning tag, except a forward slash (/) is added like this: </TITLE>. For example:

<TITLE> This is my first HTML document. </TITLE>

When the browser loads an HTML file, the title of the document is displayed in the titlebar at the top of the browser window. The author of the document (sometimes called the page designer) uses the <TITLE> tags to give the document a name, which serves as an aid in referencing it.

HTML Comments

I oversimplified things a bit when I said that files consist only of tags and content. Some of the text in an HTML file might be considered overhead to help manage the whole process. Strictly speaking, not everything between the tags will be a part of your content. One example of what I mean by this is that you can insert comments into the HTML file. Comments are not processed or displayed by the browser; they are ignored. So why have them? Comments allow the page designer to speak to anyone who will edit or maintain the HTML file in the future. This brings us to a subtle but important point: HTML is like a computer language between the author and the browser program, but it also has to support person-to-person communication when viewed in its raw text format.

Comments start and end with two dash characters (--) and must be embedded between an opening tag (<!) and a closing tag (>), like this:

<!-- Comments can be inserted for future reference -->

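A comment written this way can span several lines. This is rather handy when you already have a block of experimental HTML code and you want to disable it without deleting it. All you have to do is wrap it in the comment markers. Some browsers (Internet Explorer among them) also recognize a <COMMENT> ... </COMMENT> pair of tags for the same purpose, but that pair is not part of the HTML 2.0 specification, and browsers that don't recognize it will display the text inside. The standard <!-- ... --> form is the safer choice.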
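As a small sketch of the disabling trick, the fragment below wraps an experimental block in a standard comment that spans several lines. The text is invented for illustration, and the <P> and <H2> tags it uses are covered later in this chapter. One caution: the block being disabled should not itself contain a comment or a double dash (--), because that sequence can end the comment early.

```html
<P> This paragraph is displayed. </P>
<!-- Disabled for now; remove the comment markers to restore it.
<H2> Experimental Section </H2>
<P> This paragraph is not displayed. </P>
-->
<P> This paragraph is displayed too. </P>
```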

A Basic Template

There is a very well defined and simple outline for every HTML file. Take a look at Listing 5.1, where you see the smallest standard HTML document.


Listing 5.1. This is the basic structure of any HTML file.

<HTML>
<HEAD>
<TITLE> Your first HTML document </TITLE>
</HEAD>
<!-- Comments can be inserted for future reference -->
<BODY>
<H1> Hello, World! </H1>
</BODY>
</HTML>

Listing 5.1 is like a template just waiting for you to insert more content and tags as you see fit. First, let's talk about the new tags that are introduced here. The <HTML>...</HTML> pair marks the beginning and ending of a valid HTML document. The <HEAD>...</HEAD> pair sets off the part of the document which the browser uses internally; it isn't really displayed as part of the content in the main window of the browser. Notice that the <HEAD> section is closed before the <BODY> section begins. It is important to understand the proper nesting rules of HTML. Each pair of tags is like a sandwich, and only certain toppings can go in between.
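To make the sandwich picture concrete, here is the skeleton of Listing 5.1 again, indented to show which pair encloses which. The indentation and the comments are only an aid for the reader; the browser ignores them.

```html
<HTML>
  <HEAD>
    <!-- TITLE belongs inside HEAD -->
    <TITLE> Your first HTML document </TITLE>
  </HEAD>
  <!-- HEAD is closed before BODY opens -->
  <BODY>
    <!-- headings and other content belong inside BODY -->
    <H1> Hello, World! </H1>
  </BODY>
</HTML>
```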

The <BODY> section is the main part of any HTML file. It is where you finally get to display your message. Referring to the concept of nesting again, notice how the <BODY>...</BODY> pair starts and ends within the <HTML>...</HTML> pair. Here is one example of an invalid nesting sequence:

<HTML>...<BODY>...</HTML>...</BODY>

The <H1>...</H1> tags describe a level one heading. Headings are used to mark major sections in a document, sort of like chapter titles and section titles in this book. Browsers are expected to display level one headings with a larger or bolder font than any other heading level. There are six levels of headings in HTML, but <H4>, <H5>, and <H6> are seldom used. You can create heading tags in pretty much the same fashion that you would create indent levels in an outline. Because headings are included within the <BODY> section, they do display in the main window of the browser as a part of the document content. Quite often, the level one heading might simply repeat the title of the document.
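As a sketch of the outline analogy, the fragment below uses three heading levels the way an outline uses indent levels. The section names are invented for illustration.

```html
<!-- Section names are placeholders, not taken from a real document -->
<H1> Employee Handbook </H1>
<H2> Benefits </H2>
<H3> Health Insurance </H3>
<H3> Vacation </H3>
<H2> Policies </H2>
<H3> Travel Expenses </H3>
```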

From Listing 5.1 you can probably see how easy it would be to write a basic HTML document in NotePad. I'm sure you'll get the picture after you see Figure 5.1, which is the trivial document from Listing 5.1 displayed in the Internet Explorer Web browser.

Figure 5.1: A very simple HTML document.

Notice in Figure 5.1 how Internet Explorer displays the title of the document in the titlebar of the browser and the level one heading, Hello, World!, in the main window. So now you see how it takes nine lines of HTML code to write one simple message. Don't despair; it's usually not that bad. Once you have the basic framework in place, you can easily add more content. Let's move on to a few more basic tags.

Other Basic HTML Tags

Because most documents include text in the form of sentences within paragraphs, HTML has a pair of tags to mark the beginning and ending of a paragraph. The <P>...</P> tags serve to separate paragraph blocks with a small amount of vertical space when displayed. Starting with HTML 3.2, there are even some attributes you can use to indicate the horizontal alignment of the paragraph within the margins of the browser.

Note
You might see many HTML pages that don't use the </P> tag. It is optional because the browser can infer where a paragraph ends: at the next <P> tag, or at any other tag that starts a new block. The </P> tag at the end of a paragraph has no effect on vertical spacing. Basically, the browser will supply vertical spacing whenever it encounters the <P> tag. The combination of these ideas leads many to think that the <P> tag should go at the end of a paragraph, rather than the beginning, but that is also optional. There is some HTML literature that indicates it is bad style to have paragraph text not enclosed within a pair of tags. However, if you follow an <H1>...</H1> block immediately with a <P> tag, the browser might add more vertical spacing than you want.

Learning about attributes entails going one level further into the details of HTML. That's the bad news. The good news is it isn't very hard and you are rewarded with the ability to, say, center a paragraph or specify the graphic background for the whole page.

Here is an example of a paragraph with a center alignment attribute (remember, the </P> tag is really optional):

<P ALIGN="CENTER"> This paragraph will be centered horizontally. </P>

Note how the ALIGN attribute is contained within the opening tag and the value of the attribute is specified in quotes. Other possible values for the ALIGN attribute are "LEFT" and "RIGHT". Some browsers also accept "JUSTIFY", which comes from the HTML 3.0 draft rather than from HTML 3.2.

The <CENTER> tag is one of the most-used Netscape extensions. All lines of text between the beginning and end of <CENTER> are centered between the current left and right margins. Some people consider it controversial because the ALIGN attribute is the standard way to achieve the same effect.
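For comparison, here is a minimal sketch of the same centered text written both ways. The wording of the text is invented, and which form a given browser honors depends on the browser, so it is worth testing both.

```html
<!-- Standard form: the ALIGN attribute on the paragraph -->
<P ALIGN="CENTER"> Welcome to our Intranet. </P>

<!-- Netscape extension: everything inside CENTER is centered -->
<CENTER>
<P> Welcome to our Intranet. </P>
</CENTER>
```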

Line Breaks

Another basic tag is <BR>; BR stands for break. You can use this tag to tell the browser where you would like long lines to wrap. You might be wondering why you would need to do this. Shouldn't the browser break lines wherever you press enter in the text editor? It's actually a lot simpler (or depending on how you look at it, harder) than that. HTML treats all contiguous white space (spaces, tabs, and newlines) as one space. Really, this does us a favor because we have no idea how wide the windows of all of the client browsers are. Each browser will automatically handle issues such as word-wrap, unless we specifically ask to do it differently using <BR>.

Here is an example of where you would use <BR>. Suppose you wanted to print your name and e-mail address, and the date the document was last modified at the bottom of the HTML page (as is very often done). For example:

<P> Written by Scott Zimmerman
e-mail: scottz@sd.znet.com
modified: April 20, 1996</P>

Unless the browser is resized to be a very thin window, this will all appear on one long line, despite the hard carriage returns in the HTML file. However, you can achieve a more readable effect if you add <BR> tags to the end of each line, like this:

<P> Written by Scott Zimmerman<BR>
e-mail: scottz@sd.znet.com<BR>
modified: April 20, 1996</P>

Note that there is no closing tag with <BR>, such as </BR>. By the way, Netscape also invented the <NOBR> tag, which as you might imagine, will force the browser to keep text together on the same line. The <WBR> tag allows you to tell the browser where a word break can occur, in case you have a long sequence of characters without a space. You need to use <WBR> within a block of <NOBR> text only.
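Here is a hedged sketch of how the two tags might be used together. The phone number and the file path are placeholders. Because <NOBR> and <WBR> are Netscape extensions, other browsers may simply ignore them and wrap the text as usual.

```html
<!-- NOBR keeps the (made-up) phone number on a single line -->
<P> Call the help desk at <NOBR>(619) 555-0100</NOBR> for assistance. </P>

<!-- WBR marks where a long unbroken string may wrap if it has to;
     the path is a placeholder -->
<P> <NOBR>The file is stored at \\server\shared\documents\<WBR>handbook\policies.doc</NOBR> </P>
```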

Normally, <BR> just inserts a line break. Netscape has added a CLEAR attribute to <BR>, so CLEAR="LEFT" will break the line and move vertically down until you have a clear left margin. CLEAR="RIGHT" does the same for the right margin, and CLEAR="ALL" moves down until both margins are clear of images.
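The CLEAR attribute only matters when something, usually an image, occupies a margin. The sketch below assumes a placeholder image file named logo.gif and uses the <IMG> tag, which is covered later in this chapter.

```html
<!-- logo.gif is a placeholder image floated against the left margin -->
<IMG SRC="logo.gif" ALIGN="LEFT" ALT="[company logo]">
This caption sits to the right of the image.
<BR CLEAR="LEFT">
This text starts below the image, at the clear left margin.
```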

Addresses

The idea of putting your address at the end of the HTML document is so common that there is even a special pair of tags just for that purpose. Instead of using <P>...</P> in the example above, you can use <ADDRESS>...</ADDRESS> instead. If you use <ADDRESS>, the text inside will also be italicized by most browsers. You could achieve the same effect with <P><I>...</I></P>, but that requires using an additional formatting tag. (I discuss the italics tag in a moment.) Other than the fact that <ADDRESS> will italicize the text, it really makes no difference whether you use it or not. (The only exception is that Internet Robots might be written in the future to read and process the text within <ADDRESS> blocks in some special fashion.)

Tip
In addition to putting the author's name and the modification date at the bottom of the HTML document, another common courtesy is to provide the user with the ability to send you e-mail just by clicking on your address. This requires the use of a mailto URL, discussed below.

Horizontal Lines

The <HR> tag has become very popular since its introduction in HTML 2.0. Its purpose is to draw a horizontal rule (line) across the width of the page to help separate document sections. This is used for greater emphasis than the <P> tags.

<HR> comes with several fancy attributes to control the thickness and the width of the line. The width can be expressed as a percentage of the width of the browser window. For lines that are shorter than the width of the browser window, it also makes sense to apply the ALIGN attribute, which can take a value of "LEFT", "RIGHT", or "CENTER".

Finally, the SIZE attribute takes an integer value representing browser pixel units (apparently) to control the thickness of the line. Here is an example of a typical horizontal rule (the percentage is quoted because a percent sign is not allowed in an unquoted attribute value):

<HR WIDTH="50%" SIZE=5 ALIGN="CENTER">

Installing WebEdit

Before I get into too many details of HTML, let's take a break and install a tool that will help edit HTML files more conveniently than, say, NotePad. In this section, you'll install and use the WebEdit shareware HTML editor from the CD-ROM. This program lets you build a simple Web page with the click of a button.

Note
WebEdit 2.0 is in beta testing as I write this. The CD-ROM includes the very capable version 1.4. By the time you read this, the final release of version 2.0 should be available at: http://www.nesbitt.com/.

WebEdit was developed by Knowledge Works, Inc. There are a lot of other HTML editors out there, but this one happens to have a very nice combination of low price and powerful, easy-to-use features.

To install WebEdit, copy the file from the CD-ROM to a temporary directory on your hard disk. Unzip the file using WinZip and run the program SETUP.EXE from the same directory.

The WebEdit Home Page Wizard can be used to generate the basic framework of an HTML document. Here are the steps to get your first home page on the Web in a jiffy:

  1. Run WebEdit.
  2. From the main menubar, choose File | Wizards | Home Page Wizard.
  3. You can explore the capabilities of the Wizard, but for now let's keep it simple. Just choose the Finish button and then choose the OK button. This results in a WebEdit screen that looks like Figure 5.2. (I'm using WebEdit version 1.4 here; 2.0 is due out by the time you read this.)
    Figure 5.2: WebEdit version 1.4 after using the Home Page Wizard.
  4. The HTML code that is automatically generated by WebEdit is shown in Listing 5.2. Now all you have to do is customize the code to suit your purposes. For now, why not simply insert your company name between the <TITLE> tag and the </TITLE> tag? For example, <TITLE> Your Company Name Here </TITLE>. Now do the same thing between the <H1> and </H1> tags. For example, <H1>Your Company Name Here </H1>.

    Listing 5.2. The default.htm file created by the WebEdit Home Page Wizard.

    <HTML>
    <!-------------------------------------------------->
    <!--         This Web page was created by          -->
    <!--         the WebEdit Home Page Wizard          -->
    <!-------------------------------------------------->
    <HEAD>
    <TITLE></TITLE>
    </HEAD>
    
    <!-- Modify the BODY tag to change the background  -->
    <!-- image or color, or the color of the font.     -->
    <!-- Background images, colors and font colors may -->
    <!-- not appear in some browsers.                  -->
    <BODY>
    
    <P ALIGN=center>
    <!-- Use H tags to define headings in your pages -->
    <H1></H1>
    <H2></H2>
    <HR>
    </P>
    
    <HR>
    <P>
    Copyright &copy; 1996<BR>
    This Home Page was created by
    <A HREF="http://www.nesbitt.com/">
    WebEdit</A>,Saturday, April 06, 1996<BR>
    Most recent revision Saturday, April 06, 1996
    </P>
    </BODY>
    </HTML>
    


  5. Now you can save the file in Listing 5.2 as default.htm in the HTML Document Directory of the Web server. (I cover this in much greater detail in Chapter 7, "Running the Intranet Web Server.") Then, any Web browser that visits the Intranet server will retrieve this file automatically.
  6. In order to give it a quick try now, you might want to open your Web browser and choose the command to open a File URL. By opening default.htm as a local file, you can test it in different browsers even without running a Web server.

Creating Lists in HTML

HTML has provisions for creating many types of lists. You have your choice of ordered lists, unordered lists, definition lists, directory lists, and menu lists, though I haven't seen directory lists and menu lists used very often. Ordered lists simply number the items from 1 to n. Unordered lists use bullets to mark each item. Definition lists are used to organize terms and their definitions, like a glossary. Menu lists are available if you want to create a menu of hyperlinks, but you can do the same thing with unordered lists. Directory lists can be used to show a listing of files, but again, unordered lists can also serve the purpose if you prefer.

The first thing you want to learn about lists is how to create a list item. Four of the five list types share one tag for specifying items in a list. That tag is <LI>, for list item. The </LI> closing tag is optional and is usually omitted, because the Web browser will know that the current list item ends either at the beginning of the next list item or at the end of the entire list. (I'll get to that in a moment.)

The <LI> tag is so simple that we can defer an example until we get to the next section on ordered lists. One thing to bear in mind is that the <LI> tag does allow a few attributes, though mostly for rather specialized tasks. You can consult Appendix A, "HTML and CGI Quick Reference," for more information about <LI> attributes.

The next three sections discuss ordered lists, unordered lists, and definition lists. I'll follow this up with some sample code that demonstrates all the list types.

Ordered Lists

The ordered list tags simply serve as a bracket around a list of items which the browser will number. By default, ordered lists start with number 1 for the first item and continue sequentially. The ordered list tags are <OL>...</OL>. Between this pair of tags, you can place as many <LI> tags as you want.

One attribute worth mentioning for the <OL> tag is the TYPE attribute. TYPE allows you to change the default numbering scheme to use uppercase (TYPE="A") or lowercase (TYPE="a") letters instead, or you can use Roman numerals (TYPE="I"). Here is a quick example of a complete ordered list using uppercase alphabetic characters instead of numbers. The introductory sentence sits in its own paragraph, because only list items belong directly inside <OL>:

<P> This is an ordered list. </P>
<OL TYPE="A">
<LI>The browser will precede this item with A.
<LI>The browser will precede this item with B.
</OL>

Unordered Lists

Unordered lists are just like ordered lists, except unordered lists use the <UL>...</UL> pair of tags and the list items are preceded by bullets instead of numbers. Of course, there are various attributes, which can make it more or less interesting. The TYPE attribute can be used to change the type of bullet that is drawn. Possible values for the TYPE attribute are "CIRCLE", "DISC", and "SQUARE". The HTML 3.0 draft also proposed a PLAIN attribute for the <UL> tag to specify that no bullets should be drawn, but it is not part of HTML 3.2 and few browsers honor it.
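As a quick sketch, here is an unordered list that asks for square bullets. The item text is invented; a browser that doesn't recognize TYPE will most likely fall back to its default bullet.

```html
<UL TYPE="SQUARE">
<LI>This item is marked with a square bullet.
<LI>So is this one.
</UL>
```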

The HTML 3.0 draft even proposed an attribute to let you specify a graphic to be used for drawing the bullet, although it is not part of HTML 3.2 and few browsers support it. This would involve us in a discussion of Uniform Resource Locators (URLs), so I ask your patience until we reach that section below. In the meantime, perhaps you would like a glimpse of the syntax, knowing that the description will soon follow. The SRC attribute takes the value of the URL where the graphic file resides, like this:

<UL SRC="http://host.domain-name.domain/bullet.gif">

Finally, one other attribute proposed for unordered lists in the HTML 3.0 draft is WRAP. WRAP tells the browser whether to build a horizontal or a vertical list, for example WRAP="HORIZ" or WRAP="VERT". It is not part of HTML 3.2 and is rarely implemented.

Definition Lists

Definition lists can be used whenever you need the style of a glossary. Definition lists are bracketed by the <DL>...</DL> pair of tags. Definition list items do not use the customary <LI> tag. Instead, definition list items use both the <DT> tag to introduce a term and the <DD> tag to begin the definition text accompanying that term.

The COMPACT attribute, which is not universally implemented, requests the browser to use minimal vertical space between terms. There are no other attributes associated with definition lists. Please see the next section for an example of how to create a definition list.

A List of Lists

I'd like to wrap up this section on lists with one HTML file that demonstrates all the list types. Figure 5.3 shows the file as it appears in Internet Explorer. Notice that unordered lists, menu lists, and directory lists are really not very different. In fact, many browsers will put bullet items on menu lists and directory lists.

Figure 5.3: This sample Web page demonstrates all types of basic lists.

Listing 5.3 shows the HTML code behind Figure 5.3. The filename is LISTS.htm, and it is on the CD-ROM in the HTML directory. You may want to use the file to cut and paste code fragments into your own HTML pages.


Listing 5.3. LISTS.htm is a simple HTML file that exercises all types of lists.

<HTML>
<HEAD><TITLE>Sample Lists</TITLE></HEAD>
<BODY><H1>Sample Lists</H1>
<H4>Ordered List</H4>
<OL>
<LI>Ordered A
<LI>Ordered B
</OL>
<H4>Unordered List</H4>
<UL>
<LI>Unordered A
<LI>Unordered B
</UL>
<H4>Menu List</H4>
<MENU>
<LI>Menu Item A
<LI>Menu Item B
</MENU>
<H4>Directory List</H4>
<DIR>
<LI>Directory A
<LI>Directory B
</DIR>
<H4>Definition List</H4>
<DL>
<DT>Title A
<DD>Definition A
<DT>Title B
<DD>Definition B
</DL>
</BODY>
</HTML>

Logical and Physical Formatting

In the early days of HTML, you weren't supposed to care exactly how the browser rendered your document in terms of the fonts it had available. How could the page designer require a certain font if every browser could be running on a different platform? From Windows 3.1 to Macintosh to OS/2 to UNIX workstations, fonts are handled quite differently. Indeed, non-GUI terminals don't even have a notion of fonts; they only have one font hard-coded in ROM.

There are two types of formatting styles in HTML 2.0 (and earlier). Logical styles are used to indicate that your text is of a certain typical nature, such as text to appear as if it was typed on a keyboard or text that should appear like computer code. There are several of these and some of them, such as <CODE>, <SAMP>, and <KBD>, might seem redundant (which they are in most browsers).

In contrast, physical styles allow you to request a particular type of formatting regardless of the meaning of the information you want to format, for example <B> for bold. HTML purists tend to discourage us from using the physical formatting tags because browsers (such as Lynx) that don't support them will have no way to know what a useful substitute would be. I haven't had a need to use the physical styles myself, but you would think that if a browser can't support bold, it could easily be programmed to make a reasonable substitution from its available resources when it comes across the <B> tag. That would be similar to the kind of choice that the browser programmer made when he chose a rendering for, say, <KBD>.

Logical Formatting Tags

Here is a list of several logical formatting tags. (You will notice a few similarities.)
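As a brief sketch, the fragment below exercises logical styles defined in HTML 2.0: <EM>, <STRONG>, <CITE>, <CODE>, <KBD>, <SAMP>, and <VAR>. The sentences are invented for illustration, and each browser is free to choose its own rendering. Note that &lt; and &gt; are the entity names for the less-than and greater-than characters, which is how a tag can be shown as text.

```html
<!-- Sample sentences are placeholders; rendering varies by browser -->
<P> This is <EM>emphasized</EM> and this is <STRONG>strongly emphasized</STRONG>. </P>
<P> The book <CITE>Web Site Construction Kit for Windows NT</CITE> is a citation. </P>
<P> The tag <CODE>&lt;TITLE&gt;</CODE> is shown as computer code. </P>
<P> Type <KBD>dir</KBD> and the computer replies <SAMP>File not found</SAMP>. </P>
<P> Replace <VAR>filename</VAR> with the name of your own file. </P>
```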

Physical Formatting Tags

Here is a list of several physical formatting tags:
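As a brief sketch, the fragment below uses three physical styles defined in HTML 2.0: <B> for bold, <I> for italics, and <TT> for typewriter (fixed-width) text. The sentence itself is invented for illustration.

```html
<!-- Each tag requests a specific appearance rather than a meaning -->
<P> This is <B>bold</B>, this is <I>italic</I>,
and this is <TT>typewriter text</TT>. </P>
```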

Other Formatting Tags

There are a couple of formatting styles that don't really fit into either of the two categories above. The difference with these tag pairs is that they are often used as containers of larger text that might contain other embedded text sections, such as paragraphs and lists.

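The <BLOCKQUOTE> tag typically marks a larger section of quoted text than the <CITE> tag. The HTML 3.0 draft proposed <BQ> as a shorter spelling, but it is not part of HTML 3.2 and few browsers recognize it, so <BLOCKQUOTE> is the safer choice. As you would expect, <BLOCKQUOTE> is closed with </BLOCKQUOTE>.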
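Here is a minimal sketch of a block quotation that contains more than one paragraph. The quoted text is invented; most browsers indent the quoted block, but the exact rendering is up to the browser.

```html
<P> The employee handbook states: </P>
<BLOCKQUOTE>
<!-- The quoted paragraphs are placeholders -->
<P> All travel expenses must be approved in advance. </P>
<P> Receipts must be submitted within 30 days. </P>
</BLOCKQUOTE>
```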

The <PRE> tag marks a section of text that is preformatted in terms of spaces and line breaks. <PRE>...</PRE> can come in handy when you want to line up columns, such as in forms. An example of preformatted text in a form will be given in the section on forms. Preformatted text is usually displayed in a fixed-width font such as Courier. Because line breaks are preserved inside the block, the <P> tag does not belong there; a blank line separates paragraphs instead.
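As a sketch of lining up columns, the fragment below relies on the spaces typed in the source file being kept exactly as they are. The items and prices are invented for illustration.

```html
<!-- Spaces and line breaks inside PRE are displayed as typed -->
<PRE>
Item          Qty    Price
------------  ---  -------
Floppy disks   10    $7.95
Mouse pad       2    $4.50
</PRE>
```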

URLs, Hyperlinks, and mailto

As large as the World Wide Web is, finding things can sometimes be difficult. Fortunately, everything has an address. Every HTML page on the Web has an address. Even the graphics contained within Web pages have addresses. When you click a hyperlink in one Web page to jump to another Web page, you are simply asking your browser to retrieve a document at a new address.

URLs

Uniform Resource Locators (URLs) are used to locate documents on servers. A client computer uses a URL to request a document to be viewed. Here is the format of a fully mature URL (in practice, not all the components are required):

protocol://machine.domain.name:port/path/document

In case you haven't seen one before, here is an example:

http://www.hqz.com/default.htm

The most widely used protocol on the Web is HTTP. Accordingly, most browsers will default to using HTTP if you do not specify a protocol when you enter the name of a document to be retrieved. You have a choice of the following protocols:

It's interesting to observe that although many of these protocols were around long before the Web was invented, they have been adopted into the Web through the simple invention of URLs, which give a browser a uniform way to name the resources that each protocol serves. The underlying Internet protocols themselves haven't changed. But their resources, previously available only via command-line programs, can now be accessed easily through a GUI browser. Because of this, the resources offered through these protocols are now more useful than ever before.

Note
For more information about URLs in general, please see this URL (no pun intended): http://www.w3.org/hypertext/WWW/Addressing/Addressing.html

Hyperlinks

Hyperlinks in Web pages are really nothing more than a URL. The browser will underline the text of the link so the user can see that it is available as a clickable item. But the question you are probably asking in this chapter is "How do I create hyperlinks in the HTML code that I write?"

The answer is to use a pair of tags (<A>...</A>) for creating anchors. The <A> tag takes a very common attribute called HREF, which is where you specify the URL of the document to be retrieved. Between the anchor tags, you get to place any descriptive text you want the user to see. The browser will underline the descriptive text; the URL itself will not be shown in the document. (Note that most browsers will display the URL in the status bar when a user holds the mouse pointer over the anchor without clicking.) Perhaps an example will help to clear this up:

<A HREF="http://www.ibm.com"> Jump to the IBM home page! </A>
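The HREF attribute accepts any kind of URL, not only a complete HTTP address. The sketch below shows three variations; the FTP host and path and the file name second.htm are placeholders chosen for illustration.

```html
<!-- A Web page retrieved with HTTP -->
<A HREF="http://www.hqz.com/default.htm"> Visit our home page </A><BR>

<!-- A file retrieved with FTP; the host and path are placeholders -->
<A HREF="ftp://ftp.example.com/pub/readme.txt"> Download the readme file </A><BR>

<!-- A relative URL: a document in the same directory as this page -->
<A HREF="second.htm"> Go to the second page </A>
```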

The anchor tags got their name because they started out as a way to name sections within a document that the reader might want to refer to often. This involves two steps. First, use the NAME attribute to mark an anchor point in the HTML file. Then you can write a hyperlink referring to the name of that anchor when you want to provide a way to make a quick jump back to it. This is very useful for creating a Table of Contents of a long HTML document.

Here is the syntax to create the anchor:

<A NAME="anchor"> A good place to come back to. </A>

And here is the syntax to create the hyperlink back to the anchor:

<A HREF="#anchor"> Jump back to the other place. </A>

Note the number sign (also called a pound sign or a sharp sign) in the HREF used to refer to the anchor. You can also refer to an anchor in another document anywhere on the Web, assuming you know the name of a valid position in that document. You can invent any name you want when you are creating an anchor; you don't have to use the word anchor, as I did.
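Putting the two steps together, here is a hedged sketch of a small Table of Contents. The anchor names, section names, and the file name handbook.htm are all invented for illustration.

```html
<H1> Employee Handbook </H1>
<!-- Table of Contents: each link jumps to a named anchor below -->
<UL>
<LI><A HREF="#benefits"> Benefits </A>
<LI><A HREF="#policies"> Policies </A>
</UL>

<H2><A NAME="benefits"> Benefits </A></H2>
<P> ... </P>
<H2><A NAME="policies"> Policies </A></H2>
<P> ... </P>

<!-- From another document: the file name, then # and the anchor name -->
<A HREF="handbook.htm#policies"> See the policies section of the handbook </A>
```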

mailto

I'd like to return to the example given earlier using the <ADDRESS> tags. Now that we know about URLs and hyperlinks, we can polish off the task of creating a standard footer in our HTML documents. When the browser sees that the user clicks on a mailto URL, a mail message dialog box is opened with the recipient's e-mail address already filled in. All the user has to do is fill in the subject and the message and click the Send button. It's a good way to elicit comments on your Web pages. It certainly makes it more likely that you will get feedback than if the user is required to write down the e-mail address and then go launch his mail client and start a new message from scratch.

Here's an HTML code fragment that could be used as a standard document footer:

<HR><ADDRESS>
This document last modified: April 20, 1996<BR>
By Scott Zimmerman<BR>
e-mail: <A HREF="mailto:scottz@sd.znet.com">scottz@sd.znet.com</A>
</ADDRESS>

Notice how the opening <A HREF...> tag ends, with its greater-than symbol, before the text that will appear underlined in the browser. The </A> tag marks the end of the clickable area in the browser.

Note
Unlike HTTP and FTP URLs, a mailto URL does not include double-slash characters; adding them is a common syntax mistake. The correct form of a mailto URL is mailto:user@somedomain.com.

Figure 5.4 shows a sample HTML page using this code. You can see that the status bar is indicating the URL destination of the hyperlink being pointed to.

Figure 5.4: This sample Web page demonstrates the mailto URL.

Graphics

One of the neat things you can do with hyperlinks is include graphics in your HTML pages. Many pages on the Web use a graphic image technique to create imagemaps, which allow the user to click a region of the graphic and then have another Web page retrieved. Server-side imagemaps are beyond the scope of this book; however, I can refer you to the book Web Site Construction Kit for Windows NT. (It contains an entire chapter dealing with the subject of imagemaps.) I'll bring up the topic of client-side imagemaps later in this chapter, because they are an exciting new aspect to HTML 3.2.

You use the <IMG> tag to embed graphics in a Web page. There is no closing tag, such as </IMG>, because the attributes take care of everything. The SRC attribute is where you give the URL of the image document to be displayed. (Most browsers can handle a JPEG or a GIF image.) You do not need to specify a complete URL if the image is in the same directory as the HTML file. The ALIGN attribute allows you to specify how text will flow around the image (more about that later).

Let's dive into this with a simple example. Suppose you want to include a picture of your company guard dog on your corporate home page. Okay, maybe that's a stretch; the point isn't what's in the image, but rather how you write the HTML to display any image. Listing 5.4 is the HTML code which demonstrates a sample Web page containing a graphic image.


Listing 5.4. This HTML code demonstrates the IMG tag to embed graphics.

<HTML>
<HEAD>
<TITLE>Boston's Story</TITLE>
</HEAD>
<BODY>
        <H2>Welcome to Boston's Life</H2>
        Hi, my name is Boston. Here is a picture of me:
        <IMG ALIGN="top" SRC="boston.jpg">
        <H5>A Brief Autobiography</H5>
        <UL COMPACT>
               <LI>Born in Bonsall, CA March 5, 1995.
               <LI>Got my shots and went to new home in San Diego, April 30, 1995.
               <LI>Now spend time catching Frisbees and looking out the window.
        </UL>
        <HR width="80%">
        <ADDRESS>Okay, so e-mail me: boston@hqz.com</ADDRESS>
</BODY>
</HTML>

You might notice in Listing 5.4 that the ALIGN="TOP" attribute was used in the <IMG> tag. You can see the effect of this in Figure 5.5, which shows the sample code as it displays in Internet Explorer 2.0.

Figure 5.5: This sample Web page demonstrates an inline image.

More About the ALIGN Attribute

An image specified as ALIGN="LEFT" will float down and over to the left margin (into the next available space there), and subsequent text will wrap around the right side of that image. Likewise, for ALIGN="RIGHT", the image aligns with the right margin and the text wraps around the left. Here are the optional values for the ALIGN attribute:

ALIGN="TOP" aligns itself with the top of the tallest item in the line.
ALIGN="TEXTTOP" aligns itself with the top of the tallest text in the line.
ALIGN="MIDDLE" aligns the baseline of the current line with the middle of the image.
ALIGN="ABSMIDDLE" aligns the middle of the current line with the middle of the image.
ALIGN="BASELINE" aligns the bottom of the image with the baseline of the current line.
ALIGN="BOTTOM" is identical to ALIGN="BASELINE".
ALIGN="ABSBOTTOM" aligns the bottom of the image with the bottom of the current line.
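
To see the floating behavior in action, here is a minimal sketch that reuses boston.jpg from Listing 5.4. The surrounding sentences and the ALT text are made up for illustration, and the <BR CLEAR="left"> line is my own addition (it is assumed here as the way to stop the wrapping and resume below the image).

<IMG ALIGN="left" SRC="boston.jpg" ALT="[Boston the dog]">
This text wraps along the right side of the image, line after line,
until it runs past the bottom edge of the picture.
<BR CLEAR="left">
This text starts below the image, back at the left margin.

Changing ALIGN="left" to ALIGN="right" should mirror the layout, with the image at the right margin and the text wrapping on its left.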

A Netscape Extension: <IMG WIDTH=value HEIGHT=value>

The WIDTH and HEIGHT attributes were added by Netscape to <IMG> mainly to speed up display of the document. If the author specifies these values (in pixels), the browser can lay out the rest of the page immediately instead of waiting for the image to be loaded over the network and its size calculated. These attributes are not yet widely supported by other browsers.
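
As a quick sketch, this is how the attributes might look on the image from Listing 5.4. The pixel values are hypothetical; you would substitute the real dimensions of your own image.

<IMG SRC="boston.jpg" WIDTH=200 HEIGHT=150 ALT="[Boston the dog]">

If the values don't match the actual size of the image, a browser that honors these attributes will generally scale the image to fit, so it pays to check the true dimensions in your graphics program first.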

The <IMG ALT> Attribute

It's a good idea when embedding images in HTML to use the ALT attribute to provide a description of the image. ALT gets its name because the text serves as an alternative to the image for browsers that don't have graphic capability. Here is a simple example demonstrating how a hyperlink in one document can be tied to a click on an image or on the anchor text:

<A HREF="second.htm">
<IMG SRC="second.gif" ALT="[description of the picture]">
Jump to second.htm </A>

It's important to understand why there are two filenames in this code. The first filename is in the HREF attribute; this is how mouse clicks on the image are hyperlinked to documents, which could be anywhere on the Web. The second filename is the image to be displayed. In case the browser doesn't support graphics, it will instead display the text "[description of the picture]". This is also a good example of how to nest images inside of anchors.

Multimedia, MIME, and Interlaced Images

Because this is the section on graphics, I should mention that hyperlinks can point to other file types as well. You can embed audio and video into your Web pages also. This gets us into the subject of MIME, and the rest of the book will be focusing very heavily on that so I will defer it for now.

Another technique that you might have seen on the Web is the use of interlaced images. Normally, an image must be completely downloaded before it can be displayed. With interlaced images, your browser is able to display the whole image in fuzzy detail at first, but as the download progresses the image becomes more and more clear. The advantage of this is that the user can choose to cancel the download if it does not appear to be the image of interest. Transparent images are a similar technique which allows the background of the HTML page to show through the gaps in the picture. You can use Paint Shop Pro to create both transparent and interlaced images, but alas, that is also beyond the scope of this book. One source for a complete treatment of Transparent and Interlaced GIFs is Web Site Construction Kit for Windows NT.

HTML 2.0 Forms

The possibilities for creative form processing in HTML are endless. Perhaps as you glance at Figure 5.6, your imagination will lead you to an idea about a form that you or your office could deploy on the Intranet.

Figure 5.6: This Intranet form is one cool way to save paper.

The form in Figure 5.6 was created with the HTML code in Listing 5.5. The file is named vacation.htm on the accompanying CD-ROM.


Listing 5.5. This vacation form demonstrates all the HTML form tags.

<HTML>
<HEAD>
<TITLE>Intranet Time Away From Work</TITLE>
</HEAD>
<BODY>
<CENTER>
<H3>Intranet Time-Away-From-Work Form</H3>
</CENTER>
<FORM METHOD=POST ACTION="savedata.exe">
<PRE>
<BR>Your Name:      <INPUT NAME="name" TYPE=text SIZE=50 MAXLENGTH=50>
<BR>E-mail address: <INPUT NAME="email" TYPE=text SIZE=50 MAXLENGTH=50>
<BR>Dates Absent:   <INPUT NAME="item" TYPE=text SIZE=50 MAXLENGTH=50>
<BR>Special Notes:  <TEXTAREA NAME="reason" ROWS=2 COLS=55 MAXLENGTH=150></TEXTAREA>
</PRE>
<P>Type of Absence:
<!-- Radio buttons share one NAME so that only one can be selected at a time -->
<INPUT TYPE=radio NAME="absence" VALUE="holiday" CHECKED>Holiday
<INPUT TYPE=radio NAME="absence" VALUE="vacation">Vacation
<INPUT TYPE=radio NAME="absence" VALUE="sick">Sick
<INPUT TYPE=radio NAME="absence" VALUE="leave">Leave of Absence
<INPUT TYPE=checkbox NAME="pay" VALUE="pay" CHECKED>With Pay
</P>
<P>Your Department:
<SELECT NAME="department">
<OPTION>Accounting
<OPTION>Administration
<OPTION>Engineering
<OPTION>Marketing
<OPTION SELECTED>Sales
<OPTION>Support
</SELECT>
<INPUT TYPE=submit VALUE="I'm Outta Here!">
</P>
</FORM>
<HR>
<ADDRESS>
This document last modified: April 20, 1996<BR>
By Scott Zimmerman<BR>
e-mail: <A HREF="mailto:scottz@sd.znet.com">scottz@sd.znet.com</A>
</ADDRESS>
</BODY>
</HTML>

I'll have much more to say about HTML forms in Chapter 19, "Getting the Most out of HTML with CGI." For now, Table 5.1 will function as a quick introduction.

Table 5.1. HTML tags for creating forms.

<FORM> ... </FORM> These tags appear within the <BODY> of the HTML file. Everything you code in between them will comprise the form. In addition to other HTML tags, the tags described in this table are valid within a <FORM> block.
<TEXTAREA> ... </TEXTAREA> These tags cause the browser to present a multi-line text edit box on the form. You can control the height and width in character units with the ROWS and COLS attributes. As with the <INPUT> and <SELECT> tags, the NAME attribute is used to identify the data that is returned to the server for CGI processing or sent in the e-mail body if mailto is used.
<INPUT> This tag defines a single-line text box for strings or integers; a checkbox; a radio button; a pushbutton; or one of a few other varieties of controls. It has no closing tag. The TYPE attribute determines the style of <INPUT> control.
<SELECT> ... </SELECT> These tags define a listbox of items from which the user can choose an item. You may use several <OPTION> tags within the <SELECT> block to present the available items.
<OPTION> This tag indicates a selectable item within a <SELECT> block. You might also specify one of the items to be selected by default using the SELECTED attribute.
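
To make the role of the NAME attribute a little more concrete, here is a minimal sketch of a form, separate from Listing 5.5. The group name absence, the field name notes, and the button label are my own choices for illustration; savedata.exe is simply borrowed from the listing as a stand-in for whatever program processes your form.

<FORM METHOD=POST ACTION="savedata.exe">
<P>Type of Absence:
<INPUT TYPE=radio NAME="absence" VALUE="holiday" CHECKED>Holiday
<INPUT TYPE=radio NAME="absence" VALUE="vacation">Vacation
</P>
<P>Notes: <TEXTAREA NAME="notes" ROWS=2 COLS=40></TEXTAREA></P>
<INPUT TYPE=submit VALUE="Send">
</FORM>

Because both radio buttons share the same NAME, the browser treats them as one group and lets the user pick only one. If the user chooses Vacation, the data returned to the server should include the pair absence=vacation, along with whatever was typed into the notes box.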

Creating Searchable Indexes

The <ISINDEX> tag is a way to let the user submit a word for the server to search for. This tag is usually placed in the <HEAD> section of an HTML file. When a document containing <ISINDEX> is first retrieved, the browser will automatically create a textbox and prompt the user to enter a keyword. When the user presses Enter, the same URL will again be requested from the server, but this time the user's keyword will be tacked onto the end of the URL, following a question mark character (which serves as a parsing delimiter). This enables the server to process the keyword, search for it, and return custom results instead of the original HTML document. (See Chapter 21, "Indexing Your Intranet with WAIS," for a complete discussion of how to do this.)

Netscape has added the PROMPT attribute to <ISINDEX> so the document author can specify what message is to appear before the text input field of the index. Without a custom prompt, the default is the standard message you may have come across before: This is a searchable index. Enter search keywords:
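
Here is a minimal sketch of a searchable document that uses a custom prompt. The title, heading, and prompt wording are hypothetical, and the example assumes that the URL of this document is handled on the server by a program that knows what to do with the keyword.

<HTML>
<HEAD>
<TITLE>Phone List Search</TITLE>
<ISINDEX PROMPT="Enter a last name to look up: ">
</HEAD>
<BODY>
<H2>Company Phone List</H2>
</BODY>
</HTML>

If this document were retrieved as http://domain.com/phone.htm (a made-up address) and the user typed smith, the browser would request http://domain.com/phone.htm?smith.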

Further discussion of <ISINDEX> gets us into the topic of CGI, which is covered extensively in Chapters 19-21.

Server-Side Includes

Server-Side Includes (SSIs) are a way to have the server process your HTML file in a custom manner each time before the document is sent to a browser. This technique enables you to write more dynamic HTML. Your pages need not be delivered only as you had created them, but instead they may contain real-time information.

Here's how it works. The browser requests a document with a special filetype, such as .shtml. This special filetype serves as a clue to the Web server that it should read and parse the file before sending it back to the client. This feature exacts a small performance penalty, so you can exclude ordinary documents by giving them filenames ending in .htm or .html. As the server reads the HTML code, it searches for HTML comments with SSI commands embedded. When it finds an SSI command, the server replaces that part of the original HTML text with the output of the command before sending it to the client.

There are six basic SSI commands, though some Web servers support several additional SSI commands. Unfortunately, IIS 1.0 supports only one of them, the include command.

Here is an example of an SSI command to insert the contents of another HTML file into the current file, replacing the SSI comment itself:

<!--#include file="standard.html" -->

The potential application of this should be obvious. Say you wish to display a corporate logo in the HTML footer of all your Intranet Web pages. You can write the HTML code that will be pasted into the <BODY> section and save it as a separate file, say standard.html. Then include that file into every document, saving you the trouble of having to modify every document when the logo or the company contact information is changed. You'll only have to edit one file and it will instantly be reflected to the next browser to retrieve any file which includes it.
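
The footer idea might be sketched like this. Everything here is hypothetical except the include command itself: the logo filename, the e-mail address, and the page text are invented for the example. First, the shared file, standard.html, holds only the fragment to be pasted in:

<HR>
<IMG SRC="logo.gif" ALT="[Company logo]">
<ADDRESS>Questions? E-mail webmaster@domain.com</ADDRESS>

Then each page that wants the footer includes it just before the end of the <BODY> section:

<HTML>
<HEAD>
<TITLE>Department News</TITLE>
</HEAD>
<BODY>
<H2>Department News</H2>
<P>The company picnic is on Friday.</P>
<!--#include file="standard.html" -->
</BODY>
</HTML>

Notice that standard.html has no <HTML> or <BODY> tags of its own, because its contents land in the middle of another document. Remember to save the including page with whatever filetype your server is configured to parse for SSI commands.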

Server-Push and Client-Pull

The idea behind both of these techniques is to give page designers a way to force a particular document to be updated repeatedly. Although the two techniques are quite different, the effects can be very similar. Let's start with an overview of server-push.

Server-Push

Server-push uses an experimental MIME type called multipart/x-mixed-replace that enables each piece of data to replace the data that preceded it. This is a useful trick because the new data (in this case the HTML page) usually isn't a different document; more likely it is just an updated version of the same page. This can be used to create an animation effect by loading a sequence of slightly different graphics.

The server will maintain the connection with the browser, and the server determines when the successive data parts are sent. If the browser is able to retrieve the subsequent images or HTML pages too quickly (which is entirely possible on an Ethernet-based Intranet), the server can insert an appropriate pause by simply delaying the transmission of each part.

Client-Pull

Client-pull is perhaps the simpler of the two techniques for dynamic self-updating documents. This technique relies on the <META> tag, using an HTTP-EQUIV="refresh" directive that Netscape introduced. <META> should be placed within the <HEAD> section at the top of the HTML file.

Here is an example of a client-pull document that will reload every 30 seconds:

<HTML>
<HEAD>
<META HTTP-EQUIV="refresh" CONTENT="30">
<TITLE>Sample Client-Pull</TITLE>
</HEAD>
<BODY>
<H1>This document will automatically reload itself in 30 seconds.</H1>
</BODY>
</HTML>

The <META CONTENT> attribute also has the capability to retrieve a different document when the timer expires. Here is an example of how a URL can be embedded to cause the current page to load a second page:

<META HTTP-EQUIV="refresh" CONTENT="30; URL=http://domain.com/second.htm">

Tip
In this section, we have barely scratched the surface of the capabilities of server-push and client-pull. For a much more thorough treatment of these subjects, please see this URL at Netscape:
http://www.netscape.com/assist/net_sites/pushpull.html

Tables in HTML 3.0

Tables were the most requested feature for HTML 3.0 and 3.2. The IETF decided to stick to a powerful but simple model for creating nice looking tables. See Figure 5.7 for a very simple example of a table created in HTML 3.2, as it appears running in Microsoft Internet Explorer Version 2.0 (available on the accompanying CD-ROM).

Figure 5.7: A simple HTML 3.2 table displayed in Internet Explorer.

The table in Figure 5.7 was created by the short HTML code shown in Listing 5.6.


Listing 5.6. This HTML 3.2 code creates a simple table.

<HTML>
<HEAD>
<TITLE>Sample table</TITLE>
</HEAD>
<BODY>
<H1>Sample Table</H1>
<TABLE BORDER>
<TR>
<TD>Apples</TD>
<TD>25</TD>
</TR>
<TR>
<TD>Oranges</TD>
<TD>10</TD>
</TR>
</TABLE>
<HR>
Last Updated: April 7, 1996
</BODY>
</HTML>

The <TABLE> tag begins the table. Here, I have added the BORDER attribute in the opening tag. Each <TR> tag defines a table row. The <TD> tags define table data elements, as you read across the table. The </TABLE> tag ends the definition of the table.
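
One tag not shown in Listing 5.6 is <TH>, which marks a table header cell in the HTML 3.2 table model. As a hedged sketch, here is how the same table might look with a header row added; the column titles Fruit and Quantity are my own invention.

<TABLE BORDER>
<TR>
<TH>Fruit</TH>
<TH>Quantity</TH>
</TR>
<TR>
<TD>Apples</TD>
<TD>25</TD>
</TR>
<TR>
<TD>Oranges</TD>
<TD>10</TD>
</TR>
</TABLE>

Browsers that support tables typically display <TH> cells in a bold font so they stand out from the ordinary <TD> cells.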

Frames in HTML 3.2

One of the most talked-about features that Netscape added to Navigator 2.0 is frames. As frames are yet another addition to HTML 3.2, they are not yet supported in many browsers, including Internet Explorer version 2.0. A framed Web page is much like a Web page within a Web page. Each frame can be sized separately by dragging the border between them. Indeed, each frame can even behave as a separate browser. There are many practical uses for frames.

Note
Once you leave a frames page and access a page that doesn't have them (whether it's on another Web site or the same one), your frames disappear.

Frames are created by replacing the traditional <BODY>...</BODY> tags with a new pair of tags: <FRAMESET>...</FRAMESET>. The main body of the HTML file then becomes quite hollow, because the only things you put in the <FRAMESET> block are references to other Web pages that contain the actual content to be displayed in each window. In other words, you create a main HTML file that governs the layout of the window panes, and each pane loads another HTML file.

Because few browsers support frames yet, it is advisable to use the <NOFRAMES>...</NOFRAMES> block as a replacement for what you would have put in the <BODY>...</BODY> block if you weren't using frames. This way, new browsers will use the <FRAMESET> tag and old browsers will still have something to chew on in the main HTML file. One technique is to simply provide an <A HREF> link in the <NOFRAMES> block to each of the framed pages.

An example should help to clear this up. Suppose you want to provide a small horizontal frame window across the bottom of your home page so that whenever someone follows a link in the top page, the bottom page will still remain. This could be a useful way to provide an omnipresent map of your site. Listing 5.7 shows the basic structure of a framed page.


Listing 5.7. This HTML 3.2 code demonstrates the <FRAMESET> tag.

<HTML>
<HEAD>
<TITLE>Example of Frames</TITLE>
</HEAD>
<FRAMESET ROWS="80%,20%">
<NOFRAMES>
<H1>Example of Frames</H1>
<P>This Web page is best viewed using Netscape</P>
</NOFRAMES>
<FRAME SRC="cell1.htm">
<FRAME SRC="cell2.htm">
</FRAMESET>
</HTML>

Notice that the <BODY> tag is replaced by the <FRAMESET> tag in a framed Web page. The <FRAMESET> block will either use the <NOFRAMES> option or it will create two horizontal windows to load other Web pages. The <FRAMESET> tag in this example dedicates the top 80 percent of the browser window to load the file cell1.htm. The bottom 20 percent loads the file cell2.htm.
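
The <NOFRAMES> block in Listing 5.7 only displays a message. Following the earlier suggestion to link to each of the framed pages, a friendlier version might look like the sketch below. The link labels are assumptions on my part, since the listing doesn't say what cell1.htm and cell2.htm contain.

<NOFRAMES>
<H1>Example of Frames</H1>
<P>Your browser does not display frames. You can still read each page directly:</P>
<UL>
<LI><A HREF="cell1.htm">Main page</A>
<LI><A HREF="cell2.htm">Site map</A>
</UL>
</NOFRAMES>

This way, a visitor with an older browser can still reach all of the content, just without the side-by-side layout.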

Although many frames pages on the Web use the new capability primarily to provide navigational bars or banners, you can use them in your Intranet applications to make your ready reference pages available to your customers at all times. You can keep a list of important pages on your Intranet constantly visible, and clickable, regardless of where else in your applications your customer wanders.
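
An always-visible list of links could be sketched along these lines, using the NAME attribute of <FRAME> and the TARGET attribute of <A> from Netscape's frame syntax. The frame names main and map, and the files phones.htm and benefits.htm, are hypothetical. In the main file, each frame is given a name:

<FRAMESET ROWS="80%,20%">
<FRAME SRC="cell1.htm" NAME="main">
<FRAME SRC="cell2.htm" NAME="map">
</FRAMESET>

Then, in cell2.htm, each link names the frame in which its document should appear:

<A HREF="phones.htm" TARGET="main">Phone List</A>
<A HREF="benefits.htm" TARGET="main">Benefits</A>

With this arrangement, clicking a link in the bottom frame should load the page into the top frame while the list of links stays put.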

Tip
When you're viewing Web pages with frames, the browser's Back button on the toolbar doesn't act as you're accustomed. Hitting it takes you back to the last non-frames page you've visited. To get back your Back capability in a frames document, press your right mouse button. A small pop-up menu will appear with options for moving backward and forward within the frames.

Please see this URL for all the details and the syntax for creating HTML frames:

http://home.mcom.com/assist/net_sites/frame_syntax.html

Client-Side Imagemaps

HTML 3.2 has a new graphic technique that is an alternative to server-side imagemaps. A client-side imagemap accomplishes several things. First, it enables the browser to know where a mapped image hyperlink will go so this information can be displayed to the user before he clicks on it. Second, this technique works for local files which are opened without using an HTTP server. Third, the server does not have to be queried for the target hyperlink of a region after a click; the browser can go straight to the new destination. Finally, the file format of server-side imagemaps is unfortunately inconsistent among different servers, but client-side imagemaps should be portable among all standard browsers.

Two new tags and one new attribute are involved in building client-side imagemaps. You must use the USEMAP attribute in the <IMG> tag instead of, or in addition to, the ISMAP attribute.

Here is an example of an <IMG> tag that is mapped by the client using client-side imagemaps:

<IMG SRC="mystuff.gif" USEMAP="#mystuff">

In this case, the name "mystuff" should appear elsewhere in the HTML file as the NAME of a <MAP>...</MAP> block. The purpose of the <MAP> block is to define the coordinates of the regions that can be clicked in the image. The <MAP> block, which must also reside within the <BODY> or <FRAMESET> block, can contain any number of <AREA> tags. The shapes in the image can be circles, rectangles, or polygons. Listing 5.8 is the code which completes the example above.


Listing 5.8. This HTML 3.2 code demonstrates client-side imagemaps.

<HTML>
<HEAD>
<TITLE>Example of Client-Side Imagemaps</TITLE>
</HEAD>
<BODY>
<H1>Here is a client-side image with two hyperlinks.</H1>
<IMG SRC="mystuff.gif" USEMAP="#mystuff">
<MAP NAME="mystuff">
<AREA SHAPE=RECT COORDS="0,0,200,200" HREF="image2.htm">
<AREA SHAPE=RECT COORDS="201,201,400,400" HREF="image3.htm">
</MAP>
</BODY>
</HTML>
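
Listing 5.8 uses only rectangles. As a sketch of the other two shapes, here is the same map with a circle and a triangle instead. The coordinates are arbitrary values chosen for illustration, and the sketch assumes the shape names CIRCLE and POLY as they appear in the HTML 3.2 specification.

<IMG SRC="mystuff.gif" USEMAP="#mystuff">
<MAP NAME="mystuff">
<AREA SHAPE=CIRCLE COORDS="100,100,50" HREF="image2.htm">
<AREA SHAPE=POLY COORDS="200,200,300,200,250,300" HREF="image3.htm">
</MAP>

For a circle, the three numbers are the x and y position of the center followed by the radius. For a polygon, the numbers are a series of x,y pairs, one pair for each corner.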

Note
For the syntax of the <MAP> and <AREA> tags, the IETF proposal paper, and additional examples, please see this page at Microsoft:
http://www.microsoft.com/ie/author/htmlspec/imagemap.htm

Persistent Cookies

The cookie technology invented by Netscape, and now a part of HTML 3.2, allows a browser to "remember" data about a Web page the user has visited so that the browser can return the data to the server each time the page is revisited. Cookies probably got their name because computer scientists have long talked of the concept of magic cookies, which, for lack of a better description, are like nuggets of numbers that can perform special feats when used in a proper manner inside the right program.

Persistent cookies are needed because HTTP connections last only a fleeting moment; HTTP is a stateless protocol. A Web server cannot store substantial information about each client that visits because the client may never reappear, or it might next retrieve some other random Web page on the same server. Basically, it should be up to the client to tell the server about the status of the client.

The oft-cited reason for the invention of cookies is the shopping cart example. If you bounce around a department store Web page "picking up" several items to pay for at the "cash register," the check-out Web page is going to need to know what you have in your basket. Cookies do the trick. Cookies are sent by the server in the HTTP header. The browser simply sends the same data back whenever it visits that server. It is important for you to know that cookies are an available technique in Web page design, but because they depend somewhat on CGI programming, the syntax details are a little beyond the scope of this chapter.

Note
For a thorough discussion of persistent cookies, please see the Netscape specification at
http://home.netscape.com/newsref/std/cookie_spec.html

Style Sheets in HTML 3.2

HTML is intended for document content markup on the server-side. By design, it is not intended to provide direct control over the exact appearance of the document in the browser on the client side. One reason is portability. There is no way of controlling which browser and which font every client will have available. This state of affairs can be frustrating to graphic artists. Style sheets are intended to give page designers, or even the user of the browser, the opportunity to govern how the page is displayed in terms of fonts, colors, and other elements.

The IETF is still considering exactly how to implement style sheets in HTML 3.2. For the latest information, please visit the HTML specification:

http://www.ietf.cnri.reston.va.us/ids.by.wg/html.html

Another great place to search is

http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/

Writing Math in HTML 3.2

Most of us probably haven't missed being able to write the calculus integral symbol into our HTML code. However, considering the Web got started in a physics lab, it's only natural that these features would eventually find their way into HTML. Unfortunately, these 3.0 features are just as much a moving target as style sheets. To learn more about math support in HTML 3.2, please see the section later in this chapter titled "HTML 3.2 Resources."

Character Entities

Because HTML is limited to 7-bit ASCII characters, it isn't ordinarily possible to include special symbols in documents. However, HTML has the concept of escape codes to provide for this. You write the ANSI number of the non-ASCII character in between &# and a semicolon. A couple of symbols are so widely used on business pages that Netscape even invented names for them:

&reg; is the Registered Trademark symbol; also available as &#174;
&copy; is the Copyright symbol; this entity works just like &#169;, but now you don't have to look up the magic number anymore.
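
Here is a small sketch of how these entities might appear in a page footer. The company and product names are invented for the example. The last line uses &lt; and &gt;, the standard HTML entities for the less-than and greater-than symbols, which come in handy when you want a tag to be displayed rather than obeyed.

<ADDRESS>
Copyright &copy; 1996 Acme Widgets, Inc.<BR>
WidgetMaster&reg; is a registered trademark.<BR>
The same symbols by number: &#169; and &#174;<BR>
This page begins with the &lt;HTML&gt; tag.
</ADDRESS>

A browser that doesn't recognize the named entities will likely show them as literal text, so the numeric forms are the safer choice if you need to support older browsers.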

HTML 3.2 Resources

HTML 3.2 has several important new features. To name a few: frames, cookies, tables, figures (<FIG>) as a substitute for the image tag (<IMG>), support for mathematical formulae, banners, divisions, footnotes, and style sheets. You will find a quick reference to HTML in Appendix A, including tags that are specific to HTML 3.2.

Tip
An excellent place on the Net to find information about the differences between HTML 2.0 and HTML 3.2 is "How to Tame the Wild Mozilla."
http://webreference.com/html3andns/
(Mozilla is how Netscape refers to itself in the HTTP request message.) This URL also includes late-breaking news about the Web. Definitely check it out!

Note that the specification of HTML 3.2 is still a draft. Although some parts have been stable for some time, others will undoubtedly change. The best way to track the changes to HTML is to go online to any of these sites:

Quick Tips on HTML Style

HTML style and HTML style sheets might sound similar, but they are different subjects. Although style sheets provide some degree of control over the appearance of Web documents, the subject of style is more a topic of what to do and what not to do in Web page design (whether style sheets are used or not).

Web purists make remarks such as, "It's the content, not the presentation." That philosophy notwithstanding, Web designers obviously do have a great deal of control over many appearance factors, such as when to use a hyperlink, a bullet list, or a heading level 2. You might be tempted to think that some choices are arbitrary, but be careful. If your pages demonstrate disregard for certain accepted Web standards or are hard to read, you might not get any repeat visitors. There is a lot to be said for having good style.

Here are a few resources concerning HTML style:

Summary

This chapter has covered a lot of ground. It concludes the tour of the basic preparations for building an Intranet, and now I'm sure you are ready to get to the real work at hand. Part II is all about building the server that will serve as the core of the Intranet.