Linux
IN THIS CHAPTER
- What Is HTML?
- What Does HTML Look Like?
- Starting an HTML Document
- Links
- Lists
- Changing Character Appearances
- A Few Other Tags
Hypertext Markup Language (HTML) is the language used to
write World Wide Web pages. It is quite an easy language, and as several versions
have been introduced over the past few years, it has become quite powerful too. We
can't hope to teach you HTML in a single chapter in this book, but we can give you
an overview of the language and of how to use the basics to produce a simple Web
page or two. A lot of good books on HTML are out there, so if you want to become
very proficient in writing Web pages, we suggest you pick up one of them.
A lot of automated Web page production tools are available on the market, mostly
for Windows and Windows NT machines. These use a WYSIWYG editor to lay out a Web
page, then generate HTML code for you. With this type of tool, you don't need to
know much (if any) HTML. Not very many HTML generators are available for Linux, however.
On top of that, HTML is quite easy to learn, and anyone who is interested in setting
up a Web site for the Internet or an intranet should learn at least the basics. Several
tools are available for Linux that scan HTML code to make sure that it is syntactically
correct, but we won't bother using any in this chapter. If you want to find a syntax
checker, check out one of the Linux support sites, such as http://www.xnet.com,
which is a good starting place to find Linux software. Also, the Linux home site
of http://www.linux.org usually has information about available software.
We'll assume you already know what the World Wide Web (WWW) is. If you've seen
a Web page before, you have seen the results of HTML. HTML is the language used to
describe how the Web page will look when you access the site. The server transfers
the HTML instructions to your browser, which converts those HTML lines of code into
the text, images, and layouts you see on the page.
A Web browser is usually used to access HTML code, but other tools can carry out
the same function. Many kinds of browsers are out there, starting with the granddaddy
of them all, NCSA's Mosaic. Netscape's Navigator is the most widely used browser
right now, although Microsoft is slowly making inroads with its Internet Explorer.
Which browser you use doesn't matter, because all browsers do mostly the same job:
display the HTML code they receive from the server. A browser is almost always acting
as a client, requesting information from the server.
The HTML language is based on another language called SGML (Standard Generalized
Markup Language). SGML is used to describe the structure of a document and allow
for better migration from one documenting tool to another. HTML does not describe
how a page will look; it's not a page description language like PostScript. Instead,
HTML describes the structure of a document. It indicates which text is a heading,
which is the body of the document, and where pictures should go. But it does not
give explicit instructions on how the page will look; that's up to the browser.
Why use HTML? Primarily because it is a small language and therefore can transfer
instructions over a network quickly. HTML does have limitations because of its size,
but newer versions of the language are expanding the capabilities a little. The other
major advantage to HTML is one most people don't think about: it is device independent.
It doesn't matter which machine you run; a Web browser takes the same HTML code and
translates it for the platform. The browser is the part that is device dependent.
That means you can use HTML to write a Web page and not care which machine is used
to read it.
HTML code is pretty straightforward, as you will see. For the most part, it consists
of a bunch of "tags" that describe the beginning and ending of a structure
element (such as a heading, paragraph, picture, or table). For each element, there
should be a beginning and ending tag. A sample HTML page is shown in Figure 53.1.
Don't worry about understanding it all now; you will see this code built up in this
chapter. For now, you need to see only that there are beginning and ending tags around
each element in the structure. (All the screen shots used in this chapter are taken
from either a Windows 95 or a Windows 3.11 machine accessing the Linux server on
which we are writing the HTML code through an Ethernet network. The browser is NCSA's
Mosaic.)
FIGURE 53.1. A simple example of HTML code.
A couple of important things to know about tags as we get started: they are case
insensitive (so you don't have to be careful about matching case), and they are almost
always paired into beginning and ending tags. The most common errors on Web pages
are mismatched or unterminated tags. In many cases the Web page will still appear OK,
but in others there might be severe formatting problems. A quick scan of your
HTML code will help solve these types of problems.
NOTE: Not all HTML tags
have a beginning and ending tag. A few are single ended, meaning they usually have
just a beginning. Some others are called containers because they hold extra information.
These are not always tagged at both ends.
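The kind of quick scan described above can also be automated. What follows is only a minimal sketch in Python, built on the standard library's html.parser module; it is not one of the Linux syntax checkers mentioned earlier. The names TagChecker, check, and UNPAIRED are made up for this illustration, and the set of tags treated as single ended is an assumption, not a complete list.

```python
from html.parser import HTMLParser

# Tags this sketch treats as single ended or optionally closed.
# This set is an assumption for illustration, not a complete list.
UNPAIRED = {"hr", "br", "img", "p", "li"}

class TagChecker(HTMLParser):
    """Tracks open tags on a stack and records tags that do not pair up."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <H1> and <h1> look the same here
        if tag not in UNPAIRED:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in UNPAIRED:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append("unexpected </%s>" % tag)

def check(source):
    """Return a list of problems found; an empty list means the tags pair up."""
    checker = TagChecker()
    checker.feed(source)
    checker.close()
    return checker.errors + ["unclosed <%s>" % tag for tag in checker.stack]

print(check("<HTML><BODY><H1> Hello </h1></BODY></HTML>"))
print(check("<HTML><BODY><H1> Hello </BODY></HTML>"))
```

The first call reports nothing even though the heading opens as H1 and closes as h1, because tags are case insensitive. The second call reports the heading that was never closed.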
Tags are written in angle brackets. These brackets signal to the browser that
an HTML instruction is enclosed. A sample HTML code element looks like
<tag_name> text text text </tag_name>
where <tag_name> and </tag_name>
are the starting and ending tags for the text in the middle. The ending tag has the
same name as the starting tag, but is preceded by a slash to indicate the tag's conclusion.
The type of tag describes how the text will look. For example, if the tags are heading
tags, the text will appear larger than normal body text and might be in bold or highlighted
in some way.
How do you write HTML code? There are several ways to do it, the easiest being
to use any ASCII editor. Be sure not to save HTML documents in a proprietary format
like Word documents, because a Web browser expects HTML as plain text. Some
specialized HTML editors are available that feature pull-down lists of tags and preview
screens. These can be handy when you are working with very large Web pages, but for
most people a simple editor is more than enough to get started with.
An HTML document usually begins with an instruction that identifies
the document as HTML. This is a tag called <HTML> that tells
the browser where the HTML instructions start. Here's a sample chunk of
code from a Web page:
<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is the first heading on my page. </H1>
This is a bunch of text that is written on my home page. I hope you like it.
</BODY>
</HTML>
You can see that the first and last tags, <HTML> and </HTML>,
mark the start and end of the HTML code. The slash in the second tag indicates the
end of the structure element. These tags should be at the start and end of each HTML
document you write. The <HEAD> and </HEAD> tags mark
a prologue to the file and are often used for just the title and key words. Only
a few tags are allowed inside <HEAD> tags. One of them is the <TITLE>
and </TITLE> pair, which gives the title of the document. The <BODY>
and </BODY> tags mark the start and end of the document's main body.
The <H1> and </H1> tags are for a heading on the page.
This code can be read by any browser. The result is shown in Figure 53.2. As you
can see, the title material is not displayed on the page itself; only the material
between the body tags is shown. The title is used at the top of the browser to show
the page you are viewing. This acts as an identifier.
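To see how a program can tell the title apart from the body, here is a small Python sketch that pulls out only the text between the title tags, much as a browser must do before putting it in the window's title bar. It uses the standard html.parser module; TitleFinder and find_title are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

class TitleFinder(HTMLParser):
    """Collects the text between <TITLE> and </TITLE> and ignores the rest."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def find_title(source):
    finder = TitleFinder()
    finder.feed(source)
    finder.close()
    return finder.title.strip()

page = """<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is the first heading on my page. </H1>
</BODY>
</HTML>"""
print(find_title(page))
```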
FIGURE 53.2. The sample HTML code displayed under Mosaic.
The format of the code shown previously is line-by-line, but it is handled this
way just for readability. You can write everything on one long line, if you want,
because HTML ignores whitespace unless told otherwise. For debugging and rereading
purposes, however, it is helpful to keep the code cleanly organized.
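The whitespace rule can be approximated in a couple of lines of Python. This is a rough sketch only: it collapses whitespace in the raw source, whereas a real browser does so in the text after reading the tags. The function name collapse_whitespace is made up for this example.

```python
def collapse_whitespace(text):
    # split() with no arguments breaks on any run of spaces, tabs, or newlines,
    # so joining the pieces with single spaces squeezes every run down to one
    return " ".join(text.split())

one_line = "<H1> A heading </H1> Some text."
many_lines = """<H1>   A heading
</H1>

    Some
    text."""

# Both layouts reduce to the same string
print(collapse_whitespace(one_line) == collapse_whitespace(many_lines))
```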
A few other comments about the tags we've used. The <TITLE> tag
always goes inside the header tags (<HEAD> and </HEAD>)
to describe the contents of the page. You should have only a single title for your
page. Only a few other tags are allowed inside the head tags. It is useful to pick a short,
descriptive title for your documents so that others who see it will know what they
are accessing.
The <BODY> and </BODY> tags are used to enclose
the main contents of your Web page, and you will probably have only one pair of them.
All text and contents (links, graphics, tables, and so on) are enclosed between body
tags.
There are several levels of heading tags, each of which is like a subheading of
the one higher up. The heading we used in the code shown previously is <H1>,
which is the highest heading level. You can structure your document with many heading
levels, if you want. For example, you could write this bit of code:
<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is an H1. </H1>
This is a bunch of text.
<H2> This is an H2 </H2>
This is more text.
<H3> This is an H3 </H3>
This is text about the H3 heading.
<H3> This is another H3 </H3>
Here's more text about the H3 heading.
<H2> This is yet another H2 </H2>
Text to do with H2 goes here.
</BODY>
</HTML>
This code is shown in a browser in Figure 53.3. As you can see, the levels of
heading are slightly different, with the higher headings (lower numbers) more distinctive
and bolder. This difference lets you separate your pages into logical categories,
with a heading or subheading for each category. You can use these headings just as
we do when writing a book: H1s can contain H2s, H3s go below H2s, and so on. There
are no rules about mixing headings (you could use only H3s, for example), but common
sense usually dictates how to structure your page.
FIGURE 53.3. Headings with different tags have different appearances.
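Because heading tags describe structure rather than appearance, a program can recover an outline of the page from them. The following Python sketch, which assumes the standard html.parser module and uses the made-up names OutlineBuilder and outline, indents each heading according to its level.

```python
from html.parser import HTMLParser

HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

class OutlineBuilder(HTMLParser):
    """Records the level and text of every heading it meets."""

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.entries = []          # [level, text] pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            self.entries.append([HEADING_LEVELS[tag], ""])
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_LEVELS:
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.entries[-1][1] += data

def outline(source):
    builder = OutlineBuilder()
    builder.feed(source)
    builder.close()
    # Two spaces of indentation per level below H1
    return ["  " * (level - 1) + " ".join(text.split())
            for level, text in builder.entries]

page = """<H1> This is an H1. </H1>
This is a bunch of text.
<H2> This is an H2 </H2>
This is more text.
<H3> This is an H3 </H3>"""
for line in outline(page):
    print(line)
```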
What about paragraphs? You can handle paragraphs in several ways, and the rules
have changed with each version of HTML. The easiest approach, though, is to use the
<P> and </P> tags to mark each individual paragraph.
For example, this code uses three paragraph tag pairs:
<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is an H1. </H1>
<P> This is the first paragraph. It is a really interesting paragraph and
should be read several times because of its content. </P>
<P> Another paragraph. It's not quite as exciting as the first, but then
it's hard to write really exciting paragraphs this late at night. </P>
<P> The closing paragraph has to be strong to make you feel good. Oh well,
we can't always meet your expectations, can we? </P>
</BODY>
</HTML>
The appearance of this code in the browser is shown in Figure 53.4. Note how each
paragraph is distinct and has some whitespace between it and the next paragraph.
What happens if you leave out the <P> and </P> tags?
Because browsers ignore whitespace, including carriage returns, the text is run together
as shown in Figure 53.5. So you should use <P> and </P>
tags to separate paragraphs on your page. Remember that putting lots of blank lines
between paragraphs in your HTML code doesn't matter. Browsers will ignore them and
run everything together.
FIGURE 53.4. The use of paragraph tags separates text into discrete chunks with whitespace between them.
NOTE: Strictly speaking,
you don't need </P> tags to indicate the end of a paragraph because
another <P> would indicate the start of a new one. The <P>
tag is one example of an open-ended tag, one that doesn't need a closure. It is good
programming practice, however, to close the pairs.
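The note above can be checked with a short Python sketch. It treats every <P> as the start of a new paragraph whether or not the previous one was closed, which is one plausible way to handle the open-ended tag and not necessarily how any particular browser does it. ParagraphCollector and paragraphs are hypothetical names.

```python
from html.parser import HTMLParser

class ParagraphCollector(HTMLParser):
    """Gathers the text of each paragraph, closed or not."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <P> implicitly ends any paragraph that is still open
            self.found.append("")
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.found[-1] += data

def paragraphs(source):
    collector = ParagraphCollector()
    collector.feed(source)
    collector.close()
    return [" ".join(text.split()) for text in collector.found]

closed = "<P> The first paragraph. </P>\n\n\n<P> Another paragraph. </P>"
unclosed = "<P> The first paragraph.\n<P> Another paragraph."
print(paragraphs(closed) == paragraphs(unclosed))
```

Both versions yield the same two paragraphs, and the blank lines between them in the first version make no difference.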
FIGURE 53.5. Without paragraph tags, all the text is run together.
What about comments in HTML code? You might want to embed some comments to yourself
about who wrote the code, what it does, when you did it, and so on. The way to write
a comment into HTML code is like this:
<!-- This is a comment -->
The comment has angle brackets around it, an exclamation mark as the first character,
and two dashes before and after the comment text. Here's an example of some HTML code
with comments in it:
<HTML>
<!-- Written 12/12/95 by TJP, v 1.23 -->
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is an H1. </H1>
<!-- This section is about the important first para tag -->
<P> This is the first paragraph. </P>
</BODY>
</HTML>
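Comments never appear on the displayed page, but a program reading the source can still find them. Here is a minimal Python sketch, assuming comments are written with the standard opening of an angle bracket, an exclamation mark, and two dashes, and a closing of two dashes and an angle bracket. CommentCollector and find_comments are names invented for this example.

```python
from html.parser import HTMLParser

class CommentCollector(HTMLParser):
    """Keeps the text of every comment and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between the opening and closing dashes
        self.comments.append(data.strip())

def find_comments(source):
    collector = CommentCollector()
    collector.feed(source)
    collector.close()
    return collector.comments

page = """<HTML>
<!-- Written 12/12/95 by TJP, v 1.23 -->
<BODY>
<P> This is the first paragraph. </P>
</BODY>
</HTML>"""
print(find_comments(page))
```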
Links to other places and documents are an important part of the World Wide Web.
Links are quite easy to write in HTML. They begin with the link tag <A>
and end with </A>. This is an example of an anchor tag, so named because
it creates an anchor for links in your document.
The <A> tag is different from the tags we've seen so far in that
it has some more text inside the angle brackets. Here's a sample link in a document:
<A HREF="page_2.html">Go to Page 2</A>
In this example, the text between the two tags is what is displayed on-screen,
so the user would see the text "Go to Page 2" underlined and usually in
another color to indicate that it is a link. If the user clicks on the link, the
HREF reference in the <A> tag is read and the document page_2.html
is read in to the browser. HREF, meaning hypertext reference, gives the
name of a file or a URL that the link points to.
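To make the two parts of a link concrete, the following Python sketch separates the HREF value from the text the user sees. It relies on the standard html.parser module, which hands attribute names over in lowercase; LinkCollector and find_links are hypothetical names.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Pairs each HREF value with the text shown for that link."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.links = []            # [href, text] pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs with lowercased names
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append([href, ""])
                self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.links[-1][1] += data

def find_links(source):
    collector = LinkCollector()
    collector.feed(source)
    collector.close()
    return [(href, " ".join(text.split())) for href, text in collector.links]

print(find_links('<A HREF="page_2.html">Go to Page 2</A>'))
```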
You can use links either in the body of text or as a separate item on a menu,
for example. The following code shows a link in a paragraph and one on a line by
itself:
<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is the first heading on my page. </H1>
<P>This is a bunch of text that is written on my home page. I hope you like it.
If you would like to know more about me, choose <A HREF="about_me.html">Tell
me more about You</A> and I'll tout my virtues for you. </P>
<P><A HREF="biblio.html">See Bibliography</A></P>
</BODY>
</HTML>
When displayed in a browser, this code looks as shown in Figure 53.6. Each link
is underlined in the text to show that it is a link. (Some browsers change the color
of the link text, and others do different things as well.)
FIGURE 53.6. A document with two links in it.
When you are specifying a link to a filename, you must be sure to specify the
filename properly. You can give either relative or absolute paths. Absolute simply
means you give the full pathname, whereas relative means you specify from the current
document's location. For example, these are absolute pathnames (the first in DOS
format, the second in Linux format) in a link:
<A HREF="c:\html\home\home.htm">
<A HREF="/usr/tparker/html_source/home.html">
Relative path references are from the current location and can use valid directory
movement commands. These are valid examples of relative paths in a link:
<A HREF="..\home.htm">
<A HREF="../../html_source/home.html">
A link to another URL is much the same as a link to a document, except that you
give the URL after HREF. For example, this is a link to the Yahoo! home
page:
<A HREF="http://www.yahoo.com">Go to Yahoo!</A>
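The way relative, absolute, and full URL references resolve can be tried out with Python's standard urllib.parse module. In this sketch the base address (the location of the page containing the links) is a made-up example, as is the function name resolve. Note that references written as URLs use forward slashes.

```python
from urllib.parse import urljoin

# Hypothetical address of the page that contains the links
BASE = "http://www.example.com/usr/tparker/html_source/pages/index.html"

def resolve(href):
    """Work out the full address an HREF value points to."""
    return urljoin(BASE, href)

print(resolve("page_2.html"))                          # same directory as the page
print(resolve("../home.html"))                         # one directory up
print(resolve("/usr/tparker/html_source/home.html"))   # absolute path on the same server
print(resolve("http://www.yahoo.com"))                 # a full URL is used as is
```

The second and third calls end up at the same file, one reached by a relative path and the other by an absolute one.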
You can have as many links in your documents as you want. It helps to make the
link description as useful as possible so that users don't end up at pages or sites
they didn't want to access. If you are linking to other sites, you should occasionally
check to make sure that the link is still valid. A lot of home pages change location
or drop off the Web as time goes by, so verify links to avoid annoyed users.
HTML lets you use a few different formats of lists, such as ordered (numbered),
unordered (bulleted), and labeled. The lists are surrounded by tags such as <OL>
and </OL> (for ordered list) or <MENU> and </MENU>
(for menus). Each item in the list has its own tag, <LI> or something
similar, to separate it from other items. A few special types of list tags are for
handling glossaries and similar purposes, but we'll ignore them in this HTML overview.
Here's an example of a simple list using the <UL> tags for unordered
lists:
<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is a list of some books I have written. </H1>
Here are the books I wrote on last summer's vacation.
<UL>
<LI> Mosquitos Bug me </LI>
<LI> Fun with Bears </LI>
<LI> What to eat when you have no food </LI>
<LI> Why is it raining on my vacation? </LI>
<LI> Getting lost in three easy lessons </LI>
</UL>
</BODY>
</HTML>
An unordered list is like a normal list, except that it has bullets and is not
marked by any special numbering scheme. This code is shown in a browser in Figure
53.7, in which you can see the way the bullets line up and the list is presented.
FIGURE 53.7. An unordered list in HTML.
The same code could be written with <OL> and </OL>
tags for an ordered list. An ordered list has numbers in front of the items, as shown
in Figure 53.8. This is the same code as shown previously, except that we changed
the <UL> tags to <OL> tags.
FIGURE 53.8. An ordered list uses numbers rather than bullets.
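The difference between the two list types comes down to the marker placed in front of each item. The Python sketch below imitates that choice in plain text, using an asterisk for bullets. It assumes a single list that is not nested, and ListRenderer and render_list are names made up for this illustration.

```python
from html.parser import HTMLParser

class ListRenderer(HTMLParser):
    """Collects list items and remembers which kind of list holds them."""

    def __init__(self):
        super().__init__()
        self.kind = None           # "ul" or "ol" while inside a list
        self.in_item = False
        self.items = []            # [kind, text] pairs

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.kind = tag
        elif tag == "li" and self.kind:
            self.items.append([self.kind, ""])
            self.in_item = True

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            self.kind = None
            self.in_item = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1][1] += data

def render_list(source):
    renderer = ListRenderer()
    renderer.feed(source)
    renderer.close()
    lines = []
    for number, (kind, text) in enumerate(renderer.items, start=1):
        # Ordered lists get a number, unordered lists get a bullet
        marker = "%d." % number if kind == "ol" else "*"
        lines.append(marker + " " + " ".join(text.split()))
    return lines

items = "<LI> Fun with Bears\n<LI> Why is it raining on my vacation?\n"
print(render_list("<UL>\n" + items + "</UL>"))
print(render_list("<OL>\n" + items + "</OL>"))
```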
Character tags can be used to change the appearance of text on the screen. HTML
has two kinds of character tags: style tags (such as italics and boldface) and
logical tags (which indicate emphasis, code, or other types of text). Forcing character
type changes with style tags is not usually a good idea because different browsers
might not present the text the way you want to. You can use them, however, if you
know that your server will be used only with a particular type of browser and if
you know how the text will look on that browser.
Logical tags are a much better choice because browsers can implement them across
platforms. They let the individual browser decide how italics, for example, will
look. For that reason, we'll concentrate on logical tags; you should use them when
you can. Eight logical tags are in general use:
- <CITE> a citation
- <CODE> code sample (Courier font)
- <DFN> a definition
- <EM> emphasis, usually italics
- <KBD> keyboard input to be typed by the user
- <SAMP> sample text, much like <CODE>
- <STRONG> strong emphasis, usually boldface
- <VAR> a variable name to be displayed as italics or underlined
(usually in code)
The following code shows an example of the use of some of these styles, and the
resultant Web page is shown in Figure 53.9.
<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is an H1. </H1>
<P> This is a sample entry that should be <EM> emphasized using EM</EM> and with
the <STRONG> use of Strong </STRONG> emphasis.
</P>
</BODY>
</HTML>
As you can see, this browser (Mosaic) interprets the <EM> tag to
be italics and the <STRONG> tag to be bold. Most browsers perform
this conversion, but other tags might look different with other browsers.
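The idea that each browser decides for itself how a logical tag looks can be sketched in Python with a lookup table. Here two imaginary text-only "browsers" each map EM and STRONG to their own markers. The tables, the class TextRenderer, and the function render are all made up for this example and do not describe how Mosaic or any other browser works internally.

```python
from html.parser import HTMLParser

# Two hypothetical text-only browsers, each with its own idea of emphasis
UNDERSCORE_STYLE = {"em": "_", "strong": "*"}
SHOUTING_STYLE = {"em": "/", "strong": "!"}

class TextRenderer(HTMLParser):
    """Replaces logical tags with the markers from a style table."""

    def __init__(self, style):
        super().__init__()
        self.style = style
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tags missing from the table produce no marker at all
        self.parts.append(self.style.get(tag, ""))

    def handle_endtag(self, tag):
        self.parts.append(self.style.get(tag, ""))

    def handle_data(self, data):
        self.parts.append(data)

def render(source, style):
    renderer = TextRenderer(style)
    renderer.feed(source)
    renderer.close()
    return "".join(renderer.parts)

line = "This is <EM>emphasized</EM> and <STRONG>strong</STRONG>."
print(render(line, UNDERSCORE_STYLE))
print(render(line, SHOUTING_STYLE))
```

The HTML is identical in both calls; only the table, standing in for the browser, changes the result.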
If you want to force character tags, you can do so with <B> and
</B> for boldface, <I> and </I> for
italics, and <TT> and </TT> for typewriter monospaced
font (code).
FIGURE 53.9. The use of logical character tags changes the way text appears.
To wrap up, a few other tags are useful in general Web page production. The first
is the <PRE> tag, which means the contents between the tags are preformatted
and should be left alone. Between the <PRE> and the </PRE>,
whitespace is important. Use of the <PRE> tag lets you preformat tables
or other content exactly as you want it (subject to wrapping rules in the browser).
For example, the following code has a PRE section in it:
<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is an H1. </H1>
<P> This is a sample entry that should be <EM> emphasized using EM</EM> and with
the <STRONG> use of Strong </STRONG> emphasis. </P>
<PRE>
This is preformatted
text that should appear
exactly like this in the Browser
</PRE>
</BODY>
</HTML>
As you can see in Figure 53.10, the spacing of the PRE material is retained,
and even the text font is the same as the source (Courier).
FIGURE 53.10. The PRE tags let you preformat text.
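The special treatment of preformatted text can be imitated with a short Python sketch: whitespace is collapsed everywhere except between the PRE tags, where it is kept exactly as written. This is only an illustration built on the standard html.parser module, and PreAwareText and extract_text are hypothetical names.

```python
from html.parser import HTMLParser

class PreAwareText(HTMLParser):
    """Collapses whitespace in ordinary text but keeps PRE text untouched."""

    def __init__(self):
        super().__init__()
        self.in_pre = False
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self.in_pre = False

    def handle_data(self, data):
        if self.in_pre:
            # Preformatted text is passed through exactly as written
            self.segments.append(data)
        elif data.strip():
            # Elsewhere, every run of whitespace becomes a single space
            self.segments.append(" ".join(data.split()))

def extract_text(source):
    parser = PreAwareText()
    parser.feed(source)
    parser.close()
    return parser.segments

page = "<P> This   is\n text </P>\n<PRE>\ncol1   col2\n  a      b\n</PRE>"
for segment in extract_text(page):
    print(repr(segment))
```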
Another handy tag is a simple one. The <HR> tag creates a horizontal
rule across the page. For example, the preceding code can be enhanced with a couple
of <HR> tags like this:
<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is an H1. </H1>
<P> This is a sample entry that should be <EM> emphasized using EM</EM> and with
the <STRONG> use of Strong </STRONG> emphasis. </P>
<HR>
<PRE>
This is preformatted
text that should appear
exactly like this in the Browser
</PRE>
<HR>
</BODY>
</HTML>
As you can see in Figure 53.11, two horizontal rules now appear on the page. The
exact appearance of the rule might change with browsers, but the overall effect is
to put a divider on the page.
FIGURE 53.11. Use <HR> to draw horizontal rules across the page.
Many more HTML tags are available to you, but they are used for special items
such as tables, graphics, and other add-ins. As we mentioned at the start, this chapter
is designed to just give you a quick introduction to HTML, not to teach you everything
there is to know. As you have seen, though, HTML is a fairly simple language to work
with, and you should have a lot of fun designing your own Web pages.
Copyright 1998
EarthWeb Inc., All rights reserved.
Copyright 1998 Macmillan Computer Publishing. All rights reserved.