Just about everyone on the planet knows about the World Wide Web. It's the most
talked about aspect of the Internet. With the Web's popularity, more system users
are getting into the game by setting up their own WWW servers and home pages. There
are now sophisticated packages that act as Web servers for many operating systems.
Linux, based on UNIX, has the software necessary to provide a Web server.
You don't need fancy software to set up a Web site, only a little time and the
correct configuration information. That's what this chapter is about. We look at
how you can set up a World Wide Web server on your Linux system--whether for friends,
your LAN, or the Internet as a whole.
The major aspect of the Web that attracts users and makes it so powerful, aside
from its multimedia capabilities, is the use of hyperlinks. A hyperlink lets one
mouse click move you from document to document, site to site, graphic to movie, and
so on. All the instructions of the move are built into the Web code.
There are two main aspects to the World Wide Web: server and client. Client software,
such as Mosaic and Netscape, is probably the most familiar. However, many different
Web client packages other than these two are also available, some specifically for
X or Linux.
There are three primary versions of Web server software that will run under Linux.
They are from NCSA, CERN, and Plexus. The most readily available system is from NCSA,
which also produces Mosaic. NCSA's Web system is fast and quite small, can run under
inetd or as a standalone daemon, and provides pretty good security. For
this chapter, we will use NCSA's Web software. You can easily use either
of the other two packages instead, but the configuration information will be
different.
If you have obtained a library of source code or binaries from an FTP or BBS site,
you probably have to untar and uncompress them first. (Check with any README
files, if there are any, before you do this; otherwise you may be doing this step
for nothing.) Usually, you will proceed by creating a directory for the Web software,
and then changing into it and expanding the library with a command such as this:
zcat httpd_X.X_XXX.tar.Z | tar xvf -
The software is often named by the release and target platform, such as httpd_1.5_linux.tar.Z.
Use whatever name your tar file has in the preceding line. Installation
instructions are sometimes in a separate tar file, such as Install.tar.z,
which you have to obtain and expand in the same way:
zcat Install.tar.z | tar xvf -
Make sure you are in the target directory when you issue these commands, though,
or you will have to move a lot of files. You can place the files anywhere; however,
it is often a good idea to create a special area for the Web software that can have
its permissions controlled, such as /usr/web, /var/web, or a similar
name.
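The unpacking steps above can be sketched as a short script. This is only an illustration: it builds a small stand-in archive in a scratch directory so that it can be run anywhere, and the names web_dir, archive, and httpd_demo.tar.gz are assumptions, not part of the NCSA distribution. It uses gzip -dc, which behaves like zcat for this purpose.

```shell
#!/bin/sh
# Sketch: unpack a server archive into a dedicated directory.
# A throwaway archive is built first so the commands run anywhere;
# in real use, skip that part and set web_dir and archive yourself.
set -e
work=$(mktemp -d)                  # scratch area for the demonstration
web_dir="$work/web"                # stands in for /usr/web or /var/web
archive="$work/httpd_demo.tar.gz"  # hypothetical archive name

# Build a tiny stand-in archive holding two of the expected subdirectories.
mkdir -p "$work/stage/conf" "$work/stage/cgi-bin"
echo "# placeholder" > "$work/stage/conf/httpd.conf-dist"
(cd "$work/stage" && tar cf - .) | gzip -c > "$archive"

# The steps from the text: create the directory, set its permissions,
# change into it, and expand the archive there.
mkdir -p "$web_dir"
chmod 755 "$web_dir"
cd "$web_dir"
gzip -dc "$archive" | tar xf -     # same effect as: zcat archive | tar xvf -
ls "$web_dir"
```

Changing into the target directory before expanding the archive is what keeps the files from landing somewhere you have to clean up later.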
Once you have extracted the contents of the Web server distribution and the library
files are in their proper directories, you can look at what has been created automatically.
You should find the following subdirectories:
cgi-bin    Common gateway interface binaries and scripts.
conf       Configuration files.
icons      Icons for home pages.
src        Source code and (sometimes) executables.
support    Support applications.
If you don't have to modify the source and recompile for Linux (because your software
is the Linux version), you can skip the configuration details mentioned in the rest
of this section. On the other hand, you may want to know what is happening in the
source code anyway, because you can better understand how Linux works with the Web
server code. If you obtained a generic, untailored version of the NCSA Web server,
you have to configure the software.
Begin by editing the src/Makefile file to specify your platform. There
are several variables that you have to check for proper information:
AUX_CFLAGS    Uncomment the entry for Linux (usually identified by comment lines and symbols).
CC            The name of the C compiler (usually cc or gcc).
EXTRA_LIBS    Add any extra libraries that need to be linked in (none are required for Linux).
FLAGS         Add any flags you need for linking (none are required for most Linux linkers).
Finally, look for the CFLAGS variable. Some of the values for CFLAGS
may be set already. The following are valid values for CFLAGS:
-DSECURE_LOGS    Prevents CGI scripts from interfering with any log files written by the
                 server software.
-DMAXIMUM_DNS    Provides a more secure resolution system at the cost of performance.
-DMINIMAL_DNS    Doesn't allow reverse name resolution, but speeds up performance.
-DNO_PASS        Prevents multiple children from being spawned.
-DPEM_AUTH       Enables PEM/PGP authentication schemes.
-DXBITHACK       Provides a service check on the execute bit of an HTML file.
-O2              Optimizing flag.
It is unlikely that you will need to change any of the flags in the CFLAGS
section, but at least you now know what they do. Once you have checked the src/Makefile
for its contents, you can compile the server software. Issue this command:
make linux
If you see error messages, check src/Makefile carefully. The most common
problem is the wrong platform (or multiple platforms) selected in the file.
Once the software is in the proper directories and compiled for your platform,
it's time to configure the system. Begin with the httpd.conf-dist file.
Copy it to the filename httpd.conf, which is what the server software looks
for. This file handles the httpd server daemon. Before you edit the file,
you have to decide whether you will install the Web server software to run as a daemon,
or whether it will be started by inetd. If you anticipate frequent use,
run the software as a daemon. For occasional use, either is acceptable.
There are several variables in httpd.conf that need to be checked or
have values entered for them. All the variables in the configuration file follow
the syntax
variable value
with no equal sign or special symbol between the variable name and the value assigned
to it. For example, a few lines would look like this:
FancyIndexing on
HeaderName Header
ReadmeName README
Where pathnames or filenames are supplied, they are usually relative to the Web
server directory, unless explicitly declared as a full pathname. You need to supply
the following variables in httpd.conf:
AccessConfig      The location of the access.conf configuration file. The default value is
                  conf/access.conf. You can use either absolute or relative pathnames.
AgentLog          The log file to record details of the type and version of browser used to
                  access your server. The default value is logs/agent_log.
ErrorLog          The name of the file to record errors. The default is logs/error_log.
Group             The group ID the server should run as (used only when the server is running
                  as a daemon). Can be either a group name or a group ID number. If a number,
                  it must be preceded by #. The default is #-1.
MaxServers        The maximum number of children allowed.
PidFile           The file where you want to record the process ID of each httpd copy. The
                  default is logs/httpd.pid. Used only when the server is in daemon mode.
Port              Port number httpd should listen on for clients. The default port is 80.
                  If you don't want the Web server generally available, choose another number.
ResourceConfig    The path to the srm.conf file, usually conf/srm.conf.
ServerAdmin       The e-mail address of the administrator.
ServerName        The fully qualified host name of the server.
ServerRoot        The path above which users cannot move (usually the Web server top
                  directory or /usr/local/etc/httpd).
ServerType        Either standalone (daemon) or inetd.
StartServers      The number of server processes that are started when the daemon executes.
TimeOut           The amount of time in seconds to wait for a client request, after which it
                  is disconnected (the default is 1800, which should be reduced).
TransferLog       The path to the location of the access log. The default is logs/access_log.
TypesConfig       The path to the location of the MIME configuration file. The default is
                  conf/mime.types.
User              Defines the user ID the server should run as (only valid if running as a
                  daemon). Can be a name or a number, but must be preceded by # if a number.
                  The default is #-1.
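As an illustration of the variable-value syntax, the following sketch writes a small httpd.conf into a scratch directory and prints its active lines. The values for ServerName, ServerAdmin, User, TimeOut, StartServers, and MaxServers are assumptions chosen for the example (the host name is borrowed from the telnet test later in this chapter); substitute your own before using anything like this.

```shell
#!/bin/sh
# Sketch: write a sample httpd.conf to a scratch directory.
# Every setting is "variable value" with no equal sign.
conf_dir=$(mktemp -d)
cat > "$conf_dir/httpd.conf" <<'EOF'
# Run as a standalone daemon on the standard port
ServerType standalone
Port 80
# Unprivileged identity for the daemon (assumed account name)
User nobody
Group #-1
# Example host and administrator (hypothetical values)
ServerName www.wizard.tpci.com
ServerAdmin webmaster@wizard.tpci.com
ServerRoot /usr/local/etc/httpd
# Paths below are relative to ServerRoot
AccessConfig conf/access.conf
ResourceConfig conf/srm.conf
TransferLog logs/access_log
AgentLog logs/agent_log
# Illustrative tuning values, reduced from the long default timeout
TimeOut 600
StartServers 5
MaxServers 20
EOF

# Print only the active (non-comment) settings.
grep -v '^#' "$conf_dir/httpd.conf"
```

Keeping the log and configuration paths relative to ServerRoot means the whole tree can be moved by changing one line.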
The next configuration file to check is srm.conf, which is used to handle
the server resources. The variables that have to be checked or set in the srm.conf
file are as listed here:
AccessFileName    The file that gives access permissions (default is .htaccess).
AddDescription    Provides a description of a type of file. For example, an entry could be
                  AddDescription "PostScript file" *.ps. Multiple entries are allowed.
AddEncoding       Indicates that files with a particular extension are encoded somehow, such
                  as AddEncoding compress Z. Multiple entries are allowed.
AddIcon           Gives the name of the icon to display for each type of file.
AddIconType       Uses MIME type to determine the icon to use.
AddType           Overrides MIME definitions for extensions.
Alias             Substitutes one pathname for another, such as Alias /data /usr/www/data.
DefaultType       The default MIME type, usually text/plain.
DefaultIcon       The default icon to use when FancyIndexing is on (the default is
                  /icons/unknown.xbm).
DirectoryIndex    Filename to return when a URL names a directory rather than a file. The
                  default value is index.html.
DocumentRoot      Absolute path to the HTML document directory. The default is
                  /usr/local/etc/httpd/htdocs.
FancyIndexing     Adds icons and filename information to the file list for indexing. The
                  default is on. (This option is for backward compatibility with the first
                  release of httpd.)
HeaderName        The filename used at the top of a list of files being indexed. The default
                  is Header.
IndexOptions      Indexing parameters (including FancyIndexing, IconsAreLinks,
                  ScanHTMLTitles, SuppressLastModified, SuppressSize, and
                  SuppressDescription).
ReadmeName        The footer file displayed with directory indexes. The default is README.
Redirect          Maps a path to another URL.
ScriptAlias       Similar to Alias but for scripts.
UserDir           Directory users can use for httpd access. The default is public_html.
                  Usually set to a user's home page directory. Can be set to DISABLED.
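A resource file built from the defaults listed above might look like the following sketch, which writes a sample srm.conf to a scratch directory. The Alias and ScriptAlias mappings are assumptions based on the default directory layout under /usr/local/etc/httpd; adjust them to match where you unpacked the server.

```shell
#!/bin/sh
# Sketch: write a sample srm.conf to a scratch directory.
conf_dir=$(mktemp -d)
cat > "$conf_dir/srm.conf" <<'EOF'
# Where documents live and which file answers a directory URL
DocumentRoot /usr/local/etc/httpd/htdocs
DirectoryIndex index.html
# Per-user pages and per-directory access files
UserDir public_html
AccessFileName .htaccess
# Fallback MIME type and directory listing appearance
DefaultType text/plain
FancyIndexing on
HeaderName Header
ReadmeName README
DefaultIcon /icons/unknown.xbm
# Map URL prefixes onto directories (assumed layout)
Alias /icons/ /usr/local/etc/httpd/icons/
ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/
EOF

# Print only the active (non-comment) settings.
grep -v '^#' "$conf_dir/srm.conf"
```

Note that Alias and ScriptAlias each take two values: the URL prefix a browser asks for, then the real directory on disk.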
The third file to examine and modify is access.conf-dist, which defines
the services available to WWW browsers. Usually, everything is accessible to a browser,
but you may want to modify the file to tighten security or disable some services
not supported on your Web site. The format of the access.conf-dist file differs
from that of the two preceding configuration files. It uses a set of "sectioning directives"
delineated by angle brackets. The general format of an entry is
<Directory Dir_Name>
...
</Directory>
and anything between the beginning and ending delimiters (<Directory>
and </Directory>, respectively) is a directive that applies to that directory.
Several variations can exist in the file, so the best way to customize
the access.conf-dist file is to follow these steps for a typical
Web server installation:
1. Locate the Options directive and remove the Indexes option. This prevents
users from browsing the httpd directory. Valid Options entries are discussed
shortly.
2. Locate the first Directory directive and check the path to the cgi-bin
directory. The default path is /usr/local/etc/httpd/cgi-bin.
3. Find the AllowOverride variable and set it to None (this
prevents others from changing the settings). The default is All. Valid values
for the AllowOverride variable are discussed shortly.
4. Find the Limit directive and set it to whichever value you want.
The Limit directive controls access to your server. The following are valid values
for the Limit directive:
allow      Allows specific host names following the allow keyword to access the service.
deny       Denies specific host names following the deny keyword from accessing the
           service.
order      Specifies the order in which allow and deny directives are evaluated
           (usually set to deny,allow but can also be allow,deny).
require    Requires authentication through a user file specified in the AuthUserFile
           entry.
The Options directive can have several entries, all of which have a different purpose.
The default entry for Options is
Options Indexes FollowSymLinks
You removed the Indexes entry from the Options directive in the first step of
the preceding customization procedure. These entries all apply to the directory the
Options field appears in. The valid entries for the Options directive are
All                     All features enabled.
ExecCGI                 CGI scripts can be executed from this directory.
FollowSymLinks          Allows httpd to follow symbolic links.
Includes                Include files for the server are enabled.
IncludesNoExec          Include files for the server are enabled but the exec option is
                        disabled.
Indexes                 Enables users to retrieve server-generated indexes (doesn't affect
                        precompiled indexes).
None                    No features enabled.
SymLinksIfOwnerMatch    Follows symbolic links only if the user ID of the symbolic link
                        matches the user ID of the file.
The AllowOverride variable is set to All by default, and this should
be changed. There are several valid values for AllowOverride, but the recommended
setting for most Linux systems is None. These are the valid values for AllowOverride:
All           Access controlled by a configuration file in each directory.
AuthConfig    Enables the authentication directives: AuthName (sets the authorization
              name of the directory); AuthType (sets the authorization type of the
              directory, although there is only one legal value: Basic); AuthUserFile
              (specifies a file containing user names and passwords); and AuthGroupFile
              (specifies a file containing group names).
FileInfo      Enables AddType and AddEncoding directives.
Limit         Enables Limit directive.
None          No access files allowed.
Options       Enables Options directive.
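Putting the customization steps together, an edited access file might resemble the sketch below, which writes a sample to a scratch directory and checks that every Directory section is closed. The domain .tpci.com and the choice to limit GET requests to that domain are assumptions made for illustration. Inside a Limit section, host rules are written as "allow from" and "deny from" followed by the host or domain.

```shell
#!/bin/sh
# Sketch: write a sample access.conf and check that its sections balance.
conf_dir=$(mktemp -d)
conf="$conf_dir/access.conf"
cat > "$conf" <<'EOF'
<Directory /usr/local/etc/httpd/cgi-bin>
Options FollowSymLinks
AllowOverride None
</Directory>

<Directory /usr/local/etc/httpd/htdocs>
Options FollowSymLinks
AllowOverride None
<Limit GET>
order deny,allow
deny from all
allow from .tpci.com
</Limit>
</Directory>
EOF

# Every <Directory ...> line should have a matching </Directory> line.
opens=$(grep -c '^<Directory ' "$conf")
closes=$(grep -c '^</Directory>' "$conf")
if [ "$opens" -eq "$closes" ]; then
    echo "Directory sections balanced: $opens"
else
    echo "Unbalanced Directory sections: $opens opened, $closes closed" >&2
fi
```

With order deny,allow, the deny rules are evaluated first and the allow rules can then override them, so this example shuts out everyone except hosts in the named domain.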
After all that, the configuration files should be properly set. While the syntax
is a little confusing, reading the default values shows you the proper format to
use when changing entries.
With the configuration complete, it's time to try out the Web server software.
In the configuration files, you made a decision as to whether the Web software will
run as a daemon (standalone) or will start from inetd. The startup procedure
is a little different for each method (as you would expect), but both startup procedures
can use any of the following three options on the command line:
-d    The absolute path to the root directory of the server files (used only if the
      default location is not valid).
-f    The configuration file to read if not the default value of httpd.conf.
-v    Displays the version number.
If you are using inetd to start your Web server software, you need to make
a change to the /etc/services file to permit the Web software. Add a line
similar to this to the /etc/services file:
http port/tcp
Here, port is the port number used by your Web server software (usually 80).
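Next, modify the /etc/inetd.conf file to include the startup commands
for the Web server. The first field must match the service name used in /etc/services,
and the last two entries are the path to the httpd binary and the program name passed to it:
http stream tcp nowait nobody /usr/web/httpd httpd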
Once this is done, restart inetd by killing and restarting the inetd
process or by rebooting your system. The service should be available through whatever
port you specified in /etc/services.
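The two file changes can be scripted so that repeated runs do not add duplicate entries. The sketch below is an illustration under stated assumptions: it works on scratch copies unless SERVICES_FILE and INETD_FILE (hypothetical variable names) point at the real files, it assumes port 80 and the /usr/web/httpd path used in this chapter, and it assumes the service name in the first inetd.conf field matches the /etc/services entry, with the trailing httpd being the program name handed to the server.

```shell
#!/bin/sh
# Sketch: add the inetd entries for the Web server only if they are missing.
# Works on scratch files by default; set SERVICES_FILE and INETD_FILE
# (as root) to /etc/services and /etc/inetd.conf for real use.
services_file=${SERVICES_FILE:-$(mktemp)}
inetd_file=${INETD_FILE:-$(mktemp)}

# add_line_once FILE LINE: append LINE to FILE unless an identical line exists.
add_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

add_line_once "$services_file" "http 80/tcp"
add_line_once "$inetd_file" "http stream tcp nowait nobody /usr/web/httpd httpd"

# Running the same commands again changes nothing.
add_line_once "$services_file" "http 80/tcp"
add_line_once "$inetd_file" "http stream tcp nowait nobody /usr/web/httpd httpd"

cat "$services_file" "$inetd_file"
```

After editing the real files, inetd still has to be restarted (or told to reread its configuration) before the new service takes effect.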
If you are running the Web server software as a daemon, you can start it at any
time from the command line with the following command:
httpd &
Even better, add the startup commands to the proper rc startup files.
The entry usually looks like this:
# start httpd
if [ -x /usr/web/httpd ]
then
    /usr/web/httpd
fi
substituting the proper paths for the httpd binary, of course. Rebooting
your machine should start the Web server software on the default port number.
To test the Web server software, use any Web browser and type the following in the URL field:
http://machinename
where machinename is the name of your Web server. If you see the contents of the
root Web directory or the index.html file, all is well. Otherwise, check
the log files and configuration files for clues as to the problem.
If you haven't installed a Web browser yet, you can still check to see if the
Web server is running by using telnet. Issue a command like this, substituting
the name of your server (and your Web port number if different than 80):
telnet www.wizard.tpci.com 80
Once the connection is made, type HEAD / HTTP/1.0 and press Enter twice. If the Web
server is responding properly, the exchange looks similar to this:
Connected to wizard.tpci.com
Escape character is `^]'.
HEAD / HTTP/1.0
HTTP/1.0 200 OK
You'll also see some more lines showing details about the date and content. You
may not be able to access anything, but this shows that the Web software is responding
properly.
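The request typed in the telnet session can also be generated by a small shell function, which avoids typing mistakes in the request line. This is a minimal sketch: the function name http_head_request is made up for the example, and the commented-out line assumes the nc (netcat) utility is installed, which may not be the case on your system.

```shell
#!/bin/sh
# http_head_request [PATH]: print a minimal HTTP/1.0 HEAD request for PATH
# (default /). Each line ends in CRLF and a blank line ends the request.
http_head_request() {
    printf 'HEAD %s HTTP/1.0\r\n\r\n' "${1:-/}"
}

# Show the request with carriage returns stripped for display.
http_head_request / | tr -d '\r'

# With a server running and nc installed (an assumption), the status line
# could be fetched with something like:
#   http_head_request / | nc www.wizard.tpci.com 80 | head -n 1
```

The space on either side of the slash matters: the request line is a method, a path, and a protocol version, each separated by a single space.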
Having a server with nothing for content is useless, so you need to set up the
information you will share through your Web system. This begins with Uniform Resource
Locators (URLs), which are addresses to file locations. Anyone using your service
only has to know the URL. You don't need to have anything fancy. If you don't have
a special home page, anyone connecting to your system will get the contents of the
Web root directory's index.html file, or failing that, a directory listing
of the Web root directory. That's pretty boring, though, and most users want fancy
home pages. To write a home page, you need to use HTML (HyperText Markup Language).
A home page is like a main menu. Many users may not ever see it because they can
enter into any of the subdirectories on your system, or obtain files from another
Web system through a hyperlink, without ever seeing your home page. However, many
users want to start at the top, and that's where your home page comes in. A home
page file is usually called index.html. It is usually at the top of your
Web source directories.
Writing an HTML document is not too difficult. The language uses a set of tags
to indicate how the text is to be treated (such as headlines, body text, figures,
and so on). The tricky part of HTML is getting the tags in the right place, without
extra material on a line. HTML is rather strict about its syntax, so errors must
be avoided to prevent problems.
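To give a feel for the tags, the sketch below writes a very small home page into a scratch directory. The page text, the doc_root variable, and the docs/index.html link target are all made up for the example; a real page would go in your DocumentRoot as index.html.

```shell
#!/bin/sh
# Sketch: create a minimal home page in a scratch document directory.
doc_root=$(mktemp -d)   # stands in for the server's document directory
cat > "$doc_root/index.html" <<'EOF'
<html>
<head>
<title>Welcome to wizard.tpci.com</title>
</head>
<body>
<h1>Welcome</h1>
<p>This is the home page for our Web server.</p>
<a href="docs/index.html">Documentation</a>
</body>
</html>
EOF

# Show the page source that a browser would receive.
cat "$doc_root/index.html"
```

Each opening tag such as <title> or <h1> is paired with a closing tag that begins with a slash, and the <a href="..."> tag is what creates a hyperlink.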
In the early days of the Web, all documents were written with simple text editors.
As the Web expanded, dedicated Web editors that understand HTML and the use of tags
began to appear. Their popularity has driven developers to produce dozens of editors,
filters, and utilities--all aimed at making a Web documenter's life easier (as well
as to ensure that the HTML language is properly used). There are HTML editors for
many operating systems.
You can write HTML documents in many ways: You can use an ASCII editor, a word
processor, or a dedicated HTML tool. The choice of which method you use depends on
personal preference and your confidence in HTML coding, as well as which tools you
can easily obtain. Because many HTML-specific tools have checking routines or filters
to verify that your documents are correctly laid out and formatted, they can be appealing.
They also tend to be more friendly than non-HTML editors. On the other hand, if you
are a veteran programmer or writer, you may want to stick with your favorite editor
and use a filter or syntax checker afterward.
NOTE: One of the best sites to look for new editors and filters is
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/FAQ-Software.html,
which contains an up-to-date list of offerings.
You can use any ASCII editor to write HTML pages, including simple screen-oriented
editors based on vi or emacs. They all enable you to enter tags
into a page of text, but the tags are treated as words with no special meaning. There
is no validity checking performed by simple editors, because they simply don't understand
HTML. There are some extensions for emacs and similar full-screen editors
that provide a simple template check, but they are not rigorous in enforcing HTML
styles.
If you wish to use a plain editor, you should carefully check your document for
the valid use of tags. One of the easiest methods of checking a document is to import
it into an HTML editor that has strong HTML tag checking. Another easy method is
to simply call up the document on your Web browser and carefully study its appearance.
You can obtain a dedicated HTML authoring package from some sites, although they
are not as common for Linux as for DOS and Windows. If you are running both operating
systems, you can always develop your HTML documents in Windows, and then import them
to Linux. There are several popular HTML tools for Windows, such as HTML Assistant,
HTMLed, and HoTMetaL. A few of the WYSIWYG editors are also available for X, and
hence run under Linux, such as HoTMetaL. Some HTML authoring tools are fully WYSIWYG,
while others are character-based. Most offer strong verification systems for generated
HTML code.
An alternative to using a dedicated editor for HTML documents is to enhance an
existing WYSIWYG word processor to handle HTML properly. The most commonly targeted
word processors for these extensions are Word for Windows, WordPerfect, and Word for
DOS. Several extension products are available in varying degrees of complexity. Most
run under Windows, although a few have been ported to Linux.
The advantage to using one of these extensions is that you retain a familiar editor
and make use of the near-WYSIWYG features it can provide for HTML documents. Although
it can't show you the final document in Web format, it can be close enough to prevent
all but the most minor problems.
CU_HTML is a template for Microsoft's Word for Windows that gives a near-WYSIWYG
view of HTML documents. Graphically, CU_HTML looks much the same as Word,
but with a new toolbar and pull-down menu item. CU_HTML provides a number of different
styles and a toolbar of oft-used tasks. Tasks such as linking documents are easy,
as are most tasks that tend to worry new HTML document writers. Dialog boxes are
used for many tasks, simplifying the interface considerably.
The only major disadvantage to CU_HTML is that it can't be used to edit existing
HTML documents if they are not in Word format. When CU_HTML creates an HTML document,
there are two versions produced, one in HTML and the other as a Word DOC file. Without
both, the document can't be edited. An existing document can be imported, but it
loses all the tags.
Like CU_HTML, ANT_HTML is an extension to Word. ANT_HTML has some advantages and
disadvantages compared with CU_HTML. The documentation and help are better with
ANT_HTML, and the toolbar is much better. It also inserts opening
and closing tags automatically as needed.
One system that has gained popularity among Linux users is tkWWW. This system
is a tool for the Tcl language and its Tk extension for X. tkWWW is a combination
of a Web browser and a near-WYSIWYG HTML editor. Although originally UNIX-based,
tkWWW has been ported to several other platforms, including Windows and Macintosh.
NOTE: tkWWW can be obtained through anonymous FTP to harbor.ecn.purdue.edu
in the directory /pub/tcl/extensions.
Copies of Tcl and Tk can be found in several sites depending on the platform required,
although most distributions of Linux have Tcl and Tk included in the distribution
set. As a starting point, try anonymous FTP to ftp.aud.alcatel.com in the
directory tcl/extensions.
When you create a Web page with tkWWW in editor mode, you can then flip modes
to browser to see the same page properly formatted. In editor mode, most of the formatting
is correct, but the tags are left visible. This makes for fast development of a Web
page.
Unfortunately, tkWWW must rely on Tk for its windowing, which tends to slow things
down a bit on average processors. Also, the browser aspect of tkWWW is not impressive,
using standard Tk frames. However, as a prototyping tool, tkWWW is very attractive,
especially if you know the Tcl language.
Another option is to use an HTML filter. HTML filters are tools that let you take
a document produced with any kind of editor (including ASCII text editors) and convert
the document to HTML. Filters are useful when you work in an editor that has its
own proprietary format, such as Word.
HTML filters are attractive if you want to continue working in your favorite editor
and simply want a utility to convert your document with tags to HTML. Filters tend
to be fast and easy to work with, because they take a filename as input and generate
an HTML output file. The degree of error checking and reporting varies with the tool.
There are filters available for most types of documents, many of which are available
directly for Linux, or as source code that can be recompiled without modification
under Linux. Word for Windows and Word for DOS documents can be converted to HTML
with the CU_HTML and ANT_HTML extensions mentioned earlier. A few standalone conversion
utilities have also begun to appear. The utility WPTOHTML converts WordPerfect
documents to HTML. WPTOHTML is a set of macros for WordPerfect versions
5.1 and 6.0. The WordPerfect filter can also be used with other word processor formats
that WordPerfect can import.
FrameMaker and FrameBuilder documents can be converted to HTML format with the
tool FM2HTML. FM2HTML is a set of scripts that converts Frame documents to HTML,
while preserving hypertext links and tables. It also handles GIF files without a
problem. Because Frame documents are platform independent, Frame documents developed
on a PC or Macintosh could be moved to the Linux platform and FM2HTML executed there.
NOTE: A copy of FM2HTML is available by anonymous FTP from bang.nta.no
in the directory /pub.
The UNIX set is called fm2html.tar.v.0.n.m.Z.
LaTeX and TeX files can be converted to HTML with several different
utilities. There are quite a few Linux-based utilities available, including LATEXTOHTML,
which can even handle inline LaTeX equations and links. For simpler documents,
the utility VULCANIZE is faster but can't handle mathematical equations.
Both LATEXTOHTML and VULCANIZE are Perl scripts.
NOTE: LATEXTOHTML is available through anonymous FTP from ftp.tex.ac.uk
in the directory pub/archive/support
as the file latextohtml. VULCANIZE can be obtained from the Web
site http://www.cis.upenn.edu/~mjd/vulcanize.html.
RTFTOHTML is a common utility for converting RTF format documents to
HTML. Many word processors handle RTF formats, so an RTF document can be saved from
your favorite word processor and then RTFTOHTML run to convert the files.
NOTE: RTFTOHTML is available through
http://www.w3.org/hypertext/www/tools/rtftohtml-2.6.html.
Once you have written a Web document and it is available to the world, your job
doesn't end. Unless your document is a simple text file, you will have links to other
documents or Web servers embedded. These links must be verified at regular intervals.
Also, the integrity of your Web pages should be checked at intervals, to ensure that
the flow of the document from your home page is correct.
There are several utilities available to help you check links and also to scan
the Web for other sites or documents you may want to provide a hyperlink to. These
utilities tend to go by a number of names, such as robot, spider, or wanderer. They
are all programs that move across the Web automatically, creating a list of Web links
that you can access. (Spiders are similar to the Archie and Veronica tools for the
Internet, although neither of these covers the Web.)
Although they are often thought of as utilities for users only (to get a list
of sites to try), spiders and their kin are useful for document authors, too, because
they show potentially useful and interesting links. One of the best known spiders
is the World Wide Web Worm, or WWWW. WWWW enables you to search for keywords or create
a Boolean search, and can cover titles, documents, and several other search types
(including a search of all known HTML pages).
Another useful spider is WebCrawler, which is similar to WWWW except that it can
scan entire documents for matches of any keywords. It displays the results in an ordered
list from closest match to least likely match.
NOTE: A copy of World Wide Web Worm can be obtained from
http://www.cs.colorado.edu/home/mcbryan/WWWW.html.
WebCrawler is available from http://www.biotech.washington.edu/WebCrawler/WebCrawler.html.
A common problem with HTML documents as they age is that links that point to files
or servers may no longer exist (because either the locations or the documents have
changed). It is therefore good practice to validate the hyperlinks in a document
on a regular basis. A popular hyperlink analyzer is HTML_ANALYZER. It examines each
hyperlink and the contents of the hyperlink to ensure that they are consistent. HTML_ANALYZER
functions by examining a document for all links, and then creating a text file that
has a list of the links in it. HTML_ANALYZER uses the text files to compare the actual
link content to what it should be.
HTML_ANALYZER actually does three tests: It validates the availability of the
documents pointed to by hyperlinks (called validation); it looks for hyperlink contents
that occur in the database but are not themselves hyperlinks (called completeness);
and it looks for a one-to-one relation between hyperlinks and the contents of the
hyperlink (called consistency). Any deviations are listed for the user.
HTML_ANALYZER users should have a good familiarity with HTML, their operating
system, and the use of command-line driven analyzers. The tool must be compiled using
the make utility prior to execution. There are several directories that
must be created prior to running HTML_ANALYZER, and when it runs, it creates several
temporary files that are not cleaned up, so this is not a good utility for a
novice.
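For a quick check that needs no compiling, a few lines of shell can cover the simplest part of the job: finding relative links whose target files are missing. The sketch below is not HTML_ANALYZER and does none of its completeness or consistency tests; the function names are made up, it only recognizes lowercase href="..." attributes in double quotes, and it skips remote URLs entirely.

```shell
#!/bin/sh
# list_links FILE: print the target of every href="..." attribute in FILE.
list_links() {
    grep -o 'href="[^"]*"' "$1" | sed 's/^href="//; s/"$//'
}

# missing_local_links FILE: print relative links whose target file is absent.
missing_local_links() {
    base=$(dirname "$1")
    list_links "$1" | while IFS= read -r link; do
        case $link in
            *://*|mailto:*|\#*) continue ;;   # skip remote URLs and in-page anchors
        esac
        target=${link%%#*}                    # drop any #fragment
        [ -e "$base/$target" ] || printf '%s\n' "$link"
    done
}

# Demonstration on a scratch site with one good and one broken local link.
site=$(mktemp -d)
echo '<p>Guide</p>' > "$site/guide.html"
cat > "$site/index.html" <<'EOF'
<a href="guide.html">Guide</a>
<a href="old/news.html">News</a>
<a href="http://www.w3.org/">W3C</a>
EOF
missing_local_links "$site/index.html"   # prints: old/news.html
```

Run against each page in your document directory at regular intervals, a check like this catches files that were moved or deleted; links to other servers still need a tool that actually fetches them.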
Setting up your home page requires you to either use an HTML authoring tool or
write HTML code directly into an editor. The HTML language is beyond the scope of
this book, but you should find several good guides to HTML at your bookstore. HTML
is rather easy to learn. With the information in this chapter, you should be able
to set up your Web site to enable anyone on the Internet to connect to you. Enjoy
the Web!