

Although freeWAIS supports many different file types, only a few are in common use. The most common are the following:

  filename        Same as text, except the filename is used as the headline.
  first_line      Same as text, except the first line in the file is used as the headline.
  ftp             Contains FTP code users can use to retrieve information from another machine.
  GIF             GIF image, one image per file. The filename is used as the headline.
  mail or rmail   Indexes the contents of an mbox mailbox as individual items.
  mail_digest     Standard email, indexed as individual messages. The subject field is the headline.
  netnews         Standard Usenet news, each article a separate item. The subject field is the headline.
  one_line        Indexes each line in a document separately.
  PICT            PICT image, one image per file. The filename is used as the headline.
  ps              A PostScript file with one document per file.
  text            Indexes the file as one document, with the pathname as the headline.
  TIFF            TIFF image, one image per file. The filename is used as the headline.

To tell waisindex the type of file to be examined, use the -t option followed by the proper type. For example, to index standard ASCII text, you could use the command:


waisindex -t text -r /usr/waisdata/*

This command indexes all the files in /usr/waisdata recursively, assuming they are all ASCII files.
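
As a minimal sketch, a small wrapper can catch a mistyped type name before waisindex is run. The function names is_listed_wais_type and index_preview are hypothetical and not part of freeWAIS; the wrapper checks the type only against the list given above (not every type freeWAIS may support), and it prints the command instead of running it.

```shell
# Succeed if $1 is one of the file types listed in this section.
is_listed_wais_type() {
    case "$1" in
        filename|first_line|ftp|GIF|mail|rmail|mail_digest|netnews|one_line|PICT|ps|text|TIFF)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Hypothetical wrapper: validate the type, then print the waisindex command.
# Usage: index_preview TYPE PATH...
index_preview() {
    wtype=$1
    shift
    if ! is_listed_wais_type "$wtype"; then
        echo "unknown type: $wtype" >&2
        return 1
    fi
    # Printed only; remove the echo to actually run the indexer.
    echo "waisindex -t $wtype -r $*"
}
```

For example, index_preview text /usr/waisdata/docs prints the command that would be run, whereas index_preview pdf /usr/waisdata/docs is rejected because pdf is not in the list.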


Tip:  
When a document has been indexed, changes in the document are not reflected in the WAIS index unless a complete reindex is performed. Using the -a option does not update existing index entries. Instead, start the index process again. You should do this at periodic intervals as a matter of course.
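
One way to decide when a reindex is due is to compare the documents against a timestamp file that is touched after each successful run. The sketch below assumes such a stamp file; the function name needs_reindex and the stamp path in the usage comment are hypothetical, not something freeWAIS provides.

```shell
# Succeed (exit 0) if any file under DATA_DIR is newer than STAMP_FILE,
# or if STAMP_FILE does not exist yet (no index run has been recorded).
# Usage: needs_reindex DATA_DIR STAMP_FILE
needs_reindex() {
    data_dir=$1
    stamp=$2
    [ -e "$stamp" ] || return 0
    # find prints every regular file modified after the stamp;
    # a non-empty result means the index is out of date.
    [ -n "$(find "$data_dir" -type f -newer "$stamp")" ]
}

# Possible use from a periodic job (paths are assumptions):
# if needs_reindex /usr/waisdata /var/tmp/wais.stamp; then
#     waisindex -t text -r /usr/waisdata/* && touch /var/tmp/wais.stamp
# fi
```

Because the stamp is touched only after waisindex succeeds, a failed run is retried the next time the job fires.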

Getting Fancy

You can provide some extra features for users of your freeWAIS service in a number of ways. Although this section is not exhaustive by any means, it shows you two of the easily implementable features that make a WAIS site more attractive.

To begin, suppose you want to make video, graphics, or audio available on a particular subject. Suppose, for example, your site deals with musical instruments, and you have several documents on violins. You may want to provide an audio clip of a violin being played, a video of the making of a violin, or a graphic image of a Stradivarius violin. To make these extra files available, give all the files the same root name but different extensions. For example, if your primary document on violins is called violins.TEXT, you may have the following files in the WAIS directories:

violins.TEXT Document describing violins
violins.TIFF Image of a Stradivarius
violins.MPEG Video of the making of a violin body
violins.MIDI MIDI file of a violin being played

All these files share the same root name (violins) but have different types, each recognized by waisindex. Next, you have to associate the multimedia files with the document file. You can do this with the following command:


waisindex -d violin -M TEXT,TIFF,MPEG,MIDI -export /usr/waisdata/violin/*

This tells waisindex that all four types of files are to be handled. When a user searches for the keyword “violin,” all four types of files are matched, and the user's browser may offer options to play, view, or hear the nontext components.
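
Before indexing, it can help to confirm that every companion file is actually present under the shared root name. The following is a sketch only; missing_types is a hypothetical helper, and it assumes the files follow the ROOT.TYPE naming shown above.

```shell
# Print each TYPE for which DIR/ROOT.TYPE does not exist.
# Usage: missing_types DIR ROOT TYPE...
missing_types() {
    dir=$1
    root=$2
    shift 2
    for t in "$@"; do
        # Report the extension when the matching file is absent.
        [ -f "$dir/$root.$t" ] || echo "$t"
    done
}

# Example:
# missing_types /usr/waisdata/violin violins TEXT TIFF MPEG MIDI
```

An empty result means all the listed types are in place; any extension that is printed names a file that still has to be added.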

Another common feature is the use of synonyms to account for the different ways people specify a subject. For example, a scientist may use the keyword “feline,” whereas a layperson may use “cat.” You want both words to match the same thing. This is done through a file called SOURCE.syn, which the search engine reads automatically while it runs. The SOURCE.syn file has the following format, where word is the word used to search the databases, and synonym is the word (or words) that should match it:


word synonym [synonym …]

For example, if you are dealing with domestic pets in your WAIS site, you may have the following entries in the SOURCE.syn file:


cat  feline
dog  canine hound pooch
bird  parrot budgie
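
A synonym file in this format can be sanity-checked outside of WAIS with a few lines of awk. This is only an illustration of the file layout, not how the freeWAIS search engine itself reads the file; syn_lookup is a hypothetical helper name.

```shell
# Print the search word (first field) of the line on which TERM appears,
# either as the word itself or as one of its synonyms.
# Usage: syn_lookup SYN_FILE TERM
syn_lookup() {
    awk -v term="$2" '
        {
            # Blank lines have NF == 0, so the loop skips them.
            for (i = 1; i <= NF; i++) {
                if ($i == term) { print $1; exit }
            }
        }
    ' "$1"
}

# Example:
# syn_lookup SOURCE.syn hound
```

With the entries shown above, looking up hound prints dog, and a term that appears nowhere in the file prints nothing.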

The synonym file can be very useful when people use different terms to refer to the same thing. An easy way to check whether you need synonyms is to set the logging level for waisserver to 10 for a while and see what words people are searching for on your site. Don’t leave it on too long, however, because the logfiles can become enormous even with a little traffic.

Summary

Now that WAIS is up and running on your server, you can go about the process of building your index files and letting others access your server. WAIS is quite easy to manage and offers a good way of letting other users access your system’s documents. The alternative approach, for text-based systems, is Gopher, which we examine in the next chapter. From here, there are a number of chapters you can go to for more information:

To learn how to set up a World Wide Web server on your Linux machine, see Chapter 51, “Configuring a WWW Site.”
To learn how to program in HTML to set up your home pages for the Web, see Chapter 53, “HTML Programming Basics.”
To learn how to use Java to provide more flexibility to your home pages, see Chapter 54, “Java and JavaScript Basics.”

