

WAIS Index Files

Most of the freeWAIS index files are not human-readable (although one or two can be read with some success). waisindex usually creates seven index files, although the number may vary depending on requirements. Each file is named from a root name (specified on the waisindex command line, or defaulting to index) plus an extension that shows its purpose. The index files and their purposes are as follows:

  index.doc A document file that contains a table with the filename, a headline (title) from the file, the location of the first and last characters of an entry, the length of the document, the number of lines in the document, and the time and date the document was created.
  index.dct A dictionary file that contains a list of every unique word in the files cross-indexed to the inverted file.
  index.fn A filename file that contains a table with a list of the filenames, the date they were created in the index, and the type of file.
  index.hl A headline file that contains a table of all headlines (titles). The headline is displayed in the search output when a match occurs.
  index.inv An inverted file that contains a table associating every unique word in all the files with a pointer to the files themselves and the word’s importance (determined by how close the word is to the start of the file, the number of times the word occurs in the document, and the percentage of times the word appears in the document).
  index.src A source description file that contains descriptions of the information indexed, including the host name and IP address, the port watched by WAIS, the source file name, any cost information for the service, the headline of the service, a description of the source, and the email address of the administrator. The source description file is editable by ASCII editors. We will look at this file in a little more detail shortly.
  index.status A status file containing user-defined information.
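
As a quick check after indexing, the seven file names listed above can be derived from the root name. The following is a minimal sketch, assuming a POSIX shell; the function name list_index_files is hypothetical and is not part of freeWAIS.

```shell
#!/bin/sh
# Hypothetical helper (not part of freeWAIS): report which of the seven
# usual index files exist for a given root name.
# usage: list_index_files ROOT     e.g. list_index_files /usr/wais/foo
list_index_files() {
    root=${1:-index}    # waisindex defaults to the root name "index"
    for ext in doc dct fn hl inv src status; do
        if [ -f "$root.$ext" ]; then
            echo "found    $root.$ext"
        else
            echo "missing  $root.$ext"
        fi
    done
}
```

Calling list_index_files /usr/wais/foo prints one line per expected file. Because the number of files can vary, a missing entry is not necessarily an error.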

The source description file is a standard ASCII file that is read by waisindex at intervals to see whether information has changed. If the changes are significant, waisindex updates its internal information. A sample source file looks like this:


 (:source
  :version 2
  :ip-address "147.120.0.10"
  :ip-name "wizard.tpci.com"
  :tcp-port 210
  :database-name "Linux stuff"
  :cost 0.00
  :cost-unit :free
  :maintainer "wais_help@tpci.com"
  :subjects "Everything you need to know about Linux"
  :description "If you need to know something about Linux, it's here."
 )

You should edit this file when you set up freeWAIS because the default descriptions are rather sparse and useless.
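
Before editing, it can help to see what the file currently advertises. The following is a minimal sketch, assuming a POSIX shell with sed, one field per line as in the sample, and straight double quotes around string values. The src_field function is hypothetical and is not a freeWAIS tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of freeWAIS): print the value of one field
# from a source description file that has one ":name value" pair per line.
# usage: src_field FILE NAME      e.g. src_field index.src maintainer
src_field() {
    # First sed: keep only the line for :NAME (tolerating a trailing colon
    # after the name) and strip the name. Second sed: drop surrounding quotes.
    sed -n "s/^[[:space:]]*:$2:\{0,1\}[[:space:]]\{1,\}//p" "$1" |
        sed 's/^"\(.*\)"$/\1/'
}
```

For example, src_field index.src maintainer prints the administrator's email address, which is one of the first values worth checking.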

The waisindex Command

The waisindex command allows a number of options, some of which you have seen earlier in this chapter. The following list contains the primary waisindex options of interest to most users:

  -a Appends data to an existing index file (used to update index files instead of regenerating them each time a new document is added).
  -contents Indexes the file contents (default action).
  -d Gives the filename root for index files (for example, -d /usr/wais/foo names all index files as /usr/wais/foo.xxx).
  -e Gives the name of the log file for error information (the default is stderr, usually the console, although you can specify -s to send it to /dev/null).
  -export Adds the host name and TCP port to descriptions for easier Internet access.
  -l Gives the level of log messages. Valid values are as follows:
      0, no log
      1, log only high priority errors and warnings
      5, log medium priority errors and warnings, as well as index filename information
      10, log every event
  -M Links multiple types of files.
  -mem Limits memory usage during indexing (the higher the number specified, the faster the indexing process and the more memory used).
  -nocontents Prevents the contents of a file from being indexed (indexes only the document header and filename).
  -nopairs Instructs waisindex not to index adjacent capitalized words together.
  -nopos Ignores the location of keywords in a document when determining scores.
  -pairs Indexes adjacent capitalized words as a single entry.
  -pos Determines scores based on locations of keywords (proximity of keywords increases scores).
  -r Recursive subdirectory indexing.
  -register Registers your indexes with the WAIS Directory of Services.
  -stdin Reads a filename from standard input (normally the keyboard) instead of from the command line.
  -stop Indicates a file containing stopwords (words too common to be indexed), usually defined in src/ir/stoplist.c.
  -t Data file type indicator.
  -T Sets the type of data to whatever follows.
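
To show how several of these options combine, here is a hedged sketch of a typical invocation. The paths /usr/wais/foo and /usr/local/docs and the text file type are assumptions chosen for illustration, and index_cmd is a hypothetical wrapper. It only prints the command so that it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical wrapper: print a waisindex command line that indexes a
# directory tree. Paths and the "text" type are illustrative assumptions.
# usage: index_cmd ROOT DIR
index_cmd() {
    # -d ROOT   root name for the index files (ROOT.doc, ROOT.inv, ...)
    # -l 1      log only high priority errors and warnings
    # -export   add host name and TCP port to the source description
    # -t text   treat the input as plain text (assumed type name)
    # -r DIR    index DIR and its subdirectories
    echo "waisindex -d $1 -l 1 -export -t text -r $2"
}

index_cmd /usr/wais/foo /usr/local/docs
```

To update an existing index rather than regenerate it, the same command line can take -a so that new data is appended to the existing index files.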

The waisindex program has to be told the type of information in a file; otherwise it may not be able to generate an index properly. Many file types are currently defined with freeWAIS, and you can display them by entering this command with no argument:


waisindex

