
Platinum Edition Using HTML 4, XML, and Java 1.2
(Publisher: Macmillan Computer Publishing)
Author(s): Eric Ladd
ISBN: 078971759x
Publication Date: 11/01/98



Table 31.2 Some Available Commercial Indexing Search Engines

Company        Tool Name                                   URL                                                     Free?
Verity         Search 9                                    http://www.verity.com/products/datasheets/dk.html       No
Thunderstone   The Webinator Web Index & Retrieval System  http://www.thunderstone.com/webinator/                  Yes (shareware)
AltaVista      AltaVista Search eXtensions                 http://altavista.software.digital.com/search/index.htm  No
Inmagic/Lycos  DB/SearchWorks                              http://www.inmagic.com/textprod.htm#sm                  No
Excite         Excite for Web Servers                      http://www.excite.com/navigate/home.html                Yes
Netcreations   Pinpoint                                    http://www.netcreations.com/pinpoint/                   No (free trial)
SDSU           ht://dig                                    http://htdig.sdsu.edu                                   Yes

The sections that follow focus on implementing these five indexing search engines:

  WebGlimpse, developed by the University of Arizona
  ICE by Christian Neuss
  Simple Web Indexing System for Humans - Enhanced (SWISH-E) by Kevin Hughes
  freeWAIS from the University of Dortmund, Germany
  Excite for Web Servers

Implementing WebGlimpse

GLIMPSE (which stands for GLobal IMPlicit SEarch) and its Web companion, WebGlimpse, are projects of the University of Arizona’s Computer Science Department. WebGlimpse is free for nonprofit use; commercial users pay a small licensing fee. The University has also recently developed a program called the Search Broker, which forwards your query to a search engine that specializes in the subject of your question. You specify that subject as the first word of your query.
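As a rough illustration of the routing rule just described, the sketch below treats the first word of the query as a subject key and forwards the remaining words to the matching engine. It is not the Search Broker's code; the function name, the subject table, and the example.com engine URLs are all hypothetical.

```python
# Hypothetical sketch of Search Broker-style routing: the first word of the
# query selects a subject-specific engine; the remaining words are forwarded
# as the actual query. Subject names and engine URLs are placeholders.

from typing import Dict, Optional, Tuple


def route_query(query: str, engines: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (engine_url, remaining_query), or None if the subject is unknown."""
    words = query.split()
    if not words:
        return None
    subject, rest = words[0].lower(), " ".join(words[1:])
    engine = engines.get(subject)
    if engine is None:
        return None
    return engine, rest


# Hypothetical subject table.
ENGINES = {
    "recipes": "http://recipes.example.com/search",
    "movies": "http://movies.example.com/search",
}

print(route_query("movies casablanca bogart", ENGINES))
```

Here the query "movies casablanca bogart" would be sent to the movies engine as "casablanca bogart"; a query whose first word names no known subject returns None.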

A recent search of the Web turned up hundreds of sites using this popular tool or its precursor, GlimpseHTTP. A partial list of sites is available at http://glimpse.cs.arizona.edu/ghttp/sites.html. GLIMPSE also serves as the basis for the Harvest Information Discovery and Access System (http://harvest.cs.colorado.edu/).

As the name implies, the program displays glimpses of the files: short samples of the text surrounding each match. This makes it a particularly useful tool, even though it doesn’t offer relevance ranking.
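To show what "glimpses" means in practice, here is a minimal sketch that reports each matching line with its line number instead of just the file name. It illustrates the idea only and is not GLIMPSE's implementation; the function name is hypothetical, and its max_lines and case_sensitive parameters merely mirror the maxlines and case fields in the search forms shown later in Listing 31.15.

```python
# Minimal sketch of "glimpse"-style output: report each line that contains
# the search string, with its line number, rather than only the file name.
# An illustration of the idea, not GLIMPSE's implementation.

from typing import List, Tuple


def glimpses(text: str, query: str, max_lines: int = 30,
             case_sensitive: bool = False) -> List[Tuple[int, str]]:
    """Return up to max_lines (line_number, line) pairs containing query."""
    needle = query if case_sensitive else query.lower()
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        haystack = line if case_sensitive else line.lower()
        if needle in haystack:
            hits.append((number, line.strip()))
            if len(hits) >= max_lines:
                break  # stop once the per-file limit is reached
    return hits


page = """<H1>Products</H1>
<P>Our indexing tools
<P>ship with every server.
<P>See Indexing FAQ for details."""

for number, line in glimpses(page, "indexing"):
    print(f"{number}: {line}")
```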

GLIMPSE is available at

ftp://ftp.cs.arizona.edu/glimpse/

You can obtain WebGlimpse at

http://glimpse.cs.arizona.edu/webglimpse/


CAUTION:  

Be sure to get the most recent version of WebGlimpse. In July 1997, a security hole was discovered in the program and fixed.


The distribution comprises the following:

  GLIMPSE, the search program, written in C
  glimpseindex, another C program, which creates the index
  The webglimpse script itself, written in Perl
  An assortment of Perl utilities that you use to create and manage your indexes

Installation is mostly automated but definitely not foolproof; it sometimes takes several attempts to get it installed smoothly. After it is installed, you run a Perl script that builds the WebGlimpse index using glimpseindex. GLIMPSE can build indexes of several sizes, from tiny (about 1% of the size of the source files) to large (up to 30% of the size of the source files). Even small indexes are practical and offer good performance.
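To put those percentages in concrete terms, the tiny calculator below applies the two endpoints quoted above to a given amount of source data. The function name is hypothetical, and real index sizes depend on your files and on the indexing options you choose, so treat the result as a rough planning figure only.

```python
# Back-of-the-envelope estimate of GLIMPSE index size, using the range quoted
# in the text: about 1% of the source files for a tiny index, up to 30% for a
# large one. Actual sizes depend on the data and the options chosen.

from typing import Tuple

TINY_FRACTION = 0.01
LARGE_FRACTION = 0.30


def index_size_range_mb(source_mb: float) -> Tuple[float, float]:
    """Return (tiny_estimate_mb, large_estimate_mb) for source_mb of files."""
    return source_mb * TINY_FRACTION, source_mb * LARGE_FRACTION


tiny, large = index_size_range_mb(200.0)
print(f"200 MB of files: about {tiny:.0f} MB (tiny) to {large:.0f} MB (large)")
```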

Other welcome features include the capability to index only the pages added since the last index, a facility for indexing offsite links, the capability to set a tolerance for spelling errors, and the capability to establish neighborhoods. A neighborhood is defined either as all pages within an arbitrary number of hops (links) from a given page or as all pages within a directory.
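Both neighborhood definitions are easy to state precisely. The sketch below computes the hop-limited kind with a breadth-first search over a link graph, and the directory kind with a path-prefix test. It is not WebGlimpse's code; the function names and the in-memory link dictionary are hypothetical stand-ins for the links an indexer would extract from your pages.

```python
# Sketch of the two neighborhood definitions:
#   1. all pages reachable from a start page in at most max_hops links
#      (breadth-first search, so each page is first reached by a shortest path)
#   2. all pages under a given directory
# The link graph is a hypothetical in-memory dict, not WebGlimpse's data.

from collections import deque
from typing import Dict, List, Set


def neighborhood(links: Dict[str, List[str]], start: str, max_hops: int) -> Set[str]:
    """Return pages within max_hops links of start (start included)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return seen


def directory_neighborhood(pages: List[str], directory: str) -> Set[str]:
    """Return all pages whose path lies under directory."""
    prefix = directory.rstrip("/") + "/"
    return {p for p in pages if p.startswith(prefix)}


site = {
    "/index.html": ["/products/a.html", "/about.html"],
    "/products/a.html": ["/products/b.html"],
    "/products/b.html": ["/products/c.html"],
}
print(sorted(neighborhood(site, "/index.html", 2)))
print(sorted(directory_neighborhood(list(site) + ["/products/c.html"], "/products")))
```

With a limit of two hops, /products/c.html falls outside the neighborhood of /index.html because it is three links away, yet it belongs to the /products directory neighborhood.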


NOTE:  Building the index can take quite a lot of time. With WebGlimpse’s option for indexing external links as well as local pages turned on, indexing almost 600 files on my site took 45 minutes. After that index was built, however, re-indexing without the external option took only a few minutes.

After the index has been built, you can keep it current with a cron job (cron is the UNIX scheduler that runs programs for you at defined times) that rebuilds it periodically. The installation routine even creates the job for you.
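The incremental option boils down to noticing which files changed since the previous run. A minimal sketch of that selection step follows. It is not how WebGlimpse tracks its runs; the function name is hypothetical, and it assumes the caller saved the time of the previous run as a UNIX timestamp. A periodic cron job could call something like this and hand the resulting list to the indexer.

```python
# Minimal sketch of incremental indexing: list only the HTML files modified
# since the last index run. Assumes the caller stored that run's start time
# as a UNIX timestamp (last_run); this is not WebGlimpse's own mechanism.

import os
from typing import List, Tuple


def changed_since(root: str, last_run: float,
                  suffixes: Tuple[str, ...] = (".html", ".htm")) -> List[str]:
    """Return sorted paths under root ending in suffixes with mtime > last_run."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suffixes):
                path = os.path.join(dirpath, name)
                if os.path.getmtime(path) > last_run:
                    changed.append(path)
    return sorted(changed)
```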

Performing searches with the WebGlimpse Perl script (created by the installation) is easy. After setting up a server alias for the script’s directory, you call the script with a parameter that indicates where the index resides. If a user calls the script directly, it displays a basic search form.
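Under the hood, a search is an ordinary GET request: in Listing 31.15 the form's ACTION is /$CGIBIN/webglimpse$ARCHIVEPWD, so the archive location is appended to the script path, and the search options travel as query-string fields such as query, scope, errors, and maxfiles. The sketch below builds such a URL. The function name, host, CGI directory, and archive path are hypothetical placeholders, and it assumes the archive path is an absolute directory, as in the listing's second form.

```python
# Hypothetical sketch of the GET request the WebGlimpse search forms send:
# the archive directory is appended to the script path, and the search
# options are encoded as query-string fields. Host, CGI directory, and
# archive path are placeholders, not values from a real installation.

from urllib.parse import urlencode


def webglimpse_url(host: str, cgi_bin: str, archive_dir: str,
                   query: str, scope: str = "full", **options: str) -> str:
    """Build a WebGlimpse search URL; options are extra form fields."""
    fields = {"query": query, "scope": scope, **options}
    path = f"/{cgi_bin.strip('/')}/webglimpse/{archive_dir.strip('/')}"
    return f"http://{host}{path}?{urlencode(fields)}"


print(webglimpse_url("www.example.com", "cgi-bin", "/home/msmith/public_html/big",
                     "index server", errors="1", maxfiles="50"))
```

urlencode encodes the space in "index server" as a plus sign, which is how browsers submit GET forms.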

Alternatively, you can include either of two code fragments in your Web pages to provide a nicer-looking interface. The two interface styles are created using the HTML code fragments in Listing 31.15.

Listing 31.15 Glimform.txt—Two Forms for Calling WebGlimpse


<H2>Basic WebGlimpse Interface</H2>

<CENTER>
<FORM method=get ACTION="/$CGIBIN/webglimpse$ARCHIVEPWD">
<TABLE border=5><TR>
<TD align=center valign=middle>
<A HREF="http://glimpse.cs.arizona.edu/webglimpse">
<IMG src="/images/glimpse-eye.jpg" alt="WG" align=middle width=50><BR>
<FONT size=-3>WebGlimpse</FONT></A></TD>
<TD><INPUT NAME=query size=20>
<INPUT TYPE=submit VALUE="Search">
<INPUT name=file type=hidden value="$FILE">
<A HREF="/$CGIBIN/webglimpse-fullsearch$ARCHIVEPWD?file=$FILE">
Search Options</A></TD></TR>
<TR><TD colspan=2>
Search:
<INPUT TYPE=radio NAME=scope VALUE=neighbor CHECKED>
The neighborhood of this page
<INPUT TYPE=radio NAME=scope VALUE=full>The full archive
</TD></TR></TABLE></FORM></CENTER><HR>
<H2>Full-Featured WebGlimpse Interface</H2>
<FORM method=get ACTION="">
<INPUT name=file type=hidden value="/home/msmith/public_html/big/index.html">
<TABLE border=5>
<TR><TD align=center valign=middle>
<A HREF="http://glimpse.cs.arizona.edu/webglimpse">
<IMG src="/images/glimpse-eye.jpg" alt="WG"
align=middle></A></TD>
<TD align=center valign=middle>
<FONT size=+3><A HREF="http://glimpse.cs.arizona.edu/webglimpse">
WebGlimpse</A> Search</FONT><BR></TD>
</TR>

<TR><TD colspan=2>
Search:
<INPUT TYPE=radio NAME=scope VALUE=neighbor>
The neighborhood of <A HREF="">the ACNielsen Web Site
</A>
<INPUT TYPE=radio NAME=scope VALUE=full CHECKED>The full archive:
<A HREF="">the ACNielsen Site including links offsite</A>
</TD></TR>

<TR><TD colspan=2>
String to search for: <INPUT NAME=query size=30>
<INPUT TYPE=submit VALUE=Submit>
<BR>
<CENTER>
<INPUT NAME=case TYPE=checkbox>Case&#160;sensitive
<!-- spaces -->&#160;&#160;&#160;
<INPUT NAME=whole TYPE=checkbox>Partial&#160;match
<!-- spaces -->&#160;&#160;&#160;
<INPUT NAME=lines TYPE=checkbox>Jump&#160;to&#160;line
<!-- spaces -->&#160;&#160;&#160;
<SELECT NAME=errors>
<OPTION>0
<OPTION>1
<OPTION>2
</SELECT>
misspellings&#160;allowed
<BR>
</CENTER>
Return only files modified within the last <INPUT NAME=age size=5>
days.
<BR>
Maximum number of files returned:
<SELECT NAME=maxfiles>
<OPTION>10
<OPTION selected>50
<OPTION>100
<OPTION>1000
</SELECT>
<BR>Maximum number of matches per file returned:
<SELECT NAME=maxlines>
<OPTION>10
<OPTION selected>30
<OPTION>50
<OPTION>500
</SELECT>
<BR>
</TD></TR>
<TR><TD colspan=2>
<CENTER>
<FONT size=-2><A HREF="http://glimpse.cs.arizona.edu">
Glimpse</A> and <A HREF="http://glimpse.cs.arizona.edu/webglimpse">
WebGlimpse</A>, Copyright &copy; 1996,
Arizona Board of Regents.
</FONT>
</CENTER>
</TD></TR>
</TABLE>
</FORM>
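The "misspellings allowed" menu (the errors field) sets how far a match may stray from the search string. The sketch below models that rule as Levenshtein distance: a word matches if at most max_errors single-character insertions, deletions, or substitutions turn it into the query. This is a word-level simplification for illustration only; GLIMPSE gets its approximate matching from agrep, which uses a different and much faster algorithm, and the function names here are hypothetical.

```python
# Sketch of "misspellings allowed": a word matches if it is within
# max_errors single-character insertions, deletions, or substitutions of
# the query (Levenshtein distance). Word-level simplification for
# illustration; agrep, which GLIMPSE uses, works differently and faster.

import re
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_words(text: str, query: str, max_errors: int) -> List[str]:
    """Return words in text within max_errors edits of query, ignoring case."""
    q = query.lower()
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [w for w in words if edit_distance(w.lower(), q) <= max_errors]


sample = "Indexing tools: indexng, indexes, and search engines"
print(fuzzy_words(sample, "indexing", 1))
```

With one error allowed, the misspelled "indexng" is found alongside "Indexing", while "indexes" (three edits away) is not.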


