Platinum Edition Using HTML 4, XML, and Java 1.2
(Publisher: Macmillan Computer Publishing)
Author(s): Eric Ladd
ISBN: 078971759x
Publication Date: 11/01/98

The user can choose either a concept-based search or a conventional keyword AND search. When you run a search, EWS lists the results in decreasing order of confidence. Each result consists of a title, a URL, a confidence rating, and an automatically generated summary of what the page is about. Relevance ranking is the default, but with a single mouse click the user can see the same results grouped by subject or topic.
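To make the shape of a result listing concrete, here is a minimal Python sketch of the two views described above. It is not EWS code. The SearchResult class, its field names, the subject labels, and the example.com URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    title: str
    url: str
    confidence: float  # 0.0 to 1.0; EWS displays this as a percentage
    summary: str
    subject: str       # hypothetical field, used only for the grouped view


def rank_by_confidence(results):
    """Return the results in decreasing order of confidence."""
    return sorted(results, key=lambda r: r.confidence, reverse=True)


def group_by_subject(results):
    """Return {subject: [results]}, each list ranked by confidence."""
    grouped = {}
    for result in rank_by_confidence(results):
        grouped.setdefault(result.subject, []).append(result)
    return grouped


# Made-up sample data
results = [
    SearchResult("Page A", "http://example.com/a", 0.62, "About A", "sports"),
    SearchResult("Page B", "http://example.com/b", 0.91, "About B", "news"),
    SearchResult("Page C", "http://example.com/c", 0.75, "About C", "sports"),
]

ranked = rank_by_confidence(results)
by_subject = group_by_subject(results)
```

Both views are built from the same list of hits, which is why switching between them does not require running the search again.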

Excite also supports relevance feedback, or query-by-example, searching. Using this technique, if you visit a found page and find it is pretty much what you’re looking for, you can return to the search results and click the icon next to the listing to initiate another search. The subsequent search uses the found page as a parameter and returns similar pages.
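The idea behind query-by-example can be sketched in a few lines of Python. Excite does not publish how it compares pages, so this sketch substitutes a plain bag-of-words cosine similarity. The function names, page names, and page text are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(example_id, pages):
    """Use one found page as the query and rank the other pages against it."""
    example = term_vector(pages[example_id])
    scored = [
        (cosine_similarity(example, term_vector(text)), page_id)
        for page_id, text in pages.items()
        if page_id != example_id
    ]
    # Keep only pages that share at least one term, best match first
    return [page_id for score, page_id in sorted(scored, reverse=True) if score > 0]


# Made-up sample pages
pages = {
    "astros.html": "astros win baseball game in houston",
    "rockets.html": "rockets win basketball game in houston",
    "recipes.html": "chili recipe with beans",
}

similar = find_similar("astros.html", pages)
```

Here the found page itself plays the role of the search terms, which is the essence of relevance feedback.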

Excite doesn’t require a thesaurus to do concept-based searching, but the company indicates that an external thesaurus can improve results. Because a thesaurus is not necessary, adding support for new languages supposedly is not as difficult as with some other software. Architext claims that independent software developers can also write modules to support additional data file formats without facing too many obstacles.

Architext currently offers the software at no charge and sells annual support contracts. Further information about Excite for Web Servers can be found at

http://www.excite.com/navigate/

Quite a few sites are running the Excite search engine. One good example is the Houston Chronicle search page at

http://www.chron.com/content/search/

Installing Excite is described as Plug and Play, and it couldn’t be easier. Download the distribution archive (along with the C++ libraries if you need them), run a shell script that asks a few questions, and you’re just about ready to go. You need to run an administrative script that creates the index and another script that creates the search page. Both scripts are run from Web forms.


NOTE:  EWS took 16 minutes, 40 seconds of CPU time to index my UNIX site; elapsed time was 23 minutes. It thoughtfully provided status pages that enabled me to keep tabs on the progress of the indexing. EWS created an index of around 7MB for a collection of 4,490 files totaling slightly more than 90MB. It even emailed me when it was done.
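The figures in the preceding note can be turned into rough ratios for comparison with other indexers. The short Python calculation below uses only the numbers reported in the note. The variable names are illustrative, and the sizes are the approximate values quoted, so the results are estimates.

```python
# Figures reported in the note (sizes are approximate)
cpu_seconds = 16 * 60 + 40       # 16 minutes, 40 seconds of CPU time
elapsed_seconds = 23 * 60        # 23 minutes of elapsed time
file_count = 4490
collection_mb = 90               # "slightly more than 90MB"
index_mb = 7                     # "around 7MB"

# Index size as a fraction of the collection size
index_ratio = index_mb / collection_mb

# Indexing throughput in files per second
files_per_cpu_second = file_count / cpu_seconds
files_per_elapsed_second = file_count / elapsed_seconds

print(f"CPU time: {cpu_seconds} seconds")
print(f"Index is about {index_ratio:.1%} of the collection size")
print(f"About {files_per_cpu_second:.2f} files per CPU second")
print(f"About {files_per_elapsed_second:.2f} files per elapsed second")
```

On these numbers the index is well under a tenth of the size of the documents it covers.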

After generating the index, you then generate the search page by using an HTML form. EWS creates a page that includes a search form and a link to the custom-generated search script for this collection. The resulting search page looks similar to that shown in Figure 31.9.


FIGURE 31.9  The Excite for Web Servers search form enables users to search for keywords or concepts.

Notice that the form does not provide options for case sensitivity or Boolean searches. This is because Excite employs concept matching to do its searching. The company suggests creating queries that are descriptions of information rather than lists of keywords.

Excite for Web Servers will search for documents that are a best match for the words in your query. Excite for Web Servers will also search for documents that are about the same concepts that your query describes, so sometimes Excite for Web Servers will bring back articles that don’t mention any of the words in your original query.
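How can a search return a document that contains none of the query words? The Python sketch below shows one simple way, using term co-occurrence: words that appear in the same documents as a query word are treated as related and added to the query at a lower weight. This is only a stand-in for illustration. It is not Excite's proprietary method, and the function names, weights, and sample documents are assumptions.

```python
from collections import defaultdict


def build_cooccurrence(docs):
    """Map each term to the set of terms that share a document with it."""
    related = defaultdict(set)
    for text in docs.values():
        terms = set(text.lower().split())
        for term in terms:
            related[term] |= terms - {term}
    return related


def concept_search(query, docs, related_weight=0.5):
    """Score documents on query terms plus co-occurring (related) terms."""
    related = build_cooccurrence(docs)
    query_terms = query.lower().split()

    # Query terms get full weight; related terms get a reduced weight
    weights = {term: 1.0 for term in query_terms}
    for term in query_terms:
        for other in related.get(term, ()):
            weights.setdefault(other, related_weight)

    scored = []
    for doc_id, text in docs.items():
        terms = set(text.lower().split())
        score = sum(w for t, w in weights.items() if t in terms)
        if score > 0:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up sample collection
docs = {
    "d1": "car engine repair garage",
    "d2": "automobile engine repair manual",
    "d3": "chocolate cake recipe",
}

hits = concept_search("car", docs)
```

In this sample, document d2 never mentions "car", yet it is returned because it shares "engine" and "repair" with a document that does.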

The more search words, the better the query. Unfortunately, because the search algorithm is proprietary, you have to trust that EWS will perform.

Excite for Web Servers uses Excite’s proprietary Intelligent Concept Extraction (ICE) search method (which is apparently not related to Christian Neuss’ ICE engine). An excellent discussion of search strategies can be found on Excite’s site at http://www.excite.com/ice/tech.html. Although Excite does not provide a lot of detail about its patent-pending proprietary search techniques, ICE is described as a means to find and score documents based on a correlation of their concepts as well as actual keywords. Excite states that this capability to go beyond simple Boolean searches of keywords is the key to its technology.

Using techniques similar to Latent Semantic Indexing (for more information on LSI, visit http://n106.is.tokushima-u.ac.jp/member/kita/EPrint/index-LSI.html), Excite claims that it can perform rapid searches without significant resources as well as maintain performance when the size of the index is scaled up. According to Excite, “Unlike other systems which need more time to perform a query as the size of the database increases, the Excite search engine can perform most queries in a constant amount of time.”

A typical results page resembles that shown in Figure 31.10.


FIGURE 31.10  The results page from Excite for Web Servers includes links to the found file, a summary, and the confidence rating. The icon on each line enables you to submit a new query to find similar pages.

Producing this results page took a little more than a second of CPU time and five or six seconds of elapsed time.

If you click the icon beside a page listing, Excite performs a query-by-example search using that document as the criterion.

A nice touch to the search display is the inclusion of confidence scores. Just the fact that the search hits are ranked conveys information. If you receive 20 responses with the highest confidence of only 50%, you might want to reformulate your search terms and try again rather than examining pages with low probability of satisfying your request.
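That rule of thumb is easy to express in code. The Python function below is a sketch of the heuristic only. The 50% threshold comes from the example in the text, while the function name and the choice to treat an empty result list as a reason to retry are assumptions.

```python
def should_reformulate(confidences, threshold=0.5):
    """Return True when no hit scores above the threshold.

    confidences: confidence ratings as fractions (0.5 means 50%).
    An empty result list also counts as a reason to try new terms.
    """
    return not confidences or max(confidences) <= threshold


weak_hits = [0.50, 0.41, 0.33]    # best hit is only 50%
strong_hits = [0.92, 0.47, 0.20]  # at least one confident hit

retry_weak = should_reformulate(weak_hits)
retry_strong = should_reformulate(strong_hits)
```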

EWS ignores stop words. These words are maintained in a table, but there appears to be no way to edit or add to them.
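For readers unfamiliar with stop words, the following Python sketch shows the general technique of dropping them from a query before searching. The word list here is a small hypothetical sample. The actual contents of the EWS stop word table are not documented.

```python
# Hypothetical sample list; the real EWS table is not published
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for"}


def remove_stop_words(query, stop_words=STOP_WORDS):
    """Return the lowercase query terms that are not stop words."""
    return [word for word in query.lower().split() if word not in stop_words]


terms = remove_stop_words("The history of the Houston Astros")
```

A search tool that let you edit such a list would allow site-specific additions, such as a company name that appears on every page.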

Excite for Web Servers is quite an impressive search tool that is easy to install and easy to implement. It creates a small index file, and its searches consume few system resources and are quite rapid. The inability to maintain the stop word table and the lack of significant documentation on the operation of the system are its only drawbacks. Given its ease of use and strong features, however, such complaints are minor.



Use of this site is subject to certain Terms & Conditions, Copyright © 1996-2000 EarthWeb Inc.
All rights reserved. Reproduction whole or in part in any form or medium without express written permission of EarthWeb is prohibited. Read EarthWeb's privacy statement.