Platinum Edition Using HTML 4, XML, and Java 1.2
This script works; it finds instances of a search string in all files in a directory tree. But it ignores some problems and is definitely lacking in features. It would be nice, for example, to be able to specify whether the search is case sensitive and whether multiple words should be treated as Boolean AND or OR. The display does not provide a link to the found files. Another missing feature is the context of the search hit. You know that the search terms are found in these files, but you have no idea whether their use is trivial or important. You don't know how many times the search string was found, and you have no way to evaluate the relevance of a file.
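The missing features listed above (case sensitivity, Boolean AND or OR, a hit count, and some context around each hit) can be sketched in a few lines. The following is only an illustration in Python, not the Perl script discussed here; the function name `search_text` and its parameters are hypothetical.

```python
import re

def search_text(text, query, mode="AND", case_sensitive=False, context=30):
    """Return (hit_count, snippets) for the words of query found in text.

    Hypothetical helper: returns (0, []) when the Boolean condition fails.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    words = query.split()
    if not words:
        return 0, []
    # Record the start offset of every occurrence of each word separately.
    hits_per_word = [
        [m.start() for m in re.finditer(re.escape(word), text, flags)]
        for word in words
    ]
    found = [bool(hits) for hits in hits_per_word]
    # AND requires every word to appear; OR requires at least one.
    matched = all(found) if mode.upper() == "AND" else any(found)
    if not matched:
        return 0, []
    positions = sorted(p for hits in hits_per_word for p in hits)
    # A short window of text around each hit gives the reader some context.
    snippets = [text[max(0, p - context):p + context].strip() for p in positions]
    return len(positions), snippets
```

The hit count returned here could serve as a crude relevance score for ordering results, although a real engine would likely weigh more than raw counts.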
Implementing a Third-Party Grepping Search Engine
Several very popular grepping search engines are available on the Web. The following sections examine three of them:
All are written in Perl, and each has a little something to recommend it. All solve many of the problems mentioned in the last section and provide added functionality.
Implementing Matt's Simple Search Engine
You can find Matt's Simple Search Engine in Matt's Script Archive at http://www.worldwidemart.com/scripts/, one of the most popular Perl script archives on the Web. Implementing Matt's search engine is fairly simple: get the distribution archive, install it on your site, configure it, and create a search form.
To configure the script, you need to edit several lines at the top to point to the base directory. The base directory is the base URL for the site and is used to create links to the found pages. You also need to insert a title to put on the resulting page and furnish links for the home page and search page. Because Matt's script does not recurse into subdirectories, you also need to specify all the subdirectories you want searched. This can be tedious to maintain as your site changes, so you may want to modify the file-finding script from the previous example and combine it with calls to Matt's engine to perform the search.
After you finish configuring, you need to create a page that incorporates something similar to Listing 31.11.
Listing 31.11 mattform.txt: A Simple Form Allowing the Selection of Search Parameters
    <FORM method="POST" action="http://worldwidemart.com/scripts/cgi-bin/demos/search.cgi">
    <CENTER><TABLE border>
    <TR>
    <TH>Text to Search For: </TH>
    <TH><INPUT type=text name=terms size=40><BR></TH>
    </TR><TR>
    <TH>Boolean: <SELECT name=boolean>
    <OPTION>AND
    <OPTION>OR
    </SELECT></TH>
    <TH>Case <SELECT name=case>
    <OPTION>Insensitive
    <OPTION>Sensitive
    </SELECT><BR></TH>
    </TR><TR>
    <TH colspan=2><INPUT type=submit value="Search!"> <INPUT type=reset><BR></TH>
    </TR></TABLE></CENTER></FORM>
    <HR size=7 width="75%"><P>
This form produces a Web page similar to that shown in Figure 31.3. You may wish to design your own search interface.
If so, your form needs to present the following three parameters to the search script:
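Judging from the field names in Listing 31.11, these parameters are terms, boolean (AND or OR), and case (Insensitive or Sensitive). As a rough sketch, and assuming the script accepts exactly the values the form offers, the Python below builds the URL-encoded request body that a custom interface would need to submit. The helper name `build_search_request` is hypothetical, and the sketch does not contact any server.

```python
from urllib.parse import urlencode

# Values taken from the <OPTION> entries in Listing 31.11.
ALLOWED_BOOLEAN = ("AND", "OR")
ALLOWED_CASE = ("Insensitive", "Sensitive")

def build_search_request(terms, boolean="AND", case="Insensitive"):
    """Return a URL-encoded POST body carrying the three form fields.

    Hypothetical helper: rejects values the form in Listing 31.11 cannot produce.
    """
    if boolean not in ALLOWED_BOOLEAN:
        raise ValueError("boolean must be one of %r" % (ALLOWED_BOOLEAN,))
    if case not in ALLOWED_CASE:
        raise ValueError("case must be one of %r" % (ALLOWED_CASE,))
    # urlencode escapes spaces and punctuation the way a browser form would.
    return urlencode({"terms": terms, "boolean": boolean, "case": case})
```

Any custom form that sends these same three field names should be interchangeable with the one in the listing.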
The result of a search using Matt's Simple Search Engine interface will look similar to that shown in Figure 31.4.
Notice that each found page is represented by a link to that page. The search terms are also provided, along with the Boolean and case sensitivity settings. Matt's script works fine and is fairly fast. It took 3 CPU seconds and about 10 elapsed seconds to search about 250 files on my site.
Some desirable features are lacking, however. For example, only the titles of found files are displayed. No context indicates whether the search term is merely mentioned in the file or whether significant information about the term is contained in it. When presented with a list of dozens of files as the result of a search, with no way to distinguish between them, users may become weary of trying to find the information and visit a different site. File titles are presented in no particular order, which is not very helpful in determining their relevance. The results also do not indicate how many times a search term was found in a particular file or, in the case of multiple-word search terms, whether the words were found in close proximity. The user has no control over partial matches, such as finding "state" within "estate" and "intestate". Whatever the user types becomes the search string.
In addition, various implementation problems exist with this simple search engine. Because it does not support recursion, control over which directories are searched rests entirely in the hands of the Webmaster, who must remember to add new directories to the variable in the script file. Files or directories also are not easily excluded from a search. In addition, no limit is placed on the number of files that can be returned, nor are stop words ignored. Given the way that directories must be explicitly specified, this may not seem to be a big drawback, but what if you have painstakingly added all directories on your site to the script and someone searches for the word "the"? A better way is definitely needed to control the directories that are searched.
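Three of these objections (no recursion, no easy exclusions, and no handling of stop words or partial matches) are simple to illustrate. The sketch below is Python rather than Perl and is not taken from any of the scripts discussed here; the names `find_search_dirs` and `whole_word_pattern`, and the short stop-word list, are assumptions made for the example.

```python
import os
import re

# Illustrative stop-word list only; a real site would tune its own.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in"}

def find_search_dirs(root, exclude=()):
    """Return every directory under root as a path relative to root.

    Directories whose names appear in exclude are skipped along with
    everything beneath them.
    """
    found = []
    for current, subdirs, _files in os.walk(root):
        # Pruning subdirs in place stops os.walk from descending into them.
        subdirs[:] = sorted(d for d in subdirs if d not in exclude)
        found.append(os.path.relpath(current, root))
    return found

def whole_word_pattern(query, stop_words=STOP_WORDS):
    """Compile a regex that matches any non-stop word of query as a whole word.

    Returns None when the query consists only of stop words.
    """
    words = [w for w in query.split() if w.lower() not in stop_words]
    if not words:
        return None
    alternatives = "|".join(re.escape(w) for w in words)
    # \b anchors keep "state" from matching inside "estate" or "intestate".
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
```

A directory list generated this way could be regenerated whenever the site changes instead of being maintained by hand, and a query that reduces to `None` can be rejected before any files are read.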
Fortunately, Htgrep satisfies many of these objections.
Use of this site is subject to certain Terms & Conditions, Copyright © 1996-2000 EarthWeb Inc. All rights reserved. Reproduction whole or in part in any form or medium without express written permission of EarthWeb is prohibited.