Platinum Edition Using HTML 4, XML, and Java 1.2
Implementing the Htgrep Search Engine
Htgrep, written by Oscar Nierstrasz, can be obtained at http://iamwww.unibe.ch/~scg/Src/Doc/htgrep.html or from the Software Composition Group Software Archives at http://iamwww.unibe.ch/~scg/Src/. The major differences between Htgrep and Matt's script are that Htgrep automatically recurses into subdirectories and that it supports Boolean AND searches as well as case-sensitive searches. After you have installed the Perl script htgrep.pl and its associated scripts find.pl, html.pl, and bib.pl, you configure the base directory to search by changing a variable at the beginning of htgrep.pl. The other variables you configure there include the path to users' public HTML directories and any pseudo URLs (URLs that have been aliased) that you want included in the search.
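Together, the base directory, the users' public HTML directory path, and the pseudo URLs determine how a requested URL path becomes a file on disk. The following Perl sketch shows one plausible way such settings could be applied. The variable names ($base_dir, $user_dir, %pseudo), the example directories, and the url_to_path subroutine are all hypothetical; they are not htgrep.pl's actual configuration variables.

```perl
use strict;
use warnings;

# Hypothetical: filesystem directory that corresponds to the server root "/".
my $base_dir = '/usr/local/www/htdocs';

# Hypothetical: per-user public HTML directory; "%s" is replaced by the user name.
my $user_dir = '/home/%s/public_html';

# Hypothetical: pseudo URLs (aliased URL prefixes) and their real locations.
my %pseudo = (
    '/manuals/' => '/usr/local/share/manuals/',
);

# Map a URL path to a filesystem path, or return undef if it is unsafe.
sub url_to_path {
    my ($url) = @_;

    # Refuse any ".." path segment so a request cannot climb out of the tree.
    return undef if $url =~ m{(?:^|/)\.\.(?:/|$)};

    # 1. Aliased prefixes take precedence; try the longest prefix first.
    for my $prefix (sort { length($b) <=> length($a) } keys %pseudo) {
        if (index($url, $prefix) == 0) {
            return $pseudo{$prefix} . substr($url, length $prefix);
        }
    }

    # 2. /~user/rest maps into that user's public HTML directory.
    if ($url =~ m{^/~([\w.-]+)(/.*)?$}) {
        my $rest = defined $2 ? $2 : '';
        return sprintf($user_dir, $1) . $rest;
    }

    # 3. Everything else is relative to the base directory.
    return $base_dir . $url;
}

print url_to_path('/~scg/Src/Doc/htgrep.html'), "\n";
# prints /home/scg/public_html/Src/Doc/htgrep.html
```

Checking the pseudo URLs before the ~user rule lets an alias override the default mapping, which is usually what an administrator who bothers to define an alias intends.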
You will also need to modify the CGI wrapper to point to the location of your Perl library files. The wrapper assumes that find.pl, which was used in an earlier example, is located in that library. If you don't already have find.pl, you can find it in the Htgrep distribution.
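find.pl is the traditional Perl library for walking a directory tree, which is presumably how Htgrep recurses into subdirectories. In Perl 5 the same traversal is provided by the core File::Find module, which superseded find.pl. The sketch below collects every HTML file beneath a directory; the html_files_under subroutine and the .html/.htm filter are assumptions made for illustration, not Htgrep's code.

```perl
use strict;
use warnings;
use File::Find;

# Return every *.html or *.htm file at or below $dir, sorted by path.
sub html_files_under {
    my ($dir) = @_;
    my @found;
    find(
        sub {
            # File::Find chdirs into each directory, so $_ is the basename
            # and $File::Find::name is the path including $dir.
            push @found, $File::Find::name if -f $_ && /\.html?$/i;
        },
        $dir,
    );
    return sort @found;
}
```

A call such as `my @pages = html_files_under('/usr/local/www/htdocs');` (the path is hypothetical) returns the full list, which a search script can then open and scan file by file.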
After you have configured the CGI wrapper, you need to build a form in which your users can specify search parameters. The form provided with the distribution appears in Listing 31.12.
Listing 31.12 Htform.txt: Sample Form for Use with Htgrep
<H2>Generic Form</H2>
<FORM ACTION="/~scg/cgi-bin/htgrep.cgi">
<P>
<INPUT NAME="file" SIZE="30" VALUE="/~scg/Src/Doc/htgrep.html">
<!-- VALUE="/~scg/Src/Doc/htgrep.html" -->
<B>File to search</B> (relative to WWW home)
<BR>
<INPUT NAME="isindex" SIZE="30"> <B>Query</B>
<INPUT TYPE="submit" VALUE="Submit">
<INPUT TYPE="reset" VALUE="Reset">
<DL>
<DT><B>Query style:</B>
<DD>
<INPUT type="checkbox" name="case" value="yes"> Case Sensitive
<DD>
<INPUT type="radio" name="boolean" value="auto" checked> Automatic Keyword/Regex
<INPUT type="radio" name="boolean" value="yes"> Multiple Keywords
<INPUT type="radio" name="boolean" value="no"> Regular Expression
<DT><B>HTML Files:</B>
<DD>
<INPUT type="radio" name="style" value="none" checked> Ordinary Paragraphs
<INPUT type="radio" name="style" value="ol"> Numbered list
<INPUT type="radio" name="style" value="ul"> Bullet list
<INPUT type="radio" name="style" value="dl"> Description list
<DT><B>Plain Text:</B>
<INPUT type="radio" name="style" value="pre"> (preformatted)
<DD>
<INPUT type="checkbox" name="grab" value="yes"> Make URLs live (works with plain text only)
<DT><B>Refer Bibliography files:</B>
<INPUT type="checkbox" name="refer" value="yes">
<DD>
<INPUT type="checkbox" name="abstract" value="yes"> Show Abstract
<INPUT type="checkbox" name="ftpstyle" value="dir"> Link to directories, not files (for refer files)
<DD>
<INPUT type="radio" name="style" value="ul"> Bullet list (instead of numbered)
<DT><B>Max records to return:</B>
<INPUT NAME="max" VALUE="250" SIZE="10">
</DL>
</FORM>
This code produces a form similar to the one shown in Figure 31.5.
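Because the FORM tag in Listing 31.12 specifies no METHOD, the browser submits it with GET, and the CGI program receives the field values URL-encoded in the QUERY_STRING environment variable. The Perl sketch below decodes such a string into a hash. parse_query is a hypothetical helper, and htgrep.cgi's own parsing may differ.

```perl
use strict;
use warnings;

# Decode an application/x-www-form-urlencoded string into name => value pairs.
sub parse_query {
    my ($query) = @_;
    my %tags;
    for my $pair (split /[&;]/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {                      # aliases, so edits stick
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex $1)/ge;    # %XX hex escapes
        }
        $tags{$name} = $value;    # a later duplicate overwrites an earlier one
    }
    return %tags;
}

my %tags = parse_query($ENV{QUERY_STRING} // '');
```

Unchecked checkboxes such as case are not submitted at all, so a test like `($tags{case} // '') eq 'yes'` treats a missing field as "off".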
A welcome feature of Htgrep is its support for regular expressions. Although most users are probably not well versed in regular expressions, most can at least understand using an asterisk to fill out portions of words. In addition, unless you use regular expressions, Htgrep matches whole words only, which is a nice feature. Using the default search form, you can also choose the format of the results page: either full paragraphs or one of several types of lists.
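The difference between the keyword and regular-expression query styles can be sketched in Perl. In this hypothetical illustration (the subroutine names are not Htgrep's), keyword mode quotes each word with \Q...\E and surrounds it with \b word boundaries, so every keyword must appear as a whole word (a Boolean AND), while regular-expression mode hands the query to the regex engine unchanged. Note that in a Perl regular expression an asterisk repeats the preceding item, so "words beginning with sea" is written sea\w* rather than sea*.

```perl
use strict;
use warnings;

# True (1) if every whitespace-separated keyword occurs in $record as a whole word.
sub matches_keywords {
    my ($record, $query, $case_sensitive) = @_;
    for my $word (split ' ', $query) {
        my $re = $case_sensitive ? qr/\b\Q$word\E\b/ : qr/\b\Q$word\E\b/i;
        return 0 unless $record =~ $re;
    }
    return 1;
}

# True (1) if $record matches $query interpreted as a Perl regular expression.
sub matches_regex {
    my ($record, $query, $case_sensitive) = @_;
    my $re = $case_sensitive ? qr/$query/ : qr/$query/i;
    return $record =~ $re ? 1 : 0;
}

my $rec = 'Htgrep searches each record for the query terms.';
print matches_keywords($rec, 'record query', 0), "\n";   # prints 1
print matches_keywords($rec, 'rec query', 0), "\n";      # prints 0
```

The second call fails because "rec" appears only inside "record", which is exactly the whole-word behavior the text describes; the same query succeeds in regex mode as rec.*query.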
To enable the return of entire paragraphs from a search, Htgrep takes a different approach to finding text in files. Rather than assembling one huge string from all the lines in a file, Htgrep lets you specify a record delimiter and then searches each record in the file. You may decide, for example, that you want HTML paragraph tags (<P>) to be your record delimiter. It is this record orientation that allows Htgrep to return the context of a search hit: Htgrep returns the entire record in which it found the search term, so the user sees the whole paragraph and can better judge whether the page meets his or her needs. Htgrep does this through Perl's input record separator variable, $/, as the following code fragment shows:
# the default record separator is a blank line
#$separator = "";
$separator = "<P>";
[. . .]
# normally records are separated by blank lines
# if linemode is set, there is one record per line
if ($tags{linemode} =~ /yes/i) {
    $/ = "\n";
} else {
    $/ = $separator;
}
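To see the record-oriented approach in isolation, the following Perl sketch reads a string through an in-memory filehandle with $/ set to a chosen separator and returns only the records that match a pattern. The subroutine name matching_records and the sample text are illustrative assumptions, not Htgrep code.

```perl
use strict;
use warnings;

# Return the records of $text (split on $separator) that match $pattern.
# An empty $separator selects paragraph mode: records end at blank lines.
sub matching_records {
    my ($text, $separator, $pattern) = @_;
    my @hits;
    open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
    local $/ = $separator;     # restored automatically when the sub returns
    while (my $record = <$fh>) {
        chomp $record;         # chomp removes the current $/ from the end
        push @hits, $record if $record =~ $pattern;
    }
    close $fh;
    return @hits;
}

my $page = "<P>Htgrep searches records.<P>Each hit shows a whole paragraph.<P>Unrelated text.";
print "$_\n" for matching_records($page, '<P>', qr/hit|records/);
# prints "Htgrep searches records." and "Each hit shows a whole paragraph."
```

Because $/ is a literal string rather than a pattern, a "<P>" separator does not split on a lowercase <p> or on a tag such as <P ALIGN=CENTER>. An empty string selects paragraph mode, in which chomp strips all trailing newlines from each record.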
Copyright © 1996-2000 EarthWeb Inc. All rights reserved. Reproduction whole or in part in any form or medium without express written permission of EarthWeb is prohibited.