Platinum Edition Using HTML 4, XML, and Java 1.2
Any time you add a piece of software to your site, you need to be concerned with its impact on site security. Can the software be overwhelmed by an attack and provide direct access to the site? Does it offer a way for users to execute programs on your server? Before releasing a search engine for production use, you may want to experiment with it, to try to overwhelm it or get it to produce unpredictable results.
The potential for users to use your search engine to execute arbitrary code on your Web server is obviously a serious security concern. If the search engine uses the Perl eval command to perform the search, you need to screen search terms and remove potentially harmful characters and code before passing them to the search engine. On UNIX systems, this means preventing the user from entering a search term containing the escape symbol (!) or any commands that could be used to invoke a command interpreter (!sh, for example).

Even if your search engine doesn't open a security hole, you still need to be sure that users can't see information on your site that they are normally prevented from seeing. It is common on sites using the NCSA Web server, for example, to use access control files (typically .htaccess) to control access to sensitive directories. If the search engine ignores these access control files, it can return links to, or summaries of, the files contained in protected directories. At best, your users will be frustrated at seeing links that they are prevented from following. At worst, file summaries can compromise the confidentiality of protected information.

And finally, a security concern that is really a resource concern: You may want to limit the amount of resources any one user of your search engine can consume, or the number of simultaneous searches that can occur. A malicious user can bring your server to its knees by launching a large number of time-consuming searches. Most search engines do not provide a method of controlling access in this way, so you may need to use other system-management tools to regulate search engine use.

Making the Decision

Which search engine you select depends in part on whether you prefer the timely, but resource-hungry, grepping approach or the faster, CPU-friendly, indexing approach. Regardless of the approach you pick, you should evaluate several requirements before selecting your engine:
These are just some of the questions you should ask yourself as you plan to add a search capability to your site. The discussion that follows examines how well various approaches satisfy these requirements.

Implementing a Grepping Search Engine

Grepping search engines share a common methodology: Start at an arbitrary point in the directory tree, open each HTML file in the tree, and search the file for the search term. Optionally, the engine might recursively follow each subdirectory it encounters and repeat the search process. This approach supports only unsophisticated searches, although it is possible to enable support for searches using regular expressions.

Building Your Own Grepping Search Engine

To help you better understand how grepping search engines work, this section shows how you can use the Perl language to build your own. In building your own grepping search engine, you will need to tackle two problems: finding files to search, and searching those files for search terms.

First, consider the problem of finding files to search. Using a couple of key Perl capabilities, it is easy to build a recursive routine that identifies the types of files contained within a directory tree, performs an operation on them, and continues the process with the underlying directories. The Perl script in Listing 31.7 demonstrates this approach.
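To make the grepping methodology concrete, here is a minimal sketch of both halves of the problem (finding files, then searching them), written in Python rather than the Perl of the book's listing. It is not Listing 31.7. The names find_html_files, grep_search, and HTML_EXTENSIONS are hypothetical, and treating only .html and .htm files as searchable is an assumption. By default the search term is escaped and matched as a literal string, so user input is never evaluated as code or as a pattern. Regular-expression matching is available only when requested explicitly.

```python
import os
import re
from typing import Iterator, List, Tuple

# Assumption: only files with these extensions are treated as searchable pages.
HTML_EXTENSIONS = (".html", ".htm")


def find_html_files(root: str) -> Iterator[str]:
    """Recursively yield the paths of HTML files found under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a predictable order
        for name in sorted(filenames):
            if name.lower().endswith(HTML_EXTENSIONS):
                yield os.path.join(dirpath, name)


def grep_search(root: str, term: str,
                use_regex: bool = False) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, line) for every line that matches term."""
    # re.escape turns the term into a literal pattern; nothing the user
    # types is executed or interpreted unless use_regex is set.
    pattern = re.compile(term if use_regex else re.escape(term), re.IGNORECASE)
    hits = []
    for path in find_html_files(root):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if pattern.search(line):
                    hits.append((path, number, line.strip()))
    return hits
```

A call such as grep_search("/path/to/docroot", "java") rereads every file on every query. That is why the grepping approach always reflects the current state of the site but consumes more resources than an index would. Note that this sketch does not honor access control files such as .htaccess, so it should be pointed only at directories whose contents are public.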
Copyright © 1996-2000 EarthWeb Inc. All rights reserved. Reproduction whole or in part in any form or medium without express written permission of EarthWeb is prohibited.