Platinum Edition Using HTML 4, XML, and Java 1.2
(Publisher: Macmillan Computer Publishing)
Author(s): Eric Ladd
ISBN: 078971759x
Publication Date: 11/01/98
Where Bad Data Comes From
These situations can arise in several ways, some innocent, some not. For instance, your script can receive data that it doesn't expect because somebody else wrote a form (that requests input completely different from yours) and accidentally pointed the form's ACTION attribute to your CGI script. Perhaps they used your form as a template and forgot to edit the ACTION attribute's URL before testing it. This would result in your script getting data that it has no idea what to do with, possibly causing unexpected (and dangerous) behavior.
Or the user might have accidentally (or intentionally) edited the URL to your CGI script. When a browser submits form data to a CGI program, it simply appends the data entered into the form onto the CGI's URL (for GET methods). The user can easily modify the data being sent to your script by typing in the browser's Address bar.
Finally, an ambitious hacker might write a program that connects to your server over the Web and pretends to be a Web browser. This program, though, can do things that no true Web browser would do, such as send a hundred megabytes of data to your CGI script. What would a CGI script do if it didn't limit the amount of data it read from a POST method because it assumed that the data came from a small form? It would probably crash, and it might crash in a way that gives access to the person who crashed it.
Fighting Bad Form Data
You can fight the unexpected input that can be submitted to your CGI scripts in several ways. You should use any or all of them when writing CGI scripts.
- First, your CGI script should set reasonable limits on how much data it will accept, both for the entire submission and for each name/value pair in the submission. If your CGI script reads data sent by the POST method, for instance, check the value of the CONTENT_LENGTH environment variable to make sure that it's something that you can reasonably expect. Although most Web servers set an arbitrary limit on the amount of data that will be passed to your script via POST, you may want to limit this size further. For instance, if the only input your CGI script is designed to accept is a person's first name, it might be a good idea to return an error if CONTENT_LENGTH is more than, say, 100 bytes. No reasonable first name will be that long, and by imposing the limit, you've protected your script from blindly reading anything that gets sent to it.
NOTE: In most cases, you don't have to worry about limiting the data submitted through the GET method. GET is usually self-limiting and won't deliver more than approximately one kilobyte of data to your script. The server automatically limits the size of the data placed into the QUERY_STRING environment variable, which is how GET sends information to a CGI program.
Of course, hackers can easily circumvent this built-in limit by changing the METHOD attribute of your form from GET to POST. At the very least, your program should check that data is submitted using the method you expect; at most, it should handle both methods correctly and safely.
- See Standard CGI Environment Variables, p. 708.
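The method and size checks described above might be sketched in C as follows. This is only an illustration under stated assumptions: the function name acceptable_post_length and its max_bytes parameter are hypothetical, and the script is assumed to accept POST only. The caller passes in the values of the REQUEST_METHOD and CONTENT_LENGTH environment variables.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns the number of bytes the script may read
 * from standard input, or -1 if the request should be rejected.
 * method and content_length are the values of REQUEST_METHOD and
 * CONTENT_LENGTH; either may be NULL if the server did not set it.
 * max_bytes is the largest body this script is willing to accept.
 */
long acceptable_post_length(const char *method, const char *content_length,
                            long max_bytes)
{
    char *end;
    long length;

    assert(max_bytes >= 0);   /* a negative limit is a programming error */

    /* Accept only the method the form is supposed to use. */
    if (method == NULL || strcmp(method, "POST") != 0)
        return -1;

    /* The length must be present and must start with a digit. */
    if (content_length == NULL || !isdigit((unsigned char)*content_length))
        return -1;

    errno = 0;
    length = strtol(content_length, &end, 10);

    /* Reject trailing junk, overflow, and bodies over the limit. */
    if (*end != '\0' || errno == ERANGE || length > max_bytes)
        return -1;

    return length;
}
```

A script expecting only a first name could call acceptable_post_length(getenv("REQUEST_METHOD"), getenv("CONTENT_LENGTH"), 100) and print an error page whenever the result is -1, reading from standard input only when the result is zero or more.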
- Next, make sure that your script knows what to do if it receives data that it doesn't recognize. If, for example, a form asks that a user select one of two radio buttons, the script shouldn't assume that just because one isn't clicked, the other is. The following Perl code makes this mistake.
if ($form_Data{"radio_choice"} eq "button_one")
{
    # Button One has been clicked
}
else
{
    # Button Two has been clicked
}
- Your CGI script should anticipate unexpected or impossible situations and handle them accordingly. The preceding example is pretty innocuous, but the same assumption elsewhere can easily be dangerous. An error should be printed instead, as follows:
if ($form_Data{"radio_choice"} eq "button_one")
{
    # Button One selected
}
elsif ($form_Data{"radio_choice"} eq "button_two")
{
    # Button Two selected
}
else
{
    # Error
}
Of course, an error may not be what you want your script to generate in these circumstances. Overly picky scripts that validate every field and produce error messages on even the slightest unexpected data can turn users off.
The balance between safety and convenience for the user is important. Don't be afraid to consult with your users to find out what works best for them.
- Another possibility is to have your CGI script recognize unexpected data, throw it away, and automatically select a default. The following C code, for instance, checks text input against several possible choices and sets a default if it doesn't find a match. You can use this to generate output that might better explain to the user what you expect.
/* Notes for non-C programmers: */
/* Contrary to what its name implies, */
/* the C string comparison function strcmp used below */
/* returns 0 (false) when its two arguments match and */
/* returns nonzero (true) when its two arguments differ. */
/* A better name might be strDIFF. */
/* In C, && is logical AND. */
/* */
/* If the help_Topic is not any of the three choices given... */
if ((strcmp(help_Topic, "how_to_order.txt")) &&
    (strcmp(help_Topic, "delivery_options.txt")) &&
    (strcmp(help_Topic, "complaints.txt")))
{
    /* then set help_Topic to the default value here */
    /* (help_Topic must be large enough to hold it) */
    strcpy(help_Topic, "help_on_help.txt");
}
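When the list of acceptable values grows, the same check can be driven by a table instead of a chain of strcmp calls. The following is a minimal sketch of that idea, not the book's own code: the function name apply_default_topic, the allowed_topics table, and the size parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEFAULT_TOPIC "help_on_help.txt"

/* The only values the script is willing to act on. */
static const char *const allowed_topics[] = {
    "how_to_order.txt",
    "delivery_options.txt",
    "complaints.txt"
};

/*
 * Hypothetical helper: leaves help_Topic alone if it names an allowed
 * topic; otherwise overwrites it with the default. size is the capacity
 * of the help_Topic buffer, which must hold a NUL-terminated string.
 */
void apply_default_topic(char *help_Topic, size_t size)
{
    size_t i;
    size_t count = sizeof allowed_topics / sizeof allowed_topics[0];

    assert(help_Topic != NULL);
    assert(size > strlen(DEFAULT_TOPIC));   /* room for default plus NUL */

    for (i = 0; i < count; i++) {
        if (strcmp(help_Topic, allowed_topics[i]) == 0)
            return;                          /* recognized: keep it */
    }

    /* Anything unrecognized is thrown away and replaced. */
    strcpy(help_Topic, DEFAULT_TOPIC);
}
```

Because every value that survives the call is one of the strings in the table or the default, later code that uses help_Topic, for example to open a file, never sees text that the user made up.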
- However, your script might try to do users a favor and correct any mistakes rather than send an error or select a default. If a form asks users to enter the secret word, your script can automatically strip off any whitespace characters from the input before doing the comparison, as in the following Perl fragment:
# Remove whitespace by replacing it with an empty string
$user_Input =~ s/\s//g;
if ($user_Input eq $secret_Word)
{
# Match!
}
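For scripts written in C, roughly the same cleanup can be sketched as follows. The function names strip_whitespace and matches_secret are hypothetical, and the C library's isspace is assumed to be close enough to Perl's \s for this purpose. Like the Perl fragment, the sketch removes whitespace inside the input as well as at its ends.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Removes every whitespace character from s, in place. */
void strip_whitespace(char *s)
{
    char *src;
    char *dst;

    assert(s != NULL);

    for (src = dst = s; *src != '\0'; src++) {
        if (!isspace((unsigned char)*src))
            *dst++ = *src;        /* keep non-whitespace characters */
    }
    *dst = '\0';
}

/*
 * Cleans up user_Input (modifying it) and returns 1 if it then equals
 * secret_Word, 0 otherwise.
 */
int matches_secret(char *user_Input, const char *secret_Word)
{
    strip_whitespace(user_Input);
    return strcmp(user_Input, secret_Word) == 0;
}
```

The string can only get shorter, so the cleanup needs no extra buffer and cannot overrun the one it was given.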