Chapter 5

Designing Your CGI Application


Whenever you encounter anything in life, your brain subconsciously does a little bit of planning. Maybe it's what you'll do now that you've gotten that new raise, or how you're going to dodge traffic to get to the new movie premiere. Whatever it is, you make a plan of attack, and then you decide how you want to follow up on that plan. You might do it, you might wait, or you might ditch it and start concentrating on something else or nothing at all. Aren't choices great?

You have plenty of choices when you sit down to write a CGI program. For instance, you can hack away in your programming language of choice until something pops out that's somewhat like what you thought you wanted when you sat down. You could also sit for some indeterminate amount of time while you flow-chart, plan, re-flowchart, analyze, call a few committee meetings together, and try to agree on what the heck the thing's going to be, much less how it's going to work. There's also that wonderfully large area in between.

CGI applications can be anything from a couple of lines of code to a behemoth that does more things than a Swiss army knife with an attitude. The key factor in any CGI development effort is how you're most comfortable with getting from point A to point B. If it's five lines of Perl, you may just want to do it between commercials or while you're waiting for some big file to download. The more complex an application is going to be, the more design time you'll probably want to spend on it.

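This chapter is all about different components of CGI design. It's impossible to be psychic and know what you're going to want your application to do, so instead we'll cover the basics and more in a general overview: sizing up the job, sketching out the program, turning the sketch into pseudocode, choosing libraries and languages, sharing the server gracefully, and planning for portability and reuse.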

There's nothing that says the design process has to take any longer than you need it to. None of the things in this chapter takes much time to apply to any CGI design process, whether it's just a couple of lines of code or a huge project that threatens to overwhelm you. By the end, no matter what kind of application you're making, you'll be in a good position to go out and make it happen quickly and easily. (And hopefully not spend too much time visiting Chapter 6, "Testing and Debugging.")

Sizing It Up

CGI comes in three difficulty levels: "Oh, that's easy," "I think I saw a script that does that," and "Ummm...." The fun part, though, is that two different people can easily assign very different difficulty levels to the same project, based on their experiences. So the first question to look at becomes: What's your first impression of how hard your application will be for you to create? If it's the first CGI script you've ever tried to make, or the first one using a particular function, chances are it's not going to be something you view as a casual thing to do while brushing your teeth.

As you write more and more, you familiarize yourself with basic concepts and tricks, and you get to the point where what was once frightening or frustrating becomes run of the mill. No matter how familiar you are with a language or even the CGI functions, though, you've got to start at the beginning to develop a complete idea of what your application will do.

What Does the Application Have to Do?

This is an easy question to answer, normally. Chances are it has to perform a specific function or set of functions, for a specific reason. People who are learning or just hacking about might not want to address a function so much as explore a concept, so the purpose may be more ambiguous. It's important, though, to make sure you have the purpose of the application set firmly in your head when you begin, so that you can keep a focus on just what it is you're going to make use of to meet your goal.

Just as there are roughly three levels of difficulty for a CGI program, there are roughly two schools of thought for CGI programs: Want User Input and Don't Want User Input. This doesn't rule out either class getting data from some other source like a file, a camera, or some other process, but it narrows down what kinds of things the application is going to be playing with. If you want user input, there's some mechanism on the user's side that allows him or her to dynamically control what information you receive, like a form. If you don't, there's probably a fixed link like "View the LochNess MonsterCam" or "Get a Random Link."

This is just the first of many questions you want to cover to get a better feel for your application. Don't worry about getting things down on paper and analyzing them; this isn't rocket science yet. All you want to do is clarify your position on what the program will do and what it won't do by thinking over a few questions of this kind.

Note
In the formal design process, this is what's more stoically referred to as the "Needs Analysis."

Now you're ready to move up in the world and start thinking about the program and everything you're going to want it to do.

Preliminary Sketches

Mental picture firmly in hand, it's time to start making some sketches, to put a framework around the application itself. One of the easiest ways to do this is by sitting down and writing the flow of your program in words, putting down the steps as you see them in nontechie, crystal clear language. It's almost a sketch of your program, using the logical flow of what you feel it needs to do as an outline for code that will come later. You start out with the purpose of the program, then work from there. Let's take an example:

Purpose: Collect Product Survey Data from Customers

Steps
  1. Give customers some way of entering data.
  2. Get the data that's been entered.
  3. Store the data the customers supply.
  4. Thank the customer for stopping by.

You don't need any programming experience to make a sketch like that. This isn't the point where you concern yourself with what can and can't be done; this is where you idealize what will happen in very general terms. You'll be able to refine your program sketch as you go along, but already you have something that many developers don't have: a written outline of the program. If you're ever in a position where you need to lay out your site's functions to someone who's not familiar (or doesn't want to be familiar) with the technical side of things, a brief description like that is all the person needs to see in order to know what's going on. If that person wants to know the intimate details and inner workings, you can always give him or her those from one of the later sketches, but short and sweet is normally the best.

Now you have a real conceptual outline, but it doesn't do much for the details. The next step is to add those details. Things like the following: What will they use to enter the data? What data should they be entering? Are there certain bits of information we need them to enter? Where do we want to store the information? What kind of format should the information be stored in? As you can see, it's just the next logical step of questions. You've left the "Why" of the application behind, and now you're at the "What." It just takes a little bit more information to extend the program sketch out into more detail.

Purpose: Collect Survey Data from Customers

Steps
  1. Customers fill out a survey form on our Web site.
     The survey form asks for name, e-mail address, product version, and comments.
  2. Information is read from the form.
  3. Information they entered is examined.
     Did they give us an e-mail address? A name?
  4. If they gave us the information we needed, store it.
     Should be added to a text file.
  5. If they didn't give the information we needed, ask them to do it again.
     Should be an HTML page.
  6. Once we have the information, thank the customer for stopping by.
     Should be an HTML page.

Now you know a little more about just what elements need to be in your design. You're going to need a form, and you're going to need that form to accept data in four specific fields: Name, E-Mail Address, Product Version, and Comments. Your program needs a convenient way to read the form data and then to be able to process it to make sure you gather the information you need. There's even the beginning of some error-catching code. If they provided what you were looking for, then you store it to a text file; if not, then you send them back a response asking for them to provide the missing information. At the end, assuming everything goes okay, you thank them for their time.

So far, there's nothing to it. That's the point: The design phase of a CGI application isn't brain surgery or rocket science. There's nothing difficult about expressing in plain language what you want a program to do, and it gives you the opportunity to review what you're going to do before you ever spend any time really coding it. That way, if you want to add something or change something, you haven't spent a whole lot of time getting into details that might not really matter later on. The goal is speed and clarity, and a simple outline meets that goal easily.

Once you have this outline and it looks like it will meet your needs, you're ready for the next step: figuring out how you're going to accomplish each task.

Scoping It Out

While a CGI application is a single entity, it's composed of parts that each perform very distinct functions. The key to a successful design is to recognize each of those pieces and see how they fit together to accomplish your goal. To do this, you just have to look at the steps of your outlined application and relate them to a CGI function. One of the best ways of doing this is to turn your outline into pseudocode.

Pseudocode

Pseudocode is just plain language with a little bit of techie added for good measure. It's still little more than an application sketch, but it's the first phase where you begin to draw in the elements you'll have to deal with when coding the application. What you start to add are the Hows and Wheres into your overall statement of What.

First of all, you have to wonder how data is going to find itself heading toward your program: Does your program want user input or not? Is it from a user filling out a form, like in the example we're building? Does it come from a database, like a random link program? Does it just do the same thing every time, like a fish-cam?

In this example, information is coming from a form that might look roughly like the one shown in Figure 5.1.

Figure 5.1: A survey form with name, e-mail address, product version, and comments fields.

Note
Forms are one of the most common methods of allowing users to enter data to be used by your CGI program. To understand all the things that forms can do for you, you'll want to make sure you look at the material presented in Chapter 8, "Forms and How to Handle Them."

This particular form specifies the four bits of data we're interested in. When the user clicks on the Submit Survey button, it tells the server to execute the CGI program. This is where the processes of CGI come into play.

Planning for Processing

Changing your general program sketch into something a little more on the techie side can best be approached in steps. In Chapter 3, "Crash Course in CGI," you were introduced to the processes of CGI and where data gets placed. Now you just need to define where those processes fit into your sketch. This is a little bit of a jump, in some cases, but not much.

What you want to do is break your sketch into sections, and then deal with each one of those sections individually before putting them back together into a real listing of pseudocode. What kind of sections should you be breaking it up into? Well, there are really four types of operations for a CGI program:

  1. Initialization/Termination
  2. Gathering Input
  3. Processing Input
  4. Generating Output

Out of these four possible phases, you're normally most concerned with parts 2 through 4. The initialization and termination of the CGI program involve memory and process allocation by the server, as well as some other background processes. While they're important steps, they're taken care of by the server software and the operating system, and they're out of your hands other than providing someone with a way to start your script through a hard-coded link or a form action. Sure, if you do something very strange with allocated memory or file locking, you'll want to be certain you clear that up (in case the server can't do it for you), but for the most part you're out of the loop if you're careful.

Gathering Input

There are very few CGI programs out there that don't take input of one kind or another. Whether it's from a user's form, a link, or even from an external file or device, something's normally being read in. In your program, where is data coming from? For every possible source, you need to apply acceptable methods of going in and getting that data.

The example we've been batting about, where you're obtaining the name, e-mail address, and comments from users, involves a form. To read the data in, the program has to determine where the data is. In this case, it's user data, so it's coming from environment variables and possibly standard input (STDIN), as well. You'll need to isolate where the data is by determining how it's getting to you, and then you'll be able to read it in. Because there are really two methods for a client to send data through an HTTP request, you want to determine if it's one of the two, and then act on it. If it's not either of the conditions you were expecting, bail out. Listing 5.1 shows an example of pseudocode for checking where the source of the data is and producing an error if it's neither of the expected situations.


Listing 5.1. Determining the source of user data.
read in REQUEST_METHOD environment variable
if REQUEST_METHOD is GET
    read in QUERY_STRING environment variable
otherwise, if REQUEST_METHOD is POST
    read in CONTENT_LENGTH environment variable
    read CONTENT_LENGTH bytes from Standard Input (STDIN)
otherwise
    create an error message and end the program

Environment variables, like REQUEST_METHOD, provide storage for information about the client's request. When the client requests something through the GET method, all the data is stored in the QUERY_STRING environment variable. When the client uses the POST method, the data arrives on STDIN, and a count of how many bytes were sent is made available to your program in the environment variable CONTENT_LENGTH.
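The steps in Listing 5.1 might translate into Perl roughly as follows. This is only a sketch: the subroutine name read_raw_input and its filehandle argument are made up for illustration, and a real program would probably report the error as an HTML page rather than simply dying.

```perl
use strict;
use warnings;

# Return the raw, still-encoded request data, following Listing 5.1.
# The name read_raw_input and the $in argument are illustrative only.
sub read_raw_input {
    my ($in) = @_;    # filehandle that POST data is read from
    my $method = $ENV{REQUEST_METHOD} || '';

    if ($method eq 'GET') {
        my $query = $ENV{QUERY_STRING};
        return defined $query ? $query : '';
    }
    elsif ($method eq 'POST') {
        my $length = $ENV{CONTENT_LENGTH} || 0;
        my $data   = '';
        read($in, $data, $length) if $length > 0;
        return $data;
    }

    # Neither of the expected cases: bail out.
    die "Unsupported request method: '$method'\n";
}
```

In a CGI script, you would call it as read_raw_input(\*STDIN). Passing the filehandle in, rather than hard-coding STDIN, makes the routine easy to try out from the command line with a test file.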

Processing

The Processing phase of a CGI program is where you let your design run wild. There's nothing that says you have to do your task in a specific way or what the result of it all has to be. The two things you need to pay attention to are making sure you correctly interpret information that's sent to the application and that it finishes the tasks you assign it.

Dealing with Input

By the time you get hold of incoming user data, two steps have been applied to it by the client and the server. It's up to your program to undo those steps and get the information back that it needs, but to do that you need to understand what's already been done.

Ordered Pairs

Information comes to your program in ordered pairs. That means that wherever applicable, there's a named chunk of data and a value that goes along with that name. The format looks something like this:

name1=value1&name2=value2&name3=more+values+here

In this example, there are three separate pairs of information, separated from one another by ampersands (&), with an equal sign (=) between each name and its value. As you'll see in Chapter 8, you can control what the names are for these value pairs, which will make it easier to do things with the data and identify what the values are really there for.

URL Encoding

The other step that takes place when the data is sent is the replacement of special characters with a substitute value. In the preceding example, for the name and value pairs, you'll notice that in the last pair of information there are plus signs (+) between the words more, values, and here. This is the tip of the iceberg: When sending data that has spaces in it, those spaces are changed over to plus signs (+) so that the data is one continuous string with no information that could be interpreted as a break.

Other special characters include back and forward slashes, ampersands, line feeds/carriage returns, tildes, percent signs, and a variety of others. Whenever one of these characters is encoded, you'll see a percent sign followed by two hexadecimal digits, such as %25 or %2F. Those two digits are the hexadecimal value of the character that originally went there.

For instance, because you use the percent sign as a special character, you'd need to encode it. Instead of seeing the character % in the data, you'd see its encoded equivalent, which is %25.

So, before your program tries to do anything with that data, split it into its names and values at the ampersands and equal signs, then convert all plus signs (+) to spaces and convert each %## combination back to its original character, using whatever's available in your programming language. Do the splitting first: a decoded value can itself contain an ampersand or an equal sign.
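As a rough illustration, the splitting and decoding steps could look like this in Perl. The subroutine names url_decode and parse_pairs are invented for this sketch, and it takes one shortcut: if a name appears more than once, only the last value is kept.

```perl
use strict;
use warnings;

# Undo URL encoding on one name or value: plus signs first, then %XX escapes.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split "name1=value1&name2=value2" into a hash of decoded names and values.
# A repeated name keeps only its last value in this simplified sketch.
sub parse_pairs {
    my ($raw) = @_;
    my %form;
    foreach my $pair (split /&/, $raw) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        $form{ url_decode($name) } = url_decode($value);
    }
    return %form;
}

my %form = parse_pairs('name1=value1&name2=value2&name3=more+values+here');
print "$form{name3}\n";    # prints "more values here"
```

Notice that the pairs are split apart before anything is decoded, so an encoded ampersand (%26) inside a value can't be mistaken for a separator.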

Completing Your Tasks

What's the point of having a CGI program if it doesn't do what you want it to do in the first place? While you have complete control over what's done, and how, keep these things in mind:

  1. Provide error checking at every complex step.
  2. Don't get fancy when simple will work just as well.
  3. Be prepared for the unexpected: provide time-outs and other failsafes to ensure that your program doesn't just sit there.
  4. Be concerned about security: don't leave a hole that you think no one will find. They'll find it.
  5. Make sure you've provided for all possible cases of data.
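For the survey example, "all possible cases of data" includes the case where a required field never arrives or arrives blank. Here is a minimal sketch of that check; the field names name and email and the subroutine missing_fields are assumptions made for illustration, because the real names depend on how the form is built.

```perl
use strict;
use warnings;

# Fields the survey sketch treats as mandatory (assumed names).
my @REQUIRED = ('name', 'email');

# Return the required fields that are absent or contain only whitespace.
sub missing_fields {
    my ($form) = @_;    # reference to a hash of decoded form data
    return grep {
        !defined $form->{$_} || $form->{$_} !~ /\S/
    } @REQUIRED;
}

my %form    = (name => 'Pat', email => '   ', comments => 'Nice product');
my @missing = missing_fields(\%form);
print "Please supply: @missing\n" if @missing;    # prints "Please supply: email"
```

An empty result means the data can be stored; anything else is the list of fields to ask the user for again.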

Generating Output

Is your program going to tell the user when it's done doing what it was doing? Most likely it will, unless you're playing around with server-push images and just letting it sit there forever. Because output is a very important part of the application, give it at least as much thought as you give to accepting input. If your program has error handling, consider what kinds of errors you're going to return to the user. Would "Error 4A" give the user any idea what to do next? How about "I'm sorry, I can't do that right now"? Feedback is either data that the user was expecting or information the user needs to know, such as an execution error. If you've taken the time to check for the errors in the first place, take a little more time and help create errors that make sense, or at least don't impart a feeling of hopelessness in the user.

Output the user was expecting can vary, as well. Any type of output you send back to the server and the client needs to be prefaced with some instructions telling the server what kind of data it is. For instance, if you're thanking a user for filling out the survey, you're normally sending back HTML. The way to do this is to instruct the server that you're sending back HTML, and then send it. You can do this in Perl, as shown in Listing 5.2.


Listing 5.2. Sample HTML response in Perl.
# The header and the blank line after it must come before any content.
print "Content-type: text/html\n\n";
print "<h1>Survey Received</h1>\n";
print "Thanks for submitting the survey, we appreciate it.\n";

All that's needed is a Content-type: header followed by a blank line. The header names the MIME (Multipurpose Internet Mail Extensions) type of the information, which gives the server, and ultimately the browser, some clue as to what to do with it.
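The same header-then-content pattern works for error pages. The sketch below wraps it in a small helper so that thank-you pages and friendly error messages are built the same way; html_response and escape_html are hypothetical names, not part of any library. Escaping is included because anything echoed back from the user could contain characters that mean something in HTML.

```perl
use strict;
use warnings;

# Escape the characters that have special meaning in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build a complete CGI response: header, blank line, then the HTML body.
sub html_response {
    my ($title, $message) = @_;
    my $safe_title   = escape_html($title);
    my $safe_message = escape_html($message);
    return "Content-type: text/html\n\n"
         . "<html><head><title>$safe_title</title></head>\n"
         . "<body><h1>$safe_title</h1>\n"
         . "<p>$safe_message</p></body></html>\n";
}

print html_response('Survey Not Saved',
    'Please go back and fill in your name and e-mail address.');
```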

The Fine Print

With pseudocode in hand, roll up your sleeves and sit down in front of the machine. The time of reckoning has come: It's time to let the code hit the machine. What you need to consider now are the ways of performing the tasks you've laid out for yourself and make sure everything is going to work smoothly, without too much effort on your part.

Libraries

Let's take a look at "without too much effort" for a moment. Looking at your application, are there things in it that you're not sure you know how to do, things that could be a real pain? For instance, writing your own special code to generate images on-the-fly or creating a whole URL decoding sequence just for one tiny, little three-line program that's just supposed to echo someone's name back as a cool example of what you did with CGI. Don't worry; you're not alone.

CGI libraries are very common because there are so many people doing CGI programming, and people have found easy ways of getting some of the most repetitive and complex tasks done without too much suffering. In fact, Chapter 4, "Comparison of the Various CGI Programming Libraries," is devoted entirely to the topic of libraries. They're everywhere. The point of libraries is to save you time and effort by providing you (normally at no charge) premade and pretested routines that perform certain tasks for you.

A great example of this is the classic cgi-lib.pl library for Perl, written by Steven Brenner and in current use by more people and their programs than can be counted easily. This simple library takes the drudge work out of reading in data and turns lines upon lines of code that beginning programmers may not be comfortable with into one reference to a subroutine that does everything for you. Imagine being able to find several pieces of code like this that people have made freely available that do the things you've been dreading trying to figure out in your program. Don't imagine any longer. Review Chapter 4, with your outline in hand, and see what you can find to save yourself some effort.

Languages

What programming language are you going to write this in? More importantly, what programming languages do you know? That will often make the decision for you. If you're ambitious enough to be versed in more than one programming language, you'll want to consider which language gives you the most benefit. Speed of development is great, and thus the immense popularity of scripting languages, but how important is execution speed? Natively compiled programs normally run a good bit faster than interpreted scripts, but developing them can be a lot slower and fraught with more difficulty.

Are you going to need to take this to a different operating system? If you're starting on Windows, for instance, were you thinking of taking your Visual Basic program to a UNIX server? Let's hope not, or you'll be disappointed (at least at present).

Be sure that if where you start and where you end up are different, you're prepared to use the right language. Chapter 2, "The CGI Specification," provides a number of details on the languages you can and might choose, and now might be a good time to review it if your mind isn't already made up.

Share with Your Neighbors

Faster isn't always better because it takes more effort on the server machine to do things as fast as possible. Remember, your application may be trying to compete against itself for memory space and general file access, and you can't hog it all! The following three principles suggest a couple of different tactics you can use to be friendly to your server's environment.

Slow It Down

Two things you can do to make your application more processor friendly are to slow it down and to be careful with memory. To slow it down, all that's required are occasional pauses. These don't have to be long pauses (in some cases no more than a tenth of a second or so), but if you've just done a huge process that takes up tons of memory and are about to do another, give your poor server a chance to recover. Imagine if it's running 30 copies of your application at once, or more.
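In Perl, a sub-second pause can be had with the four-argument form of select, as sketched here. The subroutine process_batch is just a stand-in for whatever heavy work your program does, and the tenth-of-a-second figure is the one suggested above, not a tuned value.

```perl
use strict;
use warnings;

# Pause for a fraction of a second. Calling the four-argument select with
# no filehandles is a long-standing Perl idiom for sub-second sleeps.
sub short_pause {
    my ($seconds) = @_;
    select(undef, undef, undef, $seconds);
    return;
}

# Stand-in for a memory-hungry or processor-hungry step.
sub process_batch {
    my ($n) = @_;
    return $n * 2;
}

my @results;
foreach my $batch (1 .. 3) {
    push @results, process_batch($batch);
    short_pause(0.1);    # give other processes a turn between batches
}
print "@results\n";    # prints "2 4 6"
```

If the Time::HiRes module is installed on your system, its sleep function accepts fractional seconds and reads a little more clearly.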

Minimize Memory

Being careful with memory is more appropriate for compiled applications because most scripting languages don't normally force you to deal with memory allocation. If you're expecting to receive no more than 2K of data, don't specify that your program has a 20K buffer "just in case." If you want only 2K of data, force your program to read in only 2K of data, and dump the rest of it. This will also protect you somewhat from bogus or accidental requests that fill the input buffer with lots of junk. Also, take out any unreferenced local variables. C compilers often give you a warning when they see them, and there's good reason: they're a waste of space.

Remember, though, that if you start to get tricky and reuse variables just to save a little bit of space, you can start making it much harder to make sense of your code. Be careful with memory, maybe even border on stingy, but keep your sanity and structure it so that the code is easy to deal with.
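Even in a scripting language, you can still enforce the "read only what you expect" rule. The following sketch caps a read at 2K no matter how much the client claims to be sending; read_capped and $MAX_BYTES are illustrative names, and the limit itself is simply the figure used in the text.

```perl
use strict;
use warnings;

my $MAX_BYTES = 2048;    # the 2K ceiling from the text; adjust to taste

# Read at most $MAX_BYTES from a filehandle, however much the client
# claims to be sending. Anything beyond the cap is simply never read.
sub read_capped {
    my ($in, $claimed_length) = @_;
    my $want = $claimed_length > $MAX_BYTES ? $MAX_BYTES : $claimed_length;
    my $data = '';
    read($in, $data, $want) if $want > 0;
    return $data;
}
```

A CGI script handling a POST request might call it as read_capped(\*STDIN, $ENV{CONTENT_LENGTH} || 0).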

Enough Files to Go Around

If your program will be reading from files or writing to them, it's a good idea to place a lock check inside the program if you want to make sure that data doesn't get overwritten. A lock check can be as simple as creating a temporary file that the program checks for before trying to open or write to a specific file. If the temporary file is there, the program waits a moment and then checks again, until the lock is gone. Be sure to delete the lock file when the program has no more need for it!
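One wrinkle with a lock file is that checking for it and then creating it are two separate steps, and another copy of your program can sneak in between them. A possible way around that in Perl, sketched here, is to ask the operating system to create the file only if it doesn't already exist. The names acquire_lock and release_lock, the retry count, and the pause length are all assumptions for this example.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);

# Try to create the lock file. O_EXCL makes creation fail if the file
# already exists, so checking and creating happen in a single step.
sub acquire_lock {
    my ($lock_path, $max_tries) = @_;
    for my $try (1 .. $max_tries) {
        if (sysopen(my $fh, $lock_path, O_CREAT | O_EXCL | O_WRONLY)) {
            close $fh;
            return 1;
        }
        select(undef, undef, undef, 0.2);    # wait a moment, then retry
    }
    return 0;    # gave up: something else still holds the lock
}

# Delete the lock file once the program has no more need for it.
sub release_lock {
    my ($lock_path) = @_;
    return unlink $lock_path;
}
```

A script would wrap its file writing in a call such as acquire_lock('survey.lock', 10), and call release_lock('survey.lock') afterward. Giving up after a fixed number of tries keeps the program from waiting forever on a lock file left behind by a crashed copy. Perl's built-in flock function is another option on systems that support it.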

Planning for the Future

One thing that is so easy to do during development, but so often overlooked, is the inclusion of comments. Are you going to remember why you did something in a particular way six months from now? Would someone else be able to review your code and understand it if they had to? There's no need to comment every line, but well-organized code with comments before major sections or tricky operations can turn a potential nightmare into a walk in the park when you have to make changes.

Placing comments is also a good way to notify yourself in areas where you think problems could develop later on, or where you want to add an additional function in the next version of the code. Sometimes these are just revision notes where you mark down what you've changed and what might still need to be changed later; sometimes they're just lines inserted wherever you feel like it. Do what comes naturally.

You Can Take It with You

If you have the luxury of writing your CGI application on the server that will eventually house it, moving your code around isn't really a big deal. Maybe you want to change a directory or two, change some permissions, or make other minor modifications, but the script has been running on the machine it needs to run on, and you're happy. Now you want to move the script to another site or sell it to someone, and they're running something different. Whoops, you're not so happy anymore.

The wonderful world of Web servers is not homogeneous. There are multiple operating systems in use by sites and different HTTP server software available for every platform. What you specify, design, and implement may not be portable to someone else's server. If you never want anyone to use your code except you, that may not be a bad thing. If, on the other hand, you're hoping to sell a special program that you wrote to the widest possible audience, you need to consider the differences in what's out there and account for them in your design. What are these all-important differences? Well, they can be broken down into two main categories: server software and operating systems.

Server Software

One of the easiest moves (normally) is between types of server software while staying on the same operating system. So maybe you have a Perl script on a Windows NT machine running Process Software's Purveyor, and you want to move it to another department's NT machine running Netscape's Commerce Server. For the most part, there's not much to be concerned about...right? Well, a lot depends on your code.

For instance, let's look at directory structures. If you've hard-coded in paths to your files, do those paths exist on the new server? Different software and site maintainers mean that there's not some fixed location you can count on for data or storage. Your program needs to be able to adapt to these situations: If you coded it in C, would you want to have to recompile every time a directory structure change was made? Probably not. If you're depending on information from the server, whatever the form, your program needs to be flexible enough to take changes into account.

One way of being flexible is with a configuration file. Most programs can easily find a file that's in the directory they're running from or in some directory that must exist to have any chance for the program to run. In the configuration file, you can set up directory paths, variables, and other important information that will then be read dynamically by your program. This allows people (whether it's you or someone else) to modify those values without modifying the program itself.
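A configuration file doesn't need to be fancy. The sketch below reads plain "key = value" lines into a hash; the file format, the subroutine name read_config, and the sample keys are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Read "key = value" lines into a hash. Blank lines and lines starting
# with # are skipped. The file format and key names are assumptions.
sub read_config {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot read config file $path: $!\n";
    my %config;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(#|$)/;
        my ($key, $value) = $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/
            or die "Bad line in $path: $line\n";
        $config{$key} = $value;
    }
    close $fh;
    return %config;
}
```

With a file containing lines such as data_dir = /usr/local/survey/data, a script could say my %config = read_config('survey.conf'); and then use $config{data_dir} wherever it would otherwise have a hard-coded path.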

Operating Systems

There's no one operating system that everyone runs. Sure, there are companies that would sure like to change that, but it's a fact of life that major differences exist on the very base levels of systems, and your program may have to take those into account. One of the first steps in this direction is to use an interpreted language, such as Perl or Java. These are both available on a number of platforms and don't necessarily require changes in order to make the code run on a different type of machine (that is, if the language has the functions you need).

One of the most difficult things about planning for cross-platform functions is that many components that are common in one environment may be completely alien to another. Take the common ls command for listing files on the UNIX side or grep, which performs text string pattern matching. It wouldn't be at all uncommon to write a program that listed out the files in an FTP site's directory, sorting them by file size or date. But if you wrote something that relied on either the ls or grep command and then took it to a PC, you'd be in trouble. How can you possibly accommodate differences on such a base level? With a little bit of trickery...

The configuration files mentioned in regard to server software come into play here. Provide a tag that specifies the operating system and evaluate that within your program. If it's possible for the functions you'll be doing, provide an alternate route for commands that may differ. For instance, Listing 5.3 shows a small fragment of Perl pseudocode that uses a variable called os to specify which operating system it's being run under.


Listing 5.3. Building operating-system independence into your scripts.
if ($os eq "UNIX") {
        ....
        }
elsif ($os eq "NT") {
        ....
        }
elsif ...
else  {
        ..insert error code here..
        }

Note
Notice that if you take this route, you'll also need to provide an error case if the configuration file doesn't exist, is inaccessible, or is just plain wrong.
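Sometimes you can sidestep the operating-system test altogether by using what the language already provides. As one possible approach to the file-listing example, this sketch uses Perl's own directory functions instead of calling ls, so the same code should run wherever Perl does. The subroutine name files_by_size is made up for the example.

```perl
use strict;
use warnings;

# Return the plain files in a directory, largest first, without calling
# any external command such as ls. Ties are broken by file name.
sub files_by_size {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!\n";
    my @names = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    my @sorted = sort {
        (-s "$dir/$b") <=> (-s "$dir/$a") || $a cmp $b
    } @names;
    return @sorted;
}
```

Calling files_by_size('.') returns the names of the files in the current directory. Perl's built-in grep, used here to weed out subdirectories, also covers much of what you might otherwise hand off to the external grep command.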

Another item to take into account is the capability to access certain files. Assuming you get around basic operating system command set differences, you still have to make sure that you don't rely on reading information from files that just don't exist. For instance, much of the data that's stored in server configuration files on UNIX is stored in the Registry in Windows NT. Because it's almost impossible to code a generically cross-platform function that accesses the Registry, the focus of the code isn't necessarily "no changes" for portability, but rather "few changes."

The more you can do to make the code easy to translate between different systems and servers, the less frustration you'll encounter if the time ever comes to do so. In commercial CGI work, this is imperative; you can't spend all your life developing programs based on the WinCGI standard if you're planning on trying to win an account at a UNIX-intensive shop. However, by being familiar with what's involved in changing over, all the rough work will already be done, and you'll just reap the benefits.

Reuse

Another thing to consider when thinking of portability is whether or not you can put your current code to good use somewhere else, either through creating your own custom library or just cutting and pasting. More general functions like reading, parsing, and decoding data are the most commonly used library functions, but what about situations that are unique to your class of CGI applications?

If you're evaluating serial numbers, for instance, and connecting to a database to gather information about the user of that serial number, wouldn't it make sense to create a function that does that and then include it in the code? In Perl, this is as easy as creating a different Perl script with some subroutines in it, then inserting a require 'myscript.pl'; line at the beginning of your new script. (The required script must finish by returning a true value, which is why such files conventionally end with a 1; line.) Throughout the rest of your program, you can call subroutines from your other script just as if you had typed them into the new script's code.
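Here is a small, self-contained sketch of that arrangement. So that it can run as a single file, it first writes a pretend library to a temporary directory; in real life the library would simply sit alongside your scripts. The file name serial-lib.pl, the subroutine valid_serial, and the serial number format are all invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# For a self-contained demo, write the "library" script to a temporary
# directory. Normally serial-lib.pl would already exist on disk.
my $dir     = tempdir(CLEANUP => 1);
my $library = "$dir/serial-lib.pl";
open(my $out, '>', $library) or die "Cannot write $library: $!\n";
print {$out} <<'END_LIBRARY';
# serial-lib.pl: shared subroutines (hypothetical example library)
sub valid_serial {
    my ($serial) = @_;
    return $serial =~ /^[A-Z]{2}-\d{6}$/ ? 1 : 0;
}
1;    # a required file must end by returning a true value
END_LIBRARY
close $out;

require $library;    # in a real script: require './serial-lib.pl';

print valid_serial('AB-123456') ? "ok\n" : "bad serial\n";    # prints "ok"
```

Recent versions of Perl no longer search the current directory for required files by default, so giving a path such as './serial-lib.pl' is the safer habit.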

The more you access a function and the more complex it is, the more you should think about reusing it. After a while, between libraries you build yourself and ones you've found from other sources, your programs can be created faster and more efficiently, because as long as the method of use is the same between scripts, you'll be bringing in a precreated and pretested segment of code to perform an otherwise annoying function. And who better to write a library of functions useful to you than you?

Summary

The design and execution of a CGI program don't have to be torturous. It's very easy to take a rough idea and turn it into an outline that you can use to make your application run smoothly; you just need to spend the short amount of time it takes to review and re-review until you're sure it meets your needs. One of the benefits of a methodical design process is that it means less time will be spent trying to figure out why you did something a certain way, and it will give you or anyone else who needs to modify the code all the details necessary to see where the changes they need to make should go, and how they'll need to interact with the rest of the program. Measure twice; cut once.