HTML 4.0 Sourcebook: Data Processing on an HTTP Server


HTML 4.0 Sourcebook
(Publisher: John Wiley & Sons, Inc.)
Author(s): Ian S. Graham
ISBN: 0471257249
Publication Date: 04/01/98


Figure 10.13 Perl code extract for decoding FORM data passed to the program via standard input. Differences from the extract in Figure 10.11 are shown in italics. Note that this is not a functional piece of code and that the extracted name and value strings must be placed in a permanent storage location (such as an associative array) for subsequent processing.

$input = <STDIN>;                           # read FORM data from stdin
$input =~ s/[\r\n]+$//;                     # strip trailing CR/LF characters:
                                            # the data sent by a client may be
                                            # terminated by a CRLF pair, which
                                            # is not part of the message body
                                            # and must be removed.
                                            # Check for an unencoded equals sign:
                                            # if there are none, the string didn't
if ($input !~ /=/) {                        # come from a FORM, which is an error.
  &pk_error("Query String not from FORM\n");
}
                                            # If we get to here, all is OK. Now
@fields = split(/&/, $input);               # split data into separate name=value
                                            # fields (@fields is an array)

#   Now loop over each of the entries in the @fields array and break
#   them into the name and value parts. Then decode each part to get
#   back the strings typed into the form by the user.

foreach $one (@fields) {
  ($name, $value) = split(/=/, $one, 2);    # split, at the first equals sign,
                                            # into the name and value strings.
                                            # Next, decode the strings.
  $name  =~ s/\+/ /g;                       # convert +'s to spaces
  $name  =~ s/%([0-9A-Fa-f]{2})/pack("C", hex($1))/ge;
                                            # convert URL hex codings to Latin-1
  $value =~ s/\+/ /g;                       # convert +'s to spaces
  $value =~ s/%([0-9A-Fa-f]{2})/pack("C", hex($1))/ge;
                                            # convert URL hex codings to Latin-1

  #    What you do now depends on how the program works. If you know that
  #    each name is unique (your FORM does not have checkbox or SELECT items
  #    that allow multiple name=value strings with the same name) then you
  #    can place all the data in an associative array (a useful little perl
  #    feature!):

  $form{$name} = $value;                    # %form: illustrative hash name
}
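When names can repeat (checkbox groups, or SELECT elements that allow multiple selections), a plain associative array keeps only one value per name. One way to keep them all is a hash of array references. The following is a minimal sketch under that assumption; the subroutine names url_decode and parse_form are hypothetical and not part of the original figure.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one URL-encoded string ('+' to space,
# then %XX hex escapes to the corresponding character).
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Hypothetical helper: split FORM data into a hash of array references,
# so that a name that occurs several times keeps all of its values.
sub parse_form {
    my ($input) = @_;
    my %form;
    foreach my $pair (split /&/, $input) {
        next if $pair eq '';                         # skip empty fields
        my ($name, $value) = split /=/, $pair, 2;    # split at first '='
        $value = '' unless defined $value;
        push @{ $form{ url_decode($name) } }, url_decode($value);
    }
    return \%form;
}

my $form = parse_form('topping=cheese&topping=olives&size=large');
print "toppings: @{ $form->{topping} }\n";    # toppings: cheese olives
print "size: ", $form->{size}[0], "\n";       # size: large
```

Each value is decoded only after the pair has been split at '&' and '=', so encoded separators (%26, %3D) inside a value cannot break the parse.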

Relative Advantages of GET and POST

The GET and POST methods for handling FORM input have different strengths and weaknesses. POST is clearly superior if you are sending large quantities of data to the server or data encoded in character sets other than ISO Latin-1. If you are sending small quantities of data, and only ISO Latin-1 characters, the choice is less clear. One useful criterion is to ask if you want the user to be able to store (“bookmark”) a URL that will return the user to this particular resource. If the answer is yes, then you must use the GET method, since the relevant data will be placed in the query string portion of a URL, which is stored when a URL is recorded. If, on the other hand, you do not want the user to be able to quickly return to this resource or you want to hide the FORM content as much as possible, you should use POST.
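The choice of method also determines where a gateway program finds the data. Under the CGI conventions, GET data arrive in the QUERY_STRING environment variable (they are part of the URL, which is why they can be bookmarked), while POST data arrive on standard input, with their length in bytes given by CONTENT_LENGTH. A minimal sketch, assuming a hypothetical subroutine read_form_data that takes the environment as a hash reference and the input as a filehandle, so that it can be exercised without a web server:

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw, still URL-encoded FORM data.
# $env is a hash reference holding the CGI environment variables and
# $in is the filehandle the request body arrives on.
sub read_form_data {
    my ($env, $in) = @_;
    my $method = uc($env->{REQUEST_METHOD} || '');
    if ($method eq 'GET') {
        # GET: the data is the query string portion of the URL
        return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
    }
    if ($method eq 'POST') {
        # POST: CONTENT_LENGTH bytes of message body on standard input;
        # loop because read() may return fewer bytes than requested
        my $length = $env->{CONTENT_LENGTH} || 0;
        my $data   = '';
        while (length($data) < $length) {
            my $got = read($in, $data, $length - length($data), length($data));
            last unless $got;    # 0 at end of file, undef on error
        }
        return $data;
    }
    die "Unsupported request method: $method\n";
}

# In a real CGI program:
#   my $raw = read_form_data(\%ENV, \*STDIN);
```

Reading exactly CONTENT_LENGTH bytes, rather than reading a line, avoids depending on how (or whether) the client terminates the message body.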

HTML Encoding of Text Within a FORM

With gateway programs, you often need to place data inside the FORM sent to the client. This might be initial field values assigned to the VALUE attributes of INPUT or OPTION elements or placed within the body of a TEXTAREA element, or it might be state information (information describing the state of the interaction between the user and the server-side application) preserved within the VALUE attributes of TYPE="hidden" INPUT elements. However, in doing so, you must remember that the text received by the client will be parsed. This means that any entity or character references embedded in the VALUE (or NAME) strings or within the body of a TEXTAREA element will be automatically expanded into the corresponding ISO Latin-1 characters. For example, if a document sent to a client contains the hidden element

<INPUT TYPE="hidden" NAME="stuff" VALUE="&lt;BOO&quot;&gt;">

the client will parse the VALUE string and convert it into the string <BOO">. When the FORM containing this hidden element is submitted, the string <BOO"> will be URL-encoded and sent to the server, so that the entity references in the original data are lost.
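To see what the server actually receives, consider the URL-encoding step. The following sketch approximates the form encoding a browser applies to a field value; url_encode is a hypothetical helper, and the exact set of characters left unescaped has varied between browsers. Applied to the parsed string <BOO">, it yields %3CBOO%22%3E, which the server decodes back to <BOO">; nothing in the submitted data records that the original VALUE used &lt;, &quot;, and &gt;.

```perl
use strict;
use warnings;

# Hypothetical helper approximating a browser's form encoding of a
# field value: every character except letters, digits, space and
# - _ . * becomes %XX, then spaces become '+'.
sub url_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.* ])/sprintf('%%%02X', ord($1))/ge;
    $s =~ tr/ /+/;
    return $s;
}

# The browser has already turned VALUE="&lt;BOO&quot;&gt;" into <BOO">,
# so that is the string it encodes and submits:
print 'stuff=', url_encode('<BOO">'), "\n";    # stuff=%3CBOO%22%3E
```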

This is sensible if you recall that, as far as the browser is concerned, entity references and character references are no different from the characters they represent. This can be a problem, however, if the data within the hidden element contains HTML markup, since you often need to keep entity references distinct from the characters they represent; for example, so that simple character strings (&lt;tag&gt;) do not get converted into markup tags (<tag>) by the conversion process. Thus, if you need to preserve entity references, you must apply the following encodings to the string before placing it within a VALUE or NAME attribute or inside a TEXTAREA element:

1. encode all ampersand characters in the text string as &amp;

2. encode all double quotation symbols as &quot;

3. encode all right angle brackets as &gt;

The second and third steps are necessary because any raw double quote character (") will prematurely terminate a VALUE or NAME string, while some browsers mistakenly use an unencoded greater-than symbol (>) to end INPUT elements prematurely. The first step encodes the leading character of each entity or character reference: for example, the original string &eacute; becomes &amp;eacute;. The client browser converts this back to the string &eacute;, which brings you full circle when the data are returned to the server.
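The three encodings can be sketched as a small Perl subroutine. The name encode_for_form is hypothetical. The ampersand step must come first; otherwise the ampersands introduced by the later steps would be encoded a second time.

```perl
use strict;
use warnings;

# Hypothetical helper: encode a string so that it survives the browser's
# parse of a VALUE or NAME attribute (or TEXTAREA content) unchanged.
sub encode_for_form {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;     # step 1: ampersands (must be first)
    $s =~ s/"/&quot;/g;    # step 2: double quotes
    $s =~ s/>/&gt;/g;      # step 3: right angle brackets
    return $s;
}

my $state = '&eacute; "quoted" a > b';
my $html  = '<INPUT TYPE="hidden" NAME="stuff" VALUE="'
          . encode_for_form($state) . '">';
print "$html\n";
# <INPUT TYPE="hidden" NAME="stuff" VALUE="&amp;eacute; &quot;quoted&quot; a &gt; b">
```

When the browser parses this VALUE, it recovers exactly the string &eacute; "quoted" a > b, so the entity reference reaches the server intact.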

State Preservation in CGI Transactions

In a complex gateway application, a complete session may require a series of interactions between the client and server. Since the HTTP protocol is stateless, the server, and any gateway program on the server, retains no knowledge of any previous transaction. Thus you, the gateway program designer, must build in mechanisms for keeping track of what happened in any previous stage. There are two strategies for doing this. The traditional way is to use TYPE="hidden" INPUT elements within HTML forms to pass state information back and forth between client and server. A second, newer method is to use Netscape cookies to store state information on the client.
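As an illustration of the hidden-field approach, the following sketch writes a set of state variables into TYPE="hidden" INPUT elements, applying the same three encodings to each NAME and VALUE. The subroutine names and the state variables step and customer are hypothetical.

```perl
use strict;
use warnings;

# The same three encodings used for any VALUE or NAME string.
sub encode_for_form {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical helper: render state variables as hidden INPUT elements
# for the next FORM the program sends; when the user submits that FORM,
# the state returns to the server as ordinary name=value pairs.
sub state_to_hidden {
    my (%state) = @_;
    my $html = '';
    foreach my $name (sort keys %state) {    # sorted for predictable output
        $html .= sprintf(qq{<INPUT TYPE="hidden" NAME="%s" VALUE="%s">\n},
                         encode_for_form($name), encode_for_form($state{$name}));
    }
    return $html;
}

my $hidden = state_to_hidden(step => 2, customer => 'Smith & Sons');
print $hidden;
```

The program that receives the next submission decodes these pairs like any other FORM data and reads step to learn which stage of the session it is in.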
