Chapter 13 What to Know About Live Communication

Web Chat
The Web Chat Environment
The PlainChat Application
Limitations of PlainChat
Problems Solved by Web Chat Applications
Modifying the Web Chat Application

Live communication on the Web is becoming more and more common. Dates and times for "live" chat events are often listed in newspapers, and more companies are turning to the Web as a channel to advertise their products. Even today, browsing the result pages of Yahoo! or Lycos (popular search engines on the Web), you see several dozen references to live Web chat applications and events. As Figure 13.1 shows, searching for "Live Chat" through Yahoo! (http://yahoo.com) comes up with 77 entries.

Figure 13.1: Search results from Yahoo! when looking for references to "Live Chat." Right away, there is a reference to an upcoming chat session with members of Congress.

While not all live communication on the Web is as high profile as a chat session with members of Congress, large and small companies alike utilize Web chat as a means for live communication on the Web.

Web Chat

While Web chat might be one of the more superfluous attractions to a Web site, users really love this type of communication. Users on the Internet love to talk to one another using e-mail, UseNet, IRC, or MUD. Once a user has already been introduced to a particular kind of communication medium on the Internet, they tend to think that all media basically work the same. The fact is, there are several different ways to communicate over the Internet these days. MUD and IRC are more real-time environments for communicating; UseNet and e-mail are slower. UseNet, in particular, is special because users can "lurk" for years on a newsgroup without ever posting anything, but they can read every article and correspond in private via e-mail with people who post interesting messages or have questions.

Web chat is a relatively new form of communication on the Internet. Web chat is used to describe the type of interface and environment experienced when chatting on the World Wide Web. Chatting isn't even a good word to describe the events that take place on Web chat. More appropriate terms are interact, entertain, inform, and gossip. Web chat is primarily considered a new type of communication medium, but when we look closer at how it works and the environments created by Web chat, we see that it is a mix of UseNet and the more interactive areas like MUD and IRC. On the surface, Web chat looks like all the other types of Internet communication tools. Still, there are some issues we need to deal with in order to implement applications to give users a place to communicate on the Web.

I was starting to learn about TCP/IP when my mentor described the protocol as being unlike "plugging your TV into your VCR." When it comes to implementing Web chat applications, the same holds true. There are some obvious programming techniques used to implement the applications, but understanding the process of a Web chat will hopefully spur the reader to investigate other options.

Web chat has some particular limitations compared to very real-time systems such as MUD and IRC. No matter how contrived the Web chat systems are, users still flock to them. As the technology advances almost daily, we can be sure to find new kinds of "chat" environments that will converge to simulate real-time interaction.

As we start to look at the Web chat applications used on the World Wide Web, we will refer to these different terms: applications, environments and systems.

Web chat applications refer to the suite of software that controls and supports the Web chat experience. The experience, or the session of being "in" the Web chat is going to be referred to as the Web chat environment. Users are participating in a Web chat if the pages they see in their browsers relate to the activity occurring in the Web chat. The Web chat system refers to the interaction among the browser, CGI programs, Web server, and the Internet. For each message "posted" by a user in a Web chat environment, there are several dozen interactions happening. All of these interactions and tasks performed by the application are referred to as the Web chat system.

Note

Another way to look at it is this: The Web chat application is the software the Webmaster writes. The Web chat environment is defined as the experience and "virtual position" of a user when browsing the World Wide Web. If users are communicating over the Internet using a Web chat application, that's called being in a Web chat environment. Lastly, the hidden effects of posting, reading and all the "magic" that happens behind the scenes is defined as the Web chat system.

In the next three chapters, we hope to explain enough about Web Chat applications, environments and systems so that any Webmasters (experienced or novice) can create their own Web chat.

The Web Chat Environment

Web chat environments are places on the Web site that allow people to come together and talk about any range of topics. Web chat environments are characterized by a feeling that the user visiting the site is either a guest or an active participant of the Web chat environment. It is important to remember that distinction when developing the theme of the chat environment.

Make sure the Web chat environment theme is clearly noted. If not, how does your Web chat environment intend to instruct the user to behave in the chat environment? This question is important even if you don't really care about the kind of conduct in the Web chat environment, because users want to know when it's appropriate to respond a certain way. Also, if multiple chat environments exist within the same Web area, the theme of the environment helps to guide the user through the Web chat.

On a more technical note, the Web chat environment is made up of HTML pages. Customarily, these are generated dynamically by the Web chat application software. As part of the process for creating the whole Web chat environment, consideration for page layout is just as important as deciding on the theme of the Web chat environment.

Establishing the Theme of Your Web Chat Environment

Successful Web chat environments have a very specific theme. The topic of the chat environment must be specific to help induce relevant discussion. The noise-to-signal ratio within most Internet communication arenas (like UseNet) has risen, and the number of split-off groups has increased to match the diversity of topics and discussions there. For some, this is a good thing: it's easier to find a thread of discussion on UseNet when there are specific groups (even within a main group). For a Web chat environment, the topic of discussion appropriate for that environment must be made plainly obvious to the user who is about to enter. If the theme of the Web chat environment is not clear, then some users may feel lost and confused for a while until they realize what the topic is. You want users to feel very welcome in Web chat environments. It helps a site gain notoriety when its Web chat environments are popular; they become popular when people visit them in droves. So, try to maintain a consistent theme for your Web chat environments.

The theme of the Web chat environment is the descriptive element of the environment that tells:

Who will be invited to participate in the environment
When the event will be open/available
What special considerations are being made for participants (for example, whether face graphics, sound-bite audio snippets, and VRML clips will be needed)

An example of a very focused Web chat environment theme was the two-year birthday party chat event for Free Range Media. On April 18th, 1996, the staff of Free Range Media set up a celebrity Web chat environment where the president of Free Range fielded questions from the public, and there was a discussion with several key people in the online media industry. Figure 13.2 is a demo of what this environment looks like.

Figure 13.2: The two-year anniversary party for Free Range Media included a live chat session between Andrew Fry and various clients and industry leaders.

Pages in a Web Chat Environment

Web chat environments generally consist of two pages. The first page is a login page; the second is the chat dialog itself, or the transcript page.

The login page identifies users so their messages can be associated with a real person, which distinguishes them from other messages that are posted. The login page asks for a user's name, handle, or some textual piece of information to label his messages. The user is also asked to choose a graphic icon to go along with his name. Once a user has specified his identity, then the next Web page he sees is the transcript page, or dialog page for the chat environment.

The transcript page consists of a log of all the messages sorted by when each message was added. A text box is also visible for users to type in new messages. The text box is accompanied by at least two buttons: a Transmit button for sending the message and a Refresh button for allowing the user to reload the page. For a low-volume site, all of this can be accomplished by implementing a single CGI script.

The PlainChat Application

The PlainChat application is a simple approach to creating a Web chat environment. The PlainChat application doesn't require any extensive back-end support (no special databases, strange URLs, or graphics). PlainChat allows users to communicate at near real-time speed using a Web browser. The main limitation of PlainChat is that it uses a plain text file to store the messages. This will pose problems down the road as activity in the chat environment increases. This section introduces the PlainChat application, along with a hybrid called BusyChat that addresses the basic limitations of PlainChat.

PlainChat primarily supports a Web chat environment for a low-volume site. A low-volume site is one where the activity on the site is moderate enough to provide reasonably good performance for page loading and interactivity. For a low-volume site, the very basic PlainChat environment will suffice. Knowledge of site usage plays a significant role in determining whether a site is moderately or heavily used. Determining a high or low rating can be done by analyzing the usage logs of the Web server.
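As a rough illustration of that kind of log analysis, the following sketch counts requests per hour. It assumes the server writes its access log in the Common Log Format; the sample lines and the function name requestsPerHour are made up for this example, and a real run would read the lines from the server's own log file.

```perl
use strict;
use warnings;

# Count requests per hour from access-log lines in Common Log Format.
# Returns a hash that maps "dd/Mon/yyyy:hh" to a request count.
sub requestsPerHour {
    my (@lines) = @_;
    my %count;
    foreach my $line (@lines) {
        # The timestamp field looks like [18/Apr/1996:14:05:31 -0700]
        if ($line =~ m!\[(\d{2}/\w{3}/\d{4}:\d{2}):\d{2}:\d{2} !) {
            $count{$1}++;
        }
    }
    return %count;
}

# Hypothetical sample lines standing in for a real access log.
my @sample = (
    'host1 - - [18/Apr/1996:14:05:31 -0700] "POST /cgi-bin/plainchat.cgi HTTP/1.0" 200 2048',
    'host2 - - [18/Apr/1996:14:17:02 -0700] "POST /cgi-bin/plainchat.cgi HTTP/1.0" 200 2101',
    'host1 - - [18/Apr/1996:15:01:44 -0700] "GET /index.html HTTP/1.0" 200 512',
);

my %perHour = requestsPerHour(@sample);
foreach my $hour (sort keys %perHour) {
    print "$hour  $perHour{$hour}\n";
}
```

What counts as "low volume" is still a judgment call; the hourly counts only give you numbers to base that judgment on.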

Components of a PlainChat Application

The PlainChat application consists of several components. The login page for the Web chat environment tells the application who is entering the Web chat environment. The second component of PlainChat is the transcript page, where all the messages are displayed. The third is the transcript log (file), where all the messages are stored. Storage and display are separate issues for the PlainChat application. Further, in the case of PlainChat (but not all Web chat environments), the transcript page is where message inputs from the user are accepted. The transcript page contains the HTML form and input areas to allow the user to contribute to the chat environment. The flow of the chat session resides mostly within the cycle of reloading or re-creating the transcript page. The login page is just the front end to the Web chat environment.

The Login Page

Listing 13.1 shows a sample HTML page to serve as the login page for the chat environment. The transcript page is generated dynamically, so an HTML listing of the transcript page is not as important as the CGI script that generates it.

Listing 13.1 plain-login.html: Serves as a Login Page for the Chat Environment

<HTML>
<TITLE>Login Page for PlainChat</TITLE>
<BODY BGCOLOR=FFFFFF>
<H1>Login Page for PlainChat</H1>
<P>
Please enter your name below:<BR>
(Example: Pat Smith)<BR>
<FORM METHOD=POST ACTION="/cgi-bin/plainchat.cgi">

<INPUT NAME="theUsername"><BR>

Please enter your E-mail address below:<BR>
(Example: pat@some.place.com)<BR>
<INPUT NAME="theEmail">
<P>

<INPUT TYPE=SUBMIT VALUE="Enter PlainChat">
</FORM>

</BODY>
</HTML>

The PlainChat application begins with a login page like the one in Figure 13.3. The HTML form is the first step of the PlainChat application.

Figure 13.3: The login page for the PlainChat application

This form asks for the user's name and e-mail address. The CGI script uses the e-mail address to wrap the user's name in a mailto link in the transcript (see Figure 13.4). If you ask for other information from the user on the login page, you will incorporate that into the style of the transcript page. For example, you can ask the user to choose a particular icon to represent his mood for the chat environment. That icon along with his name (and other information) will then appear as the herald for each message.
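A herald of that kind might be assembled as in the following sketch. It is only one way to do it: the helper names escapeHTML and buildHerald and the optional icon URL are assumptions made for this example, not part of PlainChat. The escaping step is there because the name and address come straight from the user and are written into the page as HTML.

```perl
use strict;
use warnings;

# Escape the characters that have special meaning in HTML.
sub escapeHTML {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build the herald for one message: an optional icon, then the user's
# name wrapped in a mailto link. The icon URL is a hypothetical extra
# field that the login page would have to collect.
sub buildHerald {
    my ($name, $email, $iconURL) = @_;
    my $herald = '';
    if (defined($iconURL) && $iconURL ne '') {
        $herald .= '<IMG SRC="' . escapeHTML($iconURL) . '" ALT=""> ';
    }
    $herald .= '<A HREF="mailto:' . escapeHTML($email) . '">'
             . escapeHTML($name) . '</A>';
    return $herald;
}

print buildHerald('Pat Smith', 'pat@some.place.com'), "\n";
```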

Figure 13.4: The transcript page of the PlainChat application. Messages are listed sequentially. A text area lets you enter a new message.

The HTML form starts the Web chat system by referring to a CGI script to execute when the user is ready to enter the Web chat environment.

PlainChat uses a plain text file to store the messages. The basis of the application is the transcript file. It is a plain text file created or appended to as necessary by the PlainChat application (plainchat.cgi). Organize the structure of the data appended so that the user can distinguish between individual messages. PlainChat adds each message to the transcript file in HTML. In other words, the transcript file is not processed and converted into HTML for display by the CGI script. The transcript file is, in fact, an HTML file, although users cannot load it directly because it's kept in a place outside of the document root.

For example, the document root of the Web site might be

/var/web/default/htdocs

and the location of the transcript log file might be

/var/web/default/data/plainchat.dat

Storing the transcript log file this way means that no URL maps to it, so a user cannot "go to" the transcript log file directly.

The Transcript Page

The ACTION of the login form is plainchat.cgi. The basic function of this CGI script is to generate the HTML for the transcript page. The transcript page is where the user sees all the messages that have been posted so far. They are listed in order with newer messages on top and the oldest messages on the bottom. The text area for users to enter new messages also appears on this form.

Note

It is a conscious decision to place the text area above where the newer messages appear. It is likely that users who want to respond or start a new thread will want to see the most relevant messages nearby as they type their message. Also, the user will want to see the text area first and not be forced to scroll down the page to see it. That is why the transcript page is configured with the text area on top, and the messages listed in "reverse" order with the newest messages on the top and the older ones on the bottom.

The transcript page is completely generated via the CGI script. The CGI script is concerned with generating HTML and is also responsible for managing the text file that stores the messages.

Once the PlainChat login page is installed, the CGI script needs to be fleshed out with the correct code to handle the inputs from the login page. The CGI script implementing PlainChat handles input from two different sources. The first is the HTML form data passed by the login page: the user's name and his e-mail address. The CGI script doesn't create the login page. Refer to Figure 13.4 to see the important data that appears in the transcript page. The second source is the transcript page itself. Unlike the login page, the transcript page is an HTML form that is generated by the PlainChat application software (plainchat.cgi).

To be more specific, there are a few ways to look at the job of the PlainChat CGI script. It must be written to decide what to do depending on what kind of data it receives. If the user comes from the login page, then the job of the script is to display the transcript page to the user. This step is a springboard to the Web chat environment. The first page is the login page, but the CGI script brings the user to where things are happening: the transcript page.

The other situation is where the user is already in the transcript page, either because he has just been presented with it after leaving the login page or because he has been viewing it for a while. The transcript page is much different than the login page because the user isn't required to enter any more new information to enjoy the chat environment. If the user is already viewing a page generated by plainchat.cgi, there are three possible states the user can be in.

State 1: Viewing the Login Page

In the first outcome, the user presses the Transmit button, which causes the script to run again. The data to be processed is the message, if any, that the user typed in the text box (see Fig. 13.5).

Figure 13.5: This is a diagram of data flow from the Login page to the page generated by the CGI script (the transcript page). The data passed to the "transcript page" is the name of the user. Additional information can be passed depending on the style of the Web chat environment (like the use of icons or sound to identify users once they enter the transcript page cycle of the chat environment).

State 2: The Transcript Page Cycle: A New Message

If the user is already viewing a page generated by plainchat.cgi, the second outcome occurs if the user presses the Reload button. The Reload button in most browsers' toolbars will "repost" the form data to the script. In other words, reloading a URL that is a CGI script makes the browser (client software) ask the user whether to "repost form data."

Web chat users don't usually repost form data because they don't want to appear repetitive. For a Web chat environment, it's important not to continually remind the user that this is still a Web page; the messages from browsers about "reposting form data" can be annoying.

To alleviate this situation, the form includes its own Refresh button. Figure 13.6 shows a diagram of the flow of data between the transcript page and itself when a user refreshes the page. Pressing the Refresh button (see Fig. 13.7) submits the HTML form in a way that lets the CGI script know that all the user wants is to reload the page, without the browser asking whether he wants to re-post form data. The net effect on the transcript is no effect: the page reloads, but nothing is added to the transcript log file. The user can therefore refresh the page at any time to check whether anything new has been posted.
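One way for the script to tell the two buttons apart relies on the fact that a browser sends the name and value of only the submit button that was pressed. The sketch below assumes the Refresh button is given NAME="refresh" (a field name chosen for this example) and that the text area is named "theMessage"; shouldAppend is a hypothetical helper, not a function from the book's library.

```perl
use strict;
use warnings;

# Decide whether a submission should add a message to the transcript.
# Assumes the Refresh button carries NAME="refresh" (hypothetical) and
# the text area is named "theMessage".
sub shouldAppend {
    my (%form) = @_;

    # The "refresh" field is present only when Refresh was pressed.
    return 0 if defined $form{'refresh'};

    # Transmit with an empty or all-blank text area adds nothing either.
    my $message = defined $form{'theMessage'} ? $form{'theMessage'} : '';
    return ($message =~ /\S/) ? 1 : 0;
}
```

With this check, anything left in the text area when the user presses Refresh is simply ignored rather than posted.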

Figure 13.6: The transcript page is where the user enters new messages or refreshes the page to view new messages.

Figure 13.7: The data from the HTML form generated by the CGI script is passed to the CGI script again (and again) as the transcript page cycle proceeds.

State 3: Exiting the Chat Environment

On the same topic of refreshing and reloading, a user must be able to exit any particular environment he is in. The Web is about pages being linked, and it is inappropriate to force the user to leave the Web chat environment by adjusting his URL window, without giving him a quick and easy link or toolbar to click on.

Consideration for flow within a site includes flow of pages dynamically generated. The source of the pages may be unlike static pages (created dynamically versus loading them from a disk), but it is important to build an exit-link from the Web chat into the transcript page.

There should always be an "out" for any Web system (like a Web chat), so the third possible outcome from the transcript page is that the user wants to leave for somewhere else in the site. Any link, toolbar, or button on the transcript page that takes the user away from the Web chat will essentially disconnect the user from the Web chat environment temporarily. This is considered a state because it refers to the condition of not being in the chat environment. The user may have saved the URL to the Web chat environment, but for PlainChat nothing can be done to tell if the user is still there. That kind of check-in/check-out feature is part of a more advanced Web chat environment called SuperChat (covered in Chapter 15, "Performance Tradeoffs: Keeping Chat Messages in Memory").

Page Generation

There are only two distinct pages that make up the PlainChat environment. We've seen the login page where the user identifies themselves. We've also looked at the transcript page, where the messages are listed in order and where the user can type in new messages. There are no other pages in the PlainChat environment. The CGI program, plainchat.cgi, in fact only generates one kind of page: a transcript page. The login page for PlainChat is assumed to be static. In later sections, we will show how you can make a version of PlainChat display both a Login page and the transcript page from the same CGI script.

Performance isn't an issue when using the same CGI script to generate the Login page and transcript page. There are just instances in a Web site where it's more appropriate to generate a page dynamically and not use a static page. If the Web chat area is deeply embedded in a larger scheme of dynamically generated pages, it can be devastating to use a static page. Information is a lot harder to pass from page to page when there is a static page in the way of the "chain" of dynamic pages.

It can be helpful to view the possible outcomes in the PlainChat environment as a flow diagram (see Fig. 13.8). This will help decide what data is necessary for the PlainChat CGI script.

Figure 13.8: A flowchart showing each state the user can be in. The user starts at the Login page and gets to the Transcript page as part of State 1. State 2 is the continuous cycle of pages generated by the CGI script when the user either refreshes pages or sends new messages. State 3 is considered "not in the chat environment."

From the user's point of view, the transcript page is where the user can contribute to the chat environment. The transcript page has a text box for entering a new message, a couple of buttons to either transmit or refresh, an exit-link out of the chat environment, and a formatted listing of the messages posted up to that point. There exists an HTML form on both the login page and the transcript page. The login page uses the HTML form to pass the name and extra information to the CGI script that runs the chat environment. The HTML form in the transcript page is used for more than passing "user" information secretly; it makes the next iteration of the transcript page equipped to assign user information for the next new messages transmitted.

Generating the Transcript Page

Just how is the transcript page created? As we've seen in earlier chapters, the whole point of CGI scripts is to generate pages. Most often these pages are HTML. We can use HTML to construct forms. Forms allow us to ask the user for input and to hide data within the form. The "submission" of data from HTML forms to CGI programs completes the cycle.

The CGI script that drives the "chain of events" for our PlainChat example is shown in Listing 13.2.

Listing 13.2 plainchat.cgi: CGI Script That Generates a Transcript Page

#!/usr/local/bin/perl

@INC = ('../lib', @INC);

require 'web.pl';

%Form = &getStdin;

&beginHTML;

$LOG_FILE = "$ServerRoot/data/plainchat.dat";

$theUsername = $Form{'theUsername'};
$theMessage  = &cleanMessage($Form{'theMessage'});
$theEmail    = $Form{'theEmail'};

# Append only when a message was transmitted; the Refresh button
# redisplays the page without touching the transcript log.
&appendMessage($theUsername, $theEmail, $theMessage)
    if ($theMessage && !$Form{'refresh'});

&buildForm('theUsername', $theUsername,
           'theEmail',    $theEmail);

&displayTranscript();

exit(0);


# Filter a message before it is stored in the transcript.
sub cleanMessage {

   local($theData) = $_[0];

   # any filtering, or censoring code goes here
   return $theData;
}

# Print the message form; the name/value pairs passed in are carried
# along as hidden fields.
sub buildForm {

  local(%hiddenData) = @_;

  print "<body bgcolor=ffffff>\n", 
        "<H1>Transcript Page for PlainChat</H1>\n",
        "<FORM METHOD=\"POST\" ACTION=\"/cgi-bin/plainchat.cgi\">\n",
        "Enter your comments below:<BR>\n",
        "<TEXTAREA ROWS=5 COLS=60 NAME=\"theMessage\">\n",
        "</TEXTAREA>\n",
        "<P>\n",
        "<INPUT TYPE=\"SUBMIT\" VALUE=\"Transmit\">\n",
        "<INPUT TYPE=\"SUBMIT\" NAME=\"refresh\" VALUE=\"Refresh\"><BR>\n",
        "<p>\n";

  foreach $dataItem (keys %hiddenData) {
     print "<INPUT TYPE=\"HIDDEN\" NAME=\"$dataItem\" ",
           "VALUE=\"$hiddenData{$dataItem}\">\n";
  }
  print "<INPUT NAME=\"inChat\" TYPE=\"HIDDEN\" VALUE=\"1\">\n",
        "</FORM>\n",
        "<P>\n",
        "<A HREF=\"/index.html\">Back Home</A><P>\n";
}
   


sub displayTranscript {

    print "Current Transcript:<BR>\n";
    # No file yet means nobody has posted; show an empty transcript.
    return unless (-e $LOG_FILE);
    open(TRANSCRIPT, "< $LOG_FILE") || &fail("Cannot open Transcript: $!");
    print while(<TRANSCRIPT>); 
    close(TRANSCRIPT);
    
}

sub appendMessage {

    local($theUser, $theEmail, $theData) = @_;

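    # Read the existing transcript; a missing file simply yields no lines.
    open(TRANSCRIPT, "< $LOG_FILE");
    @Lines = <TRANSCRIPT>;
    close(TRANSCRIPT);

    # Rewrite the file with the new message first, then the older ones.
    open(TRANSCRIPT, "> $LOG_FILE") || &fail("Cannot write Transcript: $!");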
    print TRANSCRIPT "\n",
                     "On ", &today, 
                     " <A HREF=\"mailto:$theEmail\">$theUser</A>",
                     " said:<BR>\n",
                     $theData,
                     "<P>\n";
    print TRANSCRIPT @Lines;
    close(TRANSCRIPT);
}


sub today {

    local(@theDate) = localtime(time);

    # localtime returns the month as 0-11 and the year as years since 1900
    return sprintf("%d/%d/%d, %02d:%02d:%02d %s",
                   $theDate[4] + 1, $theDate[3], $theDate[5] + 1900,
                   $theDate[2]>12?$theDate[2]-12:$theDate[2]==0?12:$theDate[2],
                   $theDate[1], $theDate[0], 
                   $theDate[2]>=12?"pm":"am");
}
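Listing 13.2 leaves cleanMessage as a placeholder. Because PlainChat stores each message as HTML and prints the transcript verbatim, one reasonable filter is to neutralize any markup the user typed. The following is a sketch of one possible body, written in current Perl style; the 2000-character limit is an arbitrary figure chosen for illustration.

```perl
use strict;
use warnings;

# One possible body for cleanMessage: neutralize HTML markup and keep
# the poster's line breaks. The length limit is an arbitrary choice.
sub cleanMessage {
    my ($theData) = @_;
    return '' unless defined $theData;

    $theData = substr($theData, 0, 2000);   # cap the message length
    $theData =~ s/&/&amp;/g;                # escape HTML special characters
    $theData =~ s/</&lt;/g;
    $theData =~ s/>/&gt;/g;
    $theData =~ s/\r\n?/\n/g;               # normalize browser line endings
    $theData =~ s/^\s+|\s+$//g;             # trim surrounding whitespace
    $theData =~ s/\n/<BR>\n/g;              # show line breaks in the transcript
    return $theData;
}
```

A message consisting only of whitespace comes back as an empty string, so the script's existing "if ($theMessage)" test skips it.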

The chat cycle is the chain of events from the moment the user submits a new message. The CGI script receives several variables of input from the HTML form in Listing 13.2: the body of the message, the hidden types that store the user's name and other data (like paths/URLs to graphic icons). The state of the chat itself is also a variable. If the CGI receives a message body, it knows that the preceding page was generated by itself. If the only data received by the CGI script is a user's name or personal graphic icon, then the CGI knows the preceding page was a login/setup page. The knowledge of where the data came from helps the CGI script determine what output to generate to continue the cycle, regardless of the state. The CGI script is written to preserve the information of the Web chat participants.
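The decision described above can be sketched as a small function that looks only at which fields arrived. The field names match Listing 13.2, but requestOrigin and its return labels are invented for this example, and the sketch assumes the form parser keeps fields whose value is empty.

```perl
use strict;
use warnings;

# Work out which page a submission came from by looking at the fields
# present. "inChat" is the hidden flag written into the transcript
# page's form; the return values are labels made up for this sketch.
sub requestOrigin {
    my (%form) = @_;

    # Only the transcript page's form carries the flag or a message body.
    return 'transcript'
        if (exists $form{'inChat'}) || (exists $form{'theMessage'});

    # A name with nothing else means the user just left the login page.
    return 'login' if exists $form{'theUsername'};

    # No form data at all: someone requested the script's URL directly.
    return 'unknown';
}
```

In PlainChat both the 'login' and 'transcript' cases end with the transcript page being displayed; the distinction matters once the script needs to do something extra on entry, such as announcing a new participant.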

Layout of the Transcript Page

The layout of the Transcript page is usually a result of what kind of Web chat system is implemented (see Fig. 13.9). At a minimum, the transcript page has an area for the user to type comments, some buttons for triggering that message to be sent, and a body of text that makes up the transcript.

Figure 13.9: The PlainChat transcript page contains several pieces of data per message: the date and time of the message, the name of the person who posted the message, and the message text itself.

Invariably, the body of the transcript page will scroll down a few pages. PlainChat displays the input text area first, then buttons for sending messages or refreshing the transcript page, followed by the body of the transcript log.

PlainChat uses this format so that the user can see the last few messages and the text area to comment in one screen. An important layout consideration is how easy it is to use. By making sure that the most recent messages are on the same "visible" window, the user doesn't have to go through a lot of trouble following the thread. Even though the transcript might carry on for pages (with respect to the size of the browser window), the user will be assured that all the current messages are visible without having to go find them.

If there are any exit-links on the transcript page, they should be near the other buttons that control submitting a message. Exit-links are links that get the user out of the Web chat environment.

The transcript page is integral to the CGI script. The CGI script creates the transcript page, so all the formatting and structure of the transcript page is embedded in the CGI script.

Passing Data

The HTML form of the transcript page has several parts to it. The most visible usage of HTML forms on the transcript page is the text box and buttons for accepting new messages:

<textarea  rows=10  cols=40 name="theMessage">
</textarea>
<input type="submit" value="Transmit">
<input type="submit" value="Refresh" name="refresh">

Nowhere on the transcript page do you ask the user to re-identify himself. Use a special input type in the form to "pass" that information:

<input type="hidden" name="theUsername" value="$Form{'theUsername'}">
<input type="hidden" name="theGender" value="$Form{'theGender'}">

The transition from page to page on a Web site raises an interesting characteristic of the HTTP protocol. The only relationship between pages is the data in the URL and the data hidden or passed from HTML forms or links. For example, while implementing the PlainChat environment, it's necessary to "remember" the user's name. The method used to remember this information for the simple chat is to store it as a hidden type in the form on the transcript page.
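The following sketch shows one way to write such hidden fields from a hash of name/value pairs. The helper name hiddenFields is an assumption for this example. The values are escaped because they originate with the user, and a stray quote would otherwise cut the VALUE attribute short.

```perl
use strict;
use warnings;

# Turn a hash of name/value pairs into hidden form fields, escaping
# the values so a quote or angle bracket cannot break the tag.
sub hiddenFields {
    my (%data) = @_;
    my $html = '';
    foreach my $name (sort keys %data) {
        my $value = $data{$name};
        $value =~ s/&/&amp;/g;
        $value =~ s/"/&quot;/g;
        $value =~ s/</&lt;/g;
        $value =~ s/>/&gt;/g;
        $html .= qq{<INPUT TYPE="HIDDEN" NAME="$name" VALUE="$value">\n};
    }
    return $html;
}

print hiddenFields('theUsername' => 'Pat "Patty" Smith',
                   'theEmail'    => 'pat@some.place.com');
```

The browser decodes the escaped value before sending it back, so the script receives the name exactly as the user first typed it.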

In the preceding source listing, we have cut out all the "extras" and focused on the required data for the Web chat to function. The login form passes the user's name:

<input type="text" name="theUsername">

Once the CGI script is generating the transcript page, two other variables are being set by the user. One directly set by the user is the message itself:

<textarea name="theMessage"></textarea>

The second variable set by the user (indirectly) is a flag variable to indicate you have been "through the transcript page sequence":

<input type="hidden" name="inChat">

The term indirect means only that the user did not type the value: because the field is part of the transcript page's form, it is sent automatically whenever the user submits that form.

The Transcript Log File

The transcript log of the PlainChat environment is the only place where data about the transcript is stored. The CGI script has no memory of what has happened in the chat. The Web server does not maintain information about who said what in the chat transcript. This kind of chat environment is called "plain" because it uses a raw text file to store the messages and also because the mechanics of the PlainChat application are rudimentary.

The transcript log of PlainChat is stored in a plain text file. A user can read the same transcript file with an editor outside the scope of the Web server and be able to follow the thread of the dialog. As a Webmaster, it might be necessary to clean up foul language or trim the size of the log file. By default, PlainChat uses no special tricks to save the transcript of the chat environment; it just writes it all to a plain file.
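Trimming the log can be scripted as well as done by hand. The sketch below keeps only the newest messages of a transcript held in a string. It assumes each entry ends with "<P>" followed by a newline, as written by appendMessage in Listing 13.2, and that no message body contains that sequence itself; trimTranscript is a hypothetical helper.

```perl
use strict;
use warnings;

# Keep only the newest $keep messages of a transcript held in a string.
# Assumes each entry ends with "<P>\n" and that the newest entry comes
# first, as in the file PlainChat writes.
sub trimTranscript {
    my ($transcript, $keep) = @_;

    # Split the text into entries, each one ending with "<P>\n".
    my @entries = $transcript =~ /(.*?<P>\n)/sg;

    # Drop everything after the first $keep entries.
    splice(@entries, $keep) if @entries > $keep;

    return join('', @entries);
}
```

A maintenance script would read the log file into a string, pass it through trimTranscript, and write the result back, preferably while the chat is idle, since rewriting the file is subject to the same conflicts as posting a message.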

Using a plain text file causes a few problems though. One problem is that the file is updated by the CGI script, which makes its open, read, and write calls through the file system. The Web is an asynchronous system: users come to sites at random times. For the PlainChat application, the chance of failure due to race conditions is greater because two instances of the CGI script manipulating the file at once can clobber each other's changes.

The tradeoff for using a plain text file is that it can be read, edited, and managed by anyone. The limitation of using a plain text file with no special formatting is that features of the operating system and programming language that implement the PlainChat application are not utilized to streamline the process of updating the transcript log file. However, we are illustrating the concept of Web chat with PlainChat so that other versions can be better understood.

In a real sense, the PlainChat application will fail to operate well if the usage of the chat environment gets too busy. The function in the CGI script that reads the transcript log is similar to the function that appends a new message to the transcript log. The weak links are the functions reading the transcript log file. The race condition mentioned earlier occurs when one instance of the CGI script attempts to write back a new transcript log file that is inconsistent with another instance of the CGI script adding a new message.

Limitations of PlainChat

PlainChat is limited because it cannot handle very many simultaneous postings. The action of posting new messages is asynchronous; they are happening randomly all the time. Using the file system as both a means to store the messages and to facilitate the storage of messages introduces file access conflicts.

File-Access Conflicts

If two users are using the Web chat, one user might post a message at the "same" time as another user. When the first message is posted, the CGI script reads the current transcript log into memory and appends the new message to that temporary copy. The result is then written back to the file when the CGI script is finished. The "second" user causes the same chain of events to happen, too. The problem arises when the second instance reads the transcript before the first instance has written its copy back.

Consider the "instance" of the first user submitting a new message, which causes the CGI to read in what it thinks is the current transcript log. Before this instance is finished running (before it flushes the transcript log back to disk with a new message), a second instance is started. The second user causes the same transcript to be read in. The second instance of the CGI script thinks it too has a current copy of the transcript log. Therefore, the second instance appends a new message and then rewrites it all back to file.

What if the first instance doesn't finish before the second? This is possible in a true multitasking operating system. If the order of instances changes like this, then the true version of the transcript log is corrupted. One (or more) of the new messages posted might not be appended.
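The lost update described above can be reproduced deterministically, without a Web server. The following is a minimal sketch and not part of PlainChat: the helper names read_log and write_log and the temporary file are assumptions made for illustration. Two "instances" each read the transcript before either one writes it back, so the second write silently discards the first new message.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helpers that mimic a read-everything,
# write-everything handling of the transcript log.
sub read_log {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my @lines = <$in>;
    close($in);
    return @lines;
}

sub write_log {
    my ($path, @lines) = @_;
    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print $out @lines;
    close($out);
}

# A throwaway file stands in for the transcript log.
my ($fh, $log) = tempfile(UNLINK => 1);
close($fh);
write_log($log, "A: hello\n");

# Both instances read before either one writes.
my @copy_one = read_log($log);
my @copy_two = read_log($log);

# Each appends its message to its own private copy.
push @copy_one, "B: first reply\n";
push @copy_two, "C: second reply\n";

# Whichever instance writes last wins.
write_log($log, @copy_one);
write_log($log, @copy_two);

my @final = read_log($log);
print @final;    # "B: first reply" is no longer in the transcript
```

Swapping the order of the two write_log calls loses the other message instead, which is why the outcome depends on how the operating system happens to schedule the instances.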

Even if the operating system can be used to help control this, there still might be problems. For example, if the system uses file locking, then the running instances of the CGI script are queued up for sequential access to the transcript file. This helps alleviate version mismatch, but at what cost? The users in the Web chat will experience unusually high lag time between postings while every user's new message is "accepted and recorded" in order.

Although these two examples show extreme cases of problems with PlainChat, in reality they don't come into play unless the system is heavily loaded. So, for the moderate- or low-volume site, PlainChat can still provide a robust chat experience. The point is that there are limitations to PlainChat.
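As a rough illustration of the file-locking approach, the sketch below appends a message while holding an exclusive lock obtained with Perl's built-in flock. It is not the code PlainChat uses; the function name append_locked and the temporary file are assumptions. Note that flock is advisory: it only protects the file if every script that touches the transcript asks for the lock too.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);
use File::Temp qw(tempfile);

# Append one message while holding an exclusive lock. Other instances
# that call flock on the same file wait until the lock is released.
sub append_locked {
    my ($path, $message) = @_;
    open(my $log, '>>', $path) or die "Cannot open $path: $!";
    flock($log, LOCK_EX)       or die "Cannot lock $path: $!";
    seek($log, 0, SEEK_END);   # another writer may have appended while we waited
    print $log $message;
    close($log) or die "Cannot close $path: $!";   # closing flushes and releases the lock
}

# A throwaway file stands in for the transcript log.
my ($fh, $path) = tempfile(UNLINK => 1);
close($fh);
append_locked($path, "A: hello\n");
append_locked($path, "B: hi there\n");

open(my $in, '<', $path) or die "Cannot open $path: $!";
my @lines = <$in>;
close($in);
print @lines;
```

The waiting that happens inside flock is exactly the queuing described above: correctness improves, but each instance may have to wait its turn.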

Time Conflicts

PlainChat shows the basic mechanics of a Web chat environment, but those mechanics are not entirely realistic. Aside from the system limitations, there are some usability problems with PlainChat as well. These other limitations have nothing to do with the file-access problem; they relate to what it is like to actually participate in a Web chat environment.

Consider the scenario in which users A and B are both "in the chat." They are reading the current transcript and commenting on the messages they see. How does this actually work?

Let's say user A comes online and notices there is nothing in the transcript log because the chat has just started. But, user A is timid and just wants to see what happens, so user A waits. User B comes along and likewise sees that the chat log is empty, but user B is not apprehensive about using Web chat so he posts a question, "Is anyone out there?"

As a result of user B posting a new message, the CGI script processes the new message and then redisplays a new transcript page with the text box, the buttons and the transcript log, containing only user B's new message. User A is still timid, but grows impatient and presses the Reload button in the chat to see what's going on.

User A must use the Reload button to see if anything has changed because the transcript page is treated as a static page. The CGI script that created the transcript page only uses plain vanilla HTML to build the "new instance" of the transcript page. It uses no server push, client pull, animation, and so on.

This need to press the Reload button to see what's going on is what makes the Web chat environment very different from more real-time environments like IRC and MUD. In IRC and MUD, when you do anything, the response is immediate: the same channel or room knows you said something as soon as you say it.

In the Web chat environment implemented with PlainChat, this is not the case. Users A and B are now in a busy-waiting mode. Both of them must reload the page to see whether anything worth commenting on has been said. It's tedious to constantly reload the page. It's also ironic: the more often users press the Reload button, the higher the load average on the server climbs. As more users join a PlainChat environment, they will probably start reloading the pages more often. More users reloading pages leads to an increased load on the machine, which pushes the system closer to that "red zone" where messages may get dropped because of the file-access problem.

Plus, in this era of widespread network access and the hubbub over real time and virtual reality, users will want more than PlainChat to justify being connected to the Internet. Web communication needs to be fast, and it has to simulate real conversation.

Speed Is Better Than Good Looks

PlainChat is only one version of a Web chat environment. It serves the purpose of creating an online forum for people to ask questions and talk. It does perform well for what it is. But, concerning the issue of speed, it probably will not be a system a high-volume site would use. To address the issue of truly simulating conversation, it fails to meet some basic needs. When you talk to a friend, you do not (we hope) continually tap your friend on the shoulder to make sure they are still there and they are able to talk. That is essentially what goes on with PlainChat when you constantly refresh the page.

However, these limitations can be overcome. There are other kinds of chat systems that utilize other techniques of preserving consistent and reliable access to the "transcript log." We will cover the use of Server Push and memory-based chat in the next chapters. These new tools give new dimension to the Web chat environment by making it more real-time.

Problems Solved by Web Chat Applications

Web chat is a popular medium for new users of the Internet to engage in conversations with other Web users. Web chat environments don't require special software. Users already have the tool they need to enjoy the Web chat environment: the Web browser. The Web chat environment solves the problem of "How can I provide a simple and almost real-time communication environment using the Web interface?"

Some sites are large enough so they can offer special "chat rooms," a directory of different chat environments all served by the same Web site. If a site is sufficiently equipped to handle the load, it's possible to set up several Web chat environments and allow users to mingle with a set of users, depending on the topics they are interested in.

On specialized sites, the Web chat can provide a premium service to its user base. For instance, a site might not offer a Web chat environment 24 hours a day, but it may on occasion (in conjunction with special marketing and announcements) give the users a chance to "talk to the pros" on just about any topic.

For example, during the Olympic games, a site covering the events might offer a chat environment for each major venue, allowing an expert in the field of gymnastics, running, weightlifting, and so on, to take questions from the users. The Web chat environment cannot realistically replace the event itself. It would be foolish to think that thousands of users would prefer to chat up a storm on the Web about an event instead of watching it or listening to it occur in real time. But a special event where the winners of the competition join the host of the chat would be a great feature for the Web chat.

The Web chat is also a solution for other kinds of events. For example, a call-in talk show like Larry King Live is a popular program on CNN. An application for a Web chat could be to offer the audience a wider view of the topic being discussed by having all the calls and comments posted on the Web chat as they happen. To make the environment even more interactive, there could be two separate chat environments running: one to just take questions and the other to relay the responses (and questions) by the guest. This would allow potential callers to see what people are trying to ask and avoid duplicating questions.

The point is that the Web chat environment solves the problem of needing an almost real-time environment for people to get a "running score" on any sort of event or group discussion.

It's a simple addition to a Web site that will attract attention. It is the unpredictability of what exactly will happen on the Web chat that makes Web chat environments so popular.

Modifying the Web Chat Application

Throughout the chapter we touched on areas of the Web chat application software that could easily be modified to meet any specific need of the Webmaster. In this section, we dissect the application software, showing the "entry" points where such modifications can take place.

The types of Web chat applications discussed so far deal with Web chat environments for low- and high-volume sites. Other types of Web chat applications will be introduced in the following chapters that deal with graphics, higher speed, and better performance in terms of handling more frequent additions to the transcript page.

High-volume Web chat systems are those that expect to handle thousands of users. It is impractical to expect a single topic Web chat environment to deal with more than that. In fact, the high-volume Web chat system is usually deployed to handle events, such as a celebrity who will be "taking questions from the audience" over the Internet. Requirements of a high-volume Web chat system involve both enhancements to the PlainChat application and additional features to deal with the massive questions and comments made by the users.

There are a couple of ways to deal with the massive quantity of messages generated via the high-volume Web chat system. The Web browser will only handle up to six or seven pages of information. Even after four pages, the chances are slim that a user can both keep up with the conversation and read all the material posted. Archiving the messages lets the user go back and read the messages more carefully, especially when the topic of the Web chat environment is a celebrity guest. The Web chat system should make provisions for the retrieval of all content posted.

Another way to manage the quantity of messages created in a high-volume Web chat is to split the messages on the basis of time. As in PlainChat, the first 10 or 15 messages can be displayed in about two or three screenfuls. After that, a hypertext link, as seen in Figure 13.10, can take the user to another page (generated by a CGI script) to read older messages. As more messages are posted, the older messages are no longer visible on the main transcript log page. A user can get to them by choosing to go back to earlier messages.

Figure 13.10: The BusyChat transcript page adds a new element to the transcript page generation routine. The addition is the new link(s) to take the user ahead or back in the bulk of messages posted to the "busy" chat environment.

Using BusyChat for High-Volume Sites

The main feature of BusyChat is its capability to handle hundreds of messages. It still operates under the same conditions as PlainChat in terms of file-access issues and the mechanics of conducting the chat itself. The enhancement is really a function of how organized the messages themselves can be.

BusyChat is a variant of PlainChat that works the same way, except that it's designed to handle a higher volume of messages. The main difference is the display routine for the transcript page. BusyChat adds (optionally) a link at the start of each message list for going "ahead" or "back" to other messages posted to the BusyChat environment so that the user isn't flooded with a single page of all messages. The HTML file for the login page to the BusyChat application is shown in Listing 13.3.

Listing 13.3 busy-login.html: HTML File for the BusyChat Application

<HTML>
<TITLE>Login Page for BusyChat</TITLE>
<BODY BGCOLOR=FFFFFF>
<H1>Login Page for BusyChat</H1>

<P>
Please enter your name below:<BR>
(Example: Pat Smith)<BR>
<FORM METHOD=POST ACTION="/cgi-bin/BusyChat.cgi">

<INPUT NAME="theUsername"><BR>

Please enter your E-mail address below:<BR>
(Example: pat@some.place.com)<BR>
<INPUT NAME="theEmail">
<P>

How many messages can you handle at once on the screen?<BR>
<INPUT NAME="theFLine" TYPE="radio" VALUE=5> 5
<INPUT NAME="theFLine" TYPE="radio" VALUE=10 CHECKED> 10
<INPUT NAME="theFLine" TYPE="radio" VALUE=15> 15
<INPUT NAME="theFLine" TYPE="radio" VALUE=25> 25
<p>

<INPUT TYPE=SUBMIT VALUE="Enter BusyChat">
</FORM>

</BODY>
</HTML>

The CGI script (the application) for BusyChat is shown in Listing 13.4.

Listing 13.4 busychat.cgi: CGI Script for BusyChat Application

#!/usr/local/bin/perl

@INC = ('../lib', @INC);
require 'web.pl';

%Form = &getStdin;

&beginHTML;

$LOG_FILE = "$ServerRoot/data/busychat.dat";
$Me       = "/cgi-bin/BusyChat.cgi";

$theUsername = $Form{'theUsername'};
$theEmail    = $Form{'theEmail'};
$theFLine    = $Form{'theFLine'};
$theMessage  = &cleanMessage($Form{'theMessage'});


&appendMessage($theUsername, $theEmail, $theMessage) if ($theMessage);
   

&buildForm('theUsername', $theUsername,
           'theEmail',    $theEmail,
           'theFLine',    $theFLine);

&displayTranscript($Form{'sLine'}, $theFLine);

exit;

sub cleanMessage {

   local($theData) = $_[0];

   # any filtering, or censoring code goes here
  
   return $theData;
}

sub buildForm {

  local(%hiddenData) = @_;

  print "<body bgcolor=ffffff>\n",
        "<FORM METHOD=\"POST\" ACTION=\"$Me\">\n",
        "<h1>Chapter 13, BusyChat</h1>\n",
        "Enter your comments below:<BR>\n",
        "<TEXTAREA ROWS=5 COLS=60 NAME=\"theMessage\">\n",
        "</TEXTAREA>\n",
        "<P>\n",
        "<INPUT TYPE=\"SUBMIT\" VALUE=\"Transmit\"><BR>\n",
        "<INPUT TYPE=\"SUBMIT\" VALUE=\"Refresh\"><BR>\n",
        "<p>\n";

  foreach $dataItem (keys %hiddenData) {
     print "<INPUT TYPE=\"HIDDEN\" NAME=\"$dataItem\" ",
           "VALUE=\"$hiddenData{$dataItem}\">\n";
  }
  print "<INPUT NAME=\"inChat\" TYPE=\"HIDDEN\">\n",
        "</FORM>\n",
        "<P>\n",
        "<A HREF=\"/index.html\">Back Home</A><P>\n";
}
   


sub displayTranscript {

    local($sLine, $theFLine) = @_;
    local(@Lines, $all);

    print "Current Transcript:<BR>\n";

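    open(TRANSCRIPT, "< $LOG_FILE") || &fail("Cannot open Transcript: $!");
    @Lines = <TRANSCRIPT>;
    close(TRANSCRIPT);
    # keep the newlines so words at the ends of lines do not run together
    $all = join('',@Lines);

    # each message starts with a SPLIT marker, so element 0 is the
    # (empty) text before the first marker and messages start at 1
    @Lines = split(/\<\!\-\- SPLIT \-\-\>/, $all);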

    if ($#Lines>=$sLine+$theFLine) {
      $end = $sLine+$theFLine;
    }
    else 
    {
      $end = $#Lines;
    }

    if ($sLine >=$theFLine) {
       print "<a href=\"$Me?sLine=",
             $sLine-$theFLine, "&theFLine=",$theFLine,
             "\">Previous $theFLine</a><br>\n";
    } 

    if ($#Lines > $sLine+$theFLine) {
       $nextLeft = $#Lines >= $sLine+$theFLine+$theFLine
                   ? $theFLine : $#Lines-$sLine-$theFLine;
       print "<a href=\"$Me?sLine=",
           $sLine+$theFLine,"&theFLine=$theFLine\">Next $nextLeft</a><br>\n";
    }

    print "<P>\n";

    if ($#Lines >= $theFLine ) {
      for($i=$sLine+1; $i<=$end; $i++) {
         print $Lines[$i],"\n";
      }
    }
    else
    {
       print @Lines,"\n";
    } 
}

sub appendMessage {

    local($theUser, $theEmail, $theData) = @_;

    open(TRANSCRIPT, ">> $LOG_FILE") || &fail("Cannot write new message: $!");
    print TRANSCRIPT "<!-- SPLIT -->\n",
                     "On ", &today, 
                     " <a href=\"mailto:$theEmail\">",
                     "$theUser</a> said:<BR>\n",
                     $theData,
                     "<P>\n";
    close(TRANSCRIPT);
}


sub today {

    local(@theDate) = localtime(time);

    # localtime returns the month as 0-11 and the year as years since 1900
    return sprintf("%d/%d/%d, %02d:%02d:%02d %s",
                   $theDate[4]+1, $theDate[3], $theDate[5]+1900,
                   $theDate[2]>12?$theDate[2]-12:$theDate[2]==0?12:$theDate[2],
                   $theDate[1], $theDate[0], 
                   $theDate[2]>=12?"pm":"am");
}

The cycle of page generation for the high-volume Web chat environment is very similar to the PlainChat (low-volume) environment. The process begins the same with a "login" page, followed by a cycle of pages generated by CGI scripts supporting the high-volume Web chat environment.

The specific functions of the BusyChat CGI script that open the transcript log for reading and writing are the same for this example. But for a real application, we'll need to revisit this topic later to learn how to make it more efficient. The aspect of the high-volume Web chat environment that differs the most from PlainChat is the format of the transcript log. Also, there are some management issues that need to be addressed about the maintenance of the transcript log for the high-volume Web chat environment.

BusyChat has a Login page that is similar to the PlainChat application, except that it asks the user how many messages per page the user wants to have. This data is passed and preserved by the application as the user continues in the transcript page cycle states. Figure 13.11 shows the login page for the BusyChat application.

Figure 13.11: The BusyChat login page adds the choice of how many messages per page the user is comfortable with.

BusyChat offers users the choice of how many messages they want to view at once on a page. The login page asks the user to specify the number of messages he can handle. That value is used to split up the transcript page into sections. Each section looks the same in general; they all have a text area for adding new messages and a list of messages. But the difference between the transcript page for PlainChat and BusyChat is that BusyChat will add links just before the message list is generated. If BusyChat notices that there are more total messages than the user said they can handle, BusyChat will produce a link to a new transcript page with the remainder of the messages listed. This is repeated until the number of messages to display is less than or equal to the number of messages the user can handle at once. If the user is in the "middle" of scrolling through the list, BusyChat also puts a link to go "back" x number of messages, where x is the number of messages a user can handle.

The way BusyChat works is that in addition to passing the username and e-mail address from page to page, it also passes two other pieces of information. First, it passes the message pointer (sLine) indicating where to start displaying messages. Second, it passes the number of messages per page (theFLine) that the user chose on the login page. In Listing 13.4 the pointer counts the messages to skip: if there are 12 messages total and the user specified he can only handle five at a time, a pointer equal to 5 means that the display shows messages 6, 7, 8, 9, and 10, with links back to the previous five and ahead to the next two messages.
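The example above (12 messages, five per page, messages 6 through 10 on screen) can be checked with a short calculation. This is only a sketch: page_window is a hypothetical name, and it takes the number of messages already skipped (5 in the example), which is how displayTranscript in Listing 13.4 appears to treat sLine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given the total number of messages, the page size, and the number of
# messages to skip, return the first and last message shown and the
# counts to use in the "Previous" and "Next" links (0 means no link).
sub page_window {
    my ($total, $per_page, $skip) = @_;
    my $first = $skip + 1;
    my $last  = $skip + $per_page;
    $last = $total if $last > $total;

    my $previous  = $skip >= $per_page ? $per_page : 0;
    my $remaining = $total - $last;
    my $next      = $remaining > $per_page ? $per_page : $remaining;

    return ($first, $last, $previous, $next);
}

my ($first, $last, $previous, $next) = page_window(12, 5, 5);
print "Showing messages $first to $last; previous $previous, next $next\n";
```

Calling page_window(12, 5, 0) describes the first page (messages 1 to 5, no previous link), and page_window(12, 5, 10) describes the last page (messages 11 and 12, no next link).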

With BusyChat, the user can set a value of 5, 10, 15, or 25. If those values are not good for your application, you can replace the radio buttons with an input box and let the user pick the exact value. Buttons are preferred because they are easy to change (just click), and they don't allow the user to mistakenly choose a value that doesn't work with the application (like -1).

BusyChat doesn't improve the file-access situation, but it allows the chat environment to grow to many messages without too much lag. Because it doesn't display any more than a handful of messages, the server isn't tied up doing major data downloads to the user's browser. Plus, users don't have to wade through a sea of messages to find what they want. They can zip through the list a few at a time with the use of the "next" and "previous" links.
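If you do switch to a free-form input box, the script has to reject values that don't work. The following sketch shows one way this might be done; clean_page_size is a hypothetical helper, and the default of 10 and the ceiling of 25 are assumptions borrowed from the radio buttons in Listing 13.3.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Accept only small whole numbers; fall back to a default for anything
# else (empty, negative, or non-numeric) and cap the value at a maximum.
sub clean_page_size {
    my ($raw, $default, $max) = @_;
    return $default unless defined $raw && $raw =~ /^\d{1,3}$/;
    return $default if $raw < 1;
    return $raw > $max ? $max : $raw + 0;
}

foreach my $raw ('10', '-1', 'lots', '500', '') {
    printf "%-6s => %d\n", "'$raw'", clean_page_size($raw, 10, 25);
}
```

In BusyChat, a call such as clean_page_size($Form{'theFLine'}, 10, 25) could replace the direct assignment to $theFLine.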

Using Graphic Images to Identify Users

The login page for any Web chat environment is used to ask for some identification (a name), or to allow the user to select an icon to represent himself during the Web chat session. Within the PlainChat environment, it is not as important to control the kinds of "identification" available to the user. The PlainChat environment is not meant to be a high-volume Web chat. If your version of PlainChat allows users to represent themselves with a name and an unusually large graphic icon, that isn't necessarily desirable, but it won't crash the PlainChat system. A high-volume Web chat environment, however, will not run efficiently if the graphic images associated with each user are unusually large or not uniformly shaped.

If the Web chat environment is high-volume, then it is a good idea to use graphics that are of uniform size and shape. The user is going to be sifting through a lot of pages to find messages of interest and the distraction of irregular and large graphic icons may deter the user from taking full advantage of the Web chat environment.

If the Web chat environment is low-volume, then it can be entertaining for all those participating to enjoy a variety of graphic styles and shapes. If you allow your Web chat users to reference just about any kind of graphic, be aware of the possibility that some users may link inappropriate material to the transcript page. It is probably worth the effort to set up an HTML form allowing users to submit their graphic to the site so it can be approved for size and shape; they can then later refer to it with an HTML list box of some kind. This guarantees that the icons selected meet your guidelines for what works well with the chat environment, and it also helps the transcript page look better. If the graphics are all located on reliable servers (the Web chat server, for example), the error icon for missing icons won't appear in the browser.

If you want a login page that allows users to select from a list of graphics to be associated with their identity in the Web chat environment, this can be done with SSI (Server Side Includes). As the login page is loaded, an SSI in the page can look up all the files in some preset directory and construct an HTML list box. The user can pick which graphic they want from the list box.

Doing this, the Webmaster doesn't have to edit the HTML login page; he only needs to maintain the icons located in the directory searched by the SSI.
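The script that the SSI runs is not shown here, so the following is only a sketch of what such a script might do. The function name icon_list_box, the /icons URL prefix, and the sample file names are all assumptions; the demonstration uses a throwaway directory in place of the real icon directory, and it assumes the file names contain no characters that need HTML escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build an HTML list box with one option per .gif file in $dir.
# $url_prefix is the URL path under which the same files are served.
sub icon_list_box {
    my ($dir, $url_prefix, $field_name) = @_;
    opendir(my $dh, $dir) or die "Cannot read $dir: $!";
    my @icons = sort grep { /\.gif$/i } readdir($dh);
    closedir($dh);

    my $html = "<SELECT NAME=\"$field_name\">\n";
    foreach my $icon (@icons) {
        (my $label = $icon) =~ s/\.gif$//i;    # show the name without ".gif"
        $html .= "<OPTION VALUE=\"$url_prefix/$icon\">$label\n";
    }
    return $html . "</SELECT>\n";
}

# A temporary directory stands in for the preset icon directory.
my $dir = tempdir(CLEANUP => 1);
foreach my $name ('cat.gif', 'robot.gif', 'notes.txt') {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

my $html = icon_list_box($dir, '/icons', 'imageName');
print $html;
```

Because the list is rebuilt every time the page is served, adding or removing a .gif file in the directory changes the list box without any edit to the HTML.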

Listing 13.5 is a sample HTML login page that uses SSI to allow the user to select an icon.

Listing 13.5 ssi.html: Sample Login Page Using SSI

<HTML>
<TITLE>SSI Graphic List Example</TITLE>
<BODY BGCOLOR=FFFFFF>

<H1>SSI Graphic List Example</H1>

Source:<BR>
&lt;!--#exec cmd="/t2/home/jdw/bookweb/bin/graphics.pl" --&gt;

<P>

Effect:<BR>
<FORM ACTION="/cgi-bin/ssi-load.cgi">
<!--#exec cmd="/t2/home/jdw/bookweb/bin/graphics.pl imageName" -->
<p>
<INPUT TYPE="submit" VALUE="Load it">
</FORM>

</BODY>
</HTML>

Listing 13.6 is the Perl CGI script that receives the form in Listing 13.5 and displays the image the user selected.

Listing 13.6 ssi-load.cgi: Perl CGI Script That Displays the Selected Image

#!/usr/local/bin/perl

@INC = ('../lib', @INC);
require 'web.pl';

%Form = &getStdin;

&beginHTML;

$Label = $Form{'imageName'};
$Label =~ s/.*\/(.*)\.gif/$1/;

print <<"done";
<HTML>
<TITLE>Your Image is $Label</TITLE>
<body bgcolor=ffffff>
<H1>Your Image...</H1>


<img src="$Form{'imageName'}">
<p>

<a href="http://$ThisHost:$ThisPort/ch13/ssi.html">Back</a> </BODY>
</HTML>

done

It is important to check that the Web server can support SSI. Chapter 4, "Designing Faster Sites," goes into detail on how to do that.

The high-volume Web chat environment usually is associated with an event. Because of this, you can expect that the topic discussed will be narrower than a generic Web chat environment.

The event can be centered around a celebrity, a sports event, or other attraction. Users hoping to participate in the high-volume Web chat environment will probably have some sort of affiliation to the event. They are probably fans, colleagues, or people with an interest in live Web interaction.

For example, if the high-volume Web chat environment is centered around an Olympic event like the women's marathon, then each user who wishes to make a comment or ask a question about the event should be able to select a small graphic icon resembling their respective country's flag. The login page is the best place to offer the choice of using a flag, and picking which country's flag to associate with this user:

<input name="theUsername">
<p>
<select name="theFlag">
<option value="none">Choose a flag
<option value="canada">Canada
<option value="france">France
<option value="mexico">Mexico
</select>

The HTML form variable "theFlag" is a piece of data that must be passed from page to page, just as "theUsername" is.

The PlainChat application accepts the username from the login page so it can display the name with each message the user submits. The option list with the flag names above creates a new kind of environment that is less concerned with individual identity and more concerned with an affiliation to a particular group, like nationality.
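One way the flag choice might be turned into a small, uniformly sized icon next to each message is sketched below. The %FLAGS table, the /flags image paths, the 32 by 20 pixel size, and the function name flag_icon are all assumptions for illustration; only the option values (canada, france, mexico, none) come from the form above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table of approved flags; the keys match the OPTION values.
my %FLAGS = (
    canada => '/flags/canada.gif',
    france => '/flags/france.gif',
    mexico => '/flags/mexico.gif',
);

# Return an IMG tag of uniform size for a known flag, or an empty
# string for "none" or any value that is not in the table.
sub flag_icon {
    my ($flag) = @_;
    return '' unless defined $flag && exists $FLAGS{$flag};
    return "<IMG SRC=\"$FLAGS{$flag}\" WIDTH=32 HEIGHT=20 ALT=\"$flag\"> ";
}

print flag_icon('france'), "Pat Smith said:<BR>\n";
print flag_icon('none'),   "Chris Jones said:<BR>\n";
```

Looking the value up in a fixed table, instead of building the image path from whatever the form submits, keeps the icons to the approved set even if someone edits the form data by hand.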

The next chapter is about how server push improves the basic PlainChat application. The notable improvement
