Chapter 14

Perl and Tracking



There has been little mention so far in this book about the use of logs and other methods of tracking. Once you have a complete Web site, it is necessary to find out how users travel through it. One of the ways to do this is to track usage with logs. To do this accurately, you can place a Perl script at the top of the Web page to be tracked. To demonstrate this use of Perl, in this chapter a tracking element is added to the Goo Goo Records Web site.

Logging

There are all kinds of logs, or lists of actions, that happen inside a computer. Logs tend to be divided up based on their purpose, such as a system log to record actions-called events-done by the NT system, or an application log that records events caused by applications. These logs can be used to keep track of anything that happens that might be of interest to a Web Master or Network Administrator. For example, every time the computer is asked to start up an application, a note of that event is made in the application log. This log, like most others, can be viewed using Event Viewer.

Within a Web site, logs can be used to monitor who is visiting your site, and even whether visitors are trying to go into places they're not supposed to.

One of the early problems with tracking and logging on the Web was unrealistically high hit counts for Web sites. These inflated numbers were, and still are, caused by simplistic uses of counters to record hits on a particular page. It is quite common for a Web page to contain two or three hypertext links, an image link, and a next-page link. If the user accesses all of these links, a hit count of four or five may be recorded for what was really a single visit, giving a skewed picture of site usage. Note that this only happens if the links on the page are used; their presence alone will not skew the hit count.

There are several solutions available to avoid this problem. One is to add a short Perl script to the top of the larger Perl script that delivers the Web pages that are being monitored for user traffic. There are two ways to do this:

  1. Have a form call a CGI script that loads the page and records the hit to that page.
  2. Have one of the links to the HTML documents call a URL, like this one from the Goo Goo Records site-http://www.googoo.com/cgi-bin/page.pl?next.html.

This second method will call a Perl script, page.pl, which will read the query string information for the HTML document, in this case, next.html. The script will then deliver that HTML document.

The second method is a little more flexible because you only need one Perl script to deliver any page: the page to deliver changes with the query string information. One drawback is that all of your links will be to a Perl script, making the response time longer. Also, you would have to do all of your logging from the Perl script, because the Web server log would only record that every user called the script x number of times, without recording what the destination was. This may be desirable, though, because this method allows you to make the Web site's log files as minimal or as detailed as you like. This method is explored in detail later in this chapter, as it is the same method used by the Goo Goo Records Web Master on their site. This is the script that performs the logging task:

#!/usr/bin/perl
###################################################
#
# This is the page delivery script.
#
# This script takes the query string information as the filename and
# delivers the file to the browser.  A link to deliver the page new.html
# would look like this:
#
# <A HREF="http://www.googoo.com/cgi-bin/page.pl?new.html">new</A>
#
# Path information is also valid, and necessary to get lower in the
# directory structure:
#
# <A HREF="http://www.googoo.com/cgi-bin/page.pl?/newstuff/new/new.html">new</A>
#
# This will allow more flexible logging of any page that is delivered with
# this script.  With a little work, you can even get this script to process
# server side includes, counters, and all that jazz.
#
# The trouble here is that the server logs will now only show the user
# hitting page.pl, no matter which page they request.  This is fine if you
# are creating your own logs, but can be frustrating if you are not.  This
# script generates a log similar to the one generated by the EMWAC server.
#####################################################
if ($ENV{'REQUEST_METHOD'} eq 'GET') {
    $file = $ENV{'QUERY_STRING'};
    # Decode any URL-encoded characters (%xx) in the query string.
    $file =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    print "Content-type: text/html\n\n";
    $file = "c:\\googoo\\$file";
    if (-e $file) {
        # Append an entry, in the EMWAC log format, to our own log file.
        open(LOG, ">>c:\\logs\\access");
        $t = localtime;
        print LOG "$t $ENV{'SERVER_NAME'} $ENV{'REMOTE_HOST'} ",
                  "$ENV{'REQUEST_METHOD'} $file $ENV{'SERVER_PROTOCOL'}\n";
        close(LOG);
        # Now deliver the requested document.
        open(HTML, $file);
        while ($line = <HTML>) {
            print $line;
        }
        close(HTML);
    }
    else {
        print <<EOF;
<HTML>
<HEAD>
<TITLE>Error! File not found</TITLE>
</HEAD>
<BODY>
<H1>Error! File not found</H1>
<HR><P>
The file you requested was not found.  Please contact <address><A
HREF="mailto:webmaster\@googoo.com">webmaster\@googoo.com</A></address>
</BODY>
</HTML>
EOF
    }
}
else {
    print "<HTML>\n";
    print "<title>Error - Script Error</title>\n";
    print "<h1>Error: Script Error</h1>\n";
    print "<P><hr><P>\n";
    print "There was an error with the Server Script. Please\n";
    print "contact Goo Goo Records at <address><a ";
    print "href=\"mailto:support\@googoo.com\">support\@googoo.com</a></address>\n";
    print "</HTML>\n";
    exit;
}

Another method of tracking is to read information from a log file, and to create your tracking data from it.

The Log File

The file that contains the important information about the Goo Goo Records site is known as the log file. Since they are using the EMWAC HTTP service with their Web site, a log file is created each day and kept in the log file directory. The directory path for the log file directory on the Goo Goo Records server is C:\WINNT35\system32\LogFiles. Each log file is given a file name relating to the date it was created, following the general format of HSyymmdd.LOG. For example, a log file created for July 6, 1996 would have the log filename HS960706.LOG. An example of a log file's contents would resemble this excerpt of a listing from the log file HS960509, from a server in Finland:

Thu May 09 20:09:17 1996 wait.pspt.fi 194.100.26.175 GET /ACEINDEX.HTM HTTP/1.0
Thu May 09 20:09:18 1996 wait.pspt.fi 194.100.26.175 GET /gif/AMKVLOGO.GIF HTTP/1.0
Thu May 09 20:09:19 1996 wait.pspt.fi 194.100.26.175 GET /gif/RNBW.GIF HTTP/1.0
Thu May 09 20:09:19 1996 wait.pspt.fi 194.100.26.175 GET /gif/RNBWBAR.GIF HTTP/1.0
Thu May 09 22:35:09 1996 wait.pspt.fi 194.215.82.227 GET /gif/WLOGO.GIF HTTP/1.0
Thu May 09 22:35:11 1996 wait.pspt.fi 194.215.82.227 GET /gif/BLUEBUL.GIF HTTP/1.0
Thu May 09 22:35:11 1996 wait.pspt.fi 194.215.82.227 GET /cgi-bin/counter.exe?-smittari+-w5+./DEFAULT.HTM HTTP/1.0
Thu May 09 22:35:13 1996 wait.pspt.fi 194.215.82.227 GET /gif/EHI.JPG HTTP/1.0
Thu May 09 22:35:17 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI1.gif HTTP/1.0
Thu May 09 22:35:17 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI2.gif HTTP/1.0
Thu May 09 22:35:19 1996 wait.pspt.fi 194.215.82.227 GET /AVIVF.HTM HTTP/1.0
Thu May 09 22:35:23 1996 wait.pspt.fi 194.215.82.227 GET /gif/virtlogo.gif HTTP/1.0
Thu May 09 22:35:23 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI1.gif HTTP/1.0
Thu May 09 22:35:29 1996 wait.pspt.fi 194.215.82.227 GET /gif/KOULU.GIF HTTP/1.0
Thu May 09 22:35:32 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI2.gif HTTP/1.0
Thu May 09 22:35:45 1996 wait.pspt.fi 194.215.82.227 GET /gif/VF21.GIF HTTP/1.0
Thu May 09 22:36:02 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI3.gif HTTP/1.0
Thu May 09 22:36:14 1996 wait.pspt.fi 194.215.82.227 GET /gif/LETTER.GIF HTTP/1.0
Thu May 09 22:37:46 1996 wait.pspt.fi 194.215.82.227 GET /AVIONGEL.HTM HTTP/1.0
Thu May 09 22:37:52 1996 wait.pspt.fi 194.215.82.227 GET /gif/PIRUNLG.GIF HTTP/1.0
Thu May 09 22:44:43 1996 wait.pspt.fi 194.215.82.227 GET /AVIPELI1.HTM HTTP/1.0
Thu May 09 22:44:45 1996 wait.pspt.fi 194.215.82.227 GET /gif/STRESSLG.GIF HTTP/1.0
Fri May 10 04:29:29 1996 wait.pspt.fi 192.83.26.48 GET /gif/NAPPI3.gif HTTP/1.0
Fri May 10 04:29:30 1996 wait.pspt.fi 192.83.26.48 GET /gif/LETTER.GIF HTTP/1.0
Fri May 10 04:29:31 1996 wait.pspt.fi 192.83.26.48 GET /gif/engflag.jpg HTTP/1.0
Fri May 10 04:30:21 1996 wait.pspt.fi 192.83.26.48 GET /AVIVF.HTM HTTP/1.0
Fri May 10 04:30:26 1996 wait.pspt.fi 192.83.26.48 GET /gif/virtlogo.gif HTTP/1.0
Fri May 10 04:30:27 1996 wait.pspt.fi 192.83.26.48 GET /gif/VF21.GIF HTTP/1.0
Fri May 10 04:30:30 1996 wait.pspt.fi 192.83.26.48 GET /gif/KOULU.GIF HTTP/1.0
Fri May 10 04:31:11 1996 wait.pspt.fi 192.83.26.48 GET /AVIPELI2.HTM HTTP/1.0
Fri May 10 04:31:13 1996 wait.pspt.fi 192.83.26.48 GET /gif/LAITE.GIF HTTP/1.0
Fri May 10 04:31:14 1996 wait.pspt.fi 192.83.26.48 GET /gif/KOKOONP.JPG HTTP/1.0
Fri May 10 04:31:32 1996 wait.pspt.fi 192.83.26.48 GET /AVIPELI3.HTM HTTP/1.0
Fri May 10 04:31:33 1996 wait.pspt.fi 192.83.26.48 GET /gif/TIKI1.GIF HTTP/1.0
Fri May 10 04:31:33 1996 wait.pspt.fi 192.83.26.48 GET /gif/TPIRU1.GIF HTTP/1.0
Fri May 10 04:31:33 1996 wait.pspt.fi 192.83.26.48 GET /gif/TSTRE1.GIF HTTP/1.0
Fri May 10 04:31:46 1996 wait.pspt.fi 192.83.26.48 GET /AVIPELI4.HTM HTTP/1.0
Fri May 10 04:32:03 1996 wait.pspt.fi 192.83.26.48 GET /ACEINDEX.HTM HTTP/1.0
Fri May 10 04:32:19 1996 wait.pspt.fi 192.83.26.48 GET /ACEVF.HTM HTTP/1.0
Fri May 10 04:32:21 1996 wait.pspt.fi 192.83.26.48 GET /gif/ROBOCOP1.GIF HTTP/1.0
Fri May 10 04:33:01 1996 wait.pspt.fi 192.83.26.48 GET /ACEINDEX.HTM HTTP/1.0
Fri May 10 07:54:44 1996 wait.pspt.fi 193.166.48.136 GET /gif/NAPPI1.gif HTTP/1.0
Fri May 10 07:54:45 1996 wait.pspt.fi 193.166.48.136 GET /gif/NAPPI2.gif HTTP/1.0
Fri May 10 07:54:45 1996 wait.pspt.fi 193.166.48.136 GET /gif/NAPPI3.gif HTTP/1.0
Fri May 10 07:54:45 1996 wait.pspt.fi 193.166.48.136 GET /cgi-bin/counter.exe?-smittari+-w5+./DEFAULT.HTM HTTP/1.0
Fri May 10 07:54:45 1996 wait.pspt.fi 193.166.48.136 GET /gif/LETTER.GIF HTTP/1.0
Fri May 10 10:08:25 1996 wait.pspt.fi 192.89.123.26 GET /gif/VFLOGO.GIF HTTP/1.0
Fri May 10 10:08:25 1996 wait.pspt.fi 192.89.123.26 GET /gif/AMKVLOGO.GIF HTTP/1.0
Fri May 10 10:08:37 1996 wait.pspt.fi 192.89.123.26 GET /AVIVF.HTM HTTP/1.0
Fri May 10 10:08:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/VF21.GIF HTTP/1.0
Fri May 10 10:08:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/KOULU.GIF HTTP/1.0
Fri May 10 10:11:59 1996 wait.pspt.fi 192.89.123.26 GET /AVITULOS.HTM HTTP/1.0
Fri May 10 10:12:05 1996 wait.pspt.fi 192.89.123.26 GET /gif/VIFA5PAP.GIF HTTP/1.0
Fri May 10 10:12:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/NAPPI2.gif HTTP/1.0
Fri May 10 10:12:47 1996 wait.pspt.fi 192.89.123.26 GET /gif/NAPPI3.gif HTTP/1.0
Fri May 10 10:13:49 1996 wait.pspt.fi 192.89.123.26 GET /AVIONGEL.HTM HTTP/1.0
Fri May 10 10:13:59 1996 wait.pspt.fi 192.89.123.26 GET /gif/PIRUNLG.GIF HTTP/1.0

In this log file you can see the different calls to the different resources and Perl scripts, and the method by which each request was made, either GET or POST. The log file begins with the first request made that day and finishes with the last. This example is a very short one, edited down from the original, so you can imagine that log files on very active Web servers can easily grow to many times this length. Purging log files is a very important practice to integrate into your Web maintenance routine.

When you go to purge your log files, remember that you are going to erase information you may need in the future. If you are generating reports from these logs, make sure you only delete logs for which reports have already been made. It is very common that these reports run on a one- or two-week lag time behind the current date, so the last one or two weeks' log files must be kept to successfully generate these reports.
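A report or purge script needs to turn dates into log file names. Following the HSyymmdd.LOG convention described earlier, a small helper (a sketch, not part of the Goo Goo Records site) could build the name for any date:

```perl
# Build an EMWAC-style log file name, HSyymmdd.LOG, from a year,
# a month (1-12), and a day.
sub log_name {
    ($year, $month, $day) = @_;
    sprintf("HS%02d%02d%02d.LOG", $year % 100, $month, $day);
}

# The log file for July 6, 1996:
print log_name(1996, 7, 6), "\n";     # HS960706.LOG
```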

The creation of a new log file for each day makes this purging process much easier than on HTTP servers that place all log entries into one file, like the "access_log" file used with the NCSA HTTP server. Instead of having to go into the file and delete specific entries, creating an editing hassle, the EMWAC server gives you the advantage of deleting the entire log file for any day no longer needed for generating reports.
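Because each day gets its own file, purging can be a simple sketch like the following. The log directory path is an assumption based on the Goo Goo Records server; adjust it to match your own.

```perl
#!/usr/bin/perl
# A purge sketch: delete EMWAC daily log files more than $keep_days
# old.  The log directory path is an assumption; change it to match
# your server.

# True if a file name follows the HSyymmdd.LOG convention.
sub is_daily_log {
    local($name) = @_;
    $name =~ /^HS\d\d\d\d\d\d\.LOG$/i;
}

$logdir = "c:\\WINNT35\\system32\\LogFiles";
$keep_days = 14;    # keep two weeks of logs for report generation

if (opendir(DIR, $logdir)) {
    foreach $file (readdir(DIR)) {
        next unless &is_daily_log($file);
        $path = "$logdir\\$file";
        # -M gives the file's age in days.
        unlink($path) if -M $path > $keep_days;
    }
    closedir(DIR);
}
```

The two-week cutoff matches the report lag described above; only logs already covered by reports are deleted.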

Each log file is kept open until the next day's log records its first action, or transaction. Once this transaction occurs, the previous day's log file is closed. As you can see from the excerpt above, each transaction recorded in an EMWAC log file contains the following data: the date and time of the request, the server's host name, the client's host name or IP address, the request method, the file requested, and the protocol version used.

All of this information can be used to provide detailed reports on Web site traffic.

One way to find out the accurate number of hits a site is receiving is to use the daily log file. By understanding the format of the HTTP header that makes the request of the site's home page, we can use a simple script to count actual hits.

Using the grep command (called from within Perl with backticks), the Goo Goo Records Web Master first used this script to figure out how many users accessed the site. You might recall that grep uses regular expressions to look for matches, compiling a list of all matches to the designated character string, or pattern.

#!/usr/bin/perl
# Count home page hits by running grep (via backticks) against the
# day's log file.  The log file name here is just an example.
print "Content-type: text/html\n\n";
$log = "c:\\WINNT35\\system32\\LogFiles\\HS960706.LOG";
$num  = `grep -c "GET / HTTP" $log`;
$num += `grep -c "GET /index.sht" $log`;
$num += `grep -c "GET /index.htm" $log`;
print "$num\n";

The Web Master abandoned this method of hit tabulation early on, for several reasons. The first is that while this method may be more accurate, it is very time consuming, because it has to read through and count every match in the long daily log files. The second is that each page to be monitored needed its own modified version of the script, because the script makes a specific call to the page named in it. Another bad side effect of this script is that it forces your index page, index.htm, to be a Server Side Includes page for the whole thing to work, which will greatly reduce the speed at which your home page loads. The final reason is that the site started using the EMWAC HTTP service, which doesn't support Server Side Includes (notice the ".sht" file extension used in the script, which is the shortened NT version of ".shtml"), making the scripts useless. Fortunately for the Web Master, there are several other ways to count hits on a Web page.

HTTP Status Codes

There are very few people left who use the Web and have not encountered HTTP status codes. There may be nothing quite as frustrating as receiving, instead of the HTML document you requested, the message "Forbidden, access not granted" or a similar one-line response. These responses carry some of the many HTTP status codes, one of which is issued with each response a Web server makes. Table 14.1 outlines the different types of HTTP status codes and what they mean.

Table 14.1 HTTP status codes

HTTP Status Code   Code Type            Meaning

200   Successful request   OK-The request was satisfied.
201   Successful request   OK-following a POST command.
202   Successful request   OK-request accepted for processing, but processing is not complete.
203   Successful request   Partial information-the returned information is only partial.
204   Successful request   No response-request received, but no information exists to send back.
300   Redirection          Multiple choices-the requested information is available from more than one location.
301   Redirection          Moved permanently-the information requested is in a new location, and the change is permanent.
302   Redirection          Moved temporarily-the information requested temporarily has a different URL.
303   Redirection          Method-a suggestion for the client to try another location or method.
304   Redirection          Not modified-the document has not been modified since the date given in the GET request; the client already has the information, and the browser just needs to display it again.
400   Error with client    Bad request-a syntax problem with the client's request, or the request could not be satisfied.
401   Error with client    Unauthorized-client does not have authorization to access the information requested.
402   Error with client    Payment required-used when payment methods are employed by the server and payment has not yet been satisfied.
403   Error with client    Forbidden-no access for the client to the information, even with proper authorization.
404   Error with client    Not found-server could not find a file to satisfy the client's request.
405   Error with client    Method not allowed-the method used in the request line is not allowed for access to the information in the request URL.
406   Error with client    None acceptable-the information requested has been found, but not within the conditions stated in the request's Accept and Accept-Encoding request headers.
407   Error with client    Proxy authentication required-reserved for use with proxies; indicates that the client must first authenticate itself with the proxy to continue.
409   Error with client    Conflict-there is a conflict with the information requested in its current state, preventing access.
410   Error with client    Gone-information requested by the client is no longer available, with no forwarding URL.
411   Error with client    Authorization refused-the credentials in the client's request are not satisfactory to allow access to the requested information.
500   Error with server    Internal server error-an unexpected condition has caused the server to be unable to satisfy the client's request.
501   Error with server    Not implemented-the client's request includes facilities not currently supported by the server.
502   Error with server    Bad gateway-upstream gateway access necessary for completing the request was denied or failed.

Understanding these status codes is critical if you want to keep track of what is happening on your server with your Web sites. While these status codes are not recorded by the EMWAC HTTP service log files that the Goo Goo Records site uses, they are in the log files of other servers.
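To put these codes to work in your own log analysis, the table can be kept in an associative array. This short sketch (only a few of the entries from Table 14.1 are shown) translates a numeric code into readable text:

```perl
# A few of the status codes from Table 14.1, kept in an associative
# array for easy lookup.
%status = (
    200, "OK",
    204, "No Response",
    301, "Moved Permanently",
    302, "Moved Temporarily",
    304, "Not Modified",
    401, "Unauthorized",
    403, "Forbidden",
    404, "Not Found",
    500, "Internal Server Error",
);

$code = 404;
print "$code $status{$code}\n";     # 404 Not Found
```

A report script can then print "404 Not Found" rather than a bare number.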

Tracking and Environmental Variables

In each of the Perl scripts we have used so far, a standard bit of code parses off the form data. We then used that parsed data to decide how the Perl script should act. Form data, however, is not the only data we can glean from a user through the server. In any Perl script called from an HTML document, we can use the special environmental variables to make decisions.

Environment variables are accessed through the %ENV variable, and can be readily used. For example, if you wanted to track how many users from the googoo.com domain have used your Perl scripts, you could add the following snippet of code to each Perl script:

if ($ENV{'REMOTE_HOST'} =~ /googoo\.com/i) {
    # Read the current count (the file holds a single number).
    open(TRACK, "c:\\logs\\scripts.trk");
    $line = <TRACK>;
    close(TRACK);
    $line++;
    # Write the incremented count back out.
    open(TRACK, ">c:\\logs\\scripts.trk");
    print TRACK $line;
    close(TRACK);
}

This snippet will increment the number contained in scripts.trk every time the script is accessed, but only if the client is accessing it from within googoo.com. This could be useful for Web sites that deliver certain pages only to internal users, or to track which users are inside, and which are outside, your company.

In addition to the environmental variables already present on the NT Server, and those which may have been added, the EMWAC HTTP service uses the environmental variables listed in Table 14.2.

Table 14.2 Environmental variables

Environmental Variable   Description

CONTENT_LENGTH       The length of the content as received from the client.
CONTENT_TYPE         The content type of the information received that has attached data, as with "POST" requests.
GATEWAY_INTERFACE    The CGI specification revision for the server, in the format CGI/revision.
HTTP_ACCEPT          The list of MIME types that the HTTP server will recognize, or accept, for use.
PATH_INFO            The path data based on the client's request.
QUERY_STRING         All the information that follows the "?" in the URL when the script specified was accessed using "GET."
REMOTE_ADDR          The client's IP address.
REQUEST_METHOD       The method of request made by the client, such as "GET," "POST," and so forth.
SCRIPT_NAME          Path name of the script requested to execute.
SERVER_NAME          The server's host name, DNS alias, or IP address, in the form it would appear in a self-referencing URL.
SERVER_PORT          The port number to which the client's request was sent.
SERVER_PROTOCOL      The name/version of the server's information protocol.
SERVER_SOFTWARE      The name/version of the information server software that answered the client's request.

These environmental variables are a subset of the standard CGI designated environmental variables for HTTP service.
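As a quick illustration, and assuming nothing beyond the standard %ENV hash, this CGI sketch prints every environmental variable the server passes to it, which is a handy way to see exactly what your server provides:

```perl
#!/usr/bin/perl
# Build an HTML report of every environmental variable the server
# passes to the script.
sub env_report {
    $out = "";
    foreach $var (sort keys %ENV) {
        $out .= "$var = $ENV{$var}\n";
    }
    $out;
}

print "Content-type: text/html\n\n";
print "<HTML><HEAD><TITLE>Environment</TITLE></HEAD><BODY><PRE>\n";
print &env_report;
print "</PRE></BODY></HTML>\n";
```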

Browsers

It is an incorrect assumption that everyone using the Web does so through Netscape Navigator. While it is by far the most widely used Web client, or browser, it is not the only one. Microsoft's Internet Explorer is one of several other browsers growing in use. Both Navigator and Explorer support HTML tags that are not part of the HTML standard and cannot be used by the other's software. This means that your site may look different to different users and their different browsers. To avoid users finding your Web site out of sync with their browsers, keeping tabs on which browsers are accessing your Web site is invaluable.

The Goo Goo Records site has added an element so they can determine which browsers are accessing their site, and at what percentage. Eventually they plan to have special pages for each different browser, making use of each browser's strengths.

The following script snippet records which browsers are used to access the Web site:

open(TRACK2, ">>c:\\logs\\browsers.trk");
print TRACK2 "$ENV{'HTTP_USER_AGENT'}\n";
close(TRACK2);

This Perl snippet appends the browser identification string to the file browsers.trk.
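To turn browsers.trk into the percentages the Web Master is after, a companion sketch can tally the recorded lines. The file path matches the snippet above; the tallying is factored into a small subroutine:

```perl
#!/usr/bin/perl
# Tally a list of browser identification strings into percentages.
sub browser_percent {
    local(@names) = @_;
    local(%count, %pct, $n);
    foreach $n (@names) { $count{$n}++; }
    foreach $n (keys %count) { $pct{$n} = 100 * $count{$n} / @names; }
    %pct;
}

# Read the tracking file and print the report.
if (open(TRACK2, "c:\\logs\\browsers.trk")) {
    @lines = <TRACK2>;
    close(TRACK2);
    chop(@lines);
    %pct = &browser_percent(@lines);
    foreach $b (sort keys %pct) {
        printf "%-50s %5.1f%%\n", $b, $pct{$b};
    }
}
```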

IP Addresses/Domain Names

The IP address, and the related InterNIC-registered domain name, is the way computers find each other on the Internet. This series of four numbers separated by periods gives each computer on the Internet, which includes the Web, its own identity. Domain names are character equivalents that are assigned to these numbers. For more details concerning IP addresses and domain names, check out the InterNIC site at

http://www.internic.net/

When a computer contacts your server, it leaves its IP address as a calling card, which is recorded in the log file. The environmental variable REMOTE_HOST also stores this address, or sometimes the domain name, as its value.

Having a record of your users' IP addresses can help you determine where your users are from, and which networks your site is most popular with. This information can also be used to find out the address and identity of any problem users. To find out the information that comes with an IP address, consult the InterNIC directory, whose URL was given earlier.
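Because the client address is the seventh whitespace-separated field of each EMWAC log entry, a short sketch can count hits per address. The log file name below is just an example:

```perl
#!/usr/bin/perl
# Count hits per client address from EMWAC-style log lines.  The
# client's host name or IP address is the seventh field of each entry.
sub host_counts {
    local(@lines) = @_;
    local(%hits, @f, $line);
    foreach $line (@lines) {
        @f = split(' ', $line);
        $hits{$f[6]}++ if @f >= 10;
    }
    %hits;
}

# Read the day's log and print the tally.
if (open(LOG, "c:\\WINNT35\\system32\\LogFiles\\HS960706.LOG")) {
    %hits = &host_counts(<LOG>);
    close(LOG);
    foreach $host (sort keys %hits) {
        print "$host: $hits{$host}\n";
    }
}
```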

The Referer URL

There is an environmental variable that records as its value the URL of the page the user has come from. The name of this environmental variable is HTTP_REFERER. This variable can be used to track a user through a site, or to find out which outside resources on the Web link to your site.
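In the same style as the browser tracker shown earlier, a snippet could append each referring URL (the standard CGI name for this value is HTTP_REFERER) to its own tracking file. The file path is an assumption:

```perl
# Append the referring URL, if the browser supplied one, to a
# tracking file.  Returns true on success.
sub track_referer {
    local($url, $file) = @_;
    return 0 unless $url;
    open(TRACK3, ">>$file") || return 0;
    print TRACK3 "$url\n";
    close(TRACK3);
    1;
}

&track_referer($ENV{'HTTP_REFERER'}, "c:\\logs\\referer.trk");
```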

Tracking Hits with the Log

The term hits is used very loosely on the Web, meaning everything from the act of following any HTML link to the base unit used to measure Web site traffic. For the purposes of this book, a hit is simply any time a user calls up a resource on the Web, whether it be an HTML document, an image, or a downloadable program. When that resource is accessed successfully, a hit is counted against it, or on it. Moving about within one HTML document is not a hit, but moving from one HTML document to another within the same site counts as one hit.

To record a hit, either of the methods discussed in this chapter can be used: the hit may be registered with a short Perl snippet in the script that delivers the HTML document, or the hit information can be read by a Perl script from one of the HTTP service's log files.
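As a sketch of the second approach, this snippet reads an EMWAC-style log and counts only requests for HTML documents, ignoring images and other resources, which avoids the inflated counts described at the start of the chapter. The log file name is just an example:

```perl
#!/usr/bin/perl
# Count hits per HTML document from EMWAC-style log lines, skipping
# images and other non-HTML resources.
sub page_hits {
    local(@lines) = @_;
    local(%hits, $line);
    foreach $line (@lines) {
        $hits{$1}++ if $line =~ /GET\s+(\S+\.html?)\b/i;
    }
    %hits;
}

if (open(LOG, "c:\\WINNT35\\system32\\LogFiles\\HS960706.LOG")) {
    %hits = &page_hits(<LOG>);
    close(LOG);
    foreach $page (sort keys %hits) {
        print "$page: $hits{$page}\n";
    }
}
```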

Counters Revisited

Another way to keep track of Web site traffic is to create your own page-counting scripts, which do not rely on logs for statistical information. The way to do this is to use Perl to create either a plain text file or the more useful database management file, or DBM file.

DBM files are used on the Internet so that different platforms and different operating systems can access the same information. With Windows NT, DBM files are accessed through an application programming interface, or API; it is through the API that the client communicates with the database. Microsoft's SQL Server is one example of a database accessed through such an API. Manipulating DBM files with Perl is a straightforward affair, as the next section demonstrates.

Managing DBM Files

The main functions used to manipulate DBM files in Perl are dbmopen(), dbmclose(), reset(), each(), values(), and keys(). Some of these functions were dealt with earlier in the book, and the others will be explained here.

To begin with, the dbmopen function is used to create a link between a DBM file and an associative array. The format for this would be something like:

dbmopen(%new_array, "db_new_file", $read_write_mode);

where Perl will create two new files if the file name specified in the statement does not exist. The new files would have the names "db_new_file.dir" and "db_new_file.pag." To prevent these files from being created, set the read/write mode to undef.

The parameters in the above statement work like this: %new_array is an associative array, and behaves like one; "db_new_file" is the DBM file being opened, without either the ".dir" or ".pag" extensions (a full path and file name for the DBM file should be used here); and $read_write_mode sets the file permissions for the DBM file.

To sever the connection between the DBM file and the associative array, use the dbmclose function in this format:

dbmclose(%new_array);

There is just one small problem with this method of tracking a Web site on an NT server: currently, the DBM functions in Perl for Windows NT are unsupported. This method is included in this book for two reasons: first, as an example of how tracking can be done without using logs; second, the DBM functions may be supported in Perl for NT soon, so you'll be ready for them.
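On a platform where Perl's DBM functions are supported, a page counter built on dbmopen might look like this sketch. The DBM file name and the use of PATH_INFO as the page key are assumptions:

```perl
#!/usr/bin/perl
# A DBM-backed page counter.  Each page's count persists in the DBM
# file between requests.  Not yet usable under Perl for Windows NT.
$page = $ENV{'PATH_INFO'} || "/index.htm";
dbmopen(%counts, "pagecounts", 0666) || die "Cannot open DBM file: $!\n";
$counts{$page}++;
$hits = $counts{$page};
dbmclose(%counts);

print "Content-type: text/html\n\n";
print "<HTML><BODY>This page has been visited $hits times.</BODY></HTML>\n";
```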

To help deal with Perl scripts that use routines and functions unsupported under Windows NT, an NT Perl checklist is necessary.

NT Perl Checklist Script

This section is a little out of place in this chapter, but it seemed like a good idea to add it here. One big headache confronting Perl programmers, especially those working in non-UNIX environments, is finding a Perl script that satisfies your needs, only to have it fail when you try to run it. After numerous futile attempts at execution, you discover that the script uses Perl functions not supported in your version, or port, of Perl.

In Windows NT, the list of unsupported functions is long enough to cause real problems, such as the inability to use DBM files explained in this chapter. The following script searches any Perl script for currently unsupported NT functions. Think of it as an acid test for new scripts you want to add to your Perl library.

#!/usr/bin/perl
# nttest.pl
#
# Scan a Perl script for functions that are unsupported by Perl for
# Windows NT.  Usage:  nttest.pl <scriptname>

# This is the list of unsupported functions.  (Only the four-argument
# form of select() is unsupported, so a plain select() call may be
# flagged even though it is fine.)
@functions = ("getnetbyname","getnetbyaddr","getnetent","getprotoent",
"getservent","sethostent","setnetent","setprotoent","setservent","endhostent",
"endnetent","endprotoent","endservent","socketpair","msgctl","msgget","msgrcv",
"msgsnd","semctl","semget","semop","shmctl","shmget","shmread","shmwrite",
"ioctl","select","chmod","chroot","fcntl","flock","link",
"lstat","readlink","symlink","sysread","syswrite","umask","utime","crypt",
"getlogin","getpgrp","getppid","getpriority","getpwnam","getgrnam","getpwuid",
"getgrgid","getpwent","getgrent","setpwent","setgrent","endpwent","endgrent",
"setpgrp","fork","kill","pipe","setpriority","times","wait","waitpid","alarm",
"dbmclose","dbmopen","dump","syscall");

$filename = $ARGV[0];
if (!$filename) {
    print "\nUsage: nttest.pl <scriptname>\n\n";
    exit;
}
$linecount = 0;
$errors = 0;
open(SCRIPT, $filename) || die "Cannot open $filename: $!\n";
while ($line = <SCRIPT>) {
    $linecount++;
    foreach $func (@functions) {
        if ($line =~ /\b$func[\s(]/) {
            print "Line $linecount: Function $func() is unsupported by Perl for Windows.\n";
            $errors++;
        }
    }
}
close(SCRIPT);
if (!$errors) {
    print "This script contains no unsupported functions, and should work with Perl for Windows.\n\n";
}
else {
    print "This script contains unsupported functions, and will not work under Perl for Windows.\n\n";
}

With this script you should save hours of debugging Perl scripts that will never run on Windows NT. Now if someone could write a Perl script that then fixed these unsupported features so the script worked in NT, that would really be something. Please remember that all is not lost. Most of these unsupported functions are not essential to the scripts that use them, and Perl is amazing at doing the same task in different ways. With a little ingenuity and reworking, these scripts may function fine in Windows NT.

Chapter In Review

In this chapter we covered how to keep track of a Web site's traffic, using the example of the Goo Goo Records site. Their site uses logs to generate reports that keep track of who is using the site, which browser they are using, and where they go within the site.