

Cookies

HTTP cookies are a method for maintaining state in a stateless protocol. What does this mean? In HTTP, a single session between a client and a server typically spans many separate TCP connections, which makes it difficult to tie individual accesses together in an application that requires state, such as a shopping cart. Cookies solve that problem. As implemented by Netscape in its browser, and subsequently by many others, a server can assign a client a cookie (an opaque string whose meaning is significant only to the server itself), and the client then hands that cookie back to the server on subsequent requests.
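To make the exchange concrete, here is a minimal Python sketch of both halves of the round trip, using the standard library's http.cookies module. The cookie name session and the token value are hypothetical; a real server would generate the token itself and keep its meaning private.

```python
from http.cookies import SimpleCookie

# Server side: attach an opaque token to a response.
# The cookie name "session" is a hypothetical example.
def make_set_cookie_header(token: str) -> str:
    cookie = SimpleCookie()
    cookie["session"] = token
    cookie["session"]["path"] = "/"  # send it back for every path on the site
    # output() renders the full header line, e.g. "Set-Cookie: session=...; Path=/"
    return cookie.output()

# Server side, later request: the browser echoes the pair back in a
# Cookie: header, and the server parses it to recover the token.
def token_from_cookie_header(header_value: str):
    cookie = SimpleCookie()
    cookie.load(header_value)
    morsel = cookie.get("session")
    return morsel.value if morsel else None
```

For example, make_set_cookie_header("abc123") yields the line Set-Cookie: session=abc123; Path=/, and when the browser later sends Cookie: session=abc123, token_from_cookie_header recovers abc123. Only the server interprets the token; the client just stores and returns it.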

The mod_cookies module nicely handles the details of assigning a unique cookie to every visitor, based on the visitor’s host name and a random number. CGI applications can read this cookie from the HTTP_COOKIE environment variable, for the same reason that every HTTP request header is accessible to them. A CGI script can use the cookie as a key into a session-tracking database, or the cookie can be logged and tallied to get a good, if undercounted, estimate of the total number of users who visited a site, rather than just the number of hits or even the number of unique domains.
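A CGI script might pull the visitor’s cookie out of HTTP_COOKIE along the lines of the following Python sketch. The cookie name visitor is a hypothetical placeholder, not something this section documents, so substitute the name your server actually sets. The second helper shows the tallying idea: count distinct cookie values rather than hits.

```python
import os
from http.cookies import CookieError, SimpleCookie

# Hypothetical cookie name; substitute whatever name your server assigns.
TRACKING_COOKIE = "visitor"

def visitor_id(environ=None):
    """Return the tracking cookie's value from a CGI environment, or None."""
    if environ is None:
        environ = os.environ
    cookie = SimpleCookie()
    try:
        # HTTP_COOKIE holds the raw Cookie: request header, e.g. "a=1; b=2".
        cookie.load(environ.get("HTTP_COOKIE", ""))
    except CookieError:
        return None  # malformed header: treat the visitor as unidentified
    morsel = cookie.get(TRACKING_COOKIE)
    return morsel.value if morsel else None

def count_visitors(ids):
    """Count distinct cookie values, ignoring requests that carried none."""
    return len({i for i in ids if i})
```

Requests without a cookie (first visits, or browsers that refuse cookies) return None and are left out of the count, which is one reason such a tally undercounts.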

Happily, there are no configuration issues here. Simply compile with mod_cookies and away you go. It couldn’t be easier.

Configurable Logging

For most folks, the default logfile format (also known as Common Logfile Format, or CLF) doesn’t provide enough information for a serious analysis of how well your Web site is working. It provides basic numbers in terms of raw hits, pages accessed, hosts accessing, timestamps, and so forth, but it fails to capture the “referring” URL, the browser being used, and any cookies sent. There are two ways to get more data into your logfiles: use the NCSA-compatibility directives, which log particular pieces of information to separate files, or use Apache’s own fully configurable logfile format.

NCSA Compatibility

For compatibility with the NCSA 1.4 Web server, two modules were added. These modules log the User-Agent and Referer headers from the HTTP request.

User-Agent is the header most browsers send to identify the client software, typically the browser’s name and version. Logging of this header is activated by an AgentLog directive in the srm.conf file or in a virtual-host-specific section. The directive takes one argument, the name of the file to which user agents are logged. For example:


AgentLog logs/agent_log

To use the AgentLog directive, you need to ensure that the mod_log_agent module has been compiled and linked into the server.
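Once the agent log is accumulating, a few lines of Python can summarize it. This sketch assumes the file holds one User-Agent string per line; treat that as an assumption and check it against a few lines of your own logs/agent_log before trusting the counts.

```python
from collections import Counter

def tally_agents(lines):
    """Count requests per User-Agent string, assuming one agent per line."""
    # Strip the trailing newline and skip blank lines.
    return Counter(line.rstrip("\n") for line in lines if line.strip())

# Example usage with the file named in the AgentLog directive above:
# with open("logs/agent_log") as f:
#     for agent, hits in tally_agents(f).most_common(10):
#         print(hits, agent)
```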

Similarly, the browser sends the Referer header to identify the page on which the followed link appeared. In other words, when you’re on a page with a URL of “A,” that page contains a link to a URL “B,” and you follow that link, the request for page “B” includes a Referer header with the URL of “A.” This is very useful for finding which sites link to yours, and what proportion of your traffic each one accounts for.

The logging of the Referer header is activated by a RefererLog directive, which points to the file to which the referers get logged:


RefererLog logs/referer_log
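To answer the “what proportion of traffic” question from a referer log, something like the following Python sketch would do. It assumes each line has the form referring-URL -> local-document; that layout is an assumption here, so confirm it against your own log. Lines that don’t match are skipped rather than guessed at.

```python
from collections import Counter
from urllib.parse import urlsplit

def referring_hosts(lines):
    """Tally referer log entries by referring host.

    Assumes each line looks like "<referring URL> -> <local document>".
    """
    hosts = Counter()
    for line in lines:
        referer, sep, _document = line.rstrip("\n").partition(" -> ")
        if not sep:
            continue  # not in the assumed format; skip it
        host = urlsplit(referer).hostname  # lowercased host, or None
        if host:
            hosts[host] += 1
    return hosts

def traffic_share(hosts):
    """Return each host's fraction of all logged referrals."""
    total = sum(hosts.values())
    return {h: n / total for h, n in hosts.items()} if total else {}
```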

The Referer logging module also provides RefererIgnore, a directive that suppresses logging of any Referer header containing one of the strings you list. RefererIgnore is useful for weeding out referers from your own site if all you’re interested in is links to you from other sites. For example, if your site is www.myhost.com, you might use the following:


RefererIgnore www.myhost.com
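RefererIgnore matches by substring: an entry is dropped if any listed string appears anywhere in the Referer header. The small Python function below models that rule, which is handy for predicting what a given RefererIgnore line will filter out. It is a model of the behavior, not the module’s own code.

```python
def should_log_referer(referer, ignore_strings):
    """Model of RefererIgnore: skip the entry if any ignore string
    occurs anywhere in the Referer header (a plain substring test)."""
    return not any(s in referer for s in ignore_strings)

# Example: filter a list of referers the way the directive above would.
ignore = ["www.myhost.com"]
referers = ["http://www.myhost.com/index.html", "http://www.example.org/links.html"]
kept = [r for r in referers if should_log_referer(r, ignore)]
```

Because the test is a plain substring match, a short string such as myhost.com would also suppress referers from, say, www.notmyhost.com, so keep the ignore strings as specific as you can.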

Remember that logging of the Referer header requires compiling and linking in mod_log_referer.

Totally Configurable Logging

The previous modules were provided, like many Apache features, for backward compatibility. They have a drawback, though. Because these separate logs record little else about the request each entry came from, it’s nearly impossible to match their entries with the corresponding entries in the main access log, for example to tell which Referer went with which request from which host. Ideally, all the information about a transaction with the server would be logged to one file, either extending the common logfile format or replacing it altogether. Well, such a beast exists in the mod_log_config module.

The mod_log_config module implements the LogFormat directive, which takes as its argument a string in which variables beginning with % stand for different pieces of data from the request. Table 36.4 lists the variables.

Table 36.4 Variables for the LogFormat Directive

Variable      Definition

%h            Remote host.
%l            Remote identity, as reported by identd.
%u            Remote user, as determined by any user authentication that may take place. If the user wasn’t authenticated and the status of the request is 401 (authorization error), this field may be bogus.
%t            Time, in common logfile format.
%r            First line of the request.
%s            Status. For requests that are internally redirected, this is the status of the original request; %>s gives the status of the last one.
%b            Bytes sent, excluding HTTP headers.
%{Foobar}i    Contents of the Foobar: header line(s) in the request from the client to the server.
%{Foobar}o    Contents of the Foobar: header line(s) in the response from the server to the client.
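To see how the variables in Table 36.4 combine, here is a toy Python model of LogFormat expansion. It is not Apache’s implementation: the request dictionary and its key names are hypothetical, the time value is passed in already formatted, only the variables in the table are recognized, and a missing value is written as -, following the common logfile format convention. The sample format string extends CLF with the Referer and User-Agent headers, putting what the separate NCSA-style logs captured onto a single line per request.

```python
import re

# Matches %h, %>s, %{Referer}i, etc. Only the variables in Table 36.4 are handled.
_DIRECTIVE = re.compile(r"%(>?)(?:\{([^}]*)\})?([hlutrsbio])")

# Hypothetical request-dict keys for the simple one-letter variables.
_SIMPLE_KEYS = {"h": "host", "l": "ident", "u": "user", "t": "time",
                "r": "request_line", "b": "bytes"}

def _lookup_header(headers, name):
    # HTTP header names are case-insensitive.
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None

def render_log_line(fmt, req):
    """Expand a LogFormat-style string for one request (a toy model)."""
    def expand(m):
        last, arg, code = m.groups()
        if code in "io":
            headers = req.get("req_headers" if code == "i" else "resp_headers", {})
            value = _lookup_header(headers, arg or "")
        elif code == "s":
            value = req.get("status")
            if last:  # %>s: final status after internal redirects, if recorded
                value = req.get("final_status", value)
        else:
            value = req.get(_SIMPLE_KEYS[code])
        return "-" if value is None else str(value)

    return _DIRECTIVE.sub(expand, fmt)

FORMAT = '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"'
```

With a request dictionary describing a GET of /index.html that was referred from another site, render_log_line(FORMAT, req) produces one line holding the CLF fields followed by the quoted Referer and User-Agent values, so every piece of the transaction can be analyzed together.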

