HTML 4.0 Sourcebook: The HTTP Protocol

HTML 4.0 Sourcebook
(Publisher: John Wiley & Sons, Inc.)
Author(s): Ian S. Graham
ISBN: 0471257249
Publication Date: 04/01/98

Basic Elements of an HTTP Session

When a client contacts a server, it sends a request header defining the details of the request, followed by any data the client may be sending. In response, the server returns a response header describing the status of the transaction, followed in turn by any data being returned. The first example illustrates the basics of this flow of information.

Example 13: A Simple GET Method Request

This example looks at how a client makes a GET request for a document resource from an HTTP server and at how a server responds to the request. In practice, such a transaction would be initiated on a browser by clicking on a hypertext anchor pointing to the server and the desired file—for example:

<A HREF="http://smaug.middle.earth.ca:2021/Tests/file.html">anchor text</A>

For this example (and for Examples 14 and 15), we assume this anchor was accessed from a document previously retrieved from the URL: http://www.utoronto.ca/webdocs/webinfo.html

That is, the user first accessed the document webinfo.html and, within this page, selected the anchor that retrieved the test file. The analysis is broken into two basic parts: the passing of the request to the server and the response sent by the server back to the client.

The Client Request Header

Figure 9.1 shows the actual data sent by an old client (Netscape Navigator 1.01!) to the server. Other clients send qualitatively the same information (we will look at some others later on). The dots indicate Accept header fields omitted to save space.

This request message consists of a request header containing several request header fields. Each field is a line of ASCII text, terminated by a carriage-return linefeed character pair (CRLF). A blank line containing only a CRLF pair marks the end of the request header and the beginning of any data being sent from the client to the server. This example transaction does not send data to the server, so the blank line is the end of the request.
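The framing rules just described can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular client: the function name build_request is hypothetical, and the field values are taken from the example transaction.

```python
CRLF = "\r\n"  # carriage-return linefeed pair that ends every header line


def build_request(method_field, header_fields):
    """Assemble a request header as bytes.

    Each field is one line of ASCII text terminated by CRLF; a blank
    line (a lone CRLF) marks the end of the header.
    build_request is a hypothetical helper used only for illustration.
    """
    lines = [method_field]
    lines += [f"{name}: {value}" for name, value in header_fields]
    text = CRLF.join(lines) + CRLF  # terminate the last field line
    text += CRLF                    # blank line: end of the header
    return text.encode("ascii")


request = build_request(
    "GET /Tests/file.html HTTP/1.0",
    [
        ("Accept", "*/*"),
        ("Referer", "http://www.utoronto.ca/webdocs/webinfo.html"),
        ("User-Agent", "Mozilla/1.01"),
    ],
)
print(request.decode("ascii"))
```

Because this GET request carries no data, the bytes stop immediately after the blank line.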

The request header contains two parts. The first part—the first line of the header—is called the method field. This field specifies the HTTP method to be used, the location of the desired resource on the server (as a URL), and the version of the HTTP protocol the client program would like to use. This is followed by several HTTP request fields, which provide information to the server about the client and about the nature of the data (if any) being sent by the client to the server.
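Seen from the receiving side, the two parts of the header can be separated as in the following sketch. It assumes the whole request is already in memory as bytes, and the name split_request is hypothetical. Fields are kept as a list of pairs rather than a dictionary because a name such as Accept can appear more than once.

```python
def split_request(raw):
    """Split raw request bytes into (method_field, fields, data).

    split_request is a hypothetical helper; it assumes the complete
    request has already been read into memory.
    """
    # The first blank line separates the header from any data that follows.
    header, _, data = raw.partition(b"\r\n\r\n")
    lines = header.decode("ascii").split("\r\n")
    method_field = lines[0]  # first line of the header
    fields = []
    for line in lines[1:]:
        # Split on the first colon only: values such as URLs contain colons.
        name, _, value = line.partition(":")
        fields.append((name.strip(), value.strip()))
    return method_field, fields, data


raw = (
    b"GET /Tests/file.html HTTP/1.0\r\n"
    b"Accept: text/plain\r\n"
    b"Accept: */*\r\n"
    b"Referer: http://www.utoronto.ca/webdocs/webinfo.html\r\n"
    b"User-Agent: Mozilla/1.01\r\n"
    b"\r\n"
)
method_field, fields, data = split_request(raw)
```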

Client Request: Method Field

The method field contains three text fields, separated by whitespace (whitespace is any combination of space and/or tab characters). The general form for this field is:

HTTP_method  identifier  HTTP_version

which, in our example, was:

GET /Tests/file.html HTTP/1.0

Figure 9.1 Data sent from a Netscape Navigator 1.01 client to an HTTP server during a simple GET request. Comments are in italics.

GET /Tests/file.html HTTP/1.0
Accept: text/plain
.
.
Accept: */*
If-Modified-Since: Wed, 25 Sep 1996 17:23:31 GMT
Referer: http://www.utoronto.ca/webdocs/webinfo.html
User-Agent: Mozilla/1.01
	[a blank line, containing only CRLF ]

The three components of this method field are:

HTTP_method The HTTP method specification (GET in this example). The method specifies what is to be done to the object specified by the URL. Some other common methods are HEAD, which requests header information about an object, and POST, which is used to send information to the object.

identifier The identifier of the resource. In this example, the identifier, /Tests/file.html, is the URL stripped of the protocol and Internet domain name strings (including the port number). If this were a request to a proxy server, it would be the entire URL. Proxy servers are discussed later in this chapter.

HTTP_version The HTTP protocol version used by the client; HTTP/1.0 for Netscape Navigator 4 and earlier.
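As a rough sketch of how the three components relate to the anchor URL, the Python below splits a method field on whitespace and derives the identifier from a full URL using the standard urllib.parse module. The function names and the via_proxy flag are hypothetical, introduced only for this illustration.

```python
from urllib.parse import urlsplit


def split_method_field(line):
    """Return (HTTP_method, identifier, HTTP_version) from a method field."""
    parts = line.split()  # splits on any run of spaces and/or tabs
    if len(parts) != 3:
        raise ValueError(f"expected three fields, got {len(parts)}: {line!r}")
    return tuple(parts)


def identifier_from_url(url, via_proxy=False):
    """Derive the identifier a client would send for this URL.

    via_proxy is a hypothetical flag: for a proxy server the text says
    the identifier is the entire URL.
    """
    if via_proxy:
        return url
    parts = urlsplit(url)
    identifier = parts.path or "/"  # drop the protocol, host name, and port
    if parts.query:
        identifier += "?" + parts.query
    return identifier


url = "http://smaug.middle.earth.ca:2021/Tests/file.html"
method_field = f"GET {identifier_from_url(url)} HTTP/1.0"
print(split_method_field(method_field))
```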

Client Request: Accept Field

The example shows several additional request headers. The Accept fields contain a list of data types, expressed as MIME content-types, which tell the server what type of data the client is willing to accept. MIME types are discussed in more detail later in this chapter; see also Appendix B, “Multipurpose Internet Mail Extensions,” available at the companion Web site at www.wiley.com/compbooks/graham. The meanings for simple requests are relatively straightforward. For example, Accept: text/plain means that the client can accept plain text files, while Accept: audio/* means that the client can accept any form of audio data.

A client can include information in accept headers stating the relative desirability of particular types of data. This is expressed through two quantities: the q or quality factor (a number in the range 0.0 to 1.0, where 0.0 is equivalent to not accepting a type, and 1.0 is equivalent to always accepting a type) and mxb, which stands for the maximum size in bytes. If the resource is bigger than the mxb value, then it is not acceptable to the client. The default values are q=1.0 and mxb=undefined (that is, any size is acceptable). Thus the following

Accept: image/*
Accept: image/jpeg; q=0.7; mxb=50000
Accept: image/gif; q=0.5

mean that the client prefers image/jpeg files, provided they are smaller than 50 KB. If there is a JPEG file, but it is too big, then the client would prefer a GIF. If there are no GIF files, then the client will take any available image file.
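The q and mxb parameters can be read out of a single Accept entry as in the sketch below. It follows only the rules stated above (the defaults q=1.0 and no size limit, and the rule that a resource bigger than mxb is not acceptable); the function names are hypothetical, and no attempt is made to model how a real server ranks competing types.

```python
def parse_accept_entry(entry):
    """Parse 'type/subtype; q=...; mxb=...' into (media_type, q, mxb)."""
    parts = [p.strip() for p in entry.split(";")]
    media_type = parts[0]
    q, mxb = 1.0, None  # defaults: fully acceptable, any size
    for param in parts[1:]:
        name, _, value = param.partition("=")
        name = name.strip().lower()
        if name == "q":
            q = float(value)
        elif name == "mxb":
            mxb = int(value)
    return media_type, q, mxb


def is_acceptable(parsed_entry, size_in_bytes):
    """True if q is above zero and the resource is not bigger than mxb."""
    _, q, mxb = parsed_entry
    return q > 0.0 and (mxb is None or size_in_bytes <= mxb)


jpeg = parse_accept_entry("image/jpeg; q=0.7; mxb=50000")
gif = parse_accept_entry("image/gif; q=0.5")
print(jpeg, is_acceptable(jpeg, 60000))  # a 60,000-byte JPEG exceeds mxb
print(gif, is_acceptable(gif, 60000))    # no size limit was given for GIF
```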

Accept types can be combined in a single field if they are separated by commas. For example, the above three fields could be written:

Accept: image/*, image/jpeg; q=0.7; mxb=50000, image/gif; q=0.5
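The equivalence between the separate fields and the combined form can be illustrated with two small helpers (hypothetical names): one joins several Accept values with commas, the other splits a combined value back into its entries.

```python
def combine_accept(values):
    """Join the values of several Accept fields into one field value."""
    return ", ".join(values)


def split_accept(value):
    """Split a combined Accept value back into individual entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


separate = ["image/*", "image/jpeg; q=0.7; mxb=50000", "image/gif; q=0.5"]
combined = combine_accept(separate)
print("Accept: " + combined)
```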

NOTE: Unreliability of Accept Header Information
In principle, a server or a server-based gateway program can use Accept information to decide what type of data to send back to the client. However, very few servers have this capability, while many browsers send the field Accept: */*, which indicates that they will accept anything.

Client Request: User-Agent Field

There are several other common request header fields. The User-Agent field, of the form User-Agent: ascii_string, provides information about the client making the request. The example gave

User-Agent: Mozilla/1.01

to indicate the Netscape 1.01 browser (code-named “Mozilla”). Some other examples are:

User-Agent: Mozilla/2.02 (Windows 3.1)
User-Agent: Mozilla/3.02 (Win95; I)
User-Agent: Mozilla/4.03 [en] (Win95; I)
User-Agent: Mozilla/2.0 (compatible; MSIE 3.0; Windows 95;1024,768)
User-Agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)

The first three correspond to different versions of the Netscape Navigator browser, with the text inside the parentheses providing additional information about the platform (typically, the operating system). This is the correct syntax for including additional browser-specific information. The fourth and fifth examples are from Microsoft Internet Explorer 3 and 4, respectively. Note how these browsers claim to be Netscape equivalents and properly identify themselves only via the text inside the parentheses. Many server gateway programs use the browser identity string to select the type of data to return, and thus Microsoft uses the Mozilla name so that server software will treat the Microsoft product as equivalent to Netscape's.
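A gateway program that wants to tell these browsers apart therefore has to look inside the parentheses as well as at the leading name. The sketch below is one hedged way to do that for the strings shown above; describe_user_agent is a hypothetical helper, and treating every plain "Mozilla" string as Netscape Navigator is an assumption that holds only for these examples.

```python
import re


def describe_user_agent(value):
    """Return (browser, version) for a User-Agent value.

    Hypothetical helper: it assumes a leading name/version token,
    optionally followed by a parenthesized, semicolon-separated comment.
    """
    match = re.match(r"(?P<name>[^/\s]+)/(?P<version>\S+)", value)
    if match is None:
        return value, ""
    name, version = match.group("name"), match.group("version")
    # Assumption: a bare "Mozilla" name means Netscape Navigator.
    browser = "Netscape Navigator" if name == "Mozilla" else name

    comment = re.search(r"\(([^)]*)\)", value)
    tokens = [t.strip() for t in comment.group(1).split(";")] if comment else []
    for token in tokens:
        if token.startswith("MSIE"):
            # Internet Explorer identifies itself only inside the parentheses.
            browser = "Microsoft Internet Explorer"
            pieces = token.split()
            version = pieces[1] if len(pieces) > 1 else ""
    return browser, version


for agent in (
    "Mozilla/1.01",
    "Mozilla/4.03 [en] (Win95; I)",
    "Mozilla/2.0 (compatible; MSIE 3.0; Windows 95;1024,768)",
):
    print(agent, "->", describe_user_agent(agent))
```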
