Chapter 12 MIME and Helper Applications

Web Helper Applications
Multipurpose Internet Mail Extensions (MIME)
MIME and the World Wide Web
MIME and CGI
Summary

This chapter deals with some fundamentals for your Intranet. In subsequent chapters, you'll use these building blocks to help your customers access specific information in their everyday work. This chapter assumes you already have a Web server up and running, at least in a rudimentary fashion. If this is not the case, you'll probably want to get your server up, so you can see and manipulate the sample Web server and browser configuration files, which are discussed here. Refer back to Part II of this book, "Getting Set Up on the Server," and review Chapter 7 "Running the Intranet Web Server."

The configuration information that comes with the IIS 2.0 Web server software is fairly basic. It doesn't give you much in the way of troubleshooting information, nor does it go very far to explain the reasons behind certain aspects of the program. This chapter fills in this information gap. For example, a handy little section in the help file explains how to configure MIME types in the Registry, but this chapter explains why you would want to do that. This chapter also discusses the server MIME mappings in the Registry, their meaning in Web technology, and how you can use them to set up your Intranet, concepts that are central to this book.

Web Helper Applications

Web browsers like Netscape, Explorer, and Mosaic are amazing packages. They not only enable you to search the World Wide Web for interesting and useful documents, images, and other data, but they also provide a friendly interface for older Internet services such as FTP and Gopher. These browsers can display not only plain text and HTML text (and HTML hyperlinks) but also several common types of image files, even without any helper applications. But even these amazing programs have their limits. People use a mind-boggling array of different formats to store their data on computers, and new formats are being invented all the time. Your Web browser can't possibly handle all the existing kinds of data, let alone the new formats just being invented. That's where helper applications come in.

The pioneering developers of Web technology, scientists at CERN, the European Particle Physics Lab, wanted to develop some means of integrating various kinds of data into a single, user-friendly interface. Toward this end, the folks at CERN made a critical choice early in their work to allow a Web browser to call other computer programs to handle data that it can't handle itself. You probably know these other programs as helper applications, though some people call them external viewers. Whatever they're called (the term helper application is used in this book), the decision to enable Web browsers to hand off data to a different, outside program was sheer genius.

You might have already set up your browser to use helper applications for viewing Web video or listening to Web sounds. What you may not realize is that the mechanism for handing off data to helper applications is a standardized one, and you can use it for almost anything you can imagine. Helper applications aren't just for viewing video clips, as the following examples demonstrate:

Your everyday word processor can function as a helper application, enabling you to distribute boilerplate documents with your Web server.
You can set up your spreadsheet program as a helper application to enable your customers to download live data and then manipulate it.
You can use a presentation graphics package as a helper application to open up computer-based training possibilities for your customers.

I'll come back to the specifics of setting up helper applications later in this chapter. First, though, you need to understand the mechanism by which Web browsers pass off data they can't handle internally to external programs. This subject might seem a digression from your Intranet, but understanding this subject is critical to your success.

Multipurpose Internet Mail Extensions (MIME)

The original Web developers at CERN decided on a single interface for a variety of data types. The developers implemented this decision by adopting an existing mechanism called Multipurpose Internet Mail Extensions or MIME.

As the name implies, MIME hails from the world of Internet electronic mail. E-mail is one of the oldest Internet services, pre-dating the World Wide Web by many years. E-mail is still one of the most popular Internet services and is often given as the reason organizations and people want Internet access. Despite its popularity, though, Internet e-mail has been limited by the requirement that only plain ASCII text can be used in messages. This requirement means that nontext files, such as applications, data files that include formatting (like word processor files), and other binary files, can't be e-mailed as-is. It also means that even simple non-ASCII characters, such as non-English characters used in many languages around the world, won't pass e-mail muster.

As is often the case with computers and the Internet, there are ways you can work around this limitation to get a binary data file from one place to another intact. For example, you can use the file transfer protocol (FTP) to transfer any kind of file from one computer to another over the Internet. Also, if you've used Internet e-mail to send data files very much, particularly to or from UNIX systems, you may know about the uuencode and uudecode programs. The uuencode program converts a binary file into a specially encoded ASCII text file so it can be sent by e-mail. Its companion utility, uudecode, converts the encoded file back into its original format on the recipient's end.
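The round trip that uuencode and uudecode perform can be sketched in a few lines. This is only an illustration in Python, assuming the standard library's binascii module; the sample bytes are made up, and because b2a_uu accepts at most 45 bytes per call, a real file would have to be processed in chunks.

```python
import binascii

# Hypothetical binary payload; these bytes are not plain ASCII text.
payload = bytes([0x00, 0xFF, 0x10, 0x80, 0x7F, 0xC3])

# Encode: the result is one line of printable ASCII that can travel by e-mail.
encoded = binascii.b2a_uu(payload)

# Decode: the recipient recovers the original bytes exactly.
decoded = binascii.a2b_uu(encoded)

print(encoded.decode("ascii").rstrip("\n"))
print(decoded == payload)
```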

Neither of these workarounds is really convenient, though. Both require not only extra steps but also a certain amount of skill and knowledge on the part of both the sender and receiver of the message, skill and knowledge that the casual e-mail user may not have. Sophisticated, user-friendly e-mail tools have developed in the past few years, and most have point-and-click features for attaching any kind of data file to a message. These tools are easy to use and work well for exchanging nontext data, provided both sender and recipient are using the same package.

Unfortunately, users of different proprietary-format e-mail programs, such as a cc:Mail user and a Microsoft Mail user, can't easily exchange data files through e-mail. Both cc:Mail and Microsoft Mail use a proprietary message format. Although there are gateway packages for both, they are expensive and don't always work well.

Although Lotus (manufacturer of cc:Mail) and Microsoft would have you believe that the solution to these incompatibility problems lies in your buying their packages for every user or buying an expensive piece of e-mail gateway software and dedicating hardware on which to run it, these solutions are inadequate in the context of the Internet. These vendors might wish they could sell their packages to every one of the millions of Internet e-mail users, but this feat is unlikely. If you need to send Internet e-mail, your fancy mail program's file attachment feature will break down sooner or later.

Enter MIME

In 1991, Nathaniel S. Borenstein of Bellcore proposed major extensions to Internet electronic mail standards. Called Multipurpose Internet Mail Extensions, or MIME, Borenstein's proposal extended the existing Simple Mail Transfer Protocol (SMTP) standards to offer a "standardized way to represent and encode a wide variety of media types, including textual data in non-ASCII character sets, for transmission via Internet mail."

The MIME proposal, which was issued as Internet Requests for Comments (RFC) 1521 and 1522, amended the earlier RFC that defined the format of Internet e-mail messages (RFC 822) to allow the attachment of virtually any kind of data file to an Internet e-mail message using a simple mechanism.

NOTE

The Internet has a long history of development through consensus. The TCP/IP networking protocols, developed at first with U.S. Government (Department of Defense) support, were worked out through give-and-take revolving around publicly proposed standards called Requests For Comments. Internet developers issued proposed standards for the nuts-and-bolts of the Internet, calling for comment from the then-small Internet community. Coordinated by the Internet Engineering Task Force (IETF), a process for building consensus for developing standards grew up, with feedback on RFCs eventually incorporated into the final standards the IETF issued. To date, more than 2,000 Internet RFCs have been issued. Many of them have made their way into final standards, guaranteeing that different vendors' TCP/IP networking applications can work together. You can find the complete set of Internet RFCs at http://www.internic.net/ds/dspg0intdoc.html.

RFC 822 defined the format of Internet e-mail messages; its companion, RFC 821, defined the Simple Mail Transfer Protocol used to move them. Anyone who wants to develop an Internet e-mail program can follow the requirements of RFC 822 to ensure the package works with all other RFC 822-compliant e-mail packages.

Under the terms of RFC 822, an Internet e-mail message has two parts:

A header, often likened to the envelope in which you mail a letter at the post office, which contains addressing and postmark information
A body, like the information inside the envelope, which contains the text of the message

This division of e-mail messages into a header section and a body section is critical to MIME and, as you will see later in this chapter, it is also important in World Wide Web services. Consequently, this division will be important to you in setting up your Intranet.

A header itself can be divided into separate parts, each having the same general format:

A header name (From, To, Date, and so on) followed by a colon and a single blank space. (Multiword header names, such as Reply-To, are hyphenated.)
The header content, such as the addressee's e-mail address, time, and sender's e-mail address.

All headers in an e-mail message are single lines. Some headers are required by RFC 822, and others are optional. The important point is that they all follow the same format, with the colon and a single blank space separating the header name and contents. You will see additional headers on your own e-mail messages, including what might be termed postmarks of all the Internet hosts that handled your message on its way to you. Even so, all follow this simple format, and the header section of all e-mail messages, regardless of how many headers there are, is separated from the body by a single blank line.

If you're interested in looking at the headers on your own e-mail messages, many mail programs (including Eudora Light) have a setup command to let you choose whether you want to see all the headers on a message. You'll see that each message contains header and body parts.
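The header and body division can also be seen programmatically. The following is a minimal sketch in Python, assuming the standard library's email package; the message text and the example.com addresses are invented for illustration.

```python
from email import message_from_string

# A made-up message: header lines, one blank line, then the body.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Reply-To: alice-lists@example.com\n"
    "Subject: Lunch\n"
    "\n"
    "Are you free at noon?\n"
)

msg = message_from_string(raw)

# Each header is a name, a colon, and the content.
for name, value in msg.items():
    print(f"{name}: {value}")

# Everything after the first blank line is the body.
body = msg.get_payload()
print(repr(body))
```

Note that the parser needs no knowledge of what the headers mean; the blank line alone tells it where the headers stop and the body starts.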

How MIME Works

As noted, the broad division of Internet e-mail messages into the header and body sections is always present in the format just described. Borenstein's MIME proposal, also grossly simplified here, was to extend this basic division by doing the following:

Adding a new header type that specified whether a message was a multipart message, with some or no normal text and zero or more attachments
Enabling the data to be encoded into a special ASCII text format, and then attached to the message body, with separating/identifying information

You can read the details of MIME in RFC 1521 and RFC 1522. In essence, the new header type allows one or more of a set of message content types to be identified and attached to messages. The content types include image, audio, video, application data, and, of course, text. In addition, a special content type allows multiple attachments of differing data types to the same message.

There are several MIME headers, including MIME-Version, Content-Type, and Content-Transfer-Encoding. You can read about these headers in detail in the MIME RFCs. The important part to note is that these are just additional headers that follow the standard Internet e-mail message format.

MIME-capable mail user agents parse incoming messages for the MIME-extended headers. Based on the content type of the message and a set of user-configurable rules associating particular content types with application programs (or viewers), the MIME mail program passes attachments off to other application programs on the system that are capable of dealing with them. For example, an incoming MIME-formatted e-mail message may have an audio file attached. The recipient's MIME-compliant mail tool recognizes the sound file attachment from the extended headers in the message and fires off an audio player to play the sound. Likewise, your Web browser passes off data it cannot handle directly to helper applications on your system that can handle the data.
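To make the dispatch step concrete, here is a rough sketch in Python using the standard library's email package. It builds a made-up message with one audio attachment, then walks the parts the way a MIME-capable mail program might. The viewers table, the dispatch function, and the attachment bytes are all hypothetical; a real mail program would launch the viewer rather than just name it.

```python
from email.message import EmailMessage

# Build a made-up message with a text body and one audio attachment.
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Listen to this"
msg.set_content("The recording is attached.")
msg.add_attachment(b"\x00\x01\x02\x03", maintype="audio", subtype="basic",
                   filename="greeting.au")

# Hypothetical user-configurable rules: content type -> viewer program.
viewers = {
    "text/plain": "built-in text display",
    "audio/basic": "audio player",
}

def dispatch(message):
    """Return (content type, viewer) pairs for each non-container part."""
    plan = []
    for part in message.walk():
        if part.is_multipart():
            continue  # the multipart wrapper only holds the other parts
        ctype = part.get_content_type()
        plan.append((ctype, viewers.get(ctype, "no viewer configured")))
    return plan

print(msg["MIME-Version"])
print(msg.get_content_type())
for ctype, viewer in dispatch(msg):
    print(ctype, "->", viewer)
```

Adding the attachment turns the message into a multipart message, which is the special content type that lets several differently typed parts travel together.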

Mail User Agents and Mail Transfer Agents

Internet e-mail handling programs are usually divided into two categories. First, users that are creating, sending, and reading e-mail messages use Mail User Agents (MUAs). Examples of Windows GUI MUAs include Exchange (free with Windows), Eudora, and Pegasus (another very popular freeware application available on the Internet).

Usually, however, a separate program does the work of routing and delivering e-mail. These separate programs are usually referred to as Mail Transfer Agents (MTAs). However, PC MUAs, like Eudora, generally have enough MTA features built in so that the mail you create gets handed off immediately to a mail server for delivery.

Similarly, MIME-compliant MUAs create MIME-formatted messages automatically. Attaching a file is simple for the user; it's usually a point-and-click operation in graphical MUAs, with the encoding handled internally by the program.

Probably the most widely used MIME-compliant MUA is the PC and Macintosh package called Eudora. It's a basic Internet e-mail package with most of the standard MUA features, but it's also MIME-compliant. A postcardware (meaning freeware if you send its author, Jeff Beckley, a postcard) version of Eudora Light is available on the CD-ROM that accompanies this book.

MIME and the World Wide Web

You know from using your WWW browser that you can deal with many kinds of data and Internet services. Your Web browser can display images, access Gopher and FTP services, and, when properly equipped with helper applications, play movies or audio that you find on the Web. Because you've set up a Web server of your own, you also know you can make these and other data types available on your server, and you know how to write the HTML to include them in your Web pages. You may not know, however, that the MIME mechanism just described is what makes this all possible.

To help you understand this process, this section delves more deeply into the details of MIME as it relates to Web servers, browsers, and helper applications. You'll learn how Web servers use MIME to distinguish among the types of data they're serving and how Web servers use MIME to tell Web browser clients what sort of data is being sent in every single transaction.

MIME and the Web Server

Web servers understand MIME information and provide it to Web browsers in every HTTP transaction. As described earlier in this chapter, MIME is able to identify a number of data types (called content types in the MIME discussion earlier) and subtypes. Web server software uses an extensive database of MIME content type information. With IIS, this database is in the Windows NT Registry underneath this key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\InetInfo\Parameters\MimeMap

Figure 12.1 displays the Registry Editor opened to this key with the MIME type for Microsoft Word documents selected in the right-hand window.

Figure 12.1 : Editing the IIS MIME map using the Windows NT Registry Editor.

The Layout of the Server MIME Map

IIS installs over 100 MIME mappings by default. The syntax of each row in the MimeMap key is as follows:

<mime type>,<filename extension>,,<gopher type>

For example, the server uses the following line to tell the browser that a .doc file is a Microsoft Word document:

application/msword,doc,,5
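As an illustration of this layout, the following Python sketch splits one row into named fields. The MimeMapEntry type and the parse_entry function are hypothetical names invented here, and the sketch assumes every row has exactly four comma-separated fields with the third left empty, as in the example above.

```python
from typing import NamedTuple

class MimeMapEntry(NamedTuple):
    mime_type: str    # part before the slash, e.g. "application"
    subtype: str      # part after the slash, e.g. "msword"
    extension: str    # filename extension without the dot
    gopher_type: str  # Gopher item type, kept as text

def parse_entry(row: str) -> MimeMapEntry:
    """Split '<mime type>,<extension>,,<gopher type>' into named fields."""
    full_type, extension, _unused, gopher_type = row.split(",")
    mime_type, subtype = full_type.split("/", 1)
    return MimeMapEntry(mime_type, subtype, extension, gopher_type)

entry = parse_entry("application/msword,doc,,5")
print(entry)
```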

Notice that the <mime type> field is subdivided into two parts by a forward slash. Remember from the discussion of MIME earlier in the chapter that the proposed MIME standards include a set of data types (content types) that can be attached to e-mail messages. The <mime type> field represents these very same data types. If you scroll down the window pane on the right side of the Registry Editor window and look at just the part of the <mime type> field before the slash, you can see six data types:

Application
Audio
Image
Text
Video
X-world

Two other common MIME types, which IIS does not install automatically, are the following:

Message
Multipart

These MIME types follow the conventions proposed in Nathaniel Borenstein's MIME RFCs and are the same types supported by the MIME-compliant e-mail packages listed earlier. Thus, this short list of MIME data types is incorporated into your Web server.

Of course, different kinds of data can fall into these broad categories, so the MIME data types are subdivided into MIME data subtypes. The matter to the right of the slash in the MIME map signifies subtypes of the major MIME data types. You're no doubt familiar with several kinds of images, .gif, .jpeg, and .bmp, for example. Thus, you'll see a number of entries for the image data type, one each for the major image subtypes, such as image/jpeg. Similarly, you'll see a couple of different video and audio subtypes, including video/mpeg.

Perhaps the largest number of subtypes are those of the application data type. As you can see from Figure 12.1, a large number of well-known application programs are listed. These range from everyday office word processors (like application/msword) to standard UNIX utilities (like tar) to special purpose packages (like PostScript). MIME provides support for all of these application programs and the mechanism to use them. If you use these applications, or any of the other applications listed in the MIME map, your Web server knows about them, and you'll be able to put them to work as a part of your Intranet by using the information in this book.

Look at the remaining data on each row of the MIME map. The MIME mechanism associates filename extensions with data types/subtypes. The right side of each row contains a filename extension to be associated with the MIME data type/subtype on the left side of the row. For example, the entry for image/gif uses the filename extension gif, and the entries for application/postscript use several filename extensions: ai, ps, and eps.

To put this another way, the MIME map helps Web browsers tie filename extensions to specific computer programs. Your Web server knows, from the MIME map, that a .doc file is a data file for Microsoft Word, a .ps file is a PostScript document, and an .mpeg file is an MPEG (Moving Picture Experts Group) video movie. This is an important piece of information for your Intranet because now your Web server can tell your clients (that is, Explorer, Netscape, Mosaic, or another Web browser) what sort of data is coming when your customers click a hyperlink.
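The lookup a server performs can be sketched in Python as a simple table keyed by filename extension. The pairs below are the ones mentioned in the text; the content_type_for function and the fallback type for unknown extensions are assumptions made for this illustration, not a description of how IIS itself is implemented.

```python
import os.path

# Extension -> MIME type/subtype, using pairs mentioned in the text.
MIME_BY_EXTENSION = {
    "doc": "application/msword",
    "ai": "application/postscript",
    "ps": "application/postscript",
    "eps": "application/postscript",
    "gif": "image/gif",
    "mpeg": "video/mpeg",
}

# Assumed fallback for extensions the table does not list.
DEFAULT_TYPE = "application/octet-stream"

def content_type_for(filename: str) -> str:
    """Return the MIME type/subtype associated with a filename's extension."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return MIME_BY_EXTENSION.get(ext, DEFAULT_TYPE)

for name in ("volcano.mpeg", "REPORT.DOC", "figure.eps", "notes.xyz"):
    print(name, "->", content_type_for(name))
```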

Clients, Servers, and MIME Types

Just as Web servers know about MIME types and include the information in every piece of data they send to Web browsers, the Web browsers understand MIME as well.

Web Servers Say What They're Sending

Web servers always precede anything they send in response to a client request (for example, when you click a hyperlink) with some preliminary header information. From the discussion about MIME headers in the e-mail context, you can probably guess that these headers contain MIME data type/subtype information. Specifically, when a Web server responds to a request from a Web browser for a document or other piece of data, the server announces to the browser in one or more headers the type of data it is sending, using the associations in the MIME map in the Registry. Thus, when you click a hyperlink pointing to a video file (volcano.mpeg, for example), the first bit of information sent back to your browser about the link is its MIME type/subtype, video/mpeg. Your browser, then, knows what sort of data is coming even before it arrives.
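What the server sends can be pictured with a short Python sketch that assembles a minimal HTTP response by hand. The build_response function is hypothetical, the four payload bytes are a stand-in for the real contents of volcano.mpeg, and a real server sends more headers than the two shown here.

```python
def build_response(content_type: str, body: bytes) -> bytes:
    """Assemble a minimal HTTP/1.0 response: headers, blank line, body."""
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body

# Stand-in bytes; a real server would read volcano.mpeg from disk.
response = build_response("video/mpeg", b"\x00\x00\x01\xba")

# The browser sees the headers first, then the blank line, then the data.
header_block, _, payload = response.partition(b"\r\n\r\n")
print(header_block.decode("ascii"))
```

As with e-mail, a blank line separates the headers from the data, so the browser has read the Content-Type before the first byte of the movie arrives.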

Web Browsers Understand MIME Types, Too

Your Web browser understands MIME and its data types/subtypes. Your browser reads the incoming MIME type header information from the Web server and decides what to do with the incoming data based on its type. For example, your Web browser knows what to do with data of the MIME type text/html (regular Web pages in HTML) or image/gif (a .gif image). It has a built-in ability to properly handle these and other common types of data. That's how you're able to read most documents you find on the Web and see most images as well.

MIME and Web Helper Applications

As noted at the beginning of this chapter, Web browsers can't possibly handle all kinds of data. You already know about common helper applications. What you might not know is that the MIME information is intimately involved with these helper applications. Web browsers use the MIME type header information they get from Web servers, using the very same set of data type/subtype and filename extension, to pass off the data to helper applications. This process enables you to play Web movies or sound files. And this process is how, as you'll learn in later chapters, you can use MIME information to create your own associations between data and your own helper applications for your Intranet.

A MIME Conversation

The following imaginary dialog between a Web browser and Web server, written in plain English instead of in the Hypertext Transfer Protocol (HTTP) using MIME headers, illustrates what happens when a user clicks an object that the browser can't show:

User (to the browser): Click, show me this object.

Browser (to the server): Send me the data this link points to.

Server: OK, but first you should know that it is of this MIME data type/subtype. Here it comes.

Browser (to itself): Ohhhh, it's that kind of MIME data type/subtype. Let's see, that means I can't display it myself, so I have to send it to a helper application that understands that data type. Let me look at my list. Which one handles this MIME data type/subtype? (Note that browsers are getting more and more sophisticated at handling multiple file types internally without having to pass the data to an external helper application.)

Browser (to the selected helper application): Here, deal with this data.
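The browser's side of this conversation can be sketched in Python as a small decision function. Everything named here (the BUILT_IN set, the HELPERS table, the program names, and the handle function) is hypothetical and stands in for whatever a particular browser actually does.

```python
# Types this imaginary browser renders itself.
BUILT_IN = {"text/html", "text/plain", "image/gif", "image/jpeg"}

# Hypothetical helper list: MIME type -> program to launch.
HELPERS = {
    "video/mpeg": "mpegplay.exe",
    "application/msword": "winword.exe",
}

def handle(content_type: str) -> str:
    """Decide what the browser does with data of the given MIME type."""
    if content_type in BUILT_IN:
        return "display internally"
    helper = HELPERS.get(content_type)
    if helper is not None:
        return f"hand off to {helper}"
    return "ask the user (save to disk or pick a program)"

for ctype in ("text/html", "video/mpeg", "application/x-unknown"):
    print(ctype, "->", handle(ctype))
```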

Using MIME to Set Up Web Helper Applications

This section outlines the process of setting up a Web server and browsers to use helper applications. To focus on the general principles used, pretend you have a helper application called PluPerStat. You don't need to know what this program does or anything about the data it produces/uses, but assume a couple of things about it:

PluPerStat has some kind of proprietary data format.
PluPerStat stores its files with the filename extension .plu.

Edit the MIME Map on Your Web Server

Your first step in setting up PluPerStat as a helper application for your Intranet is to edit the MIME map in the Registry on your Web server to add an entry for it. Not all Web servers store the MIME map in the NT Registry. See your server documentation to make sure of the name and location of the MIME map file if you are using some Web server other than IIS.
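As a rough guide to what the new entry might look like, the Python sketch below formats a row in the same layout as the existing MIME map entries. The mime_map_entry function is a hypothetical helper, the type application/x-pluperstat is the one used for PluPerStat in this chapter, and the Gopher type of 5 is simply borrowed from the Microsoft Word entry; whether that value suits your data is an assumption you should check against your server documentation.

```python
def mime_map_entry(mime_type: str, extension: str, gopher_type: str = "5") -> str:
    """Build a row in the '<mime type>,<extension>,,<gopher type>' layout."""
    # The third field is left empty, matching the existing entries.
    return f"{mime_type},{extension.lstrip('.')},,{gopher_type}"

new_entry = mime_map_entry("application/x-pluperstat", ".plu")
print(new_entry)
```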

Set Up the New Helper Application on Your Browser(s)

Before you can use PluPerStat as a helper application, you need to tell your Web browser about it and its MIME data type/subtype. Different browsers have different mechanisms for adding helper applications. The following section covers Explorer. If you're using another browser, check your documentation (if necessary). You will probably realize as soon as you read the steps for Explorer that the concepts can be easily applied to any browser.

Setting up Internet Explorer for MIME

To set up Internet Explorer 2.0 to use the imaginary PluPerStat data format, perform the following steps. Note that Explorer uses the term file types rather than helper application to accomplish the same purpose.

Run Explorer and choose View | Options | File Types from the main menu. You will see the dialog shown in Figure 12.2.
Click the New Type button to open the dialog shown in Figure 12.3.
Fill in the boxes with the appropriate information. The one thing you can't really do in this example is fill in the path to the application that will be used to open this type of file when this type of data is downloaded. But assuming you had a real application in mind, you could click the New button and fill in the path.
When you've finished, click OK, and then click OK again to save the new MIME information.

Figure 12.2 : The Internet Explorer File Types dialog.

Figure 12.3 : This Internet Explorer dialog is used to add MIME types.

Explorer is now configured to use PluPerStat as a helper application whenever it encounters the MIME type/subtype application/x-pluperstat or the filename extension .plu.

Why Include Both MIME Type and Filename Extension?

Careful readers will notice the first sentence in the preceding paragraph says "whenever it encounters the MIME type/subtype application/x-pluperstat or the filename extension .plu." You may wonder why it's necessary to include both the MIME type/subtype and filename extension. After all, you've learned the Web server includes this information in the MIME data type/subtype headers, so why does Explorer (or any other Web browser) have to be configured to specify both pieces of information?

Although this information is indeed redundant when communicating with a Web server, Web browsers also communicate with other kinds of Internet information servers, such as FTP and Gopher servers. These Internet services pre-date both the World Wide Web and MIME; they don't know anything about MIME types. Moreover, because they send back only one kind of data, not one of many kinds of data like a Web server, they have no reason to precede the data they send with any identifying header information at all. Because an FTP server, for example, has no way of telling a Web browser what sort of data is coming, Web browsers use a workaround, keying off the filename extension to a MIME data type/subtype. You've no doubt seen your browser display a set of canned icons, representing different file types, when you connect to an FTP or Gopher server. These icons are your browser's MIME mechanism at work, using the filename extensions it finds and a built-in list of MIME data types/subtypes.

Thus, if you're connected to an FTP server with your browser and you click a link pointing to a file with the .plu extension, the browser can make the assumption the file is a PluPerStat data file because you've configured a helper application for this kind of data. There's no guarantee, though, that the .plu file is really a PluPerStat data file. After all, people are free to name files anything they want.

The MIME map contains a semi-official list of MIME types and filename extensions, and Web browsers are built to rely on that list. Although you added a new type, application/x-pluperstat in the example, to your browser, there's no guarantee that other Web or Internet servers won't have used the same filename extension for some other kind of data file. Still, the key point is that Web browsers have a built-in list of filename extension/MIME type associations to fall back on in the absence of any MIME header information coming from the server.
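This fallback rule can be sketched in a few lines of Python. The EXTENSION_TYPES table and the resolve_type function are hypothetical; the point is only the order of preference: use the server's header when there is one, and guess from the filename extension when there is not.

```python
import os.path

# Hypothetical built-in extension list, plus the entry added for PluPerStat.
EXTENSION_TYPES = {
    "gif": "image/gif",
    "html": "text/html",
    "plu": "application/x-pluperstat",
}

def resolve_type(header_type, filename):
    """Prefer the server's Content-Type; otherwise guess from the extension."""
    if header_type:
        return header_type
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return EXTENSION_TYPES.get(ext)  # None means the type is unknown

# A Web server announces the type; an FTP server sends no header at all.
print(resolve_type("video/mpeg", "volcano.mpeg"))
print(resolve_type(None, "budget.plu"))
print(resolve_type(None, "README"))
```

The second call shows the weakness described above: the guess rests entirely on the filename, so a mislabeled file is handed to the wrong helper application.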

MIME and CGI

The Common Gateway Interface is a standard way of passing information from the Web fill-in forms you've seen to back-end CGI scripts or other CGI programs that deal with the data. CGI is described in detail in Chapter 19, "Getting the Most out of HTML with CGI," and Chapter 20, "Building a CGI Database System."

CGI is MIME-aware, which accounts for much of its power. CGI scripts on your Intranet can return data from your Web server in response to browser requests, in much the same way as you get data when you click a hyperlink. Although most people think of the data being returned from a Web server as being from static files on the server (such as Web pages written in HTML, images, and so on), CGI scripts and programs can generate data on the fly in response to user requests. Such requests can be, for example, based on a fill-in form. The user enters information in the form, and then clicks a Submit button. The CGI script then processes the information entered, generates a new stream of data based on the user input, and returns it to the client. Thus, a fill-in form can solicit input from a user such as search criteria in a database application, construct a query using the user data, run the query against the database, and return the results to the user's Web browser as an HTML document.

The mechanics of this CGI on-the-fly data generation use the MIME mechanism. Just as your Intranet server, sending back data in response to a mouse click on a hyperlink, precedes that data with header information containing the MIME data type/subtype of the data to be sent, your CGI scripts must return the same sort of information about the data stream they're about to send. Thus, any Perl CGI script's very first output statements might be something like the following:

print "Content-Type: text/html\n";
print "\n";

These statements print the string Content-Type: text/html followed by a newline, and then print a blank line. You've seen Content-Type before, just a few pages back, as well as the necessary blank line. Recall the discussion of the fundamental RFC 822 e-mail requirement: Messages must be separated into a header area and a body area with a blank line between them. What you have here is exactly the same: The CGI script generates a MIME data type/subtype header (in this case Content-Type: text/html) followed by a blank line as the very first bit of data to be returned to the Web browser.

In this example, as required in all CGI scripts that are to return data to the user's Web browser, the program informs the browser that the forthcoming data is of the MIME type text and subtype html. The rest of the data generated by the script is, in fact, text data with HTML codes. Such data can include any and all HTML markup, including URLs pointing to other Web documents, images, or even other CGI scripts. Use of variable substitution in CGI scripts, for example, can enable you to generate documents, forms, or anything else that can be flagged in HTML, all with the simple use of one MIME type/subtype header preceding the data.
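The variable substitution mentioned here can be sketched as follows. The sketch is in Python rather than Perl purely for illustration; the write_greeting function, its name parameter, and the sample input are hypothetical, and the output is captured in memory where a real CGI program would write to standard output.

```python
import html
import io

def write_greeting(out, name):
    """Emit a CGI-style reply: Content-Type header, blank line, HTML body."""
    out.write("Content-Type: text/html\n")
    out.write("\n")
    safe = html.escape(name)  # keep user input from being read as markup
    out.write(f"<html><body><h1>Hello, {safe}!</h1></body></html>\n")

# Capture the output in memory here; a real CGI program writes to stdout.
buffer = io.StringIO()
write_greeting(buffer, "Pat <pat@example.com>")
reply = buffer.getvalue()
print(reply)
```

Escaping the user's input before substituting it into the page is a design choice worth keeping in any language, since form data can contain characters that HTML treats as markup.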

This simple, yet powerful, example uses the text/html MIME type/subtype, but there is no reason your CGI scripts can't return any other valid MIME type/subtype. Provided you've set up your Web server's MIME map and your users' Web browsers have corresponding helper application setup, there's almost no limit to what you can return from your CGI scripts. For example, the preceding Perl print statements could just as well be the following:

print "Content-Type: application/x-pluperstat\n";
print "\n";

Your script would then select and return a PluPerStat data file, based on information the user enters into a fill-in form on your Intranet. This way, you can make a library of PluPerStat data available on your Web server, enable your customers to grab pieces of it using their Web browsers, and then interact with the data using the PluPerStat program itself. You've just made your Intranet something more than just a look-at-pictures-and-read-text-files server: Your customers can actually use it for their real work.

Summary

This chapter is the heart of this book. In it, you've learned the following:

What Web helper applications are
What MIME is and where it came from
How the developers of the World Wide Web adopted MIME as a major part of Web technology
How Web servers and browsers use MIME to identify and process data
The relationship between MIME and Web browser helper applications
The basics of helper application setup in Explorer
How MIME and the CGI mechanism work together

The next chapter continues the discussion of MIME by showing you how to hook your office word processor into your Intranet. Later chapters talk about your own application programs and apply the information you've learned in this chapter to real programs that do real work.
