Java's URL class gives applets and applications easy access to the World Wide Web using the HTTP protocol. This is fine and dandy if you can get the information you need into a format that a Web server or CGI script can access. However, wouldn't it be nice if your code could talk directly to the server application without going through an intermediary CGI script or some sort of proxy? Wouldn't you like your Java-based Web browser to be able to display your wonderful new image format? This is where protocol and content handlers come in.
Handlers are classes that extend the capabilities of the standard URL class. A protocol handler provides a reference to a java.io.InputStream object (and a java.io.OutputStream object, where appropriate) that retrieves the content of a URL. Content handlers take an InputStream for a given MIME type and convert it into a Java object of the appropriate type.
MIME (Multipurpose Internet Mail Extensions) is the Internet standard for specifying the type of content a resource contains. As you may have guessed from the name, it was originally proposed as a way to enclose nontextual components in Internet e-mail. MIME allows different platforms (PCs, Macintoshes, UNIX workstations, and others) to exchange multimedia content in a common format.
The MIME standard, described in RFC 1521, defines an extra set of headers similar to those on Internet e-mail. The headers describe attributes such as the method of encoding the content and the MIME content type. MIME types are written as type/subtype, where type is a general category such as text or image and subtype is a more specific description of the format such as html or jpeg. For example, when a Web browser contacts an HTTP daemon to retrieve an HTML file, the daemon's response looks something like this:
Content-type: text/html

<HEAD><TITLE>Document moved</TITLE></HEAD>
<BODY><H1>Document moved</H1>
The Web browser parses the Content-type: header and sees that the data is text/html, that is, an HTML document. If it were a GIF image file, the header would have been Content-type: image/gif.
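As a small illustration of the type/subtype syntax, the following sketch splits a Content-type value into its two parts. The class name MimeTypeSketch and its parse() method are hypothetical names made up for this example; they are not part of the JDK. The sketch assumes that any parameters after a semicolon (such as a charset) can simply be ignored.

```java
import java.util.Locale;

/** Hypothetical helper: splits a value such as "text/html" into type and subtype. */
final class MimeTypeSketch {
    final String type;     // general category, for example "text"
    final String subtype;  // specific format, for example "html"

    private MimeTypeSketch(String type, String subtype) {
        this.type = type;
        this.subtype = subtype;
    }

    static MimeTypeSketch parse(String headerValue) {
        // Assumption: parameters such as "; charset=..." are not needed here.
        String value = headerValue;
        int semi = value.indexOf(';');
        if (semi >= 0) {
            value = value.substring(0, semi);
        }
        value = value.trim();

        int slash = value.indexOf('/');
        if (slash <= 0 || slash == value.length() - 1) {
            throw new IllegalArgumentException("Not a type/subtype pair: " + headerValue);
        }
        // MIME type names are case-insensitive, so normalize to lowercase.
        return new MimeTypeSketch(
                value.substring(0, slash).toLowerCase(Locale.ROOT),
                value.substring(slash + 1).toLowerCase(Locale.ROOT));
    }
}
```

Calling MimeTypeSketch.parse("image/gif") gives a type of image and a subtype of gif.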
IANA (Internet Assigned Numbers Authority), the group that maintains the lists of assigned protocol numbers and the like, is responsible for registering new content types. A current copy of the official MIME types is available from ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/. This site also has specifications or pointers to specifications for each type.
The exact procedure for loading a protocol or content handler depends on the Java implementation. The following instructions are based on Sun's Java Developers Kit and should work for any implementation derived from Sun's. If you have problems, check the documentation for your particular Java version.
In the JDK implementation, the URL class and helpers look for classes in the sun.net.www package. Protocol handlers should be in a package called sun.net.www.protocol.ProtocolName, where ProtocolName is the name of the protocol (such as ftp or http). The handler class itself should be named Handler. For example, the full name of the HTTP protocol handler class, provided by Sun with the JDK, is sun.net.www.protocol.http.Handler. To load your new protocol handler, you must construct a directory structure corresponding to the package names and add the directory to your CLASSPATH environment variable. Assume that you have a handler for a protocol (let's call it the foo protocol) and that your Java library directory is .../java/lib/ (...\java\lib\ on Windows machines). You must take the following steps to load the foo protocol:
If you place the netClass.zip file containing the network classes (located on the CD-ROM that accompanies this book) in your CLASSPATH, the example handlers should load correctly.
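The naming convention described above can be sketched in a few lines of Java. HandlerNameSketch is a hypothetical helper written for this illustration, not JDK code. The protocol rule comes straight from the text; the content handler rule (a slash becomes a package separator and any other non-alphanumeric character becomes an underscore) is an assumption generalized from the single content handler name used later in this chapter.

```java
/** Hypothetical helper that builds the class names a JDK-derived runtime looks for. */
final class HandlerNameSketch {

    /** Example: "foo" maps to "sun.net.www.protocol.foo.Handler". */
    static String protocolHandlerClass(String protocol) {
        return "sun.net.www.protocol." + protocol + ".Handler";
    }

    /**
     * Assumed rule: '/' becomes '.', letters and digits are kept,
     * and every other character becomes '_'.
     */
    static String contentHandlerClass(String mimeType) {
        StringBuilder name = new StringBuilder("sun.net.www.content.");
        for (int i = 0; i < mimeType.length(); i++) {
            char c = mimeType.charAt(i);
            boolean alphanumeric = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (c == '/') {
                name.append('.');
            } else if (alphanumeric) {
                name.append(c);
            } else {
                name.append('_');
            }
        }
        return name.toString();
    }
}
```

Under these assumptions, the class file for the foo protocol would live in a sun/net/www/protocol/foo/ directory somewhere on the CLASSPATH.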
Let's start extending Java with a handler for the finger protocol. The finger protocol is defined in RFC 1288 (the original specification was RFC 742). The server listens on TCP port 79. It expects either the user name for which you want information followed by ASCII carriage return and linefeed characters, or (if you want information on all users currently logged in) just the carriage return and linefeed characters. The information is returned as ASCII text in a system-dependent format (although most UNIX variants give similar information). We will use an existing class (fingerClient) to handle contacting the finger server and concentrate on developing the protocol handler.
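The fingerClient class itself is not listed in this chapter. To give a rough idea of what such a client has to do, here is a minimal sketch of the exchange just described. FingerQuerySketch, buildQuery(), and query() are hypothetical names for this example and are not the API of the class supplied on the CD-ROM.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.Reader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a finger client; not the fingerClient from the CD-ROM. */
final class FingerQuerySketch {
    static final int FINGER_PORT = 79;

    /** The request is the user name (possibly empty) followed by CR LF. */
    static String buildQuery(String user) {
        return (user == null ? "" : user) + "\r\n";
    }

    /** Sends the query and reads the reply until the server closes the connection. */
    static String query(String host, String user) throws IOException {
        try (Socket socket = new Socket(host, FINGER_PORT)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildQuery(user).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            Reader in = new InputStreamReader(socket.getInputStream(),
                                              StandardCharsets.US_ASCII);
            StringBuilder reply = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.append(buf, 0, n);
            }
            return reply.toString();
        }
    }
}
```

An empty user name produces a request of just the carriage return and linefeed, which asks the server about everyone who is logged in.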
The first decision we must make is how to structure URLs for our protocol. We'll imitate the HTTP URL and specify that finger URLs should be of the following format:
finger://host/user
In this syntax, host is the host to contact and user is an optional user to ask for information about. If the user name is omitted, we will return information about all users.
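The following sketch shows how a URL of this form breaks down into its host and user parts. It uses java.net.URI rather than java.net.URL, because URL rejects the finger scheme until a handler has been installed. FingerTarget is a hypothetical class invented for this example, and the fallback to localhost when no host is given is carried over from the handler developed below.

```java
import java.net.URI;

/** Hypothetical value class holding the two parts of a finger URL. */
final class FingerTarget {
    final String host;
    final String user;  // empty string means "all users"

    private FingerTarget(String host, String user) {
        this.host = host;
        this.user = user;
    }

    static FingerTarget parse(String url) {
        URI uri = URI.create(url);
        if (!"finger".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not a finger URL: " + url);
        }
        // A missing host falls back to the local machine.
        String host = uri.getHost();
        if (host == null || host.isEmpty()) {
            host = "localhost";
        }
        // The path is "/user"; "" or "/" means that no user was named.
        String path = uri.getPath();
        String user = (path != null && path.length() > 1) ? path.substring(1) : "";
        return new FingerTarget(host, user);
    }
}
```

For example, FingerTarget.parse("finger://example.org/alice") yields the host example.org and the user alice, while finger://example.org yields an empty user name.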
Because we already have a fingerClient class written, we need to write only the subclasses of URLStreamHandler and URLConnection. Our stream handler will use the client object to format the returned information using HTML. The handler will write the content into a StringBuffer, which will be used to create a StringBufferInputStream. The fingerConnection, a subclass of URLConnection, will take this stream and implement the getInputStream() and getContent() methods.
In our implementation, the protocol handler object does all the work of retrieving the remote content; the connection object simply hands back the data from the stream it was given. Usually, the connection object would retrieve the content itself: the openConnection() method would only create the connection object, and getInputStream() would connect to the remote location and return a stream to read the contents. In our case, the protocol is very simple (compared to something as complex as FTP) and we can handle everything in the protocol handler.
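For comparison, here is a minimal sketch of the more usual division of labor, in which the connection object produces the content only when it is asked for it. All class names (LazyConnectionSketch, LazyHandlerSketch, LazyDemo) and the demo scheme are made up for this illustration, and the "remote" content is simulated with a string so that the example runs without a network. The handler is passed directly to the URL constructor, which avoids the CLASSPATH arrangement described earlier; newer Java releases mark that constructor as deprecated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

/** Hypothetical connection that does its work lazily, in connect(). */
class LazyConnectionSketch extends URLConnection {
    private InputStream in;

    LazyConnectionSketch(URL u) {
        super(u);
    }

    @Override
    public void connect() throws IOException {
        if (connected) {
            return;
        }
        // A real handler would open a socket to url.getHost() here.
        String body = "content for " + url.getHost() + url.getFile();
        in = new ByteArrayInputStream(body.getBytes(StandardCharsets.US_ASCII));
        connected = true;
    }

    @Override
    public InputStream getInputStream() throws IOException {
        connect();  // connect on first use
        return in;
    }
}

/** The handler only creates the connection object; it retrieves nothing itself. */
class LazyHandlerSketch extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL u) {
        return new LazyConnectionSketch(u);
    }
}

final class LazyDemo {
    /** Reads the whole resource named by spec using the sketch handler. */
    static String fetch(String spec) {
        try {
            URL u = new URL(null, spec, new LazyHandlerSketch());
            InputStream in = u.openConnection().getInputStream();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this arrangement, creating the connection is cheap, and a caller that never reads from it never triggers the retrieval.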
The source for the fingerConnection class should go in the same file as the Handler class. The constructor stores the InputStream passed to it and calls the URLConnection constructor. It also sets the doOutput flag of URLConnection to false to indicate that the connection cannot take input (that is, nothing can be written to it). Listing 24.1 contains the source for this class.
Listing 24.1. The fingerConnection class.
class fingerConnection extends URLConnection {
    InputStream in;

    fingerConnection( URL u, InputStream in ) {
        super( u );
        this.in = in;
        // The connection is read-only: nothing can be written to it.
        this.setDoOutput( false );
    }

    public void connect( ) {
        // Nothing to do; the handler has already retrieved the data.
        return;
    }

    public InputStream getInputStream( ) throws IOException {
        return in;
    }

    public Object getContent( ) throws IOException {
        StringBuffer retval = new StringBuffer( );
        int nbytes;
        byte buf[] = new byte[ 1024 ];
        try {
            while( (nbytes = in.read( buf, 0, 1024 )) != -1 ) {
                // Treat each byte as one ISO-8859-1 character.
                retval.append( new String( buf, 0, nbytes, "ISO-8859-1" ) );
            }
        } catch( Exception e ) {
            System.err.println( "fingerConnection::getContent: Exception\n" + e );
            e.printStackTrace( System.err );
        }
        return retval.toString( );
    }
}
First, let's rough out the skeleton of the Handler.java file. We need the package statement so that our classes are compiled into the package where the runtime handler will look for them. We also import the fingerClient object here. The outline of the class is shown in Listing 24.2.
Listing 24.2. Protocol handler skeleton.
package sun.net.www.protocol.finger;

import java.io.*;
import java.net.*;
import sun.net.www.protocol.finger.fingerClient;

// fingerConnection source goes here

public class Handler extends URLStreamHandler {
    // openConnection() Method
}
Now we'll develop the method responsible for returning an appropriate URLConnection object to retrieve a given URL. The method starts out by allocating a StringBuffer to hold our return data. We also will parse out the host name and user name from the URL argument. If the host was omitted, we default to localhost. The code for openConnection() is given in Listings 24.3 through 24.6.
Listing 24.3. The openConnection() method: parsing the URL.
public synchronized URLConnection openConnection( URL u ) {
    StringBuffer sb = new StringBuffer( );
    String host = u.getHost( );
    // getFile() returns "/user", or "" or "/" when no user was given
    String file = u.getFile( );
    String user = (file.length( ) > 1) ? file.substring( 1 ) : "";
    if( host.equals( "" ) ) {
        host = "localhost";
    }
Next, the method writes an HTML header into the buffer (see Listing 24.4). This allows a Java-based Web browser to display the finger information in a nice-looking format.
Listing 24.4. The openConnection() method: writing the HTML header.
    sb.append( "<HTML><head>\n");
    sb.append( "<title>Fingering " );
    sb.append( (user.equals("") ? "everyone" : user) );
    sb.append( "@" + host );
    sb.append( "</title></head>\n" );
    sb.append( "<body>\n" );
    sb.append( "<pre>\n" );
We'll then use the fingerClient class to get the information into a String and then append it to our buffer. If there is an error while getting the finger information, we will put the error message from the exception into the buffer instead (see Listing 24.5).
Listing 24.5. The openConnection() method: retrieving the finger information.
    try {
        String info = null;
        info = (new fingerClient( host, user )).getInfo( );
        sb.append( info );
    } catch( Exception e ) {
        sb.append( "Error fingering: " + e );
    }
Finally, we'll close off the open HTML tags and create a fingerConnection object that will be returned to the caller (see Listing 24.6).
Listing 24.6. The openConnection() method: finishing the HTML and returning a fingerConnection object.
    sb.append( "\n</pre></body>\n</html>\n" );
    return new fingerConnection( u,
        (new StringBufferInputStream( sb.toString( ) ) ) );
}
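The page-building steps in Listings 24.4 through 24.6 can also be collected into one small helper, sketched below. FingerPageSketch and its methods are hypothetical names for this illustration. Two details differ from the listings and should be read as assumptions rather than as the chapter's method: the finger output is HTML-escaped before it is placed inside the pre element, and the stream is created with ByteArrayInputStream because StringBufferInputStream has been deprecated since JDK 1.1.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that wraps finger output in a simple HTML page. */
final class FingerPageSketch {

    static String buildPage(String host, String user, String info) {
        String who = user.isEmpty() ? "everyone" : user;
        StringBuilder sb = new StringBuilder();
        sb.append("<html><head>\n");
        sb.append("<title>Fingering ").append(who).append('@').append(host);
        sb.append("</title></head>\n");
        sb.append("<body>\n<pre>\n");
        sb.append(escape(info));  // addition: keep '<' and '&' from being read as markup
        sb.append("\n</pre></body>\n</html>\n");
        return sb.toString();
    }

    /** Escapes the three characters that matter inside element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Turns the finished page into a stream a URLConnection can hand out. */
    static InputStream toStream(String page) {
        return new ByteArrayInputStream(page.getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

A connection class such as fingerConnection could be given the result of toStream() in place of the StringBufferInputStream.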
Once all the code is compiled and in the right locations, load the urlFetcher applet provided on the CD-ROM that accompanies this book and enter a finger URL. If everything loads right, you should see something like Figure 24.1. If you get an error that says something along the lines of BAD URL "finger://...": unknown protocol, check that you have your CLASSPATH set correctly.
Figure 24.1: The urlFetcher applet displaying a finger URL.
The content handler example presented in this section is for the MIME-type text/tab-separated-values. If you have ever used a spreadsheet or database program, this type will be familiar. Many applications can import and export data in an ASCII text file, where each column of data in a row is separated by a tab character (\t). The first line is interpreted as the names of the fields, and the remaining lines are the actual data.
Our first design decision is to figure out what type of Java object or objects to use to map the tab-separated values. Because this is a textual content, some sort of String object would seem to be the best solution. The spreadsheet characteristics of rows and columns of data can be represented by arrays. Putting these two facts together gives us a data type of String[][], or an array of arrays of String objects. The first array is an array of String[] objects, each representing one row of data. Each of these arrays consists of a String for each cell of the data.
We'll also need to have some way of breaking the input stream into separate fields. We will make a subclass of java.io.StreamTokenizer to handle this task. The StreamTokenizer class provides methods for breaking an InputStream into individual tokens. You may want to browse through the entry for StreamTokenizer in the API reference if you are not familiar with it.
Content handlers are implemented by subclassing the java.net.ContentHandler class. These subclasses are responsible for implementing a getContent() method. We'll start with the skeleton of the class and then import the networking and I/O packages as well as the java.util.Vector class. We also will define the skeleton for our tabStreamTokenizer class. Listing 24.7 shows the skeleton for this content handler.
Listing 24.7. Content handler skeleton.
/*
 * Handler for text/tab-separated-values MIME type.
 */

// This needs to go in this package for JDK-derived
// Java implementations
package sun.net.www.content.text;

import java.net.*;
import java.io.*;
import java.util.Vector;

class tabStreamTokenizer extends StreamTokenizer {
    public static final int TT_TAB = '\t';
    // Constructor
}

public class tab_separated_values extends ContentHandler {
    // getContent method
}
We will first define the class that breaks the input into the separate fields. Most of the functionality we need is provided by the StreamTokenizer class, so we only have to define a constructor that specifies the character classes needed to get the behavior we want. For the purposes of this content handler, there are three types of tokens: TT_WORD tokens, which hold the contents of the fields (the TT_TAB characters that separate them are skipped as whitespace); TT_EOL tokens, which signal the end of a line (that is, the end of a row of data); and TT_EOF tokens, which signal the end of the input file. Because this class is relatively simple, it is presented in its entirety in Listing 24.8.
Listing 24.8. The tabStreamTokenizer class.
class tabStreamTokenizer extends StreamTokenizer {
    public static final int TT_TAB = '\t';

    tabStreamTokenizer( InputStream in ) {
        super( in );
        // Undo parseNumbers() and whitespaceChars(0, ' ')
        ordinaryChars( '0', '9' );
        ordinaryChar( '.' );
        ordinaryChar( '-' );
        ordinaryChars( 0, ' ' );
        // Everything but TT_EOL and TT_TAB is a word
        wordChars( 0, ('\t'-1) );
        wordChars( ('\t'+1), 255 );
        // Tabs only separate fields; TT_EOL is returned as a token of its own.
        whitespaceChars( TT_TAB, TT_TAB );
        ordinaryChar( TT_EOL );
    }
}
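To see what this configuration does to a stream of text, the sketch below applies the same character-class calls to a StreamTokenizer built on a Reader and records the tokens it returns. TabTokenDemo and its methods are hypothetical names for this example; the Reader-based constructor is used only so that the demo can read from a String.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo of the character classes set up in Listing 24.8. */
final class TabTokenDemo {

    static StreamTokenizer newTokenizer(Reader r) {
        StreamTokenizer st = new StreamTokenizer(r);
        // Same calls as the tabStreamTokenizer constructor.
        st.ordinaryChars('0', '9');
        st.ordinaryChar('.');
        st.ordinaryChar('-');
        st.ordinaryChars(0, ' ');
        st.wordChars(0, '\t' - 1);
        st.wordChars('\t' + 1, 255);
        st.whitespaceChars('\t', '\t');
        st.ordinaryChar(StreamTokenizer.TT_EOL);
        return st;
    }

    /** Returns a readable trace of the tokens found in text. */
    static List<String> tokens(String text) {
        StreamTokenizer st = newTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    out.add("WORD(" + st.sval + ")");
                    break;
                case StreamTokenizer.TT_EOL:
                    out.add("EOL");
                    break;
                default:
                    out.add("OTHER(" + st.ttype + ")");
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

For the input "name\tage\nalice\t30\n" the trace is WORD(name), WORD(age), EOL, WORD(alice), WORD(30), EOL. Because tabs are treated as whitespace, two tabs in a row are skipped together, so an empty field does not produce a token of its own.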
Subclasses of ContentHandler must provide an implementation of getContent() that returns a reference to an Object. The method takes as its parameter a URLConnection object from which the class can obtain an InputStream to read the resource's data.
First, we'll define the overall structure and method variables. We need a flag (which we'll call done) to signal when we've read all the field names from the first line of text. The number of fields (columns) in each row of data will be determined by the number of fields in the first line of text and will be kept in an int variable called numFields. We also will declare another integer, index, for use while inserting the rows of data into a String[].
We need some method of holding an arbitrary number of objects because we cannot tell the number of data rows in advance. To do this, we'll use the java.util.Vector object, which we'll call lines, to keep each String[]. Finally, we will declare an instance of our tabStreamTokenizer, using the getInputStream() method from the URLConnection passed as an argument to the constructor. Listing 24.9 shows the skeleton code for the getContent() method.
Listing 24.9. The getContent() skeleton.
public Object getContent( URLConnection con ) throws IOException {
    boolean done = false;
    int numFields = 0;
    int index = 0;
    Vector lines = new Vector();
    tabStreamTokenizer in =
        new tabStreamTokenizer( con.getInputStream( ) );

    // Read in the first line of data (Listings 24.10 & 24.11)
    // Read in the rest of the file (Listing 24.12)
    // Stuff all data into a String[][] (Listing 24.13)
}
The first line of the file will tell us the number of fields and the names of the fields in each row for the rest of the file. Because we don't know beforehand how many fields there are, we'll keep each field in a Vector called firstLine. Each TT_WORD token that the tokenizer returns is the name of one field. We know we are done once it returns a TT_EOL token and can set the done flag to true. We will use a switch statement on the ttype member of our tabStreamTokenizer to decide what action to take. This is done in the code in Listing 24.10.
Listing 24.10. Reading the first line of data.
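    Vector firstLine = new Vector( );
    while( !done && in.nextToken( ) != StreamTokenizer.TT_EOF ) {
        // Case labels must be compile-time constants, so the token
        // types are named through the class rather than through in.
        switch( in.ttype ) {
        case StreamTokenizer.TT_WORD:
            firstLine.addElement( in.sval );
            numFields++;
            break;
        case StreamTokenizer.TT_EOL:
            done = true;
            break;
        }
    }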
Now that we have the first line in memory, we need to build an array of String objects from those stored in the Vector. To accomplish this, we'll first allocate the array to the size just determined. Then we will use the copyInto() method to transfer the strings into the array just allocated. Finally, we'll insert the array into lines (see Listing 24.11).
Listing 24.11. Copying field names into an array.
    // Copy first line into array
    String curLine[] = new String[ numFields ];
    firstLine.copyInto( curLine );
    lines.addElement( curLine );
Before reading the remaining data, we have to allocate a new array to hold the next row. Then we loop until encountering the end of the file, signified by TT_EOF. Each time we retrieve a TT_WORD, we insert the String into curLine and increment index.
The end of the line lets us know when a row of data is done, at which time we will copy the current line into Vector. Then we will allocate a new String[] to hold the next line and set index back to zero (to insert the next item starting at the first element of the array). The code to implement this is given in Listing 24.12.
Listing 24.12. Reading the rest of the data.
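    curLine = new String[ numFields ];
    while( in.nextToken( ) != StreamTokenizer.TT_EOF ) {
        switch( in.ttype ) {
        case StreamTokenizer.TT_WORD:
            // Ignore extra fields in rows longer than the header row
            if( index < numFields ) {
                curLine[ index++ ] = in.sval;
            }
            break;
        case StreamTokenizer.TT_EOL:
            lines.addElement( curLine );
            curLine = new String[ numFields ];
            index = 0;
            break;
        }
    }
    // Keep a final row that is not terminated by a newline
    if( index > 0 ) {
        lines.addElement( curLine );
    }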
At this point in the code, all the data has been read in. All that remains is to copy the data from lines into an array of arrays of String, as shown in Listing 24.13.
Listing 24.13. Returning TSV data as String[][].
    String retval[][] = new String[ lines.size() ][];
    lines.copyInto( retval );
    return retval;
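One limitation of the tokenizer-based approach is that an empty cell (two tabs in a row) produces no token, so later cells in that row shift to the left. If that matters for your data, a different technique is to split each line on the tab character, as in the sketch below. This is an alternative to the chapter's method rather than a description of it, and TsvSplitSketch is a hypothetical name for this illustration.

```java
/** Hypothetical alternative parser that preserves empty cells. */
final class TsvSplitSketch {

    /** Parses tab-separated text into rows of cells; row 0 holds the field names. */
    static String[][] parse(String text) {
        if (text.isEmpty()) {
            return new String[0][];
        }
        // Accept both LF and CRLF line endings; a trailing terminator
        // does not create an extra empty row.
        String[] rows = text.split("\r?\n");
        String[][] table = new String[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            // A limit of -1 keeps trailing empty cells.
            table[i] = rows[i].split("\t", -1);
        }
        return table;
    }
}
```

With this version, the text "name\tnote\nalice\t\n" yields two rows of two cells each, the last cell being an empty string.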
To show how the content handler works, we'll modify the urlFetcher applet used earlier in this chapter to demonstrate the finger protocol handler. We'll change it to use the getContent() method to retrieve the contents of a resource rather than reading the data from the stream returned by getInputStream(). We'll show the changes to the doFetch() method of the urlFetcher applet necessary to determine what type of Object was returned and display it correctly. The first change is to call the getContent() method and get an Object back rather than getting an InputStream. Listing 24.14 shows this change.
Listing 24.14. Modified urlFetcher.doFetch() code: call getContent() to get an Object.
try {
    boolean displayed = false;
    URLConnection con = target.openConnection();
    Object obj = con.getContent( );
Next come tests using the instanceof operator. We handle String objects and arrays of String objects by placing the text into the TextArea. Arrays are printed item by item. If the object is a subclass of InputStream, we read the data from the stream and display it. Image content is just noted as being an Image. For any other content type, we simply throw our hands up and remark that we cannot display the content (because the urlFetcher applet is not a full-fledged Web browser). The code to do this is shown in Listing 24.15.
Listing 24.15. Modified urlFetcher.doFetch() code: determine the type of the Object and display it.
    if( obj instanceof String ) {
        contentArea.setText( (String) obj );
        displayed = true;
    }
    if( obj instanceof String[] ) {
        String array[] = (String []) obj;
        StringBuffer buf = new StringBuffer( );
        for( int i = 0; i < array.length; i++ )
            buf.append( "item " + i + ": " + array[i] + "\n" );
        contentArea.setText( buf.toString( ) );
        displayed = true;
    }
    if( obj instanceof String[][] ) {
        String array[][] = (String [][]) obj;
        StringBuffer buf = new StringBuffer( );
        for( int i = 0; i < array.length; i++ ) {
            buf.append( "Row " + i + ":\n\t" );
            for( int j = 0; j < array[i].length; j++ )
                buf.append( "item " + j + ": " + array[i][j] + "\t" );
            buf.append( "\n" );
        }
        contentArea.setText( buf.toString() );
        displayed = true;
    }
    if( obj instanceof Image ) {
        contentArea.setText( "Image" );
        displayed = true;
    }
    if( obj instanceof InputStream ) {
        int c;
        StringBuffer buf = new StringBuffer( );
        while( (c = ((InputStream) obj).read( )) != -1 )
            buf.append( (char) c );
        contentArea.setText( buf.toString( ) );
        displayed = true;
    }
    if( !displayed ) {
        contentArea.setText( "Don't know how to display "
            + obj.getClass().getName( ) );
    }
    // Same code to display content type and length
} catch( IOException e ) {
    showStatus( "Error fetching \"" + target + "\": " + e );
    return;
}
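The same instanceof tests can be tried outside the applet. The sketch below is a hypothetical stand-alone helper (ContentDescriber is not part of the urlFetcher source) that returns the text the applet would place in its TextArea for the String-based content types. It relies on the fact that a String[][] is not an instance of String[], so the two array tests cannot both match the same object.

```java
/** Hypothetical stand-alone version of the applet's type tests. */
final class ContentDescriber {

    static String describe(Object obj) {
        if (obj instanceof String) {
            return (String) obj;
        }
        if (obj instanceof String[]) {
            String[] array = (String[]) obj;
            StringBuilder buf = new StringBuilder();
            for (int i = 0; i < array.length; i++) {
                buf.append("item ").append(i).append(": ").append(array[i]).append("\n");
            }
            return buf.toString();
        }
        if (obj instanceof String[][]) {
            String[][] array = (String[][]) obj;
            StringBuilder buf = new StringBuilder();
            for (int i = 0; i < array.length; i++) {
                buf.append("Row ").append(i).append(":\n\t");
                for (int j = 0; j < array[i].length; j++) {
                    buf.append("item ").append(j).append(": ")
                       .append(array[i][j]).append("\t");
                }
                buf.append("\n");
            }
            return buf.toString();
        }
        // Anything else is reported by class name, as in the applet.
        return "Don't know how to display " + obj.getClass().getName();
    }
}
```

Passing the String[][] returned by the tab-separated-values handler to describe() gives one "Row n:" block per line of the file.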
The complete modified applet source is on the CD-ROM that accompanies this book as urlFetcher_Mod.java in the tsvContentHandler directory. Figure 24.2 shows what the applet will look like when displaying text/tab-separated-values. The file displayed in the figure is included on the CD-ROM as example.tsv.
Figure 24.2: The urlFetcher_Mod applet.
Most HTTP daemons should return the correct content type for files ending in .tsv. Many Web browsers have a menu option that shows you information such as the content type about a URL (for example, the View | Document Info option in Netscape Navigator). You can use this feature to see what MIME type the sample data is being returned as. If the data does not show up as text/tab-separated-values, try one of the following things:
After reading this chapter, you should have an understanding of how Java can be extended fairly easily to deal with new application protocols and data formats. You now know what classes you have to derive your handlers from (URLConnection and URLStreamHandler for protocol handlers, ContentHandler for content handlers) and how to get Java to load the new handler classes.