Chapter 5

Building Special-Purpose I/O Classes


The Java I/O library is designed so that you can extend it to work well with the kind of data you are using. You can extend the Java I/O system in several different ways. You can implement a file-like interface to an object that is not a file (for example, an in-memory array of bytes). You can create a filter stream, which is a special kind of I/O stream class that can transform or perform other special handling on the input or output of an existing data stream. You also can implement a class that reads and interprets a structured file, permitting an application to treat the file as a data structure, rather than having to interpret the format itself. This chapter explores the Java I/O system and the ways that you can enhance it to meet your own needs.

Stream Classes

Java I/O is based largely on I/O streams, which provide a mostly sequential view of file-like objects. The two basic stream classes are java.io.InputStream and java.io.OutputStream. They are fairly simple classes that permit reading or writing data as bytes. The majority of the classes in the java.io package extend one of those two classes.

InputStream and OutputStream are abstract classes, and the interface they provide is rather simple and abstract. They permit reading and writing single bytes or arrays of bytes; no other data types are permitted. Readers can query how many bytes are available for reading without blocking.

In addition, the InputStream class provides the interface (but not the implementation) for the mark mechanism, a simple yet versatile lookahead interface. Subclasses aren't required to support marks. If subclasses do not support marks, they must simply return false when their markSupported method is called. If they do support marks, the caller can invoke the mark(readlimit) method, which tells the input stream to save the current position and prepare to save the next bytes that are read. The parameter, readlimit, is an integer that specifies the maximum number of bytes that the stream will need to save. If the reset method is called before readlimit bytes have been read (and before the mark method is called again), the stream must back up to the marked position in the stream.
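
The mark mechanism described above can be sketched as follows. This is only an illustration: the class name MarkResetSketch and the helper peek are invented for the example, and ByteArrayInputStream is used simply because it is a source that supports marks.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class MarkResetSketch {

    // Peeks at the first n bytes of a stream, then rewinds so that the
    // caller still sees those bytes. Returns null if the stream does not
    // support marks or holds fewer than n bytes.
    static byte[] peek(InputStream in, int n) {
        if (!in.markSupported()) {
            return null;
        }
        try {
            in.mark(n);                 // remember the current position
            byte[] head = new byte[n];
            int got = in.read(head, 0, n);
            in.reset();                 // back up to the marked position
            return got == n ? head : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30, 40});
        byte[] head = peek(in, 2);
        System.out.println(head[0] + " " + head[1]);  // the two peeked bytes
        System.out.println(in.read());                // still the first byte
    }
}
```

Because peek never reads more than the readlimit it passed to mark, the call to reset is always within the limits the stream agreed to.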

Sources and Sinks

It was mentioned previously that the basic I/O stream classes are abstract classes that need to be extended before they're useful. The first thing that you might notice about them is that they don't provide any mechanism for attaching the streams to any real data objects, such as files or network connections. That's the job of several extended stream classes, which I call source and sink classes. Source classes provide access to data sources, and sink classes provide destinations for output operations. The sources and sinks that are included as a part of the Java library, and the kinds of data objects to which they connect, are listed in Table 5.1.

Table 5.1. Java Source and Sink I/O Streams.

Class Names                      Type of Data Object
-------------------------------  ---------------------------------------
FileInputStream,                 Disk file
FileOutputStream
ByteArrayInputStream,            In-memory arrays of bytes
ByteArrayOutputStream
PipedInputStream,                These two classes connect to each other
PipedOutputStream
SequenceInputStream              Several other streams, in sequence
StringBufferInputStream          A String instance
java.net.SocketInputStream,      Network sockets
java.net.SocketOutputStream

The PipedInputStream and PipedOutputStream classes are interesting because they connect to each other, enabling one part of your Java program to read output produced by another part. Usually the two parts that use the piped streams (called the producer and the consumer) are in different threads, to minimize the possibility of causing a deadlock. However, even in different threads, it's possible to have a deadlock. For example, the consumer might be blocked while waiting on the producer to write more data, while at the same time the producer can't finish computing the data until the consumer takes some additional action. Use the piped streams with care. (See Chapter 6, "Effective Use of Threads," for more information about using Java threads.)
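
A minimal sketch of the producer and consumer arrangement follows. The class name PipeSketch and the method transfer are made up for the example. The producer runs in its own thread and closes its end of the pipe when it is finished, which is what lets the consumer see the end of the stream instead of blocking forever.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

class PipeSketch {

    // Runs a producer in its own thread and consumes its output in the
    // calling thread. Returns everything the consumer read, as a String.
    static String transfer(final String message) {
        try {
            final PipedOutputStream out = new PipedOutputStream();
            PipedInputStream in = new PipedInputStream(out);

            Thread producer = new Thread(new Runnable() {
                public void run() {
                    try {
                        out.write(message.getBytes("US-ASCII"));
                        out.close();   // signals end of stream to the reader
                    } catch (IOException e) {
                        // A real program would report this to the consumer.
                    }
                }
            });
            producer.start();

            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            in.close();
            producer.join();
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transfer("hello through a pipe"));
    }
}
```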

As an example of how one of these classes would typically be used, here is a code fragment that opens a file and reads the first four bytes (ignoring the possibility of exceptions):

InputStream f = new FileInputStream("Applet.class");
byte sig[] = new byte[4];
f.read(sig);

Figure 5.1 depicts the relationships between the stream classes listed previously and their data objects.

Figure 5.1 : Relationship between streams and data objects.

Filter Streams

In the table of source and sink streams, two very important ones are omitted. Strictly speaking, they belong in that table, because they do provide another way to connect an I/O stream to a data object. However, they are so useful that they also can be thought of as something much more.

FilterInputStream and FilterOutputStream are source and sink classes that use other streams as their data objects. Filter streams can extend the interface of an existing stream object, transform the data as it is read or written, or transparently provide some useful service such as buffering. Because they are themselves streams, they can use other filter streams as data objects. This means that you can compose the functions provided by filter streams by chaining several of them together into a new, composite filter. Figure 5.2 illustrates the idea.

Figure 5.2 : Input and output filter streams.

Like the basic InputStream and OutputStream classes, FilterInputStream and FilterOutputStream are meant to be extended: on their own they simply pass every request through to the underlying stream, so subclasses are required if they are to do anything useful. There are several useful filter streams supplied with the Java library, however. The functions of the filter streams differ more strongly than the sources and sinks do, so it makes more sense to explain them than to list them in a table.

The BufferedInputStream and BufferedOutputStream classes provide I/O buffering. The basic source and sink streams of the Java I/O system don't provide any buffering; when they are asked to read or write some data, they immediately pass the request on to the underlying data object. When the object is a file or network connection, that strategy can result in poor performance. An instance of one of the buffered stream classes maintains an internal buffer and uses the buffer to satisfy I/O requests whenever possible. Only when the buffer is empty (in the case of an input stream) or full (in the case of an output stream) is the underlying source or sink invoked again.

Typically, when you create a filter stream, you pass the next stream in the chain to the new filter stream when it is initialized:

InputStream f = new BufferedInputStream(new FileInputStream("Applet.class"));

The DataInputStream and DataOutputStream classes provide a more structured interface to data. Unlike the other streams mentioned so far, the data streams don't restrict input and output to units of bytes. They provide interfaces for reading and writing the primitive Java datatypes, such as int, double, and boolean, in addition to a few other useful constructs, such as text lines and UTF (byte-encoded Unicode) strings. The data streams read and write these objects in a binary format, but they do it in a portable way, so a file written using DataOutputStream on one machine can be read later using DataInputStream on another machine with a different architecture.
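
As a rough illustration of chaining, the following sketch writes three values with DataOutputStream and reads them back through a chain of DataInputStream, BufferedInputStream, and ByteArrayInputStream. The class name DataStreamSketch and the particular values are assumptions made for the example. An in-memory array stands in for a file so that the fragment is self-contained.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DataStreamSketch {

    // Writes three values in the portable binary format.
    static byte[] encode(int count, double ratio, String name) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(sink);
            out.writeInt(count);       // 4 bytes
            out.writeDouble(ratio);    // 8 bytes
            out.writeUTF(name);        // 2-byte length, then the encoded text
            out.flush();
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the values back, in the same order, and formats them.
    static String decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(data)));
            int count = in.readInt();
            double ratio = in.readDouble();
            String name = in.readUTF();
            return count + " " + ratio + " " + name;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(42, 0.5, "Applet")));
    }
}
```

The reader must ask for the values in exactly the order in which they were written, because the binary format carries no type information.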

LineNumberInputStream extends the basic stream functionality by keeping track of the line number from which text is currently being read. This can be very useful when writing a parser that needs to report line numbers along with error messages to help users find the source of problems.

The PrintStream class extends the OutputStream interface by providing several methods for producing formatted textual output, including print and println methods for all the basic Java datatypes. PrintStream even provides a method for printing arbitrary objects (it calls String.valueOf(obj) to produce a printable representation of the object).

The intent of the PushbackInputStream class is to provide a lookahead interface that is slightly simpler (and less costly) than the full-fledged mark/reset mechanism described previously. When using the PushbackInputStream class, you are allowed to look ahead by only one byte. Instead of the mark and reset methods, a simpler unread method is available, which takes a single byte as an argument. Calling unread more than once between calls to read results in an IOException being thrown.
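
One-byte lookahead is enough for many small parsers. The following sketch reads a run of digits and pushes back the first byte that doesn't belong to the number. The class name PushbackSketch and its two helper methods are invented for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class PushbackSketch {

    // Reads a run of decimal digits and leaves the stream positioned at
    // the first byte that is not a digit. Returns -1 if no digit was found.
    static int readNumber(PushbackInputStream in) throws IOException {
        int value = -1;
        int c;
        while ((c = in.read()) >= '0' && c <= '9') {
            value = (value < 0 ? 0 : value * 10) + (c - '0');
        }
        if (c != -1) {
            in.unread(c);   // put the single lookahead byte back
        }
        return value;
    }

    // Parses text of the form <number><operator><number>, such as "123+45".
    static String parse(String text) {
        try {
            PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)));
            int left = readNumber(in);
            char op = (char) in.read();
            int right = readNumber(in);
            return left + " " + op + " " + right;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("123+45"));
    }
}
```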

Editing, Transformation, and Selection with Streams

Most of the streams included in the Java library are simple utility streams. It's possible to build much more sophisticated streams, however. You can build streams that edit raw data to cast it into a new form; one example would be a source code pretty-printing class. Other streams might translate data into an entirely different format. It's also possible to build filter streams that perform a more conventional type of filtering, letting only lines, words, or records that meet certain criteria pass through.

The really useful thing about all these various stream classes is that each of them inherits from one of the base classes InputStream and OutputStream, so they can be treated as instances of those types when desired. If you need to call a method that takes one of those two base classes as a parameter, and you don't want to give that method the raw data stream, you can simply tack a filter stream (or a whole chain of them) onto the original stream and pass the last filter stream into the method.

An Example FilterStream

To demonstrate how to build a filter stream, let's look at a class that decodes a stream that is encoded in "base64" format. Base64 is an encoding format designed for the Multipurpose Internet Mail Extensions (MIME) standard to permit binary data to be sent through electronic mail without being garbled. It is similar to the UNIX "uuencode" format, but base64 is better defined, and its designers were careful to use only characters that would not be changed or dropped by existing mail gateways. I've chosen base64 for an example because it's an extremely simple format, so the details of the format conversion won't obscure the basic techniques for building a stream class.

An Internal Filter

The Base64InputStream class illustrates one handy but atypical use of filter streams. Commonly, application code is in control of all the filters in the chain of streams. It's useful in this case, though, for the decoding stream to slip another stream into the chain, to partition the task. The base64 specification recommends that whitespace characters (space, tab, carriage return, linefeed, and formfeed) be ignored in base64-encoded files. If there's another stream class ahead of the decoder, which strips out all whitespace, we can avoid having to worry about that in the center of our decoding routine. Listing 5.1 contains the WSStripInputStream class.


Listing 5.1. WSStripInputStream.java.
/*
 * WSStripInputStream.java       1.0 96/01/25 Glenn Vanderburg
 */
 
package COM.MCP.Samsnet.tjg;

import java.io.*;

/**
 * An input stream which strips out all whitespace characters.
 *
 * @version     1.0, 25 Jan 1996
 * @author      Glenn Vanderburg
 */

class WSStripInputStream extends FilterInputStream {

    /**
     * Constructs a new WSStripInputStream initialized with the
     * specified input stream
     * @param in the input stream
     */
    public WSStripInputStream(InputStream in) {
        super(in);
    }

    /**
     * Reads a byte of data.  The method will block if no input is available.
     * @return  the byte read, or -1 if the end of the stream is reached.
     * @exception IOException If an I/O error has occurred.
     */
    public int read() throws IOException {

        // This is the routine that really implements the special
        // functionality of this class; the others just call this
        // one to get the data that they need.
        int c;
        do {
            c = in.read();
        } while ((c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f')
                 && c != -1);
        return c;
    }

    /**
     * Reads into an array of bytes.
     * Blocks until some input is available.
     * @param b the buffer into which the data is read
     * @param off the start offset of the data
     * @param len the maximum number of bytes read
     * @return  the actual number of bytes read, -1 is
     *          returned when the end of the stream is reached.
     * @exception IOException If an I/O error has occurred.
     */
    public int read(byte b[], int off, int len) throws IOException {
        for (int i=off; i<off+len; i++) {
            int c = read();
            if (c == -1) {
                return i - off;
            }
            b[i] = (byte) c;
        }
        return len;
    }

    /**
     * Skips bytes of input.
     * @param n         bytes to be skipped
     * @return  actual number of bytes skipped
     * @exception IOException If an I/O error has occurred.
     */
    public long skip(long n) throws IOException {

        // Can't just read n bytes from 'in' and throw them
        // away, because n bytes from 'in' doesn't necessarily
        // correspond to n bytes from 'this'.
        for (int i=1; i <= n; i++) {
            int c = read();
            if (c == -1) {
                return i - 1;
            }
        }
        return n;
    }

    /**
     * Returns the number of bytes that can be read without blocking.
     * @return the number of available bytes
     */
    public int available() throws IOException {

        // We don't really know.  We can ask 'in', but some of those bytes
        // are probably whitespace, and it's possible that all of them are.
        // So we have to be conservative and return zero.
        return 0;
    }
}

The Base64 Decoding Filter

Once the WSStripInputStream class is done, it's relatively easy to build the Base64InputStream class. This implementation sacrifices efficiency for simplicity. As a result, the only thing moderately complicated is the fill_buffer method, which does some error checking and then, if all is well, performs the actual decoding. Listing 5.2 contains the Base64InputStream class. It makes use of a special exception, BadFormatException; the code for the exception is available on the CD-ROM that comes with this book. (Following the code listing is a short discussion of some design decisions that could have been made differently.)


Listing 5.2. Base64InputStream.java.
/*
 * Base64InputStream.java       1.0 96/01/17 Glenn Vanderburg
 */

package COM.MCP.Samsnet.tjg;

import java.io.*;
 
/**
 * An input stream which decodes a base64-encoded file.  
 *
 * @version     1.0, 17 Jan 1996
 * @author      Glenn Vanderburg
 */

public
class Base64InputStream extends FilterInputStream {

    /* Base64 padding character */
    static private byte pad = '=';

    static private int BADchAR = -1;

    /* Base64 decoding table.  */
    static private int c[] = new int[256];
    static {

        for (int i=0; i<256; i++) {
            c[i] = BADchAR;
        }

        c['A'] = 0;  c['B'] = 1;  c['C'] = 2;  c['D'] = 3;  c['E'] = 4;
        c['F'] = 5;  c['G'] = 6;  c['H'] = 7;  c['I'] = 8;  c['J'] = 9;
        c['K'] = 10; c['L'] = 11; c['M'] = 12; c['N'] = 13; c['O'] = 14;
        c['P'] = 15; c['Q'] = 16; c['R'] = 17; c['S'] = 18; c['T'] = 19;
        c['U'] = 20; c['V'] = 21; c['W'] = 22; c['X'] = 23; c['Y'] = 24;
        c['Z'] = 25; c['a'] = 26; c['b'] = 27; c['c'] = 28; c['d'] = 29;
        c['e'] = 30; c['f'] = 31; c['g'] = 32; c['h'] = 33; c['i'] = 34;
        c['j'] = 35; c['k'] = 36; c['l'] = 37; c['m'] = 38; c['n'] = 39;
        c['o'] = 40; c['p'] = 41; c['q'] = 42; c['r'] = 43; c['s'] = 44;
        c['t'] = 45; c['u'] = 46; c['v'] = 47; c['w'] = 48; c['x'] = 49;
        c['y'] = 50; c['z'] = 51; c['0'] = 52; c['1'] = 53; c['2'] = 54;
        c['3'] = 55; c['4'] = 56; c['5'] = 57; c['6'] = 58; c['7'] = 59;
        c['8'] = 60; c['9'] = 61; c['+'] = 62; c['/'] = 63;

        // The pad character doesn't have an encoding mapping, but
        // it's not an automatic error.
        c[pad] = -2;
    }

    /* Buffer for decoded characters that haven't been read */
    int buf[] = new int[3];
    int buffered = 0;

    /* Buffer for clusters of encoded characters */
    byte ebuf[] = new byte[4];

    boolean textfile;
    
    /**
     * Constructs a new Base64InputStream initialized with the
     * specified input stream.
     * @param in the input stream
     */
    public Base64InputStream(InputStream in) {
        this(in, false);
    }

    /**
     * Constructs a new Base64InputStream initialized with the
     * specified input stream, for a text file.
     * @param in the input stream
     * @param textfile true if the file is a text file
     */
    public Base64InputStream(InputStream in, boolean textfile) {

        // To make life easier, we slip a WSStripInputStream in just ahead
        // of us, so that we don't have to worry about whitespace characters.
        super(new WSStripInputStream(in));
        this.textfile = textfile;
    }

    /**
     * Reads a byte of data.  The method will block if no input is available.
     * @return  the byte read, or -1 if the end of the stream is reached.
     * @exception IOException If an I/O error has occurred.
     */
    public int read() throws IOException, BadFormatException {
        if (buffered == 0) {
            fill_buffer();
        }

        int b = buf[--buffered];

        if (textfile && b == '\r' && peek() == '\n') {
            return read();
        }
        else {
            return b;
        }
    }
          
    /**
     * Returns the next byte which will be read.  The method will
     * block if no input is available.
     * @return  the next byte to be read, or -1 if the end of the
     *          stream is reached.
     * @exception IOException If an I/O error has occurred.
     */
    public int peek() throws IOException, BadFormatException {
        if (buffered == 0) {
            fill_buffer();
        }

        return buf[buffered - 1];
    }
          
    /**
     * Reads into an array of bytes.
     * Blocks until some input is available.  This method should be overridden
     * in a subclass for efficiency (the default implementation reads 1 byte
     * at a time).
     * @param b the buffer into which the data is read
     * @param off the start offset of the data
     * @param len the maximum number of bytes read
     * @return  the actual number of bytes read, -1 is
     *          returned when the end of the stream is reached.
     * @exception IOException If an I/O error has occurred.
     */
    public int read(byte b[], int off, int len)
    throws IOException {
        for (int i=off; i<off+len; i++) {
            int c = read();
            if (c == -1) {
                // Return -1 only if EOF was hit before any byte was read.
                return (i == off) ? -1 : i - off;
            }
            b[i] = (byte) c;
        }
        return len;
    }

    /**
     * Skips bytes of input.
     * @param n         bytes to be skipped
     * @return  actual number of bytes skipped
     * @exception IOException If an I/O error has occurred.
     */
    public long skip(long n) throws IOException {

        // Can't just read n bytes from 'in' and throw them away, because
        // n bytes from 'in' will result in roughly (4n/3) bytes from 'this',
        // and we can't even calculate the exact number easily, because of
        // the potential of running into the padding at the end of the
        // encoding.  It's  easier to just read from 'this' and throw those
        // bytes away, even though it's less efficient.
        for (int i=1; i <= n; i++) {
            int c = read();
            if (c == -1) {
                return i - 1;
            }
        }
        return n;
    }

    /**
     * Fills buf with a new chunk of decoded data.
     */
    protected void fill_buffer()
    throws IOException, BadFormatException {
        if (buffered != 0) {  // Just for safety ...
            return;
        }

        int l = in.read(ebuf);
        int numbytes = 3;

        if (l <= 0) {  // Must've reached EOF last time (0 or -1) ...

            // Fill buffer with EOF indicators for read() to return.
            for (int i=0; i<buf.length; i++) {
                buf[i] = -1;
                buffered++;
            }
            return;
        }

        if (l < ebuf.length) {
            throw new EOFException();
        }

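        // Check for bad characters
        for (int i=0; i < ebuf.length; i++) {
            // Mask to 0..255 so bytes above 127 can't index out of bounds.
            if (c[ebuf[i] & 0xff] == BADchAR) {
                throw new BadFormatException("Base64: invalid character "
                                             + (char) ebuf[i]);
            }

            // While we're at it, take notice of padding.  Compare the
            // character itself; the table entry for pad is -2.
            if (ebuf[i] == pad) {
                if (i < 2) {
                    throw new BadFormatException("Base64: padding starts "
                                                 + "too soon");
                }
                // Only the first pad character determines the length.
                numbytes = Math.min(numbytes, i - 1);
            }
        }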

        // Now do the decoding
        for (    int i=0, j=4,    k=2;
                 i < numbytes;
                 i++,     j -= 2, k += 2) {

            buf[(numbytes - 1) - i] = (c[ebuf[i+1]] >> j)
                                       + ((c[ebuf[i]] << k) & 0xff);
            buffered++;
        }
    }
}
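
To make the shifting in the decoding loop easier to follow, here is a stand-alone sketch that decodes a single group of four unpadded characters into three bytes, using the same shift amounts as fill_buffer. The class name Base64QuadSketch, the alphabet string, and the method decodeQuad are assumptions for this illustration and are not part of the listing above.

```java
import java.nio.charset.StandardCharsets;

class Base64QuadSketch {

    // The 64 encoding characters, in value order (0 through 63).
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Decodes exactly four unpadded base64 characters into three bytes.
    static byte[] decodeQuad(String quad) {
        int[] v = new int[4];
        for (int i = 0; i < 4; i++) {
            v[i] = ALPHABET.indexOf(quad.charAt(i));
            if (v[i] < 0) {
                throw new IllegalArgumentException(
                    "invalid character: " + quad.charAt(i));
            }
        }

        // Each output byte takes the low bits of one 6-bit value and the
        // high bits of the next one.
        byte[] out = new byte[3];
        for (int i = 0, j = 4, k = 2; i < 3; i++, j -= 2, k += 2) {
            out[i] = (byte) ((v[i + 1] >> j) | ((v[i] << k) & 0xff));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] decoded = decodeQuad("TWFu");
        System.out.println(new String(decoded, StandardCharsets.US_ASCII)); // Man
    }
}
```

For the group "TWFu", the four values are 19, 22, 5, and 46, and the three decoded bytes are 77, 97, and 110, the ASCII codes for "Man".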

Design Alternatives

As mentioned earlier, the design of the base64 decoding filter emphasizes simplicity. There are several things that might have been done differently if the class had been designed for production use.

The previous implementation takes a byte-by-byte approach, no matter how many bytes have been requested by the caller. The multibyte read methods and the skip method all call the single-byte read method repeatedly. Obviously, that's not the most efficient mechanism.

A better strategy would be to create larger internal buffers and process larger chunks of data at a time when the caller asks for more than one byte. Most of the extra complexity would be in the inner loop of fill_buffer, but it wouldn't be too bad. It's easy to calculate how many bytes of encoded input will be required to produce a given number of decoded bytes, so in most cases only one read call would need to be made upstream.
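
The calculation is simple enough to show directly. In this sketch, the class name Base64Lengths and the method encodedCharsFor are hypothetical; the count refers to encoding characters only, because whitespace has already been stripped upstream.

```java
class Base64Lengths {

    // Number of encoded characters that must be read to produce n decoded
    // bytes: one 4-character group for every 3 bytes, rounded up to a
    // whole group.
    static long encodedCharsFor(long n) {
        return ((n + 2) / 3) * 4;
    }

    public static void main(String[] args) {
        long[] requests = {1, 3, 4, 1024};
        for (int i = 0; i < requests.length; i++) {
            System.out.println(requests[i] + " decoded bytes need "
                               + encodedCharsFor(requests[i])
                               + " encoded characters");
        }
    }
}
```

A request that isn't a multiple of three leaves one or two decoded bytes over, so a production class would still need a small buffer for the remainder.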

It would probably be a mistake, however, to attempt to provide even greater efficiency by reading more bytes than required and decoding them in advance. Suppose, for example, that the calling code wishes to provide helpful diagnostic messages in the event of an error. To help with this, there may be a LineNumberInputStream ahead of you. If your class were to read ahead, the calling code would not be able to determine reliably the line number where an error occurred. There is a general BufferedInputStream, and it's usually best to permit the application code to insert it at an appropriate place in the chain of input streams if needed. I/O libraries in which buffering happens automatically, without application control, are handy most of the time, but on the rare occasions when buffering is not desired, the lack of control is a big problem. (The error-reporting scenario just described is probably pretty implausible with base64 input, but the design principle is still a good one.)

If you are familiar with some of the more advanced features of C++, you might be thinking that the WSStripInputStream would be a good application for a nested class, because not many applications require stripping all whitespace out of a file. If it were a nested class, it wouldn't be available to any other classes besides Base64InputStream.

Java, however, doesn't have nested classes. One of the important differences between Java and C++ is that Java uses packages, rather than classes, as the primary unit of protection. Therefore, in the example, although WSStripInputStream couldn't be nested inside the class that uses it, it was placed within the same package and is not a public class. The result is that, although the class is accessible to Base64InputStream and the other classes in package COM.MCP.Samsnet.tjg, it is not visible or accessible outside that package.

The use of packages as the primary protection mechanism has an important implication: whole packages, and not merely classes, should be designed. It's not really a good idea to use a package as a catchall for loosely related classes. You don't need to understand every detail of all of the classes in a package before you start coding, but it's best to have a clear vision of the purpose of the package and write all the classes to contribute to that purpose. That rule is not followed in the COM.MCP.Samsnet.tjg package, obviously. Because the classes in this book are written to illustrate different programming tips and tricks, the package is used just as a namespace. In production systems, however, a little care in the design of your packages will pay dividends.

Reversing Streams

You may have also wondered about the decision to implement base64 decoding as a stream in the first place. What if you have some data already in memory in base64 format, and you need to decode it as you write it somewhere? Again, pretty unlikely with the specific example of base64, but it's still a valid question. (In fact, the JDK comes with undocumented base64 encoding and decoding classes that are not implemented as streams.)

It's a good idea to provide this kind of decoding functionality as an input stream, and the inverse operation as an output stream, because that matches the most common way the functions will be used. It is possible, though, that you may need to use an input stream in a chain of output streams, or vice versa. Fortunately, there's a way to do that.

Figure 5.3 is an illustration of two special filter streams that I call reverse streams. The ReverseInputStream class uses piped output and input streams to encapsulate an output filter stream so that it can be used in a chain of input streams, and the ReverseOutputStream class performs the inverse function.

Figure 5.3 : Reverse input and output streams.

Constructing the output chain in the illustration might be done this way:

ReverseOutputStream s
    = new ReverseOutputStream(new FileOutputStream("readme.txt"));
s.setInputStream(new Base64InputStream(s.attachPoint()));

The reverse output stream creates the two piped streams itself. The setInputStream method gives the reverse stream access to the end of the input stream, and the attachPoint method returns the piped input stream to make the other end of the connection.

These example implementations of the reverse stream classes do buffer data, breaking the rule of thumb presented earlier, because efficiency will be a problem here. Without the buffering, these classes would cause a lot of switches between threads, and thread switching is costly. If the buffering causes a problem, a subclass could be written, overriding the run method to disable the buffering.

Of course, because the encapsulated stream can actually be a chain of streams, a BufferedInputStream could be added to the encapsulated stream by the application, permitting the ReverseOutputStream to serve both needs. However, the performance savings are large enough that it seemed better to include the buffering in the reverse streams from the start. Listing 5.3 shows the implementation of the ReverseOutputStream class.


Listing 5.3. ReverseOutputStream.java.
/*
 * ReverseOutputStream.java       1.0 96/01/27 Glenn Vanderburg
 */

package COM.MCP.Samsnet.tjg;

import java.io.*;
 
/**
 * An output stream which encapsulates an input stream.  
 *
 * @version     1.0, 27 Jan 1996
 * @author      Glenn Vanderburg
 */

public
class ReverseOutputStream extends FilterOutputStream implements Runnable {

    // The 'out' variable, in our superclass, is used for the
    // PipedOutputStream which is our entrance to the input stream chain.
    PipedInputStream head;  // head of the encapsulated stream
    InputStream tail;       // Last in the input stream chain
    OutputStream sink;      // Our real output stream

    Thread readSide;
    IOException savedException = null;  // placed here by readSide;
    
    /**
     * Constructs a new ReverseOutputStream initialized with the
     * specified output stream.
     * @param out the output stream
     */
    public ReverseOutputStream(OutputStream out) throws IOException {
        super(new PipedOutputStream());
        head = new PipedInputStream();
        PipedOutputStream pout = (PipedOutputStream) this.out;
        pout.connect(head);
        sink = out;
    }

    /**
     * Returns the head of the input stream
     * @return the head of our encapsulated input stream
     */
    public InputStream attachPoint() {
        return head;
    }

    /**
     * Sets the encapsulated InputStream.
     * @param in the input stream
     */
    public void setInputStream(InputStream in) {
        tail = in;
        readSide = new Thread(this);
        readSide.start();
    }
          
    /**
     * Loops reading from 'tail' and writing to 'sink' until
     * the stream is closed.
     */
    public void run() {
        int l;
        byte b[] = new byte[1024];

        try {
            while ((l = tail.read(b)) > 0) {
                sink.write(b, 0, l);
            }
            sink.close();
        }
        catch (IOException e) {
            // Hand the exception over to the other thread,
            // so it can be rethrown there.
            savedException = e;
        }
    }

    /*
     * This class would be a lot shorter if it weren't for having
     * to rethrow exceptions in the main thread ...
     *
     * Comments are omitted for the following methods, to save
     * space.
     */

    public void write(int b) throws IOException {
        if (savedException != null) throw savedException;
        super.write(b);
    }

    public void write(byte b[]) throws IOException {
        if (savedException != null) throw savedException;
        super.write(b);
    }

    public void write(byte b[], int off, int len)
    throws IOException {
        if (savedException != null) throw savedException;
        super.write(b, off, len);
    }

    public void flush() throws IOException {
        if (savedException != null) throw savedException;
        super.flush();
    }

    public void close() throws IOException {
        if (savedException != null) throw savedException;
        super.close();
    }
}

Non-Stream I/O Classes

Although streams make up the majority of the classes in the java.io package, there are a couple of other classes that handle input and output that should be mentioned.

The RandomAccessFile class provides a view of a file that is not stream-oriented. Unlike the various stream classes, each of which is either an InputStream or an OutputStream (but not both), RandomAccessFile can be used to both read from and write to a single file. It provides methods for moving around in the file and for finding out the current location. There are methods for reading and writing data in units of bytes, just as in the stream classes, and there are also methods that support reading and writing all the fundamental Java datatypes (the same methods that are present in DataInputStream and DataOutputStream).
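
A small sketch may make the difference from the stream classes clearer. The class name RandomAccessSketch and the scratch file are assumptions for the example. It writes ten integers, overwrites one of them in place, and reads it back without touching the rest of the file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class RandomAccessSketch {

    // Returns the rewritten value, the file position after reading it,
    // and the total file length, separated by spaces.
    static String demo() {
        try {
            File scratch = File.createTempFile("raf-sketch", ".bin");
            scratch.deleteOnExit();
            RandomAccessFile f = new RandomAccessFile(scratch, "rw");
            try {
                for (int i = 0; i < 10; i++) {
                    f.writeInt(i * 100);
                }
                f.seek(3 * 4);                 // each int occupies four bytes
                f.writeInt(-1);                // overwrite entry 3 in place
                f.seek(3 * 4);                 // go back and read it again
                int value = f.readInt();
                long position = f.getFilePointer();
                return value + " " + position + " " + f.length();
            } finally {
                f.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```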

Another I/O class that does not extend one of the stream classes is StreamTokenizer. In one sense, StreamTokenizer does provide a stream-like view of the data, but only tokens, not the actual data, can be read. This class is meant for parsing programming languages or other text-based data formats that obey grammatical rules. When you call the nextToken method, the return value is an integer that indicates the type of token that was encountered in the data: end-of-line, end-of-file, number, or word. Whitespace and comments are ignored, so the calling code never sees them. If the token is a word or a number, it's possible to find out the value of the word or the number, but the caller ultimately has no access to the real data stream. StreamTokenizer is configurable and has several methods that enable you to set the characters that are to be treated as word characters, whitespace characters, or comment delimiters. The class supports quoted words, so that characters that would not normally be included in a word (such as tab characters) can be included where necessary, and there is a method for setting the quotation marks (which default to ' and ").
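
The following sketch shows the tokenizer with its default settings. The class name TokenizerSketch and the method describe are invented for the example, and the sketch assumes a library version in which StreamTokenizer can be constructed from a Reader, which is what lets it tokenize a String directly.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TokenizerSketch {

    // Lists the tokens found in the text, labeled by token type.
    static String describe(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        StringBuilder sb = new StringBuilder();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (sb.length() > 0) {
                    sb.append(", ");
                }
                switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    sb.append("WORD:").append(st.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:
                    sb.append("NUM:").append(st.nval);
                    break;
                case '"':
                case '\'':
                    // For a quoted word, ttype is the quote character.
                    sb.append("QUOTED:").append(st.sval);
                    break;
                default:
                    // Any other character is returned as itself.
                    sb.append("CHAR:").append((char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("width = 42 \"two words\" // ignored"));
    }
}
```

With the default settings, the slash is a comment character, so everything from the first slash to the end of the line is skipped. Numbers are always delivered as double values in the nval field.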

Highly Structured Files

If your program needs to understand a highly structured binary file format, a special-purpose I/O class is a good place to start. The class should parse and understand the file format and present a specialized view of the file to the rest of the program.

Structured binary files are rarely read as streams; usually, the file formats are designed with internal pointers to permit programs to find and access specific parts of the file quickly, without having to read all of the file first. Depending on your needs, your binary file class may not be an extension of the RandomAccessFile class, but you will probably find yourself using that class somehow in your design.

Classes for reading structured files can be designed to work almost like an in-memory data structure, providing methods that return objects representing small, coherent segments of the data. Programs can use such classes as though they were data structures. This approach has the advantage that the messy details of the file format and I/O tasks are wrapped up nicely in the I/O class, and they don't complicate the rest of the program.

Furthermore, with such a design, it's easier to take full advantage of the random-access design of most binary file formats, reading and loading the file lazily; that is, portions of the file are read and parsed only when they are required by the program.
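To make the idea of lazy reading concrete, here is a minimal sketch built on RandomAccessFile. Everything about the file format is an assumption made up for this illustration: the file begins with a record count, followed by a table of byte offsets, followed by the records themselves (strings written with writeUTF). The class name LazyRecordFile and its methods are likewise hypothetical.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical reader for an invented format:
//   int count, then count long offsets, then the records (writeUTF strings).
public class LazyRecordFile implements Closeable {
    private final RandomAccessFile file;
    private final long[] offsets;  // the file's internal pointers
    private final String[] cache;  // parsed records; null means "not read yet"
    private int loads = 0;         // how many records were actually read from disk

    /** Opens the file and reads only the offset table, not the records. */
    public LazyRecordFile(File f) throws IOException {
        file = new RandomAccessFile(f, "r");
        int count = file.readInt();
        offsets = new long[count];
        for (int i = 0; i < count; i++) {
            offsets[i] = file.readLong();
        }
        cache = new String[count];
    }

    public int size() { return offsets.length; }

    public int loadCount() { return loads; }

    /** Returns record i, reading and parsing it only on first request. */
    public String record(int i) throws IOException {
        if (cache[i] == null) {
            file.seek(offsets[i]);     // jump straight to the record
            cache[i] = file.readUTF();
            loads++;
        }
        return cache[i];
    }

    @Override
    public void close() throws IOException { file.close(); }

    /** Writes a file in the invented format so the reader can be tried out. */
    public static void write(File f, String[] records) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
            out.setLength(0);
            out.writeInt(records.length);
            long tableStart = out.getFilePointer();
            for (int i = 0; i < records.length; i++) {
                out.writeLong(0L);     // placeholder, filled in below
            }
            long[] offs = new long[records.length];
            for (int i = 0; i < records.length; i++) {
                offs[i] = out.getFilePointer();
                out.writeUTF(records[i]);
            }
            out.seek(tableStart);      // go back and fill in the real offsets
            for (long o : offs) {
                out.writeLong(o);
            }
        }
    }
}
```

A caller that asks only for record(2) causes one seek and one read; the other records are never touched. The rest of the program sees something that behaves much like an array of strings.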

Here's an example. The Java class file format is a binary format. The overall structure of the file conforms to this C-like structure definition:

/*
 * WARNING: The Java class file format is specified using a C-like
 *          syntax, but it is not C!  Not only is the syntax not
 *          legal, but the file format obeys different rules than
 *          C structure definitions, and this fragment has been
 *          further simplified for the example.
 *
 *          Don't attempt to use this to actually read a Java
 *          class file.
 */

struct ClassFile {
    unsigned int magic;             /* should be 0xCAFEBABE */
    unsigned int version;           /* currently 45 */

    unsigned short constant_pool_count;
    struct cp_info constant_pool[constant_pool_count - 1];

    unsigned short access_flags;    /* public, private, native,
                                     * abstract, static, etc.
                                     */
    unsigned short this_class;      /* index into constant_pool */
    unsigned short super_class;     /* index into constant_pool */

    unsigned short interfaces_count;
    unsigned short interfaces[interfaces_count];

    unsigned short fields_count;
    struct field_info fields[fields_count];

    unsigned short methods_count;
    struct method_info methods[methods_count];

    unsigned short attributes_count;
    struct attribute_info attributes[attributes_count];
};

struct method_info {
    unsigned short access_flags;
    unsigned short name_index;
    unsigned short signature_index;

    unsigned short attributes_count;
    struct attribute_info attributes[attributes_count];
};

It's possible to write an I/O class (or collection of classes) that can make such a file look like a data structure. Here's one example of how such a class could be used, based on a hypothetical class definition:

JavaClass aClass = new JavaClass("Neato.class");

/*
 * Try to process a particular method specially
 */

try {
    // Prepare a representation of the method signature:
    String methSig[] = { "Component", "Image" };

    // Now look for the method:
    JavaMethod aMeth = aClass.method("showOff", methSig);

    // Do something useful with the method representation.
}
catch (NoSuchMethodException e) {
    // The class doesn't have a "showOff" method.
}

/*
 * Now loop through all of the methods
 */

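for (Enumeration methods = aClass.methodlist();
     methods.hasMoreElements();
     ) {

    // Fetch the next method; without this call the loop never advances.
    JavaMethod meth = (JavaMethod) methods.nextElement();

    // Process a single method here.
}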

In a real implementation of a JavaClass class, the new instance could read the entire class file into a complicated in-memory data structure immediately upon initialization and simply return portions of the structure upon request. Alternatively, with only a little more effort, you could implement the class to do lazy reads. At initialization time, it would open the file, read basic header information, and perform some simple checks to verify that the file really was a Java class file. (Even those actions could be deferred, but it would be best to do them right away, in the interest of reporting common errors as soon as possible.)
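The initialization-time checks might look something like the following sketch. The class name ClassFileHeader and its read method are hypothetical. Note also that the real class file format stores the version as two 16-bit fields (minor version, then major version), which the simplified structure shown earlier folds into a single 32-bit field; this sketch reads them separately.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper that reads and checks the start of a class file.
public class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minor, int major, int cpCount) {
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = cpCount;
    }

    /**
     * Reads the first ten bytes of the stream and fails early if the
     * magic number shows that this is not a Java class file.
     */
    static ClassFileHeader read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);

        int magic = data.readInt();
        if (magic != MAGIC) {
            throw new IOException("Not a Java class file: bad magic number 0x"
                    + Integer.toHexString(magic));
        }

        // The fields are unsigned 16-bit values, so read them as such.
        int minor = data.readUnsignedShort();
        int major = data.readUnsignedShort();
        int cpCount = data.readUnsignedShort();
        return new ClassFileHeader(minor, major, cpCount);
    }
}
```

Performing this check in the constructor of a class such as JavaClass means that a mistyped filename or a corrupted file is reported when the object is created, not at some later point when a method is first requested.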

When asked for information about a particular method, the class would first check to see whether the required information had already been loaded. If so, it could be returned right away. Otherwise, the JavaClass instance would move to the appropriate location in the file, read just enough data to learn about the particular method of interest, and build the method's data structure before returning it to the caller.

If you've been thinking about the details of how to implement such a class, you may have realized that the Java class file format really isn't very appropriate for lazy loading. There aren't enough internal pointers to permit finding the desired information without first reading most of the file anyway. However, it does illustrate some of the points involved. The Java class file format was chosen for this example because its basics, at least, will be familiar to many Java programmers, and most other binary file formats would have required more explanation.

Summary

The Java I/O library is powerful, versatile, and designed for extension. You can use the supplied classes for a wide variety of I/O tasks, and you can extend them when your needs go beyond the built-in capabilities.

Most Java I/O classes are based on classes that provide a stream-oriented interface to files, network connections, and other file-like objects. The Java library also contains filter streams, which can massage a stream as it is being read or written, altering it in some way or performing some special function such as buffering. You can write your own streams (this chapter presents two example filter streams).

There are also I/O classes that don't follow the stream model, for performing random-access I/O and for splitting an input stream into tokens. Building on those classes, you can write I/O classes for highly structured files, hiding the fact that input and output are even happening, and making a file appear to be a memory-resident data structure.