Chapter 23

Introduction to Network Programming

by Mike Fletcher




One of the best features of Java is its networking support. Java has classes that range from low-level TCP/IP connections to ones that provide instant access to resources on the World Wide Web. Even if you have never done any network programming before, Java makes it easy.

The following chapters introduce you to the networking classes and how to use them. A guide to what is covered by each chapter follows:

Prerequisites

Although networking with Java is fairly simple, there are a few concepts and classes from other packages you should be familiar with before reading this part of the book. If you are interested only in writing an applet that interacts with an HTTP daemon, you probably can concentrate just on the URL class for now. For the other network classes, you need at least a passing familiarity with the World Wide Web, java.io classes, threads, and TCP/IP networking.

World Wide Web Concepts

If you are using Java, you are probably already familiar with the Web. To use the URL and URLConnection classes, you need some knowledge of how Uniform Resource Locators (URLs) work.
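Because the URL class works in terms of a URL's parts, it can help to see how java.net.URL splits one apart. The following sketch targets a current JDK rather than the one this book describes. It parses a made-up address (the host www.example.com, port 8080, path, query, and fragment are all illustrative) and prints each component. Building a URL object only parses the string; no network connection is made.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class UrlParts {
    /** Parses a URL string; constructing a URL does not open a network connection. */
    static URL parse(String spec) throws MalformedURLException {
        // URI.create(...).toURL() is the current replacement for the
        // new URL(String) constructor, which recent JDKs deprecate.
        return URI.create(spec).toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = parse("http://www.example.com:8080/docs/index.html?lang=en#intro");
        System.out.println("protocol: " + url.getProtocol());    // http
        System.out.println("host:     " + url.getHost());        // www.example.com
        System.out.println("port:     " + url.getPort());        // 8080 (-1 if no port is given)
        System.out.println("default:  " + url.getDefaultPort()); // 80, the standard HTTP port
        System.out.println("path:     " + url.getPath());        // /docs/index.html
        System.out.println("query:    " + url.getQuery());       // lang=en
        System.out.println("fragment: " + url.getRef());         // intro
    }
}
```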

java.io Classes

Once you have a network connection established using one of the low-level classes, you will use java.io.InputStream and java.io.OutputStream objects (or objects of appropriate subclasses) to communicate with the other endpoint. Also, many of the java.net classes throw a java.io.IOException when they encounter a problem.

Threads

Although not strictly needed for networking, threads make using the network classes easier. Why tie up your user interface waiting for a response from a server when a separate communication thread can wait? Server applications also can service several clients simultaneously by spawning off a new thread to handle each incoming connection.

TCP/IP Networking

Before using the networking facilities of Java, you should be familiar with the terminology and concepts of the TCP/IP networking model. The next part of this chapter gets you up to speed.

Internet Networking: A Quick Overview

TCP/IP (Transmission Control Protocol/Internet Protocol) is the set of networking protocols used by Internet hosts to communicate with other Internet hosts. If you have ever had any experience with networks or network programming in general, you should be able to skim this section and check back when you find a term you are not familiar with. A list of references is given at the end of this section if you want more detailed information.

TCP/IP and Networking Terms

Like any other technical field, computer networking has its own set of jargon. These definitions should clear up what the terms mean:

The Internet Protocols

TCP/IP is a set of communications protocols for communicating between different types of machines and networks (hence the name internet). The name TCP/IP comes from two of the protocols: the Transmission Control Protocol and the Internet Protocol. Other protocols in the TCP/IP suite are the User Datagram Protocol (UDP), the Internet Control Message Protocol (ICMP), and the Internet Group Management Protocol (IGMP).

These protocols define a standard format for exchanging information between machines (known as hosts) regardless of the physical connections between them. TCP/IP implementations exist for almost every type of hardware and operating system imaginable. Software exists to transmit IP datagrams over network hardware ranging from modems to fiber-optic cable.

TCP/IP Network Architecture

There are four layers in the TCP/IP network model. Each of the protocols in the TCP/IP suite provides for communication between entities in one of these layers (see Figure 23.1). Each layer relies on the layer below it to get data from host to host. The layers are as follows, with examples of which protocols live at each layer:

Figure 23.1: The TCP/IP protocol stack.

Each layer in the stack takes data from the one above it and adds the information needed to get the data to its destination, using the services of the layer below. One way to think of this layering is as the layers of an onion. Each protocol layer adds a layer of addressing information to the packet as it goes down the protocol stack (see Figure 23.2). When the packet is received, each layer peels off its addressing to determine where to send the packet next.
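To make the onion picture concrete, here is a toy Java model of encapsulation. It is only an illustration under stated assumptions: the layer labels TCP, IP, and PHYS (standing in for the physical network) and the bracketed text headers are invented for this sketch, whereas real protocols use binary headers carrying addresses, lengths, and checksums. Going down the stack, each layer prepends its header; going up, each layer checks for and strips the header its peer added.

```java
import java.util.List;

/** Toy model of protocol layering: each layer wraps the data from the layer above. */
public class Encapsulation {
    // Layers from just below the application down to the wire. The bracketed
    // headers are illustrative strings, not real packet formats.
    static final List<String> LAYERS = List.of("TCP", "IP", "PHYS");

    /** Going down the stack: each layer prepends its own header. */
    static String send(String appData) {
        String packet = appData;
        for (String layer : LAYERS) {
            packet = "[" + layer + "]" + packet;
        }
        return packet;
    }

    /** Going up the stack: each layer strips the header its peer added. */
    static String receive(String packet) {
        String data = packet;
        for (int i = LAYERS.size() - 1; i >= 0; i--) {
            String header = "[" + LAYERS.get(i) + "]";
            if (!data.startsWith(header)) {
                throw new IllegalArgumentException("expected " + header + " header");
            }
            data = data.substring(header.length());
        }
        return data;
    }

    public static void main(String[] args) {
        String wire = send("GET /index.html");
        System.out.println(wire);          // [PHYS][IP][TCP]GET /index.html
        System.out.println(receive(wire)); // GET /index.html
    }
}
```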

Figure 23.2: Addressing information is added and removed at each layer.

Suppose that your Web browser wants to retrieve something from a Web server running on a host on the same physical network. The browser sends an HTTP request using the TCP layer. The TCP layer asks the IP layer to send the data to the proper host. The IP layer then uses the physical layer to send the data to the appropriate host.

At the receiving end, each layer strips off the addressing information that the sender added and determines what to do with the data. Continuing the example, the physical layer passes the received IP packet to the IP layer. The IP layer determines that the packet is a TCP packet and passes it to the TCP layer. The TCP layer passes the data to the HTTP daemon process. The HTTP daemon then processes the request and sends the requested data back to the other host through the same layers.

When the hosts are not on the same physical network, the IP layer handles routing the packet through the correct series of hosts (known as routers) until the packet reaches its destination. One of the nice features of the IP protocol is that individual hosts do not have to know how to reach every host on the Internet. The host simply passes to a default router any packets for networks it does not know how to reach.
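The default-router rule can be sketched as a small routing-table lookup. In this toy version, which targets a current JDK, the table holds a few known networks, each written as a network address plus a prefix length; the 10.x addresses and router names are hypothetical. A lookup picks the most specific matching network (the longest prefix), as IP routers do, and any destination that matches no entry goes to the default router.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy IPv4 routing table: a few known networks plus a default router. */
public class RoutingTable {
    private record Route(int network, int prefixLength, String nextHop) {
        boolean matches(int address) {
            // Java masks shift distances to 5 bits, so -1 << 32 would be -1;
            // a zero-length prefix therefore needs its own case.
            int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
            return (address & mask) == (network & mask);
        }
    }

    private final List<Route> routes = new ArrayList<>();
    private final String defaultRouter;

    public RoutingTable(String defaultRouter) {
        this.defaultRouter = defaultRouter;
    }

    public void add(String network, int prefixLength, String nextHop) {
        routes.add(new Route(toInt(network), prefixLength, nextHop));
    }

    /** Returns the next hop: the most specific matching route, else the default router. */
    public String nextHop(String destination) {
        int address = toInt(destination);
        Route best = null;
        for (Route r : routes) {
            if (r.matches(address) && (best == null || r.prefixLength() > best.prefixLength())) {
                best = r;
            }
        }
        return best != null ? best.nextHop() : defaultRouter;
    }

    /** Converts dotted-quad text such as "10.1.7.9" to a 32-bit int (no validation). */
    static int toInt(String dotted) {
        int value = 0;
        for (String part : dotted.split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    public static void main(String[] args) {
        RoutingTable table = new RoutingTable("university-gateway");
        table.add("10.1.0.0", 16, "campus-router-A");
        table.add("10.2.0.0", 16, "campus-router-B");
        System.out.println(table.nextHop("10.1.7.9"));       // campus-router-A
        System.out.println(table.nextHop("10.2.0.1"));       // campus-router-B
        System.out.println(table.nextHop("192.242.139.42")); // university-gateway
    }
}
```

The host never needs an entry for 192.242.139.42's network; it simply hands such packets to the default router and lets that router worry about the rest of the path.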

For example, a university may have only one machine with a physical connection to the Internet. All the campus routers know to forward all packets destined for the Internet to this host. Similarly, any host on the Internet only has to get packets to this one router to reach any host at the university. The router forwards the packets to the appropriate local routers (see Figure 23.3).

Figure 23.3: An example of IP routing.

Note
A publicly available program for UNIX platforms called traceroute is useful if you want to find out what routers are actually responsible for getting a packet from one host to another and how long each hop takes. The source for traceroute can be found by consulting an Archie server for an FTP site near you, or from ftp://ee.lbl.gov.

The Future: IP Version 6

Back when the TCP/IP protocols were being developed in the early 1970s, 32-bit IP numbers seemed more than capable of addressing all the hosts on an internet. Although there currently is no shortage of IP numbers, the explosive growth of the Internet in recent years is rapidly consuming the remaining unassigned addresses. To head off the coming shortage, the IETF is developing a new version of the IP protocols.

This new version, known as either IPv6 or IPng (IP Next Generation), will provide a much larger address space of 128 bits, allowing for approximately 3.4 x 10^38 different IP addresses. Where IP addresses used to be expressed as four decimal numbers (with values 0 to 255) separated by periods (.), as in 192.242.139.42, IPv6 addresses are expressed as eight groups of four hexadecimal digits separated by colons, like this:
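The size of an address space follows directly from the address width: n bits give 2^n addresses, so 32 bits allow 4,294,967,296 IPv4 addresses and 128 bits allow 2^128, about 3.4 x 10^38. The short sketch below, which assumes a current JDK (java.net.InetAddress accepts IPv6 literals from Java 1.4 on), checks that arithmetic and confirms the byte length of each address form, using the two example addresses from the text. The class and method names are illustrative.

```java
import java.math.BigInteger;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AddressSpace {
    /** Number of distinct addresses representable with the given number of bits. */
    static BigInteger addressCount(int bits) {
        return BigInteger.ONE.shiftLeft(bits); // 2^bits
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println("IPv4: " + addressCount(32));  // 4294967296
        System.out.println("IPv6: " + addressCount(128)); // 340282366920938463463374607431768211456

        // Literal addresses are parsed directly; no name lookup is performed.
        InetAddress v4 = InetAddress.getByName("192.242.139.42");
        InetAddress v6 = InetAddress.getByName("5A02:1364:DD03:0432:0031:12CA:0001:BEEF");
        System.out.println(v4.getAddress().length * 8 + " bits"); // 32 bits
        System.out.println(v6.getAddress().length * 8 + " bits"); // 128 bits
    }
}
```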

5A02:1364:DD03:0432:0031:12CA:0001:BEEF

IPv6 will be backward compatible with current IP implementations to allow older clients to interoperate with newer ones. Provisions are contained in the protocol for tunneling IPv6 traffic over an IPv4 network (and vice versa). Other benefits of the new version are as follows:

Several new protocols are being added to the TCP/IP suite. RTP (Real-time Transport Protocol) and RTCP (RTP Control Protocol) provide support for applications such as video and audio conferencing. Some protocols are being done away with, and their functionality is being merged into other existing protocols. For example, IPv6 drops IGMP (Internet Group Management Protocol), which provided support for membership in multicast groups; multicast membership is instead handled with ICMP messages.

These enhancements to TCP/IP should allow the Internet to continue the phenomenal growth it has experienced over the past few years.

Where to Find More Information

This chapter is not meant to cover TCP/IP completely. If your curiosity has been piqued, the following online documents and books may be of interest to you.

RFCs

The first and definitive source of information on the IP protocol family is the set of Request for Comments (RFC) documents that define the standards themselves. An index of all RFC documents is available through the Web at http://ds.internic.net/ds/rfc-index.html. This page has pointers to all currently available RFCs (organized in groups of 100) as well as a searchable index.

Table 23.1 gives the numbers of some relevant RFCs and what they cover. Keep in mind that a given RFC may have been made obsolete by a subsequent RFC. The InterNIC site's index notes in each description any documents that were made obsolete by a subsequent RFC.

Table 23.1. RFC documents.

RFC Number   Topic
791          The Internet Protocol (IPv4)
793          The Transmission Control Protocol (TCP)
768          The User Datagram Protocol (UDP)
894          Transmission of IP Datagrams over Ethernet Networks
1171         The PPP Protocol
1883         IP Version 6
1602         The Internet Standards Process: How an RFC Becomes a Standard
1880         Current Internet Standards

Books on TCP/IP

A good introduction to TCP/IP is the book TCP/IP Network Administration by Craig Hunt (O'Reilly and Associates, ISBN 0-937175-82-X). Although it was written as a guide for system administrators of UNIX machines, the book contains an excellent introduction to all aspects of TCP/IP, such as routing and the Domain Name Service (DNS).

Another book worth checking out is The Design and Implementation of the 4.3BSD UNIX Operating System by Samuel J. Leffler, et al. (Addison-Wesley, ISBN 0-201-06196-1). In addition to covering how a UNIX operating system works, it contains a chapter on the TCP/IP implementation.

If you are a beginner, another way to get started with TCP/IP is by reading Teach Yourself TCP/IP in 14 Days by Timothy Parker (Sams Publishing, ISBN 0-672-30549-6).

IPng and the TCP/IP Protocols by Stephan A. Thomas (John Wiley & Sons, ISBN 0-471-13088-5) offers an overview of version 6 of the Internet protocols.

Network Class Overview

This section gives a short overview of the capabilities and limitations of the different network classes provided in the java.net package. If you have never done any network programming, it should help you decide which type of connection class to base your application on and pick the Java classes that best fit your networking needs. An overview of Java security, as it relates to network programming, is also provided.

Which Class Is Right for Me?

The answer to this question depends on what you are trying to do and what type of application you are writing. Each network protocol has its own advantages and disadvantages. If you are writing a client for someone else's protocol, the decision probably has been made for you. If you are writing your own protocol from scratch, the following should help you decide which transport method (and hence, which Java classes) best fits your application.

The URL Class

The URL class is an example of what can be accomplished using the other, lower-level network objects. The URL class is best suited for applications or applets that need to access content on the World Wide Web. If all you need Java for is writing Web browser applets, the URL and URLConnection classes will in all likelihood handle your network communication needs.

The URL class enables you to retrieve a resource from the Web by specifying the Uniform Resource Locator for it. The content of the URL is fetched and turned into a corresponding Java object (such as a String containing the text of an HTML document). If you are fetching arbitrary information, the URLConnection object provides methods that will try to deduce the type of the content either from the filename in the URL or from the content stream itself.
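In the JDK, the two guessing strategies are static methods of java.net.URLConnection: guessContentTypeFromName() and guessContentTypeFromStream(). The sketch below exercises both without touching the network, on a current JDK; the file names and sample bytes are made up, and fromBytes() is a helper name chosen for this example. The commented results are what the JDK's built-in extension and magic-number tables give for these inputs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class GuessType {
    /** Guesses a MIME type from the first bytes of some content. */
    static String fromBytes(String content) throws IOException {
        // guessContentTypeFromStream() needs mark()/reset() support so it can
        // peek at the leading bytes and rewind; ByteArrayInputStream has it.
        InputStream in = new ByteArrayInputStream(content.getBytes(StandardCharsets.US_ASCII));
        return URLConnection.guessContentTypeFromStream(in);
    }

    public static void main(String[] args) throws IOException {
        // Guess from the file name, using the JDK's table of file extensions.
        System.out.println(URLConnection.guessContentTypeFromName("index.html")); // text/html
        System.out.println(URLConnection.guessContentTypeFromName("logo.gif"));   // image/gif

        // Guess from the content itself.
        System.out.println(fromBytes("<html><body>Hello</body></html>")); // text/html
        System.out.println(fromBytes("GIF89a"));                          // image/gif
    }
}
```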

The Socket Class

The Socket class provides a reliable, ordered stream connection (that is, a TCP/IP socket connection). The host and port number of the destination are specified when the Socket is created.

The connection is reliable because the transport layer (the TCP protocol layer) acknowledges the receipt of sent data. If the sending end does not get an acknowledgment back within a reasonable period of time, it re-sends the unacknowledged data (a technique known as Positive Acknowledgment with Retransmission, often abbreviated as PAR). Once you have written data into a Socket, you can assume that the data will get to the other side (unless you receive an IOException, of course).

The term ordered stream means that the data arrives at the opposite end in exactly the order in which it was written. However, because the data is a stream, write boundaries are not preserved. If you write 200 characters, the other side may read all 200 at once, or it might get the first 10 characters one time and the remaining 190 the next time it reads from the socket. In any case, the receiver cannot tell where each group of data was written.
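Because write boundaries are not preserved, a receiver that expects a known amount of data has to keep calling read() until all of it has arrived. The sketch below, written for a current JDK, runs both ends over the loopback interface: a sender thread writes 200 bytes as one 10-byte write and one 190-byte write, and readExactly() (a helper name chosen for this example) loops until all 200 bytes are in, however the network splits them. The byte values 0 through 199 are arbitrary test data that let you check the order.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class StreamBoundaries {
    /**
     * Reads exactly n bytes. One read() may return only part of a write, or
     * bytes from several writes, so keep reading until n bytes have arrived.
     */
    static byte[] readExactly(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int total = 0;
        while (total < n) {
            int count = in.read(buf, total, n - total);
            if (count < 0) {
                throw new IOException("stream ended after " + total + " of " + n + " bytes");
            }
            total += count;
        }
        return buf;
    }

    /** Sends 200 bytes over a loopback TCP connection as two writes and reads them back. */
    static byte[] transfer() throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) { // port 0: any free port
            Thread sender = new Thread(() -> {
                byte[] payload = new byte[200];
                for (int i = 0; i < payload.length; i++) {
                    payload[i] = (byte) i;
                }
                try (Socket s = new Socket(loopback, server.getLocalPort());
                     OutputStream out = s.getOutputStream()) {
                    out.write(payload, 0, 10);   // first write: 10 bytes
                    out.write(payload, 10, 190); // second write: 190 bytes
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            sender.start();
            try (Socket connection = server.accept()) {
                byte[] data = readExactly(connection.getInputStream(), 200);
                sender.join();
                return data;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = transfer();
        System.out.println("received " + data.length + " bytes");
    }
}
```

The receiving side never asks how many writes there were; it only knows how many bytes it wants. Protocols that send variable-length messages over TCP usually add their own framing, such as a length prefix or a line terminator.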

The reliable stream connection provided by Socket objects is well suited for interactive applications. Examples of protocols that use TCP as their transport mechanism are telnet and FTP. The HTTP protocol used to transfer data for the Web also uses TCP to communicate between hosts.

The ServerSocket Class

A ServerSocket is the listening endpoint that Socket connections are made to. When its accept() method is called, a server socket listens on a given port for connection requests. The ServerSocket offers the same connection-oriented, ordered stream protocol (TCP) that the Socket object does. In fact, once a connection has been established, the accept() method returns a Socket object for talking with the remote end.
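The accept() flow, combined with the thread-per-connection approach described under Threads, can be sketched as a tiny echo server. This version targets a current JDK and runs over the loopback interface; the echo behavior and the method names serve(), echo(), and roundTrip() are illustrative choices, not part of java.net. Because accept() hands back an ordinary Socket, the server-side handler uses the same stream calls as the client.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class EchoServer {
    /** Accept loop: each accepted connection gets its own handler thread. */
    static void serve(ServerSocket server) {
        while (!server.isClosed()) {
            try {
                Socket client = server.accept(); // blocks until a client connects
                new Thread(() -> echo(client)).start();
            } catch (IOException e) {
                return; // accept() fails once the server socket is closed
            }
        }
    }

    /** Writes each line received back to the client until it disconnects. */
    static void echo(Socket client) {
        try (client;
             BufferedReader in = reader(client);
             PrintWriter out = writer(client)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line);
            }
        } catch (IOException e) {
            System.err.println("connection error: " + e);
        }
    }

    /** Client side: connect, send one line, and return the echoed reply. */
    static String roundTrip(int port, String message) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             BufferedReader in = reader(socket);
             PrintWriter out = writer(socket)) {
            out.println(message);
            return in.readLine();
        }
    }

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(s.getOutputStream(), true, StandardCharsets.UTF_8); // autoflush on println
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread acceptor = new Thread(() -> serve(server));
            acceptor.setDaemon(true);
            acceptor.start();
            System.out.println(roundTrip(server.getLocalPort(), "hello")); // hello
        }
    }
}
```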

The DatagramSocket Class

The DatagramSocket class provides an unreliable, connectionless, datagram connection (that is, a UDP/IP socket connection).

Unlike the reliable connection provided by a Socket, there is no guarantee that what you send over UDP actually gets to the receiver. The TCP connection provided by the Socket class takes care of retransmitting any packets that get lost. Packets sent through UDP simply are sent out and forgotten, which means that if you need to know that the receiver got the data, you will have to have the receiver send back some sort of acknowledgment. This arrangement does not mean that your data will never get to the other end of a UDP connection; it means only that delivery is not guaranteed. If a network error happens (your cat jiggles the Ethernet plug out of the wall, for example), the UDP layer does not try to send the packet again or even know that it did not reach the recipient.
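If an application does need to know that a datagram arrived, it can apply the same positive-acknowledgment-with-retransmission idea that TCP uses, but at the application level. The sketch below targets a current JDK and assumes a hypothetical protocol in which the receiver answers every datagram with the text ACK. The sender waits a fixed time for that reply, using setSoTimeout() so that receive() gives up, and resends a limited number of times. The method names, retry count, and timeout are all illustrative.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class AckedUdp {
    /**
     * Sends a datagram and waits for an "ACK" reply, resending up to maxTries
     * times. Returns true if an acknowledgment arrived, false if every try timed out.
     */
    static boolean sendWithAck(DatagramSocket socket, InetAddress host, int port,
                               String message, int maxTries, int timeoutMillis) throws IOException {
        byte[] data = message.getBytes(StandardCharsets.UTF_8);
        DatagramPacket packet = new DatagramPacket(data, data.length, host, port);
        byte[] buf = new byte[16];
        socket.setSoTimeout(timeoutMillis); // receive() now throws after this many ms
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            socket.send(packet);
            try {
                DatagramPacket reply = new DatagramPacket(buf, buf.length);
                socket.receive(reply);
                String text = new String(reply.getData(), 0, reply.getLength(), StandardCharsets.UTF_8);
                if (text.equals("ACK")) {
                    return true;
                }
            } catch (SocketTimeoutException e) {
                // No acknowledgment in time: loop around and resend.
            }
        }
        return false;
    }

    /** Hypothetical receiver: answers one incoming datagram with "ACK". */
    static void ackOnce(DatagramSocket socket) throws IOException {
        byte[] buf = new byte[512];
        DatagramPacket incoming = new DatagramPacket(buf, buf.length);
        socket.receive(incoming);
        byte[] ack = "ACK".getBytes(StandardCharsets.UTF_8);
        // Reply to whatever address and port the datagram came from.
        socket.send(new DatagramPacket(ack, ack.length, incoming.getAddress(), incoming.getPort()));
    }

    public static void main(String[] args) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket(0, loopback)) {
            Thread t = new Thread(() -> {
                try {
                    ackOnce(receiver);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            t.start();
            boolean acked = sendWithAck(sender, loopback, receiver.getLocalPort(), "position update", 3, 500);
            t.join();
            System.out.println("acknowledged: " + acked);
        }
    }
}
```

A real protocol would also need sequence numbers, so that a late acknowledgment for an old packet is not mistaken for a reply to a newer one.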

Connectionless means that the socket does not have a fixed receiver. You can use the same DatagramSocket to send packets to different hosts and ports; however, you can use a Socket connection only to connect to a given host and port. Once a Socket is connected to a destination, that destination cannot be changed. The fact that UDP sockets are not bound to a specific destination also means that the same socket can listen for packets as well as originating them. There is no UDP DatagramServerSocket equivalent to the TCP ServerSocket.

Datagram refers to the fact that the information is sent as discrete packets rather than as a continuous ordered stream. The individual packet boundaries are preserved. It may help to think of this process as dropping individual postcards in a mailbox. If you send four packets, they are not guaranteed to arrive at the destination in the order in which they were sent. The receiver may get them in the same order they were sent, or the packets may arrive in reverse order. In any case, each packet that arrives is received whole.
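Both properties, connectionless addressing and preserved packet boundaries, can be seen with a few loopback sockets. In this sketch, written for a current JDK, one unconnected DatagramSocket sends to two different receivers, because the destination travels with each DatagramPacket rather than being fixed in the socket. Each receive() call returns one complete datagram, whatever its length, and a receiver then replies to the original sender, which listens on the same socket it sent from. The messages are arbitrary, and the two-second receive timeout is an assumption of this example that keeps a lost packet from hanging the program.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class Datagrams {
    /** Sends one datagram; the destination travels with the packet, not the socket. */
    static void send(DatagramSocket socket, String text, InetAddress host, int port) throws IOException {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(data, data.length, host, port));
    }

    /** Receives exactly one whole datagram and returns its text. */
    static String receive(DatagramSocket socket) throws IOException {
        byte[] buf = new byte[1024];
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        socket.receive(packet);
        return new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket sender = new DatagramSocket(0, loopback);
             DatagramSocket a = new DatagramSocket(0, loopback);
             DatagramSocket b = new DatagramSocket(0, loopback)) {
            for (DatagramSocket s : new DatagramSocket[] {sender, a, b}) {
                s.setSoTimeout(2000); // give up rather than wait forever for a lost packet
            }

            // One socket, two different destinations.
            send(sender, "hello, A", loopback, a.getLocalPort());
            send(sender, "short", loopback, b.getLocalPort());
            send(sender, "a somewhat longer message", loopback, b.getLocalPort());

            System.out.println("A got: " + receive(a));
            // Each receive() yields one complete packet; arrival order is not guaranteed.
            System.out.println("B got: " + receive(b));
            System.out.println("B got: " + receive(b));

            // The sending socket can also listen: B replies to it directly.
            send(b, "reply from B", loopback, sender.getLocalPort());
            System.out.println("sender got: " + receive(sender));
        }
    }
}
```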

Given the above constraints, why would anyone want to use a DatagramSocket? There are several advantages to using UDP:

The NFS (Network File System) protocol version 2, originally developed by Sun with implementations available for most operating systems, is an example application that uses UDP for its transport mechanism. Another example of an application in which a DatagramSocket may be appropriate is a multiplayer game. The central server must communicate with all the players involved and does not necessarily have to know that a position update got to the player.

Note
An actual game that uses UDP for communication is Netrek, a space combat simulation loosely based on the Star Trek series. Information on Netrek can be found using the Yahoo subject catalog at this URL:
http://www.yahoo.com/Recreation/Games/Internet_Games/Netrek/
There is also a Usenet newsgroup:
news:rec.games.netrek

Decisions, Decisions

Now that you know what the classes are capable of, you can choose the one that best fits your application. Table 23.2 sums up the type of connection each of the base networking classes creates. The Direction column indicates where a connection originates: Outgoing indicates that your application is opening a connection out to another host; Incoming indicates that some other application is initiating a connection to yours.

Table 23.2. Summary of low-level connection objects.

Class            Connection Type                        Direction
Socket           Connected, ordered byte stream (TCP)   Outgoing
ServerSocket     Connected, ordered byte stream (TCP)   Incoming
DatagramSocket   Connectionless datagram (UDP)          Incoming or Outgoing

You should look at the problem you are trying to solve, any constraints you have, and the transport mechanism that best fits your situation. If you are having problems choosing a transport protocol, take a look at some of the RFCs that define Internet standards for applications (such as HTTP or SMTP). One of them might be similar to what you are trying to accomplish. As an alternative, you can be indecisive and provide both TCP and UDP versions of your service, duplicating the processing logic and customizing the network logic. Trying both transport protocols with a pared-down version of your application can give you an indication of which protocol better serves your purposes. Once you've looked at these factors, you should be able to decide which class to use.

Java Security and the Network Classes

One of the purposes of Java is to enable executable content from an arbitrary network source to be retrieved and run securely. To accomplish this goal, the Java runtime enforces certain limitations on what classes obtained through the network may do. You should be aware of these constraints because they affect the design of applets and how the applets must be loaded. When you design your application or applet, take into account the security constraints imposed by both your target environment and your development environment.

For example, Netscape Navigator 2.0 gives code loaded from the local disk more privileges than code loaded over a network connection. A class loaded from an HTTP daemon may create only outgoing connections back to the host from which it was loaded. If the class is loaded from the local host (that is, if it is located somewhere in the class search path on the machine running Navigator), the class can connect to an arbitrary host. Contrast this with the applet viewer provided with Sun's Java Developers Kit. The applet viewer can be configured to act similarly to Navigator or to enforce no restrictions on network connectivity.

If you need full access to all Java's capabilities, there is always the option of writing a standalone application. A standalone application (that is, one not running in the context of a Web browser) has no restrictions on what it is allowed to do. Sun's HotJava Web browser is an example of a standalone application.

Note
For a more detailed discussion of Java security and how it is designed into the language and runtime, take a look at Chapter 35, "Java Security."
In addition, Sun has several white paper documents and a collection of frequently asked questions available at http://www.javasoft.com/sfaq/.

These checks are implemented by a subclass of java.lang.SecurityManager. Depending on the security model, the object allows or denies certain actions. You can check beforehand whether a capability your applet needs is present by calling the SecurityManager yourself. The java.lang.System class provides a static getSecurityManager() method that returns a reference to the SecurityManager active for the current context, or null if none is installed. If your applet needs to open a ServerSocket, for example, you can call the checkListen() method yourself; it throws a SecurityException if listening is not allowed, and you can catch that exception and print an error message (or pop up a dialog box) alerting the users and referring them to installation instructions.
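A pre-flight listen check along these lines might look like the following sketch. It compiles on a current JDK, where SecurityManager still exists but has been deprecated for removal since Java 17, so treat it as an illustration of the applet-era technique rather than a recommendation for new code. The method name canListen() and port 4444 are placeholders chosen for this example. When no security manager is installed, as in an ordinary standalone application, the check in this sketch always succeeds.

```java
public class ListenCheck {
    /**
     * Returns true if the active security policy would let this code listen
     * on the given port. With no SecurityManager installed, nothing is restricted.
     */
    @SuppressWarnings("removal") // SecurityManager is deprecated for removal (Java 17+)
    static boolean canListen(int port) {
        SecurityManager sm = System.getSecurityManager();
        if (sm == null) {
            return true;
        }
        try {
            sm.checkListen(port); // throws SecurityException if listening is not permitted
            return true;
        } catch (SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int port = 4444; // placeholder port, chosen only for this example
        if (canListen(port)) {
            System.out.println("OK to open a ServerSocket on port " + port);
        } else {
            System.err.println("This program needs permission to accept connections on port "
                    + port + "; see the installation instructions.");
        }
    }
}
```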

Summary

This chapter is a roadmap to the next four chapters. It has shown what concepts you need to be familiar with before you dive into network programming in Java. You should be comfortable with how TCP/IP networking operates in general (or at least know where to look for more information). You also should now have an idea of which Java class provides what functionality.