
Previous | Table of Contents | Next

Page 587

Listing 28.2. resolv.

1 #!/usr/bin/perl
2 use Socket;
3 $addr = gethostbyname($ARGV[0]) or die "Cannot resolve $ARGV[0]\n";
4 $dotfmt = inet_ntoa($addr);
5 print "$ARGV[0]: numeric: $addr dotted: $dotfmt\n";

Line 2 loads the Socket module, which is included with Perl 5 distributions. This module is required by all the sample code in this chapter, including Listing 28.1.

When you run this program, passing it a hostname that you want to see information on, you see something like the following:

$ ./resolv www.redhat.com

www.redhat.com: numeric: [unprintable characters] dotted: 199.183.24.253

Line 3 passes the name specified on the command line to gethostbyname(). Called in scalar context, gethostbyname() returns the host's address as a packed 32-bit binary value, which is stored in $addr. Line 4 passes this address to inet_ntoa(), which returns the same address in dotted-decimal format. (inet_ntoa is an abbreviation for Internet number to ASCII.) Line 5 prints both values. As you can see, the 32-bit address looks rather strange when printed, because it is raw binary data rather than text.
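To see what the unprintable value actually contains, the following sketch unpacks a packed address byte by byte. It uses inet_aton() on the dotted address from the sample output instead of calling gethostbyname(), so it runs without a DNS lookup; the address itself is only an example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# inet_aton() turns a dotted-decimal string into the same packed,
# 4-byte form that gethostbyname() returns in scalar context.
# (No name lookup happens because the argument is already numeric.)
my $packed = inet_aton("199.183.24.253");

# The packed value is raw binary, not text, which is why printing it
# directly produces unprintable characters.
printf "length:  %d bytes\n", length($packed);

# unpack("C4") shows the four octets; unpack("N") reads them as one
# unsigned 32-bit integer in network (big-endian) byte order.
my @octets = unpack("C4", $packed);
my $number = unpack("N", $packed);
printf "octets:  %s\n", join(" ", @octets);
printf "integer: %u (0x%08x)\n", $number, $number;

# inet_ntoa() reverses the conversion.
printf "dotted:  %s\n", inet_ntoa($packed);
```

The same unpack() calls work on the $addr value from Listing 28.2 if you want to inspect a real lookup result.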

NOTE
If your Linux workstation is not connected to the Internet, simply specify your own hostname to resolve or another hostname that is in your own /etc/hosts file or available to your workstation via DNS or NIS.
If your workstation is on the Internet and you see a different address for www.redhat.com, it just means that the address has changed. After all, that is one of the reasons DNS was developed!

Network Services

Being able to locate a computer is a fundamental part of network communication, but it is not the only necessary component in an address. Why do you want to contact a specific host? Do you want to retrieve an HTML document from it? Do you want to log in and check mail? Most workstations, especially those running Linux or any other version of UNIX, provide more than one service to other nodes on a network.

Back in line 13 of Listing 28.1, you called the getservbyname() function. This function provides the other value used to form the complete network address. This value, referred to as a service port number, is the portion of the address that specifies the service or program that you want to communicate with.


Like host addresses, service ports can be referred to by name instead of number. getservbyname() looks up the number associated with a service name (and a protocol, such as tcp) in the file /etc/services. (If NIS is available, the number can also be retrieved from a network database.) Port numbers listed in this database are called well-known ports: because the numbers ought to be the same on every host, in theory any host can connect to one of these services on any other host without first asking which port it uses. The port numbers used by applications don't have to be listed in or retrieved from this database; listing them in /etc/services and sharing them is simply considered good practice, because it prevents conflicts.
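Because a port may be given either by name or by number, a program often accepts both. The sketch below is one way to do that; service_port() is a hypothetical helper, not part of Perl or of Listing 28.1, and the smtp lookup only prints something if /etc/services (or NIS) actually lists that service.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: accept either a numeric port or a service name.
# Names are looked up with getservbyname(), which consults /etc/services
# (and NIS, if configured); numbers are used as-is, because an
# application's port does not have to be listed in that database.
sub service_port {
    my ($service, $proto) = @_;
    return $service if $service =~ /^\d+$/;
    my $port = getservbyname($service, $proto);   # scalar context: port number
    die "Unknown service '$service/$proto'\n" unless defined $port;
    return $port;
}

# In list context, getservbyname() returns the whole database entry.
if (my ($name, $aliases, $port, $proto) = getservbyname("smtp", "tcp")) {
    print "$name/$proto is port $port (aliases: $aliases)\n";
}

print "8080 -> ", service_port("8080", "tcp"), "\n";
```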

After you have retrieved the two components necessary to build a fully qualified address, you pass them to the sockaddr_in() function, which packs them into a SOCKADDR_IN structure for you. SOCKADDR_IN is the programmatic representation of a network address that most socket system calls, such as bind() and connect(), require.
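The following sketch shows sockaddr_in() working in both directions: packing a port and an address into a SOCKADDR_IN, then unpacking it again. Port 25 and 127.0.0.1 are arbitrary example values, and the 16-byte size mentioned in the comment is what Linux uses for struct sockaddr_in.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(sockaddr_in inet_aton inet_ntoa);

# The two components of a full Internet address.
my $port  = 25;
my $iaddr = inet_aton("127.0.0.1");

# In scalar context, sockaddr_in(PORT, ADDRESS) packs them into an
# opaque structure suitable for bind() or connect(). On Linux,
# struct sockaddr_in is 16 bytes.
my $sin = sockaddr_in($port, $iaddr);
printf "packed structure is %d bytes\n", length($sin);

# In list context, sockaddr_in(SOCKADDR) unpacks the structure
# back into (PORT, ADDRESS).
my ($back_port, $back_addr) = sockaddr_in($sin);
printf "port %d, host %s\n", $back_port, inet_ntoa($back_addr);
```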

Sockets

Before you can use your addressing information, you need a socket. The socket() function in line 27 of Listing 28.1 illustrates how to create one. To understand that function, it helps first to know what sockets are and which types are available to a program.

Sockets are an Application Programming Interface (API) used for network communication. This API first appeared in BSD UNIX for the VAX architecture in the early eighties and has since become prevalent in almost all UNIX versions and, more recently, on Windows and a variety of other operating systems. System V UNIX has a different interface called the Transport Layer Interface (TLI), but even most System V UNIX versions, such as Solaris 2.x, provide socket interfaces. Linux provides a full implementation of the socket interface.

Socket applications treat network connections (or, to be more exact, network endpoints) the same way most UNIX interfaces are handled: as file handles. The reason for the endpoint qualification is simple: not all network sessions are connected, so referring to every network stream as a connection would be incorrect and misleading. In fact, after a network endpoint is created and bound and/or connected, it can be written to, read from, and closed using the same functions as files. Because of this file-like interface, socket programs tend to be portable between different versions of UNIX and frequently to many other operating systems.
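The file-handle point can be seen without any network at all. This sketch uses socketpair(), which creates two already-connected local (UNIX domain) endpoints; the handle names LEFT and RIGHT are arbitrary. Once the endpoints exist, ordinary print, <> and close do all the work.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use IO::Handle;   # provides the autoflush() method

# socketpair() creates two connected socket endpoints in one call.
socketpair(LEFT, RIGHT, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
LEFT->autoflush(1);   # send each print immediately instead of buffering

# Write with print, exactly as with a file...
print LEFT "hello from the left endpoint\n";

# ...read with the <> line operator...
my $line = <RIGHT>;
print "RIGHT read: $line";

# ...and dispose of the endpoints with close.
close(LEFT)  or die "close: $!";
close(RIGHT) or die "close: $!";
```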

Protocols and Socket Types

The socket API is designed to support multiple protocol suites, called domains or families. Most UNIX versions support at least two domains: UNIX and Internet. (Two other domains are the Xerox Network Systems (XNS) and ISO protocol suites.) UNIX domain sockets use the local workstation's filesystem to provide communication between programs running on the same workstation only. Internet domain sockets use the Internet Protocol (IP) suite to communicate over the network. As you might guess, this chapter is concerned with Internet domain sockets.
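To make the "local filesystem" point concrete, the following sketch binds a UNIX domain socket to a path and talks to it from the same process. The temporary directory and the file name demo.sock are hypothetical choices; any writable path short enough for the kernel's limit would do.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC pack_sockaddr_un);
use File::Temp qw(tempdir);

# A UNIX domain socket's address is a filesystem path.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.sock";

# Server side: create, bind to the path, and listen.
socket(SERVER, AF_UNIX, SOCK_STREAM, PF_UNSPEC) or die "socket: $!";
bind(SERVER, pack_sockaddr_un($path))         or die "bind: $!";
listen(SERVER, 1)                              or die "listen: $!";

# bind() created a socket special file in the filesystem.
my $is_socket = -S $path ? 1 : 0;
print "$path is a socket file\n" if $is_socket;

# Client side: connect to the same path. The server is already
# listening, so the kernel queues the connection until accept() is
# called; that is why this works within a single process.
socket(CLIENT, AF_UNIX, SOCK_STREAM, PF_UNSPEC) or die "socket: $!";
connect(CLIENT, pack_sockaddr_un($path))       or die "connect: $!";
accept(CONN, SERVER)                           or die "accept: $!";

print CLIENT "hello via $path\n";
close(CLIENT);                  # closing flushes the buffered line
my $msg = <CONN>;
print "server received: $msg";
close(CONN);
close(SERVER);
```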


In the following call to socket(), you specify the filehandle that the new socket should be associated with (CONNFD) and three values that describe the type of socket you want created: the protocol family, the socket type, and the protocol. The protocol family has already been covered: PF_INET, for the Internet.

socket(CONNFD, PF_INET, SOCK_STREAM, $proto);
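A self-contained version of this call might look like the sketch below. It assumes the protocol number comes from getprotobyname("tcp"), a common idiom, though Listing 28.1 may obtain $proto differently; if /etc/protocols is unavailable it falls back to 0, which asks the kernel for the default protocol for PF_INET and SOCK_STREAM (that is, TCP). The getsockopt() check simply confirms the socket's type.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM SOL_SOCKET SO_TYPE);

# Protocol number for TCP from /etc/protocols, or 0 for "kernel default".
my $proto = getprotobyname("tcp") // 0;

# Same call shape as in the text: filehandle, family, type, protocol.
socket(CONNFD, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";

# The new socket is neither bound nor connected yet; ask the kernel
# for its type to confirm the call did what was requested.
my $type = unpack("i", getsockopt(CONNFD, SOL_SOCKET, SO_TYPE));
print "protocol number: $proto\n";
print "socket type is SOCK_STREAM\n" if $type == SOCK_STREAM;

close(CONNFD);
```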

The possible socket types are SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, SOCK_RDM, and SOCK_SEQPACKET. The last three are used for low-level, advanced operations and are beyond the scope of this chapter.

SOCK_STREAM sockets are connected at both ends, they are reliable, and the messages between them are sequenced. The terms reliable and sequenced have special meanings in networking. Reliability means that the network guarantees delivery: an application can write data with the understanding that it will arrive at the other end, unless the connection is broken by a catastrophic event, such as a host unexpectedly shutting down or the network literally breaking. If the connection is broken, the application is notified, typically through an error or end-of-file on a later read or write. Sequencing means that all messages are delivered to the other application in the order in which they were sent.
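The sketch below exercises a SOCK_STREAM socket over the loopback interface so that it runs on a single machine. Binding to port 0 lets the kernel choose any free port; the handle names are arbitrary. Because TCP carries a byte stream rather than discrete packets, the example uses newlines to mark where one message ends.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM INADDR_LOOPBACK sockaddr_in);

# Server: listen on a free loopback port and find out which one it got.
socket(SERVER, PF_INET, SOCK_STREAM, 0)       or die "socket: $!";
bind(SERVER, sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
listen(SERVER, 5)                             or die "listen: $!";
my ($port) = sockaddr_in(getsockname(SERVER));

# Client: connect. The kernel completes the TCP handshake while the
# connection waits in the listen queue, so accept() does not block
# even though client and server share one process.
socket(CLIENT, PF_INET, SOCK_STREAM, 0)              or die "socket: $!";
connect(CLIENT, sockaddr_in($port, INADDR_LOOPBACK)) or die "connect: $!";
accept(CONN, SERVER)                                 or die "accept: $!";

# Send several newline-terminated messages, then close, which flushes
# the output and lets the reader see end-of-file.
print CLIENT "message $_\n" for 1 .. 5;
close(CLIENT);

# Read every line up to end-of-file; they arrive in the order sent.
my @received = <CONN>;
print @received;
close(CONN);
close(SERVER);
```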

SOCK_DGRAM sockets support connectionless, unreliable datagrams. A datagram is a self-contained message, usually small and limited in size. Applications have no guarantee that datagrams will be delivered or, if they are, that they will arrive in the order in which they were sent. On the surface, it seems that no application would ever want to use SOCK_DGRAM, but as you will see, many applications do, for good reasons.
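For contrast with the stream example, here is a sketch of SOCK_DGRAM on the loopback interface. There is no listen(), accept(), or connect(): each send() names its destination, and each recv() returns at most one whole datagram. On loopback the datagrams normally all arrive, but nothing in UDP guarantees that; the messages and the 1500-byte receive size are arbitrary choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_DGRAM INADDR_LOOPBACK sockaddr_in);

# Receiver: bind a datagram socket to a free loopback port.
socket(RECEIVER, PF_INET, SOCK_DGRAM, 0)        or die "socket: $!";
bind(RECEIVER, sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
my $dest = getsockname(RECEIVER);

# Sender: no connection; every send() carries the destination address.
socket(SENDER, PF_INET, SOCK_DGRAM, 0) or die "socket: $!";
for my $msg ("first", "second datagram", "third") {
    defined(send(SENDER, $msg, 0, $dest)) or die "send: $!";
}

# Each recv() returns exactly one datagram, so message boundaries
# are preserved (unlike a TCP byte stream).
my @got;
for (1 .. 3) {
    my $buf;
    defined(recv(RECEIVER, $buf, 1500, 0)) or die "recv: $!";
    push @got, $buf;
}
print "received: $_\n" for @got;
```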

The type of socket is very closely related to the protocol that is used. In the case of the Internet suite, SOCK_STREAM sockets always implement TCP, and SOCK_DGRAM sockets implement UDP.

The characteristics of the TCP protocol match the characteristics of SOCK_STREAM. TCP packets are guaranteed to be delivered barring a network disaster, such as the workstation on the other end of the connection dropping out, or the network itself suffering a serious, unrecoverable outage. Packets are always delivered in the same order that they are written. Obviously, these properties make the job of a network developer easy because a message can be written and essentially forgotten about, but there is a cost. TCP messages are much more expensive (demanding) than UDP messages in terms of both network and computing resources. The workstations at both ends of a session have to confirm that they have received the correct information, which results in more work for the operating system and more network traffic. The systems also have to track the order in which messages were sent, and quite possibly have to store messages until others arrive, depending on the state of the network "terrain" between the two workstations. (New messages can arrive while others are being retransmitted because of an error.) In addition, the fact that TCP connections are just that, connections, has a price. Every conversation has an endpoint associated with it, so a server that has more than one client has to arbitrate between multiple sockets, which can be very difficult. (See the section "I/O Multiplexing with TCP," later in this chapter, for details.)
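The cost of "one endpoint per conversation" is easy to see in a sketch: a server with one listening socket and two clients ends up holding three sockets, and must somehow decide which client to service. Here IO::Select (a wrapper around select()) reports which per-client socket has data; this is only a preview of the idea, under the assumption of loopback connections and arbitrary handle names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM INADDR_LOOPBACK sockaddr_in);
use IO::Handle;
use IO::Select;

# One listening socket...
socket(SERVER, PF_INET, SOCK_STREAM, 0)       or die "socket: $!";
bind(SERVER, sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
listen(SERVER, 5)                             or die "listen: $!";
my $addr = sockaddr_in((sockaddr_in(getsockname(SERVER)))[0], INADDR_LOOPBACK);

# ...but accept() hands back a separate endpoint for each client.
socket(C1, PF_INET, SOCK_STREAM, 0) or die "socket: $!";
connect(C1, $addr)                  or die "connect: $!";
accept(S1, SERVER)                  or die "accept: $!";
socket(C2, PF_INET, SOCK_STREAM, 0) or die "socket: $!";
connect(C2, $addr)                  or die "connect: $!";
accept(S2, SERVER)                  or die "accept: $!";

# Only the second client sends anything.
C2->autoflush(1);
print C2 "ping\n";

# The server has to arbitrate between its per-client sockets;
# can_read() returns the ones with data waiting (5-second limit).
my @ready_fds = map { fileno($_) } IO::Select->new(\*S1, \*S2)->can_read(5);
printf "listening fd %d, client fds %d and %d, ready: %s\n",
    fileno(SERVER), fileno(S1), fileno(S2), join(",", @ready_fds);

close($_) for \*C1, \*C2, \*S1, \*S2, \*SERVER;
```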
