-->
by Eric Goebelbecker
"The computer industry has become firmly and irrevocably centered around the network during the past five years or so."
The preceding statement, and several thousand variations on it, has probably served as the opening for tens of thousands of magazine articles, editorials, and book chapters since 1995. A less frequently covered topic is this: How are the applications that use these networks written? How do computers actually communicate over a network?
Networking and Linux are a natural combination. After all, Linux is a product of the Internet itself because most of the developers collaborated (and still do collaborate) across the world over e-mail, the World Wide Web, and Usenet news. In addition, Linux is based on UNIX, one of the operating systems that many common computer networking technologies were developed on.
Linux is an excellent platform for network programming because it has mature and fully functional networking features. Because Linux provides full support for the sockets interface, most programs developed on other versions of UNIX will build and run on Linux with little or no modification. Textbooks and documentation about UNIX networking are fully applicable to Linux as well.
This chapter uses Perl examples to introduce network programming concepts and shows how to create functioning network programs for Linux quickly and easily. Perl was selected because it enables you to focus on network programming concepts instead of application development issues and programming environments. The scripts referred to in the tutorials are also included on the CD-ROM that accompanies this book. Note that when these scripts were developed, the emphasis was on illustrating key network programming concepts, not programming style, robustness, or how to program in Perl. Only basic knowledge of Perl is required to understand the examples, and they are certainly clear enough for C or C++ programmers to follow. For detailed information on the Perl language and how to use it for a wide variety of tasks, see Chapter 24, "Perl Programming."
This chapter is by no means exhaustive because the time and space allotted don't allow for coverage of concepts such as protocol layering and routing. It is intended to serve as an introductory tutorial to network programming, with an emphasis on hands-on exercises.
This section covers the fundamentals of networking. You will learn what the necessary components of network communication are and how a program uses them to build a connection by following a simple program that retrieves networking information and uses it to connect to another program. By the end of this section, you should have a good understanding of network addresses, sockets, and the differences between TCP (Transmission Control Protocol) and its counterpart UDP (User Datagram Protocol).
Listing 28.1 contains a Perl function that creates a connection to a server using TCP. You can find this function in network.pl on the CD-ROM.
Listing 28.1. makeconn(): creating a TCP connection.
 1: sub makeconn {
 2:
 3:     my ($host, $portname, $server, $port, $proto, $servaddr);
 4:
 5:     $host = $_[0];
 6:     $portname = $_[1];
 7:
 8:     #
 9:     # Server hostname, port and protocol
10:     #
11:     $server = gethostbyname($host) or
12:         die "gethostbyname: cannot locate host: $!";
13:     $port = getservbyname($portname, 'tcp') or
14:         die "getservbyname: cannot get port : $!";
15:     $proto = getprotobyname('tcp') or
16:         die "getprotobyname: cannot get proto : $!";
17:
18:     #
19:     # Build an inet address
20:     #
21:     $servaddr = sockaddr_in($port, $server);
22:
23:
24:     #
25:     # Create the socket and connect it
26:     #
27:     socket(CONNFD, PF_INET, SOCK_STREAM, $proto);
28:     connect(CONNFD, $servaddr) or die "connect : $!";
29:
30:     return CONNFD;
31: }
I can summarize this procedure in three essential steps:
1. Build the network address by retrieving address information in lines 11 and 13, and then assembling it in line 21.
2. Create the socket in line 27, using protocol information retrieved in line 15. (The protocol information, however, can actually be considered part of the address, as you'll see.)
3. Establish the connection in line 28.
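The same three steps can be sketched as a standalone script. This is an illustration under stated assumptions, not the version on the CD-ROM: the name makeconn_lexical is hypothetical, the function takes a numeric port instead of calling getservbyname(), and it returns a lexical socket handle rather than the bareword CONNFD. To have something to connect to, the demonstration listens on a loopback port of its own.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical variant of makeconn(): same three steps, but it takes a
# numeric port and returns a lexical socket handle.
sub makeconn_lexical {
    my ($host, $port) = @_;

    # Step 1: build the network address.
    my $server = gethostbyname($host)
        or die "gethostbyname: cannot locate host $host\n";
    my $servaddr = pack_sockaddr_in($port, $server);

    # Step 2: create the socket.
    my $proto = getprotobyname('tcp');
    socket(my $conn, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";

    # Step 3: establish the connection.
    connect($conn, $servaddr) or die "connect: $!";
    return $conn;
}

# Demonstration only: listen on an ephemeral loopback port so there is
# something to connect to, then connect with the function above.
my $tcp = getprotobyname('tcp');
socket(my $listener, PF_INET, SOCK_STREAM, $tcp) or die "socket: $!";
bind($listener, pack_sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
listen($listener, 1) or die "listen: $!";
my ($port) = unpack_sockaddr_in(getsockname($listener));

my $conn = makeconn_lexical('127.0.0.1', $port);
my ($peer_port, $peer_addr) = unpack_sockaddr_in(getpeername($conn));
print "connected to ", inet_ntoa($peer_addr), ":$peer_port\n";
```

Returning a lexical handle lets the caller hold several connections at once, which a single bareword handle such as CONNFD does not allow.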
The steps involved in building a network address and connecting to it provide a framework for observing how network communication works. I'll spend some time covering each part of this process in order to better prepare you for the hands-on tutorials.
If you've ever configured a PC or workstation for Internet connectivity, you have probably seen an Internet address (or IP address) similar to 192.9.200.10 or 10.7.8.14. This is called dotted-decimal format and, like many things in computing, is a representation of network addresses that is intended to make them easier for humans to read. What computers, routers, and other internet devices actually use to communicate is a 32-bit number, often called a canonical address. When this number is evaluated, it is broken down into four smaller 8-bit (one-byte) values, much the way the dotted-decimal format consists of four numbers separated by periods.
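The relationship between the two forms can be seen in a few lines of Perl. This is a minimal sketch that assumes the standard Socket module; the variable names are illustrative only. It converts one of the sample addresses to its 32-bit value and back.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

my $dotted = '192.9.200.10';
my $packed = inet_aton($dotted);        # four raw bytes
my $number = unpack('N', $packed);      # one 32-bit unsigned integer
my @bytes  = unpack('C4', $packed);     # the same value as four 8-bit values

printf "%s = %u = 0x%08X\n", $dotted, $number, $number;
print "bytes: @bytes\n";
print "round trip: ", inet_ntoa(pack('N', $number)), "\n";
```

The hexadecimal form, 0xC009C80A, makes the four bytes visible: C0 is 192, 09 is 9, C8 is 200, and 0A is 10.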
An internetwork, or internet for short, consists of two or more networks that are connected. In this case, the word internet refers to any set of connected networks, not the Internet, which has become a proper name for the network that encompasses most of the world. The Internet Protocol (IP) was designed with this sort of topology in mind. For an internet address to be useful, it has to identify not only a specific node (computer) but also the network that node resides on. Both pieces of information are provided in the 32-bit address. The netmask applied to the address decides which portion of the address identifies the network and which identifies the node. Depending on an organization's needs, a network architect can choose to have more networks or more node addresses on each network. For details on subnetting networks, see Chapter 13, "TCP/IP Network Management." For the sake of network programming, it's sufficient to know what information is stored in an internet address and that each workstation's netmask has to be correct for a message to be delivered successfully.
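To make the role of the netmask concrete, the following sketch splits an address into its network and node portions with a bitwise AND. The netmask 255.255.255.0 is an assumption chosen for illustration, and the helper names to_number and to_dotted are hypothetical.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Convert between dotted-decimal strings and 32-bit integers.
sub to_number { return unpack('N', inet_aton($_[0])) }
sub to_dotted { return inet_ntoa(pack('N', $_[0])) }

my $address = to_number('192.9.200.10');
my $netmask = to_number('255.255.255.0');    # assumed mask, for illustration

my $network = $address & $netmask;                   # bits kept by the mask
my $node    = $address & (~$netmask & 0xFFFFFFFF);   # the remaining bits

print "network: ", to_dotted($network), "\n";
print "node:    $node\n";
```

With this mask, the first three bytes identify the network (192.9.200.0) and the last byte identifies the node (10). A mask that keeps fewer bits leaves room for more nodes on each network.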
Dotted-decimal format is easier to read than 32-bit values (especially because many of the possible values can't be printed or would work out to some pretty ponderous numbers), but most people would rather use names than numbers because gandalf or www.yahoo.com is a lot easier to remember than 12.156.27.4 or 182.250.2.178. For this reason, hostnames, domain names, and the Domain Name System were devised. You can get access to a database of name-to-number mappings through a set of network library functions, which provide host (node) information in response to names or numbers. For example, in line 11 of Listing 28.1, you retrieve the address associated with a name by using one of these functions, gethostbyname().
Depending on the host configuration, gethostbyname() can retrieve the address associated with a name from a file (/etc/hosts), from the Domain Name System (DNS), or from the Network Information System (NIS, also known as Yellow Pages). DNS and NIS are network-wide services that administrators use to simplify network configuration because adding and updating network address numbers from a central location (and maybe a backup location) is obviously a lot easier than updating files on every workstation in their organization. These systems are also useful for internetworks because the address of a remote host can be determined when it is needed by making a DNS request, rather than needing to exchange configuration files in advance.
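In list context, Perl's gethostbyname() returns the official name, the aliases, the address type, the address length, and then one packed address per entry found. The sketch below wraps that call in a hypothetical helper, lookup_ipv4, that returns the addresses in dotted-decimal form. The result for localhost depends on how the host is configured, so the script prints whatever it finds rather than assuming a value.

```perl
use strict;
use warnings;
use Socket qw(inet_ntoa);

# Hypothetical helper: return every IPv4 address for a name in
# dotted-decimal form, or an empty list if the lookup fails.
sub lookup_ipv4 {
    my ($hostname) = @_;
    my ($name, $aliases, $addrtype, $length, @addrs) = gethostbyname($hostname);
    return map { inet_ntoa($_) } @addrs;
}

my @found = lookup_ipv4('localhost');
print @found ? "localhost: @found\n" : "localhost: no address found\n";
```

The calling code does not need to know whether the answer came from /etc/hosts, DNS, or NIS.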
One other advantage of using names is that the address that a name is associated with can be changed without affecting applications because the application need only know the name; the address can be discovered at runtime.
To illustrate the use of the gethostbyname() function and the difference between dotted-decimal formatted addresses and canonical addresses, try the script in Listing 28.2, called resolv on the CD-ROM.