

Name Game

Without name services, very little works. You can usually rule out a name service problem by trying the same operation by IP address rather than by name, but name service failures on the server end can be trickier to spot.


It’s important to realize that existing connections will continue to work; it’s only new ones that will typically fail.

After a connection is established, a UNIX host typically asks its name server to look up the caller's address in order to get the symbolic identity of the caller. Therefore, if name services have a problem, this lookup can stall and set off a domino effect that causes problems for other services.

I see a lot of problems in the field that are DNS related, even though some of them don’t seem to be at first. For example, after a name server dies on a given UNIX host, Telnet sessions to that host can take a long time to show a login prompt. That is, even though the connection is accepted, and the netstat -an output shows there’s a connection, something prevents the login prompt from being issued.

That something is the DNS server. The Telnet server is configured to look up the address of every client after it connects; even though the name server is dead, the Telnet service keeps trying the lookup before it issues a login prompt. Depending on the operating system and Telnet implementation, this can result in long delays.
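One way to see this delay for yourself is to time a lookup of the connecting client's address on the server. The following is only a rough sketch: lookup_seconds is a made-up helper name, 192.0.2.10 stands in for whatever client address you care about, and it assumes your date command understands +%s (common, but not universal).

```shell
# lookup_seconds ADDRESS: prints roughly how many whole seconds a name
# lookup of ADDRESS takes. The helper name is hypothetical.
lookup_seconds() {
    start=$(date +%s)                 # assumes date supports +%s
    nslookup "$1" >/dev/null 2>&1     # result is discarded; only the time matters
    end=$(date +%s)
    echo $((end - start))
}

# Example (placeholder address):
#   lookup_seconds 192.0.2.10
```

If this takes many seconds, a service that waits for the same lookup before showing a prompt will likely feel just as slow.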

nslookup

The tool for checking name services is called nslookup. As the name implies, it can contact a name server (hence the ns in nslookup) and look up information from it. If you know your UNIX host is running named and should be answering name queries from itself and others, you can type this:

nslookup hostname hostname

You should get a response. If you get an error message, the named that runs on hostname is likely down.
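If you check several name servers regularly, the command above can be wrapped in a small function. This is a sketch under a couple of assumptions: check_named is a hypothetical name, and it relies on nslookup returning a nonzero exit status when the query fails, which not every version does. If yours doesn't, read the output instead.

```shell
# check_named SERVER: asks SERVER to resolve its own name, the same way
# "nslookup hostname hostname" does, and reports what happened.
check_named() {
    if nslookup "$1" "$1" >/dev/null 2>&1; then
        echo "$1: name server answered"
    else
        # Assumes a nonzero exit status means the query failed.
        echo "$1: no answer, named may be down"
        return 1
    fi
}

# Example (placeholder host name):
#   check_named hostname
```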


Note that you can use nslookup on any host, not just the one that you’re logged into.

There’s also an interactive mode that’s most helpful for resolving complex DNS issues. I’ll go into this feature in Hour 19, “Internet/Intranet Troubleshooting.”

Pole Position

Sometimes a UNIX networking problem won’t be in the higher-level services and programs. In other words, there will be times when it’s not the “people behind the telephone,” but rather the telephone or phone system itself. In order to make this determination, you can use a couple of techniques and commands.


When nothing seems to be working, you might have to find the UNIX console (the terminal that’s hard-wired to the server) and log in from there. If the console isn’t responsive, the UNIX server has locked up. This doesn’t have anything to do with the network—it’s a rare occurrence, but it does happen. Your only option here is to wince, turn the server off, and hope that it reboots okay.

Assuming that the console is working, it’s time to roll up your sleeves and see why your UNIX server can’t be seen from the network. Let’s work from the inside out, assuming that, like Dr. Freud, all analysis begins with the self. The basic notion here is that you need a network card to get to anything. Let’s check that first:

# ifconfig -a
lo     Link encap:Local Loopback
       inet addr:127.0.0.1 Bcast:127.255.255.255 Mask:255.0.0.0
       UP BROADCAST LOOPBACK RUNNING MTU:2000 Metric:1
       RX packets:0 errors:0 dropped:0 overruns:0
       TX packets:37231 errors:0 dropped:0 overruns:0

eth0   Link encap:10Mbps Ethernet HWaddr 00:00:C0:82:26:94
       inet addr:167.195.160.6 Bcast:167.195.160.255 Mask:255.255.255.0
       UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
       RX packets:816928 errors:0 dropped:0 overruns:0
       TX packets:654019 errors:0 dropped:0 overruns:0
       Interrupt:10 Base address:0x350 Memory:c8000-cc000

The key things to look for are the words UP and RUNNING. If you don’t see them, something has caused your network card to go down. Some network cards will go down due to a bad port on a hub, so try switching the port. A reboot may be in order, or you may indeed have a bad network card. You can also try netstat -i to check the error count and/or run ethstat to see what types of errors you might be getting. A high error count may point to a network problem rather than a UNIX problem.
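If you'd rather not eyeball the output, the check for the UP and RUNNING flags can be scripted. The sketch below assumes the ifconfig layout shown in the listing above (interface name at the start of a line, flags on an indented line); other ifconfig versions print flags differently and would need a different pattern. The name iface_is_up is made up for this example.

```shell
# iface_is_up NAME: reads "ifconfig -a" style output on stdin and succeeds
# only if the block for interface NAME shows both UP and RUNNING.
iface_is_up() {
    awk -v name="$1" '
        /^[a-zA-Z0-9]/ { current = $1 }   # a non-indented line starts a new interface block
        current == name {
            up = 0; running = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "UP") up = 1
                if ($i == "RUNNING") running = 1
            }
            if (up && running) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example:
#   ifconfig -a | iface_is_up eth0 || echo "eth0 is not up and running"
```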

The next thing you need to find out is whether your machine can talk to itself. Try pinging it:

$ ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=8.5 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=6.8 ms
<CONTROL-C>
--- 127.0.0.1 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 6.8/7.6/8.5 ms

Pay careful attention to that address: 127.0.0.1 is a special address, called the loopback address. If you can’t ping it, you have something really odd going on. Again, a reboot may be in order. This is a software loopback, so a problem here does not point to hardware.


The loopback address is a way for you to get your UNIX system to talk to its own TCP/IP programs rather than using the network card to communicate. When you successfully communicate through the loopback, you rule out the TCP/IP program (stack) as the cause of your trouble.
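When you script a ping test, the line that matters most is the summary. As a minimal sketch, the helper below (packet_loss is a hypothetical name) pulls the loss percentage out of ping output on stdin. It assumes a summary line worded like the one in the listing above ("N% packet loss"); ping implementations that word it differently will print nothing.

```shell
# packet_loss: reads ping output on stdin and prints the packet loss
# percentage from the summary line, without the percent sign.
packet_loss() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Example (assumes your ping accepts -c to limit the packet count):
#   ping -c 2 127.0.0.1 | packet_loss
```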

Next, try pinging your own network card. If, for example, your server’s IP address is 192.168.99.5, try this:

$ ping 192.168.99.5

The output from this should look similar to the output for the loopback ping.

If this works out okay, try pinging the router. If the router doesn’t answer, make sure you can see other workstations on the segment. Can they see the router? The router might be down, leading people on all other segments to assume that the UNIX host is down.
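The whole inside-out sequence (loopback, your own card, then the router) can be strung together so that you learn where the chain first breaks. Treat this as a sketch: first_unreachable is a made-up name, 192.168.99.1 is only a placeholder for your router's address, and it assumes your ping accepts -c to limit the packet count, which some UNIX versions do not.

```shell
# first_unreachable ADDR...: pings each address once, in the order given,
# and prints the first one that does not answer. Prints nothing if all answer.
first_unreachable() {
    for addr in "$@"; do
        if ! ping -c 1 "$addr" >/dev/null 2>&1; then   # assumes ping supports -c
            echo "$addr"
            return 1
        fi
    done
    return 0
}

# Example (the router address is a placeholder):
#   first_unreachable 127.0.0.1 192.168.99.5 192.168.99.1
```

Listing the addresses from the inside out means the first address printed tells you roughly which layer to look at: the TCP/IP stack, the network card, or the path to the router.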

