Without name services, very little works. Although you can usually rule out a name service problem by trying the same operation by IP address rather than name, name services that fail to work on the server end can be tricky.
It's important to realize that existing connections will continue to work; typically it's only new ones that fail.
After a connection is established, a UNIX host typically asks its configured name server to look up the caller's address so that it can get the caller's symbolic name. Therefore, a name service problem can cause a domino effect that disrupts other services.
I see a lot of problems in the field that are DNS related, even though some of them don't seem to be at first. For example, after a name server dies on a given UNIX host, Telnet sessions to that host can take a long time to show a login prompt. That is, even though the connection is accepted, and the netstat -an output shows there's a connection, something prevents the login prompt from being issued.
That something is the DNS server. The Telnet server is configured to look up the address of every connecting client; when the name server is dead, the Telnet service keeps retrying the lookup before it issues a login prompt. Depending on the operating system and Telnet implementation, this can result in long delays.
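The delay described above can be measured directly. What follows is a minimal Python sketch, not the Telnet server's actual code: it times the same kind of reverse lookup a server performs on a connecting address. The function name timed_reverse_lookup and its injectable resolver parameter are assumptions made for illustration.

```python
import socket
import time


def timed_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, seconds elapsed) for a reverse lookup of ip.

    resolver is injectable so the function can be exercised without a
    network; by default it is the standard library's socket.gethostbyaddr.
    """
    start = time.monotonic()
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        hostname = None
    elapsed = time.monotonic() - start
    return hostname, elapsed
```

If a call such as timed_reverse_lookup("192.168.99.5") takes many seconds to return on the server, a slow or dead name service is a likely suspect for slow login prompts.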
The tool for checking name services is called nslookup. As the name implies, it can contact a name server (hence the ns in nslookup) and look up information from it. If you know your UNIX host is running named and should be answering name queries from itself and others, you can type this:
nslookup hostname hostname
You should get a response. If you get an error message, the named that runs on hostname is likely down.
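The same check can be scripted. The sketch below is an illustration under stated assumptions rather than a replacement for nslookup: it sends one hand-built DNS query over UDP to a named server and reports whether anything answers. The function names build_query and server_answers, the fixed query ID, and the 3-second timeout are all choices made for this example.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of name."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length; a zero byte ends it.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 (A record), QCLASS 1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def server_answers(name, server, timeout=3.0):
    """Return True if server sends back any DNS reply to a query for name."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts, unreachable ports, and unresolvable server names.
            return False
    # A reply that echoes our query ID means something is listening on port 53.
    return len(reply) >= 12 and reply[:2] == query[:2]
```

Calling server_answers("hostname", "hostname") mirrors the nslookup command above. Any reply, even one carrying a DNS error code, suggests that named is at least running on that host.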
Note that you can use nslookup on any host, not just the one that you're logged into.
There's also an interactive mode that's most helpful for resolving complex DNS issues. I'll go into this feature in Hour 19, Internet/Intranet Troubleshooting.
Sometimes a UNIX networking problem won't be in the higher-level services and programs. In other words, there will be times when it's not the people behind the telephone, but rather the telephone or phone system itself. To make this determination, you can use a couple of techniques and commands.
When nothing seems to be working, you might have to find the UNIX console (the terminal that's hard-wired to the server) and log in from there. If the console isn't responsive, the UNIX server has locked up. This doesn't have anything to do with the network. It's a rare occurrence, but it does happen. Your only option here is to wince, turn the server off, and hope that it reboots okay.
Assuming that the console is working, it's time to roll up your sleeves and see why your UNIX server can't be seen from the network. Let's work from the inside out, assuming that, like Dr. Freud, all analysis begins with the self. The basic notion here is that you need a network card to get to anything. Let's check that first:
# ifconfig -a
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Bcast:127.255.255.255  Mask:255.0.0.0
          UP BROADCAST LOOPBACK RUNNING  MTU:2000  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0
          TX packets:37231 errors:0 dropped:0 overruns:0
eth0      Link encap:10Mbps Ethernet  HWaddr 00:00:C0:82:26:94
          inet addr:167.195.160.6  Bcast:167.195.160.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:816928 errors:0 dropped:0 overruns:0
          TX packets:654019 errors:0 dropped:0 overruns:0
          Interrupt:10 Base address:0x350 Memory:c8000-cc000
The key things to look for are the words UP and RUNNING. If you don't see them, something has caused your network card to go down. Some network cards will go down due to a bad port on a hub, so try switching the port. A reboot may be in order, or you may indeed have a bad network card. You can also try netstat -i to check the error count and/or run ethstat to see what types of errors you might be getting. The results may point to a network problem rather than a UNIX problem.
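Checking for those two words can be automated. The following Python sketch assumes the ifconfig output layout shown above (interface name in column one, flags on the line that also carries MTU:). Other systems print a different layout, so treat the parsing rules and the function names interface_flags and is_up_and_running as illustrative assumptions.

```python
def interface_flags(ifconfig_output):
    """Map each interface name to the set of upper-case flag words on its flags line."""
    flags = {}
    current = None
    for line in ifconfig_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A line starting in column one begins a new interface stanza.
            current = line.split()[0]
            flags[current] = set()
        elif current is not None and "MTU:" in line:
            # The flags line is the one that also carries the MTU setting.
            flags[current] = {
                word for word in line.split() if word.isalpha() and word.isupper()
            }
    return flags


def is_up_and_running(ifconfig_output, interface):
    """Return True if interface shows both the UP and RUNNING flags."""
    found = interface_flags(ifconfig_output).get(interface, set())
    return {"UP", "RUNNING"} <= found
```

Feed the functions the output of ifconfig -a without the command line itself; an interface that is missing from the output is reported as not up.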
The next thing you need to find out is whether your machine can talk to itself. Try pinging it:
$ ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=8.5 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=6.8 ms
<CONTROL-C>
--- 127.0.0.1 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 6.8/7.6/8.5 ms
Pay careful attention to that address: 127.0.0.1 is a special address, called the loopback address. If you can't ping it, you have something really odd going on. Again, a reboot may be in order. This is a software loopback, so a problem here does not point to hardware.
The loopback address is a way for you to get your UNIX system to talk to its own TCP/IP programs rather than using the network card to communicate. When you successfully communicate through the loopback, you rule out the TCP/IP program (stack) as the cause of your trouble.
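When a ping test is scripted rather than read by eye, the summary line is the part worth extracting. This small Python sketch pulls the packet-loss figure out of ping output in the format shown earlier; the function name packet_loss_percent is an assumption, and ping implementations that word their summary differently would need a different pattern.

```python
import re


def packet_loss_percent(ping_output):
    """Return the packet-loss percentage from ping's summary line, or None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(match.group(1)) if match else None
```

A result of 0.0 for the loopback address matches the healthy case described above; None means no summary line was found at all.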
Next, try pinging your own network card. If, for example, your server's IP address is 192.168.99.5, try this:
$ ping 192.168.99.5
The output from this should look similar to the output for the loopback ping.
If this works out okay, try pinging the router. If this doesn't work, make sure you can see other workstations on the segment. Can they see the router? The router might be down, leading people on all other segments to assume that the UNIX host is down.
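The inside-out sequence (loopback, then your own address, then the router) can be sketched as a short Python script. This is one possible arrangement, not a standard tool: the names ping_ok, CHECKS, and first_failure are hypothetical, the hints paraphrase the text above, and ping_ok assumes a ping command that accepts -c COUNT.

```python
import subprocess


def ping_ok(address, count=2):
    """Return True if the system ping command reports success for address."""
    # Assumes a ping that accepts -c COUNT; adjust for other systems.
    result = subprocess.run(
        ["ping", "-c", str(count), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Checks in inside-out order, each paired with what a failure suggests.
CHECKS = [
    ("loopback", "TCP/IP stack problem; a reboot may be in order"),
    ("own address", "network card or its configuration"),
    ("router", "router or segment; see whether other workstations can reach it"),
]


def first_failure(addresses, probe=ping_ok):
    """Probe each address in CHECKS order; return (step, hint) for the first failure, or None."""
    for step, hint in CHECKS:
        if not probe(addresses[step]):
            return step, hint
    return None
```

A call might look like first_failure({"loopback": "127.0.0.1", "own address": "192.168.99.5", "router": "192.168.99.1"}), where the router address is made up for the example. Stopping at the first failure keeps the diagnosis pointed at the innermost layer that is broken.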