Sams Teach Yourself Network Troubleshooting in 24 Hours: Hour 4 The Napoleon Method: Divide and Conquer


Take a look at where these folks are on your functional maps, and see what they have in common or what they connect through. When you do this, it will probably become painfully obvious where the problem is. If two departments say they are down, the server they share might be down, so check your detail documentation! On the other hand, if your functional map shows that all the groups that go through a particular router are down, it’s time to check that router. If users from just one segment are calling, it’s most likely a problem with that physical network segment.
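To make the idea concrete, here is a minimal Python sketch of that comparison. The group and component names in FUNCTIONAL_MAP are made up for illustration (your own functional map supplies the real ones), and the helper suspects is hypothetical: it keeps whatever every complaining group depends on and throws out anything a working group also uses.

```python
# Hypothetical functional map: each group and the components it depends on.
FUNCTIONAL_MAP = {
    "accounting": {"hub-a", "router-1", "server-fin"},
    "payroll": {"hub-b", "router-1", "server-fin"},
    "engineering": {"hub-c", "router-2", "server-eng"},
}


def suspects(functional_map, groups_down):
    """Return components shared by all down groups and by no working group."""
    down = set(groups_down)
    if not down:
        return set()
    # Components that every complaining group depends on.
    shared = set.intersection(*(functional_map[g] for g in down))
    # Components that at least one healthy group is using successfully.
    working = set().union(
        *(functional_map[g] for g in functional_map if g not in down)
    )
    return shared - working


print(suspects(FUNCTIONAL_MAP, ["accounting", "payroll"]))
print(suspects(FUNCTIONAL_MAP, ["accounting"]))
```

With accounting and payroll both down, the sketch points at router-1 and server-fin. With only accounting down, it points at hub-a, that group's own segment.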

Physical network segments (or “a bunch of hubs” to use plain language) are exquisitely suited to troubleshooting via the divide-and-conquer method. Why does a physical network segment go down? Usually, because these are shared networks (or “party lines”), the trouble is that somebody is babbling incessantly and not letting anybody else talk (see the Ethernet and Token-Ring hours for specifics). All sorts of fancy new technologies are built into “smart hubs” to detect this and stop it; therefore, this sort of problem isn’t as common today.

You’ll still sometimes see one workstation bring an entire segment down if your network isn’t fully switched, because even smart hubs are not as smart as you are (nor are they as smart as manufacturers like to think they are). In other words, there’s more than one way a given station can take down the segment.

Segment Searching

Because you’re treating a complex system as a series of simpler systems, without needing to know the specifics of each, you don’t care why the segment is down; you just care that it is down. The first thing you can do once you identify that a physical network is down is to refer to your physical documentation to see how many hubs or wire centers are involved with this physical network. These are the basic building blocks of the physical network. Because there are usually not very many of them in comparison to the number of PCs on the network, it makes sense to start here. If you have a small number of hubs, it’s okay to isolate one at a time to see when the problem goes away. (When it does, you’ve found the problem hub.)
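For a handful of hubs, the one-at-a-time approach amounts to a simple loop. In this sketch, segment_ok_without is a hypothetical stand-in for the manual step of unplugging a hub and checking whether the rest of the segment recovers. It is not a real tool or API, and the sketch ignores how the hubs are cabled together.

```python
def find_bad_hub_linear(hubs, segment_ok_without):
    """Isolate one hub at a time; return the hub whose removal fixes the segment.

    segment_ok_without(hub) is a hypothetical callback standing in for the
    manual test: unplug `hub`, then check whether the segment works again.
    """
    for hub in hubs:
        if segment_ok_without(hub):
            return hub  # the problem went away, so this is the problem hub
    return None  # no single hub explains the outage


# Simulated outage for illustration: pretend hub-3 holds the babbling station.
hubs = ["hub-1", "hub-2", "hub-3", "hub-4"]
print(find_bad_hub_linear(hubs, lambda hub: hub == "hub-3"))
```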

If you have an untenably large number of hubs connected together, you’ll want to use the kind of divide-and-conquer method you used when guessing my number from one to a million, which is a lot faster. Cut off half of them and see if a workstation connected to the remaining hubs can get in. If it can’t, the trouble is in the half that’s still connected; if it can, the trouble is in the half you cut off. Continue dividing the troubled half in half and then in half again. Soon you’ll have found the hub that contains the trouble. At this point, you can divide and conquer the hub itself, taking out ports until you find the one that’s causing the trouble (see Figure 4.2).

Figure 4.2 An Ethernet hub with 32 ports requires no more than five guesses.
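The halving procedure is a binary search, and a short Python sketch shows why 32 ports never take more than five tests. It assumes exactly one faulty port and that the server or router stays reachable during every test. The callback segment_ok_with is hypothetical: it stands for leaving only the listed ports connected and checking whether a workstation can get in.

```python
def find_bad_port(ports, segment_ok_with):
    """Halve the candidate ports until one remains.

    Returns (bad_port, number_of_tests). Assumes exactly one bad port.
    """
    candidates = list(ports)
    tests = 0
    while len(candidates) > 1:
        middle = len(candidates) // 2
        half = candidates[:middle]
        tests += 1
        if segment_ok_with(half):
            # This half is clean, so the fault is in the other half.
            candidates = candidates[middle:]
        else:
            # The fault is somewhere in the half just tested.
            candidates = half
    return candidates[0], tests


# Simulated 32-port hub for illustration: pretend port 23 is the bad one.
BAD_PORT = 23
port, tests = find_bad_port(range(1, 33), lambda active: BAD_PORT not in active)
print(port, tests)
```

Each test halves the candidates (32, 16, 8, 4, 2, 1), so the loop runs five times for a 32-port hub no matter which port is at fault.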

Unlike a shared Token-Ring (where all hubs are connected in a circle, as shown in Figure 4.3), each Ethernet hub is connected to another in a straight line (as shown in Figure 4.4). This means that if you isolate one, you isolate those below it. You can handle this by bypassing the hub you want to isolate and plugging directly into the next hub in the chain. However, be careful that your “cascade” hub is clearly labeled before you start unplugging things.

Figure 4.3 A Token-Ring network.

Figure 4.4 An Ethernet network.
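The difference between unplugging and bypassing a hub in an Ethernet chain can be sketched in a few lines of Python. The chain here is a made-up list ordered from the top of the cascade downward, and both helper names are hypothetical.

```python
def cut_off_by_unplugging(chain, hub):
    """Hubs that lose the network when `hub` is unplugged from its upstream link."""
    position = chain.index(hub)
    return chain[position:]  # the hub itself plus everything below it


def bypass(chain, hub):
    """Chain that results from cabling around `hub` straight to the next hub."""
    return [h for h in chain if h != hub]


chain = ["hub-1", "hub-2", "hub-3", "hub-4"]
print(cut_off_by_unplugging(chain, "hub-2"))
print(bypass(chain, "hub-2"))
```

Unplugging hub-2 takes hub-3 and hub-4 down with it, while bypassing it leaves hub-1, hub-3, and hub-4 connected so that only hub-2 is out of the picture.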

When you find the port that’s causing the problem—and even when you first find the hub that has the problem port—make sure you perform a “control” experiment on it (that is, put it back into the main network and make sure it’s still causing the problem).

If you isolate the hub that has the “network glue” on it (that is, the server or the router that connects you to the server), the other hubs won’t be able to “see” any of the network and will remain “down.” You’ll then wrongly assume that taking this hub out does not fix the problem, because the stations will still be unable to reach the server or router. If you suspect this hub, first verify that the server or router is okay, and then divide and conquer on a port level rather than taking out the entire hub. The problem might be with one of the stations on the server hub, which you’ll find if you take them out by port rather than by the entire hub.

If you must take out the hub that the server or router is connected to, make sure to test whether the network is up or down by checking your documentation and seeing whether a workstation connected to this hub is up.
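One way to picture that decision is a small planning helper. This is only a sketch: the port-to-node mapping stands in for your physical documentation, and the labels "server", "router", and "workstation" are assumed for illustration.

```python
GLUE_KINDS = {"server", "router"}  # assumed labels for the "network glue"


def isolation_plan(hub_ports):
    """Decide how to isolate a hub, given a mapping of port number -> node kind."""
    glue = sorted(p for p, kind in hub_ports.items() if kind in GLUE_KINDS)
    if not glue:
        # No server or router here, so the whole hub can be taken out at once.
        return {"strategy": "whole-hub", "keep_connected": [],
                "test_ports": sorted(hub_ports)}
    # Keep the glue connected and pull the remaining ports one group at a time.
    others = sorted(p for p in hub_ports if p not in glue)
    return {"strategy": "by-port", "keep_connected": glue, "test_ports": others}


server_hub = {1: "server", 2: "workstation", 3: "workstation"}
print(isolation_plan(server_hub))
```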

Once you find the port, it’s time to refer to your physical documentation, figure out which node on the network the port belongs to, and then determine the local problem. In most cases, you’ll find the cause to be a mangled network cable, a bad network card, or even just a PC that’s locked up.

Again, “smart hubs” will automatically figure out certain kinds of problems, and performing this process is somewhat primitive in this era of “automatic transmission” shared networking. However, it’s good to know how to drive a standard transmission (just in case you have to); plus, if you know how to troubleshoot a “standard,” you’re that much better at understanding how an “automatic” works.
