

Hour 4
The Napoleon Method: Divide and Conquer

After Napoleon crowned himself Emperor, it’s said that Beethoven changed the dedication of his Eroica symphony, originally dedicated to Bonaparte, to read “To the memory of a great hero.” I’ll try not to swell your head to that extent, but the divide-and-conquer methods in this hour are going to make you into a really great troubleshooter; it worked for Napoleon, and it’s going to work for you.

Divide and conquer refers to the concept that the location of a given problem can be found more easily by splitting the problem area into smaller, manageable pieces. When you know that a problem is within a given area (say, a certain physical network or within a certain PC or user configuration), you can figure out which portion of that area it’s in by splitting the area into pieces.
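To make the idea concrete, here is a minimal Python sketch of halving a problem area. It assumes the path from a workstation to a server can be written as an ordered list of segments, and that there is a single break: everything before it works and nothing past it does. The names `first_failing`, `works_up_to`, and the sample `path` are made up for illustration; in real life the "probe" is you, walking over with a laptop or checking a link light.

```python
from typing import Callable, List, Sequence

def first_failing(segments: Sequence[str],
                  works_up_to: Callable[[int], bool]) -> int:
    """Return the index of the first failing segment (len(segments) if none fail).

    Assumes a single break: works_up_to(i) is True for every index before
    the break and False for the break itself and everything after it.
    """
    lo, hi = 0, len(segments)      # the break lies somewhere in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if works_up_to(mid):
            lo = mid + 1           # everything through mid is fine; look further along
        else:
            hi = mid               # the break is at mid or earlier
    return lo

# Hypothetical path from a user's PC to a server.
path = ["workstation NIC", "patch cable", "wall jack", "closet switch",
        "backbone link", "router", "server NIC"]
broken = 4                         # pretend the backbone link has failed
checked: List[int] = []            # record which points we actually tested

def probe(i: int) -> bool:
    checked.append(i)
    return i < broken

idx = first_failing(path, probe)
print(f"Problem found at: {path[idx]} (after {len(checked)} checks)")
```

With seven segments, the sketch tests three points rather than walking the path end to end. If more than one thing is broken at once, the single-break assumption no longer holds and the halving can point you at the wrong spot.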


Nine times out of ten, only one network problem is at work at any given time. Unless you’ve been struck by lightning recently, the odds of you having more than one problem simultaneously are very slim. (That’s not to say that domino effects don’t exist, though.)

The Numbers Game

Because only one problem usually exists at a time, it should be easy to search for (even in a large network). For example, let’s say that I’m thinking of a number from one to one million, and you want to guess what that number is. If you proceed sequentially and guess every number, you could potentially go through 999,999 numbers before getting it right. However, if you divide the maximum number in half, and I tell you “higher” or “lower,” and you keep dividing that result in half, you will take at most only 20 guesses. That’s quite an improvement. Don’t believe me? Here’s an example:

Me: Okay, I’m thinking of a number from 1 to 1,000,000.
You: 500,000?
Me: No, lower.
You: 250,000?
Me: No, higher.
You: (pausing to calculate 250,000 / 2 + 250,000) 375,000?
Me: No, lower.
You: (getting mad at me for picking a number between 250,000 and 375,000) Let’s see, there are 125,000 numbers between 250,000 and 375,000, so the middle of that would be…312,500?
Me: No, lower.
You: (whipping out your calculator) 281,250?
Me: No, lower.
You: (getting good at this now) 265,625?
Me: (astonished) How’d you guess?

In truth, this would probably go on for a dozen or so more guesses, but you get the idea. The range is initially 1,000,000, which is a huge range. Then it goes to 500,000, then 250,000, then 125,000, then 62,500, then 31,250, then 15,625, then about 7,812, and so on. You can see that you lose zeros pretty fast in only seven guesses; by the time you guess another seven times, you’re down to only about 60 possibilities. That should come as no surprise. You can see in Figure 4.1 how fast dividing an area in half cuts down your search.
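If you would rather let the computer do the halving, the short Python sketch below prints how many possibilities are left after each guess. The function name `shrinking_ranges` is just an illustrative choice, and the sketch assumes every guess cuts the range exactly in half.

```python
from typing import List

def shrinking_ranges(size: int, guesses: int) -> List[float]:
    """Return the range size before any guess and after each of `guesses` halvings."""
    ranges = [float(size)]
    for _ in range(guesses):
        ranges.append(ranges[-1] / 2)   # each guess discards half of what is left
    return ranges

for n, remaining in enumerate(shrinking_ranges(1_000_000, 14)):
    print(f"after {n:2d} guesses: about {remaining:,.0f} possibilities")
```

The line for seven guesses shows roughly 7,812 possibilities, and the line for fourteen shows about 61.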


Figure 4.1  In a binary search, the search area gets smaller and smaller as the number of guesses increases.


If you want to impress your boss or look cool at a geek convention, you can refer to this method of guessing as a binary search. As a bonus, you can scrawl its mathematical representation onto the overhead projector:
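n = log2(x)

Here, x is the maximum number in the sequence, and n (rounded up to a whole number) is roughly the maximum number of guesses. For x = 1,000,000, log2(x) is about 19.9, so 20 guesses are always enough.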
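You can check the formula against the guessing game itself. The following Python sketch plays the higher/lower game by always guessing the midpoint and counts the guesses; `guesses_needed` is a hypothetical helper written for this illustration, and the numbers tried are arbitrary samples.

```python
import math

def guesses_needed(secret: int, low: int = 1, high: int = 1_000_000) -> int:
    """Play the higher/lower game by guessing the midpoint; return the guess count."""
    count = 0
    while low <= high:
        guess = (low + high) // 2
        count += 1
        if guess == secret:
            return count
        if guess < secret:
            low = guess + 1     # "No, higher."
        else:
            high = guess - 1    # "No, lower."
    raise ValueError("secret is outside the range")

# The bound suggested by the formula, rounded up to a whole guess.
bound = math.ceil(math.log2(1_000_000))
print("log2(1,000,000) rounded up:", bound)

# A few sample numbers, including both ends of the range.
for secret in (1, 265_625, 500_000, 999_999, 1_000_000):
    print(f"{secret:>9,}: {guesses_needed(secret)} guesses")
```

None of the sample numbers should take more guesses than the bound printed on the first line.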


Your Waterloo: The Network Is Down!

Now, let’s look at what you might do when your network goes down. First of all, how do you know when the network is down? Is it a physical network problem or a server or router problem? The answer: divide and conquer. When you’re presented with a problem, logically work your way from the entirety of your network (the whole range of numbers from one to a million) to the specific problem (265,625). You’ll probably start by trying to figure out if there’s only a problem with one person (a local problem) or with a large group of people (a systemic problem).

If you determine that it’s just one person, you’re done with your systemic divide-and-conquer technique and can now proceed to workstation troubleshooting (which can require a combination of techniques, including divide and conquer). Otherwise, if it’s more than one person, you need to gather more information. Is everybody down? Usually not.


I’ve actually seen a situation in which everybody was down due to an electrician accidentally pushing the emergency off button of an equipment room’s UPS. Here’s one problem that caused a “domino effect” of a whole bunch of other problems, including no phone service, no network services, and general chaos. This is pretty unusual, and it’s the exception rather than the rule.

As network infrastructures are becoming as important as the telephone system, people are starting to get very paranoid about putting all their eggs in one basket. Many servers now have extra (redundant) fault-tolerant power supplies so that if one power supply breaks, the server stays up. Because servers are usually connected to a UPS (uninterruptible power supply or battery backup unit), it’s desirable to put each server power supply on a different UPS; otherwise, one broken UPS could take down the server. This might be expensive, but so is downtime. Whether your company has redundant power really depends on whether your management is committed to spending the money to get as much reliability and fault tolerance as possible.


Once you determine which functional group is not working properly, it’s time to haul out those maps you so diligently drew after you mastered Hour 2, “You Can’t Have Too Much Documentation!” People will tell you that they can’t log into the server, that their drive letters are gone, or even that all the PCs in their area have locked up. You’ll have to find out from those who call in whether it’s “just them” or if it affects everybody. You’ll also want to find out which department they’re in. Alternatively, you can take a look for yourself: If all the PCs were able to connect to the network a few days ago but now they can’t, then for all intents and purposes, these PCs are “down,” regardless of whether it’s an Ethernet problem or a problem with a switch or server.
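The first cut described above (one person or many, and which department) can be sketched in a few lines of Python. This is only an illustration: the report format (a user name paired with a department), the function names, and the sample callers are all assumptions, not something taken from a real help desk system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Report = Tuple[str, str]   # (user, department), a hypothetical trouble-call record

def affected_by_department(reports: Iterable[Report]) -> Dict[str, Set[str]]:
    """Group the users who called in by the department they belong to."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for user, department in reports:
        groups[department].add(user)
    return dict(groups)

def problem_scope(reports: Iterable[Report]) -> str:
    """Return 'local' if at most one distinct user is affected, else 'systemic'."""
    users = {user for user, _ in reports}
    return "local" if len(users) <= 1 else "systemic"

# Made-up calls: one caller phoned twice.
calls = [("alice", "Accounting"), ("bob", "Accounting"),
         ("alice", "Accounting"), ("carol", "Shipping")]

print(problem_scope(calls))
for department, users in sorted(affected_by_department(calls).items()):
    print(f"{department}: {len(users)} user(s) affected")
```

A "local" result sends you to workstation troubleshooting; a "systemic" one tells you which departments to look up on your network maps first.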

