After Napoleon crowned himself Emperor, it's said that Beethoven changed the dedication of his Eroica symphony, originally dedicated to Bonaparte, to read "To the memory of a great hero." I'll try not to swell your head to that extent, but the divide-and-conquer methods in this hour are going to make you into a really great troubleshooter; it worked for Napoleon, and it's going to work for you.
Divide and conquer refers to the concept that the location of a given problem can be found more easily by splitting the problem area into smaller, manageable pieces. When you know that a problem is within a given area (say, a certain physical network or within a certain PC or user configuration), you can figure out which portion of that area it's in by splitting the area into pieces.
Nine times out of ten, only one network problem is at work at any given time. Unless you've been struck by lightning recently, the odds of you having more than one problem simultaneously are very slim. (That's not to say that domino effects don't exist, though.)
Because usually only one problem exists at a time, it should be easy to search for, even in a large network. For example, let's say that I'm thinking of a number from one to one million, and you want to guess what that number is. If you proceed sequentially and guess every number, you could potentially go through 999,999 numbers before getting it right. However, if you guess the midpoint of the range, I tell you "higher" or "lower," and you keep halving whatever range remains, you will need at most 20 guesses. That's quite an improvement. Don't believe me? Here's an example:
In truth, this would probably go on for a couple more guesses, but you get the idea. The range is initially 1,000,000, a huge range. Then it goes to 500,000, then 250,000, then 125,000, then 62,500, then 31,250, then 15,625, then 7,812, and so on. You can see that you lose zeros pretty fast in only seven guesses; by the time you guess another seven times, you're down to only about 60 possibilities. That should come as no surprise. You can see in Figure 4.1 how fast dividing an area in half cuts down your search.
Figure 4.1 In a binary search, the search area gets smaller and smaller as the number of guesses increases.
If you want to impress your boss or look cool at a geek convention, you can refer to this method of guessing as a binary search. As a bonus, you can scrawl its mathematical representation onto the overhead projector:
n = log₂(x)
Here, x is the maximum number in the sequence, and n is approximately the maximum number of guesses.
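If you'd rather check the arithmetic than take it on faith, the guessing game can be sketched in a few lines of Python. This is only an illustration: the function names count_guesses and worst_case_guesses are made up for this example, and the sketch assumes the secret is a whole number in the range and that every "higher" or "lower" answer is truthful. For whole numbers, the worst case works out to log₂(x) rounded down, plus one, which is 20 for a range of one to one million.

```python
def count_guesses(secret, high, low=1):
    """Play the higher/lower game and return how many guesses it took.

    Hypothetical helper: always guesses the midpoint of the remaining range.
    """
    guesses = 0
    while low <= high:
        guess = (low + high) // 2      # split the remaining range in half
        guesses += 1
        if guess == secret:
            return guesses
        if guess < secret:
            low = guess + 1            # answer was "higher"
        else:
            high = guess - 1           # answer was "lower"
    raise ValueError("secret is outside the range")


def worst_case_guesses(high):
    """Worst case for the range 1..high: floor(log2(high)) + 1.

    int.bit_length() gives exactly that value without floating-point math.
    """
    return high.bit_length()


print(worst_case_guesses(1_000_000))          # 20
print(count_guesses(265_625, 1_000_000))      # never more than the worst case
```

Guessing the midpoint each time is what keeps the count low; any guess that splits the remaining range unevenly risks leaving you with the bigger piece.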
Now, let's look at what you might do when your network goes down. First of all, how do you know when the network is down? Is it a physical network problem or a server or router problem? The answer: divide and conquer. When you're presented with a problem, logically work your way from the entirety of your network (the whole range of numbers from one to a million) to the specific problem (265,625). You'll probably start by trying to figure out if there's only a problem with one person (a local problem) or with a large group of people (a systemic problem).
If you determine that it's just one person, you're done with your systemic divide-and-conquer technique and can now proceed to workstation troubleshooting (which can require a combination of techniques, including divide and conquer). Otherwise, if it's more than one person, you need to gather more information. Is everybody down? Usually not.
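That first split, one person versus many, is simple enough to write down as a rule. The following Python sketch is only an illustration; the function name classify_scope and the labels it returns are invented for this example, and it assumes you already have a list of the users who have reported trouble.

```python
def classify_scope(affected_users):
    """Classify a problem by how many distinct users report it.

    Hypothetical helper: returns "none", "local", or "systemic".
    """
    distinct_users = set(affected_users)   # ignore repeat calls from one person
    if not distinct_users:
        return "none"
    if len(distinct_users) == 1:
        return "local"                     # proceed to workstation troubleshooting
    return "systemic"                      # gather more information


print(classify_scope(["pat", "pat"]))      # local
print(classify_scope(["pat", "lee"]))      # systemic
```

Counting distinct users rather than calls matters here: one person phoning three times is still a local problem.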
I've actually seen a situation in which everybody was down due to an electrician accidentally pushing the "emergency off" button of an equipment room's UPS. Here's one problem that caused a domino effect of a whole bunch of other problems, including no phone service, no network services, and general chaos. This is pretty unusual, and it's the exception rather than the rule.
As network infrastructures are becoming as important as the telephone system, people are starting to get very paranoid about putting all their eggs in one basket. Many servers now have extra (redundant) fault-tolerant power supplies so that if one power supply breaks, the server stays up. Because servers are usually connected to a UPS (uninterruptible power supply or battery backup unit), it's desirable to put each server power supply on a different UPS; otherwise, one broken UPS could take down the server. This might be expensive, but so is downtime. Whether your company has redundant power really depends on whether your management is committed to spending the money to get as much reliability and fault tolerance as possible.
Once you determine which functional group is not working properly, it's time to haul out those maps you so diligently drew after you mastered Hour 2, "You Can't Have Too Much Documentation!" People will tell you that they can't log into the server, that their drive letters are gone, or even that all the PCs in their area have locked up. You'll have to find out from those who call in whether it's just them or if it affects everybody. You'll also want to find out which department they're in. Alternatively, you can take a look for yourself: If all the PCs were able to connect to the network a few days ago but now they can't, then for all intents and purposes, these PCs are down, regardless of whether it's an Ethernet problem or a problem with a switch or server.
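If the calls are coming in faster than you can keep track of them, a quick tally by department shows which functional group to look at first. The Python sketch below is only an illustration: the report fields (user, department, symptom), the sample entries, and the function name affected_departments are all invented for this example, not taken from any real help-desk system.

```python
from collections import Counter

# Hypothetical trouble reports, as you might jot them down while taking calls.
reports = [
    {"user": "pat", "department": "Accounting", "symptom": "can't log in to the server"},
    {"user": "lee", "department": "Accounting", "symptom": "drive letters are gone"},
    {"user": "sam", "department": "Shipping", "symptom": "PC locked up"},
    {"user": "pat", "department": "Accounting", "symptom": "still can't log in"},
]


def affected_departments(reports):
    """Return (department, distinct user count) pairs, most affected first."""
    users_by_department = {}
    for report in reports:
        # A set keeps repeat callers from being counted twice.
        users_by_department.setdefault(report["department"], set()).add(report["user"])
    counts = Counter({dept: len(users) for dept, users in users_by_department.items()})
    return counts.most_common()


for department, user_count in affected_departments(reports):
    print(department, user_count)
```

With the departments ranked, you can go to your maps and see what the most affected group has in common, such as a shared switch, segment, or server.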