

Division of Labor

Speaking of primitive, let’s hearken back to the old days of the DOS PC. If you’ve worked with DOS at all, you know that about a zillion little programs, called TSR (Terminate and Stay Resident) programs, would jack themselves up into your PC’s memory and sometimes wreak havoc with your other programs. The same was true of the Apple Macintosh computer, only TSRs were called CDEVs and INITs—and not all of them would work with one another. Therefore, if you didn’t have another PC or Mac to compare with, you would have to start getting rid of them one at a time to see if whatever you were trying to do would start working.

Of course, most seasoned tech support operators used to recommend that the first thing you do is “boot vanilla,” that is, boot without all those startup programs. Guess what they were actually telling you? Divide and conquer! Instead of dealing with each little program, you get rid of them all and then see if you still have problems. This is similar to how you dealt with the previous hub problem: you put half of the programs back and try again, repeating until you find the offending one.


Here’s how you can “boot vanilla” from Windows 95:

  Start the computer in “safe” mode by pressing F8 right after you turn on your computer (but before you see the blue sky design of the Windows startup screen). You can choose Safe Mode or Safe Mode with Network Support, depending on whether you can test the problem without the network.
  Press and hold down the Shift key after logging in to skip the programs in the Startup folder (this applies to Windows NT as well).

Here’s how to “boot vanilla” from DOS:

  Press F8 after you turn on your computer. This will allow you to select which TSRs and/or device drivers to load.
  Back up your startup files by copying AUTOEXEC.BAT to AUTOEXEC.BAK and CONFIG.SYS to CONFIG.BAK; then, get rid of drivers and TSRs manually. You can copy the respective BAK files back to CONFIG.SYS and AUTOEXEC.BAT once you’re done.

Every networked system you have can benefit from this same technique. Even though you’re no longer in a single-tasking, non-networked environment, it still applies.

As I discussed in the last hour, you can find many problems in a networked environment just by thinking about the changes that have been made recently. However, sometimes changes aren’t under your control or you’re oblivious to them. Good examples of this include browser updates, plug-ins, and virus protection patterns. Therefore, when the change is not obvious, you have to stop banging your head against the wall and get back down to basics.

For example, let’s say everybody in your office starts having problems shutting down. They all get stuck at the “Please wait while your computer shuts down” screen. It seems, at first glance, that everybody is going to have to deal with it. Nobody has changed anything recently that they know of, and no one is capable of wading through the guts of what’s going on.

Even though this doesn’t seem like a network problem, the fact that it just started spontaneously on a bunch of networked computers seems very odd, so it gets dumped in your lap. Fortunately, you realize that even if social engineering does not reveal the source of the change, something has changed, and you can at least use the divide-and-conquer method to figure out what it is.

Because your office runs Windows 95, many of the programs that run at startup are in the Startup folder. You remove everything from the Startup folder and reboot. All of a sudden, you can shut down again. You return half of the programs to the Startup folder and keep restarting until you find the source of the problem. It turns out to be your email notification program. However, when you start up with just the email notification program, you’re still able to shut down.
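The halving procedure can be sketched in Python. This is only an illustration, and it assumes that a single startup program causes the hang all by itself (which, as the email notifier shows, is not always true). The function name `find_single_culprit` and the `fails` callback are hypothetical; in real life, `fails` is you rebooting with only the listed programs loaded and reporting whether shutdown still hangs.

```python
from typing import Callable, Sequence


def find_single_culprit(
    items: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
) -> str:
    """Halve the list of startup items until one offender is left.

    Assumption: exactly one item triggers the problem on its own.
    `fails(loaded)` returns True if the problem shows up with only
    the items in `loaded` present.
    """
    candidates = list(items)
    if not fails(candidates):
        raise ValueError("problem does not reproduce with everything loaded")

    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if fails(first_half):
            # The offender is somewhere in the half we just loaded.
            candidates = first_half
        else:
            # The loaded half is clean, so the offender is in the other half.
            candidates = candidates[middle:]
    return candidates[0]
```

With 16 startup programs, this takes about four reboots after the initial check instead of up to 16.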

In this case, you’ve got an interaction problem, which is further solvable via the divide-and-conquer method. You put back half of the programs that were in the Startup folder, and you manage to track the problem down to a situation where you have both the virus protection program and the email notification program loaded. As you might have guessed, this troubleshooting session actually happened to me: the virus protection program, which is automatically updated from the Internet, had started to interfere with the email notification program. A quick search of the vendor’s Web site found a patch for the email client (not the virus protection program), and an annoying problem was fixed.
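An interaction problem like this one can also be hunted down by halving. The Python sketch below is one possible approach, not a record of the exact steps I took. It assumes that loading more programs never makes the problem go away and that nothing fails when no startup programs are loaded. The names `smallest_failing_prefix`, `find_interacting_set`, and `fails` are hypothetical.

```python
from typing import Callable, List, Sequence


def smallest_failing_prefix(
    items: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
    always_loaded: Sequence[str] = (),
) -> int:
    """Binary-search the smallest k where always_loaded + items[:k] fails.

    Assumes the full list fails and that adding items never cures it.
    """
    low, high = 0, len(items)
    while low < high:
        mid = (low + high) // 2
        if fails(list(always_loaded) + list(items[:mid])):
            high = mid
        else:
            low = mid + 1
    return low


def find_interacting_set(
    items: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Find a set of items that fails together but not with any one removed."""
    if not fails(list(items)):
        raise ValueError("problem does not reproduce with everything loaded")

    culprits: List[str] = []
    remaining = list(items)
    # Keep going until the culprits found so far reproduce the problem alone.
    while not fails(culprits):
        k = smallest_failing_prefix(remaining, fails, culprits)
        # The last item of the smallest failing prefix is required for the
        # failure: dropping it makes the problem disappear.
        culprits.append(remaining[k - 1])
        remaining = remaining[: k - 1]
    return culprits
```

If one program is at fault by itself, the function returns a one-item list, so the single-culprit case is covered too.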

Obviously, the divide-and-conquer method doesn’t always solve your problems outright. In particular, it’s tough to troubleshoot intermittent problems, as well as quantitative (rather than qualitative) problems that don’t involve a black-and-white (broken or not broken) scenario.


You can remember the difference between qualitative and quantitative by keeping in mind that qualitative is the analysis of the quality of a situation (as in, “My workstation cannot print at all”). Quantitative refers to the analysis of the quantity involved with a situation (as in, “My workstation is slower at printing than Sally’s”).

For example, the divide-and-conquer method might lead you to believe that a new application is causing your network slowdowns (and you might be right). However, it’s not always feasible to get rid of a new application, and, furthermore, it might not be clear whether the trouble is this particular application or just that the network itself is at a saturation point in general. In this case, you might try a different application, but can you really switch a widely deployed application in a short period of time? You’ll probably just end up checking the application to see if it’s misconfigured and taking measurements to ensure that the application is behaving properly on your network.
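One way to make a quantitative symptom usable with divide and conquer is to turn it into a yes-or-no answer: take several measurements and compare them with a known-good baseline. The Python sketch below is only an illustration. The function name `is_slower`, the five trials, and the 1.5x tolerance are assumptions to tune for your own network, not recommended values.

```python
from statistics import median
from typing import Callable


def is_slower(
    measure: Callable[[], float],
    baseline_seconds: float,
    trials: int = 5,
    tolerance: float = 1.5,
) -> bool:
    """Return True if the typical measurement is well above the baseline.

    `measure()` performs the operation once (for example, sending a test
    print job) and returns how many seconds it took. The median is used
    so that one odd sample does not decide the outcome.
    """
    samples = [measure() for _ in range(trials)]
    return median(samples) > baseline_seconds * tolerance
```

Repeating the measurement also helps a little with intermittent problems, although a problem that shows up only once a week will still slip past a handful of trials.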

The bottom line is this: Even when the divide-and-conquer method can’t directly find your problem, it can at least point you in the right direction.

