This is actually the hardest part of this method: figuring out what to compare and getting your arms around the items that are causing the problem. For example, if many (but not all) users are reporting a problem, particularly an intermittent one, it can seem overwhelming at first, because it's hard to say where to start and which pieces of the network might be at fault. In cases like this, you'll save your sanity if you start logging the calls and writing down, in tabular format, what the problems are, when they happened, under what circumstances, and the configuration of the workstations and users involved. Such a chart might look like the one shown in Table 5.1.
Date | Time | Department | User | Problem | Type of PC |
---|---|---|---|---|---|
8/11 | 2:00 | Finance | Jack | Illegal operation error (WordPerfect) | Clone/486 |
8/11 | 2:20 | Finance | Leona | Fatal exception 06 (during system boot). Seemed OK afterwards. | Dell/P6 |
8/11 | 3:00 | Finance | Tracy | Illegal operation error (WordPerfect) | Clone/486 |
8/12 | 9:00 | Finance | Tracy | Illegal operation error (WordPerfect) | Clone/486 |
8/12 | 10:45 | Finance | Jack | Illegal operation error (WordPerfect) | Clone/486 |
8/12 | 1:00 | Finance | Jill | Locked up. Had to reboot when reading email. | Clone/486 |
8/13 | 11:00 | Finance | Jack | Illegal operation error (WordPerfect) | Clone/486 |
8/13 | 12:45 | Finance | Bill | Illegal operation error (WordPerfect) | Clone/486 |
By looking at this detailed log, it becomes apparent that illegal operation errors are what you're getting the most of: they account for six of the eight entries. (If you had more incidents listed in the log, it would become even more apparent. The other items are not repetitive and amount to bumps in the road, not hard errors.) So now you know that some of these things are not like the others. The ones that are alike are the persistent errors that you're trying to get rid of. Why isn't Bill calling as much as Tracy or Jack? A quick phone call reveals that he was out of town on the 11th and 12th, so he's not as much of a wildcard as the log implies.
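The same tally can be done mechanically once the log grows past what you can eyeball. The following is a minimal sketch, assuming the rows of Table 5.1 have been typed in as tuples; the shortened problem labels and the names `INCIDENTS`, `by_problem`, `recurring`, and `callers` are illustrative choices, not part of any tool mentioned in the text.

```python
from collections import Counter

# Rows transcribed from Table 5.1: (date, time, user, problem, pc_type).
# Problem descriptions are shortened to a label so identical errors compare equal.
INCIDENTS = [
    ("8/11", "2:00", "Jack", "illegal operation", "Clone/486"),
    ("8/11", "2:20", "Leona", "fatal exception 06", "Dell/P6"),
    ("8/11", "3:00", "Tracy", "illegal operation", "Clone/486"),
    ("8/12", "9:00", "Tracy", "illegal operation", "Clone/486"),
    ("8/12", "10:45", "Jack", "illegal operation", "Clone/486"),
    ("8/12", "1:00", "Jill", "lockup", "Clone/486"),
    ("8/13", "11:00", "Jack", "illegal operation", "Clone/486"),
    ("8/13", "12:45", "Bill", "illegal operation", "Clone/486"),
]

# Count how often each problem was reported.
by_problem = Counter(problem for _, _, _, problem, _ in INCIDENTS)

# Anything reported more than once is treated as a persistent error;
# one-off entries are the "bumps in the road".
recurring = {problem: n for problem, n in by_problem.items() if n > 1}

# For the persistent errors only, count the calls per user.
callers = Counter(
    user for _, _, user, problem, _ in INCIDENTS if problem in recurring
)

print(recurring)
print(callers.most_common())
```

The "more than once" threshold is an arbitrary cutoff for this small log; with more data you would likely raise it. The counts also say nothing about why a user calls less often, which is why the phone call to Bill still matters.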
Let's look at a concrete example. I was recently involved in upgrading a department to Windows 95, which included upgrading to a new version of WordPerfect. The word from management was that this was a strategic change, making the possibility of rollback very small; it was up to us to make it work. Although the sample machines I had pilot-tested worked just fine, we started to have problems a day or so after the upgrade. Certain (not all) users began to report illegal operation errors. I was absolutely sure that all users were configured the same (identical login scripts, file permissions, and home directory configuration), but, unfortunately, because we were using clone hardware, I wasn't as sure about the workstations.
I logged the incidents over several days and discovered that some users never reported the error. This indicated that the problem was not a moving target and that it was staying in the same places. This is important to establish; some errors do not pop up in the same places all the time; that is, they move from workstation to workstation. An error that moves around like that typically indicates a systemic problem rather than a problem with the individual workstations.
Next, I saw that only the clone PCs were having a problem with the illegal operation error; none of the name-brand PC users had reported it. Finally, I saw that not all of the users who had clones were reporting errors, only certain ones. This led us to believe that there were component problems with certain clones.
Obviously, one group of these things was not like the other! To rule out a user problem, I switched the PC of a user who didn't have the problem with the PC of a user who did. (This didn't endear me to either user, but it did tell me that the problem was definitely workstation related.)
I took inventory of the workstations that were acting up because I wanted to see what those workstations had in common. I made a new chart of the lower-level components (see Table 5.2). I left out Windows 95 and its components, because I had taken pains during the rollout to make sure that all the workstations were identical in this regard. I also didn't scan for viruses, because I had rolled out a scanner along with the Windows installation.
Workstation | BIOS Mfr | Video Card | Hard Drive | RAM | Hub/Port |
---|---|---|---|---|---|
Jack | AMI | Brand-X | Quantum | 16MB | 8/2 |
Tracy | Phoenix | Brand-X | Conner | 16MB | 9/3 |
Bill | AMI | Brand-X | Seagate | 24MB | 8/1 |
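
Checking what the problem workstations have in common is a column-by-column comparison, which is easy to automate when the inventory gets longer. The sketch below assumes only the three rows shown in Table 5.2; the dictionary keys and the helper name `shared_attributes` are made up for illustration.

```python
# Rows transcribed from Table 5.2 (only the three workstations shown there).
FAILING = {
    "Jack": {"bios": "AMI", "video_card": "Brand-X", "hard_drive": "Quantum",
             "ram": "16MB", "hub_port": "8/2"},
    "Tracy": {"bios": "Phoenix", "video_card": "Brand-X", "hard_drive": "Conner",
              "ram": "16MB", "hub_port": "9/3"},
    "Bill": {"bios": "AMI", "video_card": "Brand-X", "hard_drive": "Seagate",
             "ram": "24MB", "hub_port": "8/1"},
}


def shared_attributes(machines):
    """Return the component values that are identical on every machine."""
    rows = list(machines.values())
    first, rest = rows[0], rows[1:]
    return {
        key: value
        for key, value in first.items()
        if all(row[key] == value for row in rest)
    }


shared = shared_attributes(FAILING)
print(shared)  # {'video_card': 'Brand-X'}
```

A shared value is a lead, not a verdict: the comparison only becomes meaningful once you check whether the workstations that work fine have a different value in the same column.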