Sams Teach Yourself Network Troubleshooting in 24 Hours: Hour 3, The Delta Method: Identifying Network Change

The Outsiders

Two types of vendors who don’t have to get into your building to wreak havoc on your network are ISPs (Internet service providers), who you rely on to surf the Web and send email, and, of course, telephone companies, who may connect your sites to each other via leased lines. Of these two, the telephone company is by far the more mature vendor. Although the various phone companies take a lot of abuse, they’ve been doing this stuff for decades, and they tend to have good change-management policies in place. Consider that ISPs have been around in their current form for less than a decade, and it’s easy to see why they still have growing pains and therefore a bum rap.

Mondays can be tough. ISPs and phone companies tend to make changes over the weekend when utilization is low. If something that worked on Friday doesn’t work on Monday, it’s time to pick up the phone, call the appropriate provider, and ask what’s been changed. The answer will likely be “nothing,” but if you can verify that nothing has broken over the weekend at your end and hang in there, the problem may mysteriously vanish around lunchtime.
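One way to verify that nothing has changed at your end is to compare a record taken on Friday against what you see on Monday. The following is only a minimal Python sketch of that comparison. The item names, the "up" and "down" values, and the diff_snapshots function are all hypothetical, and how you gather each snapshot (ping, a management console, a walk through the wiring closet) is up to you.

```python
def diff_snapshots(before, after):
    """Return {item: (before_value, after_value)} for every item that differs.

    An item present in only one snapshot is reported with the
    placeholder value "missing" on the other side.
    """
    changes = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, "missing")
        new = after.get(name, "missing")
        if old != new:
            changes[name] = (old, new)
    return changes


# Hypothetical snapshots: what you recorded Friday versus what you see Monday.
friday = {"file-server": "up", "local-router": "up", "branch-office-link": "up"}
monday = {"file-server": "up", "local-router": "up", "branch-office-link": "down"}

changes = diff_snapshots(friday, monday)
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```

If the only difference is on the provider's side of the connection, as in this made-up example, you have something concrete to mention when you call.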

For longer-term problems, you may have to convince them that there’s nothing wrong with your computer equipment (or figure out that there is something wrong and apologize for having doubted them). One way of doing this is to set up a test network; if two sites can’t talk, you might as well bring the equipment from both sites to one location and connect it directly. Once you do this and see that it works, you have pretty compelling evidence for the outside provider that nothing is wrong with your equipment and that something has changed between the two sites.

The Risk/Benefit Ratio

New programs are cool. They offer features not offered in older versions, and it’s fun to be the first one on your block to have them. Unfortunately, experience shows that for every new feature introduced, there are probably two new bugs in a product. The breakneck speed of Internet time means that software developers have unbearable pressure on them to be first to market. This usually translates into quick product testing, which means that the programs are released with at least a few bugs. Check out any software vendor’s Web site—you’ll see fixes posted for products that have been out for at least six months.

Because you have better things to do with your day than report these bugs to the software vendor, it’s a good idea to not be the first one on your block to put a new application or operating system on your network. Unless you desperately need the new features of a new product, you should wait six months after product release to start rolling it out. If you need to do it sooner than that, consider what surgeons call the risk/benefit ratio—the amount of risk compared to the potential benefits.

Once you decide you need to start using a new product, you’ll still want to make sure you aren’t going to have any problems with it right out of the gate. For example, many IT shops were using Windows 95 internally for the better part of a year before they rolled it out to the masses. (Of course, using a new operating system introduces a sea of changes; a year is typically a longer pilot-testing period than you’ll want for a new word processor or spreadsheet.) The most important part of pilot-testing is the concept of limited production. After you’ve played with the product in an isolated area, roll out a limited deployment—in other words, install it for a couple of folks who will use it for their daily work and see how it goes. If it goes well, you’re usually going to be fine. What’s more, if something goes wrong, you only have to roll back a limited number of folks.

Another aspect to keep in mind is the concept of incremental rollouts. This means that after a limited deployment, you start giving an application or system to more and more folks rather than doing it “all at once,” thus rolling it out in small chunks that get bigger as the rollout becomes more successful. For example, you might give five people a new application. Later, you give the application to 10 more people; then 15, 20, and in your final increment, you might be rolling it out to 30 people a week (once you’re sure that things are working fine). Using an incremental rollout ensures that if you have a problem early on, the fewest possible folks are affected.
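To see how the increments above add up, here is a small Python sketch that lays out a rollout schedule. The wave sizes (5, 10, 15, 20, then 30 at a time) come from the example above; the total of 120 users and the plan_rollout function itself are assumptions made purely for illustration.

```python
def plan_rollout(total_users, wave_sizes, steady_size):
    """Return a list of (wave_number, users_in_wave, cumulative_users).

    Early waves use the sizes in wave_sizes; once those run out, every
    later wave uses steady_size until everyone has the new product.
    """
    if total_users < 0 or steady_size <= 0 or any(s <= 0 for s in wave_sizes):
        raise ValueError("total must be non-negative and wave sizes positive")
    plan = []
    done = 0
    sizes = iter(wave_sizes)
    while done < total_users:
        size = next(sizes, steady_size)        # fall back to the steady rate
        size = min(size, total_users - done)   # last wave may be smaller
        done += size
        plan.append((len(plan) + 1, size, done))
    return plan


# Hypothetical shop of 120 users.
plan = plan_rollout(120, [5, 10, 15, 20], 30)
for wave, size, done in plan:
    print(f"wave {wave}: {size} users ({done} of 120 done)")
```

With these numbers, only 5 of the 120 users are exposed if the first wave goes badly, and fewer than half have the product by the end of the fourth wave.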

Even if you don’t have problems during a rollout, a new application or device can produce secondary effects in another item that doesn’t seem to be related to the new item. Accordingly, a good rule of thumb is to shut down new items during network or communications trouble. The trouble might not be related to the new device or program that you’ve installed, but if you shut it down, you’ve ruled it out as the source of the trouble.

If the trouble goes away, you can then kick the problem back to the vendor you bought the offending item from (or to the manufacturer). However, make sure the problem is reproducible (that is, make sure it happens repeatedly when you reintroduce the program or device into the network) before going to your vendor.
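The reproducibility check can be thought of as a simple log of trials: each time you shut the new item down or bring it back, note whether the trouble was present. The Python sketch below is one way to express that idea. The is_reproducible function, the two-trial minimum, and the sample data are all assumptions made for illustration, not a standard procedure.

```python
def is_reproducible(trials, min_trials=2):
    """Decide whether the trouble tracks the new item.

    trials is a list of (item_enabled, trouble_seen) pairs. The problem
    counts as reproducible only if the trouble showed up every time the
    item was enabled, never when it was shut down, and each state was
    tried at least min_trials times.
    """
    with_item = [seen for enabled, seen in trials if enabled]
    without_item = [seen for enabled, seen in trials if not enabled]
    enough = len(with_item) >= min_trials and len(without_item) >= min_trials
    return enough and all(with_item) and not any(without_item)


# Hypothetical log: (new item running?, trouble observed?)
log = [(True, True), (False, False), (True, True), (False, False)]
print(is_reproducible(log))
```

A log like this also doubles as written evidence for the vendor, because it shows exactly when the trouble appeared and disappeared.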

You should try to give your vendor as much information as possible, especially when using telephone support, so that the technician can attempt to re-create your situation in his or her shop. Again, backup documentation such as logs and incident reports is key. In fact, technical support tends to pay much more attention to you if you can put your problem in writing.

Summary

Many network problems are the result of human-initiated change. Finding this change involves documenting and communicating your own actions, as well as politely interviewing your coworkers and outside vendors. Even unintended changes due to the “fat finger factor” can seriously damage a network, so it’s worth considering where you’ve been, no matter how unrelated it might seem. You’ll also want to figure out where others have been; however, don’t rely solely on logbooks. (Although to document is divine, people aren’t perfect. They’ll sometimes forget to write down what they’ve done.)

Before deploying a new network toy, it’s worth considering whether the risk is worth the potential benefit. Risk is always much higher with new products, so you’re best off waiting about six months after release before using what might be a pretty green product. Limited rollouts can also limit your potential network risk. You should also always think about a rollback plan, just in case things don’t go as expected with a new project.
