Chapter 1

Understanding Web Technologies


CONTENTS


I'd like to begin each chapter with a brief overview of what's ahead. These lists might help you determine whether you would like to skim through any material that you are already familiar with.

By now it's hard to imagine how anyone can have missed learning at least something about the World Wide Web and the Internet. Mass-circulation newspapers and magazines and broadcast media feature the Internet regularly. You often see Web page addresses (known as Uniform Resource Locators, or URLs) in television commercials and printed advertisements. The story of the meteoric rise of Netscape Communications Corporation on the stock market jumped from the financial page to the front page. Universities, businesses, and other organizations have rushed to "get on the Web," while entrepreneurs have moved equally quickly to take advantage of this rush by setting up shop on the shoulders of the Information Superhighway, hawking everything from Internet connections to Web-page authoring to Web-related conferences.

You probably bought this book because you've been using the Web and you see the potential to use its technology on an Intranet within your organization. If you're wondering what an Intranet is, you might think of it as "the Web on a LAN" (Local Area Network). Obviously, there is much more to it than such a simplistic definition. The rest of this book is all about exploring the limitless possibilities of building and using an Intranet. Using Windows NT as the server, of course.

Note
For now, I assume you know the meaning of Web URLs, such as http://www.somecompany.com. If not, you might want to scan ahead to Chapter 5, "What You Need to Know About HTML," for an in-depth discussion about the different flavors of URLs and how they work.

This chapter is an overall introduction to the Web, and it lays a foundation for the rest of the book. Because you've probably seen similar introductory material before, I'll put a particular spin on the whole subject in this chapter by pointing to some of the things you'll be able to do on your Intranet. As you read the chapter, think of using your Web browser within your company to view your own company information instead of outside Web pages. Your corporate Intranet, then, is the implementation of World Wide Web services within your organization.

Overview of the World Wide Web and the Internet

The explosion of interest in the Internet is being driven by an even more explosive growth of the Web. Nevertheless, and this isn't meant merely in a pedantic sense, the Internet was here first and has been for more than 20 years. The Internet can be loosely defined as those computers and networks, worldwide, that are interconnected using TCP/IP (Transmission Control Protocol/Internet Protocol).

A Brief History of TCP/IP and the Internet

In the 1970s, the United States Department of Defense (DoD) contracted with researchers at the University of California at Berkeley and a company named BBN to develop networking for DoD computers worldwide. The primary objectives of the research project were to develop computer networking that

More than after-the-fact Cold War speculation, the last of the points relates to the possibility of large parts of the DoD network disappearing in a nuclear war and the need for the network to withstand it. In fact, today's Internet does exactly that: If a large portion of the network were to disappear because of some massive hardware failure, the rest of the network would simply find a way around the service interruption and keep on working! It's pretty amazing.

Even though the DoD funded most of the development of what came to be known as TCP/IP networking, the free thinkers at Berkeley managed to get permission to redistribute the network software they developed and the specifics of its protocols written into the contract with DoD. At about the same time, Berkeley was developing its own revised version of the UNIX operating system software, which it had licensed from AT&T (where UNIX was invented) as a research project. In short, TCP/IP networking was dropped right into BSD (Berkeley Software Distribution) UNIX, which was then made available to other academic institutions, also for research purposes, for the mere cost of a computer tape.

The wide distribution of these BSD tapes to other colleges, universities, and research institutions was the beginning of the Internet. TCP/IP networking not only allows individual computers to be linked into a network, but it also allows networks of computers to be linked to other networks with the appearance that all the computers on all the linked networks are on the very same internet.

Note
The word internet with a lowercase i refers to interconnected networks, perhaps on a university campus; whereas Internet with a capital I refers to the global interconnected network in which anyone can participate.

Universities began building local networks, linking them together, and connecting their local networks with remote networks at other locations or other institutions, laying the foundation for today's Internet explosion. The DoD built its own private internet, called MILNET, using TCP/IP, and many other U.S. Government agencies set up networks as well, some of which eventually became part of the Internet.

TCP/IP Implementations

Because the implementation nuts and bolts of TCP/IP networking (that is, the detailed descriptions of the network protocols themselves) were publicly defined in documents known as Requests for Comments, software companies and individuals were free to develop and sell or give away their own TCP/IP software. For example, the first implementation of TCP/IP for the IBM PC was a university master's thesis project, and the resulting software was given away; the authors went on to found FTP Software, Inc., makers of one of today's leading TCP/IP software packages, OnNet for IBM PCs and compatibles. Dozens of other vendors sell TCP/IP software for PCs, and Microsoft has wisely built it into both Windows NT and Windows 95 as a standard feature.

Note
If the term network protocol is unfamiliar to you, you might think of it as a language that one computer on the Internet can use to speak with any other computer on the Internet, even if the two are of completely different makes and models. Note that the phrase computer language usually describes a language used between a human and a computer, whereas network protocols aren't really meant to be read by humans.

Although the Internet has strong foundations in UNIX, Windows NT and Windows 95 are not only capable of serving as powerful Internet or Intranet platforms; most folks also find them much easier to manage. One reason this book is written about Windows is that the number of desktop PCs running these platforms far surpasses the number running all others combined. Consequently, any Intranet project will need to integrate Windows applications. This book will show you how to configure the PCs in your organization using TCP/IP and Web technology to participate on your Intranet.

In an effort to compete against Microsoft's dominance of the desktop operating systems market, several vendors (including IBM, Sun, and Oracle) have announced plans to develop inexpensive computing devices that will have TCP/IP networking built in. Not full-blown PCs, but also not dumb terminals, these Internet Appliances would include not only TCP/IP, but also graphical capabilities and World Wide Web browser software. These appliances could prove to be a valuable part of your Intranet, because they'd give users access to any Web services you might make available, and at substantially lower cost than full-capability PCs or workstations. Microsoft Vice President Paul Maritz has indicated his view that these devices will not ultimately succeed in displacing the popular PC. Although that must be taken as a biased opinion, it does appear that the impact of Internet Appliances will depend on several factors:

Only time will answer these questions. We should hear much more as the concept and reality of Internet Appliances progress in 1996.

Most people are familiar with the general idea of a computer network: several computers in an office or other common environment are connected together with wires to enable sharing of printers and files, and to otherwise allow communication among them. The idea of the Internet is much the same, only a lot bigger, but it also has an important extra element. TCP/IP networking allows not only the connection of local computers to each other, but also permits networks to be connected to other networks. These connections create internets (purposely not capitalized here), in which it appears to users that the computers on all the connected networks are part of a single, large internetwork. The same capabilities of sharing devices and communicating data between computers exist, but the sharing has been extended from just the computers on one network to all the systems on all the connected networks.

Interconnected networks need not be in the same location or building; they can be physically remote from each other with connections using special-purpose data lines, satellite radio, infrared radio links, cable TV wiring, or even ordinary telephone lines and modems. Remote computers appear to become local, allowing file transfers, electronic mail, printer and disk sharing, and many other features, including, of course, access to the World Wide Web.

Internet Services

As previously stated, before there was a Web there was already an Internet (now capitalized), a worldwide network of networks interconnected using TCP/IP networking. Some of the major features of the Internet include (although all of these were pre-Web, they are still used):

Besides these major Internet services, many others have developed over the lifetime of the Internet, some of which use combinations of the above services. Using an Internet search tool called archie, for example, you can search a database of free software and find its location on the Internet just by sending a specially worded e-mail message to a special address. Return e-mail services transfer files to you, much like fax-back services, when you request them via e-mail. Special-interest electronic mailing lists have developed for like-minded people who want to discuss subjects ranging from computers and networks themselves to spelunking and job searching.

Each of these (and many other) Internet services is a useful and powerful tool, and all are still widely used. Even before the existence of the Web, the need for electronic mail capabilities was driving substantial growth in the Internet. Each pre-Web Internet service, however, has its own particular user interface to be learned. Many of these interfaces are less than friendly to non-technical users. Figure 1.1 shows an archie search using a GUI software package (called WSArchie) included on the CD-ROM with this book. Figures 1.2 and 1.3 show the same search using a Web browser and fill-in form interface to the archie service. The search term given in both cases is msie20.exe, which is the filename of Microsoft's Internet Explorer.

Figure 1.1: WSArchie interface to archie Internet search service.

Figure 1.2: Web-based archie search.

As you can see in these figures, the Web interface is significantly more accessible; the difference speaks for itself. Instead of a raw list of anonymous FTP servers and lengthy directory paths, you see a nicely formatted list of locations with the ability to download the located file just by clicking the link. Even though the WSArchie program is a very nice GUI, you still have to turn around and use FTP, a completely different service with a different user interface, to retrieve the file you want. Even assuming you can find the data you want, which Internet program do you use to access it? And where did you put the obscure set of instructions for this particular program? Actually using the pre-Web Internet, then, was not an easy proposition, particularly for casual computer users.

Figure 1.3: Results of Web-based archie search.

The Birth of the Web

Beginning in 1989, Tim Berners-Lee and other researchers at the European Particle Physics Lab (Conseil Européen pour la Recherche Nucléaire, or CERN) in Geneva, Switzerland, developed a means of sharing data among their colleagues using something they called hypertext. CERN users could view documents on their computer screens using new browser software. Special codes embedded in these electronic documents allowed users to jump from one document to another on screen just by selecting a hyperlink. Internet capabilities were built into these browsers. Just as a user could jump from one text document on a computer to another, he could jump from a document on one computer to a document on another remote computer. Moreover, support for each of the major Internet services listed above was added to the browser software. A researcher could transfer a file from a remote computer to her local system, or log into a remote system just by clicking on a hyperlink, rather than using the clumsy FTP or Telnet mechanisms. CERN's breakthrough work is the basis of today's World Wide Web, and its Web server and browser software (now being maintained by the World Wide Web Consortium) were the first of their kind.

Note
CERN has now moved on, or rather back, to its main mission of doing research on particle physics, but its Web-related legacy has been passed on to the World Wide Web Consortium, a group of academic and commercial organizations dedicated to the advancement of the Web. W3C, as it's called, remains active in the development of the Web, and Berners-Lee is still right in the thick of things at W3C. You may want to visit the W3C Web site at http://www.w3.org/.

Unlike today's Web browsers, CERN's Web browser was a plain-text package in which cursor keys were used to move around the computer screen and the Enter key to select hyperlinks. While it could access both hypertext documents and pre-Web Internet services like FTP, Gopher, and Telnet, it had no graphical capabilities. Marc Andreessen, a student working part-time at the University of Illinois National Center for Supercomputing Applications (NCSA), picked up CERN's work and turned it into what would become today's NCSA Mosaic, the first graphical Web browser with point-and-click capabilities. First developed for UNIX computer systems running the X Window graphical user interface, NCSA Mosaic was quickly ported to Microsoft Windows and Macintosh PCs. Mosaic rapidly became the proverbial "killer application" for the Internet. Just as Mosaic descended from the work at CERN, all subsequent graphical Web browsers come from this common ancestor.

Web Browsers

Besides NCSA Mosaic, there are a large number of other Web browsers, including, of course, the widely used Netscape Navigator package, now the leading Web browser in terms of market share, and Microsoft Internet Explorer. (Incidentally, Marc Andreessen left NCSA to co-found Netscape Communications Corporation.)

While this book concentrates on Microsoft Internet Explorer and Netscape Navigator, there are a lot of Web browser software packages to choose from besides these two. Depending on the type of workstations you have on your LAN, you may need to consider browsers written for platforms other than Windows. Netscape Navigator is available on nearly every platform, and Microsoft Internet Explorer is available for Windows NT, Windows 95, Windows 3.1, and the Macintosh. Here is a quick look at just a few of the other browsers available:

Note
One source for more information about the various Web browsers available on the Internet is http://www.browserwatch.com/.

How Web Browsers Work

Graphical or not, all Web browsers work in essentially the same way. Look at what happens when you click on a hyperlink.

As you can see, a simple click on a hyperlink starts a pretty significant series of events involving not only your Web browser software but also a Web server somewhere on the Internet. Figure 1.4 shows this sequence of events.

Figure 1.4: Web browser/server communication using HTML and HTTP.
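To make the exchange a little more concrete, here is a minimal Python sketch of the two halves of the conversation: the request a browser might send after you click a link, and how it might pick apart the server's reply. The function names and the canned reply are my own illustrations, not part of any real browser, and real browsers send many more headers than this.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return the text of a simple HTTP/1.0 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the server's top page
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.0",       # request line: method, document, version
        f"Host: {parts.netloc}",      # which server the browser is addressing
        "",                           # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw HTTP response into (status code, headers dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split()[1])   # "HTTP/1.0 200 OK" -> 200
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


# A canned reply standing in for what a real server would send back.
SAMPLE_REPLY = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body><h1>Welcome</h1></body></html>"
)
```

Calling build_get_request("http://www.somecompany.com/index.html") produces the request text, and parse_response(SAMPLE_REPLY) returns the status code, the headers (including the content type), and the HTML body that the browser would then display.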

Note
For purposes of your Intranet, it's important to note that Web servers always identify the type of data they send in response to browser requests. Most of the time the data returned is text data with HTML markup, but any kind of data can be returned. This bit of information is critical to the potential capabilities of your Intranet: As long as your Web server can identify the data it's sending, your users' Web browsers can be set up to handle almost any kind of data, including word processing files, spreadsheets, and the datafiles used by a wide variety of other applications. This simple-but-powerful mechanism, explained in detail in Chapter 12, "MIME and Helper Applications," is what you can use to turn your Intranet into an interactive tool for getting your company's everyday work done.
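As a rough illustration of this mechanism, the Python sketch below shows both sides: a server choosing a content type from a filename, and a browser deciding what to do with that type. The tables, the helper descriptions, and the function names are hypothetical and hand-written for this example; real servers and browsers keep much longer, configurable lists.

```python
import os

# Server side: map filename extensions to MIME content types.
# A small hand-written table for illustration only.
CONTENT_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".doc": "application/msword",
    ".xls": "application/vnd.ms-excel",
}

# Browser side: types the browser can display by itself...
BUILT_IN = {"text/html", "text/plain", "image/gif"}

# ...and hypothetical helper applications for everything else.
HELPERS = {
    "application/msword": "word processor",
    "application/vnd.ms-excel": "spreadsheet",
}


def content_type_for(filename):
    """Pick the content type a server would announce for this file."""
    ext = os.path.splitext(filename)[1].lower()
    # Unknown extensions fall back to a generic "raw bytes" type.
    return CONTENT_TYPES.get(ext, "application/octet-stream")


def browser_action(content_type):
    """Decide what a browser would do with data of the given type."""
    if content_type in BUILT_IN:
        return "display in browser"
    helper = HELPERS.get(content_type)
    if helper:
        return f"launch {helper}"
    return "offer to save to disk"
```

With these tables, a request for Budget.XLS is announced as application/vnd.ms-excel and handed to the spreadsheet helper, while an unrecognized file such as setup.exe is simply offered for saving.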

Web Servers

Web browsers like Explorer or Navigator communicate over a network (including the worldwide Internet) with Web servers, using HTTP. Browsers send network messages to servers asking that specific documents or services be provided by the server. The server returns the document or service, if it's available, also using the HTTP protocol, and the browser receives and understands it.
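The server's side of this conversation can be sketched in a few lines of Python using the standard library's http.server module. This is only a toy, not a stand-in for IIS or any other production server; the page table, the handler name, and port 8080 are all assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory "documents" keyed by request path.
PAGES = {
    "/": "<html><body><h1>Intranet Home</h1></body></html>",
    "/phone.html": "<html><body>Phone list goes here.</body></html>",
}


def lookup(path):
    """Return (status code, content type, body) for a requested path."""
    if path in PAGES:
        return 200, "text/html", PAGES[path]
    return 404, "text/plain", "Document not found"


class IntranetHandler(BaseHTTPRequestHandler):
    """Answer each browser GET request with a document or an error."""

    def do_GET(self):
        status, content_type, body = lookup(self.path)
        data = body.encode("utf-8")
        self.send_response(status)                       # e.g. 200 or 404
        self.send_header("Content-Type", content_type)   # identify the data
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Run the server until interrupted. Port 8080 is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), IntranetHandler).serve_forever()
```

Calling serve() and pointing a browser on the same machine at http://127.0.0.1:8080/ should display the home page, while any path not in the table produces a "not found" reply.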

There are many network protocols spoken on the Internet, each one for a specific and limited purpose. There are network protocols for electronic mail, file transfers, and other services you may have heard of, including Gopher, Telnet, and WAIS. Each of these protocols works well for its own purpose, and you can use individual programs on your computer that communicate with the protocols to locate and retrieve information on the Net. Web browsers were designed to speak these and other network protocols alongside HTTP, bringing them together behind a single interface. What's important to the World Wide Web user is that Web browsers take care of locating, retrieving, and, most important, interpreting the data, regardless of the actual underlying protocol or service.

Your Intranet will utilize the HTTP protocol and the other TCP/IP protocols your browsers speak to provide point-and-click access to a wide variety of your mission-critical information and services. This is an important point, and we'll come back to it in the final section of this chapter.

World Wide Web Server Software

You don't have to have a UNIX computer system to set up and run a World Wide Web server. In fact, this book will show you how easy it is to run a Web server on Windows NT. Windows NT is an order of magnitude easier to install and manage than UNIX or NetWare, in my personal opinion. Windows NT is also a very powerful and secure operating system, and many Web servers are available to take advantage of its features.

Note
For detailed information about setting up an Internet Web site on Windows NT (Server or Workstation) or Windows 95, please consult either of these two Sams.net books which I recently co-authored with Christopher Brown: Web Site Construction Kit for Windows NT and Web Site Construction Kit for Windows 95.

This book will cover aspects of configuring Microsoft Internet Information Server (IIS). Although all of the techniques in this book will apply to any NT Web server, some of the reasons for choosing IIS are that it comes free with Windows NT Server, it is well integrated with the operating system, it includes strong security features, and it was recently rated by PC Week magazine as the fastest NT Web server. (Of course, benchmark results are a never-ending sea change.)

Information about setting up and using a Web server is given in Chapter 7, "Running the Intranet Web Server". Because IIS runs only on Windows NT Server, you will need to consider other software if you plan to run Windows NT 4.0 Workstation. NT Workstation 4.0 includes a peer Web server, similar to IIS. Another very good free package is the EMWAC HTTPS. It is included on the CD with Web Site Construction Kit for Windows NT. Two powerful commercial servers that run on NT Workstation are Purveyor WebServer and ILAR Concepts FolkWeb. You'll probably want to dedicate a high-end machine to this task, rather than trying to run a server on somebody's desktop PC while it's in use, but this really depends on how much network traffic your Intranet server will need to handle.

Note
Several Windows NT Web servers are discussed in Appendix B. For detailed information about the current features and capabilities of almost every Web server available, see http://www.webcompare.com/.

Chapter 3, "The Software Tools to Build a Web," goes into more detail to help you select the hardware and software to make up your Intranet.

Commercial Web Server Software Features

With the explosive growth in numbers of Intranet and WWW server installations, most professional server software packages have been adding features at an equally fast pace. What follows is a list of some of these:

Of course you should also be able to expect commercial-grade support from Web server software vendors. This is a potentially critical matter especially if you don't have in-house expertise in managing the software. Free software packages are invariably not free when you have to provide your own support.

Other TCP/IP Services in Your Intranet

Although we've touched on this subject once or twice earlier, it's worth specific, focused attention for your Intranet. Web browsers speak not only HTTP, the protocol they share with Web servers, but also the protocols of a number of other TCP/IP services:

Because Web browsers speak the protocols behind these services, your Intranet can include any of them. Moreover, you can integrate any of these services without requiring your users to learn the service's native interfaces. Web browsers provide a common, point-and-click front end to all these services. You can, for example, set up an FTP server in your Intranet for distributing software updates or any other computer data. Similarly, you can use USENET news services as a means of collaboration and information sharing within your organization. In either case, your users (the people who are defined in Chapter 2, "Planning an Intranet," as your Intranet's customers) need learn only one interface to access any of the services you're providing on your Intranet. You will see throughout this book, and especially in Chapter 11, that the Web browser is the key.
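One way to picture how a single browser fronts so many services is to look at the first part of a URL, which names the service to use. The Python sketch below assumes a small table of URL schemes and their conventional port numbers; the table, the function name, and the ftp.somecompany.com host are illustrative assumptions, and real browsers support more schemes than are shown here.

```python
from urllib.parse import urlsplit

# URL scheme -> (service name, conventional TCP port).
# An illustrative subset, not a complete list.
SERVICES = {
    "http": ("World Wide Web", 80),
    "ftp": ("File Transfer Protocol", 21),
    "gopher": ("Gopher", 70),
    "news": ("USENET news", 119),
    "telnet": ("Telnet remote login", 23),
    "mailto": ("Electronic mail", 25),
    "wais": ("WAIS", 210),
}


def describe(url):
    """Return (service name, host, port) implied by a URL."""
    parts = urlsplit(url)
    name, default_port = SERVICES.get(parts.scheme, ("unknown service", None))
    # A port written in the URL overrides the conventional one.
    port = parts.port or default_port
    # URLs such as news: and mailto: name no host, so hostname may be None.
    return name, parts.hostname, port
```

For example, describe("ftp://ftp.somecompany.com/pub/update.zip") reports the File Transfer Protocol on port 21, while describe("news:comp.infosystems.www") reports USENET news with no host, because the browser uses whatever news server it has been configured with.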

This is true regardless of whether your network is connected to the Internet. Because you need TCP/IP networking running in your corporate network in order to set up an Intranet, you can turn around and use this infrastructure to extend your Intranet. Doing so enables you to include a wide variety of other TCP/IP network services. Anonymous FTP, for example, need not be limited to the outside world; you can use it within your company just as well. Index internal data with WAIS, then make it available to your users. Use e-mail distribution lists within your company and your Web browser to read and send messages.

The upshot of this is that your Intranet need not be limited to the passive retrieval of HTML documents, or to extended use of the helper applications described in this book. Because Web browsers understand most of the common TCP/IP application protocols, you're free to extend the capabilities of your Intranet to include any of the TCP/IP services that might be useful to your company's or organization's mission. Further, you can do so without incurring the organizational overhead of teaching people to use each and every different service that might be useful.

We will be discussing many of these ideas throughout the book in greater detail. And you will see that utilizing TCP/IP on your Intranet will serve you well when you are ready to open the door to the Internet. For information about using Windows NT as a router, please see Chapter 28, "Connecting the Intranet and the Internet".

Overview of Microsoft ActiveX

If you work with Windows NT or Windows 95, you've probably heard about the new Internet push from Microsoft called ActiveX. Microsoft announced several technologies under the umbrella of ActiveX at the Professional Developers Conference in San Francisco in March 1996. One day of the conference was even broadcast to dozens of theaters around the U.S., where programmers could sit and watch the "movies" as Microsoft explained its vision and demonstrated many of the new features coming soon in its software packages.

Depending on how you look at it, ActiveX is either the entire Microsoft Internet strategy rolled into one word, or it is simply a new name for the idea of fitting OLE custom controls onto the Web. Some of the trade literature has been rather confusing on this point, but mostly it is the latter. The fact that Microsoft is coming out with several other Internet plans at the same time as ActiveX has led some to cast all of the technologies as "ActiveX". I don't know if this is what Microsoft's marketing wizards are trying to accomplish, but it seems that many of the announcements do not necessarily depend on ActiveX. Here is a quick breakdown of a few of the company's recent initiatives:

We will discuss some of these in more detail in Chapter 17, "Understanding ActiveX Technologies." Keep in mind that Netscape, along with several other software vendors, already has in place competing alternatives to many of these initiatives.

Summary

I've introduced the World Wide Web in this chapter, then put a spin on it that's applicable to the use of its technology in a corporate Intranet. We've introduced the following subjects, each of which is covered in detail in other chapters:

Chapters 2, 3, 4, and 5 will continue introducing the tools of the Intranet and Internet. By the end of Part I, we will have laid the groundwork for building your Intranet.