After you become connected to the Internet, one of the first problems you are likely to run across is how to deal with the many different types of files that are out there. What is a .zip (or .gif, .hqx, .ps, .html, .uue, .tar.Z) file, and what do you
do with it after you get it?
This was somewhat less of a problem in the early days of the Internet, because most of the things traveling around on the Net in those days were plain text files such as mail messages and Usenet news. When files were encoded, it was usually with
uuencode, which was built-in on UNIX systems, and often unavailable on other types of systems. The "uu" in uuencode stands for UNIX-to-UNIX, and by and large that's all it was good for. Uuencode uses a very simple encoding scheme to convert 8-bit
binary data into 7-bit ASCII (plain text) files. Due to an unfortunate design flaw, uuencoded files often become corrupted when they pass through a non-ASCII (for example, IBM mainframe) host. As a result, sending anything but plain text files was
difficult enough that most people simply didn't bother.
Times have changed, however, and these days everyone is passing around graphics, audio files, movies and more. If you want to get in on this good stuff too, you'll need to know what the different file types are and you'll have to get your hands on a
variety of file conversion utilities to deal with them.
A good source for this information (indeed, for any information) is the news.answers Usenet newsgroup, which contains periodic postings on almost any conceivable subject. Of particular interest to file conversion devotees are the FAQs (answers
to Frequently Asked Questions) from the comp.compression, alt.binaries.pictures.d, and alt.binaries.sounds.d newsgroups.
Because the Usenet FAQs are periodic postings, the one you are looking for may not currently be available at your site. If that's the case, or if you simply don't have access to Usenet, you can retrieve any of the Usenet FAQs via anonymous FTP from
rtfm.mit.edu, from the pub/usenet-by-hierarchy directory. World Wide Web devotees can also access the FAQs using the following Uniform Resource Locator (URL):
http://www.cis.ohio-state.edu/hypertext/faq/usenet/FAQ-List.html
Another good place to look for this sort of information is in the comp.sys Usenet newsgroup hierarchy, where you will find groups devoted to most computer platforms, including Apple ][, Amiga, Atari, DEC, HP, IBM, Macintosh, NeXT, Sun, and many more.
Now, back to the original question. What is a .zip (or .gif, or. . .) file, and what can you do with it? In a moment, I'll list all of the common types of audio, image, video, encoded, archived, and compressed files, and suggest software you can use to
handle them. But any such list is bound to be incomplete, so it's worth spending a few moments to consider what you can do when you find a file type that isn't listed here.
The first thing to do is to look in the FAQs for the comp.compression, alt.binaries.pictures.d, and alt.binaries.sounds.d Usenet newsgroups mentioned earlier. That's where you'll find the most complete listings of file types and their meanings, and it's
also where you'll find the most up-to-date listings of software used to view, play, or otherwise manipulate these files.
If the information you are looking for isn't in one of those FAQs, the next best approach is to ask for help on one of the relevant Usenet newsgroups. Almost certainly, somebody out there knows the answer to your question, and would be more than willing
to help.
This section has been included for those readers who don't know how to use anonymous FTP or who don't understand the cryptic notation I use for FTP retrieval information. If you already know this stuff, just skip this section, or read it over
anyway, just in case. If you don't learn anything new, you may still enjoy the warm fuzzy feeling you get when something confirms just how much you already know.
All of the public domain and shareware software described in this chapter may be retrieved over the Internet via anonymous FTP. In most cases, the packages are available from many different FTP sites, but to save space, I only list one. The retrieval
information given here is correct as I write this, but some of it will almost certainly be out of date by the time you read it. If you can't find one of these packages using the exact retrieval information given here, you might want to check for newer
versions of the same package at the same location. You can also use Archie to find alternative sources for any of this software.
The retrieval information is given in the following format:
sitename.domain:path/file
For example, the retrieval information for WHAM (a Windows audio player) is given as:
ftp.ncsa.uiuc.edu:/Web/Mosaic/Windows/viewers/wham131.zip
To retrieve this file, use FTP to connect to the machine named ftp.ncsa.uiuc.edu, log in as anonymous, give your e-mail address as your password, go to the /Web/Mosaic/Windows/viewers directory, and retrieve the file wham131.zip in binary mode.
If you happen to be using NcFTP, you can use the retrieval information exactly as given, by putting it on the NcFTP command line. For example, to retrieve WHAM using NcFTP on a UNIX host, you would type:
% ncftp ftp.ncsa.uiuc.edu:/Web/Mosaic/Windows/viewers/wham131.zip
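If you don't have NcFTP, the same notation is easy to unpack by hand or in a short script. The following Python sketch is only an illustration: the function names parse_retrieval and fetch are my own, the e-mail address is a placeholder, and the transfer itself relies on Python's standard ftplib module. It splits a site:path/file string into its parts and then follows the anonymous FTP steps described above.

```python
from ftplib import FTP
import posixpath


def parse_retrieval(spec):
    """Split 'site:path/file' into (host, directory, filename)."""
    host, sep, path = spec.partition(":")
    if not sep or not path:
        raise ValueError("expected site:path/file, got %r" % spec)
    directory, filename = posixpath.split(path)
    return host, directory or ".", filename


def fetch(spec, email="user@example.com"):
    """Retrieve one file by anonymous FTP (hypothetical helper)."""
    host, directory, filename = parse_retrieval(spec)
    with FTP(host) as ftp:                     # connect to the site
        ftp.login("anonymous", email)          # e-mail address as the password
        ftp.cwd(directory)                     # go to the directory
        with open(filename, "wb") as out:      # store the file locally
            ftp.retrbinary("RETR " + filename, out.write)  # binary mode


if __name__ == "__main__":
    spec = "ftp.ncsa.uiuc.edu:/Web/Mosaic/Windows/viewers/wham131.zip"
    print(parse_retrieval(spec))
```

Calling fetch() with a retrieval string would attempt the actual transfer; as noted above, any given site or path may well have moved by the time you try it.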
Audio data comes in many different formats. Some files are monaural, others are in stereo. Many different combinations of sampling rate, sample size, and number of channels are in use, and audio file formats differ considerably from one system to
another.
Sounds are converted into digital format by sampling them many times per second. The more samples that are taken per second, and the more bits used to store each sample, the better the results. Standard telephone-quality sound can be achieved with 8000
samples per second, using 8 bits to store each sample. Sound reproduction equivalent in quality to that of an audio CD requires 44,100 16-bit samples per second.
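To get a feel for what these numbers mean in terms of storage, here is a small Python sketch. The function name bytes_per_second is my own, the CD sampling rate is taken as 44,100 samples per second, and the stereo (two-channel) case is an assumption I have added for comparison. It simply multiplies sampling rate, sample size, and number of channels.

```python
def bytes_per_second(sample_rate, bits_per_sample, channels=1):
    """Uncompressed audio data rate in bytes per second."""
    return sample_rate * bits_per_sample * channels // 8


rates = [
    ("telephone quality (8000 Hz, 8-bit, mono)", bytes_per_second(8000, 8)),
    ("CD quality (44100 Hz, 16-bit, mono)", bytes_per_second(44100, 16)),
    # Two channels is an assumption added here for comparison.
    ("CD quality (44100 Hz, 16-bit, stereo)", bytes_per_second(44100, 16, 2)),
]

for label, rate in rates:
    kb_per_minute = rate * 60 / 1024
    print(f"{label}: {rate} bytes/s, about {kb_per_minute:.0f} KB per minute")
```

Even at telephone quality, a minute of uncompressed sound runs to several hundred kilobytes, which is worth keeping in mind before you start downloading.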
Fortunately, you don't have to worry about this very often. In most cases, your audio software can figure out what type of file you are feeding it, and make any necessary conversions to get it to play on your system.
Here is a list of some of the most common audio file types:
File Extension | Description
.au | NeXT or Sun audio file
.snd | Mac, NeXT, PC or Sun sound file
.aif, .aiff, .aifc | Apple or SGI sound file
.voc | Sound Blaster sound file
.wav | Microsoft (Windows) sound file
.iff, .mod, .nst | Amiga sound file
These days, the most common types of sound files you will find on the Internet are Sun audio files and Macintosh sound files.
The hardware you need to play audio files comes as standard equipment on all Macintoshes, most Sun workstations, and on many other personal computer and workstation brands. The hardware required to play sounds is not present, however, on many PCs. To
get reasonably good quality sound from a PC, you will need to invest in a sound card and external speakers.
Many software packages, both public domain and commercial, are available to convert and play audio files on various platforms. There are too many packages to list them all, so I will mention just a few for each of the major platforms. I'll try to pick
packages that are commonly used, easily available, and reasonably powerful, but don't be surprised if I've left your favorite off the list. If the package I've chosen doesn't suit you, read the Audio File Formats FAQ from the alt.binaries.sounds.d Usenet
newsgroup and pick one that is more to your taste. You should also consult the FAQ if you have an audio-capable system that I don't mention here, such as a DEC VAXstation 4000.
To get reasonable quality sound out of a PC, you will need to invest in a sound card and external speakers. If you want to listen to Sun or Macintosh sound files on a PC, you will need to find some audio conversion software, such as SOX, WHAM, or
wplany.
SOX (SOund eXchange) converts between most common audio file types. The PC version of SOX is available from:
ftp://ftp.cwi.nl/pub/audio/sox5dos.zip
WHAM (Waveform Hold and Modify) is a Windows 3.1 application for manipulating and playing sound files.
ftp://ftp.ncsa.uiuc.edu/Web/Mosaic/Windows/viewers/wham131.zip
Wplany is an audio player that works with the built-in PC speaker.
ftp://ftp.ncsa.uiuc.edu/PC/Mosaic/viewers/wplny09b.zip
Support for sound is built into all Macintoshes, and if you are running System 7, you can play Mac sound files by double-clicking on them in the Finder. For other types of sound files, however, you'll need to get a sound conversion utility such as
SoundMachine or SoundExtractor.
SoundMachine reads and plays most common sound formats.
ftp://sumex-aim.stanford.edu/info-mac/snd/util/sound-machine-21.hqx
SoundExtractor reads, plays, and converts between most common sound formats.
ftp://sumex-aim.stanford.edu/info-mac/snd/util/sound-extractor-131.hqx
Sun workstations include sound support, although the mechanism for playing Sun sound files varies from one release of SunOS or Solaris to another. On some systems, the built-in sound command is "play," on others it is "audioplay" or
"audiotool," and on a few Suns, you may have to resort to the rather ugly cat filename > /dev/audio.
Sun's Audio Tool is shown in Figure 3.1.
SOX (SOund eXchange) runs on most UNIX systems and converts between most common audio file types. SOX source code was posted in eight parts to the alt.sources Usenet newsgroup, and is available from:
ftp://ftp.sterling.com/usenet/alt.sources/volume93/Jul/930726.12.2
ftp://ftp.sterling.com/usenet/alt.sources/volume93/Jul/930726.13.2
ftp://ftp.sterling.com/usenet/alt.sources/volume93/Jul/930726.14.2
ftp://ftp.sterling.com/usenet/alt.sources/volume93/Jul/930726.15.2
ftp://ftp.sterling.com/usenet/alt.sources/volume93/Jul/930726.16.2
ftp://ftp.sterling.com/usenet/alt.sources/volume93/Jul/930726.17.2
ftp://ftp.sterling.com/usenet/alt.sources/volume93/Jul/930726.18.2
ftp://ftp.sterling.com/usenet/alt.sources/volume93/Jul/930726.19.2
ftp://ftp.sterling.com/usenet/alt.sources/volume93/Jul/930726.20.2
Table 3.2 shows some of the most common image file types.
File Extension | Description
.bmp | Windows or OS/2 bit-mapped picture file
.gif | CompuServe's Graphics Interchange Format
.im8, .img | Sun Image file
.jpe, .jpg, .jpeg | Joint Photographic Experts Group file
.pcx | PC Paintbrush file
.pict | Macintosh QuickDraw PICTure file
.eps, .ps | (Encapsulated) PostScript file
.tif, .tiff | Tagged Image File Format
.xbm, .xwd | X Window BitMap, X Window Dump files
.pbm, .ppm, .pgm | Portable BitMap, PixMap and GreyMap formats
These days, GIF is the most popular image format on the Internet, with JPEG running a close second. Both GIF and JPEG images can be viewed on many different platforms, and they have built-in compression to reduce the size of the image files. GIF images
can have a maximum of 256 colors.
While not as popular as GIFs, JPEG images are becoming more and more common on the Net. JPEG uses a lossy compression format, which means that small amounts of image data are sacrificed to offer much better compression. JPEG files are often much
smaller than comparable GIF images, with negligible loss of quality. JPEG images can have up to 16.7 million different colors.
The next most common image file type you are likely to find on the Internet is PostScript. PostScript is a special-purpose language used to describe printed pages in a device- and resolution-independent manner. PostScript files can contain image data,
as well as text and printer commands (such as a command to put a printer into duplex mode). Because of its complexity, only a handful of programs can successfully deal with PostScript. These include GhostScript, GhostView, Adobe Illustrator, and Adobe's
Display PostScript.
Both GIF and JPEG are raster image formats. Raster images are stored as a rectangular array of dots, or pixels (picture elements). Vector images, on the other hand, are stored as a sequence of drawing operations. Raster formats
(sometimes called paint files) are more common than vector formats (sometimes called draw files). Converting from a vector format to a raster is fairly easy, as is converting from one type of raster image to another. It's much harder to
convert a raster image into vector format.
Some images are monochrome, others are greyscale or 4-, 8-, 16- or 24-bit color. The nicest and largest files you are likely to come across are 24-bit color (sometimes called true color). More than likely you won't have the video hardware needed
to display 24-bit color images, so your image display software will convert them down to 8-bit color. Image files are big, and most of the image file types support one or more compression schemes to reduce the files to a more manageable (but still
large) size.
In monochrome images, each pixel is represented by a single bit of computer storage. That is just enough to represent two colors (for example, black and white). 4-, 8-, 16-, and 24-bit color images can contain 16, 256, about 65 thousand, and 16 million
colors, respectively. It takes just over 2300 bits to store a 48x48 dot monochrome picture. A full-screen (1152x900), full-color (8-bit) image on my Sun workstation takes up over 8 million bits (1MB) of storage. In compressed format, that image might fit
into a 200KB file, but when it comes time to display it, the image will be expanded to its full size in memory, which is why good video boards include 1MB or more of graphics memory.
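The storage figures above are simple multiplication, and you can check them (or work out the size of your own screen) with a few lines of Python. This is just a sketch; the function name image_bits is my own.

```python
def image_bits(width, height, bits_per_pixel):
    """Bits needed to hold an uncompressed raster image."""
    return width * height * bits_per_pixel


icon = image_bits(48, 48, 1)        # small monochrome picture
screen = image_bits(1152, 900, 8)   # full-screen 8-bit color image

print(icon, "bits for the 48x48 monochrome picture")
print(screen, "bits, or", screen // 8, "bytes, for the 1152x900 8-bit image")
print(f"that is about {screen / 8 / 2**20:.2f} MB before compression")
```

The same function shows why 24-bit images are so demanding: tripling the bits per pixel triples the memory needed to display the picture.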
To view images on your own system, you need graphics hardware (often called a frame buffer or video card), graphics memory, and a bit-mapped display. Generally speaking, the faster the graphics hardware, and the more graphics memory you
have, and the larger the monitor, the better for displaying images.
Many software packages, both public domain and commercial, are available to convert and display image files on the various platforms. As with audio files, there are too many image utilities to list them all, so I will just hit the highlights. For more
information, read the FAQ from the alt.binaries.pictures.d Usenet newsgroup. Also consult the FAQ if you have a system that I don't mention here, such as an Atari ST or an Amiga.
Graphic Workshop for DOS converts and displays most major image formats. Figure 3.2 shows an example of graphics conversion with Graphic Workshop.
Figure 3.2. File conversion in Graphic Workshop.
You can get Graphic Workshop from:
ftp://nic.switch.ch/mirror/msdos/graphics/grfwk70b.zip
PaintShop Pro for Windows also converts and displays most common image formats. The PaintShop Pro file conversion help screen is shown in Figure 3.3.
Figure 3.3. File conversion in PaintShop Pro.
PaintShop Pro is available from:
ftp://oak.oakland.edu/pub/msdos/windows3/pspro200.zip
GhostScript and Ghostview are PostScript document viewers for Windows:
ftp://ftp.ncsa.uiuc.edu/Web/Mosaic/Windows/viewers/gs261exe.zip
ftp://ftp.ncsa.uiuc.edu/Web/Mosaic/Windows/viewers/gsview10.zip
GIFConverter converts between most common image formats.
ftp://mac.archive.umich.edu/mac/graphics/graphicsutil/gifconverter2.37.hqx
JPEGView can display JPEG and GIF images.
ftp://ftp.ncsa.uiuc.edu/Web/Mosaic/Mac/Helpers/jpeg-view-33.hqx
XV is a very nice image viewer for X11. It views and converts between most common image types. The more recent (shareware) versions of XV also perform many image processing operations. See Figure 3.4 for an example of a screen showing file conversion
with XV. XV also runs on VMS systems. You can get the source code for XV from
ftp://ftp.cis.upenn.edu/pub/xv/xv-3.00a.tar.Z
Figure 3.4. File conversion with XV.
NetPBM (as well as its predecessor PBMPlus) can convert just about any image format into any other format, and it runs on many platforms (UNIX, VMS, DOS, OS/2, Amiga). It also can perform a variety of image-manipulation operations. However, it is
command-line oriented, and it doesn't actually display images; it just converts them. NetPBM is available from many locations, including:
sunsite.unc.edu:/pub/X11/contrib/netpbm-1mar1994.tar.gz
GhostScript is a PostScript previewer available on most platforms, including UNIX/X11.
ftp://ftp.ncsa.uiuc.edu/Web/Unix/viewers/ghostscript-2.6.tar.z
Animation files contain a series of still-frame images that are displayed in sequence on your computer. If the images are small, and your hardware is fast, the sequence of images will be displayed quickly enough that it appears as live action motion.
Here is a list of some common animation file types:
File Extension | Description
.mpeg, .mpg, .mpe | Moving Picture Experts Group movie file
.qt, .mov | Macintosh Quicktime movie
.movie | Silicon Graphics movie file
.dl, .gl | Animated picture file
.flc, .fli | Animated picture file
These days, the most common types of animation files you will find on the Internet are MPEG and Quicktime. Like JPEG, MPEG is a new format that is rapidly gaining in popularity and can be displayed on many different platforms.
Some animation files literally contain sequences of images, but others save space by storing only the differences between successive frames. Some animation files (such as Quicktime, for example) include audio information as well, so that the animation
can be accompanied by a soundtrack.
Because animations are essentially just sequences of images, they have hardware requirements similar to those of image files. As with displaying images, the faster your graphics hardware, and the more graphics memory you have, and the larger the
monitor, the better.
Many software packages, both public domain and commercial, are available to display animation files on the various platforms. As with audio and image files, there are numerous animation utilities, so I will just highlight some of the popular ones. For
more information, consult the FAQ from the alt.binaries.pictures.d Usenet newsgroup. Also consult the FAQ if you have a system that I don't mention here, such as an Atari ST or an Amiga.
Lview is a free GIF/JPEG viewer:
ftp://ftp.ncsa.uiuc.edu/Web/Mosaic/Windows/viewers/lview31.zip
MPEGPLAY is a shareware MPEG viewer for Windows:
ftp://ftp.ncsa.uiuc.edu/Web/Mosaic/Windows/viewers/mpegw32e.zip
MPEGXing is a free Windows MPEG viewer from Xing Technology:
ftp://S2K-ftp.cs.berkeley.edu/pub/multimedia/mpeg/Ports/xing/xing2.0.tar.Z
Simple Player is a Quicktime movie player from Apple. It is included with most versions of Quicktime. Simple Player is shown in Figure 3.5.
Figure 3.5. Playing movies with Simple Player.
Sparkle is an MPEG movie player for the Mac.
ftp://ftp.ncsa.uiuc.edu/Web/Mosaic/Mac/Helpers/sparkle-213.hqx
Mpeg_play is a free MPEG viewer for UNIX/X11 systems.
ftp://S2K-ftp.cs.berkeley.edu/pub/multimedia/mpeg/mpeg_play-2.0.tar.Z
Xanim plays most other animation files on UNIX/X11 systems, including .dl, .fli, .gif, and .iff. It is available from many locations, including:
ftp://syr.edu/software/X/xanim229.tar.Z
Why do people encode, compress, or archive files?
Binary files must be encoded into an ASCII format in order to transmit them through text-only media such as electronic mail or Usenet news.
Macintosh files must be encoded in order to preserve their resource forks and file-type and creator information when they are transferred to other systems. Similarly, some VMS files must be encoded to preserve file-type and record-structure information.
Encoding a binary file typically increases its size by 30 or 35 percent. To compensate for this, people usually compress files before encoding them. There's no point in compressing a file after it's encoded, of course, because that would undo the
benefits of the encoding: the result would no longer be a plain-text file.
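A short experiment makes the ordering clear. The Python sketch below is illustrative only: it uses the standard zlib module as a stand-in for compress or gzip, builds a uuencode-style body with binascii.b2a_uu (which encodes at most 45 bytes per call), and runs on made-up sample data. The helper names uu_body and is_ascii are my own.

```python
import binascii
import zlib


def uu_body(data):
    """Uuencode-style body: one ASCII line per 45 bytes of input."""
    return b"".join(
        binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)
    )


def is_ascii(data):
    """True if every byte fits in 7 bits."""
    return all(byte < 128 for byte in data)


# Made-up, highly repetitive sample data (9000 bytes).
raw = b"The quick brown fox jumps over the lazy dog.\n" * 200

encoded = uu_body(raw)
compressed_then_encoded = uu_body(zlib.compress(raw))
encoded_then_compressed = zlib.compress(encoded)

overhead = len(encoded) / len(raw) - 1
print(f"encoding alone adds {overhead:.0%} to the size")
print("compress, then encode:", len(compressed_then_encoded), "bytes,",
      "plain text" if is_ascii(compressed_then_encoded) else "binary")
print("encode, then compress:", len(encoded_then_compressed), "bytes,",
      "plain text" if is_ascii(encoded_then_compressed) else "binary")
```

Compressing first still leaves you with a file that can travel through the mail; compressing last leaves you with binary data again, which is exactly what the encoding was meant to avoid.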
Groups of related files are often archived (combined into one file). This ensures that none of the parts gets lost or mislaid in transit, and also simplifies the downloading process.
Most archiving utilities (other than the venerable UNIX tar and shar programs) also compress files. Because tar doesn't, most tar archives are compressed or gzipped as well, and you see them on the Net as .tar.Z or .tar.gz files.
By and large, audio, image, and video files are quite portable, and, given the right hardware, can be displayed on practically anything.
Archived, encoded, and compressed files, though, tend to be much less portable. It's not that they are intrinsically less portable, but that they are more often intended for use on a single type of machine. With the exception of Zip and occasionally Zoo
archives, these files are intended for use on a particular type of system, and unpacking them elsewhere is usually more trouble than it's worth.
Even if you manage to find software that can unpack a UNIX tar archive on a Macintosh, chances are it won't do you any good; most tar archives contain UNIX source code, which won't compile on a Mac. Similarly, unpacking a Macintosh Compact Pro archive
won't do you much good on a PC; more than likely, the archive contains Macintosh software, which won't run on your PC.
Table 3.4 lists the most common encoded, compressed and archive file types.
File Extension | Description | Capabilities
.arc | The old DOS archive standard | Archiving and compression
.cpt | Compact Pro for Mac | Archiving and compression
.gz | GNU gzip | Compression only
.hqx | Macintosh BinHex | Encodes Mac files preserving resource fork, file type and creator
.lha, .lzh | LHarc | Archiving and compression
.sea | Macintosh | Self-extracting archive
.shar | UNIX shell archive | Archiving only
.sit | StuffIt for Macintosh | Compression and archiving
.tar | UNIX Tape Archive | Archiving only
.uue, .uu | UNIX UUEncoding | Simple binary-to-ASCII encoding
.z | pack or gzip (see .gz) | Compression only
.Z | UNIX compress | Compression only
.zip | Phil Katz's pkzip | Archiving and compression
.zoo | Rahul Dhesi's zoo | Archiving and compression
Phil Katz's pkunzip is the original DOS unzip utility. Find it at
ftp://oak.oakland.edu/msdos/zip/pkz204g.exe
Rahul Dhesi's zoo creates and unarchives DOS zoo files. Find it at
ftp://oak.oakland.edu/msdos/zoo/zoo210.exe
The GNU zip package is available for DOS systems, and it is useful for uncompressing .gz files found on the Internet.
ftp://labrea.stanford.edu/pub/gnu/gzip-1.2.2.msdos.exe
Uuexe, a UNIX-compatible uuencode/uudecode utility for DOS, enables PC users to decode uuencoded files found on the Internet. Uuexe is available from:
ftp://oak.oakland.edu/pub/msdos/decode/uuexe525.zip
BinHex decodes Macintosh BinHex files. Find it at
ftp://sumex-aim.stanford.edu/info-mac/util/binhex-40.hqx
StuffIt Expander decompresses Macintosh StuffIt files. Find it at
ftp://sumex-aim.stanford.edu/info-mac/cmp/stuffit-expander-351.hqx
Info-zip's portable unzip program is available for the Macintosh, enabling Mac users to unarchive .zip files found on the Internet.
Unzip for the Macintosh is available from
ftp://sumex-aim.stanford.edu/info-mac/util/unzip*-*.hqx
Uulite, a UNIX-compatible uuencode/uudecode utility for the Macintosh, is available from
ftp://mac.archive.umich.edu/mac/util/compression/uulite1.6.cpt.hqx
Tar, compress, uuencode and uudecode are built into almost all UNIX systems.
Info-zip's portable unzip program is available for UNIX systems.
Figure 3.6 shows an example of the use of unzip.
Figure 3.6. Unpacking an archive with unzip.
The source code for unzip is available from
ftp://oak.oakland.edu/pub/misc/unix/unzip512.tar.Z
The GNU gunzip program uncompresses gzipped files (the related zcat command writes the uncompressed data to standard output). It is available from
ftp://labrea.stanford.edu/pub/gnu/gzip-1.2.2.tar
Mcvert is a UNIX utility that decodes Macintosh BinHex files. Find it at
ftp://sumex-aim.stanford.edu/info-mac/cmp/mcvert-215.shar
MIME (Multipurpose Internet Mail Extensions) is the Internet standard mechanism for sending nontext documents through electronic mail. MIME provides a general mechanism for attaching text and nontext files to mail messages, for specifying
various types of documents, and for automatically encoding, decoding, and viewing different document types.
For each document, MIME distinguishes between its "content type" (what it is) and its "encoding type" (how it is stored, or how it has been encoded for transmission via e-mail). Through its use of a standard set of content and
encoding types, MIME makes it possible for applications to automatically encode, decode, and display many different types of files.
For example, suppose I have an image of myself stored in the file earl.gif. If I have a properly configured, MIME-compliant mail program, I can send that file to you with a simple attach command, and let my mail program worry about whether (and how) to
encode the file for transmission through the mail.
If you also have a MIME-compliant mail program, then when you receive the file your mail software would read the MIME content type and encoding type from the message header and automatically decode the file. This is shown in Figure 3.7.
Figure 3.7. Receiving a MIME mail message.
MIME is increasingly being used in nonmail applications as well, such as Gopher+ and the World Wide Web, so that they too can automatically encode, decode, and view many different types of documents.
The mime.types file provides a mapping between the filename extensions and their corresponding MIME types. Here is a sample mime.types file:
# Sample mime.types file
application/postscript eps ps
audio/basic au snd
image/gif gif
image/tiff tiff tif
text/html html
text/plain txt c cc h
video/mpeg mpeg mpg mpe
video/quicktime qt mov
The mime.types file in this example says that filenames ending in .eps or .ps are PostScript files, filenames ending in .au or .snd are basic audio files, and so on.
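To make the mapping concrete, here is a rough Python sketch of how a program might read such a file and look up a filename. It is not the code any particular browser or mail program uses; the function names load_mime_types and guess_type are my own, and the sample data is the file shown above.

```python
SAMPLE_MIME_TYPES = """\
# Sample mime.types file
application/postscript eps ps
audio/basic au snd
image/gif gif
image/tiff tiff tif
text/html html
text/plain txt c cc h
video/mpeg mpeg mpg mpe
video/quicktime qt mov
"""


def load_mime_types(text):
    """Return a dict mapping each filename extension to its MIME type."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            table[ext.lower()] = mime_type
    return table


def guess_type(filename, table):
    """Look up the MIME type for a filename, or None if it is unknown."""
    _, dot, ext = filename.rpartition(".")
    return table.get(ext.lower()) if dot else None


table = load_mime_types(SAMPLE_MIME_TYPES)
for name in ("earl.gif", "paper.ps", "README"):
    print(name, "->", guess_type(name, table))
```

A file with no extension, or with one that isn't listed, simply has no known type, which is why it pays to keep your mime.types file complete.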
The companion file to mime.types is mailcap. The mailcap file specifies how each document type should be displayed. Here is a sample mailcap file:
# Sample mailcap file
image/*; xv %s
application/postscript; ghostview %s
video/mpeg; mpeg_play %s
video/*; xanim %s
The mailcap file in this example directs MIME-compliant software such as Lynx and NCSA Mosaic to display images with xv, to display PostScript files with ghostview, to display mpeg videos with mpeg_play, and to display all other types of videos with
xanim.
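The lookup described here can be sketched in a few lines of Python. This is a simplified illustration, not the logic of Lynx, Mosaic, or any real mail program: it assumes that the first matching entry wins, it ignores the optional fields that a full mailcap entry may carry after the command, and the function names load_mailcap and viewer_command are my own.

```python
import fnmatch
import shlex

SAMPLE_MAILCAP = """\
# Sample mailcap file
image/*; xv %s
application/postscript; ghostview %s
video/mpeg; mpeg_play %s
video/*; xanim %s
"""


def load_mailcap(text):
    """Return (type pattern, command template) pairs in file order."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, _, command = line.partition(";")
        entries.append((pattern.strip().lower(), command.strip()))
    return entries


def viewer_command(mime_type, filename, entries):
    """Build the viewer command for a file, or return None if no entry matches.

    The first matching entry wins, so specific types such as video/mpeg
    need to appear before wildcards such as video/*.
    """
    for pattern, command in entries:
        if fnmatch.fnmatchcase(mime_type.lower(), pattern):
            return command.replace("%s", shlex.quote(filename))
    return None


entries = load_mailcap(SAMPLE_MAILCAP)
print(viewer_command("image/gif", "earl.gif", entries))
print(viewer_command("video/mpeg", "clip.mpg", entries))
print(viewer_command("video/quicktime", "clip.qt", entries))
```

Note the ordering: because video/mpeg comes before video/*, MPEG movies go to mpeg_play and everything else in the video family falls through to xanim.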
If you set up the list of MIME types and viewers properly, and if you install all the right viewer applications, then whenever you use a World Wide Web or Gopher+ client or a MIME-compliant mail program, all the required file conversions will happen
automatically for you, like magic.
Netfind is a wonderful tool for finding people on the Internet, and it is an indispensable part of any Internet guru's toolkit.
Technically, Netfind is an Internet White Pages directory facility. That is, you can use Netfind to find people on the Internet in much the same manner as you use the White Pages in a telephone book to find people in your city.
Netfind was created by Mike Schwartz at the University of Colorado in Boulder, and while it's not the only tool you'll need to find people on the Net, it is one of the best.
The way that I look for people on the Internet is to look in the Phone Books-Other Institutions menu on the Notre Dame University Gopher (gopher.nd.edu). This menu lists all known Whois, CSO and X.500 directory services, and it can be used to determine
quickly whether a site maintains a searchable phone book of some description. If a site has a phone book, then using that phone book is probably the quickest and most reliable way to find people at that site. If there is no locally maintained phone book,
or if the person I am looking for isn't listed, then I'll try Netfind, usually via the Gopher-Netfind gateway on the InterNIC Gopher (ds.internic.net, port 4320).
Many organizations maintain electronic phone books or user directories of one form or another. Some sites have whois servers or CSO phone books, others use X.500 directory servers, and others use WAIS databases. Sometimes these directories are complete,
accurate and up-to-date, and sometimes they are not.
Most tools for finding people on the Internet rely on one or another of these directories. If you are looking for my e-mail address, for example, and you happen to know that the University of Saskatchewan maintains a whois server, and you know its
Internet address (whois.usask.ca), and the data on the server is up-to-date, and you have whois client software, then you can find me very quickly.
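If you don't have whois client software handy, the protocol is simple enough to try by hand: connect to TCP port 43, send the name followed by a carriage return and line feed, and read until the server closes the connection. The Python sketch below assumes exactly that; the function names build_query and whois are my own, the query string is only an example, and the server named in the text may no longer answer.

```python
import socket


def build_query(name):
    """A whois request is just the query text followed by CRLF."""
    return name.encode("ascii") + b"\r\n"


def whois(server, name, port=43, timeout=10.0):
    """Send one query to a whois server and return the raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_query(name))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:        # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")


if __name__ == "__main__":
    # Host name taken from the text; the query "earl" is only an example.
    print(whois("whois.usask.ca", "earl"))
```

Of course, this only helps if every condition in the paragraph above holds: the site has a server, you know its address, and its data is current.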
If a site doesn't have a directory server, or you can't find its address, or the information on the server is out-of-date, or you don't have the right tools, then most of the Internet White Pages tools will fail. Most, but not Netfind.
Given the name of a person on the Internet and a rough description of where that person is, Netfind attempts to locate telephone and e-mail information about them. It does this by going out over the network and interactively querying various hosts,
looking for people based on the search keys you provide.
Because it performs an interactive search, rather than using an existing directory of users, Netfind can often find people when other methods fail. If a site does not have a user directory, or if the directory is out-of-date or incomplete, Netfind may
be the only way to find people there.
On the other hand, as a consequence of its interactive nature, Netfind can be slow. Also, while Netfind may find an electronic mail address, it may not always find the best address. This is particularly true for people who have accounts on
many different systems.
To use Netfind, telnet to any of the following sites and log in as netfind:
North America:
bruno.cs.colorado.edu | (University of Colorado, Boulder)
ds.internic.net | (InterNIC Directory and DB Services, South Plainfield, New Jersey)
mudhoney.micro.umn.edu | (University of Minnesota, Minneapolis)
eis.calstate.edu | (California State University, Fullerton)
hto-e.usc.edu | (University of Southern California, Los Angeles)
netfind.ee.mcgill.ca | (McGill University, Montréal, Québec, Canada)
netfind.oc.com | (OpenConnect Systems, Dallas, Texas)
netfind.sjsu.edu | (San Jose State University, San Jose, California)
redmont.cis.uab.edu | (University of Alabama at Birmingham)
South America:
dino.conicit.ve | (National Council for Technology and Scientific Research, Venezuela)
malloco.ing.puc.cl | (Catholic University of Chile, Santiago)
netfind.if.usp.br | (University of São Paulo, São Paulo, Brazil)
Europe:
monolith.cc.ic.ac.uk | (Imperial College, London, England)
netfind.icm.edu.pl | (Warsaw University, Warsaw, Poland)
netfind.vslib.cz | (Liberec University of Technology, Czech Republic)
nic.uakom.sk | (Academy of Sciences, Banská Bystrica, Slovakia)
Australia:
archie.au | (AARNet, Melbourne, Australia)
netfind.anu.edu.au | (Australian National University, Canberra)
Asia:
krnic.net | (Korea Network Information Center, Taejon, Korea)
lincoln.technet.sg | (Technet Unit, Singapore)
When you connect to Netfind with telnet, you will see a screen similar to the following:
% telnet netfind.ee.mcgill.ca
Trying 132.206.64.2...
Connected to Excalibur.EE.McGill.CA.
Escape character is '^]'.
SunOS UNIX (excalibur)
login: netfind
password:
====================================================
Welcome to the Microelectronics and Computer Systems Laboratory Netfind Server.
====================================================
...
Top level choices:
1. Help
2. Search
3. Seed database lookup
4. Options
5. Quit (exit server)
>
After you are connected to Netfind, select option number 2 to perform a search. Then, when prompted, type the name of the person you are looking for, followed by a set of location keywords, such as
schwartz boulder colorado university
The name can be a first, last, or login name, but (in most cases) only one name should be specified. The keys describe where the person may be found, and they typically include information such as the name of the individual's institution or its location
(city, state, country) or both.
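To make the shape of a query concrete, here is a minimal Python sketch that splits a query line into a name and a list of location keys. It is only an illustration of the format described above, not Netfind's own parser; the function name parse_query is hypothetical. It also accepts a two-word name in double quotes, a form that a few of Netfind's data sources allow (this is covered later in the discussion of how Netfind works).

```python
import shlex


def parse_query(line):
    """Split a Netfind-style query into (name, location_keys).

    The first token is the name; everything after it is a location key.
    shlex.split keeps a double-quoted name such as "michael schwartz"
    together as a single token.
    """
    tokens = shlex.split(line)
    if len(tokens) < 2:
        raise ValueError("need a name followed by at least one location key")
    name = tokens[0]
    keys = [token.lower() for token in tokens[1:]]
    return name, keys
```

For example, parse_query("schwartz boulder colorado university") yields the name schwartz and the three keys boulder, colorado, and university.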
After you've entered a name and location keys, one of three things may happen:
Figure 3.8. A Netfind search with too many keys.
Figure 3.9. A Netfind search with too few keys.
Figure 3.10. A successful search with Netfind.
As the search proceeds, Netfind keeps you informed of its progress by displaying information about the hosts it is searching and which search techniques it is using.
The search results are often several pages long and can quickly scroll off your screen; if you have some sort of session capture utility, it's a good idea to turn it on before running Netfind.
To use the Gopher-to-Netfind gateway, give your Gopher client the information from one of the following Gopher bookmarks:
Type=1
Name=Netfind Searches
Path=netfind
Host=archie.au
Port=4320

Type=1
Name=Netfind - Network Wide E-mail Searches
Path=netfind
Host=gopher.vslib.cz
Port=4320

Type=1
Name=Netfind Gateway
Path=netfind
Host=ds.internic.net
Port=4320

Type=1
Name=Network Wide E-mail Searches
Path=netfind
Host=mudhoney.micro.umn.edu
Port=4324
When you select one of these bookmarks in Gopher, you will be presented with a short menu similar to that in Figure 3.11.
Figure 3.11. Searching Netfind with Gopher.
Select option number 1, Search Netfind for E-mail Addresses, to perform a search; then, when prompted, type the name of the person you are looking for, followed by a set of location keywords, such as
lindner minneapolis minnesota
The name can be a first, last, or login name, but (in most cases) only one name should be specified. The keys describe where the person may be found, and they typically include information such as the name of the individual's institution, its location
(city, state, country), or both.
After you've entered a name and location keys and you submit the search, Netfind presents you with a Gopher menu of all the domains that match the location keys in your query, as shown in Figure 3.12.
Figure 3.12. First phase of a Gopher/Netfind search.
You can now choose any of these domains for further consideration, as shown in Figure 3.13. After you select a domain, Netfind proceeds to search individual hosts within the domain you've selected. This phase of the search may take anywhere from a few
seconds to a few minutes.
Figure 3.13. Second phase of a Gopher/Netfind search.
When the search is complete, the results will be returned to you in the form of a Gopher document. Netfind includes information about the hosts it searched and the search techniques it used along with the search results, so this document may be several
pages long.
Currently, there is no gateway linking the World Wide Web to Netfind. Most World Wide Web browsers can, however, communicate directly with Gopher servers, so you can use the Gopher-to-Netfind gateway from your favorite World Wide Web browser.
To use Netfind from a Web browser, connect to one of the following Universal Resource Locators (URLs):
gopher://archie.au:4320/1netfind
gopher://gopher.vslib.cz:4320/1netfind
gopher://ds.internic.net:4320/1netfind
gopher://mudhoney.micro.umn.edu:4324/1netfind
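Each of these URLs can be derived mechanically from the corresponding Gopher bookmark given earlier: the host and port come first, followed by the one-character item type and then the path. A minimal Python sketch of that mapping follows; the function name bookmark_to_url is purely illustrative.

```python
def bookmark_to_url(item_type, path, host, port):
    """Build a gopher:// URL from the fields of a Gopher bookmark.

    The part after the host and port is the one-character item type
    (the bookmark's Type field) immediately followed by its Path field.
    """
    return "gopher://{}:{}/{}{}".format(host, port, item_type, path)
```

Calling bookmark_to_url("1", "netfind", "archie.au", 4320) reproduces the first URL in the list.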
When you connect to one of these URLs, you are presented with a short menu similar to the one shown in Figure 3.14.
Figure 3.14. Searching Netfind with Mosaic.
Select the first link, Search Netfind for E-mail Addresses, to perform a search; then, when prompted, type the name of the person you are looking for, followed by a set of location keywords, such as
berners-lee cern
The name can be a first, last, or login name, but (in most cases) only one name should be specified. The keys describe where the person may be found, and they typically include information such as the name of their institution, its location (city,
state, country), or both.
After you've entered a name and location keys, Netfind presents you with a World Wide Web document listing all the domains that match the location keys in your query, as shown in Figure 3.15.
Figure 3.15. First phase of a World Wide Web/Netfind search.
You can now choose any of these domains for further consideration, as shown in Figure 3.16. After you select a domain, Netfind proceeds to search individual hosts within the domain you've selected. This phase of the search may take anywhere from a few
seconds to a few minutes.
Figure 3.16. Second phase of a World Wide Web/Netfind search.
As with the Gopher interface to Netfind, the search results will be accompanied by information about the hosts searched and the search techniques used, so the resulting document may be several pages long.
A Netfind search proceeds in two phases.
The first phase is to narrow down the search using the location keys you provide. Netfind does this by looking up your keys in a site database, which contains information on Internet domains, the names of organizations, and their locations.
The site database is created by periodically scouring the Net, gathering and collating information from many different sources (including Internet domain name searches, UUCP (mail) maps, network traffic logs, Usenet message headers, and so on). Much of
this work can be performed automatically, but some of it is still done by hand by Netfind's creator, Mike Schwartz at the University of Colorado in Boulder. Every few weeks, all of the other sites that run their own Netfind servers FTP a copy of the site
database from the University of Colorado.
The location keys are used to narrow down the scope of the search to a small number of hosts, usually in a single Internet domain. Using the site database, Netfind tries to find up to three Internet domains that match your query and that are worthy
of further consideration.
The choice of search keys is crucial to the success of Netfind. Like Goldilocks with her porridge, Netfind will proceed with a search only if your keys are just right.
The keys describe where the person you are looking for may be found, and they typically include information such as the name of the individual's institution or its location (city, state, country) or both. If you know the institution's Internet domain
name (cs.colorado.edu, for example) you can use it in your keys by specifying it without the dots (cs colorado edu, for example). You cannot, however, use host names as keywords. If you know of a machine named brazil.cs.colorado.edu, then cs, colorado, and
edu might be good keys, but brazil would not.
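The rule about domain names and host names is easy to express in code. The following Python sketch is only an illustration; the function names are hypothetical, and host_to_keys assumes that the first label of a full host name is the machine name, as it is in the brazil.cs.colorado.edu example.

```python
def domain_to_keys(domain):
    """Turn a domain name into Netfind keys by dropping the dots."""
    return [label for label in domain.lower().split(".") if label]


def host_to_keys(hostname):
    """Turn a full host name into keys, leaving out the machine name.

    Assumes the first label is the host itself (an assumption of this
    sketch), so only the labels after it are kept.
    """
    return domain_to_keys(hostname)[1:]
```

Here domain_to_keys("cs.colorado.edu") and host_to_keys("brazil.cs.colorado.edu") both give the keys cs, colorado, and edu.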
Using more than one key in a query implies the logical AND of the keys. Specifying too many keys may result in a search that is too narrow, so that Netfind will not find any domains to search. If this happens, try specifying fewer keys.
Specifying too few keys may result in a search that is too broad. If this happens, there will be too many domains to search with a reasonable amount of time and effort, so Netfind will present you with a list of the domains that match your query, and ask
you either to try again with different search keys or to select a few domains from the list for further consideration.
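The three possible outcomes of the first phase can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Netfind's actual algorithm: the site database here is a made-up list of (domain, description) pairs, a key is assumed to match if it equals a word in either field, and the names SITE_DB, matching_domains, and first_phase are all hypothetical. The limit of three domains comes from the description above.

```python
MAX_DOMAINS = 3  # Netfind pursues at most three domains per search

# Toy site database for illustration only. Apart from cs.colorado.edu,
# which is mentioned in the text, the entries are invented placeholders.
SITE_DB = [
    ("cs.colorado.edu", "computer science university of colorado boulder"),
    ("physics.example.edu", "physics example university boulder colorado"),
    ("lab.example.org", "example research lab boulder colorado"),
    ("campus.example.edu", "example university denver colorado"),
]


def matching_domains(keys, site_db):
    """Return the domains whose record contains every key (logical AND)."""
    wanted = {key.lower() for key in keys}
    hits = []
    for domain, description in site_db:
        words = set(domain.split(".")) | set(description.split())
        if wanted <= words:  # every key must appear somewhere in the record
            hits.append(domain)
    return hits


def first_phase(keys, site_db, limit=MAX_DOMAINS):
    """Classify a query as too narrow, too broad, or ready to search."""
    hits = matching_domains(keys, site_db)
    if not hits:
        return "too-narrow", hits   # no domains: try fewer keys
    if len(hits) > limit:
        return "too-broad", hits    # too many: pick a few or add keys
    return "search", hits           # proceed to the second phase
```

With this toy data, the keys boulder colorado university select two domains and the search proceeds, the single key colorado matches all four records and is too broad, and adding an unrelated key such as zimbabwe leaves nothing to search.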
After the search has been restricted to just a few Internet domains, Netfind can proceed with the second phase of the search. At this point, it begins to interactively query various hosts in those domains, looking for the person whose name you've
supplied. It does this using a variety of techniques, including querying domain name servers (DNS), X.500 directory servers, whois databases, CSO nameservers, and Simple Mail Transfer Protocol (SMTP) servers, and using Finger.
The name that you supply can be a first, last, or login name, but in most cases only one name should be specified. A few of the data sources Netfind consults allow more than one name. If you do wish to specify two names, put them in quotes, like this:
"michael schwartz" boulder colorado university
In order to help Netfind do a better job of finding people at their sites, some organizations keep information about the White Pages (user directory) services they provide in their Domain Name System (DNS) database. At these sites, Netfind uses only the
directory services listed in the DNS database to carry out its search.
If DNS White Pages information is not available for a site, Netfind switches to a more exploratory search strategy, using DNS, SMTP and Finger. First, DNS is used to identify several well-known hosts within the domain, then SMTP is used to examine mail
forwarding relationships on these hosts, and Finger is used to examine login information. If the person you are looking for can be found using any of these techniques, the results will be returned to you.
Not all of the organizations listed in Netfind's site database are on the Internet. Often, organizations apply for a domain name months or years before their physical Internet connection is active. By keeping information on these sites in its database,
Netfind is able to find people at these locations as soon as their Internet connection is complete.
As a result, you can sometimes use Netfind to learn about organizations and the scope of their connection to the Internet. For example, one way (although certainly not the best way) to see if the Internet extends into a particular part of the world is
to search there with Netfind.
Suppose we use Netfind to search for "anyone zimbabwe," as shown in Figure 3.17.
Figure 3.17. Using Netfind to explore Zimbabwe.
This tells us that, at the time I tried this, Zimbabwe was not yet on the Internet. However, the University of Zimbabwe appears to be in the process of joining. When they do come online, their domain name will be uz.zw, and one of the first machines
that they are likely to connect is named zimbix.uz.zw.
This is very much the same situation that Hungary was in a few years ago, when Netfind discovered three registered domains in Hungary, none of which were connected to the Internet. Today Hungary is on the Internet, and when I tried it, Netfind
discovered over 100 Hungarian Internet domains.
You can use the same technique to see if a prospective employer is on the Net by, for example, searching for anyone xerox. Or, if you hate to be out of touch even when you are on vacation, you can use Netfind to discover Internet connections in the sun
by searching for anyone palm beach.
You can retrieve the current version of Netfind via anonymous FTP from the University of Colorado:
ftp://ftp.cs.colorado.edu/pub/cs/distribs/netfind/netfind4.6.tar.Z
ftp://ftp.cs.colorado.edu/pub/cs/distribs/netfind/seeddb.tar.Z
The first of these files is the Netfind source code. The second is the seed (or site) database. If you choose to run your own copy of Netfind, you will probably want to retrieve a new copy of the seed database every few weeks.
Finger provides a very simple way to find information about users on your own system or on a remote system anywhere on the Internet. It can tell you who's currently logged on to the system, and it can give more detailed information about particular
users.
To use Finger on a UNIX system, for example, you simply type finger followed by a username. To finger people on a remote system, append an "at" sign (@) and the remote host name to your query, as shown in Figure 3.18.
Figure 3.18. Fingering an individual.
To get a list of all the people who are currently logged on to a remote system, use Finger but leave out the username.
If, as in the following example, a query matches multiple users, many systems return only summary information about each user. To get the full information, you must enter a more specific query or use the -l option with Finger to ask for a long listing.
Examples of both of these uses of Finger are shown in Figure 3.19.
Figure 3.19. More Finger examples.
Finger is one of the earliest, and simplest, of network information protocols, in which a user or program on one system (the client) can request information from another system (the server). The name Finger is derived from the phone company's old
slogan, "Let your fingers do the walking."
By Internet standards, Finger is an old protocol. In fact, its use predates the creation of the Internet by about six years.
Much useful information about the Internet and its predecessor, the ARPAnet, is published in the form of RFCs (Request for Comments documents). The original Finger RFC (RFC 742) was published in December 1977, at which time only three sites used Finger
(SAIL, SRI, and MIT). In those days, the main concern people had about Finger was how to promote its use. These days, the main concern is network security.
Finger's security problems are summed up neatly by this quotation from RFC 1288:
A common procedure in many Internet break-in attempts is to compile a list of users on the target system, and then to systematically try to guess their passwords. As a result, one of the most elementary precautions that security-conscious system
administrators adopt is to disable the broadcasting of user lists to the Net.
As a result, many sites refuse to answer Finger requests at all. Others will answer requests about specific users but refuse the more general "list users" form of Finger query.
One feature of the Finger protocol that was useful in the past has now become something of a security liability. This is the capability of relaying Finger requests through an intermediate host. The purpose of this feature was to enable hosts on two
separate networks to finger each other through a gateway machine, but it has the unfortunate side effect of enabling crackers to disguise their trail by passing Finger requests through an intermediate system.
The Finger security issues mentioned so far are all of concern to people supplying Finger information, but there are security issues for people who ask for Finger information as well. For one thing, not all Finger clients filter out control characters,
making it possible for unsavory individuals to embed escape sequences in their Finger information, messing up the terminal settings of anyone who fingers them.
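A cautious Finger client can protect itself by filtering the reply before displaying it. The following Python sketch shows one way to do that, under the assumption that only printable ASCII, newlines, and tabs are safe to pass through; the function name sanitize_finger_output is hypothetical.

```python
def sanitize_finger_output(raw):
    """Make a Finger reply (bytes) safe to print on a terminal.

    Printable ASCII, newlines, and tabs pass through unchanged. Carriage
    returns are dropped, and every other character, including the ESC
    that begins a terminal escape sequence, is replaced with '?'.
    """
    text = raw.decode("ascii", errors="replace")
    cleaned = []
    for ch in text:
        if ch in "\n\t" or " " <= ch <= "~":
            cleaned.append(ch)
        elif ch == "\r":
            continue  # drop the CR of each CR/LF line ending
        else:
            cleaned.append("?")
    return "".join(cleaned)
```

An embedded sequence such as ESC [ 2 J (clear screen) then shows up as the harmless text ?[2J instead of being acted on by the terminal.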
There is another potential problem for users of the X Window System (X11). A few sites respond to Finger requests by using X11 to display graphical output on the system that initiated the Finger query. This is used, for example, when you
finger yourhostname:0@drink.csh.rit.edu
to draw a pretty picture of a soda machine on your screen. If, however, you set up your X11 security to enable arbitrary remote sites to display output on your screen, then users at those remote sites can do other things to your machine as well,
including such tricks as recording all of your keystrokes.
Like Telnet, FTP, Gopher and the World Wide Web, Finger uses a client/server protocol, in which the client and server pass messages to one another using the Internet's underlying TCP/IP (Transmission Control Protocol/Internet Protocol).
When you finger someone on another machine on the Internet, your Finger client software makes a TCP connection to port 79 of the remote host, sends a one-line request, and waits for a response. At the other end, the remote host's Internet daemon process
(inetd) waits for incoming TCP connections. When a request comes in on port 79, it starts up the Finger daemon (fingerd) to deal with it. On some systems, there is no inetd, and the Finger daemon does all the work itself.
The Finger daemon reads the request sent by your Finger client, processes it, and sends back the results, which your Finger client then displays on your screen.
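To make the server side concrete, here is a minimal sketch of a Finger daemon in Python. It is a toy written under stated assumptions, not the real fingerd: the user table USERS, the function answer_query, and the wording of the refusal messages are all invented for illustration. It also applies two of the precautions discussed earlier, refusing both the "list users" query and requests to relay a query to another host.

```python
import socketserver

# Hypothetical user table. A real fingerd would consult the system's
# account database and login records instead.
USERS = {
    "alice": "Login: alice    Name: Alice Example\n",
}


def answer_query(line, users, allow_listing=False):
    """Return the reply text for one request line."""
    query = line.strip()
    if query[:2].upper() == "/W":
        query = query[2:].strip()   # this sketch ignores the long-listing flag
    if "@" in query:
        return "Finger forwarding service denied\n"
    if not query:
        if not allow_listing:
            return "Listing of users is disabled at this site\n"
        return "".join(users[name] for name in sorted(users))
    return users.get(query, "{}: no such user\n".format(query))


class FingerHandler(socketserver.StreamRequestHandler):
    """Read one request line, send the answer, and close the connection."""

    def handle(self):
        line = self.rfile.readline(1024).decode("ascii", errors="replace")
        reply = answer_query(line, USERS)
        data = reply.replace("\n", "\r\n").encode("ascii", errors="replace")
        self.wfile.write(data)
```

To experiment, you could create a socketserver.TCPServer bound to ("127.0.0.1", 7979) with FingerHandler, call its serve_forever method, and then telnet to port 7979. An unprivileged port is suggested because binding to port 79 normally requires root privileges.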
To see how this works, let's consider a specific example, in which you try to finger me on one of our local UNIX hosts. When you type, for example, finger fogel@willow.usask.ca, your Finger client makes a connection to willow.usask.ca on TCP port 79,
sends the line fogel<cr><lf> (the word fogel, followed by a carriage return/linefeed combination), and waits for a response.
You can try this yourself, bypassing the Finger client completely, by using Telnet to send a message directly to a remote Finger daemon, as shown in Figure 3.20.
If your local system does not support the -l option with Finger, you may be able to get the same effect by prefacing your query with the two-character sequence /w. This too is shown in Figure 3.20.
Figure 3.20. Performing a Finger with Telnet.
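The same exchange can be written as a short program. The Python sketch below is a minimal Finger client built from the steps just described: connect to TCP port 79, send one line ending in a carriage return and linefeed, and read until the server closes the connection. The function names are hypothetical. RFC 1288 spells the long-listing prefix /W, so that is the form the sketch sends.

```python
import socket

FINGER_PORT = 79


def split_target(target):
    """Split 'user@host' into (user, host); the user part may be empty."""
    user, sep, host = target.rpartition("@")
    if not sep or not host:
        raise ValueError("expected user@host or @host")
    return user, host


def build_request(user, long_listing=False):
    """Return the one-line request, terminated by CR LF."""
    query = "/W " + user if long_listing else user
    return (query.strip() + "\r\n").encode("ascii")


def finger(target, long_listing=False, port=FINGER_PORT, timeout=10.0):
    """Send a Finger query and return the server's reply as text."""
    user, host = split_target(target)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(user, long_listing))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

Calling finger("fogel@willow.usask.ca") would send the line fogel followed by a carriage return and linefeed, exactly as in the Telnet session. Whether a given host still answers is another matter, since many sites no longer run a Finger daemon.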
Finger was intended to provide basic information about a system's users, but people soon learned they could do a lot more with it than that. You can use Finger to access all kinds of information, from current weather conditions to public encryption
keys.
Here are a few examples:
Fingering this | Gives you
aurora@xi.uleth.ca | Current northern lights information
forecast@typhoon.atmos.colostate.edu | Tropical storm info
dej@torfree.net | Toronto Freenet information
Scott Yanoff has produced two excellent online guides to Finger resources. Fingerinfo, shown in Figure 3.21, is a UNIX shell script that gives menu-driven access to some of the most interesting Finger sites.
Figure 3.21. The Fingerinfo main menu.
Yanoff's other guide, Special Internet Connections, shown in Figure 3.22, lists many different types of interesting Internet resources: Finger, Telnet, e-mail, Gopher, and the World Wide Web.
Figure 3.22. Special Internet Connections.
This list is available both as a plain text file (Finger yanoff@alpha2.csd.uwm.edu for instructions on how to get the latest version) and as a World Wide Web page:
http://www.uwm.edu/Mirror/inet.services.html
World Wide Web users may be interested to know that there is a gateway from World Wide Web to Finger. As shown in Figure 3.23, the gateway converts ordinary Finger output into hypertext by translating it into the Web's Hypertext Markup Language (HTML).
Figure 3.23. The gateway from World Wide Web to Finger.
Most systems that support Finger also give users some control over what Finger says about them.
Some UNIX systems support the chfn (change Finger information) command, which enables individual users to modify their own Finger information. The exact nature of the information you can modify with chfn varies from system to system. On DEC Ultrix
systems, for example, you can change the text that Finger displays for your full name, your office address, and your office and home phone numbers.
On most UNIX systems, if you create a file named .plan in your home directory, then the contents of that file will be displayed whenever someone fingers you, in addition to the usual Finger information. If you create a file named .project, the first
line of that file will be displayed as well.
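As a rough illustration of this behavior, the Python sketch below builds the extra text a Finger daemon might append for a user: the first line of .project, followed by the whole of .plan. The output labels (Project:, Plan:, No Plan.) and the function name are assumptions of this sketch, and real Finger daemons differ in the details.

```python
from pathlib import Path


def _read(path):
    """Return the file's text, or None if it cannot be read."""
    try:
        return path.read_text(errors="replace")
    except OSError:
        return None


def plan_and_project(home):
    """Build the .project and .plan portion of a Finger reply."""
    home = Path(home)
    lines = []
    project = _read(home / ".project")
    if project is not None:
        parts = project.splitlines()
        # Only the first line of .project is shown.
        lines.append("Project: " + (parts[0] if parts else ""))
    plan = _read(home / ".plan")
    if plan is not None:
        lines.append("Plan:")
        lines.append(plan.rstrip("\n"))  # the whole .plan file is shown
    else:
        lines.append("No Plan.")
    return "\n".join(lines) + "\n"
```

Because the sketch simply opens and reads .plan, it would also work if .plan were something other than an ordinary file, as long as the read eventually completes.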
On some systems, you can do even more exotic things and have Finger run a program every time you are fingered. Some people use this mechanism to provide up-to-the-minute information about themselves, while others use it to keep track of how often they
are being fingered.
On UNIX systems that allow it, this is done by turning your .plan file into a named pipe, setting up a program that writes to the pipe, and waiting for someone to finger you. Here's how to do it:
First create a pipe named .plan in your home directory:
% mknod .plan p
Then run a program that writes to the pipe:
% plan.sh &
Here's a sample plan.sh script, which will send a different .plan file to each of the first five people who finger you:
#!/bin/sh
#
# sample script that runs a command via finger
#
count=0
object="person has"
date=`/bin/date '+%r on %A %B %d, %Y'`
while ( test $count -lt 5 )
do
        count=`expr $count + 1`
        echo "$count $object fingered me since $date" > .plan
        object="people have"
done
Finally, wait for some people to finger you, as shown in Figure 3.24.
Figure 3.24. Running a program via Finger.
At most sites, the days of the large time-sharing central computer systems are gone, and this reduces the usefulness of Finger. While at one time you could track someone down by fingering one or two large hosts, you may now have to try hundreds or even
thousands of smaller machines. And because people often have accounts on more than one system, even when you find them, you may not find the system they use most frequently.
Netfind offers one solution to this problem. GNU Finger offers another.
The GNU (GNU's Not UNIX) version of Finger is a drop-in replacement for the standard UNIX Finger and fingerd programs that provides a site-wide (rather than system-wide) Finger service.
At MIT, for example, you can finger gnu.ai.mit.edu to find people on any of fourteen machines in the MIT AI lab.
GNU Finger offers several other extensions to the standard Finger protocol as well, including the ability to include "faces" (bit-mapped images) of users in the Finger output, and the ability to ask for general site information, such as a list
of machines that are currently idle.
Users can modify the information GNU Finger displays about them by creating an executable script named .fingerrc in their home directory. If you have a .fingerrc file, GNU Finger filters the normal Finger output through this script before passing it on
to the requestor, enabling you to modify, or completely replace, the usual information.
As mentioned earlier, if you do not have Finger client software, you can use Telnet to send a message directly to a remote Finger daemon instead. To finger user@host.domain, telnet to host.domain, port 79, and type the name of the user you want to
finger, followed by a carriage return and linefeed.
Both the Finger client and server are included with most UNIX systems. GNU Finger is available from many sources, including by anonymous FTP from
ftp://labrea.stanford.edu/pub/gnu/finger-1.37.tar.gz
Finger client software for the Macintosh is available from many locations, including
ftp://archive.umich.edu/mac/util/network/finger1.37.sit.hqx
A Finger client is included as part of Stanford's MacIP package and as part of the Mailstrom mail program.
Finger client software is included with several commercial TCP/IP packages for the PC, including Sun's PC-NFS.
A Windows Winsock Finger daemon is available from
ftp://sunsite.unc.edu/pub/micro/pc-stuff/ms-windows/winsock/apps/fingerd.zip
Ping is one of the most basic Internet tools. It checks to see whether another machine on the network is reachable from your own host by sending it a message and waiting for a reply.
The original BSD (Berkeley Software Distribution) version of Ping is quite terse in its output, as can be seen in the examples shown in Figure 3.25.
Later, and more sophisticated, versions of Ping keep trying until you ask them to stop, and they return more information, such as the packet size, round-trip travel time, and the order in which the return packets arrive (icmp_seq), as shown in Figure
3.26.
Ping uses the Internet Control Message Protocol (ICMP) to send messages across the network.
Most of the Internet tools (Telnet, FTP, Finger, Gopher, World Wide Web, and so on) communicate across the network using the Internet's Transmission Control Protocol (TCP). TCP is a complex protocol that breaks messages up into packets, sends them over
the network, and puts them back together again at the other end. During transmission, packets may be lost, duplicated, or delayed, and TCP also includes mechanisms for detecting and dealing with these conditions.
Some applications, however, don't require all the complexity of TCP, so the Internet supports two other, simpler protocols: User Datagram Protocol (UDP) and Internet Control Message Protocol (ICMP).
Ping requests, for example, always fit into a single packet, and it is no big deal if that packet is lost, because the originating application can always try again a few seconds later if no response is received the first time.
Ping checks to see if another host is reachable by sending it a series of ICMP ECHO_REQUEST messages, and listening for the responses. All Internet hosts are required to respond to ICMP ECHO_REQUEST messages, so if you can't get through to a host with
Ping, then it is pretty much guaranteed that none of the other Internet applications (such as Telnet) will work either.
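To show how little there is to an ECHO_REQUEST, the following Python sketch builds one by hand: an 8-byte header (type 8, code 0, checksum, identifier, sequence number) followed by an arbitrary payload. The function names are hypothetical, and the sketch stops short of sending the packet, because that requires a raw socket and, on most systems, administrator privileges.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type of an echo request; its code is 0


def internet_checksum(data):
    """Return the 16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:   # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident, seq, payload=b"ping"):
    """Build an ICMP ECHO_REQUEST message (ICMP header plus payload)."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, ident, seq)
    return header + payload
```

The sequence number passed as seq is the value that the more talkative versions of Ping report as icmp_seq. A receiver can validate a message by running the same checksum over the whole thing, which gives zero for an undamaged packet.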
Ping is built into UNIX systems.
On DOS systems, you can get a packet driver version of Ping from:
ftp://omnigate.clarkson.edu/pub/cutcp/v2.2-E/ping.exe
Windows users can get a Winsock version of Ping from:
ftp://winftp.cica.indiana.edu/pub/pc/win3/winsock/ws_ping.zip
Finally, a Macintosh version of Ping is available from:
ftp://ftp.germany.eu.net/pub/comp/macintosh/comm/ping-11.hqx.gz