HTML 4.0 Sourcebook
(Publisher: John Wiley & Sons, Inc.)
Author(s): Ian S. Graham
ISBN: 0471257249
Publication Date: 04/01/98
in a URL, you must encode it as:
ian%25euler
where %25 is the encoding for the percent character. If you do not do this, a program parsing the URL will try to interpret %eu as a character encoding. Conversely, you must not encode a special character if you require its special meaning. For example, the string
dir/subdir
indicates that subdir is a subdirectory of dir, while:
dir%2Fsubdir
is just the character string dir/subdir (%2F is the encoding for the slash).
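The two cases above can be reproduced with a short sketch. It assumes Python's standard urllib.parse module, which is not something the text itself uses; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote

# safe="" asks quote() to encode every special character, including "/".
encoded_percent = quote("ian%euler", safe="")
encoded_slash = quote("dir/subdir", safe="")

# With the default safe="/", the slash is left alone and keeps its
# hierarchical (directory) meaning.
kept_slash = quote("dir/subdir")

# Decoding turns %2F back into a plain slash character.
decoded = unquote("dir%2Fsubdir")

print(encoded_percent)  # ian%25euler
print(encoded_slash)    # dir%2Fsubdir
print(kept_slash)       # dir/subdir
print(decoded)          # dir/subdir
```

Whether to pass safe="" depends on whether the slash is meant as a separator or as ordinary data, which is the distinction drawn above.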
The most common special characters are:
- The percent sign (%) This is the escape character for character encodings and is special in all URLs.
- The hash (#) This separates the URL of a resource from the fragment identifier for that resource. A fragment identifier references a particular location within a resource. This character is special in all URLs.
- The slash (/) This indicates hierarchical structures, such as directories.
- The question mark (?) This indicates a query string; everything after the question mark is query information to be passed to the server. This character is special only in Gopher, WAIS, and HTTP URLs.
Other characters that are special in certain URL schemes are the colon (:), semicolon (;), at (@), equals (=), and ampersand (&). These special cases will be noted as they arise.
TIP: URL Encoding Rule
Encode any character that might be special if you do not want to use its special meaning.
Examples of Uniform Resource Locators
Figure 8.1 illustrates three typical URLs, showing the different parts and the associated meanings. The parts are:
- Protocol Specifier. The first string in the URL, of the general form string:, specifies the Internet protocol to use in accessing the resource; the examples here are for the HTTP (http:), Internet mail (mailto:), and telnet (telnet:) protocols. The protocol is indicated by the name before the colon. The string specifying the protocol can contain only lowercase letters (a-z). URL schemes are defined for most Internet protocols. The details of the different schemes are presented later.
Figure 8.1 Three example URLs (here http, telnet, and mailto URLs), showing the main components. Not all URLs follow these models, as discussed in the text.
- Domain Name and Port Number (Address). The second part of a URL is usually the Internet address of a server; this information lies between the double forward slash (//) and a terminating forward slash (/). This region contains the domain name of the server and, optionally, the port number to contact, the general form being //dom.name.edu:port/. Omitting the port number (and the colon before it) implies the default port for the given protocol. Numeric IP addresses can be used instead of domain names, for example:
//132.206.9.22:1234/
//128.100.100.1/
- You can sometimes (depending on protocol) include username and password information, if this is needed to access a resource. The form is then:
//username:password@www.address.edu:port/
- Note that the password can be read by anyone who sees the URL, so this is not a secure way to allow access to a resource.
- Some URL schemes do not require Internet domain names. This is the case for protocols that do not depend on a specific server, such as those for sending electronic mail (mailto) or accessing USENET newsgroup (news) articles. In these cases, users have, and must somehow specify, their own default mail or news server. The domain names for these servers are usually set using a browser's configuration menus.
- Resource Location. The forward slash after the host and port number field indicates the end of the address field and the beginning of the information required to locate the resource on the server. This field varies considerably, depending on the service being accessed. Often, it resembles a directory path leading down to a file, as in the http URL in Figure 8.1. In this context, the forward slash character (/) defines a change in hierarchy or directory and is used in place of all system-dependent symbols defining such relationships, such as the backslash (\) on DOS, OS/2, or Windows computers, the colon (:) on Macintoshes, and the [dir.subdir.subsubdir] expressions on VAX/VMS systems.
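As a rough illustration of the parts listed above, the following sketch splits a URL into its components. It assumes Python's urllib.parse.urlsplit; the path /dir/subdir/file.html is a made-up example, and the other values are the placeholders used in the text.

```python
from urllib.parse import urlsplit

# Hypothetical URL combining the forms shown in the text.
parts = urlsplit(
    "http://username:password@www.address.edu:1234/dir/subdir/file.html"
)

print(parts.scheme)    # http  (protocol specifier, without the colon)
print(parts.username)  # username
print(parts.password)  # password  (visible to anyone who sees the URL)
print(parts.hostname)  # www.address.edu
print(parts.port)      # 1234
print(parts.path)      # /dir/subdir/file.html  (resource location)

# When the port is omitted, the parser reports None; the client is then
# expected to fall back to the protocol's default port.
no_port = urlsplit("http://128.100.100.1/")
print(no_port.hostname)  # 128.100.100.1
print(no_port.port)      # None
```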
Query Strings in URLs
The URL syntax allows you to encode query strings to be passed to the designated Internet resource, in situations (typically Gopher, HTTP, or WAIS) that support queries. This is accomplished by appending the query strings to the URL, separated from it by a question mark. Two examples are (the query string is the part after the question mark):
gopher://gopher.somewhere.edu/77/searches.phone?bob+steve
http://www.somewhere.edu/cgi-bin/srch-data?archie+database
The question mark is a special character in HTTP, Gopher, and WAIS URLs and must be encoded if you do not want to indicate a query string.
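A small sketch of how a parser treats the question mark, assuming Python's urllib.parse; the file name what?.html is hypothetical and is used only to show an encoded question mark.

```python
from urllib.parse import quote, urlsplit

# An unencoded "?" ends the resource location and starts the query string.
parts = urlsplit("http://www.somewhere.edu/cgi-bin/srch-data?archie+database")
print(parts.path)   # /cgi-bin/srch-data
print(parts.query)  # archie+database

# A question mark that is part of a name must be encoded as %3F,
# otherwise everything after it would be read as a query string.
literal = urlsplit("http://www.somewhere.edu/" + quote("what?.html"))
print(literal.path)   # /what%3F.html
print(literal.query)  # (empty string)
```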
Encoding of Query Strings
Of course query strings, since they are part of a URL, must also be encoded. However, query string data take an additional level of encoding, over and above the encodings discussed to this point. By way of illustration, space characters in a query string are encoded as plus (+) signs and not via hex character encodings (as illustrated in the previous two examples). Query string encodings are specific to the protocol being used, and also to the mechanism used to gather data from the user (ISINDEX, FORM, or ISMAP active image). These mechanisms are discussed in the section on http URLs.
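The plus-for-space convention can be sketched as follows. This assumes Python's urllib.parse, whose quote_plus and unquote_plus functions follow the form-style convention; as noted above, the exact encoding depends on the protocol and on how the data were gathered, so treat this as one illustration rather than a general rule.

```python
from urllib.parse import quote, quote_plus, unquote_plus

# Query-string style: spaces become plus signs.
query_style = quote_plus("bob steve")

# Ordinary URL encoding: spaces become the hex encoding %20.
plain_style = quote("bob steve")

# A literal plus sign in the data must itself be encoded (%2B),
# so that it is not mistaken for an encoded space.
literal_plus = quote_plus("a+b c")

# Decoding a query string turns plus signs back into spaces.
decoded_query = unquote_plus("archie+database")

print(query_style)    # bob+steve
print(plain_style)    # bob%20steve
print(literal_plus)   # a%2Bb+c
print(decoded_query)  # archie database
```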
Some Simple URL Examples
The following examples illustrate basic URL structure:
http://www.w3.org/pub/WWW/Addressing/URL/Overview.html
- References the file Overview.html in the directory /pub/WWW/Addressing/URL/ obtainable from the server www.w3.org using the HTTP protocol at the default port number (80 for HTTP).
gopher://gumby.brain.headache.edu:151/7fonebook.txt
- References the searchable index fonebook.txt from the Gopher server at gumby.brain.headache.edu running on port number 151.
news:alt.rec.motorcycle
- References the newsgroup alt.rec.motorcycle, to be accessed from a news server. The identity of a news server must be specified elsewhere. With Web browsers, this is usually accomplished via the browser's configuration menus.
mailto:ross@physics.mcg.ca
- References sending an electronic mail message to the indicated e-mail address.
In general, a browser sends mail to a designated mail server, which then forwards the mail to the final destination. The name of this mail server must be specified by the user, usually via the browser's configuration menus.
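The difference between URLs that name a server and those that do not can be seen by parsing the examples above. This is a sketch assuming Python's urllib.parse.urlsplit; a browser's own parser may behave differently.

```python
from urllib.parse import urlsplit

gopher = urlsplit("gopher://gumby.brain.headache.edu:151/7fonebook.txt")
news = urlsplit("news:alt.rec.motorcycle")
mail = urlsplit("mailto:ross@physics.mcg.ca")

# The gopher URL names its server and port explicitly.
print(gopher.hostname)  # gumby.brain.headache.edu
print(gopher.port)      # 151
print(gopher.path)      # /7fonebook.txt

# The news and mailto URLs carry no server address at all; the server
# has to come from elsewhere, such as the browser's configuration.
print(news.scheme, repr(news.netloc), news.path)
# news '' alt.rec.motorcycle
print(mail.scheme, repr(mail.netloc), mail.path)
# mailto '' ross@physics.mcg.ca
```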