HTML 4.0 Sourcebook
Note that the URL syntax for Gopher queries uses a plus (+) sign to separate different search strings. Therefore, if you want to include a literal plus sign within a string, it must be encoded (the encoding for a plus sign is %2B).
Client Construction of Query Strings
The Web browser inserts the plus-sign separators and converts any plus signs within the search strings into their encoded form. When a user accesses a Gopher search from a Web browser, he or she is prompted for search strings. These are generally entered in a text box, with space characters separating the different strings. When the search information is submitted, the search strings are appended, with appropriate encodings, to the URL. The client software is responsible for replacing space characters with plus signs and for encoding any characters in the user's search string that might otherwise be misinterpreted.
The Gopher protocol supports additional features not discussed here. Please see the references at the end of this chapter for additional information.
HTTP URLs
Http URLs designate files, directories, or server-side programs accessible using the HTTP protocol. The general form is
<http://int.dom.nam:port/resource>
where the port number is optional (the default value is 80) and resource specifies the resource. Resources are usually (but not always) files (text or programs) or directories.
A directory is indicated by terminating the directory name with a forward slash, as in:
<http://www.utoronto.ca/webdocs/HTMLdocs/>
The following reference to this directory is an error, since it implies a reference to a file and not a directory:
<http://www.utoronto.ca/webdocs/HTMLdocs>
Most HTTP servers can detect this type of error and realize that the user wants to view the directory listing.
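As a rough illustration of the general form of an http URL, the following Python sketch uses the standard library's urllib.parse.urlsplit to extract the host, the port (falling back to 80 when none is given), and the resource path. It also shows the kind of trailing-slash check a server might apply. The function names and the names_directory flag are hypothetical; a real server would decide whether a path names a directory by consulting its own file system.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_HTTP_PORT = 80  # used when the URL omits ":port"


def describe_http_url(url):
    """Return (host, port, resource) for an http URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("not an http URL: " + url)
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    resource = parts.path or "/"  # an empty path means the site root
    return parts.hostname, port, resource


def directory_redirect(url, names_directory):
    """Return the corrected URL for a directory missing its trailing slash.

    names_directory is a hypothetical flag that a server would determine
    from its own file system. Returns None when no redirect is needed.
    """
    parts = urlsplit(url)
    path = parts.path or "/"  # the site root needs no added slash
    if names_directory and not path.endswith("/"):
        return urlunsplit(parts._replace(path=path + "/"))
    return None
```

With names_directory set to True, http://www.utoronto.ca/webdocs/HTMLdocs comes back with a slash appended, while the bare site root http://www.utoronto.ca yields None, consistent with the rule that the root's trailing slash may be omitted.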
When a directory is requested without its trailing slash, the server returns a server redirect HTTP response header, which contains the correct URL (with the trailing slash) and instructs the browser to try this URL instead. Server redirects are discussed in Chapter 9. Note, however, that you can omit the trailing slash when referencing the root of a Web site. Thus, the following two URLs are equivalent, and both are correct:
<http://www.utoronto.ca/>
<http://www.utoronto.ca>
Special Characters in HTTP URLs
The forward slash (/), semicolon (;), question mark (?), and hash (#) are special characters in the path and query string portions of an http URL. The slash denotes a change in hierarchy (such as a directory), while the question mark ends the resource location path and indicates the start of a query string. The hash denotes the start of a fragment identifier. The semicolon is reserved for future use, so you should encode it (as %3B) wherever you intend a literal semicolon.
URL Encoding of Query Strings
Http URLs can contain query data to be passed to the server. These data are appended to the URL, separated from it by a question mark. Besides the character encodings required within URLs, query strings undergo additional levels of encoding to preserve information about the structure of the query data. This is necessary because certain characters in a query string are assigned special meanings as part of the query: for example, the plus character (+) is used to encode spaces, as noted earlier. The details of this encoding depend both on how the user enters the data and on how the data are sent to the server. Document authors do not usually have to worry about the encoding phase; browsers take ISINDEX or FORM data and do the encoding automatically.
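To make the roles of the special characters /, ?, and # concrete, here is a minimal hand-written Python sketch (the function name split_resource is hypothetical) that splits the part of an http URL following the host and port. It cuts off the fragment identifier at the first hash, then the query string at the first question mark, and finally breaks the remaining path into hierarchy levels at each slash. It leaves the reserved semicolon alone.

```python
def split_resource(resource):
    """Split '/path?query#fragment' into (path segments, query, fragment).

    query and fragment are None when the URL has no '?' or '#'.
    A trailing empty segment means the path ended in '/' (a directory).
    """
    fragment = None
    if "#" in resource:
        # The hash starts the fragment identifier.
        resource, fragment = resource.split("#", 1)
    query = None
    if "?" in resource:
        # The question mark ends the path and starts the query string.
        resource, query = resource.split("?", 1)
    # Each slash marks a change in hierarchy; drop the leading one.
    path = resource[1:] if resource.startswith("/") else resource
    segments = path.split("/")
    return segments, query, fragment
```

For example, split_resource("/cgi-bin/foo?arg1+arg2+arg3") separates the path segments cgi-bin and foo from the query string arg1+arg2+arg3.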
A gateway program author, however, must explicitly decode the query data to recover the original information, and so must understand the encoding in order to reverse it. The following is a brief review of the encoding steps; see Chapter 6 (the discussion of FORM elements) and Chapter 10 for more details.
URL Encoding for ISINDEX and FORM Data
The following steps outline the query string encoding process, elaborated to illustrate the important points. If the data are from an ISINDEX query, these steps apply to the encoding of the entire query input string; if the data are from FORM-based input, the encoding steps apply to each name and value string from the form's user-input elements.
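The net effect of this per-string encoding can be sketched in Python as follows. This is not a transcription of the numbered steps; it simply produces the end result stated in the next paragraph: ASCII letters, digits, and the five characters _ - . * @ pass through unchanged, spaces become plus signs, and every other byte is written as a percent sign followed by two hexadecimal digits. The function name form_encode and the use of UTF-8 for non-ASCII text are assumptions for illustration.

```python
import string

# Characters passed through unchanged: ASCII letters, digits, and the
# five punctuation characters _ - . * @
UNENCODED = set(string.ascii_letters + string.digits + "_-.*@")


def form_encode(text):
    """URL-encode one ISINDEX string or one FORM name or value string."""
    out = []
    # Assumption: non-ASCII text is first converted to UTF-8 bytes.
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch == " ":
            out.append("+")  # spaces become plus signs
        elif ch in UNENCODED:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))  # e.g. '+' -> '%2B'
    return "".join(out)
```

For example, form_encode("C++ tutorial") gives C%2B%2B+tutorial, so the literal plus signs cannot be confused with the plus sign that stands for the space.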
At this point, all ASCII punctuation characters are encoded except for these five: _ - . * @
If the data are from an ISINDEX query, the encoding is complete. If they are from a FORM, only the individual name and value strings from each FORM input element have been encoded, as described in steps 1 through 7. These strings are then combined as follows: each name is joined to its value by an equals sign (=) to form a name=value string, and these name=value strings are joined into a single query string, separated by ampersands (&).
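In Python, the standard library's urllib.parse.urlencode carries out both phases for FORM data: it encodes each name and value string, joins each name to its value with an equals sign, and joins the resulting pairs with ampersands. In the short sketch below, the form fields and the program URL are hypothetical, and passing safe="*@" keeps those two characters unencoded. One difference remains: in Python 3.7 and later, urlencode also leaves the tilde (~) unencoded, whereas the encoding described here would convert it to %7E.

```python
from urllib.parse import urlencode

# Hypothetical FORM data: (name, value) pairs in document order.
form_fields = [
    ("name", "Jane Doe"),
    ("query", "cats & dogs=pets"),
    ("email", "user@some.site.edu"),
]

# Each name and value is encoded (spaces -> '+', '&' -> '%26',
# '=' -> '%3D'); safe="*@" leaves '*' and '@' unencoded. The encoded
# strings are joined as name=value, and the pairs are joined by '&'.
query_string = urlencode(form_fields, safe="*@")
url = "http://some.site.edu/cgi-bin/program?" + query_string
print(url)
```

The ampersand and equals sign inside the query value are encoded, so the only unencoded ones left in query_string are the separators.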
Note that the first encoding phase (steps 1 through 7) encoded all ampersands in the name and value strings, so the only unencoded ampersands in the query string are those that separate name/value pairs.
Query-String Encoding MIME Type
Query string data encoded according to this algorithm are said to be URL-encoded. This encoding mechanism is in fact assigned its own MIME type, namely:
Content-type: application/x-www-form-urlencoded
Because the first encoding phase also encodes every equals sign within the name and value strings, unencoded equals signs appear only in FORM data, where they join names to values. You can therefore tell whether the data are from a FORM or an ISINDEX query just by checking for unencoded equals signs. For example, the first of the following two URLs is from an ISINDEX query, the second from a FORM (in each, the query string is the portion after the question mark):
<http://some.site.edu/cgi-bin/foo?arg1+arg2+arg3>
<http://some.site.edu/cgi-bin/program?name1=value1&name2=value2>
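On the gateway side the process runs in reverse. The following Python sketch (the function name decode_query is hypothetical) applies the equals-sign test: a query string with no unencoded equals sign is treated as an ISINDEX query and split at plus signs into search strings, while anything else is treated as FORM data and split at ampersands and equals signs. Each piece is then decoded with the standard library's urllib.parse.unquote_plus, which turns + back into a space and each %XX back into the original character (assuming UTF-8 for non-ASCII bytes, which is unquote_plus's default).

```python
from urllib.parse import unquote_plus


def decode_query(query):
    """Decode a URL-encoded query string.

    Returns ("isindex", [search strings]) or ("form", [(name, value), ...]).
    """
    if "=" not in query:
        # ISINDEX: unencoded '+' signs separate the search strings.
        words = [unquote_plus(word) for word in query.split("+") if word]
        return "isindex", words
    pairs = []
    for item in query.split("&"):
        if not item:
            continue  # tolerate stray '&' separators
        # The first unencoded '=' separates the name from the value.
        name, _, value = item.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return "form", pairs
```

Applied to the two example URLs above, decode_query("arg1+arg2+arg3") reports an ISINDEX query with three search strings, and decode_query("name1=value1&name2=value2") reports FORM data with two name/value pairs.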
Copyright © 1996-2000 EarthWeb Inc. All rights reserved. Reproduction whole or in part in any form or medium without express written permission of EarthWeb is prohibited.