Chapter 18, "Sockets, MAPI, and the Internet," introduces the WinInet classes that you can use to build Internet client applications at a fairly high level. This chapter develops an Internet application that demonstrates a number of these classes. The application also serves a useful function: You can use it to learn more about the Internet presence of a company or organization. You don't need to learn about sockets or handle the details of Internet protocols to do this.
Imagine that you have someone's email address (kate@gregcons.com, for example) and you'd like to know more about the domain (gregcons.com in this example). Perhaps you have a great idea for a domain name and want to know whether it's already taken. This application, Query, will try connecting to gregcons.com (or greatidea.org, or any other domain name that you specify) in a variety of ways and will report the results of those attempts to the user.
This application will have a simple user interface. The only piece of information that the user needs to supply is the domain name to be queried, and there is no need to keep this information in a document. You might want a menu item called Query that brings up a dialog box in which to specify the site name, but a better approach is to use a dialog-based application and incorporate a Query button into the dialog box.
A dialog-based application, as discussed in the section "A Dialog-Based Application" of Chapter 1, "Building Your First Application," has no document and no menu. The application displays a dialog box at all times; closing the dialog box closes the application. You build the dialog box for this application like any other, with Developer Studio.
To build this application's shell, choose File, New from within Developer Studio and then click the Project tab. Highlight MFC AppWizard(exe), name the application Query, and in Step 1 choose Dialog Based, as shown in Figure 19.1. Click Next to move to Step 2 of AppWizard.
FIG. 19.1 Choose a dialog-based application for Query.
In Step 2 of AppWizard, request an About box, no context-sensitive Help, 3D controls, no automation or ActiveX control support, and no sockets support. (This application won't be calling socket functions directly.) Give the application a sensible title for the dialog box. The AppWizard choices are summarized, as shown in Figure 19.2. Click Next to move to Step 3 of AppWizard.
FIG. 19.2 This application doesn't need Help, automation, ActiveX controls, or sockets.
The rest of the AppWizard process will be familiar by now: You want comments, you want to link to the MFC libraries as a shared DLL, and you don't need to change any of the classnames suggested by AppWizard. When the AppWizard process is completed, you're ready to build the heart of the Query application.
AppWizard produces an empty dialog box for you to start with, as shown in Figure 19.3. To edit this dialog box, switch to the resource view, expand the Query Resources, expand the Dialogs section, and double-click the IDD_QUERY_DIALOG resource. The following steps will transform this dialog box into the interface for the Query application.
FIG. 19.3 AppWizard generates an empty dialog box for you.
TIP: If working with dialog boxes is still new to you, be sure to read Chapter 2, "Dialogs and Controls."
The finished dialog box and the Style properties of the large edit box will resemble Figure 19.4.
FIG. 19.4 Build the Query user interface as a single dialog box.
When the user clicks the Query button, this application should somehow query the site. The last step in the building of the interface is to connect the Query button to code with ClassWizard. Follow these steps to make that connection:
FIG. 19.5 Add a function to handle a click on the Query button, still with the ID IDOK.
Click OK to close ClassWizard. Now all that remains is to write CQueryDlg::OnQuery(), which will use the value in m_host to produce lines of output for m_out.
FIG. 19.6 Connect IDC_HOST to CQueryDlg::m_host.
The first kind of connection to try when investigating a domain's Internet presence is HTTP because so many sites have Web pages. The simplest way to make a connection using HTTP is to use the WinInet class CInternetSession and call its OpenURL() function. This will return a file, and you can display the first few lines of the file in m_out. First, add this line at the beginning of QueryDlg.cpp, after the include of stdafx.h:
#include "afxinet.h"
This gives your code access to the WinInet classes. Because this application will try a number of URLs, add a function called TryURL() to CQueryDlg. It takes a CString parameter called URL and returns void. Right-click CQueryDlg in the ClassView and choose Add Member Function to add TryURL() as a protected member function. The new function, TryURL(), will be called from CQueryDlg::OnQuery(), as shown in Listing 19.1. Edit OnQuery() to add this code.
void CQueryDlg::OnQuery()
{
    const CString http = "http://";
    UpdateData(TRUE);
    m_out = "";
    UpdateData(FALSE);
    TryURL(http + m_host);
    TryURL(http + "www." + m_host);
}
The call to UpdateData(TRUE) fills m_host with the value that the user typed. The call to UpdateData(FALSE) fills the IDC_OUT read-only edit box with the newly cleared m_out. Then come two calls to TryURL(). If, for example, the user typed microsoft.com, the first call would try http://microsoft.com and the second would try http://www.microsoft.com. TryURL() is shown in Listing 19.2.
void CQueryDlg::TryURL(CString URL)
{
    CInternetSession session;
    m_out += "Trying " + URL + "\r\n";
    UpdateData(FALSE);
    CInternetFile* file = NULL;
    try
    {
        //We know for sure this is an Internet file,
        //so the cast is safe
        file = (CInternetFile*) session.OpenURL(URL);
    }
    catch (CInternetException* pEx)
    {
        //if anything went wrong, just set file to NULL
        file = NULL;
        pEx->Delete();
    }
    if (file)
    {
        m_out += "Connection established. \r\n";
        CString line;
        for (int i=0; i < 20 && file->ReadString(line); i++)
        {
            m_out += line + "\r\n";
        }
        file->Close();
        delete file;
    }
    else
    {
        m_out += "No server found there. \r\n";
    }
    m_out += "------------------------\r\n";
    UpdateData(FALSE);
}
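The remainder of this section presents this code again, a few lines at a time. First, establish an Internet session by constructing an instance of CInternetSession. There are a number of parameters to this constructor, but they all have default values that will be fine for this application. The parameters follow:
LPCTSTR pstrAgent: the name of the calling application. If NULL (the default), the framework supplies the application name.
DWORD dwContext: the context identifier for the operation. The default is 1.
DWORD dwAccessType: the access type, which is INTERNET_OPEN_TYPE_PRECONFIG (the default), INTERNET_OPEN_TYPE_DIRECT, or INTERNET_OPEN_TYPE_PROXY.
LPCTSTR pstrProxyName: the name of the proxy, used only when dwAccessType is INTERNET_OPEN_TYPE_PROXY. The default is NULL.
LPCTSTR pstrProxyBypass: a list of addresses to be reached directly rather than through the proxy. The default is NULL.
DWORD dwFlags: option flags such as INTERNET_FLAG_ASYNC. The default is 0.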
dwAccessType defaults to using the value in the Registry. Obviously, an application that insists on direct Internet access or proxy Internet access is less useful than one that enables users to configure that information. Making users set their Internet access type outside this program might be confusing, though. To set your default Internet access, double-click the My Computer icon on your desktop, then on the Control Panel, and then on the Internet tool in the Control Panel. Choose the Connection tab (the version for Internet Explorer under Windows 95 is shown in Figure 19.7) and complete the dialog box as appropriate for your setup. If you are using NT or Windows 98, or if your browser version is different, you might see a slightly different dialog, but you should still be able to choose your connection type.
FIG. 19.7 Set your Internet connection settings once, and all applications can retrieve them from the Registry.
If you want to set up an asynchronous (nonblocking) session, for the reasons discussed in the "Using Windows Sockets" section of Chapter 18, your options in dwFlags must include INTERNET_FLAG_ASYNC. In addition, you must call the member function EnableStatusCallback() to set up the callback function. When a request is made through the session--such as the call to OpenURL() that occurs later in TryURL()--and the response will not be immediate, a nonblocking session returns a pseudo error code, ERROR_IO_PENDING. When the response is ready, these sessions automatically invoke the callback function.
For this simple application, there is no need to allow the user to do other work or interact with the user interface while waiting for the session to respond, so the session is constructed as a blocking session and all the other default parameters are also used:
CInternetSession session;
Having constructed the session, TryURL() goes on to add a line to m_out that echoes the URL passed in as a parameter. The "\r\n" characters are a carriage return and a line feed, and they separate the lines added to m_out. UpdateData(FALSE) gets that onscreen:
    m_out += "Trying " + URL + "\r\n";
    UpdateData(FALSE);
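Next is a call to the session's OpenURL() member function. This function returns a pointer to one of several file types because the URL might have been to one of four protocols:
file:// returns a pointer to a CStdioFile
ftp:// returns a pointer to a CInternetFile
gopher:// returns a pointer to a CGopherFile
http:// returns a pointer to a CHttpFile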
Because CGopherFile and CHttpFile both inherit from CInternetFile and because you can be sure that TryURL() will not be passed a file:// URL, it is safe to cast the returned pointer to a CInternetFile.
TIP: There is some confusion in Microsoft's online documentation whenever sample URLs are shown. A backslash (\) character will never appear in an URL. In any Microsoft example that includes backslashes, use forward slashes (/) instead.
If the URL would not open, file will be NULL, or OpenURL() will throw an exception. (For background on exceptions, see Chapter 26, "Exceptions and Templates.") Whereas in a normal application it would be a serious error if an URL didn't open, in this application you are making up URLs to see whether they work, and it's expected that some won't. As a result, you should catch these exceptions yourself and do just enough to prevent runtime errors. In this case, it's enough to make sure that file is NULL when an exception is thrown. To delete the exception and prevent memory leaks, call CException::Delete(), which safely deletes the exception. The block of code containing the call to OpenURL() is in Listing 19.3.
    CInternetFile* file = NULL;
    try
    {
        //We know for sure this is an Internet file,
        //so the cast is safe
        file = (CInternetFile*) session.OpenURL(URL);
    }
    catch (CInternetException* pEx)
    {
        //if anything went wrong, just set file to NULL
        file = NULL;
        pEx->Delete();
    }
If file is not NULL, this routine will display some of the Web page that was found. It first echoes another line to m_out. Then, in a for loop, the routine calls CInternetFile::ReadString() to fill the CString line with the characters in file up to the first \r\n, which are stripped off. This code simply tacks line (and another \r\n) onto m_out. If you would like to see more or less than the first 20 lines of the page, adjust the number in this for loop. When the first few lines have been read, TryURL() closes and deletes the file. That block of code is shown in Listing 19.4.
    if (file)
    {
        m_out += "Connection established. \r\n";
        CString line;
        for (int i=0; i < 20 && file->ReadString(line); i++)
        {
            m_out += line + "\r\n";
        }
        file->Close();
        delete file;
    }
If the file could not be opened, a message to that effect is added to m_out:
    else
    {
        m_out += "No server found there. \r\n";
    }
Then, whether the file existed or not, a line of dashes is tacked on m_out to indicate the end of this attempt, and one last call to UpdateData(FALSE) puts the new m_out onscreen:
    m_out += "------------------------\r\n";
    UpdateData(FALSE);
}
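The line-limiting loop in TryURL() does not depend on anything specific to WinInet, so it can be sketched in portable C++ as well. The following is only an illustration under stated assumptions: FirstLines() is a hypothetical helper that is not part of Query, and it reads from a standard std::istream rather than from a CInternetFile.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Query: reads at most maxLines lines
// from in and joins them with "\r\n", the separator that the read-only
// edit box in the dialog expects.
std::string FirstLines(std::istream& in, std::size_t maxLines)
{
    std::string out;
    std::string line;
    for (std::size_t i = 0; i < maxLines && std::getline(in, line); ++i)
    {
        // getline() removes '\n'; drop the '\r' that "\r\n" input leaves behind.
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        out += line + "\r\n";
    }
    return out;
}
```

Called on a std::istringstream that holds three lines with a limit of 2, the helper returns only the first two lines, each terminated by "\r\n". As in TryURL(), the loop stops early when the input runs out before the limit is reached.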
You can now build and run this application. If you enter microsoft.com in the text box and click Query, you'll discover that there are Web pages at both http://microsoft.com and http://www.microsoft.com. Figure 19.8 shows the results of that query.
FIG. 19.8 Query can find Microsoft's Web sites.
If Query doesn't find Web pages at either the domain name you provided or www. plus the domain name, it doesn't mean that the domain doesn't exist or even that the organization that owns the domain name doesn't have a Web page. It does make it less likely, however, that the organization both exists and has a Web page. If you see a stream of HTML, you know for certain that the organization exists and has a Web page. You might be able to read the HTML yourself, but even if you can't, you can now connect to the site with a Web browser such as Microsoft's Internet Explorer.
As part of a site name investigation, you should check whether there is an FTP site, too. Most FTP sites have names like ftp.company.com, though some older sites don't have names of that form. Checking for these sites isn't as simple as just calling TryURL() again because TryURL() assumes that the URL leads to a file, and URLs like ftp.greatidea.org lead to a list of files that cannot simply be opened and read. Rather than make TryURL() even more complicated, add a protected function to the class called TryFTPSite(CString host). (Right-click CQueryDlg in the ClassView and choose Add Member Function to add the function. It can return void.)
TryFTPSite() has to establish a connection within the session, and if the connection is established, it has to get some information that can be added to m_out to show the user that the connection has been made. Getting a list of files is reasonably complex; because this is just an illustrative application, the simpler task of getting the name of the default FTP directory is the way to go. The code is in Listing 19.5.
void CQueryDlg::TryFTPSite(CString host)
{
    CInternetSession session;
    m_out += "Trying FTP site " + host + "\r\n";
    UpdateData(FALSE);
    CFtpConnection* connection = NULL;
    try
    {
        connection = session.GetFtpConnection(host);
    }
    catch (CInternetException* pEx)
    {
        //if anything went wrong, just set connection to NULL
        connection = NULL;
        pEx->Delete();
    }
    if (connection)
    {
        m_out += "Connection established. \r\n";
        CString line;
        connection->GetCurrentDirectory(line);
        m_out += "default directory is " + line + "\r\n";
        connection->Close();
        delete connection;
    }
    else
    {
        m_out += "No server found there. \r\n";
    }
    m_out += "------------------------\r\n";
    UpdateData(FALSE);
}
This code is very much like TryURL(), except that rather than open a file with session.OpenURL(), it opens an FTP connection with session.GetFtpConnection(). Again, exceptions are caught and essentially ignored, with the routine just making sure that the connection pointer won't be used. The call to GetCurrentDirectory() returns the directory on the remote site in which sessions start. The rest of the routine is just like TryURL().
Add two lines at the end of OnQuery() to call this new function:
    TryFTPSite(m_host);
    TryFTPSite("ftp." + m_host);
Build the application and try it: Figure 19.9 shows Query finding no FTP site at microsoft.com and finding one at ftp.microsoft.com. The delay before results start to appear might be a little disconcerting. You can correct this by using asynchronous sockets, or threading, so that early results can be added to the edit box while later results are still coming in over the wire. However, for a simple demonstration application like this, just wait patiently until the results appear. It might take several minutes, depending on network traffic between your site and Microsoft's, your line speed, and so on.
FIG. 19.9 Query finds one Microsoft FTP site.
If Query doesn't find Web pages or FTP sites, perhaps this domain doesn't exist at all or doesn't have any Internet services other than email, but there are a few more investigative tricks available. The results of these investigations will definitely add to your knowledge of existing sites.
As with FTP, TryURL() won't work when querying a Gopher site like gopher.company.com because this returns a list of filenames instead of a single file. The solution is to write a protected member function called TryGopherSite() that is almost identical to TryFTPSite(), except that it opens a CGopherConnection. Also, rather than echo a single line describing the default directory, it echoes a single line describing the Gopher locator associated with the site. Add TryGopherSite to CQueryDlg by right-clicking the classname in ClassView and choosing Add Member Function, as you did for TryFTPSite(). The code for TryGopherSite() is in Listing 19.6.
void CQueryDlg::TryGopherSite(CString host)
{
    CInternetSession session;
    m_out += "Trying Gopher site " + host + "\r\n";
    UpdateData(FALSE);
    CGopherConnection* connection = NULL;
    try
    {
        connection = session.GetGopherConnection(host);
    }
    catch (CInternetException* pEx)
    {
        //if anything went wrong, just set connection to NULL
        connection = NULL;
        pEx->Delete();
    }
    if (connection)
    {
        m_out += "Connection established. \r\n";
        CString line;
        CGopherLocator locator = connection->CreateLocator(NULL, NULL, GOPHER_TYPE_DIRECTORY);
        line = locator;
        m_out += "first locator is " + line + "\r\n";
        connection->Close();
        delete connection;
    }
    else
    {
        m_out += "No server found there. \r\n";
    }
    m_out += "------------------------\r\n";
    UpdateData(FALSE);
}
The call to CreateLocator() takes three parameters. The first is the filename, which might include wild cards. NULL means any file. The second parameter is a selector that can be NULL. The third is one of the following types:
GOPHER_TYPE_TEXT_FILE
GOPHER_TYPE_DIRECTORY
GOPHER_TYPE_CSO
GOPHER_TYPE_ERROR
GOPHER_TYPE_MAC_BINHEX
GOPHER_TYPE_DOS_ARCHIVE
GOPHER_TYPE_UNIX_UUENCODED
GOPHER_TYPE_INDEX_SERVER
GOPHER_TYPE_TELNET
GOPHER_TYPE_BINARY
GOPHER_TYPE_REDUNDANT
GOPHER_TYPE_TN3270
GOPHER_TYPE_GIF
GOPHER_TYPE_IMAGE
GOPHER_TYPE_BITMAP
GOPHER_TYPE_MOVIE
GOPHER_TYPE_SOUND
GOPHER_TYPE_HTML
GOPHER_TYPE_PDF
GOPHER_TYPE_CALENDAR
GOPHER_TYPE_INLINE
GOPHER_TYPE_UNKNOWN
GOPHER_TYPE_ASK
GOPHER_TYPE_GOPHER_PLUS
Normally, you don't build locators for files or directories; instead, you ask the server for them. The locator returned from this call to CreateLocator() describes the site you are investigating rather than any particular file on it.
Add a pair of lines at the end of OnQuery() that call this new TryGopherSite() function:
    TryGopherSite(m_host);
    TryGopherSite("gopher." + m_host);
Build and run the program again. Again, you might have to wait several minutes for the results. Figure 19.10 shows that Query has found two Gopher sites for harvard.edu. In both cases, the locator describes the site itself. This is enough to prove that there is a Gopher site at harvard.edu, which is all that Query is supposed to do.
FIG. 19.10 Query finds two Harvard Gopher sites.
TIP: Gopher is an older protocol that has been supplanted almost entirely by the World Wide Web. As a general rule, if a site has a Gopher presence, it's been on the Internet since before the World Wide Web existed (1989) or at least before the huge upsurge in popularity began (1992). What's more, the site was probably large enough in the early 1990s to have an administrator who would set up the Gopher menus and text.
There is another protocol that can give you information about a site. It's one of the oldest protocols on the Internet, and it's called Finger. You can finger a single user or an entire site, and though many sites have disabled Finger, many more will provide you with useful information in response to a Finger request.
There is no MFC class or API function with the word finger in its name, but that doesn't mean you can't use the classes already presented. This section relies on a trick--and on knowledge of the Finger and Gopher protocols. Although the WinInet classes are a boon to new Internet programmers who don't quite know how the Internet works, they also have a lot to offer to old-timers who know what's going on under the hood.
As discussed in the "Using Windows Sockets" section of Chapter 18, all Internet transactions involve a host and a port. Well-known services use standard port numbers. For example, when you call CInternetSession::OpenURL() with an URL that begins with http://, the code behind the scenes connects to port 80 on the remote host. When you call GetFtpConnection(), the connection is made to port 21 on the remote host. Gopher uses port 70. If you look at Figure 19.10, you'll see that the locator that describes the gopher.harvard.edu site includes a mention of port 70.
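The port numbers can be collected into one small lookup. This is a minimal sketch, assuming a hypothetical helper called WellKnownPort() that is not part of Query or of WinInet; the values for Finger and Whois, which come up later in this chapter, are included for completeness.

```cpp
#include <string>

// Hypothetical helper, not part of Query: returns the well-known TCP port
// for a service name, or 0 if the name is not one of those used in this chapter.
int WellKnownPort(const std::string& service)
{
    if (service == "http")   return 80;
    if (service == "ftp")    return 21;
    if (service == "gopher") return 70;
    if (service == "finger") return 79;
    if (service == "whois")  return 43;
    return 0;
}
```

A call such as WellKnownPort("gopher") returns 70, the same number that appears in the locator in Figure 19.10. The service names are compared exactly, so the sketch expects lowercase input.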
The Gopher documentation makes this clear: If you build a locator with a host name, port 70, Gopher type 0 (a text file, which WinInet represents as GOPHER_TYPE_TEXT_FILE), and a string with a filename, any Gopher client simply sends the string, whether it's a filename or not, to port 70. The Gopher server listening on that port responds by sending the file.
Finger is a simple protocol, too. If you send a string to port 79 on a remote host, the Finger server that is listening there will react to the string by sending a Finger reply. If the string is only \r\n, the usual reply is a list of all the users on the host and some other information about them, such as their real names. (Many sites consider this an invasion of privacy or a security risk, and they disable Finger. Many other sites, though, deliberately make this same information available on their Web pages.)
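To make the shape of a Finger request concrete, here is a minimal sketch that only builds the request and does not open any connection. FingerQuery and MakeFingerQuery() are hypothetical names chosen for this illustration. The sketch assumes that a target written as user@host asks about one user, and that a bare host name asks for the whole site.

```cpp
#include <string>

// Hypothetical type, not part of Query: everything needed to make one
// Finger request.
struct FingerQuery
{
    std::string host;     // machine to connect to
    int port;             // Finger servers listen on port 79
    std::string payload;  // text to send once connected
};

// Accepts either "user@host" or a bare "host".
FingerQuery MakeFingerQuery(const std::string& target)
{
    FingerQuery query;
    query.port = 79;
    const std::string::size_type at = target.find('@');
    if (at == std::string::npos)
    {
        // No user named: an empty line asks for the list of users.
        query.host = target;
        query.payload = "\r\n";
    }
    else
    {
        // Send just the user name, terminated by "\r\n", to the host part.
        query.host = target.substr(at + 1);
        query.payload = target.substr(0, at) + "\r\n";
    }
    return query;
}
```

For the bare host gregcons.com the payload is only "\r\n", which is the whole-site query described above. For kate@gregcons.com the host is gregcons.com and the payload is "kate\r\n".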
Putting this all together, if you build a Gopher locator using port 79--instead of the default 70--and an empty filename, you can do a Finger query using the MFC WinInet classes. First, add another function to CQueryDlg called TryFinger(), which takes a CString host and returns void. The code for this function is very much like TryGopherSite(), except that the connection is made to port 79:
connection = session.GetGopherConnection(host,NULL,NULL,79);
After the connection is made, a text file locator is created:
CGopherLocator locator = connection->CreateLocator(NULL, NULL, GOPHER_TYPE_TEXT_FILE);
This time, rather than simply cast the locator into a CString, use it to open a file:
CGopherFile* file = connection->OpenFile(locator);
Then echo the first 20 lines of this file, just as TryURL() echoed the first 20 lines of the file returned by a Web server. The code for this is in Listing 19.7.
    if (file)
    {
        CString line;
        for (int i=0; i < 20 && file->ReadString(line); i++)
        {
            m_out += line + "\r\n";
        }
        file->Close();
        delete file;
    }
Putting it all together, Listing 19.8 shows TryFinger().
void CQueryDlg::TryFinger(CString host)
{
    CInternetSession session;
    m_out += "Trying to Finger " + host + "\r\n";
    UpdateData(FALSE);
    CGopherConnection* connection = NULL;
    try
    {
        connection = session.GetGopherConnection(host,NULL,NULL,79);
    }
    catch (CInternetException* pEx)
    {
        //if anything went wrong, just set connection to NULL
        connection = NULL;
        pEx->Delete();
    }
    if (connection)
    {
        m_out += "Connection established. \r\n";
        CGopherLocator locator = connection->CreateLocator(NULL, NULL, GOPHER_TYPE_TEXT_FILE);
        CGopherFile* file = connection->OpenFile(locator);
        if (file)
        {
            CString line;
            for (int i=0; i < 20 && file->ReadString(line); i++)
            {
                m_out += line + "\r\n";
            }
            file->Close();
            delete file;
        }
        connection->Close();
        delete connection;
    }
    else
    {
        m_out += "No server found there. \r\n";
    }
    m_out += "------------------------\r\n";
    UpdateData(FALSE);
}
Add a line at the end of OnQuery() that calls this new function:
TryFinger(m_host);
Now, build and run the application. Figure 19.11 shows the result of a query on the site whitehouse.gov, scrolled down to the Finger section.
FIG. 19.11 Query gets email addresses from the White House Finger server.
NOTE: If the site you are investigating isn't running a Finger server, the delay will be longer than usual and a message box will appear, telling you the connection timed out. Click OK on the message box if it appears.
One last protocol provides information about sites. It, too, is an old protocol not supported directly by the WinInet classes. It is called Whois, and it's a service offered by only a few servers on the whole Internet. The servers that offer this service are maintained by the organizations that register domain names. For example, domain names that end in .com are registered through an organization called InterNIC, and it runs a Whois server called rs.internic.net (the rs stands for Registration Services). Like Finger, Whois responds to a string sent on its own port; the Whois port is 43. Unlike Finger, you don't send an empty string in the locator; you send the name of the host that you want to look up. You connect to rs.internic.net every time. (Dedicated Whois servers offer users a chance to change this, but in practice, no one ever does.)
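The difference from Finger is small enough to show in a few lines. This sketch only assembles a Whois request and does not connect anywhere. WhoisQuery and MakeWhoisQuery() are hypothetical names, and the optional server parameter is an assumption added for illustration; the default server and the port are the ones given in the text.

```cpp
#include <string>

// Hypothetical type, not part of Query: everything needed to make one
// Whois request.
struct WhoisQuery
{
    std::string server;   // Whois server to connect to
    int port;             // Whois servers listen on port 43
    std::string payload;  // text to send once connected
};

// The host being looked up travels in the payload; the connection itself
// always goes to the Whois server.
WhoisQuery MakeWhoisQuery(const std::string& host,
                          const std::string& server = "rs.internic.net")
{
    WhoisQuery query;
    query.server = server;
    query.port = 43;
    query.payload = host + "\r\n";
    return query;
}
```

MakeWhoisQuery("mcp.com") yields a request addressed to rs.internic.net on port 43 with the payload "mcp.com\r\n". Note that the host being investigated never appears as the connection target.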
Add a function called TryWhois(); as usual, it takes a CString host and returns void. The code is in Listing 19.9.
void CQueryDlg::TryWhois(CString host)
{
    CInternetSession session;
    m_out += "Trying Whois for " + host + "\r\n";
    UpdateData(FALSE);
    CGopherConnection* connection = NULL;
    try
    {
        connection = session.GetGopherConnection("rs.internic.net",NULL,NULL,43);
    }
    catch (CInternetException* pEx)
    {
        //if anything went wrong, just set connection to NULL
        connection = NULL;
        pEx->Delete();
    }
    if (connection)
    {
        m_out += "Connection established. \r\n";
        CGopherLocator locator = connection->CreateLocator(NULL, host, GOPHER_TYPE_TEXT_FILE);
        CGopherFile* file = connection->OpenFile(locator);
        if (file)
        {
            CString line;
            for (int i=0; i < 20 && file->ReadString(line); i++)
            {
                m_out += line + "\r\n";
            }
            file->Close();
            delete file;
        }
        connection->Close();
        delete connection;
    }
    else
    {
        m_out += "No server found there. \r\n";
    }
    m_out += "------------------------\r\n";
    UpdateData(FALSE);
}
Add a line at the end of OnQuery() to call it:
TryWhois(m_host);
Build and run the application one last time. Figure 19.12 shows the Whois part of the report for mcp.com--this is the domain for Macmillan Computer Publishing, Que's parent company.
FIG. 19.12 Query gets real-life addresses and names from the InterNIC Whois server.
Adding code after the Finger portion of this application means that you can no longer ignore the times when the Finger code can't connect. When the call to OpenFile() in TryFinger() tries to open a file on a host that isn't running a Finger server, an exception is thrown. Control will not return to OnQuery(), and TryWhois() will never be called. To prevent this, you must wrap the call to OpenFile() in a try and catch block. Listing 19.10 shows the changes to make.
//replace this line:
    CGopherFile* file = connection->OpenFile(locator);
//with these lines:
    CGopherFile* file = NULL;
    try
    {
        file = connection->OpenFile(locator);
    }
    catch (CInternetException* pEx)
    {
        //if anything went wrong, just set file to NULL
        file = NULL;
        pEx->Delete();
    }
Change TryFinger(), build Query again, and query a site that doesn't run a Finger server, such as microsoft.com. You will successfully reach the Whois portion of the application.
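The general idea, that one failed probe should not stop the probes that follow it, can be sketched in portable C++. This is an illustration under stated assumptions rather than the code Query uses: RunProbes() is a hypothetical function, and it catches standard C++ exceptions by reference, whereas MFC throws pointers such as CInternetException* that must be released with Delete().

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical function, not part of Query: runs every probe in order.
// A probe that throws is reported and skipped, so later probes still run.
std::string RunProbes(const std::vector<std::function<std::string()>>& probes)
{
    std::string report;
    for (const auto& probe : probes)
    {
        try
        {
            report += probe();
        }
        catch (const std::exception& ex)
        {
            // Record the failure instead of letting it end the whole run.
            report += std::string("Probe failed: ") + ex.what() + "\r\n";
        }
        report += "------------------------\r\n";
    }
    return report;
}
```

If the first probe throws std::runtime_error("timed out") and the second returns a line of text, the report contains a failure line for the first probe followed by the output of the second. That mirrors what the try and catch block around OpenFile() achieves for TryFinger() and TryWhois().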
The Query application built in this chapter does a lot, but it could do much more. There are email and news protocols that could be reached by stretching the WinInet classes a little more and using them to connect to the standard ports for these other services. You could also connect to some well-known Web search engines and submit queries by forming URLs according to the pattern used by those engines. In this way, you could automate the sort of poking around on the Internet that most of us do when we're curious about a domain name or an organization.
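As a rough sketch of the search engine idea, the following helper forms a query URL by percent-encoding the search terms. Everything specific here is an assumption made for illustration: the base URL passed in, the parameter name q, and the function names PercentEncode() and MakeSearchUrl() are hypothetical, and a real engine's URL pattern would have to be taken from that engine.

```cpp
#include <cctype>
#include <string>

// Replaces every character other than letters, digits, '-', '_', '.',
// and '~' with a percent sign followed by two hexadecimal digits.
std::string PercentEncode(const std::string& text)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : text)
    {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~')
        {
            out += static_cast<char>(c);
        }
        else
        {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Hypothetical pattern: the terms are passed in a parameter named "q".
std::string MakeSearchUrl(const std::string& baseUrl, const std::string& terms)
{
    return baseUrl + "?q=" + PercentEncode(terms);
}
```

With the made-up base URL http://search.example.com/find and the terms "gregcons.com owner", the result is http://search.example.com/find?q=gregcons.com%20owner. A URL formed this way could then be handed to TryURL() like any other.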
If you'd like to learn more about Internet protocols, port numbers, and what's happening when a client connects to a server, you might want to read Que's Building Internet Applications with Visual C++. The book was written for Visual C++ 2.0, and though all the applications in the book compile and run under later versions of MFC, the applications would be much shorter and easier to write now. Still, the insight into the way the protocols work is valuable.
The WinInet classes, too, can do much more than you've seen here. Query doesn't use them to retrieve real files over the Internet. Two of the WinInet sample applications included with Visual C++ 6.0 do a fine job of showing how to retrieve files:
There are a lot more Microsoft announcements to come in the next few months. Keep an eye on the Web site www.microsoft.com for libraries and software development kits that will make Internet software development even easier and faster.
© Copyright, Macmillan Computer Publishing. All rights reserved.