Today you will learn how to capture data from relational databases by using Structured Query Language (SQL). SQL is a powerful language designed specifically for manipulating data.
When you complete this lesson, you will be able to use SQL to transform a relational database's raw data into a rich harvest of useful information. You will appreciate how SQL queries and a properly designed relational database enable you to fulfill requests for information that were unanticipated at the time that the database was first created.
Today you will
In Day 1, "Choosing the Right Database Technology," you learned that relational databases have their own language interpreter. This language interpreter enables a relational database to interpret and execute commands sent to the database. These commands are written in Structured Query Language, a language specifically designed to work with data that resides in relational databases. The name for the original version of this database language was Sequel. However, that name has been changed to SQL.
NOTE |
When referring to the SQL language, the word SQL should be pronounced ess-que-ell instead of sequel. The sequel pronunciation is sometimes used when referring to Microsoft SQL Server and Sybase SQL Server. However, when talking about the SQL language, use ess-que-ell. |
Just as there is a standard for the C++ language, there is a standard for SQL. The most recent version of the SQL standard is SQL-92 (ANSI Document Number X3.135-1992). Relational databases use SQL. Each database vendor implements the SQL standard in its database but then adds its own extensions. For instance, the ANSI SQL used in Microsoft SQL Server is essentially the same as the ANSI SQL used in Oracle. However, SQL Server and Oracle each add their own custom extensions to the language.
SQL is not a procedural language. It's not like C++ or BASIC. In fact, SQL is probably unlike any programming language you have ever seen. SQL is built to deal with sets of records. These records can come from a table or from multiple tables inside a relational database.
With SQL, the whole idea is to collect a set of records that match the criteria you specify and then to perform some operation on these records. SQL has an almost English-like syntax (but with no accent). SQL statements typically have a verb, which indicates the action to be taken-a command such as SELECT, UPDATE, DELETE, or INSERT. The most common operation is to retrieve (or SELECT) records.
The SQL SELECT statement enables you to retrieve information from the database. The SELECT statement begins with the SELECT command and is followed by the noun(s) indicating which field(s) you want to SELECT. You can use an asterisk to indicate that you want all the fields. You have to tell the database which table you are talking about, so you include a FROM clause that indicates the table. The basic syntax looks like this:
SELECT which fields FROM which table
An SQL query selects a set of records that match the criteria you specify. The preceding SELECT statement would select all the records from the table. You can see a SELECT statement like this in action right inside Visual C++.
Open the database project you created in yesterday's lesson (Day 2,"Tools for Database Development in Visual C++ Developer Studio"). Select the Data View tab on the Workspace pane. As you know, this database relates to the sample application mentioned in Day 1-software for taking orders for products advertised on TV commercials. Double-click the Customers table (see Figure 3.1) to open it and view all the records, shown in Figure 3.2.
Figure 3.1 : The Visual C++ Data View.
Figure 3.2 : The Customers table.
Click the SQL button on the Query toolbar (the third button from the right), and you will see an SQL SELECT statement in the splitter window above the records in the table.
Figure 3.3 : The Customers table with a SELECT statement.
The SELECT statement shown in the window retrieved those records from the database. The SQL statement looks like this:
SELECT 'Customers'.* FROM 'Customers'
Visual C++ sent this statement to the database (in this case, the Access ODBC driver and Jet database engine). The database interpreted the SQL statement, read the data from the MDB file, and returned the set of records to Visual C++. Visual C++ displays those records in the window.
You can see that the SQL statement follows the basic syntax for the SELECT statement described earlier. However, this statement uses tablename.fieldname syntax to indicate which fields should be selected. The statement says SELECT 'Customers'.* to select all the fields in the Customers table. It places single quotes around the table name in case the table name contains spaces. The statement then says FROM 'Customers' to indicate from which table to select the records.
Because the statement says FROM 'Customers', the 'Customers'.* syntax might seem redundant. (The table name is specified twice in the same statement.) However, the tablename.fieldname syntax comes in handy when you have two tables in your FROM clause and the same field name exists in both tables. The tablename.fieldname syntax enables you to indicate from which table you want the field.
This SELECT statement selects all the fields from all the records in the table. Try editing the SELECT statement so that you don't get all the fields. Change the 'Customers'.* to 'Customers'.custlastname. Figure 3.4 shows the results.
Press the SQL Check button on the Query toolbar to verify the SQL syntax. This ensures that the syntax of your SQL statement is correct. It sends your SQL statement to the database's SQL interpreterÑin this case, the Jet database engineÑto see whether it can properly run the statement. If you haven't edited something you should have (or vice versa), you will receive the message that appears in Figure 3.5.
Figure 3.4 : Editing the SELECT statement.
Figure 3.5 : The SQL Syntax Verified message box.
Run the SQL statement by pressing the ! (run) button on the Query toolbar. When you run your SELECT statement, you will see only the custlastname field of every record in the Customers table, as in Figure 3.6.
Figure 3.6 : The Last Name field from the Customers table.
It's disappointing to note that if you misspell the field name, the Jet syntax checker will not catch it. When you run the query, you receive a rather unhelpful error message (see Figure 3.7). Some databases do a better job than others of verifying the syntax.
With certain SQL syntax errors, you receive a more descriptive error message, such as the one in Figure 3.8. However, if you ever need to track down some mysterious bug in your SQL statement, you might feel that this error message isn't descriptive enough either.
Figure 3.7 : The error message from the Jet/Access ODBC driver.
Figure 3.8 : The error message from the Jet/Access ODBC driver.
Change the SELECT statement back so that it selects all the fields and run the query again to see the whole table.
SELECT * FROM Customers
Wouldn't it be nice to see the customers in alphabetical order? That's very easy to do with SQL. All you have to do is add an ORDER BY clause to the SELECT statement, like this:
SELECT * FROM Customers ORDER BY custlastname
When you run this query, you will see all the customer records, ordered by the CustLastName field (see Figure 3.9).
Figure 3.9 : Customer names in order by last name.
You can select the customers in reverse alphabetical order by adding DESC to the end of the ORDER BY clause. Observe the results in Figure 3.10.
Figure 3.10: Customer names descending in order by last name.
With SQL SELECT statements, you can collect a set of records that fit specific criteria. The SELECT statements you have used so far select all the records. That's pretty broad criteria. To narrow that down, you add a predicate to the SQL statement. The predicate takes the form of a WHERE clause that indicates which records to select. Edit the SELECT statement to include a WHERE clause, like this:
SELECT 'Customers'.* FROM 'Customers' WHERE custnumber = 2
You can type a lowercase WHERE. When you click the button to check the SQL syntax, this will convert the WHERE to uppercase, as in Figure 3.11. Uppercase letters for SQL keywords are a standard convention.
Figure 3.11: The customers WHERE CustNumber = 2.
Now change the WHERE clause to find all customers with a last name of Travolta. Use single quotes around Travolta to indicate to the database that it's a string data type. Case sensitivity varies between different relational databases. Don't worry too much about it now. Nothing that you do with this database is case sensitive.
SELECT * FROM Customers WHERE custlastname = 'travolta'
Execute the statement and you will see that there really is only one John Travolta (see Figure 3.12).
Figure 3.12: The customers WHERE CustLastName ='Travolta'.
Wouldn't it be interesting to see what products John Travolta buys from TV advertising? SQL lets us find that information quite easily from our relational database. The first thing to do is look at John's orders. Double-click the Orders table to see all the orders, as shown in Figure 3.13.
Figure 3.13: The Orders table.
You can see in the Customers table that John Travolta's customer number is 4. You can see a CustomerNumber field in the Orders table. A couple of records in the Orders table contain 4 in the CustomerNumber field. You can rightly assume that those were orders placed by Mr. Travolta.
What you're doing is looking at the primary key in the Customers table, which is the CustNumber field, and comparing it to a foreign key field in the Orders table, which is the CustomerNumber field. Is there some way to put this information together without having to eyeball it? Yep, there is. It's called an SQL join.
First, you change the SQL query so that you find only the fields you are interested in. Edit and run the query so that you obtain only the first and last name fields of records in which the last name equals Travolta (see Figure 3.14).
Figure 3.14: The first and last name fields for Travolta.
You want to retrieve the part number from the Orders table, so add partnumber to the list of fields to select. You need to pull information from the Orders table, so add the Orders table to the FROM clause.
SELECT custfirstname, custlastname, partnumber FROM Customers, orders WHERE custlastname = 'travolta'
Here's the nifty part. You want only the records from the Orders table that have John Travolta's customer number. You specify this in the WHERE clause. You want records in which the CustomerNumber field (in the Orders table) equals the CustNumber field (in the Customers table). You still want only records from the Customers table that have a last name of Travolta. To do this, you add an AND to the WHERE clause, like this:
WHERE custlastname = 'travolta' AND customers.custnumber = orders.customernumber
Run the query, and the results will look like Figure 3.15.
Figure 3.15: The part numbers purchased by John Travolta.
You performed a two-table join to discover the part numbers of the products that John Travolta purchased. However, having part numbers isn't sufficient. You want to know the names of the products he purchased. That means you need information from the Products table, too. Double-click the Products table to open it and view all the products, as in Figure 3.16.
Figure 3.16: The Products table.
You know that you want the product name instead of the part number, so change the SELECT statement to select the ProductName field instead of the PartNumber field. You also know that the ProductName field comes from the Products table, so add the Products table to the FROM clause.
You want only product names that have the same part number as the ones you found in the Orders table for John Travolta. You specify this by adding another AND condition to the WHERE clause. Your SELECT statement will look like this:
SELECT custfirstname, custlastname, productname FROM Customers, orders, products WHERE custlastname = 'travolta' AND customers.custnumber = orders.customernumber AND orders.partnumber = products.partnumber
When you run the query, you can see in Figure 3.17 that Mr. Travolta has a particular penchant for 8-track music from the 70s.
Figure 3.17: The product names purchased by John Travolta.
Now that you have the information you need, you can close the query windows. When you close the Customers Query window, you will be prompted to save the query. Because you could probably re-create this query in a heartbeat if necessary, you don't need to save it.
SQL syntax can be elegant and powerful. SQL queries can be nested to perform operations requiring lengthy code in a procedural language. Nested queries are called subqueries.
A subquery enables you, with very little code, to find all the customers who have placed orders since a certain date, such as November 11, 1998.
Open a query window for the Orders table by double-clicking the Orders table in the Data View. Click the SQL button to view the SELECT statement that retrieved the records (see Figure 3.18).
Figure 3.18: The Orders table.
You need to find the customers who have ordered since November 11, 1998. The first step is to find the orders placed since then. Modify the query so that the window contains only order records after that date. You do this by adding a WHERE clause like this:
SELECT 'Orders'.* FROM 'Orders' WHERE OrderDate > { d '1998-11-11' }
The ANSI SQL convention is to use braces and the d, with single quotes around the date to indicate to the database that this is a date, as opposed to a text string or numeric type. When you run the query, you will find only the orders placed since that date, as shown in Figure 3.19.
Figure 3.19: The orders since November 11, 1998.
Now that you have the orders since that date, you are going to find the customers who placed those orders, by retrieving the customer numbers from the orders. You need to change the query so that you obtain only the customer numbers, as shown in Figure 3.20.
Figure 3.20: The customer numbers from orders since November 11, 1998.
Now you can use these customer numbers to find the customer information. Build a SELECT statement that retrieves the customer records from the Customers table in which the customer number is among the numbers retrieved in the orders query. To do this, you can embed the orders query into the WHERE clause of a new query, like this:
SELECT 'Customers'.* FROM 'Customers' WHERE custnumber IN (SELECT 'Orders'.customernumber FROM 'Orders' WHERE OrderDate > { d '1998-11-11' })
The easiest way to create this new query is to open a new query window to the Customers table. Press the SQL button on the Query toolbar so that you can see the customers query. Add the WHERE clause so the query looks like the following:
SELECT 'Customers'.* FROM 'Customers' WHERE custnumber IN ()
Then select the text of the orders query, copy it to the Clipboard, and paste it inside the parentheses at the end of the WHERE clause in the customers query, like this:
SELECT 'Customers'.* FROM 'Customers' WHERE custnumber IN (SELECT 'Orders'.customernumber FROM 'Orders' WHERE OrderDate > { d '1998-11-11' })
When you run the query, you can see in Figure 3.21 that two customers have placed orders since November 11, 1998.
The code you would have to write in C++ to retrieve this same information from a binary data file, or even from a record manager, is considerably more complex. SQL is made to work with data, and that is where it excels.
Figure 3.21: The customers with orders since November 11, 1998.
Another helpful query to learn is which order in the database is the most recent order. One way to find this is to select all the orders and include an ORDER BY OrderDate DESC clause.
SELECT 'Orders'.* FROM 'Orders' ORDER BY orderdate DESC
The first record is the most recent order.
Another way to find the most recent order is to ask the database for the order in which the date is equal to the maximum date in the table. Asking for the maximum date in the table is easy. You do it like this:
SELECT MAX(orderdate) FROM Orders
MAX is an aggregate function, meaning that it operates on multiple records but returns a single value. You can embed this SQL statement as a subquery in another SELECT statement to find the most recent order (see Figure 3.22).
Figure 3.22: The most recent order.
You can also perform a join with the Customers table to find the customer who placed the most recent order.
When an application issues an SQL SELECT statement to a database, the database interprets it and returns a set of records. As you know, this is what happens in Visual C++ when you double-click a table in the Data View. Visual C++ sends a SELECT * statement to the database and the database returns the records. Visual C++ then displays those records in a query window.
The set of records that a relational database returns as the result of a query is called the resultset.
SQL doesn't care whether that resultset contains one record or one million records. SQL cares only about giving you a set of records that fit the criteria you specified. Most relational databases provide a simple model for the application to retrieve data after a submitting query. Records are returned to the application one at a time in the order specified by the query, until the end of the set is reached. SQL has no provision for moving back to a previous record.
Applications often require the capability to move back and forth through a resultset, one record at a time, and possibly edit or delete a particular record in the set. SQL was not originally designed to provide that capability. This is where cursors come in.
A cursor is a mechanism that enables the individual records of a resultset to be processed one record at a time. The mechanism is called a cursor because it indicates the current position in a resultset. This is similar to the cursor on a computer screen, which indicates the current position of the input pointer in, for example, a document.
A cursor that provides the ability to move only forward within a resultset is called a forward-only cursor.
A cursor that provides the ability to move forward and backward within a resultset is called a scrollable cursor.
A cursor that enables a user to change or delete data in addition to scrolling is referred to as a scrollable, updatable cursor.
You can see that Visual C++ uses scrollable, updatable cursors when you open a query window. You can move back and forth through the records and edit the information in the records.
Today you learned that SQL is a language that excels at manipulating data. You learned how to issue SELECT queries to retrieve records from a single table in a relational database. You learned how to perform joins to retrieve records from multiple tables. You learned how to use subqueries in SQL to obtain information that requires a large quantity of procedural code to retrieve. Last, you learned that cursors are a mechanism that enables record-at-a-time navigation through a resultset.
Is it possible to write complete applications, using SQL? | |
It's not possible to write applications by using ANSI SQL as the only programming language. ANSI SQL has no control of flow constructs, such as conditional loops and if statements. No user interface mechanisms or input/output routines (other than SELECT, which provides output) are available. However, some relational database servers, such as Microsoft SQL Server, do provide some control of flow extensions in their SQL implementations. | |
Is SQL always interpreted? Wouldn't SQL run faster if there were a way to compile the SQL statements? | |
SQL is often interpreted, but there are ways to compile SQL statements to make them run faster. Relational database servers have the capability to store SQL statements in the database with the data. The database server interprets, optimizes, compiles, and saves the SQL statements in their compiled form for speedy execution. These prepared SQL statements are called stored procedures. |
The Workshop quiz questions test your understanding of today's material. The answers appear in Appendix F, "Answers." The exercises encourage you to apply the information you learned today to real-life situations.
SELECT customers.* WHERE customers.custnumber = 3
SELECT customers.* FROM customer, orders WHERE customers.custnumber = 3
© Copyright, Sams Publishing. All rights reserved.