Up to this point in the book, you've been introduced to only the most basic SQL queries: SELECT queries. This chapter describes the five additional types of queries that you can execute with SQL's data manipulation language: crosstab, append, update, delete, and make-table queries. The crosstab query is a special form of SELECT query that summarizes data in spreadsheet style, most often in time-series format. You are likely to find that 75 percent or more of the decision-support applications you create for your firm or clients include crosstab queries. In many cases, the crosstab query result set serves as the datasource for graphs and charts.
The remaining four types of queries described in this chapter are called action queries. Microsoft Access introduced the action query category for manipulative (Data Manipulation Language or DML) queries, which update the data in existing tables and create new tables. Action queries can substitute for lengthy blocks of Visual C++ code when you need to modify the content of tables in your database or when you want to create a new database table.
Even before the early days of the personal computer and the VisiCalc spreadsheet application, managers were accustomed to viewing summary data in the row-column format of accountants' worksheets. Worksheets that are used for comparative performance analysis fall into two basic categories:
Today's spreadsheet applications, such as Lotus 1-2-3 and Microsoft Excel, replace the drudgery of preparing handwritten worksheets with automated computer-based procedures. However, the majority of the detail information that is needed to prepare summary data for management is available only in the fields of tables of relational accounting databases. Chapter 4, "Optimizing the Design of Relational Databases," demonstrated that most spreadsheet data formats violate the "no repeating groups" rule of relational tables. Conventional ANSI SQL statements return data in relational format, not spreadsheet format. Thus, a substantial amount of data manipulation ordinarily is required to create a time-series or classification spreadsheet from relational data. In fact, the "denormalization" process is almost as complex as that described in Chapter 4 for normalizing spreadsheet data to relational form.
The following sections describe how summary data is returned by ANSI SQL and how you use Access SQL's TRANSFORM and PIVOT statements to automate the denormalization of relational data to spreadsheet format.
Figure 8.1 illustrates the format of data returned by conventional SELECT queries that use the SQL aggregate function SUM() to prepare data from which you can create a time-series worksheet. The tables employed to create the query return set, shown in the Crosstab query window of Figure 8.1, are Categories, Products, Orders, and Order Details. These tables' values are derived from the Northwind Traders sample database that accompanies Access. The query return set totals orders that were received for each of eight categories of products for the four calendar quarters of 1996. Thus, the grid contains eight records with a Category Name field and four quarter fields.
Figure 8.1. Using two GROUP BY criteria to sum the values of orders by category and date.
NOTE
Figure 8.1 is a composite. Access 95 won't show both a query SQL statement and the result of the query at the same time. You can toggle between the two views using Access 95's View menu when a query is open.
NOTE
The Access 95 CROSSTAB 95.MDB database (located in the CHAPTR08 subdirectory on the CD that comes with this book) is used for all the examples in this chapter. CROSSTAB 95.MDB contains records for the Categories, Products, and Employees tables that are the same as those in NorthWind.MDB (from Access), except that unnecessary fields have been deleted. The order dates have been updated. Records for years other than 1995 and 1996 have been deleted from the Orders and Order Details tables.
Access 95 was used to perform all the sample queries in this chapter.
The query that was used to create the data shown in Figure 8.1 is rather lengthy and deserves further explanation. The following list describes each element of the Access SQL statement that appears in Figure 8.1:
SELECT Categories.[Category ID] AS Cat, Format([Order Date],"yyyy\Qq") AS Period, Format(SUM([Order Details].[Unit Price]*[Order Details].[Quantity]), "$#,##0.00") AS Sales
It's clear from the rows returned by the preceding query and illustrated in Figure 8.1 that a substantial rearrangement of worksheet cells is required if you import this data into a worksheet that is intended to display data in time-series format.
You can achieve a result that is similar to the preceding example with ANSI/ODBC SQL using Microsoft Query. The following ANSI/ODBC SQL statement returns sales for each of the eight categories by month in 96 rows and also illustrates the use of the scalar ODBC date function MONTH() in SQL statements that are processed by ODBC drivers:
SELECT Categories."Category ID" AS Cat, {fn MONTH(Orders."Order Date")} AS Month, Sum("Order Details"."Unit Price" * "Order Details".Quantity) AS Sales FROM Categories, "Order Details", Orders, Products WHERE Products."Category ID" = Categories."Category ID" AND "Order Details"."Order ID" = Orders."Order ID" AND "Order Details"."Product ID" = Products."Product ID" GROUP BY Categories."Category ID", {fn MONTH(Orders."Order Date")}
Figure 8.2 shows the SQL window of MS Query displaying the preceding SQL statement and 13 of the last records of the query result table created by executing the statement.
Figure 8.2. An ANSI/ODBC SQL statement that summarizes category sales by month.
You can use MS Query's conventional SQL processing method, in which you write the SQL statement in MS Query's SQL window or choose File | Execute SQL from MS Query to send an SQL statement directly to the ODBC driver using SQL pass-through. Figure 8.3 shows MS Query's Execute SQL dialog box, ready to execute the SQL statement in Figure 8.2.
Figure 8.3. MS Query's Execute SQL window, which is used to implement SQL pass-through.
The rows that return when you specify monthly data require even more rearrangement than the quarterly data returned from the Access query.
NOTE
MS Query can't represent the SQL statement graphically and therefore will give you an error message. The query will be successful regardless.
Access SQL's crosstab query feature solves the data cell rearrangement problem. Two Access SQL keywords, TRANSFORM and PIVOT, handle the denormalization of the data. The elements of an Access SQL crosstab query are as follows:
Figure 8.4 shows the initial syntax of the Access SQL statement that creates the crosstab query return set shown in Figure 8.5. The statement in the SQL window of Figure 8.4 consists of the SQL statement illustrated in Figure 8.1, plus the necessary modifications to make the statement conform to the rules of crosstab queries given in the preceding list.
Figure 8.4. Executing a crosstab query with Microsoft Access.
NOTE
Figure 8.4 is a composite. Access 95 won't show both a query SQL statement and the result of the query at the same time. You can toggle between the two views using Access 95's View menu when a query is open.
Figure 8.5. The crosstab query result set from the SQL statement in Figure 8.4.
The CROSSTAB 95.MDB database includes a QueryDef object, qryCategoryByMonth (Crosstab), that creates the query result set shown in Figure 8.5. Select the query tab in the Database window in Access. When a query is open, the query toolbar is displayed. The first three buttons let you change from the Design, SQL, and Datasheet view modes. You can then display the SQL statement. After that, click the Datasheet view toolbar button to execute the query. When you execute the crosstab query shown in Figure 8.4, the Access database engine's query optimizer modifies the FROM and INNER JOIN statements as follows:
TRANSFORM Format(Sum([Order Details].[Unit Price] * [Order Details].[Quantity]), "$#,##0") AS Sales SELECT Categories.[Category Name] FROM (Categories INNER JOIN Products ON Categories.[Category ID] = Products.[Category ID]) INNER JOIN (Orders INNER JOIN [Order Details] ON Orders.[Order ID] = [Order Details].[Order ID]) ON Products.[Product ID] = [Order Details].[Product ID] WHERE ((Orders.[Order Date] Between #01/1/91# And #12/31/91#)) GROUP BY Categories.[Category Name] ORDER BY Categories.[Category Name] PIVOT Format([Order Date],"yyyy\Qq");
The preceding syntax example, which substitutes nested INNER JOIN syntax for individual INNER JOIN statements, more closely follows the SQL-92 syntax for JOIN statements. Each of the tables specified in the original FROM clause appears in the INNER JOIN statements, so it's not necessary to provide a separate table_names list. The first INNER JOIN of the FROM statement uses conventional INNER JOIN syntax. The succeeding nested INNER JOIN causes a join to be created between the result of the first INNER JOIN and the criterion of the last ON statement. The second and third INNER JOINs use the second ON criterion. Nested INNER JOIN statements are quite difficult to interpret correctly and are even more difficult to write.
A simpler approach to specifying INNER JOINs uses a compound WHERE clause. The following WHERE clause syntax achieves the same result as the INNER JOIN syntax of the preceding example:
TRANSFORM Format(Sum([Order Details].[Unit Price] * [Order Details].[Quantity]), "$#,##0") AS Sales SELECT Categories.[Category Name] FROM Categories, [Order Details], Orders, Products WHERE Products.[Category ID] = Categories.[Category ID] AND [Order Details].[Order ID] = Orders.[Order ID] AND [Order Details].[Product ID] = Products.[Product ID] AND Orders.[Order Date] Between #01/1/91# And #12/31/91# GROUP BY Categories.[Category Name] ORDER BY Categories.[Category Name] PIVOT Format([Order Date],"yyyy\Qq");
TIP
If you use Access to create and test the text for SQL statements to generate crosstab queries, you'll find that Visual C++ executes Access 1.x crosstab query syntax without generating an error. However, your code will be simpler and more readable if you use the WHERE clause to create equi-joins.
If you change the PIVOT statement in the preceding example from PIVOT Format([Order Date],"yyyy\Qq") to PIVOT Format([Order Date],"mmm yy") to create a monthly rather than a quarterly crosstab query, the column titles appear in the strange sequence shown in Figure 8.6. The PIVOT statement automatically orders the columns alphabetically.
Figure 8.6. Monthly column headings sorted alphabetically.
You could solve the column sequence problem by changing the PIVOT statement to PIVOT Format([Order Date],"mm/yy") to provide column titles 01/95 through 12/96, but many people prefer three-letter abbreviations for months. Access SQL lets you use the IN() predicate to create fixed column names that appear in any sequence you specify. The following example demonstrates a PIVOT...IN statement that creates monthly column titles for any year:
PIVOT Format([Order Date],"mmm") IN ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");
CAUTION
You can't use the IN() predicate with SQL statements that use the full INNER JOIN syntax of Access, as shown in Figure 8.4. If you attempt to execute such a query, you receive the error message Can't repeat field_name in FROM statement.
The preceding PIVOT statement gives the more satisfactory result shown in Figure 8.7.
Figure 8.7. The query from Figure 8.6 with fixed column heads.
The only restriction on the values of the IN() predicate's arguments is that, except for case, each value must exactly match the values returned by the formatted PIVOT statement. If you misspell an argument value, you will find that the data rows of the column with the mistaken header are empty.
NOTE
The Access 1.x documentation states that crosstab queries with fixed column heads execute faster than queries without the IN() predicate. In practice, the difference in execution speed is difficult to measure. Using the IN() predicate doesn't result in a demonstrable improvement in the performance of the preceding query when executed against the CROSSTAB 95.MDB or NorthWind.MDB database.
With a few simple changes and additions to the preceding query, you can create a crosstab query in classification format to compare sales of product categories by employee during a specific period of time. The following SQL statement adds the Employees table to the query and displays the last name of each employee of Northwind Traders as column headings, with sales by category for 1991 as the data cell values:
TRANSFORM Format(Sum([Order Details].[Unit Price] * [Order Details].[Quantity]), "$#,##0") AS Sales SELECT Categories.[Category Name] FROM Categories, [Order Details], Orders, Products, Employees WHERE Products.[Category ID] = Categories.[Category ID] AND [Order Details].[Order ID] = Orders.[Order ID] AND [Order Details].[Product ID] = Products.[Product ID] AND Orders.[Employee ID] = Employees.[Employee ID] AND Orders.[Order Date] Between #01/1/91# And #12/31/91# GROUP BY Categories.[Category Name] ORDER BY Categories.[Category Name] PIVOT Employees.[Last Name];
Figure 8.8 shows the crosstab query result set returned by the preceding SQL statement that is executed in Microsoft Access.
Figure 8.8. A classification crosstab query executed in Access.
Action queries provide a method of creating or modifying persistent database objects without the necessity of writing low-level Visual C++ code to manipulate table record pointers. Action queries are especially effective when you want to add, delete, or modify many records in a database with a single operation. (An operation that makes changes to multiple sets of records is called a bulk update.) The following list briefly describes the four types of action queries:
The following sections give examples of the use of each of the four types of action queries that Access SQL supports. All of the following examples use the CROSSTAB 95.MDB database opened in Microsoft Access.
To avoid modifying the existing tables in the CROSSTAB 95.MDB database, the first action query example creates a new table from the Orders table, tblOrders, that includes all orders that have order dates earlier than July 1, 1991. To test make-table action query syntax, enter the following in Access's Query SQL Statement window:
SELECT *, [Order Date] INTO tblOrders FROM Orders WHERE [Order Date]<#07/1/91#
You need to include the [Order Date] entry in the SELECT list because Access SQL doesn't let you specify a WHERE criterion on a field that isn't explicitly declared in the SELECT list.
NOTE
The NewOrders table doesn't appear in the Tables tab of Microsoft Access until you execute the query after saving it. After the query is executed (Access will warn you that a new table will be created), you can view the new table.
Use a make-table query to create a backup of a table before you test other SQL action statements on the table. If your append, update, or delete query gives an unexpected result, you can start fresh by deleting all the records in the original table and appending all the records from the backup table.
You can add an IN clause to your make-table query to create a table in another database. Here's the full syntax of an SQL make-table clause:
SELECT [ALL|DISTINCT|DISTINCTROW] select_list INTO dest_table [IN database_name[ connect_string]] FROM source_table WHERE criteria
Thus, you can create a tblOrders table in the C:\ACCESS\SAMPAPPS\NorthWind.MDB database (if you have the Northwind Traders database installed) with the following statement:
SELECT *, [Order Date] INTO tblOrders IN "c:\access\nwind.mdb" FROM Orders WHERE [Order Date]<#07/1/91#
You can append the balance of the records for the year 1991 to the tblOrders table with the following append query statement:
INSERT INTO tblOrders ([Order ID], [Customer ID], [Employee ID], [Order Date], [Shipped Date], [Order Amount]) SELECT [Order ID], [Customer ID], [Employee ID], [Order Date], [Shipped Date], [Order Amount] FROM Orders WHERE [Order Date] Between #07/1/91# And #12/31/91#
You need to explicitly declare the field list of both the destination table (tblOrders) and the source table (Orders) if you use a WHERE criterion. The field list of the destination table must be enclosed in parentheses. If you attempt to use the asterisk (*) to add all fields and then add the [Order Date] field to the SELECT statement, as in the make-table example, you receive a syntax error message.
To change values in data fields, you use the UPDATE predicate. The SET clause specifies the expression used to update one or more fields. The following update action query SQL statement reduces the dollar amount of each order received in April 1991 to 90 percent of its current value:
UPDATE tblOrders SET [Order Amount] = 0.9*[Order Amount] WHERE [Order Date] BETWEEN #4/1/91# AND #4/30/91#;
If you want to update the values of more than one field, add the field name, the equals sign (=), and an expression separated from the preceding SET expression by a comma (,).
The simplest and most potentially destructive of the action queries is the delete query. If you execute a delete query and forget to add a WHERE criterion, all the records in your table can disappear in an instant. To delete the records for the last half of 1991 from the tblOrders table, use the following statement:
DELETE FROM tblOrders WHERE [Order Date] BETWEEN #7/1/91# AND #12/31/91#;
You now can safely verify that the DELETE FROM tblOrders statement without the WHERE clause does indeed delete all the records in the table.
This chapter demonstrated how to use the Access SQL TRANSFORM and PIVOT keywords to create crosstab queries and showed the syntax necessary to implement action queries with SQL statements. Crosstab queries are one of the most commonly used forms of SELECT queries in Visual C++ decision-support applications. The next chapter makes more extensive use of crosstab queries. Using action queries to create and update tables often can save you from writing a substantial amount of Visual C++ code.
This chapter ends Part II of this book. At this point, you have the background you need in database design and the use of both ANSI and Access SQL to develop meaningful database applications. Part III shows you how to design applications to display and print the information that you extract from databases with SQL SELECT queries.