Teach Yourself Database Programming
with Visual C++ 6 in 21 days


Day 8
      Utilizing the Capabilities of Database Servers



Relational database servers include tools that provide incredible power for processing and presenting information. These tools can be highly valuable and productive for you in writing database applications. However, the value-or even the existence-of these tools is often not readily apparent from database documentation. Today you will discover these relational database server power tools.

Today you will

NOTE
Today's work deals specifically with relational database servers, such as SQL Server and Oracle. Some of the functionality described today is not available in the Access/Jet engine. The tools that are not available in Access/Jet are noted for you. If you have use of a relational database server, you will be able to try all the database tools described today. If not, you will have to learn some of these tools without being able to try them yourself.

Database Transactions

As you learned yesterday, when you normalize a database, you create a large number of specialized tables. One side effect of normalization is that operations on the database typically involve several tables. For instance, in the sample application, when a customer places an order, you must add records to two tables in the database: the NewOrders table and the ProductsPurchased table.

When you add records to two different tables, you need to be assured that the two tables remain in sync before, during, and after that operation. You don't want a database failure of some kind to cause a record to be added to one table without the corresponding record also being added to the other table. If that happened, the usefulness of your database would be reduced because some of its information would not be reliable.

A transaction in a relational database is a series of operations that must happen together. A transaction is a single unit of work that must be done completely or not at all.

In database parlance, a transaction must have atomicity, consistency, isolation, and durability. These are called the ACID properties of database transactions:

Relational databases use a variety of mechanisms to ensure the ACID properties of transactions. These mechanisms include locking (which you learned a little about in Day 5, "Adding, Modifying, and Deleting Data"), referential integrity constraints (which you learned about in Day 7, "Database Design"), SQL transaction statements (which you will learn next), and triggers (which you will learn later today).

SQL Transaction Statements

The syntax for SQL transaction statements varies somewhat among relational databases. The basic idea is that you initiate a transaction, execute a series of SQL statements, and then either commit or rollback the transaction.

In Microsoft SQL Server, the code to execute SQL statements inside a transaction could look something like Listing 8.1.

NOTE
The code in Listings 8.1 and 8.2 isn't compatible with the Access/Jet engine. You cannot run this code from a Query window in Visual Studio that is con-nected to an Access/Jet database.


Listing 8.1.  Transaction Code in Microsoft SQL Server

 1:  BEGIN TRANSACTION
 2:    INSERT INTO neworders(ordernumber, customernumber)
 3:      VALUES (5, 3)
 4:    INSERT INTO productspurchased(ordernumber, partnumber)
 5:      VALUES (5, 'CLAP-003')
 6:    INSERT INTO productspurchased(ordernumber, partnumber)
 7:      VALUES (5, 'MAG-292')
 8:  COMMIT TRANSACTION

Line 1 in Listing 8.1 initiates the transaction. Lines 2-7 perform a series of INSERT statements to record an order in the database. If some kind of a database failure occurred while, say, line 6 was being executed, the records inserted in lines 2-5 would be automatically removed (rolled back) by the database. Therefore, the process of adding these three records is an all-or-nothing proposition.

All three inserts in Listing 8.1 will occur in the database, or none of them will. If the execution goes as expected, line 8 commits the transaction. When line 8 executes, you can be assured that all three records have been written to the database and will be persistent despite any database failures.

In Oracle, the code to do the same transaction would be somewhat different, as in Listing 8.2.


Listing 8.2.  Transaction Code in Oracle

 1:  INSERT INTO neworders(ordernumber, customernumber)
 2:    VALUES (5, 3)
 3:  INSERT INTO productspurchased(ordernumber, partnumber)
 4:    VALUES (5, 'CLAP-003')
 5:  INSERT INTO productspurchased(ordernumber, partnumber)
 6:    VALUES (5, 'MAG-292')
 7:  COMMIT

In Oracle, the transaction begins implicitly with the first INSERT statement. Therefore, line 1 in Listing 8.2 begins the transaction. If the commit in line 7 isn't performed, all the changes in lines 1-6 are rolled back.

Like Oracle, SQL Server can be made to begin transactions by implicitly using the SET IMPLICIT_TRANSACTIONS ON statement. When this option is turned on, and if there are no outstanding transactions already, every SQL statement automatically starts a transaction.

You can see that the syntax can be different between SQL Server and Oracle. However, the principles of database transactions are the same in both databases.

Code like that shown in Listings 8.1 and 8.2 ensures the transactions' atomicity and consistency. The transactions' isolation is controlled through locking and the isolation level of the transaction.

A key function of a relational database server is its capability to ensure that multiple users can read consistent sets of records and make modifications without inadvertently overwriting each other's changes. The isolation level tells the database how zealous to be in protecting a user's work from interaction with the work of other concurrent users.

Oracle and SQL Server perform the task of isolating concurrent users by using very different locking and isolation strategies. Consult your database server documentation for information on transaction isolation levels.

The basic thing to understand is that higher degrees of transaction isolation typically result in more locks being placed on the data for longer periods of time. This can cause the database performance to bog down.

To enable the database to provide transaction isolation without too much burdensome locking, it's necessary to keep transactions as brief as possible. If possible, a transaction should not span multiple round trips to the server. Transactions also should not remain open during the wait for user input.

Setting the appropriate level of isolation for the transactions and keeping transactions as brief as possible enable the database to provide isolation between multiple transactions that execute concurrently.

A distributed transaction is a transaction that involves making changes to the data in more than one database server.

Distributed transactions enable transactions to cover changes to data in two or more networked database servers. SQL Server and Oracle both support distributed transactions. The implementation of distributed transactions varies between database vendors, so consult your database documentation for details on distributed transactions.

TIP
You need to use transactions only for those operations that modify (INSERT, UPDATE, or DELETE) data in the database. You should not use transactions when performing SELECT queries.

Transactions are a vital tool for effective database applications. Transactions ensure that complex operations on the database are performed reliably. As you can see from Listings 8.1 and 8.2, transactions don't require you to write much extra code. The database server automatically handles the commit or rollback of transactions.

Triggers to Execute SQL Code Automatically

Triggers are SQL statements that are executed, or triggered, when certain operations are performed in the database. Triggers provide the capability to easily perform the following:

NOTE
The code in Listing 8.3 isn't compatible with the Access/Jet engine. You can't run the code from a Query window in Visual Studio that is connected to an Access/Jet database.


Listing 8.3.  Code to Create a Trigger in Microsoft SQL Server

 1:  CREATE TRIGGER SaveDelOrders
 2:  ON neworders
 3:  FOR DELETE
 4:  AS
 5:  INSERT DeletedOrders
 6:    SELECT * FROM deleted

Line 1 in Listing 8.3 creates a trigger named SaveDelOrders. Line 2 indicates that this trigger is attached to the NewOrders table. Line 3 specifies that this trigger should fire whenever record(s) are deleted from the NewOrders table. Lines 5 and 6 insert the deleted record(s) into the DeletedOrders table.

You will notice the FROM deleted statement in line 6. SQL Server triggers can use two tables called inserted and deleted, which are temporary holding tables for records being inserted, deleted, or updated (an update causes both a record to be deleted and a new record to be inserted).

The inserted and deleted tables can be accessed only by code inside triggers. Oracle has temporary holding tables similar to inserted and deleted; these are called new and old.

When a record is deleted from the NewOrders table, it is removed from the NewOrders table and placed in the deleted table. A trigger attached to the NewOrders table, such as the one in Listing 8.3, can access that record while it's in the deleted table.

The trigger in Listing 8.3 reads the record from deleted and inserts it into another table. This trigger provides a backup copy of all records deleted from the NewOrders table.

With SQL Server, you can also create triggers that fire when records are inserted into a table and when records are updated in a table. Line 3 of Listing 8.3 indicates that it's a delete trigger. For an insert trigger, line 3 would say

FOR INSERT

For an update trigger, line 3 would say

FOR UPDATE

For a trigger that fires on insert, update, and delete operations, line 3 would say

FOR INSERT, UPDATE, DELETE

Oracle uses a slightly different syntax for triggers. Oracle triggers also offer some capabilities not found in SQL Server triggers. Consult the documentation of the relational database server you are using for details on its implementation of triggers.

SQL Aggregate Functions

You've learned that relational databases have the capability to process thousands, or even millions, of records at the server and then send only the relevant resultset to the client application. You saw some examples of server-side processing in Day 6, "Harnessing the Power of Relational Database Servers."

Today you will learn more about the server-based processing power of relational databases. You will explore server code that can summarize mountains of raw data and return small polished gems of information to client applications. This power comes in the form of SQL aggregate functions.

Aggregate functions process multiple records and return a single value. They are also called set functions because they operate on sets of records. They calculate summary values from a particular field in a set of records. For each set of records they process, they return a single value.

At times, the information you want to retrieve from a database table isn't stored in an individual record but rather in a set of records. For instance, you might want to count the number of records that meet a certain criteria or to know the maximum or minimum value of the data in a particular field in a set of records. You might need to calculate the average or sum of a field in a set. Cases like these are where aggregate functions come in very handy.

The following define five aggregate functions and give an example of each:

SELECT COUNT(*) FROM productspurchased
WHERE ordernumber = 4
SELECT MAX(quantity) FROM productspurchased
WHERE ordernumber = 4
SELECT MIN(quantity) FROM productspurchased
WHERE ordernumber = 4
SELECT SUM(price) FROM productspurchased
WHERE ordernumber = 4
SELECT AVG(price) FROM productspurchased
WHERE ordernumber = 4

COUNT

The COUNT function returns the number of records that match a criterion. You can substitute COUNT(*) for the field list in a SELECT statement to see how many records it will return. Open a Query window and enter the query shown in Listing 8.4. As you can see, the COUNT function is not restricted to working with records from a single table.


Listing 8.4.  Using COUNT(*) to Count the Rows Returned by a SELECT Statement

 1:  SELECT COUNT(*)
 2:  FROM customers, neworders, productspurchased, products
 3:  WHERE custlastname = 'clinton' AND
 4:  customers.custnumber = neworders.customernumber AND
 5:  neworders.ordernumber = productspurchased.ordernumber AND
 6:  productspurchased.partnumber = products.partnumber

Listing 8.4 performs a join of four tables. Selecting COUNT(*) returns the number of rows that are selected.

If you use COUNT(fieldname) instead of COUNT(*), the function will not count records that have a NULL value in the specified field. You could use this to count how many records have non-NULL entries in a field. You can also use COUNT(DISTINCT fieldname) to determine how many distinct values exist in a field.

MAX, MIN, SUM, and AVG

These functions take a single field name or a numeric expression as a parameter. For example, Listing 8.5 will return the sum of the Price and ShippingAndHandling fields for all records that have an OrderNumber of 4.


Listing 8.5.  Using the SUM Function to Calculate the Total Price of an Order

 1:  SELECT SUM(Price + shippingandhandling) FROM ProductsPurchased
 2:  WHERE ordernumber = 4

The MIN, MAX, and AVG functions work similarly. You will notice that you didn't need to write a loop that reads every record into a client application and then adds the appropriate fields to a variable. The programming model used by the aggregate functions makes the code for performing calculations on sets of records very straightforward.

The aggregate functions can be helpful to you not only because they reduce your code complexity but also because they execute at the server. The database server can process all the records in the database and then return only a single value to the client application that requested the information.

This means that a LAN would not be burdened by huge amounts of network traffic between client and server. It also means that the client application could be a browser accessing the database over the Internet. If you were not using a relational database server, which is capable of processing data at the server, there is no way that a thin client (such as a browser) could effectively access your database over a low-bandwidth, high-latency connection such as the Internet.

Aggregate Functions and the GROUP BY Clause

Aggregate functions enable you to obtain sums, averages, and so on, from the database. However, at times you need to obtain sets of sums or sets of averages. For instance, you might need to calculate the total sales revenue for each month during the past year or to discover the average number of products purchased each week for the past six months. These are the cases where you need to use the aggregate functions with the GROUP BY clause.

To try the GROUP BY clause, you first need to select a set of records. Try the query shown in Listing 8.6.


Listing 8.6.  Using the SELECT Statement to Retrieve All Products Purchased, the Order Number, and the Date

 1:  SELECT neworders.ordernumber, neworders.orderdate,
 2:    productspurchased.*
 3:  FROM neworders, productspurchased
 4:  WHERE neworders.ordernumber =
 5:    productspurchased.ordernumber

The query shown in Listing 8.6 will return all the products that have been sold, the order number each was sold under, and the date of the order. The results are shown in Figure 8.1.

Figure 8.1 : All products purchased, the order number, and the date.

It would be interesting to see the total price of each order. This means that you need to add up the Price and the ShippingAndHandling fields for every record on each order. You saw in Listing 8.5 how to obtain the total of a single order. How do you find the total of every order? Listing 8.7 shows you how.


Listing 8.7.  Using the SUM Function with the GROUP BY Clause to Calculate the Totals of Each Order

 1:  SELECT neworders.ordernumber,
 2:    SUM(price + shippingandhandling)
 3:  FROM neworders, productspurchased
 4:  WHERE neworders.ordernumber = productspurchased.ordernumber
 5:  GROUP BY neworders.ordernumber

Line 1 of Listing 8.7 selects the order number (so you have the context for the totals). Line 2 selects the sum of the Price and ShippingAndHandling fields. Lines 3 and 4 specify the tables and the join. (Actually, a join with the NewOrders table isn't necessary in this query because all the fields you need are in the ProductsPurchased table. Having the NewOrders table in this query doesn't hurt anything, so just leave it in because the next query will require the OrderDate field from NewOrders.)

Line 5 specifies that you want the sum of the Price and ShippingAndHandling fields grouped by OrderNumber. Figure 8.2 shows the output of this query.

Figure 8.2 : The total price of each order.

If you look at Figure 8.1 and add up the numbers yourself, you will see that the totals in Figure 8.2 are correct. With the small amount of data in this example, you could probably perform these calculations inside the client application (or even in your head). However, imagine the amount of work required to process thousands of orders or tens of thousands of product sales. If you retrieved all those records and did the calculations at the client, the application would bog down. The GROUP BY clause lets you perform all the work at the server and then return only the small resultset to the client. Is that cool, or what?

It would be interesting to see the total sales volume per week. To do this, you must group by the order date. More specifically, you must group by the week of the order date. Even more specifically, you must group by the year and the week, in case you have orders that span more than one year. The GROUP BY clause lets you group by multiple fields and multiple expressions. Listing 8.8 shows how to retrieve the total sales volume per week.


Listing 8.8.  Using the SUM Function with the GROUP BY Clause to Show the Total Sales Volume per Week

 1:  SELECT DATEPART ('yyyy', neworders.orderdate) AS Year,
 2:    DATEPART ('ww', neworders.orderdate) AS Week,
 3:    SUM(price + shippingandhandling) AS Total
 4:  FROM neworders, productspurchased
 5:  WHERE neworders.ordernumber = productspurchased.ordernumber
 6:  GROUP BY DATEPART ('yyyy', neworders.orderdate),
 7:    DATEPART ('ww', neworders.orderdate)

Line 1 of Listing 8.8 uses the DATEPART function to retrieve the year. The AS Year modifier on the end makes it so that the first field in the resultset has a name of Year. Specifying a name for a field that contains an expression is handy because it gives you an easy way to access the field in ADO with the GetCollect function. (You learned about the GetCollect function on Day 4, "Retrieving SQL Data Through a C++ API," in Listing 4.12.)

Line 2 uses the DATEPART function to retrieve the week of the year for the OrderDate. It names this field Week. Line 3 selects (and calculates) the sum of the Price and ShippingAndHandling fields in the group. This field is named Total. Lines 4 and 5 are identical to Listing 8.6. Lines 6 and 7 group the results of the SUM function by both the year and the week of the order date.

In Figure 8.3, you see the results of the query in Listing 8.8. There are two records, one for the 45th week and one for the 47th week of the year. No other weeks of the year had any sales, so they don't appear in the resultset.

Figure 8.3 : The total sales volume per week.

The code in Listing 8.8 works with Access/Jet. However, the code to break down the date (the DATEPART function) and modifiers to specify the field names for expressions will vary slightly in SQL Server and Oracle. Check your database server documentation for details.

NOTE
In a client/server application, the Access/Jet database engine will process the GROUP BY query shown in Listing 8.8 at the client machine (rather than at the server machine). Running a similar query on a relational database server, such as Oracle or SQL Server, will cause all the records to be processed at the server machine and only the small resultset to be sent to the client. This is one of the fundamental differences between relational database servers and databases, such as Access/Jet, that use ISAM files.

SQL Server provides the CUBE and ROLLUP operators as optional switches in the GROUP BY clause. These operators produce super-aggregate rows, where the rows generated by the GROUP BY clause are aggregated. The CUBE and ROLLUP operators are typically used in data warehouse applications. Oracle also provides extensions to the GROUP BY clause that are useful in data warehouse applications. See your database server documentation for more information on extensions to the GROUP BY clause.

As you can see, aggregate functions enable relational database servers to process and summarize vast quantities of information and make it available to thin clients across thin network connections. The ability to place intelligence at every tier in an application, and to use each tier where its strengths are, enables you to build highly advanced client/server and multitier systems.

SQL Views

A view is a virtual table whose contents are defined by a query (a SELECT statement).

A view looks like a normal table. However, a view doesn't actually exist as a table in the database. Instead, the view is materialized whenever it is opened. A view can consist of fields and records from several tables.

A view's base query is the SELECT statement that defines the view.

A view's base tables are the tables from which the view gets its data.

The SELECT statement that defines the view (its base query) is stored in the database. The view doesn't store any data. In other words, no records are stored in the view. The records that appear in a view are actually stored in the view's base tables. Every time a view is opened, the database reads the records into the view from its base tables by executing the view's base query.

Views enable users to work with the particular data that interests them. For instance, in the sample application, the company manager might want to be able to see the weekly sales totals on demand. Listing 8.9 shows the code to create a view that the manager could use to find that information.


Listing 8.9.  Code to Create a View That Shows the Total Sales Volume per Week

 1:  CREATE VIEW WeeklySales AS
 2:  SELECT DATEPART ('yyyy', neworders.orderdate) AS Year,
 3:    DATEPART ('ww', neworders.orderdate) AS Week, 
 4:    SUM(price + shippingandhandling) AS Total
 5:  FROM neworders, productspurchased
 6:  WHERE neworders.ordernumber = productspurchased.ordernumber
 7:  GROUP BY DATEPART ('yyyy', neworders.orderdate),
 8:    DATEPART ('ww', neworders.orderdate)

Line 1 in Listing 8.9 issues the SQL CREATE VIEW statement to create a view called WeeklySales. The SELECT statement in lines 2-8 is the same as the one in Listing 8.8. This SELECT statement defines the WeeklySales view; it is the WeeklySales view's base query.

Execute the code in Listing 8.9 against the sample Access database. When you do, the view will be created in the database. It will appear as a view in the Data View of Visual Studio (you will need to right-click the data source and select the Refresh menu to see it the first time).

When you double-click the WeeklySales view in the Data View, it will open just like a table. In fact, you can even use WeeklySales like a table name in a query. Click the SQL button on the Query toolbar, and you will see that the SELECT statement queried the view as if it were a table. As you can see, a view looks and acts like a table.

Change the SELECT statement in the Query window so that it looks like the one shown in Listing 8.10. This will cause only the record that applies to week 45 to appear in the Query window.


Listing 8.10.  The SELECT Statement That Uses a View

 1:  SELECT *
 2:  FROM WeeklySales
 3:  WHERE week = 45

When making a query against a view like this, two queries are actually executed. First, the base query for the view is executed to populate the view with records. Second, the query you are making against the view (as in Listing 8.10) is executed.

CAUTION
When you run a SELECT statement against a view, it might execute more slowly than you expect. Before anything else happens, the database has to populate the view with records by executing the view's base query. Then, after the view is completely populated, the database can execute the SELECT statement you are running against it. This process of populating the view before running your SELECT statement, combined with your inability to build indexes on views, can make SELECT statements that run against views take longer than you might expect.

Views can be updateable or read-only. If the view is updateable, users can insert, update, and delete records in it just as they can from a table. Bear in mind that when users insert, update, and delete records in a view, they are actually inserting, updating, and deleting records in the view's base table(s).

For a view to be updateable, the particular field(s) and record(s) in the view must correspond directly to field(s) and record(s) in its base table(s). The database must be able to discern which field/record to update in the base table. In the view created in Listing 8.10, no direct correspondence exists between the records in the view and the records in the base tables. The records in the view in Listing 8.10 are aggregates of several records in the base tables. There is no way that the database can figure out how to map records in the view to records in the base tables, so the WeeklySales view is read-only.

NOTE
In Day 6, "Harnessing the Power of Relational Database Servers," you learned that Visual Studio treats stored queries in Access as views rather than as stored procedures. In Day 6 you also worked with two Access stored queries called CustomerWithMostRecentOrder and CustomersWithOrdersSinceDate. In Day 7, "Database Design," you normalized the database and dropped the Orders table. These two Access stored queries that you worked with on Day 6 will no longer run properly. They accessed the Orders table, which no longer exists in the database.

Views can provide a layer of abstraction for users of your database. Views enable users to concentrate on the data they are interested in without needing to know the details of the database schema underneath the view.

Summary

Today you learned some powerful tools that relational database servers have to offer. You learned about transactions, triggers, aggregate functions, and views.

These tools can enable you to build highly advanced database applications. Transactions enable reliable changes to the data. Triggers make the database react automatically to changes in the data. Aggregate functions cause the bulk of the data processing to happen at the server. Finally, views enable you to customize the way people see your database.

Q&A

Q
How do relational database servers ensure that transactions will be durable even when there is some kind of database failure?
A
Relational database servers typically log all the changes to the data. The database writes to the log file first and then commits the changes to the database. Every time the database is started, it checks the log for uncommitted changes and uncompleted transactions. At this point, the database will automatically complete the transactions or roll them back.
Q
If I have an insert trigger that inserts another record into the table, won't the insert trigger fire again and put the database into an endless loop?
A
Typically, relational database servers are written so that a trigger won't cause itself to fire again. Consult your relational database server documentation for clarification of this.
Q
Why can't I create an index on a view?
A
A view is a virtual table. No records are stored in it. It is populated with records every time it's opened. These records are released from the view when it's closed. The virtual-ness of the records in a view prevents you from being able to create a (nonvirtual) index on the records in the view.

Workshop

The Workshop quiz questions test your understanding of today's material. (The answers appear in Appendix F, "Answers.") The exercises encourage you to apply the information you learned today to real-life situations.

Quiz

  1. What are the ACID properties of a transaction?
  2. What is the isolation level of a transaction?
  3. How does the GROUP BY clause interact with the SQL aggregate functions?
  4. How many triggers can be attached to a table in Microsoft SQL Server?
  5. Does a view on a large table occupy much room in the database? Why or why not?

Exercises

  1. Modify the SELECT statement in Listing 8.7 so that the resultset is sorted by the total sales volume, from the highest volume to the least.
  2. Modify the SELECT statement in Listing 8.7 so that the query returns the average product price, from the highest to the least.

© Copyright, Sams Publishing. All rights reserved.