Day 8

Selecting Data with SQL

Today is your first lesson in Structured Query Language (SQL). SQL is a powerful manipulation language used by Visual Basic and the Microsoft Access Jet database engine as the primary method for accessing the data in your databases. SQL statements fall into two broad categories: data manipulation language statements (DML) and data definition language statements (DDL). The DDL statements enable you to define data tables, indexes, and database relationships. DML statements are used to select, sort, summarize, and calculate the information stored in the data tables.

Today, you will learn about the DML statements. When you complete this lesson, you will be able to use SQL statements to construct database queries that can be retrieved, and you will be able to reorder data in any format recognized by Visual Basic. Because SQL is used in almost all relational database systems (SQL Server, Oracle, Gupta, and so on), you will also be able to apply the knowledge you gain here in almost any other relational database environment you might encounter in the future.

In this lesson, you will learn how to use the SELECT_FROM statement to select data from one or more tables and present that information in a single table for update or review. You will also learn how to limit the data you select to only the records that meet your criteria using the WHERE clause. You'll learn how to easily reorder the data in tables using the ORDER BY clause. You will also learn how to create simple statements that automatically summarize and total the data using the GROUP BY_HAVING clause.

You will learn about typical SQL functions to manipulate numbers and strings. This lesson also covers advanced DML statements such as PARAMETERS, UNIONS, JOINS, and TRANSFORM_PIVOT.

Today, you will create actual SQL queries (and in some cases, store them for later use) using the Visual Basic Visdata program you learned about on Day 7.

What Is SQL?

Before jumping into specific SQL statements and their use, you should understand the definition of SQL and its uses and origins. SQL stands for Structured Query Language. It was developed in the 1970s at IBM as a way to provide computer users with a standardized method for selecting data from various database formats. The intent was to build a language that was not based on any existing programming language, but could be used within any programming language as a way to update and query information in databases.


NOTE: The word SQL should be pronounced ess-que-ell instead of sequel. The confusion about the pronunciation of the word stems from the database language's origin. The SQL language is a successor of a language called Sequel developed by IBM in the late 1960s. For this reason, many (especially those familiar with IBM's Sequel language) continued to pronounce the name of the new database language improperly.

SQL statements are just that--statements. Each statement can perform operations on one or more database objects (tables, columns, indexes, and so on). Most SQL statements return results in the form of a set of data records, commonly referred to as a view. SQL is not a particularly friendly language. Many programs that use SQL statements hide these statements behind point-and-click dialogs, query-by-example grids, and other user-friendly interfaces. Make no mistake, however, that if the data you are accessing is stored in a relational database, you are using SQL statements, whether you know it or not.

ANSI Standard SQL Versus Microsoft Jet SQL

SQL syntax is determined by a committee that is part of the American National Standards Institute (ANSI). The ANSI-SQL committee is made up of information systems professionals who take on the job of establishing and enforcing standards on the rapidly moving computer-programming industry. Although each computer-programming language and database interface has its own unique version of SQL, nearly everyone has agreed to adhere to the basic standards defined by the ANSI-SQL committee. The most widely used SQL standard is SQL-89. This standard was first promulgated in 1989. An updated set of standards (SQL-92) was developed three years later.

Within each set of SQL standards, there are three levels of compliance. A database product must meet Level I compliance in order to call itself an SQL-compatible product. Levels II and III are optional levels of compliance that products can also attain in order to increase interoperability among database systems.

The Microsoft Jet database engine that is used to process all Visual Basic SQL statements is ANSI SQL-89 Level I compliant. There are very slight differences between ANSI SQL-89 and Microsoft Jet SQL at Level II and Level III. We won't dwell on these differences here. Those who are interested in learning more about ANSI SQL standards and Microsoft Jet compliance can find additional documentation elsewhere. The lessons in this book focus strictly on the Microsoft Jet SQL syntax. Be assured that once you master the concepts covered here, you will be able to use the same skills in almost all SQL-based programming and query tools you encounter.

SQL Basics

Now it's time to start building SQL statements. If you haven't already done so, load the Visual Basic Visdata application you learned about on Day 7. Using Visdata, load the BOOKS5.MDB that is included in the \TYSDBVB5\SOURCE\DATA directory of the CD that ships with this book. You will use this database for most of today's lesson.


NOTE: This book shows reserved SQL words in uppercase letters (for example, SELECT). This is not required by Visual Basic, but it is a good programming habit.

The SELECT_FROM Statement

In this section, you will learn about the most commonly used SQL statement, the SELECT_FROM statement. The SELECT_FROM statement lets you pick records from one or more tables in a database. The results of a SELECT_FROM statement are returned as a view. This view is a subset of the source data. In Visual Basic, the view can be returned as a Recordset, Table, Dynaset, or Snapshot. Because today's lesson focuses on getting results you can display, views will be returned as Visual Basic Snapshot data objects.

In its simplest form, a SELECT_FROM statement contains two parts:


NOTE: Standard SQL syntax uses the word column to describe a field and row to describe a record. This book uses the term field interchangeably with column and record interchangeably with row.

A simple example of a valid SQL statement is

SELECT AUID FROM Authors

This SQL statement tells the Microsoft Jet database engine to return a data object that contains the AUID from the Authors table. Enter this SQL statement into the Visdata SQL window and click the Execute button to see the returned result set. Your screen should look similar to the one in Figure 8.1.

Figure 8.1. The result set from the first SELECT statement.


As you can see from the result set, the SELECT_FROM statement returns all the rows in the table. Whether the table contains 10 or 10,000 records, you can get a complete result set with just one SELECT_FROM statement. This is quite handy, but it can also be quite dangerous. If the result of your SELECT_FROM statement contains too many records, you can slow down the network, possibly run out of memory on your local workstation, and eventually lock up your PC. Later in this lesson, you will learn how to use the WHERE clause to limit the size of your view to only those records you need.

To return all the columns from a table, you can list each column in the SELECT statement. This works if you have only a few columns in the table. However, if you have several columns, it can become quite tedious. There is a shortcut. To automatically list all columns in the table in your result set, instead of typing column names, you can type the asterisk (*). The asterisk tells SQL to return all columns in the requested table. The SELECT statement to display all columns of the Author table would look like this:

SELECT * FROM Authors

Enter the preceding SELECT statement into the Visdata SQL window and review the results. Your screen should look like the one in Figure 8.2.

Figure 8.2. The results of the SELECT * query.


Notice that even though you listed no fields in your SELECT statement, all fields were returned in the result set. This is very useful when you want to display a data table but do not know the names of all the columns. As long as you know a valid table name, you can use the SELECT_FROM statement to display the entire table.

The order in which you list columns in the SELECT_FROM statement controls the order in which they are displayed in the result set. Figure 8.3 shows the results of the following SELECT_FROM statement:

SELECT Name, AUID FROM Authors


Figure 8.3. Using the SELECT_FROM statement to change column display order.


The ORDER BY Clause

When you use the SELECT_FROM statement, the records returned in the result set are returned in the order in which they were found in the underlying table. But what if you wanted to display the results of your SELECT_FROM statement in a specialized sorted order? You can use the ORDER BY clause to do just that.

Placing ASC or DESC after each field in the ORDER BY clause indicates the order in which you want to sort the column, ascending or descending. If no order is supplied, SQL assumes that you want the set sorted in ascending order.

The following SQL example shows how you can display the records in the Authors table in descending sorted order, by Author Name.

SELECT * FROM AUTHORS ORDER BY Name DESC

Enter this statement in the SQL window of Visdata and execute it. Compare your results to Figure 8.4.

You can enter more than one field in the ORDER BY clause. SQL will create a result set that reflects the aggregate sort of the ORDER BY clause. Using Visual Basic Visdata, enter and execute the following SELECT_FROM statement. Compare your results to those in Figure 8.5.

SELECT StateProv, City FROM Publishers ORDER BY StateProv DESC, City ASC

Notice in the example shown in Figure 8.5 that you have combined the ability to alter the row order of the data in the result set with the ability to alter the column order of the data in the result set. These are powerful tools. Now that you know how to use SQL to display complete, single-data tables, you can learn how to limit the result set to only those records you need.

Figure 8.4. The results of the descending ORDER BY clause.

Figure 8.5. Results of multiple-column ORDER BY clause.


The WHERE Clause

One of the most powerful aspects of the SELECT_FROM statement is its capability to control the content of the result set using the WHERE clause. There are two ways to use the WHERE clause to control the content of the result set:

Using WHERE to Limit the Result Set The WHERE clause enables you to perform logical comparisons on data in any column in the data table. In its simplest form, the WHERE clause consists of the following:

WHERE column = value

In this line, column represents the name of the column in the requested data table, and value represents a literal value such as NY or Smith. It is important to know that the WHERE clause is always preceded by a SELECT_FROM statement. Use Visdata to enter and execute the following SQL statement, and compare your results to those in Figure 8.6.

SELECT Name, StateProv FROM Publishers
   WHERE StateProv = `CA'


Figure 8.6. The results of a simple WHERE clause SQL query.



TIP: This book uses single quotation marks (`) around string literals within SQL statements. Visual Basic SQL accepts both single and double quotation marks within SQL. Because you will often be building SQL statements in Visual Basic code, using single quotation marks within SQL statements makes it easier to construct and maintain SQL statements as Visual Basic strings.

The previous SQL statement returns a subset of the data in the result set. That is, the resulting view does not contain all of the rows of the Publishers table. Only those rows that have columns meeting the WHERE clause criteria are returned in the result set.

You can link WHERE clauses using the AND and OR operators. Enter and execute the following SQL statement, and compare your results to Figure 8.7.

SELECT Name, StateProv, City FROM Publishers
   WHERE StateProv = `CA' AND City <> `Senoma'


Figure 8.7. The results of a complex WHERE clause.


You can use several AND and OR operators to link valid logical comparisons together to form a single WHERE clause. You can also use more than just =, <>, >, <, <=, and >= logical comparisons. Visual Basic SQL supports the use of BETWEEN_AND, IN, and LIKE comparisons. The following SQL statement illustrates the use of BETWEEN_AND in a WHERE clause. Check your results against those shown in Figure 8.8.

SELECT PubID, Name, StateProv, City FROM Publishers
   WHERE PubID BETWEEN 10 AND 15


Figure 8.8. Using BETWEEN_AND in a WHERE clause.


The result set contains only rows that have a PubID value between 10 and 15. Notice that the values listed in the BETWEEN_AND clause (10 and 15) are included in the result set.

You can also use SQL to return a result set that contains rows that match a set of noncontiguous data. For example, if you wanted a list of all the publishers in the states of New York, California, and Alaska, you could use the IN keyword followed by the desired values, separated by commas, within parenthesis, as part of the WHERE clause. Enter and execute the following SQL statement, and check your results against those shown in Figure 8.9.

SELECT PubID, Name, City, StateProv FROM Publishers
   WHERE StateProv IN (`NY','CA','RI')


Figure 8.9. Using the IN keyword in the WHERE clause.

You can also use the LIKE function to return all rows whose columns' contents are similar to the literals passed in the function. For example, to return all rows with a StateProv column that has the letter A in any position, you would use the following SQL SELECT_FROM statement (see Figure 8.10 for results):

SELECT PubID, Name, City, StateProv FROM Publishers
   WHERE StateProv LIKE(`*I*')


Figure 8.10. Using the LIKE function in a WHERE clause.

The LIKE function is a very powerful tool. It is covered in more depth in a later section of today's lesson, "SQL Aggregate Functions." Using WHERE to Link Two or More Tables in a Result Set You can use the WHERE clause to compare columns from different tables. In doing so, you can set up criteria that can link two or more tables in a single result set. The syntax for this form of the WHERE clause is

SELECT table1.columnA, table2.columnA FROM table1, table2
WHERE table1.columnA = table2.columnA

table1 and table2 are different data tables in the same database. columnA represents a single column in each of the tables. Use Visdata to enter and execute the following SQL statement. Compare your result set to the one in Figure 8.11.

SELECT Titles.Title, Publishers.Name
    FROM Publishers, Titles
   WHERE Publishers.PubID =Titles.PubID


Figure 8.11. Using the WHERE clause to link two tables in a single result set.


The preceding SQL statement creates a result set that displays the book title and publisher's name. This is accomplished using the WHERE clause to tell SQL to select only those rows where the PubID values in each table match up. Keep in mind that this is done without any programming code, special indexing, or sorting commands. SQL handles all those tasks for you. Also, there are a few new items in this SQL statement that bear further review.

This is the first SQL statement you have encountered today that lists columns from two different tables. When selecting columns from more than one table, it is good programming practice to precede the column name with the table name and join the two with the period (.). As long as the column name is unique among all columns in the tables from which you are selecting, SQL does not require you to use the table.column syntax. But it is a good habit to do so, especially when you are building SQL statements in Visual Basic code.

You should also notice that the WHERE clause comparison columns (Publishers.PubID and Titles.PubID) were not included in the SELECT portion of the statement. You do not have to include the column in the SELECT portion of the statement to use it in the WHERE portion of the statement, as long as the column already exists in the underlying table.

Combining tables using the WHERE clause always returns a nonupdateable result set. You cannot update the columns in a view created in this manner. If you want to link tables together and also be able to update the underlying tables for that view, you need to use the JOIN clause, which is covered later today.

You can combine the link-type and limit-type versions of the WHERE clause in a single SQL SELECT_FROM statement. Execute the following statement and compare your results to those in Figure 8.12.

SELECT Titles.PubID,Titles.Title,Publishers.Name
   FROM Titles, Publishers
   WHERE Titles.PubID = Publishers.PubID
      AND Publishers.PubID BETWEEN 5 AND 10


Figure 8.12. Combining link-type and limit-type WHERE clauses.

The preceding SQL statement selects only those records in which the PubID columns match and the PubID values are between 5 and 10.

You can use the WHERE clause to link more than two data tables. The linking column for table1 and table2 does not have to be the same column for table2 and table3. Execute the following statement and review your results against those in Figure 8.13.

SELECT Titles.PubID,Titles.Title,Publishers.Name,Authors.Name
   FROM Titles, Publishers,Authors
   WHERE Titles.PubID = Publishers.PubID
      AND Titles.AUID = Authors.AUID


Figure 8.13. Using the WHERE clause to link three tables.


In the previous example, the Publishers table and the Titles table are linked using the PubID column. The Titles table and the Authors table are linked using the AUID field. When the link is made, the selected columns are displayed in the result set.

You might have noticed that SQL assigns column names to the result sets. There are times when these assigned names can be misleading or incomplete. You can use the AS keyword to rename the columns in the result set. The following SQL statement is one example of using the AS keyword in the SELECT statement to rename the column headers of the result set. This renaming does not affect the original column names in the underlying tables. Execute the following SQL statement and compare your results to those in Figure 8.14.

SELECT Titles.PubID AS PubCode,    Titles.Title AS BookTitle,
   Publishers.Name AS PubName,
   Authors.Name AS AuthorName
   FROM Titles, Publishers,Authors
   WHERE Titles.PubID = Publishers.PubID
      AND Titles.AUID = Authors.AUID


Figure 8.14. Using the AS keyword to rename columns in the result set.

Now that you know how to use the SELECT_FROM statement to select the desired rows and columns from data tables, read about how to use SQL functions to calculate and manipulate data within your selected columns and rows.

SQL Aggregate Functions

The SQL standards define a core set of functions that are present in all SQL-compliant systems. These functions are known as aggregate functions. Aggregate functions are used to quickly return computed results of numeric data stored in a column. The SQL aggregate functions available through the Microsoft Access Jet database engine are

The following SQL statement illustrates all five of the SQL aggregate functions. Enter and execute this statement, and check your results against Figure 8.15.

SELECT COUNT(Units) AS UnitCount,
   AVG(Units) AS UnitAvg,
   SUM(Units) AS UnitSum,
   MIN(Units) AS UnitMin,
   MAX(Units) AS UnitMax
   FROM BookSales


Figure 8.15. Using SQL aggregate functions.


You can use the WHERE clause and aggregate functions in the same SELECT_FROM statement. The following statement shows how you can use the WHERE clause to limit rows included in the aggregate calculation. See Figure 8.16 for results. Compare these numbers to those in the view returned in the previous query (refer to Figure 8.15).

SELECT COUNT(Units) AS UnitCount,
   AVG(Units) AS UnitAvg,
   SUM(Units) AS UnitSum,
   MIN(Units) AS UnitMin,
   MAX(Units) AS UnitMax
   FROM BookSales
   WHERE Qtr = 1


Figure 8.16. Using the WHERE clause to limit the scope of aggregate functions.


Using Visual Basic Functions in a SELECT Statement

When you call the Microsoft Access Jet database engine from within a Visual Basic program, you can use any valid Visual Basic functions as part of the SQL statement. For example, if you want to create a result set with a column that holds only the first three characters of a field in the underlying table, you could use the Visual Basic Left$ function as part of your column list in the SELECT_FROM statement, in the following line (see Figure 8.17):

SELECT Left$(Name,3), Name
   FROM Authors


Figure 8.17. Using Visual Basic functions in an SQL statement.

You can also use Visual Basic syntax to combine several data table columns into a single column in the result set. Enter and execute the following example and compare your results to Figure 8.18.

SELECT Name, City+", "+StateProv+"  "+Zip AS ADDRESS
   FROM Publishers


Figure 8.18. Using Visual Basic syntax to combine columns.


You can also use Visual Basic functions as part of the WHERE clause in an SQL statement. The following example (Figure 8.19) returns only rows that have the letter a as the second character in the Name column.

SELECT Name FROM Publishers
   WHERE Mid$(Name,2,1)="a"


Figure 8.19. Using Visual Basic functions in an SQL WHERE clause.


Even though using familiar Visual Basic functions and syntax is very handy, it has its drawbacks. Chief among them is the fact that after you create an SQL statement that uses Visual Basic-specific portions, your code is no longer portable. If you ever move the SQL statements to another database engine (such as SQL Server), you must remove the Visual Basic-specific portions of the SQL statements and replace them with something else that will work with the database engine you are using. This will not be an issue if you plan to stick with the Microsoft Access Jet engine for all your database access.

Another possible drawback that you'll encounter if you use Visual Basic-specific syntax in your SQL statements is that of speed. Extensive use of Visual Basic-specific code within SQL statements results in a slight performance hit. The speed difference is minor, but it should be considered.

It is better to use as few Visual Basic-specific functions in your SQL statements as possible. You will not limit the portability of your code, and you will not suffer from unduly slow processing of the SQL statements.


NOTE: You can't use user-defined functions within your SQL statements when you use the Microsoft Access Jet database engine from within Visual Basic. You can only use the built-in SQL functions and the predefined Visual Basic functions.

More SQL DML Statements

Now that you know how to create basic SQL SELECT_FROM statements and you know how to use the built-in SQL functions, return to the basic SELECT_FROM statement and add a few more enhancements to your SQL tool kit.

The DISTINCT and DISTINCTROW Clauses

There are times when you select data from a table that has more than one occurrence of the rows you are trying to collect. For example, you want to get a list of all the customers that have at least one order on file in the Orders table. The problem is that some customers have several orders in the table. You don't want to see those names appear more than once in your result set. You can use the DISTINCT keyword to make sure that you do not get duplicates of the same customer in your result set.

Enter and execute the following statement. As a test, execute the same SQL statement without the DISTINCT clause and compare the result sets. Refer to Figure 8.20 as an example.

SELECT DISTINCT AUID FROM Titles
   ORDER BY AUID


Figure 8.20. Using the DISTINCT keyword to remove duplicates from a result set.


If you include more than one column in the SELECT list, all columns are used to evaluate the uniqueness of the row. Execute and compare the result sets of the following two SQL statements. Refer to Figure 8.21 as a guide.

SELECT DISTINCT Title

   FROM BookSales

SELECT DISTINCT Title, Units
   FROM BookSales


Figure 8.21. Using DISTINCT on multiple columns.


Notice that the first SQL statement returns a single record for each Title in the data table. The second SQL statement returns more records for each Title because there are distinct Units values for each Title.

There are also times when you want to collect data on all rows that are distinct in any of the fields. Instead of using the DISTINCT keyword and listing all the fields in the table, you can use the DISTINCTROW keyword. The following SQL statement (see Figure 8.22) uses DISTINCTROW to return the same records as the SQL statement in the previous example.

SELECT DISTINCTROW *
   FROM BookSales
   ORDER BY Title


Figure 8.22. Using DISTINCTROW in an SQL statement.


Both the DISTINCT and DISTINCTROW keywords enable you to limit the contents of the result set based on the uniqueness of one or more columns in the data table. In the next section, you'll learn how you can limit the contents of the result set to the records with the highest numeric values in selected columns.

The TOP n and TOP n PERCENT Clauses

You can use the TOP n or TOP n PERCENT SQL keywords to limit the number of records in your result set. Suppose you want to get a list of the five top-selling books in a data table. You can use the TOP n clause to get just that. TOP n returns the first n number of records. If you have two records of the same value, SQL returns both records. For the previous example, if the fifth and sixth records were both equal, the result set would contain six records, not just five.

When you use the TOP clause, you must also use the ORDER BY clause to make sure that your result set is sorted. If you do not use the ORDER BY clause, you receive an arbitrary set of records because SQL first executes the ORDER BY clause and then selects the TOP n records you requested. Without the ORDER BY clause, it is quite likely that you will not get the results you intended. If a WHERE clause is present, SQL performs the WHERE clause, the ORDER BY clause, and then the TOP n clause. As you can see, failure to use the ORDER BY clause most certainly returns garbage in your result set (see Figure 8.23).

SELECT TOP 5 * FROM BookSales
   ORDER BY Sales DESC


Figure 8.23. Using TOP n to limit the result set.


Notice that the preceding example uses the DESC keyword in the ORDER BY clause. Whether you use the DESC or ASC ORDER BY format, the result set still contains the first n records in the table (based on the sort). Also note that the result set contains more than five records, because several records have the same Sales value.

The TOP n PERCENT version returns not the top five records, but the top five percent of the records in the underlying data table. The results of the following SQL statement (see Figure 8.24) contain several more records than the result set shown previously.

SELECT TOP 5 PERCENT * FROM BookSales
   ORDER BY Sales


Figure 8.24. Using TOP n PERCENT to limit the result set.


The GROUP BY_HAVING Clause

One of the more powerful SQL clauses is the GROUP BY_HAVING clause. This clause lets you use the SQL aggregate functions discussed earlier today to easily create result sets that contain a list of subtotals of the underlying data table. For example, you might want to be able to create a data set that contains a list of Titles and the total Units sold, by Title. The following SQL statement (see Figure 8.25) can do that:

SELECT Title, SUM(Units) AS UnitsSold
   FROM BookSales
   GROUP BY Title


Figure 8.25. Using GROUP BY to create subtotals.

The GROUP BY clause requires that all numeric columns in the SELECT column list be a part of an SQL aggregate function (SUM, AVG, MIN, MAX, and COUNT). Also, you cannot use the * as part of the SELECT column list when you use the GROUP BY clause.

What if you wanted to get a list of all the book titles that sold more than 100 units for the year? The first thought would be to use a WHERE clause:

SELECT Titles, SUM(Units) AS UnitsSold
   WHERE Sum(Units) > 100
   GROUP BY Units

However, if you try to run this SQL statement, you discover that SQL does not allow aggregate functions within the WHERE clause. You really want to use a WHERE clause after the aggregate function has created a resulting column. In plain English, the query needs to perform the following steps:

Luckily, you don't have to actually write all this in a series of SQL statements. You can get the same results by adding the HAVING keyword to the GROUP BY clause. The HAVING clause acts the same as the WHERE clause, except that the HAVING clause acts upon the resulting columns created by the GROUP BY clause, not the underlying columns. The following SQL statement (see Figure 8.26) returns only the Titles that have sold more than 100 units in the last year:

SELECT Title, SUM(Units) AS UnitsSold
   FROM BookSales
   GROUP BY Title HAVING SUM(Units)>100


Figure 8.26. Using the HAVING clause with GROUP BY.

The columns used in the HAVING clause do not have to be the same columns listed in the SELECT clause. The contents of the HAVING clause follow the same rules as those for the contents of the WHERE clause. You can use logical operators AND, OR, and NOT, and you can include VB-specific functions as part of the HAVING clause. The following SQL statement (see Figure 8.27) returns sales in dollars for all titles that have more than 100 units sold and whose titles have the letter a as the second letter in the title:

SELECT Title, SUM(Sales) AS SalesAmt
   FROM BookSales
   GROUP BY Title
   HAVING SUM(Units)>100 AND Mid$(Title,2,1)="a"


Figure 8.27. Using a complex HAVING clause.


SQL JOINs

The JOIN clause is a very powerful optional SQL clause. Remember when you learned how to link two tables together using WHERE table1.column1 = table2.column1? The only problem with using the WHERE clause is that the result set is not updateable. What if you need to create an updateable result set that contains columns from more than one table? You use JOIN.

There are three types of JOIN clauses in Microsoft Access Jet SQL:

The following sections describe each form of JOIN and how it is used in your programs. The INNER JOIN The INNER JOIN can be used to create a result set that contains only those records that have an exact match in both tables. Enter and execute the following SQL statement (see Figure 8.28):

SELECT PublisherComments.Comments,
   Publishers.Name, Publishers.StateProv
   FROM PublisherComments INNER JOIN Publishers
   ON PublisherComments.PubID = Publishers.PubID


Figure 8.28. Using the INNER JOIN SQL clause.


NOTE: PublisherComments was used as the name for the table used in the preceding example. When creating a Microsoft Access database, we could easily have named the table "Publisher Comments" (note the space), in which case, we would have had to enclose the table name in brackets in the preceding query, like so:

[Publisher Comments]

This is a good time to point out that it is a bad idea to use embedded spaces as table names. Not only do you need to include brackets around the name in a query, but also the Wizard available to up-size Access data files to Microsoft SQL Server does not work successfully on tables with spaces embedded in their names.


The preceding SQL statement returns all the records from the Publisher table that have a PubID that matches a PubID in the [Publisher Comments] table. This type of JOIN returns all the records that reside within both tables--thus, an INNER JOIN.

This is handy if you have two tables that you know are not perfectly matched against a single column and you want to create a result set that contains only those rows that match on both sides. The INNER JOIN also works well when you have a parent table (such as a CustomerTable) and a child table (such as a ShipAddressTable) with a one-to-one relationship. Using an INNER JOIN, you can quickly create a list of all CustomerTable records that have a corresponding ShipAddressTable record on file.

INNER JOINs work best when you create a JOIN on a column that is unique in both tables. If you use a table that has more than one occurrence of the JOIN column, you'll get a row for each occurrence in the result set. This might be undesirable. The following example illustrates the point (see Figure 8.29):

SELECT Titles.Title,BookSales.Units
   FROM Titles INNER JOIN BookSales
   ON Titles.Title = BookSales.Title

In the previous example, the table BookSales has numerous entries for each title (one for each quarter recorded), so the result of the INNER JOIN returns each Title multiple times. The LEFT JOIN The LEFT JOIN is one of the two outer joins in the SQL syntax. Although INNER JOIN returns only those rows that have corresponding values in both tables, the outer joins return all the records from one side of the join, whether or not there is a corresponding match on the other side of the join. The LEFT JOIN clause returns all the records from the first table on the list (the left-most table) and any records on the right side of the table that have a matching column value. Figure 8.30 shows the same SQL query that was shown in Figure 8.28.

SELECT Publishers.Name,PublisherComments.Comments
   FROM Publishers LEFT JOIN PublisherComments
   ON Publishers.PubID = PublisherComments.PubID


Figure 8.29. Using an INNER JOIN on a non-unique column.

Figure 8.30. Using the LEFT JOIN clause.


Notice that the result set has blank comments in several places. The LEFT JOIN is handy when you want a list of all the records in the master table and any records in the dependent table that are on file. The RIGHT JOIN The RIGHT JOIN works the same as the LEFT JOIN except that the result set is based on the second (right-hand) table in the JOIN statement. You can use the RIGHT JOIN in the same manner you would use the LEFT JOIN.

UNION Queries

Another powerful SQL clause is the UNION clause. This SQL keyword lets you create a union between two tables or SQL queries that contain similar, but unrelated, data. A UNION query is handy when you want to collate information from two queries into a single result set. Because UNION queries return nonupdateable result sets, they are good for producing on-screen displays, reports, and base data for generating graphs and charts.

For example, if you have a customer table and a vendor table, you might want to get a list of all vendors and customers who live in the state of Ohio. You could write an SQL statement to select the rows from the Customers table. Then write an SQL statement to select the rows from the Vendors table. Combine the two SQL statements into a single SQL phrase using the UNION keyword. Now you can get a single result set that contains the results of both queries.

In the following SQL statement (see Figure 8.31), you are creating a result set that contains all Publishers and Buyers located in the state of New York.

SELECT Name, City, StateProv, Zip FROM Publishers WHERE StateProv='NY'
   UNION
SELECT Name, City, StateProv, Zip FROM Buyers WHERE StateProv='NY'
   ORDER BY Zip


Figure 8.31. An example of a UNION query.


A note of caution when using the UNION query. To keep the same number of data columns, SQL does a data type override to insert results into columns that are not the same data types. The UNION query uses the column names of the first SQL query in the statement and creates a result set that displays the data even if data types must be altered to do so.

Each portion of the UNION query must have the same number of columns. If the first query results in six displayable columns, the query on the other side of the UNION statement must also result in six columns. If there is not an equal number of columns on each side of a UNION query, you receive an SQL error message.

You can also use UNION queries on the same table. The following SQL statement (see Figure 8.32) shows how you can use SQL to return the top-selling titles and the bottom-selling titles in the same result set:

SELECT SUM(Sales) AS TotSales,Title FROM BookSales
   GROUP BY Title HAVING SUM(Sales)>4000
UNION
   SELECT SUM(Sales) AS TotSales,Title FROM BookSales
   GROUP BY Title HAVING SUM(Sales)<1000
ORDER BY TotSales


Figure 8.32. Using UNION on the same data table.


You can use Visual Basic stored queries (QueryDefs) as replacements for the complete SQL statement on either side of a UNION keyword. You can also link several SQL queries together with successive UNION keywords.

Crosstab Queries with TRANSFORM_PIVOT

The last SQL statement covered today is the TRANSFORM_PIVOT statement. This is a very powerful SQL tool that enables you to create result sets that contain summarized data in a form known as a crosstab query. Instead of trying to explain a crosstab query, let's look at a sample problem.

Suppose you have a data table that contains information on book titles and sales by quarter (sound familiar?). You have been asked to produce a view set that lists each book title down the left side and each quarter across the top with the sales figures for each quarter to the right of the book title. The only problem is that your data table has a single record for each quarter for each book. For example, if book A has sales in three quarters this year, you have three rows in your data table. If book B has sales for four quarters, you have four rows, and so on. How can you produce a view that lists the quarters as columns instead of as rows?

You can accomplish this with a complicated set of subsequent SQL statements that produces temporary views, merges them together, and so on. Thanks to the folks who invented the Microsoft Access Jet database engine, however, you can use the TRANSFORM_PIVOT statement instead. You can produce the entire result set in one SQL statement using TRANSFORM_PIVOT. The following SQL statement shows how this can be done. See Figure 8.33 for a sample result set.

TRANSFORM SUM(BookSales.Sales)
   SELECT Title FROM BookSales
   GROUP BY Title
PIVOT BookSales.Qtr


Figure 8.33. The TRANSFORM_PIVOT example.

Notice the form of the TRANSFORM_PIVOT statement. It starts with the TRANSFORM keyword, not the SELECT keyword. Notice that a single SQL aggregate function immediately follows the TRANSFORM keyword. This is required, even if no real totaling will be performed. After the TRANSFORM aggregate function clause, you have the standard SELECT_FROM clause. Notice that the preceding example did not include the Booksales.Sales column in the SELECT statement because it will be produced by the TRANSFORM_PIVOT clause automatically. The GROUP BY clause is required in order to tell SQL how to treat the successive rows that will be handled for each BookSales.Title. Finally, add the PIVOT keyword, followed by the column that you want to use, as the set of headers that follow out to the right of the GROUP BY clause.

TRANSFORM_PIVOT uses the data in the PIVOT column as column headers in the result set. You will have as many columns in your result set as you have unique values in your PIVOT column. This is important to understand. Using columns that contain a limited set of data (such as months of the year) produces valuable result sets. However, using a column that contains unique data (such as the CustomerID column) produces a result set with an unpredictable number of columns.

The nice thing about TRANSFORM_PIVOT is that it is easy to produce several different views of the same data by just changing the PIVOT column. For example, what if you wanted to see the book sales results by BookSales.SaleRep instead of by BookSales.Qtr? All you have to do is change the PIVOT field. See the following code example and Figure 8.34.

TRANSFORM SUM(BookSales.Sales)
   SELECT Title FROM BookSales
   GROUP BY Title
PIVOT BookSales.SalesRep


Figure 8.34. Changing the PIVOT field.

Notice, in Figure 8.34, that you can see a column with the header <>. When Microsoft Access Jet ran the SQL statement, it discovered some records that had no value in the BookSales.SaleRep column. SQL automatically created a new column (<>) to hold these records and make sure they were not left out of the result set.

Even though TRANSFORM_PIVOT is a powerful SQL tool, there is one drawback to its widespread use in your programs. The TRANSFORM_PIVOT clause is not an ANSI-SQL clause. Microsoft added this clause as an extension of the ANSI-SQL command set. If you use it in your programs, you will not be able to port your SQL statements to other back-end databases that do not support the TRANSFORM_PIVOT SQL clause. Despite this drawback, you will find TRANSFORM_PIVOT a very valuable SQL tool when it comes to producing result sets for summary reports, data graphs, and charts.

Nested SELECT Queries

Visual Basic 5 allows for the use of nested SELECT queries. These are often referred to as SQL subqueries and are literally queries contained within queries. Nested SELECT queries can prove to be useful when you want to perform a query based upon the results of another query.

To demonstrate the use of a SQL subquery, let's start Visdata and open the SUBQRY.MDB database that can be found in the \\TYSDBVB5\SOURCE\DATA directory of the CD that shipped with this book. This database contains a sample listing of authors, publishers, and book sales activity (notice that this database is very similar to the BOOKS5.MDB database) for a fictitious publisher. Our goal in this exercise is to extract the phone numbers of all the authors who sold more than 500 books in the first quarter.

As you examine the table structure of this database in Visdata, you notice the BookSales table contains the sales records by quarter, but the phone number is contained in the Authors table. We therefore need to query the BookSales table to find all the authors who sold more than 500 books in the first quarter, and then use that result set to find the writers' phone numbers in the Authors table. We need to build a nested SELECT query.

To do this, enter the following code into the SQL Statement window of Visdata. Execute your statement. The result set should look similar to Figure 8.35.

SELECT * FROM Authors WHERE AUID IN (SELECT AUID
FROM Booksales WHERE Sales>500 AND Qtr=1)


Figure 8.35. The results of an SQL subquery.

An SQL subquery has three main components--the comparison, the expression, and the SQL statement. The comparison in our example is the SELECT FROM Authors query. The expression is the IN keyword. The SQL statement is the SELECT statement within the parentheses.


NOTE: The SQL statement on which you base the comparison statement must be a SELECT statement. This statement must also be enclosed in parentheses and is referred to as a subquery.

In the exercise, the SELECT statement contained within the parentheses (the sqlstatement, or subquery) is executed first to determine which authors sold more than 500 books in the first quarter. The SELECT statement outside of the parentheses (the main query) is then executed upon the result set created by the subquery. The IN keyword instructs the SELECT FROM Authors statement to take only those records that were extracted by the subquery.

Other keywords that can be used in the expression include ANY and ALL. Also, numeric expression such as > and < can be used in conjunction with the keyword to make comparisons.

For example, if you use the syntax > ANY in comparing the main query with the subquery (WHERE AUID > ANY), your result set displays all records from the main query that have a value greater than any value extracted from the subquery. Using > ALL (WHERE AUID > All) extracts only those records that are greater in value than every record extracted by the subquery.

Without the ability to use nested SQL statements, the preceding exercise would have required you to perform a JOIN on the two tables, or build a table to store the subqueries result set, and then execute the main query on the table. The use of nested SQL SELECT statements can be a great time saver.

Summary

Today you learned how to create basic SQL statements that select data from existing tables. You learned that the most fundamental form of the SQL statement is the SELECT_FROM clause. This clause is used to select one or more columns from a table and display the results of that statement in a result set, or view.

You also learned about the optional clauses that you can add to the SELECT_FROM clause:

You also learned about additional SQL keywords that you can use to control the contents of the result set:

Finally, you learned about the SQL subquery, and how to nest SELECT statements to extract data from a table based upon the results of another SELECT statement.

Quiz

1. What does SQL stand for? How is SQL pronounced?

2. What SQL statement enables you to select data from table fields?

3. What wildcard character do you use in a SELECT_FROM statement to include all fields of a table in your result?

4. What clause do you use in an SQL statement to sort the displayed data?

5.
Identify two functions that a WHERE clause performs in an SQL statement?

6. How do you rename the column headings in an SQL statement?

7.
What are SQL aggregate functions? List the SQL aggregate functions available through the Microsoft Access Jet database engine.

8. What are the drawbacks of using Visual Basic functions in SQL statements?

9. What is the difference between the DISTINCT and DISTINCTROW SQL clauses?

10. What clause should you always use with the TOP n or TOP n PERCENT clause?

11.
What are the three join types available in Microsoft Jet SQL? Briefly explain how each is used.

12. When would you use a UNION query?

Exercises

As a corporate MIS staff member, you are given the task of assisting the Accounting Department in extracting data from its accounts payable and accounts receivable systems. As part of your analysis, you determine that these systems possess the following data tables and fields:

CustomerMaster

CustomerID

Name

Address

City

State

Zip

Phone

CustomerType

CustomerType

CustomerType

Description

OpenInvoice

InvoiceNo

CustomerID

Date

Description

Amount

Suppliers

SupplierID

Name

Address

City

State

Zip

Phone

Use this information to answer the questions that follow:

1. Write an SQL statement to list all of the customers. Include their IDs, names, addresses, phone numbers, and customer types.

2. Display all of the information in the Open Invoice table, but display CustomerID as Account.

3. Display the same information requested in Exercise 2, but sort the data by customer and then by invoice number within each customer.

4. Display all suppliers that can be found within New York City. Display their IDs, names, addresses, and phone numbers.

5. Display the Customer types, names, and addresses for all customers with a customer type of ABC.

6. Select and display customer IDs and names for customer names beginning with AME.

7. Display the CustomerID and Name of all customers who have an open invoice. Sort your information by CustomerID.

8. Select and display the five largest outstanding invoices.

9. Display a listing of names and phone numbers of all customers and vendors who reside in Ohio.