Teach Yourself SQL in 21 Days, Second Edition

Previous chapterNext chapterContents


- Day 5 -
Clauses in SQL

Objectives

Today's topic is clauses--not the kind that distribute presents during the holidays, but the ones you use with a SELECT statement. By the end of the day you will understand and be able to use the following clauses:

To get a feel for where these functions fit in, examine the general syntax for a SELECT statement:

SYNTAX:
SELECT [DISTINCT | ALL] { *
                    | { [schema.]{table | view | snapshot}.*
                    | expr }  [ [AS] c_alias ]
                   [, { [schema.]{table | view | snapshot}.*
                    | expr } [ [AS] c_alias ]  ] ... }
FROM [schema.]{table | view | snapshot}[@dblink] [t_alias]
[, [schema.]{table | view | snapshot}[@dblink] [t_alias] ] ...
    [WHERE condition ]
    [GROUP BY expr [, expr] ... [HAVING condition] ]
    [{UNION | UNION ALL | INTERSECT | MINUS} SELECT command ]
    [ORDER BY {expr|position} [ASC | DESC]
           [, {expr|position} [ASC | DESC]] ...]


NOTE: In my experience with SQL, the ANSI standard is really more of an ANSI "suggestion." The preceding syntax will generally work with any SQL engine, but you may find some slight variations.


NOTE: You haven't yet had to deal with a complicated syntax diagram. Because many people find syntax diagrams more puzzling than illuminating when learning something new, this book has used simple examples to illustrate particular points. However, we are now at the point where a syntax diagram can help tie the familiar concepts to today's new material.

Don't worry about the exact syntax--it varies slightly from implementation to implementation anyway. Instead, focus on the relationships. At the top of this statement is SELECT, which you have used many times in the last few days. SELECT is followed by FROM, which should appear with every SELECT statement you typed. (You learn a new use for FROM tomorrow.) WHERE, GROUP BY, HAVING, and ORDER BY all follow. (The other clauses in the diagram--UNION, UNION ALL, INTERSECT, and MINUS--were covered in Day 3, "Expressions, Conditions, and Operators.") Each clause plays an important part in selecting and manipulating data.


NOTE: We have used two implementations of SQL to prepare today's examples. One implementation has an SQL> prompt and line numbers (Personal Oracle7), and the other (Borland's ISQL) does not. You will also notice that the output displays vary slightly, depending on the implementation.

The WHERE Clause

Using just SELECT and FROM, you are limited to returning every row in a table. For example, using these two key words on the CHECKS table, you get all seven rows:

INPUT:
SQL> SELECT *
  2  FROM CHECKS;
OUTPUT:
  CHECK# PAYEE                  AMOUNT REMARKS
-------- -------------------- -------- ------------------
       1 Ma Bell                   150 Have sons next time
       2 Reading R.R.           245.34 Train to Chicago
       3 Ma Bell                200.32 Cellular Phone
       4 Local Utilities            98 Gas
       5 Joes Stale $ Dent         150 Groceries
      16 Cash                       25 Wild Night Out
      17 Joans Gas                25.1 Gas
7 rows selected.

With WHERE in your vocabulary, you can be more selective. To find all the checks you wrote with a value of more than 100 dollars, write this:

INPUT:
SQL> SELECT *
  2  FROM CHECKS
  3  WHERE AMOUNT > 100;

The WHERE clause returns the four instances in the table that meet the required condition:

OUTPUT:
  CHECK# PAYEE                  AMOUNT REMARKS
-------- -------------------- -------- ------------------
       1 Ma Bell                   150 Have sons next time
       2 Reading R.R.           245.34 Train to Chicago
       3 Ma Bell                200.32 Cellular Phone
       5 Joes Stale $ Dent         150 Groceries

WHERE can also solve other popular puzzles. Given the following table of names and locations, you can ask that popular question, Where's Waldo?

INPUT:
SQL> SELECT *
  2  FROM PUZZLE;
OUTPUT:
NAME            LOCATION
-------------- --------------
TYLER           BACKYARD
MAJOR           KITCHEN
SPEEDY          LIVING ROOM
WALDO           GARAGE
LADDIE          UTILITY CLOSET
ARNOLD          TV ROOM
6 rows selected.
INPUT:
SQL> SELECT LOCATION AS "WHERE'S WALDO?"
  2  FROM PUZZLE
  3  WHERE NAME = 'WALDO';
OUTPUT:
WHERE'S WALDO?
--------------
GARAGE

Sorry, we couldn't resist. We promise no more corny queries. (We're saving those for that SQL bathroom humor book everyone's been wanting.) Nevertheless, this query shows that the column used in the condition of the WHERE statement does not have to be mentioned in the SELECT clause. In this example you selected the location column but used WHERE on the name, which is perfectly legal. Also notice the AS on the SELECT line. AS is an optional assignment operator, assigning the alias WHERE'S WALDO? to LOCATION. You might never see the AS again, because it involves extra typing. In most implementations of SQL you can type

INPUT:
SQL> SELECT LOCATION "WHERE'S WALDO?"
  2  FROM PUZZLE
  3  WHERE NAME ='WALDO';

and get the same result as the previous query without using AS:

OUTPUT:
WHERE'S WALDO?
--------------
GARAGE

After SELECT and FROM, WHERE is the third most frequently used SQL term.

The STARTING WITH Clause

STARTING WITH is an addition to the WHERE clause that works exactly like LIKE(<exp>%). Compare the results of the following query:

INPUT:
SELECT PAYEE, AMOUNT, REMARKS
FROM CHECKS
WHERE PAYEE LIKE('Ca%');
OUTPUT:
PAYEE                         AMOUNT REMARKS
==================== =============== ==============
Cash                              25 Wild Night Out
Cash                              60 Trip to Boston
Cash                              34 Trip to Dayton

with the results from this query:

INPUT:
SELECT PAYEE, AMOUNT, REMARKS
FROM CHECKS
WHERE PAYEE STARTING WITH('Ca');
OUTPUT:
PAYEE                         AMOUNT REMARKS
==================== =============== ==============
Cash                              25 Wild Night Out
Cash                              60 Trip to Boston
Cash                              34 Trip to Dayton

The results are identical. You can even use them together, as shown here:

INPUT:
SELECT PAYEE, AMOUNT, REMARKS
FROM CHECKS
WHERE PAYEE STARTING WITH('Ca')
OR
REMARKS LIKE 'G%';
OUTPUT:
PAYEE                         AMOUNT REMARKS
==================== =============== ===============
Local Utilities                   98 Gas
Joes Stale $ Dent                150 Groceries
Cash                              25 Wild Night Out
Joans Gas                       25.1 Gas
Cash                              60 Trip to Boston
Cash                              34 Trip to Dayton
Joans Gas                      15.75 Gas


WARNING: STARTING WITH is a common feature of many implementations of SQL. Check your implementation before you grow fond of it.

Order from Chaos: The ORDER BY Clause

From time to time you will want to present the results of your query in some kind of order. As you know, however, SELECT FROM gives you a listing, and unless you have defined a primary key (see Day 10, "Creating Views and Indexes"), your query comes out in the order the rows were entered. Consider a beefed-up CHECKS table:

INPUT:
SQL> SELECT * FROM CHECKS;
OUTPUT:
  CHECK# PAYEE                  AMOUNT REMARKS
-------- -------------------- -------- ------------------
       1 Ma Bell                   150 Have sons next time
       2 Reading R.R.           245.34 Train to Chicago
       3 Ma Bell                200.32 Cellular Phone
       4 Local Utilities            98 Gas
       5 Joes Stale $ Dent         150 Groceries
      16 Cash                       25 Wild Night Out
      17 Joans Gas                25.1 Gas
       9 Abes Cleaners           24.35 X-Tra Starch       
      20 Abes Cleaners            10.5 All Dry Clean
       8 Cash                       60 Trip to Boston
      21 Cash                       34 Trip to Dayton
11 rows selected.
ANALYSIS:

You're going to have to trust me on this one, but the order of the output is exactly the same order as the order in which the data was entered. After you read Day 8, "Manipulating Data," and know how to use INSERT to create tables, you can test how data is ordered by default on your own.

The ORDER BY clause gives you a way of ordering your results. For example, to order the preceding listing by check number, you would use the following ORDER BY clause:

INPUT:
SQL> SELECT *
  2  FROM CHECKS
  3  ORDER BY CHECK#;
OUTPUT:
  CHECK# PAYEE                  AMOUNT REMARKS
-------- -------------------- -------- ------------------
       1 Ma Bell                   150 Have sons next time
       2 Reading R.R.           245.34 Train to Chicago
       3 Ma Bell                200.32 Cellular Phone
       4 Local Utilities            98 Gas
       5 Joes Stale $ Dent         150 Groceries
       8 Cash                       60 Trip to Boston
       9 Abes Cleaners           24.35 X-Tra Starch
      16 Cash                       25 Wild Night Out
      17 Joans Gas                25.1 Gas
      20 Abes Cleaners            10.5 All Dry Clean
      21 Cash                       34 Trip to Dayton
11 rows selected.

Now the data is ordered the way you want it, not the way in which it was entered. As the following example shows, ORDER requires BY; BY is not optional.

INPUT/OUTPUT:
SQL> SELECT * FROM CHECKS ORDER CHECK#;

SELECT * FROM CHECKS ORDER CHECK#
                           *
ERROR at line 1:
ORA-00924: missing BY keyword

What if you want to list the data in reverse order, with the highest number or letter first? You're in luck! The following query generates a list of PAYEEs that stars at the end of the alphabet:

INPUT/OUTPUT:
SQL> SELECT *
  2  FROM CHECKS
  3  ORDER BY PAYEE DESC;

  CHECK# PAYEE                  AMOUNT REMARKS
-------- -------------------- -------- ------------------
       2 Reading R.R.           245.34 Train to Chicago
       1 Ma Bell                   150 Have sons next time
       3 Ma Bell                200.32 Cellular Phone
       4 Local Utilities            98 Gas
       5 Joes Stale $ Dent         150 Groceries
      17 Joans Gas                25.1 Gas
      16 Cash                       25 Wild Night Out
       8 Cash                       60 Trip to Boston
      21 Cash                       34 Trip to Dayton
       9 Abes Cleaners           24.35 X-Tra Starch
      20 Abes Cleaners            10.5 All Dry Clean

11 rows selected.
ANALYSIS:

The DESC at the end of the ORDER BY clause orders the list in descending order instead of the default (ascending) order. The rarely used, optional keyword ASC appears in the following statement:

INPUT:
SQL> SELECT PAYEE, AMOUNT
  2  FROM CHECKS
  3  ORDER BY CHECK# ASC;
OUTPUT:
PAYEE                   AMOUNT
-------------------- ---------
Ma Bell                    150
Reading R.R.            245.34
Ma Bell                 200.32
Local Utilities             98
Joes Stale $ Dent          150
Cash                        60
Abes Cleaners            24.35
Cash                        25
Joans Gas                 25.1
Abes Cleaners             10.5
Cash                        34

11 rows selected.
ANALYSIS:

The ordering in this list is identical to the ordering of the list at the beginning of the section (without ASC) because ASC is the default. This query also shows that the expression used after the ORDER BY clause does not have to be in the SELECT statement. Although you selected only PAYEE and AMOUNT, you were still able to order the list by CHECK#.

You can also use ORDER BY on more than one field. To order CHECKS by PAYEE and REMARKS, you would query as follows:

INPUT:
SQL> SELECT *
  2  FROM CHECKS
  3  ORDER BY PAYEE, REMARKS;
OUTPUT:
  CHECK# PAYEE                  AMOUNT REMARKS
-------- -------------------- -------- ------------------
      20 Abes Cleaners            10.5 All Dry Clean
       9 Abes Cleaners           24.35 X-Tra Starch
       8 Cash                       60 Trip to Boston
      21 Cash                       34 Trip to Dayton
      16 Cash                       25 Wild Night Out
      17 Joans Gas                25.1 Gas
       5 Joes Stale $ Dent         150 Groceries
       4 Local Utilities            98 Gas
       3 Ma Bell                200.32 Cellular Phone
       1 Ma Bell                   150 Have sons next time
       2 Reading R.R.           245.34 Train to Chicago
ANALYSIS:

Notice the entries for Cash in the PAYEE column. In the previous ORDER BY, the CHECK#s were in the order 16, 21, 8. Adding the field REMARKS to the ORDER BY clause puts the entries in alphabetical order according to REMARKS. Does the order of multiple columns in the ORDER BY clause make a difference? Try the same query again but reverse PAYEE and REMARKS:

INPUT:
SQL> SELECT *
  2  FROM CHECKS
  3  ORDER BY REMARKS, PAYEE;
OUTPUT:
  CHECK# PAYEE                  AMOUNT REMARKS
-------- -------------------- -------- --------------------
      20 Abes Cleaners            10.5 All Dry Clean
       3 Ma Bell                200.32 Cellular Phone
      17 Joans Gas                25.1 Gas
       4 Local Utilities            98 Gas
       5 Joes Stale $ Dent         150 Groceries
       1 Ma Bell                   150 Have sons next time
       2 Reading R.R.           245.34 Train to Chicago
       8 Cash                       60 Trip to Boston
      21 Cash                       34 Trip to Dayton
      16 Cash                       25 Wild Night Out
       9 Abes Cleaners           24.35 X-Tra Starch

11 rows selected.
ANALYSIS:

As you probably guessed, the results are completely different. Here's how to list one column in alphabetical order and list the second column in reverse alphabetical order:

INPUT/OUTPUT:
SQL> SELECT *
  2  FROM CHECKS
  3  ORDER BY PAYEE ASC, REMARKS DESC;

  CHECK# PAYEE                  AMOUNT REMARKS
-------- -------------------- -------- ------------------
       9 Abes Cleaners           24.35 X-Tra Starch
      20 Abes Cleaners            10.5 All Dry Clean
      16 Cash                       25 Wild Night Out
      21 Cash                       34 Trip to Dayton
       8 Cash                       60 Trip to Boston
      17 Joans Gas                25.1 Gas
       5 Joes Stale $ Dent         150 Groceries
       4 Local Utilities            98 Gas
       1 Ma Bell                   150 Have sons next time
       3 Ma Bell                200.32 Cellular Phone
       2 Reading R.R.           245.34 Train to Chicago

11 rows selected.
ANALYSIS:

In this example PAYEE is sorted alphabetically, and REMARKS appears in descending order. Note how the remarks in the three checks with a PAYEE of Cash are sorted.


TIP: If you know that a column you want to order your results by is the first column in a table, then you can type ORDER BY 1 in place of spelling out the column name. See the following example.

INPUT/OUTPUT:
SQL> SELECT *
  2  FROM CHECKS
  3  ORDER BY 1;

  CHECK# PAYEE                  AMOUNT REMARKS
-------- -------------------- -------- ------------------
       1 Ma Bell                   150 Have sons next time
       2 Reading R.R.           245.34 Train to Chicago
       3 Ma Bell                200.32 Cellular Phone
       4 Local Utilities            98 Gas
       5 Joes Stale $ Dent         150 Groceries
       8 Cash                       60 Trip to Boston
       9 Abes Cleaners           24.35 X-Tra Starch
      16 Cash                       25 Wild Night Out
      17 Joans Gas                25.1 Gas
      20 Abes Cleaners            10.5 All Dry Clean
      21 Cash                       34 Trip to Dayton

11 rows selected.
ANALYSIS:

This result is identical to the result produced by the SELECT statement that you used earlier today:

SELECT * FROM CHECKS ORDER BY CHECK#;

The GROUP BY Clause

On Day 3 you learned how to use aggregate functions (COUNT, SUM, AVG, MIN, and MAX). If you wanted to find the total amount of money spent from the slightly changed CHECKS table, you would type:

INPUT:
SELECT *
FROM CHECKS;

Here's the modified table:

OUTPUT:
CHECKNUM PAYEE                AMOUNT  REMARKS
======== =========== ===============  ======================

       1 Ma Bell                 150  Have sons next time
       2 Reading R.R.         245.34  Train to Chicago
       3 Ma Bell              200.33  Cellular Phone
       4 Local Utilities          98  Gas
       5 Joes Stale $ Dent       150  Groceries
      16 Cash                     25  Wild Night Out
      17 Joans Gas              25.1  Gas
       9 Abes Cleaners         24.35  X-Tra Starch
      20 Abes Cleaners          10.5  All Dry Clean
       8 Cash                     60  Trip to Boston
      21 Cash                     34  Trip to Dayton
      30 Local Utilities        87.5  Water
      31 Local Utilities          34  Sewer
      25 Joans Gas             15.75  Gas

Then you would type:

INPUT/OUTPUT:
SELECT SUM(AMOUNT)
FROM CHECKS;

            SUM
===============

        1159.87
ANALYSIS:

This statement returns the sum of the column AMOUNT. What if you wanted to find out how much you have spent on each PAYEE? SQL helps you with the GROUP BY clause. To find out whom you have paid and how much, you would query like this:

INPUT/OUTPUT:
SELECT PAYEE, SUM(AMOUNT)
FROM CHECKS
GROUP BY PAYEE;

PAYEE                            SUM
==================== ===============

Abes Cleaners              34.849998
Cash                             119
Joans Gas                  40.849998
Joes Stale $ Dent                150
Local Utilities                219.5
Ma Bell                    350.33002
Reading R.R.                  245.34
ANALYSIS:

The SELECT clause has a normal column selection, PAYEE, followed by the aggregate function SUM(AMOUNT). If you had tried this query with only the FROM CHECKS that follows, here's what you would see:

INPUT/OUTPUT:
SELECT PAYEE, SUM(AMOUNT)
FROM CHECKS;

Dynamic SQL Error
-SQL error code = -104
-invalid column reference
ANALYSIS:

SQL is complaining about the combination of the normal column and the aggregate function. This condition requires the GROUP BY clause. GROUP BY runs the aggregate function described in the SELECT statement for each grouping of the column that follows the GROUP BY clause. The table CHECKS returned 14 rows when queried with SELECT * FROM CHECKS. The query on the same table, SELECT PAYEE, SUM(AMOUNT) FROM CHECKS GROUP BY PAYEE, took the 14 rows in the table and made seven groupings, returning the SUM of each grouping.

Suppose you wanted to know how much you gave to whom with how many checks. Can you use more than one aggregate function?

INPUT/OUTPUT:
SELECT PAYEE, SUM(AMOUNT), COUNT(PAYEE)
FROM CHECKS
GROUP BY PAYEE;

PAYEE                            SUM       COUNT
==================== =============== ===========

Abes Cleaners              34.849998           2
Cash                             119           3
Joans Gas                  40.849998           2
Joes Stale $ Dent                150           1
Local Utilities                219.5           3
Ma Bell                    350.33002           2
Reading R.R.                  245.34           1
ANALYSIS:

This SQL is becoming increasingly useful! In the preceding example, you were able to perform group functions on unique groups using the GROUP BY clause. Also notice that the results were ordered by payee. GROUP BY also acts like the ORDER BY clause. What would happen if you tried to group by more than one column? Try this:

INPUT/OUTPUT:
SELECT PAYEE, SUM(AMOUNT), COUNT(PAYEE)
FROM CHECKS
GROUP BY PAYEE, REMARKS;

PAYEE                            SUM       COUNT
==================== =============== ===========

Abes Cleaners                   10.5           1
Abes Cleaners                  24.35           1
Cash                              60           1
Cash                              34           1
Cash                              25           1
Joans Gas                  40.849998           2
Joes Stale $ Dent                150           1
Local Utilities                   98           1
Local Utilities                   34           1
Local Utilities                 87.5           1
Ma Bell                       200.33           1
Ma Bell                          150           1
Reading R.R.                  245.34           1
ANALYSIS:

The output has gone from 7 groupings of 14 rows to 13 groupings. What is different about the one grouping with more than one check associated with it? Look at the entries for Joans Gas:

INPUT/OUTPUT:
SELECT PAYEE, REMARKS
FROM CHECKS
WHERE PAYEE = 'Joans Gas';

PAYEE                REMARKS
==================== ====================

Joans Gas            Gas
Joans Gas            Gas
ANALYSIS:

You see that the combination of PAYEE and REMARKS creates identical entities, which SQL groups together into one line with the GROUP BY clause. The other rows produce unique combinations of PAYEE and REMARKS and are assigned their own unique groupings.

The next example finds the largest and smallest amounts, grouped by REMARKS:

INPUT/OUTPUT:
SELECT MIN(AMOUNT), MAX(AMOUNT)
FROM CHECKS
GROUP BY REMARKS;

            MIN             MAX
=============== ===============

         245.34          245.34
           10.5            10.5
         200.33          200.33
          15.75              98
            150             150
            150             150
             34              34
             60              60
             34              34
           87.5            87.5
             25              25
          24.35           24.35

Here's what will happen if you try to include in the select statement a column that has several different values within the group formed by GROUP BY:

INPUT/OUTPUT:
SELECT PAYEE, MAX(AMOUNT), MIN(AMOUNT)
FROM CHECKS
GROUP BY REMARKS;

Dynamic SQL Error
-SQL error code = -104
-invalid column reference
ANALYSIS:

This query tries to group CHECKS by REMARK. When the query finds two records with the same REMARK but different PAYEEs, such as the rows that have GAS as a REMARK but have PAYEEs of LOCAL UTILITIES and JOANS GAS, it throws an error.

The rule is, Don't use the SELECT statement on columns that have multiple values for the GROUP BY clause column. The reverse is not true. You can use GROUP BY on columns not mentioned in the SELECT statement. For example:

INPUT/OUTPUT:
SELECT PAYEE, COUNT(AMOUNT)
FROM CHECKS
GROUP BY PAYEE, AMOUNT;

PAYEE                      COUNT
==================== ===========

Abes Cleaners                  1
Abes Cleaners                  1
Cash                           1
Cash                           1
Cash                           1
Joans Gas                      1
Joans Gas                      1
Joes Stale $ Dent              1
Local Utilities                1
Local Utilities                1
Local Utilities                1
Ma Bell                        1
Ma Bell                        1
Reading R.R.                   1
ANALYSIS:

This silly query shows how many checks you had written for identical amounts to the same PAYEE. Its real purpose is to show that you can use AMOUNT in the GROUP BY clause, even though it is not mentioned in the SELECT clause. Try moving AMOUNT out of the GROUP BY clause and into the SELECT clause, like this:

INPUT/OUTPUT:
SELECT PAYEE, AMOUNT, COUNT(AMOUNT)
FROM CHECKS
GROUP BY PAYEE;

Dynamic SQL Error
-SQL error code = -104
-invalid column reference
ANALYSIS:

SQL cannot run the query, which makes sense if you play the part of SQL for a moment. Say you had to group the following lines:

INPUT/OUTPUT:
SELECT PAYEE, AMOUNT, REMARKS
FROM CHECKS
WHERE PAYEE ='Cash';

PAYEE                         AMOUNT REMARKS
==================== =============== ===============

Cash                              25 Wild Night Out
Cash                              60 Trip to Boston
Cash                              34 Trip to Dayton

If the user asked you to output all three columns and group by PAYEE only, where would you put the unique remarks? Remember you have only one row per group when you use GROUP BY. SQL can't do two things at once, so it complains: Error #31: Can't do two things at once.

The HAVING Clause

How can you qualify the data used in your GROUP BY clause? Use the table ORGCHART and try this:

INPUT:
SELECT * FROM ORGCHART;
OUTPUT:
NAME            TEAM          SALARY   SICKLEAVE ANNUALLEAVE
=============== ======== =========== =========== ===========

ADAMS           RESEARCH    34000.00          34          12
WILKES          MARKETING   31000.00          40           9
STOKES          MARKETING   36000.00          20          19
MEZA            COLLECTIONS 40000.00          30          27
MERRICK         RESEARCH    45000.00          20          17
RICHARDSON      MARKETING   42000.00          25          18
FURY            COLLECTIONS 35000.00          22          14
PRECOURT        PR          37500.00          24          24

If you wanted to group the output into divisions and show the average salary in each division, you would type:

INPUT/OUTPUT:
SELECT TEAM, AVG(SALARY)
FROM ORGCHART
GROUP BY TEAM;

TEAM                    AVG
=============== ===========

COLLECTIONS        37500.00
MARKETING          36333.33
PR                 37500.00
RESEARCH           39500.00

The following statement qualifies this query to return only those departments with average salaries under 38000:

INPUT/OUTPUT:
SELECT TEAM, AVG(SALARY)
FROM ORGCHART
WHERE AVG(SALARY) < 38000
GROUP BY TEAM;

Dynamic SQL Error
-SQL error code = -104

-Invalid aggregate reference
ANALYSIS:

This error occurred because WHERE does not work with aggregate functions. To make this query work, you need something new: the HAVING clause. If you type the following query, you get what you ask for:

INPUT/OUTPUT:
SELECT TEAM, AVG(SALARY)
FROM ORGCHART
GROUP BY TEAM
HAVING AVG(SALARY) < 38000;
TEAM                    AVG
=============== ===========

COLLECTIONS        37500.00
MARKETING          36333.33
PR                 37500.00
ANALYSIS:

HAVING enables you to use aggregate functions in a comparison statement, providing for aggregate functions what WHERE provides for individual rows. Does HAVING work with nonaggregate expressions? Try this:

INPUT/OUTPUT:
SELECT TEAM, AVG(SALARY)
FROM ORGCHART
GROUP BY TEAM
HAVING SALARY < 38000;

TEAM                    AVG
=============== ===========

PR                 37500.00
ANALYSIS:

Why is this result different from the last query? The HAVING AVG(SALARY) < 38000 clause evaluated each grouping and returned only those with an average salary of under 38000, just what you expected. HAVING SALARY < 38000, on the other hand, had a different outcome. Take on the role of the SQL engine again. If the user asks you to evaluate and return groups of divisions where SALARY < 38000, you would examine each group and reject those where an individual SALARY is greater than 38000. In each division except PR, you would find at least one salary greater than 38000:

INPUT/OUTPUT:
SELECT NAME, TEAM, SALARY
FROM ORGCHART
ORDER BY TEAM;

NAME            TEAM                 SALARY
=============== =============== ===========

FURY            COLLECTIONS        35000.00
MEZA            COLLECTIONS        40000.00
WILKES          MARKETING          31000.00
STOKES          MARKETING          36000.00
RICHARDSON      MARKETING          42000.00
PRECOURT        PR                 37500.00
ADAMS           RESEARCH           34000.00
MERRICK         RESEARCH           45000.00
ANALYSIS:

Therefore, you would reject all other groups except PR. What you really asked was Select all groups where no individual makes more than 38000. Don't you just hate it when the computer does exactly what you tell it to?


WARNING: Some implementations of SQL return an error if you use anything other than an aggregate function in a HAVING clause. Don't bet the farm on using the previous example until you check the implementation of the particular SQL you use.

Can you use more than one condition in your HAVING clause? Try this:

INPUT:
SELECT TEAM, AVG(SICKLEAVE),AVG(ANNUALLEAVE)
FROM ORGCHART
GROUP BY TEAM
HAVING AVG(SICKLEAVE)>25 AND
AVG(ANNUALLEAVE)<20;
ANALYSIS:

The following table is grouped by TEAM. It shows all the teams with SICKLEAVE averages above 25 days and ANNUALLEAVE averages below 20 days.

OUTPUT:
TEAM                    AVG         AVG
=============== =========== ===========

MARKETING                28          15
RESEARCH                 27          15

You can also use an aggregate function in the HAVING clause that was not in the SELECT statement. For example:

INPUT/OUTPUT:
SELECT TEAM, AVG(SICKLEAVE),AVG(ANNUALLEAVE)
FROM ORGCHART
GROUP BY TEAM
HAVING COUNT(TEAM) > 1;

TEAM                    AVG         AVG
=============== =========== ===========

COLLECTIONS              26          21
MARKETING                28          15
RESEARCH                 27          15
ANALYSIS:

This query returns the number of TEAMs with more than one member. COUNT(TEAM) is not used in the SELECT statement but still functions as expected in the HAVING clause.

The other logical operators all work well within the HAVING clause. Consider this:

INPUT/OUTPUT:
SELECT TEAM,MIN(SALARY),MAX(SALARY)
FROM ORGCHART
GROUP BY TEAM
HAVING AVG(SALARY) > 37000
OR
MIN(SALARY) > 32000;

TEAM                    MIN         MAX
=============== =========== ===========

COLLECTIONS        35000.00    40000.00
PR                 37500.00    37500.00
RESEARCH           34000.00    45000.00

The operator IN also works in a HAVING clause, as demonstrated here:

INPUT/OUTPUT:
SELECT TEAM,AVG(SALARY)
FROM ORGCHART
GROUP BY TEAM
HAVING TEAM IN ('PR','RESEARCH');

TEAM                    AVG
=============== ===========

PR                 37500.00
RESEARCH           39500.00

Combining Clauses

Nothing exists in a vacuum, so this section takes you through some composite examples that demonstrate how combinations of clauses perform together.

Example 5.1

Find all the checks written for Cash and Gas in the CHECKS table and order them by REMARKS.

INPUT:
SELECT PAYEE, REMARKS
FROM CHECKS
WHERE PAYEE = 'Cash'
OR REMARKS LIKE'Ga%'
ORDER BY REMARKS;
OUTPUT:
PAYEE                REMARKS
==================== ====================

Joans Gas            Gas
Joans Gas            Gas
Local Utilities      Gas
Cash                 Trip to Boston
Cash                 Trip to Dayton
Cash                 Wild Night Out
ANALYSIS:

Note the use of LIKE to find the REMARKS that started with Ga. With the use of OR, data was returned if the WHERE clause met either one of the two conditions.

What if you asked for the same information and group it by PAYEE? The query would look something like this:

INPUT:
SELECT PAYEE, REMARKS
FROM CHECKS
WHERE PAYEE = 'Cash'
OR REMARKS LIKE'Ga%'
GROUP BY PAYEE
ORDER BY REMARKS;
ANALYSIS:

This query would not work because the SQL engine would not know what to do with the remarks. Remember that whatever columns you put in the SELECT clause must also be in the GROUP BY clause--unless you don't specify any columns in the SELECT clause.

Example 5.2

Using the table ORGCHART, find the salary of everyone with less than 25 days of sick leave. Order the results by NAME.

INPUT:
SELECT NAME, SALARY
FROM ORGCHART
WHERE SICKLEAVE < 25
ORDER BY NAME;
OUTPUT:
NAME                 SALARY
=============== ===========
FURY               35000.00
MERRICK            45000.00
PRECOURT           37500.00
STOKES             36000.00
ANALYSIS:

This query is straightforward and enables you to use your new-found skills with WHERE and ORDER BY.

Example 5.3

Again, using ORGCHART, display TEAM, AVG(SALARY), AVG(SICKLEAVE), and AVG(ANNUALLEAVE) on each team:

INPUT:
SELECT TEAM,
AVG(SALARY),
AVG(SICKLEAVE),
AVG(ANNUALLEAVE)
FROM ORGCHART
GROUP BY TEAM;
OUTPUT:
TEAM                    AVG         AVG         AVG
=============== =========== =========== ===========
COLLECTIONS        37500.00          26          21
MARKETING          36333.33          28          15
PR                 37500.00          24          24
RESEARCH           39500.00          26          15

An interesting variation on this query follows. See if you can figure out what happened:

INPUT/OUTPUT:
SELECT TEAM,
AVG(SALARY),
AVG(SICKLEAVE),
AVG(ANNUALLEAVE)
FROM ORGCHART
GROUP BY TEAM
ORDER BY NAME;

TEAM                    AVG         AVG         AVG
=============== =========== =========== ===========
RESEARCH           39500.00          27          15
COLLECTIONS        37500.00          26          21
PR                 37500.00          24          24

MARKETING          36333.33          28          15

A simpler query using ORDER BY might offer a clue:

INPUT/OUTPUT:
SELECT NAME, TEAM
FROM ORGCHART
ORDER BY NAME, TEAM;

NAME            TEAM
=============== ===========
ADAMS           RESEARCH
FURY            COLLECTIONS
MERRICK         RESEARCH
MEZA            COLLECTIONS
PRECOURT        PR
RICHARDSON      MARKETING
STOKES          MARKETING
WILKES          MARKETING
ANALYSIS:

When the SQL engine got around to ordering the results of the query, it used the NAME column (remember, it is perfectly legal to use a column not specified in the SELECT statement), ignored duplicate TEAM entries, and came up with the order RESEARCH, COLLECTIONS, PR, and MARKETING. Including TEAM in the ORDER BY clause is unnecessary, because you have unique values in the NAME column. You can get the same result by typing this statement:

INPUT/OUTPUT:
SELECT NAME, TEAM
FROM ORGCHART
ORDER BY NAME;

NAME            TEAM
=============== ============
ADAMS           RESEARCH
FURY            COLLECTIONS
MERRICK         RESEARCH
MEZA            COLLECTIONS
PRECOURT        PR
RICHARDSON      MARKETING
STOKES          MARKETING
WILKES          MARKETING

While you are looking at variations, don't forget you can also reverse the order:

INPUT/OUTPUT:
SELECT NAME, TEAM
FROM ORGCHART
ORDER BY NAME DESC;

NAME            TEAM
=============== ============
WILKES          MARKETING
STOKES          MARKETING
RICHARDSON      MARKETING
PRECOURT        PR
MEZA            COLLECTIONS
MERRICK         RESEARCH
FURY            COLLECTIONS
ADAMS           RESEARCH

Example 5.4: The Big Finale

Is it possible to use everything you have learned in one query? It is, but the results will be convoluted because in many ways you are working with apples and oranges--or aggregates and nonaggregates. For example, WHERE and ORDER BY are usually found in queries that act on single rows, such as this:

INPUT/OUTPUT:
SELECT *
FROM ORGCHART
ORDER BY NAME DESC;
NAME            TEAM          SALARY   SICKLEAVE ANNUALLEAVE
=============== ======== =========== =========== ===========
WILKES          MARKETING   31000.00          40           9
STOKES          MARKETING   36000.00          20          19
RICHARDSON      MARKETING   42000.00          25          18
PRECOURT        PR          37500.00          24          24
MEZA            COLLECTIONS 40000.00          30          27
MERRICK         RESEARCH    45000.00          20          17
FURY            COLLECTIONS 35000.00          22          14
ADAMS           RESEARCH    34000.00          34          12

GROUP BY and HAVING are normally seen in the company of aggregates:

INPUT/OUTPUT:
SELECT PAYEE,
SUM(AMOUNT) TOTAL,
COUNT(PAYEE) NUMBER_WRITTEN
FROM CHECKS
GROUP BY PAYEE
HAVING SUM(AMOUNT) > 50;

PAYEE                          TOTAL NUMBER_WRITTEN
==================== =============== ==============
Cash                             119              3
Joes Stale $ Dent                150              1
Local Utilities                219.5              3
Ma Bell                    350.33002              2
Reading R.R.                  245.34              1

You have seen that combining these two groups of clauses can have unexpected results, including the following:

INPUT:
SELECT PAYEE,
SUM(AMOUNT) TOTAL,
COUNT(PAYEE) NUMBER_WRITTEN
FROM CHECKS
WHERE AMOUNT >= 100
GROUP BY PAYEE
HAVING SUM(AMOUNT) > 50;
OUTPUT:
PAYEE                          TOTAL NUMBER_WRITTEN
==================== =============== ==============
Joes Stale $ Dent                150              1
Ma Bell                    350.33002              2
Reading R.R.                  245.34              1

Compare these two result sets and examine the raw data:

INPUT/OUTPUT:
SELECT PAYEE, AMOUNT
FROM CHECKS
ORDER BY PAYEE;

PAYEE                         AMOUNT
==================== ===============
Abes Cleaners                   10.5
Abes Cleaners                  24.35
Cash                              25
Cash                              34
Cash                              60
Joans Gas                      15.75
Joans Gas                       25.1
Joes Stale $ Dent                150
Local Utilities                   34
Local Utilities                 87.5
Local Utilities                   98
Ma Bell                          150
Ma Bell                       200.33
Reading R.R.                  245.34
ANALYSIS:

You see how the WHERE clause filtered out all the checks less than 100 dollars before the GROUP BY was performed on the query. We are not trying to tell you not to mix these groups--you may have a requirement that this sort of construction will meet. However, you should not casually mix aggregate and nonaggregate functions. The previous examples have been tables with only a handful of rows. (Otherwise, you would need a cart to carry this book.) In the real world you will be working with thousands and thousands (or billions and billions) of rows, and the subtle changes caused by mixing these clauses might not be so apparent.

Summary

Today you learned all the clauses you need to exploit the power of a SELECT statement. Remember to be careful what you ask for because you just might get it. Your basic SQL education is complete. You already know enough to work effectively with single tables. Tomorrow (Day 6, "Joining Tables") you will have the opportunity to work with multiple tables.

Q&A

Q I thought we covered some of these functions earlier this week? If so, why are we covering them again?

A We did indeed cover WHERE on Day 3. You needed a knowledge of WHERE to understand how certain operators worked. WHERE appears again today because it is a clause, and today's topic is clauses.

Workshop

The Workshop provides quiz questions to help solidify your understanding of the material covered, as well as exercises to provide you with experience in using what you have learned. Try to answer the quiz and exercise questions before checking the answers in Appendix F, "Answers to Quizzes and Exercises."

Quiz

1. Which clause works just like LIKE(<exp>%)?

2. What is the function of the GROUP BY clause, and what other clause does it act like?

3. Will this SELECT work?

INPUT:
SQL> SELECT NAME, AVG(SALARY), DEPARTMENT
     FROM PAY_TBL
     WHERE DEPARTMENT = 'ACCOUNTING'
     ORDER BY NAME
     GROUP BY DEPARTMENT, SALARY;
4. When using the HAVING clause, do you always have to use a GROUP BY also?

5. Can you use ORDER BY on a column that is not one of the columns in the SELECT statement?

Exercises

1. Using the ORGCHART table from the preceding examples, find out how many people on each team have 30 or more days of sick leave.

2. Using the CHECKS table, write a SELECT that will return the following:

OUTPUT:



CHECK#_____PAYEE_______AMOUNT

   1     MA BELL         150


Previous chapterNext chapterContents


© Copyright, Macmillan Computer Publishing. All rights reserved.