-- 15 --

More Sophisticated Queries

You have already seen how the SELECT statement can retrieve records from a single table, but retrieving records is only one of the many features of this versatile statement. This chapter examines the more advanced features of the SELECT statement: the GROUP BY clause, the HAVING clause, and the join operation. This chapter shows you how to harness the full power of the SELECT statement.

The syntax of the SELECT statement is fairly difficult to decipher, and you can't really understand the use of its clauses just by studying the syntax diagrams shown in the Oracle documentation. I use the many examples in this chapter to show you what works and what doesn't.

Built-In Group Functions

The first subject for discussion is built-in functions that operate on groups of rows. Be warned: This topic is a good example of SQL's quirky characteristics. Even though these functions are group functions, they do not require the use of a GROUP BY clause.

Each of these functions returns a single value.

The COUNT Function

The COUNT function comes in two flavors: COUNT(*), which counts all the rows in a table that satisfy any specified criteria, and COUNT(column-name), which counts all the rows in a table that have a non-null value for column-name and satisfy any specified criteria.

The following code fragment uses COUNT(*) to count the number of customers who live in California:

SQL> select count(*)

  2  from Customer

  3  where

  4  State = 'CA';

 COUNT(*)

---------

        7

To count the number of customers who have a fax number, you need to supply Fax_Number as an argument to the COUNT function.

SQL> select count(Fax_Number)

  2  from Customer;

COUNT(FAX_NUMBER)

-----------------

                7

You can query the Customer table to verify that it contains only seven customers with fax numbers.

SQL> select Fax_Number

  2  from Customer;

FAX_NUMBER

----------

7145550123

5035551283

8015558194

3015558331

8085558183

8105558356

8105554199

15 rows selected.

The blank lines returned by the SELECT statement represent the null value returned for customer rows in which Fax_Number is NULL.

Observe that the COUNT function returns a single row--;even if the count is zero.

SQL> select count(*)

  2  from Employee

  3  where

  4  Monthly_Salary > 100000;

 COUNT(*)

---------

        0

Obtaining Maximum and Minimum Values

You can combine the MAX and MIN functions in a single SELECT statement. Here is an example:

SQL> select max(Initial_Retail_Value), min(Initial_Retail_Value)

  2  from Product;

MAX(INITIAL_RETAIL_VALUE) MIN(INITIAL_RETAIL_VALUE)

------------------------- -------------------------

                     2150                       185

You can use MAX and MIN with character values.

SQL> select min(Product_ID), max(Product_ID)

  2  from Product;

MIN(PRODUCT_ MAX(PRODUCT_

------------ ------------

A2001        TR901

You can also use MAX and MIN with date values.

SQL> select min(Hire_Date), max(Hire_Date)

  2  from Employee;

MIN(HIRE_ MAX(HIRE_

--------- ---------

10-APR-82 18-MAY-93

Using AVG and SUM

AVG and SUM work in the same way as MIN and MAX. For instance, you can retrieve the average and total of the estimated repair cost for each item brought in by customers.

SQL> select avg(Estimated_Cost), sum(Estimated_Cost)

  2  from Repair_Item;

AVG(ESTIMATED_COST) SUM(ESTIMATED_COST)

------------------- -------------------

              167.4                 837

Here's a look at the data that the average calculation is based upon.

SQL> select Estimated_Cost

  2  from Repair_Item;

ESTIMATED_COST

--------------

           231

            83

           109

           102

           312

6 rows selected.

As you see, even though the Repair_Item table contains six rows, Estimated_Cost has only five non-null values. The AVG function does not count null values in calculating the average: (231+83+109+102+312)/5 = 167.4.

Combining Group Functions with Other Columns

You cannot combine group functions and columns in the select list of a SELECT statement without using a GROUP BY clause. If you try to do so, Oracle returns the following error:

SQL> select max(Initial_Retail_Value), Manufacturer_ID

  2  from Product;

select max(Initial_Retail_Value), Manufacturer_ID

                                  *

ERROR at line 1:

ORA-00937: not a single-group group function

What is Oracle trying to tell you? It helps to think in these terms: A group function returns a single row, but Manufacturer_ID returns as many rows as exist in the table. It simply doesn't make sense to return both of these values at the same time--;instead, Oracle returns an error message. However, by using a GROUP BY clause, you can combine group functions and columns in the select list. I explain the use of the GROUP BY clause in the next major section.

Looking for Distinct Rows

The SELECT statement has an optional keyword that I haven't mentioned yet: DISTINCT. This keyword follows SELECT and instructs Oracle to return only rows that have distinct values for the specified columns. As an example, you can obtain a list of the states in which customers live by using the following statement:

SQL> select distinct state from customer;

ST

--

AZ

CA

IL

MA

MI

OR

WA

7 rows selected.

The DISTINCT option is very useful for finding a column's set of values. It offers a method that you can use for quickly determining how values are clustered in a table. If you don't specify the DISTINCT option, Oracle retrieves all rows that satisfy the criteria in the WHERE clause. By default, Oracle uses the ALL option to retrieve all rows. Specifically, the following SELECT statements are equivalent:

select state from customer;

and

select ALL state from customer;

Grouping Rows

The GROUP BY clause is another section of the SELECT statement. This optional clause tells Oracle to group rows based on the distinct values that exist for the specified columns. In addition, the HAVING clause can be used in conjunction with the GROUP BY clause to further restrict the retrieved rows. This topic is best explained by example, but first here's a quick look at the syntax:

SELECT select-list

FROM table-list

[WHERE

condition [AND | OR]

...

condition]

[GROUP BY column1, column2, ..., columnN]

[HAVING condition]

[ORDER BY column1, column2, ...]

The variables are as follows: select-list is a set of columns and expressions from the tables listed in table-list.

table-list is the tables from which rows are retrieved.

condition is a valid Oracle SQL condition.

column1
through columnN are columns contained in table-list.



Tip

You can easily mix these elements into a SELECT statement that Oracle will reject. In addition, you can construct statements that Oracle processes without errors but whose results are difficult to interpret. If you encounter these problems, the best solution is to go back to basics. Study your SELECT statement. If it no longer makes sense, use SQL*Plus to break it down, element by element, until it does. Analyze your use of group functions, the GROUP BY and HAVING clauses, and any join conditions.


Here's a simple example that uses the GROUP BY clause to display a list of Manufacturer IDs found in the Product table.

SQL> select Manufacturer_ID

  2  from Product

  3  group by Manufacturer_ID;

MANUFAC

-------

GOL201

MIT501

SEN101

TES801

You can also add an ORDER BY clause to the statement so that the Manufacturer IDs are displayed in reverse alphabetic order.

SQL> select Manufacturer_ID

  2  from Product

  3  group by Manufacturer_ID

  4  order by Manufacturer_ID desc;

MANUFAC

-------

TES801

SEN101

MIT501

GOL201

If you want to count the number of rows that exist for each Manufacturer ID, you can use the following method:

SQL> select Manufacturer_ID, count(*)

  2  from Product

  3  group by Manufacturer_ID;

MANUFAC  COUNT(*)

------- ---------

GOL201          1

MIT501          2

SEN101          6

TES801          6

Now consider the use of both the GROUP BY and HAVING clauses. You can use the HAVING clause to retrieve only those Manufacturer IDs that have exactly two rows in the Product table. For example:

SQL> select Manufacturer_ID, count(*)

  2  from Product

  3  group by Manufacturer_ID

  4  having count(*) = 2;

MANUFAC  COUNT(*)

------- ---------

MIT501          2

You cannot combine group functions with columns in the select list unless you use the GROUP BY clause. For instance, you can obtain the average initial retail value of products grouped by manufacturer.

SQL> select Manufacturer_ID, avg(Initial_Retail_Value)

  2  from Product

  3  group by Manufacturer_ID;

MANUFAC AVG(INITIAL_RETAIL_VALUE)

------- -------------------------

GOL201                        100

MIT501                        350

SEN101                     776.25

TES801                        551

You can also use the HAVING and GROUP BY clauses to identify duplicate rows. For example, you might not be able to define a primary key for an existing table because the primary key columns contain duplicate values, as shown in the following code:

SQL> alter table Patient add

  2  constraint Patient_PK

  3  primary key (Patient_ID);

alter table Patient add

*

ERROR at line 1:

ORA-02299: cannot add or enable constraint (FRAYED_WIRES.PATIENT_PK)-

duplicate keys found

The following query uses the HAVING and GROUP BY clauses to identify the values of Patient_ID that appear in more than one row, identify which rows are bogus, and add the primary key:

SQL> select Patient_ID

  2  from Patient

  3  having count(*) > 1

  4  group by Patient_ID;

PATIEN

------

GG9999

SQL> select *

  2  from Patient

  3  where

  4  Patient_ID = 'GG9999';

PATIEN BODY_TEMP_DEG_F FEVER_CLASS

------ --------------- --------------------

GG9999           107.6 LETHAL FEVER

GG9999             107

SQL> delete from Patient

  2  where Patient_ID = 'GG9999' and Body_Temp_Deg_F = 107;

1 row deleted.

SQL> alter table Patient add

  2  constraint Patient_PK

  3  primary key (Patient_ID);

Table altered.

Dealing with Hierarchical Information

I've included the topic of hierarchical information in this chapter--;instead of in an earlier discussion of SQL--;to avoid confusion. Let me begin with an explanation of what I mean by hierarchical information. The classic example of relational database tables--;Employee and Department (or EMP and DEPT)--;is hierarchical: A department has many employees. This example is actually a single-level hierarchy. A much better example of hierarchical data is a manufacturing bill of materials (BOM) that describes all the parts, subassemblies, and assemblies that compose a finished product. For example, an aircraft manufacturer might have a BOM that consists of thousands of levels and millions of parts.

Can relational databases in general and SQL in particular support this type of hierarchy? The answer is mixed. The good news is that SQL does provide some support for hierarchical data via the CONNECT BY clause. The bad news is that the support is quite limited, and the syntax is not intuitive. Nevertheless, the following example shows you how to use SQL to navigate through hierarchical data.

To illustrate the use of the CONNECT BY clause, I'll use a slight variation of the familiar Product tables. For this example, the table consists of three columns:

This table design is based upon the concept that every product belongs to the finished product or an assembly--;except for the finished product itself. I've created some data to demonstrate this concept. A camera, the X1000, is composed of several parts and one subassembly, B200. The rows are

SQL> select Assembly_ID, Product_ID, Description

  2  from Product_Assembly

  3  order by Assembly_ID, Product_ID;

ASSEMBLY_ID          PRODUCT_ID   DESCRIPTION

-------------------- ------------ ---------------------------------------------

B200                 I101         Titanium alloy, teflon-coated iris

B200                 S42          Variable-speed shutter, standard

B200                 W123         Manual film advance, winder

X1000                B200         Black body, stainless steel frame camera body

X1000                F100         Blue filter - standard

X1000                F55          Xenon flash unit

X1000                L100         100mm lens - standard

X1000                S04          4 foot camera strap, leather

                     X1000        Complete camera

9 rows selected.

Given the small quantity of data presented, you can almost eyeball the organization of the parts. However, the purpose of this example is to demonstrate how you can retrieve the organization of an arbitrarily large hierarchy.

The SELECT statement contains two new clauses: START WITH and CONNECT BY. The START WITH clause identifies the top-level row for which Oracle should retrieve subordinate rows--;in this example, I want to find the subordinate rows of the X1000. The CONNECT BY clause tells Oracle how to present the rows, and the PRIOR operator tells Oracle to present the rows so that each child row's Assembly_ID is equal to its parent row's Product_ID.

SQL> select lpad(' ',2*(level-1)) || Assembly_ID Assembly_ID,

  2  Product_ID, Description

  3  from Product_Assembly

  4  start with Product_ID = 'X1000'

  5  connect by prior Product_ID = Assembly_ID;

ASSEMBLY_ID          PRODUCT_ID   DESCRIPTION

-------------------- ------------ -------------------------------------------

                     X1000        Complete camera

  X1000              L100         100mm lens - standard

  X1000              B200         Black body, stainless steel frame camera body

    B200             S42          Variable-speed shutter, standard

    B200             I101         Titanium alloy, teflon-coated iris

    B200             W123         Manual film advance, winder

  X1000              S04          4 foot camera strap, leather

  X1000              F100         Blue filter - standard

  X1000              F55          Xenon flash unit

9 rows selected.

To indent each row to indicate its hierarchy level, I embedded LEVEL inside the LPAD function so that the number of blanks returned by LPAD corresponds to the hierarchy level. (LEVEL is a pseudocolumn that returns the hierarchy level for each row--;from 1 for the highest level to N for the most detailed level.) LPAD is concatenated with Assembly_ID.



Tip

Be cautious in how you construct your data model. Be absolutely certain that you have to support hierarchical information before you employ CONNECT BY and START WITH.


To look at a particular subassembly, specify the Assembly_ID in the START WITH clause. The following SELECT statement illustrates this procedure; it also shows the value of LEVEL for each row.

SQL> select lpad(' ',2*(level-1)) || Assembly_ID Assembly_ID,

  2  Product_ID, Level, Description

  3  from Product_Assembly

  4  start with Product_ID = 'B200'

  5  connect by prior Product_ID = Assembly_ID;

ASSEMBLY_ID     PRODUCT_ID  LEVEL DESCRIPTION

--------------- ----------------- ----------------------------------

X1000           B200            1 Black body, stainless steel frame camera body

  B200          S42             2 Variable-speed shutter, standard

  B200          I101            2 Titanium alloy, teflon-coated iris

  B200          W123            2 Manual film advance, winder

The hierarchical SELECT statement must obey the following restrictions. First, a hierarchical SELECT statement cannot join two or more tables (covered later in this chapter). Second, if you specify an ORDER BY clause, you will destroy the hierarchical ordering of the rows returned by the query.

The EXISTS Operator

Another SQL operator that you should be familiar with is the EXISTS operator. EXISTS operates on a subquery and returns a Boolean value:

Consider the following example based on the Employee and Employee_Dependent tables. Your company's health insurance provider needs to calculate a new rate for your company. The provider requests a list of employees' hire dates, but only for employees with dependents. Using the EXISTS operator, you submit this query:

SQL> select Last_Name, First_Name, Hire_Date

  2  from Employee E

  3  where

  4  exists

  5  (select * from Employee_Dependent D

  6  where

  7  E.Employee_ID = D.Employee_ID);

LAST_NAME                      FIRST_NAME           HIRE_DATE

------------------------------ -------------------- ---------

SMITH                          JEAN                 10-APR-82

GILROY                         MAX                  22-SEP-92

The EXISTS operator is useful in situations in which you're not interested in the column values returned by the subquery. Notice how table aliases--;E for Employee and D for Employee_Dependent--;were used to reduce the amount of typing and improve the readability of the query. You can also preface the EXISTS operator with the logical operator NOT. For instance, you can change the previous query to return only the employees without dependents.

SQL> select Last_Name, First_Name, Hire_Date

  2  from Employee E

  3  where

  4  not exists

  5  (select * from Employee_Dependent D

  6  where

  7  E.Employee_ID = D.Employee_ID);

LAST_NAME                      FIRST_NAME           HIRE_DATE

------------------------------ -------------------- ---------

HERNANDEZ                      RANDY                18-NOV-83

GLEASON                        PAUL                 05-APR-84

BARRETT                        SARAH                16-JAN-89

HIGGINS                        BEN                  11-FEB-89

YEN                            CINDY                09-JUN-91

CARSON                         BETH                 12-DEC-92

SWANSON                        HARRY                18-MAY-93

7 rows selected.

Now that you have a working knowledge of SQL, you are ready to delve into the world of joins. A join can retrieve rows from two or more tables that share a common set of values. A relational database would be of little value if it weren't for the join operation.

The Join Operation

The join operation is the mechanism that allows tables to be related to one another. A join operation retrieves columns from two or more tables. For example, if you were joining two tables, the retrieval criteria would specify the condition that a column in the first table--;which is defined as a foreign key--;is equal to a column in the second table--;which is the primary key referenced by the foreign key. A join operation in which the join's WHERE clause contains additional conditions is called an equi-join. The general syntax for the SELECT statement enables you to join more than two tables.

SELECT select-list

FROM table1, table2, ... , tableN

WHERE table1.column1 = table2.column2 and

...

table2.column3 = tableN.columnN

...

additional-conditions

The variables are as follows: select-list is the set of columns and expressions from table1 through tableN.

table1 through tableN are the tables from which column values are retrieved.

column1 through columnN are the columns in table1 through tableN that are related.

additional-conditions
are optional query criteria.

Please note that you are not required to reference column1 through columnN in the select-list.

A Simple Two-Table Join

A simple two-table join illustrates the use of this syntax. As you recall, the repair store database includes the Repair_Header table, which contains basic information about a customer's repair ID, and the Customer table that contains customer data. In order to prevent the storage of redundant data, the Repair_Header table stores only one piece of information about the customer: the Customer ID. By joining the Repair_Header table with the Customer table based on the Customer_ID column, you can retrieve additional customer data.

SQL> select Repair_id, Last_Name, First_Name

  2  from Repair_Header, Customer

  3  where

  4  Repair_Header.Customer_ID = Customer.Customer_ID

  5  order by Repair_ID;

REPAIR_ID LAST_NAME                      FIRST_NAME

--------- ------------------------------ ------------------------------

      501 Martinez                       Steve

      502 Smyth                          Julie

      503 Horace                         Michelle

      504 Pareski                        Monica

      505 Moran                          Sarah

Here's an analysis of each element of the SELECT statement. The select list consists of the three columns of interest: Repair_ID, Last_Name, and First_Name. The list of tables is made up of Repair_Header and Customer. The WHERE clause instructs Personal Oracle7 to return only rows in which Customer_ID in the Repair_Header table can be matched with a row in the Customer table that has the same value for Customer_ID.

The columns in the select-list can appear in any order; for example, the following query is perfectly acceptable:

SQL> select Last_Name, First_Name, Repair_ID

  2  from Repair_Header, Customer

  3  where

  4  Repair_Header.Customer_ID = Customer.Customer_ID

  5  order by Last_Name;

LAST_NAME                      FIRST_NAME                     REPAIR_ID

------------------------------ ------------------------------ ---------

Horace                         Michelle                             503

Martinez                       Steve                                501

Moran                          Sarah                                505

Pareski                        Monica                               504

Smyth                          Julie                                502

The query lists Last_Name, First_Name, and Repair_ID, even though Repair_Header is listed before Customer in the table list.

Ambiguous Columns

You need to keep in mind that each reference to a column in a join must be unambiguous. In this context unambiguous means that if the column exists in more than one table referenced in the join, the column name is qualified by the table name. Oracle returns an error message if you reference a column ambiguously.

SQL> select Customer_ID, Last_Name, First_Name, Repair_ID

  2  from Repair_Header, Customer

  3  where

  4  Repair_Header.Customer_ID = Customer.Customer_ID

  5  order by Last_Name;

select Customer_ID, Last_Name, First_Name, Repair_ID

       *

ERROR at line 1:

ORA-00918: column ambiguously defined

Oracle returns the error message because Customer_ID, which is referenced in the first line of the SELECT statement, appears in both the Repair_Header and Customer tables. To correct this problem, you must qualify the Customer_ID column with the table name.

SQL> select Customer.Customer_ID, Last_Name, First_Name, Repair_ID

  2  from Repair_Header, Customer

  3  where

  4  Repair_Header.Customer_ID = Customer.Customer_ID

  5  order by Last_Name;

CUSTOMER_ID LAST_NAME            FIRST_NAME      REPAIR_ID

----------- -------------------- --------------- ---------

       1005 Horace               Michelle              503

       1002 Martinez             Steve                 501

       1008 Moran                Sarah                 505

       1006 Pareski              Monica                504

       1003 Smyth                Julie                 502

Beware of the Cartesian Product

When you're first learning to join multiple tables, a common error is to forget to provide a join condition in the WHERE clause. If you forget a join condition, you'll notice two things: the query takes considerably longer to execute, and the number of retrieved records is much larger than you expected. The technical term for this result is the Cartesian product.

Consider the example of the Repair_Header and Customer tables. If you join these two tables without specifying a join condition, here's what happens:

SQL> select Customer.Customer_ID, Last_Name, First_Name, Repair_ID

  2  from Repair_Header, Customer

  3  order by Last_Name;

CUSTOMER_ID LAST_NAME              FIRST_NAME                     REPAIR_ID

----------- ---------------------- ------------------------------ ------

       1009 Chen                   Laura                                501

       1009 Chen                   Laura                                502

       1009 Chen                   Laura                                503

       1009 Chen                   Laura                                506

       1009 Chen                   Laura                                505

       1009 Chen                   Laura                                504

       6102 Fleming                Harry                                501

       6102 Fleming                Harry                                502

       6102 Fleming                Harry                                504

       6102 Fleming                Harry                                503

       6102 Fleming                Harry                                506

       6102 Fleming                Harry                                505

       2222 Fray                                                        501

       2222 Fray                                                        503

       2222 Fray                                                        505

       2222 Fray                                                        506

       2222 Fray                                                        504

       2222 Fray                                                        502

       6105 Hernandez              Ruby                                 501

.

.

.

       6101 Sorrel                 JAMES                                502

       6101 Sorrel                 JAMES                                503

       6101 Sorrel                 JAMES                                504

       6101 Sorrel                 JAMES                                505

       6101 Sorrel                 JAMES                                506

90 rows selected.

In this example the Repair_Header table has 6 rows, and the Customer table has 15 rows. The code sample does not have a join condition instructing Oracle to retrieve the columns from each table based on Customer_ID. As a result, Oracle returns all possible combinations of the two tables, which is 6 times 15, or 90 rows.

Multiple-Table Joins

As you see, the syntax for the SELECT statement doesn't limit the number of tables that can be joined. As an example, you can join these three tables--;Repair_Header, Repair_Item, and Product--;to retrieve a description of each repair item.

SQL> select R.Repair_ID, Customer_ID, Item_Number, Description

  2  from Repair_Header R, Repair_Item I, Product P

  3  where

  4  R.Repair_ID = I.Repair_ID and

  5  I.Product_ID = P.Product_ID and

  6  I.Manufacturer_ID = P.Manufacturer_ID

  7  order by R.Repair_ID, Item_Number;

REPAIR_ID CUSTOMER_ID ITEM_NUMBER DESCRIPTION

--------- ----------- ----------- ----------------------------------------

      501        1002           1 CD PLAYER, SINGLE-DISK

      502        1003           1 CCD Camera

      503        1005           1 Pre-amp, 120 W per channel

      504        1006           1 PRE AMPLIFIER 150 WATTS/CHANNEL

      505        1008           1 AMP, PROFESSIONAL 800W RMS PER CHANNEL

      506        1010           1 Tuner

      506        1010           2 PREAMP, 460 W/RMS

7 rows selected.

Notice that an alias has been specified for each of the three tables:

Although table aliases are optional, you should use them in multiple-table joins because they reduce the size of the SELECT statement and simplify its appearance.

Self-Joins

Another form of a two-table join is the self-join. This type of join is used when a table has a foreign key that references its own primary key. A good example of the self-join is the Product_Assembly table discussed earlier in this chapter. The primary key of the Product_Assembly table is Product_ID. Every product--;except a complete product--;belongs to an assembly. Therefore, Assembly_ID is a foreign key that references Product_ID (see Figure 15.1).

Figure 15.1. Example of a table's primary key referenced by its foreign key.

As an excellent example of the purpose of a self-join, suppose that you want to retrieve the description of each product and the description of the assembly to which it belongs. As the following example illustrates, the join statement must define an alias for both copies of the table. The join condition--;P1.Assembly_ID = P2.Product_ID--;causes Oracle to retrieve product information from P1 and assembly information from P2.

SQL> select P1.Description || ' is part of ' ||  P2.Description

                                                 "Assembly Breakdown"

  2  from Product_Assembly P1, Product_Assembly P2

  3  where

  4  P1.Assembly_ID = P2.Product_ID

  5  order by P1.Assembly_ID, P1.Product_ID;

Assembly Breakdown

------------------------------------------------------------------------------

Titanium alloy, teflon-coated iris is part of Black body, stainless steel fram

Variable-speed shutter, standard is part of Black body, stainless steel frame

Manual film advance, winder is part of Black body, stainless steel frame camer

Black body, stainless steel frame camera body is part of Complete camera

Blue filter - standard is part of Complete camera

Xenon flash unit is part of Complete camera

100mm lens - standard is part of Complete camera

4 foot camera strap, leather is part of Complete camera

8 rows selected.

Outer Joins

You've seen that a multiple-table join returns only columns from each table where the join conditions are met. Consider the following example in which you have two tables--;an Employee table and an Employee_ Dependent table. An employee may have zero, one, or several dependents. Accordingly, a row in the Employee table may have zero or more related rows in the Employee_Dependent table. Here are the contents of the tables:

SQL> select Employee_ID, Last_Name, First_Name

  2  from Employee

  3  order by Employee_ID;

EMPLOYEE_ID LAST_NAME                      FIRST_NAME

----------- ------------------------------ --------------------

       1001 SMITH                          JEAN

       1002 HERNANDEZ                      RANDY

       1003 GLEASON                        PAUL

       1004 BARRETT                        SARAH

       1005 HIGGINS                        BEN

       1006 YEN                            CINDY

       1007 GILROY                         MAX

       1008 CARSON                         BETH

       1009 SWANSON                        HARRY

9 rows selected.

SQL> select Employee_ID, Last_Name, First_Name, Relationship

  2  from Employee_Dependent

  3  order by Employee_ID;

EMPLOYEE_ID LAST_NAME                      FIRST_NAME           RELATIONSHIP

----------- ------------------------------ -------------------- ---------------

       1001 SMITH                          JARED                CHILD

       1006 YEN                            RANDY                SPOUSE

       1007 GILROY                         ARTHUR               CHILD

       1007 GILROY                         CATHY                SPOUSE

       1007 GILROY                         HORACE               CHILD

Watch what happens when these two tables are joined by Employee_ID:

SQL> select E.Employee_ID, E.Last_Name, E.First_Name, D.Last_Name, D.First_Name, Relationship

  2  from Employee E, Employee_Dependent D

  3  where

  4  E.Employee_ID = D.Employee_ID

  5  order by E.Employee_ID, D.Last_Name, D.First_Name;

EMPLOYEE_ID LAST_NAME     FIRST_NAME   LAST_NAME     FIRST_NAME   RELATIONSHIP

----------- ------------- ------------ ------------- ------------ ------------

       1001 SMITH         JEAN         SMITH         JARED        CHILD

       1006 YEN           CINDY        YEN           RANDY        SPOUSE

       1007 GILROY        MAX          GILROY        ARTHUR       CHILD

       1007 GILROY        MAX          GILROY        CATHY        SPOUSE

       1007 GILROY        MAX          GILROY        HORACE       CHILD

In the previous example, Oracle retrieves only the rows for employees with dependents. If you want to retrieve employee information whether or not an employee has dependents, you must use the outer join operator by appending (+)--;a plus sign within parentheses--;to the optional column in the join condition.

SQL> select E.Employee_ID, E.Last_Name, E.First_Name, D.Last_Name,

            D.First_Name, Relationship

  2  from Employee E, Employee_Dependent D

  3  where

  4  E.Employee_ID = D.Employee_ID (+)

  5  order by E.Employee_ID, D.Last_Name, D.First_Name;

EMPLOYEE_ID LAST_NAME     FIRST_NAME    LAST_NAME     FIRST_NAME  RELATIONSHIP

----------- ------------- ------------- ------------- ----------- ------------

       1001 SMITH         JEAN          SMITH         JARED       CHILD

       1002 HERNANDEZ     RANDY

       1003 GLEASON       PAUL

       1004 BARRETT       SARAH

       1005 HIGGINS       BEN

       1006 YEN           CINDY         YEN           RANDY       SPOUSE

       1007 GILROY        MAX           GILROY        ARTHUR      CHILD

       1007 GILROY        MAX           GILROY        CATHY       SPOUSE

       1007 GILROY        MAX           GILROY        HORACE      CHILD

       1008 CARSON        BETH

       1009 SWANSON       HARRY

11 rows selected.

If an employee has no dependents, Oracle returns a null for each of the columns selected from the Employee_Dependent table. In fact, you can obtain a list of employees without dependents by looking for Employee_Dependent rows in which Relationship is null.

SQL> select E.Employee_ID, E.Last_Name, E.First_Name

  2  from Employee E, Employee_Dependent D

  3  where

  4  E.Employee_ID = D.Employee_ID (+) and

  5  Relationship is NULL

  6  order by E.Employee_ID, D.Last_Name, D.First_Name;

EMPLOYEE_ID LAST_NAME            FIRST_NAME

----------- -------------------- ---------------

       1002 HERNANDEZ            RANDY

       1003 GLEASON              PAUL

       1004 BARRETT              SARAH

       1005 HIGGINS              BEN

       1008 CARSON               BETH

       1009 SWANSON              HARRY

6 rows selected.

Using Set Operators in the SELECT Statement

The SQL language is a partial implementation of the relational model as envisioned by Codd, the father of relational theory. As part of that implementation, Oracle's version of SQL provides three set operators: INTERSECT, UNION, and MINUS.

The INTERSECT Operator

The INTERSECT operator returns the rows that are common between two sets of rows. The syntax for using the INTERSECT operator is

select-stmt1

INTERSECT

select-stmt2

[order-by-clause]

where select-stmt1 and select-stmt2 are valid SELECT statements and order-by-clause is an ORDER BY clause that references the columns by number rather than by name.

Here are some requirements and considerations for using the INTERSECT operator:

Consider the following typical scenario for using the INTERSECT operator. You're supporting several regional sales offices, all of which use the same database tables. As you would expect, the contents of these tables differ from sales office to sales office. You have been asked to find the records that appear in both the Portland_Product table and the Houston_Product table. For the sake of illustration, I've reduced the contents of these tables to a comprehensible set of data.

SQL> select Product_ID, Manufacturer_ID

  2  from Portland_Product;

PRODUCT_ID   MANUFAC

------------ -------

C2002        MIT501

C2005        MIT501

C3002        SEN101

SQL> select Product_ID, Manufacturer_ID

  2  from Houston_Product;

PRODUCT_ID   MANUFAC

------------ -------

B901         TES801

B801         SEN101

C2002        MIT501

C2005        MIT501

C3002        SEN101

B311         TES801

B9310        SEN101

B384         TES801

8 rows selected.

To determine the common set of rows between the two tables, you submit the following SELECT statement to Oracle:

SQL> select Product_ID, Manufacturer_ID

  2  from Portland_Product

  3  INTERSECT

  4  select Product_ID, Manufacturer_ID

  5  from Houston_Product;

PRODUCT_ID   MANUFAC

------------ -------

C2002        MIT501

C2005        MIT501

C3002        SEN101

If you look at the rows that exist in each table, you can confirm that the two tables do indeed have three rows in common.

The UNION Operator

Sooner or later, you'll find yourself in a situation in which you need to combine the rows from similar tables to produce a report or to create a table for analysis. Even though the tables represent similar information, they may differ considerably. To accomplish this task, you should consider using the UNION operator. The straightforward syntax for this set operator is

select-stmt1

UNION

select-stmt2

[order-by-clause]

where select-stmt1 and select-stmt2 are valid SELECT statements and order-by-clause is an optional ORDER BY clause that references the columns by number rather than by name.

The UNION operator combines the rows returned by the first SELECT statement with the rows returned by the second SELECT statement. Keep the following things in mind when you use the UNION operator:

To illustrate the use of UNION, imagine that you have been given the task of consolidating information from two seismology labs. The first table is from the Department of Geophysics at the University of Northern South Dakota; the other table is from a private research institution, RIND. Here is the structure of the two tables:

SQL> desc UNSD_Event

 Name                            Null?    Type

 ------------------------------- -------- ----

 EVENT_NO                                 NUMBER

 EPICENTER_LATITUDE                       NUMBER

 EPICENTER_LONGITUDE                      NUMBER

 EVENT_MAGNITUDE                          NUMBER

 EVENT_TIME                               DATE

 EVENT_WAVE                               CHAR(1)

 INSTRUMENT_NAME                          VARCHAR2(30)

SQL> desc RIND_Event

 Name                            Null?    Type

 ------------------------------- -------- ----

 LOCATION_LAT                             NUMBER

 LOCATION_LON                             NUMBER

 RICHTER_NUMBER                           NUMBER

 DATE_TIME                                VARCHAR2(30)

 WAVE_TYPE                                CHAR(1)

Stop for a moment to examine the similarities and differences between UNSD_Event and RIND_Event. Both tables store information about seismic events; however, UNSD_Event has two extra columns: Event_No and Instrument_Name. Both tables store the epicenter latitude and longitude, the magnitude, and the wave type (P or S); however, UNSD_Event defines Event_Time as a DATE, whereas RIND_Event uses VARCHAR2 for storing the event date and time in Date_Time. If you try to perform a UNION without converting Event_Time in UNSD_Event and Date_Time in RIND_Event to a common datatype, here is what happens:

SQL> select Epicenter_Latitude, Epicenter_Longitude, Event_Magnitude,

            Event_Time, Event_Wave

  2  from UNSD_Event

  3  UNION

  4  select Location_Lat, Location_Lon, Richter_Number, Date_Time, Wave_Type

  5  from RIND_Event;

select Epicenter_Latitude, Epicenter_Longitude, Event_Magnitude,

             Event_Time, Event_Wave

             *

ERROR at line 1:

ORA-01790: expression must have same datatype as corresponding expression

You can normalize the two columns by converting Date_Time in the RIND_Event table to a date by using the TO_DATE function.

SQL> select Epicenter_Latitude, Epicenter_Longitude, Event_Magnitude,

            Event_Time, Event_Wave

  2  from UNSD_Event

  3  UNION

  4  select Location_Lat, Location_Lon, Richter_Number,

            TO_DATE(Date_Time,'MM-DD-YY HH:MI:SS'), Wave_Type

  5  from RIND_Event;

EPICENTER_LATITUDE EPICENTER_LONGITUDE EVENT_MAGNITUDE EVENT_TIM E

------------------ ------------------- --------------- --------- -

             12.83              189.85             5.8 25-APR-95 P

             22.33              233.31             5.9 03-FEB-95 P

             23.33              179.11             5.3 10-JAN-95 P

             29.84              238.41             6.2 22-MAR-95 S

             31.17              208.33             6.6 19-APR-95 S

             31.84              241.21             6.1 12-MAR-95 S

             37.81              211.84             6.4 11-JAN-95 S

7 rows selected.

If you want to order the results by the Event_Time, simply add this line to the SQL statement:

SQL> select Epicenter_Latitude, Epicenter_Longitude, Event_Magnitude,

            Event_Time, Event_Wave

  2  from UNSD_Event

  3  UNION

  4  select Location_Lat, Location_Lon, Richter_Number,

            TO_DATE(Date_Time,'MM-DD-YY HH:MI:SS'), Wave_Type

  5  from RIND_Event

  6  order by 4;

EPICENTER_LATITUDE EPICENTER_LONGITUDE EVENT_MAGNITUDE EVENT_TIM E

------------------ ------------------- --------------- --------- -

             23.33              179.11             5.3 10-JAN-95 P

             37.81              211.84             6.4 11-JAN-95 S

             22.33              233.31             5.9 03-FEB-95 P

             31.84              241.21             6.1 12-MAR-95 S

             29.84              238.41             6.2 22-MAR-95 S

             31.17              208.33             6.6 19-APR-95 S

             12.83              189.85             5.8 25-APR-95 P

7 rows selected.

UNION Versus UNION ALL

Oracle's implementation of SQL supports a variant of the UNION operator called UNION ALL. The difference between UNION and UNION ALL is this: If the two tables in the union have duplicate rows between them, UNION returns only one instance of a row, regardless of the number of duplicates. UNION ALL, on the other hand, returns all duplicates that exist between the two tables. For example, if you apply the UNION operator on the Houston_Product and Portland_Product tables, the results are

SQL> select Product_ID, Manufacturer_ID, Description

  2  from Houston_Product

  3  UNION

  4  select Product_ID, Manufacturer_ID, Description

  5  from Portland_Product

  6  order by 1;

PRODUCT_ID   MANUFAC DESCRIPTION

------------ ------- ---------------------------------------------------------

B311         TES801  Pre-amp, 120 W per channel

B384         TES801  PREAMP, 460 W/RMS

B801         SEN101  PRE AMPLIFIER 150 WATTS/CHANNEL

B901         TES801  Preamplifier, 200 W PER CHANNEL

B9310        SEN101  Pre amp, 250 W/channel

C2002        MIT501  CD PLAYER, SINGLE-DISK

C2005        MIT501  5-DISK CD PLAYER

C2005        MIT501  50-DISK CD PLAYER

C3002        SEN101  JUKEBOX, CD - 100 DISK CAPACITY

9 rows selected.

However, if you use the UNION ALL operator instead of the UNION operator, an additional two rows are returned. The reason is that for Product_IDs 2002 and 3002, the three specified columns--;Product_ID, Manufacturer_ID, and Description--;have the same values in both tables.

SQL> select Product_ID, Manufacturer_ID, Description

  2  from Houston_Product

  3  UNION ALL

  4  select Product_ID, Manufacturer_ID, Description

  5  from Portland_Product

  6  order by 1;

PRODUCT_ID   MANUFAC DESCRIPTION

------------ ------- -------------------------------------------------------

B311         TES801  Pre-amp, 120 W per channel

B384         TES801  PREAMP, 460 W/RMS

B801         SEN101  PRE AMPLIFIER 150 WATTS/CHANNEL

B901         TES801  Preamplifier, 200 W PER CHANNEL

B9310        SEN101  Pre amp, 250 W/channel

C2002        MIT501  CD PLAYER, SINGLE-DISK

C2002        MIT501  CD PLAYER, SINGLE-DISK

C2005        MIT501  5-DISK CD PLAYER

C2005        MIT501  50-DISK CD PLAYER

C3002        SEN101  JUKEBOX, CD - 100 DISK CAPACITY

C3002        SEN101  JUKEBOX, CD - 100 DISK CAPACITY

11 rows selected.

The MINUS Operator

In addition to the INTERSECT and UNION operators, Oracle also provides the MINUS operator for comparing one set of rows to another set. The syntax for using the MINUS operator resembles the syntax for the UNION operator.

select-stmt1

MINUS

select-stmt2

[order-by-clause]

where select-stmt1 and select-stmt2 are valid SELECT statements and order-by-clause is an ORDER BY clause that references the columns by number rather than by name.

The requirements and considerations for using the MINUS operator are essentially the same as those for the INTERSECT and UNION operators. The following example uses the MINUS operator, to compare the contents of two tables: Portland_Product and Houston_Product. The first step is to determine which Product and Manufacturer IDs exist only in Portland_Product.

SQL> select Product_ID, Manufacturer_ID

  2  from Portland_Product

  3  MINUS

  4  select Product_ID, Manufacturer_ID

  5  from Houston_Product;

no rows selected

As expected, Oracle doesn't return any rows. Again, this result occurs because the Product and Manufacturer IDs in Portland_Product--;C2002 and MIT501, C2005 and MIT501, and C3002 and SEN101--;all exist in Houston_Product. Now, however, exchange the two SELECT statements to obtain the rows that exist in Houston_Product only.

SQL> select Product_ID, Manufacturer_ID

  2  from Houston_Product

  3  MINUS

  4  select Product_ID, Manufacturer_ID

  5  from Portland_Product;

PRODUCT_ID   MANUFAC

------------ -------

B311         TES801

B384         TES801

B801         SEN101

B901         TES801

B9310        SEN101

This example demonstrates that MINUS won't tell you all the differences between the two tables--;it returns only those rows in the first set that can't be found in the second set. In fact, both tables appear to have information about the same Product IDs: C2002, C2005, and C3002. However, if you also retrieve the Description column from both tables, you can see additional differences.

SQL> select Product_ID, Manufacturer_ID, Description

  2  from Portland_Product

  3  where

  4  Product_ID like 'C%'

  5  MINUS

  6  select Product_ID, Manufacturer_ID, Description

  7  from Houston_Product

  8  where

  9  Product_ID like 'C%';

PRODUCT_ID   MANUFAC DESCRIPTION

------------ ------- ----------------------------------------

C2005        MIT501  50-DISK CD PLAYER

SQL> select Product_ID, Manufacturer_ID, Description

  2  from Houston_Product

  3  where

  4  Product_ID like 'C%'

  5  MINUS

  6  select Product_ID, Manufacturer_ID, Description

  7  from Portland_Product

  8  where

  9  Product_ID like 'C%';

PRODUCT_ID   MANUFAC DESCRIPTION

------------ ------- ----------------------------------------

C2005        MIT501  5-DISK CD PLAYER

The Product_ID C2005 row in the Portland_Product table describes product C2005 as a 50-DISK CD PLAYER, whereas the description in the equivalent row in the Houston_Product table is 5-DISK CD PLAYER. The MINUS operator only reveals differences between the contents of selected columns of the two tables. Also, as you can see from the previous example, you can specify a WHERE clause in the SELECT statements used by the MINUS operator.

Creating a Table by Selecting from Another Table

When you're developing a new database application, you'll usually go through a phase of experimenting with various table designs. Whether you use legacy data or artificial data, it's nice to have some simple methods for experimentation. One DDL statement that you might want to use is CREATE TABLE--;with a subquery. The syntax is

CREATE TABLE table-name

[(column-definition1, ... , column-definitionN)]

AS

select-statement

where table-name is the name of the new Oracle table; column-definition1 through column-definitionN are optional column definitions that are used to specify different column names to be associated with the values returned by the subquery; and select-statement is a valid SELECT statement whose select list is used in creating the new table.

In the following example the CREATE TABLE statement creates a new table whose columns are a subset of an existing table:

SQL> create table Repair_Header_2

  2  as

  3  select Repair_ID, Customer_ID, Employee_ID, Deposit_Amount, Deposit_Method

  4  from Repair_Header;

Table created.

SQL> select count(*) from Repair_Header_2;

 COUNT(*)

---------

        6

Notice that Oracle doesn't tell you how many rows are inserted into the new table. In fact, you can create a new table that is empty by specifying an impossible condition.

SQL> create table Repair_Header_3

  2  as

  3  select Repair_ID, Customer_ID, Employee_ID, Deposit_Amount, Deposit_Method

  4  from Repair_Header

  5  where

  6  1 = 2;

Table created.

SQL> select count(*) from Repair_Header_3;

 COUNT(*)

---------

        0



Tip

Oracle rejects the CREATE TABLE statement if the subquery references a LONG or LONG RAW column. If you need to copy rows from a LONG column in one table to another, use the SQL*Plus COPY command. It does support the copying of LONG column values.


The subquery referenced in a CREATE TABLE statement can be a multiple-table join, as shown in this example:

SQL> create table Join_Repair_Header_and_Item

  2  as

  3  select H.Repair_ID, Customer_ID, Employee_ID, Item_Number,

  4  Product_ID, Manufacturer_ID

  5  from Repair_Header H, Repair_Item I

  6  where

  7  H.Repair_ID = I.Repair_ID;

Table created.

SQL> desc Join_Repair_Header_and_Item

 Name                            Null?    Type

 ------------------------------- -------- ----

 REPAIR_ID                       NOT NULL NUMBER(6)

 CUSTOMER_ID                     NOT NULL NUMBER(4)

 EMPLOYEE_ID                     NOT NULL NUMBER(4)

 ITEM_NUMBER                     NOT NULL NUMBER

 PRODUCT_ID                               VARCHAR2(12)

 MANUFACTURER_ID                          VARCHAR2(6)



Note

The CREATE TABLE statement copies only one kind of table or column constraint: the NOT NULL constraint. If you want the new table to have all the existing constraints, you'll have to specify them with an ALTER TABLE statement.


Using SQL to Generate SQL Scripts

If you're willing to experiment, you'll find creative uses for SQL with SQL*Plus. One example is the use of SQL*Plus to generate SQL*Plus scripts. Suppose that you want a general script for counting the rows in all your tables. Unfortunately, SQL doesn't allow the use of a wildcard for specifying table names.

SQL> select count(*) from *;

select count(*) from *

                     *

ERROR at line 1:

RA-00903: invalid table name

To count the number of rows in each of your tables, you'd have to submit a SELECT statement to Oracle for each table.

SQL> select count(*) from Repair_Header;

 COUNT(*)

---------

        6

SQL> select count(*) from Repair_Item;

 COUNT(*)

---------

        6

.

.

.

SQL> select count(*) from Customer;

 COUNT(*)

---------

       15

This process can be quite tedious if you have many tables. To obtain a list of your tables, you would have to query the Oracle data dictionary view USER_TABLES. (You'll find information on other Oracle data dictionary views in Chapter 33, "Oracle Internals.") For now, here is the query you'd use to obtain a list of the tables:

SQL> select table_name from user_tables;

TABLE_NAME

------------------------------

ACTION_CODE

CONDITION_CODE

CUSTOMER

CUSTOMER_SUBSET

DEPOT

DEPOT_ESTIMATE

EMPLOYEE

ITEM_STATUS_CODE

.

.

.

The next step is to concatenate three strings--;a literal string, Table_Name, and a ; to terminate the statement.

SQL> select 'select count(*) from ' || table_name || ';'

  2  from user_tables;

'SELECTCOUNT(*)FROM'||TABLE_NAME||';'

----------------------------------------------------

select count(*) from ACTION_CODE;

select count(*) from CONDITION_CODE;

select count(*) from CUSTOMER;

select count(*) from CUSTOMER_SUBSET;

select count(*) from DEPOT;

14 rows selected.

The preceding SELECT statement returns a SELECT statement for each table that the Oracle user owns. You can spool these commands to a file so that SQL*Plus can execute them.

SQL> set feedback off

SQL> set heading off

SQL> set echo off

SQL> spool c:\book\chap14\countem.sql

SQL> select 'select count(*) from ' || table_name || ';'

  2  from user_tables;

select count(*) from ACTION_CODE;

select count(*) from CONDITION_CODE;

select count(*) from CUSTOMER;

select count(*) from CUSTOMER_SUBSET;

select count(*) from DEPOT;

SQL> set feedback on

SQL> set heading on

SQL> set echo on

SQL> start count_em

SQL>

SQL> select count(*) from ACTION_CODE;

 COUNT(*)

---------

        0

1 row selected.

SQL> select count(*) from CONDITION_CODE;

 COUNT(*)

---------

        9

1 row selected.

SQL> select count(*) from CUSTOMER;

 COUNT(*)

---------

       15

1 row selected.

SQL> select count(*) from CUSTOMER_SUBSET;

 COUNT(*)

---------

        1

1 row selected.

SQL> select count(*) from DEPOT;

 COUNT(*)

---------

        0

1 row selected.

This SQL*Plus script is fairly flexible. Let me explain the significance of each Set command. First, FEEDBACK is turned OFF so that SQL*Plus does not display the number of rows. Next, HEADING is set to OFF so that no column headings become part of the output--;you want only the SELECT statements, nothing more. Finally, ECHO is set to OFF so that the output does not capture the SELECT statement that generates the other SELECT statements. A little experimentation with these settings will show you why they are set to these values.

After the SELECT statements are generated and spooled to a file (c:\book\chap14\count_em.sql), FEEDBACK, HEADING, and ECHO are set to ON so that each SELECT statement appears before its results.



Note

In the previous SQL*Plus script, the SELECT statement that appears after the spool command won't actually appear because echo has been turned off. I presented the SQL*Plus output in this way to clarify the execution of the SQL*Plus commands and the SELECT statements.


Summary

Remember the following concepts: