You have already seen how the SELECT statement can retrieve records from a single table, but retrieving records is only one of the many features of this versatile statement. This chapter examines the more advanced features of the SELECT statement: the GROUP BY clause, the HAVING clause, and the join operation. This chapter shows you how to harness the full power of the SELECT statement.
The syntax of the SELECT statement is fairly difficult to decipher, and you can't really understand the use of its clauses just by studying the syntax diagrams shown in the Oracle documentation. I use the many examples in this chapter to show you what works and what doesn't.
The first subject for discussion is built-in functions that operate on groups of rows. Be warned: This topic is a good example of SQL's quirky characteristics. Even though these functions are group functions, they do not require the use of a GROUP BY clause.
Each of these functions returns a single value.
The COUNT function comes in two flavors: COUNT(*), which counts all the rows in a table that satisfy any specified criteria, and COUNT(column-name), which counts all the rows in a table that have a non-null value for column-name and satisfy any specified criteria.
The following code fragment uses COUNT(*) to count the number of customers who live in California:
SQL> select count(*) 2 from Customer 3 where 4 State = 'CA'; COUNT(*) --------- 7
To count the number of customers who have a fax number, you need to supply Fax_Number as an argument to the COUNT function.
SQL> select count(Fax_Number) 2 from Customer; COUNT(FAX_NUMBER) ----------------- 7
You can query the Customer table to verify that it contains only seven customers with fax numbers.
SQL> select Fax_Number 2 from Customer; FAX_NUMBER ---------- 7145550123 5035551283 8015558194 3015558331 8085558183 8105558356 8105554199 15 rows selected.
The blank lines returned by the SELECT statement represent the null value returned for customer rows in which Fax_Number is NULL.
Observe that the COUNT function returns a single row--;even if the count is zero.
SQL> select count(*) 2 from Employee 3 where 4 Monthly_Salary > 100000; COUNT(*) --------- 0
You can combine the MAX and MIN functions in a single SELECT statement. Here is an example:
SQL> select max(Initial_Retail_Value), min(Initial_Retail_Value) 2 from Product; MAX(INITIAL_RETAIL_VALUE) MIN(INITIAL_RETAIL_VALUE) ------------------------- -------------------------
2150 185
You can use MAX and MIN with character values.
SQL> select min(Product_ID), max(Product_ID) 2 from Product; MIN(PRODUCT_ MAX(PRODUCT_ ------------ ------------ A2001 TR901
You can also use MAX and MIN with date values.
SQL> select min(Hire_Date), max(Hire_Date) 2 from Employee; MIN(HIRE_ MAX(HIRE_ --------- --------- 10-APR-82 18-MAY-93
AVG and SUM work in the same way as MIN and MAX. For instance, you can retrieve the average and total of the estimated repair cost for each item brought in by customers.
SQL> select avg(Estimated_Cost), sum(Estimated_Cost) 2 from Repair_Item; AVG(ESTIMATED_COST) SUM(ESTIMATED_COST) ------------------- ------------------- 167.4 837
Here's a look at the data that the average calculation is based upon.
SQL> select Estimated_Cost 2 from Repair_Item; ESTIMATED_COST -------------- 231 83 109 102 312 6 rows selected.
As you see, even though the Repair_Item table contains six rows, Estimated_Cost has only five non-null values. The AVG function does not count null values in calculating the average: (231+83+109+102+312)/5 = 167.4.
You cannot combine group functions and columns in the select list of a SELECT statement without using a GROUP BY clause. If you try to do so, Oracle returns the following error:
SQL> select max(Initial_Retail_Value), Manufacturer_ID 2 from Product; select max(Initial_Retail_Value), Manufacturer_ID * ERROR at line 1: ORA-00937: not a single-group group function
What is Oracle trying to tell you? It helps to think in these terms: A group function returns a single row, but Manufacturer_ID returns as many rows as exist in the table. It simply doesn't make sense to return both of these values at the same time--;instead, Oracle returns an error message. However, by using a GROUP BY clause, you can combine group functions and columns in the select list. I explain the use of the GROUP BY clause in the next major section.
The SELECT statement has an optional keyword that I haven't mentioned yet: DISTINCT. This keyword follows SELECT and instructs Oracle to return only rows that have distinct values for the specified columns. As an example, you can obtain a list of the states in which customers live by using the following statement:
SQL> select distinct state from customer; ST -- AZ CA IL MA MI OR WA 7 rows selected.
The DISTINCT option is very useful for finding a column's set of values. It offers a method that you can use for quickly determining how values are clustered in a table. If you don't specify the DISTINCT option, Oracle retrieves all rows that satisfy the criteria in the WHERE clause. By default, Oracle uses the ALL option to retrieve all rows. Specifically, the following SELECT statements are equivalent:
select state from customer;
and
select ALL state from customer;
The GROUP BY clause is another section of the SELECT statement. This optional clause tells Oracle to group rows based on the distinct values that exist for the specified columns. In addition, the HAVING clause can be used in conjunction with the GROUP BY clause to further restrict the retrieved rows. This topic is best explained by example, but first here's a quick look at the syntax:
SELECT select-list FROM table-list [WHERE condition [AND | OR] ... condition] [GROUP BY column1, column2, ..., columnN] [HAVING condition] [ORDER BY column1, column2, ...]
The variables are as follows: select-list is a set of columns and expressions from the tables listed in table-list.
table-list is the tables from which rows are retrieved.
condition is a valid Oracle SQL condition.
column1 through columnN are columns contained in table-list.
TipYou can easily mix these elements into a SELECT statement that Oracle will reject. In addition, you can construct statements that Oracle processes without errors but whose results are difficult to interpret. If you encounter these problems, the best solution is to go back to basics. Study your SELECT statement. If it no longer makes sense, use SQL*Plus to break it down, element by element, until it does. Analyze your use of group functions, the GROUP BY and HAVING clauses, and any join conditions.
Here's a simple example that uses the GROUP BY clause to display a list of Manufacturer IDs found in the Product table.
SQL> select Manufacturer_ID 2 from Product 3 group by Manufacturer_ID; MANUFAC ------- GOL201 MIT501 SEN101 TES801
You can also add an ORDER BY clause to the statement so that the Manufacturer IDs are displayed in reverse alphabetic order.
SQL> select Manufacturer_ID 2 from Product 3 group by Manufacturer_ID 4 order by Manufacturer_ID desc; MANUFAC ------- TES801 SEN101 MIT501 GOL201
If you want to count the number of rows that exist for each Manufacturer ID, you can use the following method:
SQL> select Manufacturer_ID, count(*) 2 from Product 3 group by Manufacturer_ID; MANUFAC COUNT(*) ------- --------- GOL201 1 MIT501 2 SEN101 6 TES801 6
Now consider the use of both the GROUP BY and HAVING clauses. You can use the HAVING clause to retrieve only those Manufacturer IDs that have exactly two rows in the Product table. For example:
SQL> select Manufacturer_ID, count(*) 2 from Product 3 group by Manufacturer_ID 4 having count(*) = 2; MANUFAC COUNT(*) ------- --------- MIT501 2
You cannot combine group functions with columns in the select list unless you use the GROUP BY clause. For instance, you can obtain the average initial retail value of products grouped by manufacturer.
SQL> select Manufacturer_ID, avg(Initial_Retail_Value) 2 from Product 3 group by Manufacturer_ID; MANUFAC AVG(INITIAL_RETAIL_VALUE) ------- ------------------------- GOL201 100 MIT501 350 SEN101 776.25 TES801 551
You can also use the HAVING and GROUP BY clauses to identify duplicate rows. For example, you might not be able to define a primary key for an existing table because the primary key columns contain duplicate values, as shown in the following code:
SQL> alter table Patient add 2 constraint Patient_PK 3 primary key (Patient_ID); alter table Patient add * ERROR at line 1: ORA-02299: cannot add or enable constraint (FRAYED_WIRES.PATIENT_PK)- duplicate keys found
The following query uses the HAVING and GROUP BY clauses to identify the values of Patient_ID that appear in more than one row, identify which rows are bogus, and add the primary key:
SQL> select Patient_ID 2 from Patient 3 having count(*) > 1 4 group by Patient_ID; PATIEN ------ GG9999 SQL> select * 2 from Patient 3 where 4 Patient_ID = 'GG9999'; PATIEN BODY_TEMP_DEG_F FEVER_CLASS ------ --------------- -------------------- GG9999 107.6 LETHAL FEVER GG9999 107 SQL> delete from Patient 2 where Patient_ID = 'GG9999' and Body_Temp_Deg_F = 107; 1 row deleted. SQL> alter table Patient add 2 constraint Patient_PK 3 primary key (Patient_ID); Table altered.
I've included the topic of hierarchical information in this chapter--;instead of in an earlier discussion of SQL--;to avoid confusion. Let me begin with an explanation of what I mean by hierarchical information. The classic example of relational database tables--;Employee and Department (or EMP and DEPT)--;is hierarchical: A department has many employees. This example is actually a single-level hierarchy. A much better example of hierarchical data is a manufacturing bill of materials (BOM) that describes all the parts, subassemblies, and assemblies that compose a finished product. For example, an aircraft manufacturer might have a BOM that consists of thousands of levels and millions of parts.
Can relational databases in general and SQL in particular support this type of hierarchy? The answer is mixed. The good news is that SQL does provide some support for hierarchical data via the CONNECT BY clause. The bad news is that the support is quite limited, and the syntax is not intuitive. Nevertheless, the following example shows you how to use SQL to navigate through hierarchical data.
To illustrate the use of the CONNECT BY clause, I'll use a slight variation of the familiar Product tables. For this example, the table consists of three columns:
This table design is based upon the concept that every product belongs to the finished product or an assembly--;except for the finished product itself. I've created some data to demonstrate this concept. A camera, the X1000, is composed of several parts and one subassembly, B200. The rows are
SQL> select Assembly_ID, Product_ID, Description 2 from Product_Assembly 3 order by Assembly_ID, Product_ID; ASSEMBLY_ID PRODUCT_ID DESCRIPTION -------------------- ------------ --------------------------------------------- B200 I101 Titanium alloy, teflon-coated iris B200 S42 Variable-speed shutter, standard B200 W123 Manual film advance, winder X1000 B200 Black body, stainless steel frame camera body X1000 F100 Blue filter - standard X1000 F55 Xenon flash unit X1000 L100 100mm lens - standard X1000 S04 4 foot camera strap, leather X1000 Complete camera 9 rows selected.
Given the small quantity of data presented, you can almost eyeball the organization of the parts. However, the purpose of this example is to demonstrate how you can retrieve the organization of an arbitrarily large hierarchy.
The SELECT statement contains two new clauses: START WITH and CONNECT BY. The START WITH clause identifies the top-level row for which Oracle should retrieve subordinate rows--;in this example, I want to find the subordinate rows of the X1000. The CONNECT BY clause tells Oracle how to present the rows, and the PRIOR operator tells Oracle to present the rows so that each child row's Assembly_ID is equal to its parent row's Product_ID.
SQL> select lpad(' ',2*(level-1)) || Assembly_ID Assembly_ID, 2 Product_ID, Description 3 from Product_Assembly 4 start with Product_ID = 'X1000' 5 connect by prior Product_ID = Assembly_ID; ASSEMBLY_ID PRODUCT_ID DESCRIPTION -------------------- ------------ ------------------------------------------- X1000 Complete camera X1000 L100 100mm lens - standard X1000 B200 Black body, stainless steel frame camera body B200 S42 Variable-speed shutter, standard B200 I101 Titanium alloy, teflon-coated iris B200 W123 Manual film advance, winder X1000 S04 4 foot camera strap, leather X1000 F100 Blue filter - standard X1000 F55 Xenon flash unit 9 rows selected.
To indent each row to indicate its hierarchy level, I embedded LEVEL inside the LPAD function so that the number of blanks returned by LPAD corresponds to the hierarchy level. (LEVEL is a pseudocolumn that returns the hierarchy level for each row--;from 1 for the highest level to N for the most detailed level.) LPAD is concatenated with Assembly_ID.
TipBe cautious in how you construct your data model. Be absolutely certain that you have to support hierarchical information before you employ CONNECT BY and START WITH.
To look at a particular subassembly, specify the Assembly_ID in the START WITH clause. The following SELECT statement illustrates this procedure; it also shows the value of LEVEL for each row.
SQL> select lpad(' ',2*(level-1)) || Assembly_ID Assembly_ID, 2 Product_ID, Level, Description 3 from Product_Assembly 4 start with Product_ID = 'B200' 5 connect by prior Product_ID = Assembly_ID; ASSEMBLY_ID PRODUCT_ID LEVEL DESCRIPTION --------------- ----------------- ---------------------------------- X1000 B200 1 Black body, stainless steel frame camera body B200 S42 2 Variable-speed shutter, standard B200 I101 2 Titanium alloy, teflon-coated iris B200 W123 2 Manual film advance, winder
The hierarchical SELECT statement must obey the following restrictions. First, a hierarchical SELECT statement cannot join two or more tables (covered later in this chapter). Second, if you specify an ORDER BY clause, you will destroy the hierarchical ordering of the rows returned by the query.
Another SQL operator that you should be familiar with is the EXISTS operator. EXISTS operates on a subquery and returns a Boolean value:
Consider the following example based on the Employee and Employee_Dependent tables. Your company's health insurance provider needs to calculate a new rate for your company. The provider requests a list of employees' hire dates, but only for employees with dependents. Using the EXISTS operator, you submit this query:
SQL> select Last_Name, First_Name, Hire_Date 2 from Employee E 3 where 4 exists 5 (select * from Employee_Dependent D 6 where 7 E.Employee_ID = D.Employee_ID); LAST_NAME FIRST_NAME HIRE_DATE ------------------------------ -------------------- --------- SMITH JEAN 10-APR-82 GILROY MAX 22-SEP-92
The EXISTS operator is useful in situations in which you're not interested in the column values returned by the subquery. Notice how table aliases--;E for Employee and D for Employee_Dependent--;were used to reduce the amount of typing and improve the readability of the query. You can also preface the EXISTS operator with the logical operator NOT. For instance, you can change the previous query to return only the employees without dependents.
SQL> select Last_Name, First_Name, Hire_Date 2 from Employee E 3 where 4 not exists 5 (select * from Employee_Dependent D 6 where 7 E.Employee_ID = D.Employee_ID); LAST_NAME FIRST_NAME HIRE_DATE ------------------------------ -------------------- --------- HERNANDEZ RANDY 18-NOV-83 GLEASON PAUL 05-APR-84 BARRETT SARAH 16-JAN-89 HIGGINS BEN 11-FEB-89 YEN CINDY 09-JUN-91 CARSON BETH 12-DEC-92 SWANSON HARRY 18-MAY-93 7 rows selected.
Now that you have a working knowledge of SQL, you are ready to delve into the world of joins. A join can retrieve rows from two or more tables that share a common set of values. A relational database would be of little value if it weren't for the join operation.
The join operation is the mechanism that allows tables to be related to one another. A join operation retrieves columns from two or more tables. For example, if you were joining two tables, the retrieval criteria would specify the condition that a column in the first table--;which is defined as a foreign key--;is equal to a column in the second table--;which is the primary key referenced by the foreign key. A join operation in which the join's WHERE clause contains additional conditions is called an equi-join. The general syntax for the SELECT statement enables you to join more than two tables.
SELECT select-list FROM table1, table2, ... , tableN WHERE table1.column1 = table2.column2 and ... table2.column3 = tableN.columnN ... additional-conditions
The variables are as follows: select-list is the set of columns and expressions from table1 through tableN.
table1 through tableN are the tables from which column values are retrieved.
column1 through columnN are the columns in table1 through tableN that are related.
additional-conditions are optional query criteria.
Please note that you are not required to reference column1 through columnN in the select-list.
A simple two-table join illustrates the use of this syntax. As you recall, the repair store database includes the Repair_Header table, which contains basic information about a customer's repair ID, and the Customer table that contains customer data. In order to prevent the storage of redundant data, the Repair_Header table stores only one piece of information about the customer: the Customer ID. By joining the Repair_Header table with the Customer table based on the Customer_ID column, you can retrieve additional customer data.
SQL> select Repair_id, Last_Name, First_Name 2 from Repair_Header, Customer 3 where 4 Repair_Header.Customer_ID = Customer.Customer_ID 5 order by Repair_ID; REPAIR_ID LAST_NAME FIRST_NAME --------- ------------------------------ ------------------------------ 501 Martinez Steve 502 Smyth Julie 503 Horace Michelle 504 Pareski Monica 505 Moran Sarah
Here's an analysis of each element of the SELECT statement. The select list consists of the three columns of interest: Repair_ID, Last_Name, and First_Name. The list of tables is made up of Repair_Header and Customer. The WHERE clause instructs Personal Oracle7 to return only rows in which Customer_ID in the Repair_Header table can be matched with a row in the Customer table that has the same value for Customer_ID.
The columns in the select-list can appear in any order; for example, the following query is perfectly acceptable:
SQL> select Last_Name, First_Name, Repair_ID 2 from Repair_Header, Customer 3 where 4 Repair_Header.Customer_ID = Customer.Customer_ID 5 order by Last_Name; LAST_NAME FIRST_NAME REPAIR_ID ------------------------------ ------------------------------ --------- Horace Michelle 503 Martinez Steve 501 Moran Sarah 505 Pareski Monica 504 Smyth Julie 502
The query lists Last_Name, First_Name, and Repair_ID, even though Repair_Header is listed before Customer in the table list.
You need to keep in mind that each reference to a column in a join must be unambiguous. In this context unambiguous means that if the column exists in more than one table referenced in the join, the column name is qualified by the table name. Oracle returns an error message if you reference a column ambiguously.
SQL> select Customer_ID, Last_Name, First_Name, Repair_ID 2 from Repair_Header, Customer 3 where 4 Repair_Header.Customer_ID = Customer.Customer_ID 5 order by Last_Name; select Customer_ID, Last_Name, First_Name, Repair_ID * ERROR at line 1: ORA-00918: column ambiguously defined
Oracle returns the error message because Customer_ID, which is referenced in the first line of the SELECT statement, appears in both the Repair_Header and Customer tables. To correct this problem, you must qualify the Customer_ID column with the table name.
SQL> select Customer.Customer_ID, Last_Name, First_Name, Repair_ID 2 from Repair_Header, Customer 3 where 4 Repair_Header.Customer_ID = Customer.Customer_ID 5 order by Last_Name; CUSTOMER_ID LAST_NAME FIRST_NAME REPAIR_ID ----------- -------------------- --------------- --------- 1005 Horace Michelle 503 1002 Martinez Steve 501 1008 Moran Sarah 505 1006 Pareski Monica 504
1003 Smyth Julie 502
When you're first learning to join multiple tables, a common error is to forget to provide a join condition in the WHERE clause. If you forget a join condition, you'll notice two things: the query takes considerably longer to execute, and the number of retrieved records is much larger than you expected. The technical term for this result is the Cartesian product.
Consider the example of the Repair_Header and Customer tables. If you join these two tables without specifying a join condition, here's what happens:
SQL> select Customer.Customer_ID, Last_Name, First_Name, Repair_ID 2 from Repair_Header, Customer 3 order by Last_Name; CUSTOMER_ID LAST_NAME FIRST_NAME REPAIR_ID ----------- ---------------------- ------------------------------ ------ 1009 Chen Laura 501 1009 Chen Laura 502 1009 Chen Laura 503 1009 Chen Laura 506 1009 Chen Laura 505 1009 Chen Laura 504 6102 Fleming Harry 501 6102 Fleming Harry 502 6102 Fleming Harry 504 6102 Fleming Harry 503 6102 Fleming Harry 506 6102 Fleming Harry 505 2222 Fray 501 2222 Fray 503 2222 Fray 505 2222 Fray 506 2222 Fray 504 2222 Fray 502 6105 Hernandez Ruby 501 . . . 6101 Sorrel JAMES 502 6101 Sorrel JAMES 503 6101 Sorrel JAMES 504 6101 Sorrel JAMES 505 6101 Sorrel JAMES 506 90 rows selected.
In this example the Repair_Header table has 6 rows, and the Customer table has 15 rows. The code sample does not have a join condition instructing Oracle to retrieve the columns from each table based on Customer_ID. As a result, Oracle returns all possible combinations of the two tables, which is 6 times 15, or 90 rows.
As you see, the syntax for the SELECT statement doesn't limit the number of tables that can be joined. As an example, you can join these three tables--;Repair_Header, Repair_Item, and Product--;to retrieve a description of each repair item.
SQL> select R.Repair_ID, Customer_ID, Item_Number, Description 2 from Repair_Header R, Repair_Item I, Product P 3 where 4 R.Repair_ID = I.Repair_ID and 5 I.Product_ID = P.Product_ID and 6 I.Manufacturer_ID = P.Manufacturer_ID 7 order by R.Repair_ID, Item_Number; REPAIR_ID CUSTOMER_ID ITEM_NUMBER DESCRIPTION --------- ----------- ----------- ---------------------------------------- 501 1002 1 CD PLAYER, SINGLE-DISK 502 1003 1 CCD Camera 503 1005 1 Pre-amp, 120 W per channel 504 1006 1 PRE AMPLIFIER 150 WATTS/CHANNEL 505 1008 1 AMP, PROFESSIONAL 800W RMS PER CHANNEL 506 1010 1 Tuner 506 1010 2 PREAMP, 460 W/RMS 7 rows selected.
Notice that an alias has been specified for each of the three tables:
Although table aliases are optional, you should use them in multiple-table joins because they reduce the size of the SELECT statement and simplify its appearance.
Another form of a two-table join is the self-join. This type of join is
used when a table has a foreign key that references its own primary key. A good example
of the self-join is the Product_Assembly table discussed earlier in this chapter.
The primary key of the Product_Assembly table is Product_ID. Every product--;except
a complete product--;belongs to an assembly. Therefore, Assembly_ID is a foreign
key that references Product_ID (see Figure 15.1).
Figure 15.1. Example of a table's primary
key referenced by its foreign key.
As an excellent example of the purpose of a self-join, suppose that you want to retrieve
the description of each product and the description of the assembly to which it belongs.
As the following example illustrates, the join statement must define an alias for
both copies of the table. The join condition--;P1.Assembly_ID = P2.Product_ID--;causes
Oracle to retrieve product information from P1 and assembly information
from P2.
SQL> select P1.Description || ' is part of ' || P2.Description "Assembly Breakdown" 2 from Product_Assembly P1, Product_Assembly P2 3 where 4 P1.Assembly_ID = P2.Product_ID 5 order by P1.Assembly_ID, P1.Product_ID; Assembly Breakdown ------------------------------------------------------------------------------ Titanium alloy, teflon-coated iris is part of Black body, stainless steel fram Variable-speed shutter, standard is part of Black body, stainless steel frame Manual film advance, winder is part of Black body, stainless steel frame camer Black body, stainless steel frame camera body is part of Complete camera Blue filter - standard is part of Complete camera Xenon flash unit is part of Complete camera 100mm lens - standard is part of Complete camera 4 foot camera strap, leather is part of Complete camera 8 rows selected.
You've seen that a multiple-table join returns only columns from each table where the join conditions are met. Consider the following example in which you have two tables--;an Employee table and an Employee_ Dependent table. An employee may have zero, one, or several dependents. Accordingly, a row in the Employee table may have zero or more related rows in the Employee_Dependent table. Here are the contents of the tables:
SQL> select Employee_ID, Last_Name, First_Name 2 from Employee 3 order by Employee_ID; EMPLOYEE_ID LAST_NAME FIRST_NAME ----------- ------------------------------ -------------------- 1001 SMITH JEAN 1002 HERNANDEZ RANDY 1003 GLEASON PAUL 1004 BARRETT SARAH 1005 HIGGINS BEN 1006 YEN CINDY 1007 GILROY MAX 1008 CARSON BETH 1009 SWANSON HARRY 9 rows selected. SQL> select Employee_ID, Last_Name, First_Name, Relationship 2 from Employee_Dependent 3 order by Employee_ID; EMPLOYEE_ID LAST_NAME FIRST_NAME RELATIONSHIP ----------- ------------------------------ -------------------- --------------- 1001 SMITH JARED CHILD 1006 YEN RANDY SPOUSE 1007 GILROY ARTHUR CHILD 1007 GILROY CATHY SPOUSE 1007 GILROY HORACE CHILD
Watch what happens when these two tables are joined by Employee_ID:
SQL> select E.Employee_ID, E.Last_Name, E.First_Name, D.Last_Name, D.First_Name, Relationship 2 from Employee E, Employee_Dependent D 3 where 4 E.Employee_ID = D.Employee_ID 5 order by E.Employee_ID, D.Last_Name, D.First_Name; EMPLOYEE_ID LAST_NAME FIRST_NAME LAST_NAME FIRST_NAME RELATIONSHIP ----------- ------------- ------------ ------------- ------------ ------------ 1001 SMITH JEAN SMITH JARED CHILD 1006 YEN CINDY YEN RANDY SPOUSE 1007 GILROY MAX GILROY ARTHUR CHILD 1007 GILROY MAX GILROY CATHY SPOUSE 1007 GILROY MAX GILROY HORACE CHILD
In the previous example, Oracle retrieves only the rows for employees with dependents. If you want to retrieve employee information whether or not an employee has dependents, you must use the outer join operator by appending (+)--;a plus sign within parentheses--;to the optional column in the join condition.
SQL> select E.Employee_ID, E.Last_Name, E.First_Name, D.Last_Name, D.First_Name, Relationship 2 from Employee E, Employee_Dependent D 3 where 4 E.Employee_ID = D.Employee_ID (+) 5 order by E.Employee_ID, D.Last_Name, D.First_Name; EMPLOYEE_ID LAST_NAME FIRST_NAME LAST_NAME FIRST_NAME RELATIONSHIP ----------- ------------- ------------- ------------- ----------- ------------ 1001 SMITH JEAN SMITH JARED CHILD 1002 HERNANDEZ RANDY 1003 GLEASON PAUL 1004 BARRETT SARAH 1005 HIGGINS BEN 1006 YEN CINDY YEN RANDY SPOUSE 1007 GILROY MAX GILROY ARTHUR CHILD 1007 GILROY MAX GILROY CATHY SPOUSE 1007 GILROY MAX GILROY HORACE CHILD 1008 CARSON BETH 1009 SWANSON HARRY 11 rows selected.
If an employee has no dependents, Oracle returns a null for each of the columns selected from the Employee_Dependent table. In fact, you can obtain a list of employees without dependents by looking for Employee_Dependent rows in which Relationship is null.
SQL> select E.Employee_ID, E.Last_Name, E.First_Name 2 from Employee E, Employee_Dependent D 3 where 4 E.Employee_ID = D.Employee_ID (+) and 5 Relationship is NULL 6 order by E.Employee_ID, D.Last_Name, D.First_Name; EMPLOYEE_ID LAST_NAME FIRST_NAME ----------- -------------------- --------------- 1002 HERNANDEZ RANDY 1003 GLEASON PAUL 1004 BARRETT SARAH 1005 HIGGINS BEN 1008 CARSON BETH 1009 SWANSON HARRY 6 rows selected.
The SQL language is a partial implementation of the relational model as envisioned by Codd, the father of relational theory. As part of that implementation, Oracle's version of SQL provides three set operators: INTERSECT, UNION, and MINUS.
The INTERSECT operator returns the rows that are common between two sets of rows. The syntax for using the INTERSECT operator is
select-stmt1 INTERSECT select-stmt2 [order-by-clause]
where select-stmt1 and select-stmt2 are valid SELECT statements and order-by-clause is an ORDER BY clause that references the columns by number rather than by name.
Here are some requirements and considerations for using the INTERSECT operator:
Consider the following typical scenario for using the INTERSECT operator. You're supporting several regional sales offices, all of which use the same database tables. As you would expect, the contents of these tables differ from sales office to sales office. You have been asked to find the records that appear in both the Portland_Product table and the Houston_Product table. For the sake of illustration, I've reduced the contents of these tables to a comprehensible set of data.
SQL> select Product_ID, Manufacturer_ID 2 from Portland_Product; PRODUCT_ID MANUFAC ------------ ------- C2002 MIT501 C2005 MIT501 C3002 SEN101 SQL> select Product_ID, Manufacturer_ID 2 from Houston_Product; PRODUCT_ID MANUFAC ------------ ------- B901 TES801 B801 SEN101 C2002 MIT501 C2005 MIT501 C3002 SEN101 B311 TES801 B9310 SEN101 B384 TES801 8 rows selected.
To determine the common set of rows between the two tables, you submit the following SELECT statement to Oracle:
SQL> select Product_ID, Manufacturer_ID 2 from Portland_Product 3 INTERSECT 4 select Product_ID, Manufacturer_ID 5 from Houston_Product; PRODUCT_ID MANUFAC ------------ ------- C2002 MIT501 C2005 MIT501 C3002 SEN101
If you look at the rows that exist in each table, you can confirm that the two tables do indeed have three rows in common.
Sooner or later, you'll find yourself in a situation in which you need to combine the rows from similar tables to produce a report or to create a table for analysis. Even though the tables represent similar information, they may differ considerably. To accomplish this task, you should consider using the UNION operator. The straightforward syntax for this set operator is
select-stmt1 UNION select-stmt2 [order-by-clause]
where select-stmt1 and select-stmt2 are valid SELECT statements and order-by-clause is an optional ORDER BY clause that references the columns by number rather than by name.
The UNION operator combines the rows returned by the first SELECT statement with the rows returned by the second SELECT statement. Keep the following things in mind when you use the UNION operator:
To illustrate the use of UNION, imagine that you have been given the task of consolidating information from two seismology labs. The first table is from the Department of Geophysics at the University of Northern South Dakota; the other table is from a private research institution, RIND. Here is the structure of the two tables:
SQL> desc UNSD_Event Name Null? Type ------------------------------- -------- ---- EVENT_NO NUMBER EPICENTER_LATITUDE NUMBER EPICENTER_LONGITUDE NUMBER EVENT_MAGNITUDE NUMBER EVENT_TIME DATE EVENT_WAVE CHAR(1) INSTRUMENT_NAME VARCHAR2(30) SQL> desc RIND_Event Name Null? Type ------------------------------- -------- ---- LOCATION_LAT NUMBER LOCATION_LON NUMBER RICHTER_NUMBER NUMBER DATE_TIME VARCHAR2(30) WAVE_TYPE CHAR(1)
Stop for a moment to examine the similarities and differences between UNSD_Event and RIND_Event. Both tables store information about seismic events; however, UNSD_Event has two extra columns: Event_No and Instrument_Name. Both tables store the epicenter latitude and longitude, the magnitude, and the wave type (P or S); however, UNSD_Event defines Event_Time as a DATE, whereas RIND_Event uses VARCHAR2 for storing the event date and time in Date_Time. If you try to perform a UNION without converting Event_Time in UNSD_Event and Date_Time in RIND_Event to a common datatype, here is what happens:
SQL> select Epicenter_Latitude, Epicenter_Longitude, Event_Magnitude, Event_Time, Event_Wave 2 from UNSD_Event 3 UNION 4 select Location_Lat, Location_Lon, Richter_Number, Date_Time, Wave_Type 5 from RIND_Event; select Epicenter_Latitude, Epicenter_Longitude, Event_Magnitude, Event_Time, Event_Wave * ERROR at line 1: ORA-01790: expression must have same datatype as corresponding expression
You can normalize the two columns by converting Date_Time in the RIND_Event table to a date by using the TO_DATE function.
SQL> select Epicenter_Latitude, Epicenter_Longitude, Event_Magnitude, Event_Time, Event_Wave 2 from UNSD_Event 3 UNION 4 select Location_Lat, Location_Lon, Richter_Number, TO_DATE(Date_Time,'MM-DD-YY HH:MI:SS'), Wave_Type 5 from RIND_Event; EPICENTER_LATITUDE EPICENTER_LONGITUDE EVENT_MAGNITUDE EVENT_TIM E ------------------ ------------------- --------------- --------- - 12.83 189.85 5.8 25-APR-95 P 22.33 233.31 5.9 03-FEB-95 P 23.33 179.11 5.3 10-JAN-95 P 29.84 238.41 6.2 22-MAR-95 S 31.17 208.33 6.6 19-APR-95 S 31.84 241.21 6.1 12-MAR-95 S 37.81 211.84 6.4 11-JAN-95 S 7 rows selected.
If you want to order the results by the Event_Time, simply add this line to the SQL statement:
SQL> select Epicenter_Latitude, Epicenter_Longitude, Event_Magnitude, Event_Time, Event_Wave 2 from UNSD_Event 3 UNION 4 select Location_Lat, Location_Lon, Richter_Number, TO_DATE(Date_Time,'MM-DD-YY HH:MI:SS'), Wave_Type 5 from RIND_Event 6 order by 4; EPICENTER_LATITUDE EPICENTER_LONGITUDE EVENT_MAGNITUDE EVENT_TIM E ------------------ ------------------- --------------- --------- - 23.33 179.11 5.3 10-JAN-95 P 37.81 211.84 6.4 11-JAN-95 S 22.33 233.31 5.9 03-FEB-95 P 31.84 241.21 6.1 12-MAR-95 S 29.84 238.41 6.2 22-MAR-95 S 31.17 208.33 6.6 19-APR-95 S 12.83 189.85 5.8 25-APR-95 P 7 rows selected.
Oracle's implementation of SQL supports a variant of the UNION operator called UNION ALL. The difference between UNION and UNION ALL is this: If the two tables in the union have duplicate rows between them, UNION returns only one instance of a row, regardless of the number of duplicates. UNION ALL, on the other hand, returns all duplicates that exist between the two tables. For example, if you apply the UNION operator on the Houston_Product and Portland_Product tables, the results are
SQL> select Product_ID, Manufacturer_ID, Description 2 from Houston_Product 3 UNION 4 select Product_ID, Manufacturer_ID, Description 5 from Portland_Product 6 order by 1; PRODUCT_ID MANUFAC DESCRIPTION ------------ ------- --------------------------------------------------------- B311 TES801 Pre-amp, 120 W per channel B384 TES801 PREAMP, 460 W/RMS B801 SEN101 PRE AMPLIFIER 150 WATTS/CHANNEL B901 TES801 Preamplifier, 200 W PER CHANNEL B9310 SEN101 Pre amp, 250 W/channel C2002 MIT501 CD PLAYER, SINGLE-DISK C2005 MIT501 5-DISK CD PLAYER C2005 MIT501 50-DISK CD PLAYER C3002 SEN101 JUKEBOX, CD - 100 DISK CAPACITY 9 rows selected.
However, if you use the UNION ALL operator instead of the UNION operator, an additional two rows are returned. The reason is that for Product_IDs 2002 and 3002, the three specified columns--;Product_ID, Manufacturer_ID, and Description--;have the same values in both tables.
SQL> select Product_ID, Manufacturer_ID, Description 2 from Houston_Product 3 UNION ALL 4 select Product_ID, Manufacturer_ID, Description 5 from Portland_Product 6 order by 1; PRODUCT_ID MANUFAC DESCRIPTION ------------ ------- ------------------------------------------------------- B311 TES801 Pre-amp, 120 W per channel B384 TES801 PREAMP, 460 W/RMS B801 SEN101 PRE AMPLIFIER 150 WATTS/CHANNEL B901 TES801 Preamplifier, 200 W PER CHANNEL B9310 SEN101 Pre amp, 250 W/channel C2002 MIT501 CD PLAYER, SINGLE-DISK C2002 MIT501 CD PLAYER, SINGLE-DISK C2005 MIT501 5-DISK CD PLAYER C2005 MIT501 50-DISK CD PLAYER C3002 SEN101 JUKEBOX, CD - 100 DISK CAPACITY C3002 SEN101 JUKEBOX, CD - 100 DISK CAPACITY 11 rows selected.
In addition to the INTERSECT and UNION operators, Oracle also provides the MINUS operator for comparing one set of rows to another set. The syntax for using the MINUS operator resembles the syntax for the UNION operator.
select-stmt1 MINUS select-stmt2 [order-by-clause]
where select-stmt1 and select-stmt2 are valid SELECT statements and order-by-clause is an ORDER BY clause that references the columns by number rather than by name.
The requirements and considerations for using the MINUS operator are essentially the same as those for the INTERSECT and UNION operators. The following example uses the MINUS operator, to compare the contents of two tables: Portland_Product and Houston_Product. The first step is to determine which Product and Manufacturer IDs exist only in Portland_Product.
SQL> select Product_ID, Manufacturer_ID 2 from Portland_Product 3 MINUS 4 select Product_ID, Manufacturer_ID 5 from Houston_Product; no rows selected
As expected, Oracle doesn't return any rows. Again, this result occurs because the Product and Manufacturer IDs in Portland_Product--;C2002 and MIT501, C2005 and MIT501, and C3002 and SEN101--;all exist in Houston_Product. Now, however, exchange the two SELECT statements to obtain the rows that exist in Houston_Product only.
SQL> select Product_ID, Manufacturer_ID 2 from Houston_Product 3 MINUS 4 select Product_ID, Manufacturer_ID 5 from Portland_Product; PRODUCT_ID MANUFAC ------------ ------- B311 TES801 B384 TES801 B801 SEN101 B901 TES801 B9310 SEN101
This example demonstrates that MINUS won't tell you all the differences between the two tables--;it returns only those rows in the first set that can't be found in the second set. In fact, both tables appear to have information about the same Product IDs: C2002, C2005, and C3002. However, if you also retrieve the Description column from both tables, you can see additional differences.
SQL> select Product_ID, Manufacturer_ID, Description 2 from Portland_Product 3 where 4 Product_ID like 'C%' 5 MINUS 6 select Product_ID, Manufacturer_ID, Description 7 from Houston_Product 8 where 9 Product_ID like 'C%'; PRODUCT_ID MANUFAC DESCRIPTION ------------ ------- ---------------------------------------- C2005 MIT501 50-DISK CD PLAYER SQL> select Product_ID, Manufacturer_ID, Description 2 from Houston_Product 3 where 4 Product_ID like 'C%' 5 MINUS 6 select Product_ID, Manufacturer_ID, Description 7 from Portland_Product 8 where 9 Product_ID like 'C%'; PRODUCT_ID MANUFAC DESCRIPTION ------------ ------- ---------------------------------------- C2005 MIT501 5-DISK CD PLAYER
The Product_ID C2005 row in the Portland_Product table describes product C2005 as a 50-DISK CD PLAYER, whereas the description in the equivalent row in the Houston_Product table is 5-DISK CD PLAYER. The MINUS operator only reveals differences between the contents of selected columns of the two tables. Also, as you can see from the previous example, you can specify a WHERE clause in the SELECT statements used by the MINUS operator.
When you're developing a new database application, you'll usually go through a phase of experimenting with various table designs. Whether you use legacy data or artificial data, it's nice to have some simple methods for experimentation. One DDL statement that you might want to use is CREATE TABLE--;with a subquery. The syntax is
CREATE TABLE table-name [(column-definition1, ... , column-definitionN)] AS select-statement
where table-name is the name of the new Oracle table; column-definition1 through column-definitionN are optional column definitions that are used to specify different column names to be associated with the values returned by the subquery; and select-statement is a valid SELECT statement whose select list is used in creating the new table.
In the following example the CREATE TABLE statement creates a new table whose columns are a subset of an existing table:
SQL> create table Repair_Header_2 2 as 3 select Repair_ID, Customer_ID, Employee_ID, Deposit_Amount, Deposit_Method 4 from Repair_Header; Table created. SQL> select count(*) from Repair_Header_2; COUNT(*) --------- 6
Notice that Oracle doesn't tell you how many rows are inserted into the new table. In fact, you can create a new table that is empty by specifying an impossible condition.
SQL> create table Repair_Header_3 2 as 3 select Repair_ID, Customer_ID, Employee_ID, Deposit_Amount, Deposit_Method 4 from Repair_Header 5 where 6 1 = 2; Table created. SQL> select count(*) from Repair_Header_3; COUNT(*) --------- 0
TipOracle rejects the CREATE TABLE statement if the subquery references a LONG or LONG RAW column. If you need to copy rows from a LONG column in one table to another, use the SQL*Plus COPY command. It does support the copying of LONG column values.
The subquery referenced in a CREATE TABLE statement can be a multiple-table join, as shown in this example:
SQL> create table Join_Repair_Header_and_Item 2 as 3 select H.Repair_ID, Customer_ID, Employee_ID, Item_Number, 4 Product_ID, Manufacturer_ID 5 from Repair_Header H, Repair_Item I 6 where 7 H.Repair_ID = I.Repair_ID; Table created. SQL> desc Join_Repair_Header_and_Item Name Null? Type ------------------------------- -------- ---- REPAIR_ID NOT NULL NUMBER(6) CUSTOMER_ID NOT NULL NUMBER(4) EMPLOYEE_ID NOT NULL NUMBER(4) ITEM_NUMBER NOT NULL NUMBER PRODUCT_ID VARCHAR2(12) MANUFACTURER_ID VARCHAR2(6)
NoteThe CREATE TABLE statement copies only one kind of table or column constraint: the NOT NULL constraint. If you want the new table to have all the existing constraints, you'll have to specify them with an ALTER TABLE statement.
If you're willing to experiment, you'll find creative uses for SQL with SQL*Plus. One example is the use of SQL*Plus to generate SQL*Plus scripts. Suppose that you want a general script for counting the rows in all your tables. Unfortunately, SQL doesn't allow the use of a wildcard for specifying table names.
SQL> select count(*) from *; select count(*) from * * ERROR at line 1: RA-00903: invalid table name
To count the number of rows in each of your tables, you'd have to submit a SELECT statement to Oracle for each table.
SQL> select count(*) from Repair_Header; COUNT(*) --------- 6 SQL> select count(*) from Repair_Item; COUNT(*) --------- 6 . . . SQL> select count(*) from Customer; COUNT(*) --------- 15
This process can be quite tedious if you have many tables. To obtain a list of your tables, you would have to query the Oracle data dictionary view USER_TABLES. (You'll find information on other Oracle data dictionary views in Chapter 33, "Oracle Internals.") For now, here is the query you'd use to obtain a list of the tables:
SQL> select table_name from user_tables; TABLE_NAME ------------------------------ ACTION_CODE CONDITION_CODE CUSTOMER CUSTOMER_SUBSET DEPOT DEPOT_ESTIMATE EMPLOYEE ITEM_STATUS_CODE . . .
The next step is to concatenate three strings--;a literal string, Table_Name, and a ; to terminate the statement.
SQL> select 'select count(*) from ' || table_name || ';' 2 from user_tables; 'SELECTCOUNT(*)FROM'||TABLE_NAME||';' ---------------------------------------------------- select count(*) from ACTION_CODE; select count(*) from CONDITION_CODE; select count(*) from CUSTOMER; select count(*) from CUSTOMER_SUBSET; select count(*) from DEPOT; 14 rows selected.
The preceding SELECT statement returns a SELECT statement for each table that the Oracle user owns. You can spool these commands to a file so that SQL*Plus can execute them.
SQL> set feedback off SQL> set heading off SQL> set echo off SQL> spool c:\book\chap14\countem.sql SQL> select 'select count(*) from ' || table_name || ';' 2 from user_tables; select count(*) from ACTION_CODE; select count(*) from CONDITION_CODE; select count(*) from CUSTOMER; select count(*) from CUSTOMER_SUBSET; select count(*) from DEPOT; SQL> set feedback on SQL> set heading on SQL> set echo on SQL> start count_em SQL> SQL> select count(*) from ACTION_CODE; COUNT(*) --------- 0 1 row selected. SQL> select count(*) from CONDITION_CODE; COUNT(*) --------- 9 1 row selected. SQL> select count(*) from CUSTOMER; COUNT(*) --------- 15 1 row selected. SQL> select count(*) from CUSTOMER_SUBSET; COUNT(*) --------- 1 1 row selected. SQL> select count(*) from DEPOT; COUNT(*) --------- 0 1 row selected.
This SQL*Plus script is fairly flexible. Let me explain the significance of each Set command. First, FEEDBACK is turned OFF so that SQL*Plus does not display the number of rows. Next, HEADING is set to OFF so that no column headings become part of the output--;you want only the SELECT statements, nothing more. Finally, ECHO is set to OFF so that the output does not capture the SELECT statement that generates the other SELECT statements. A little experimentation with these settings will show you why they are set to these values.
After the SELECT statements are generated and spooled to a file (c:\book\chap14\count_em.sql), FEEDBACK, HEADING, and ECHO are set to ON so that each SELECT statement appears before its results.
NoteIn the previous SQL*Plus script, the SELECT statement that appears after the spool command won't actually appear because echo has been turned off. I presented the SQL*Plus output in this way to clarify the execution of the SQL*Plus commands and the SELECT statements.
Remember the following concepts: