Teach Yourself SQL in 21 Days, Second Edition

Previous chapterNext chapterContents


- Day 6 -
Joining Tables

Objectives

Today you will learn about joins. This information will enable you to gather and manipulate data across several tables. By the end of the day, you will understand and be able to do the following:

Introduction

One of the most powerful features of SQL is its capability to gather and manipulate data from across several tables. Without this feature you would have to store all the data elements necessary for each application in one table. Without common tables you would need to store the same data in several tables. Imagine having to redesign, rebuild, and repopulate your tables and databases every time your user needed a query with a new piece of information. The JOIN statement of SQL enables you to design smaller, more specific tables that are easier to maintain than larger tables.

Multiple Tables in a Single SELECT Statement

Like Dorothy in The Wizard of Oz, you have had the power to join tables since Day 2, "Introduction to the Query: The SELECT Statement," when you learned about SELECT and FROM. Unlike Dorothy, you don't have to click you heels together three times to perform a join. Use the following two tables, named, cleverly enough, TABLE1 and TABLE2.


NOTE: The queries in today's examples were produced using Borland's ISQL tool. You will notice some differences between these queries and the ones that we used earlier in the book. For example, these queries do not begin with an SQL prompt. Another difference is that ISQL does not require a semicolon at the end of the statement. (The semicolon is optional in ISQL.) But the SQL basics are still the same.

INPUT:
SELECT *
FROM TABLE1
OUTPUT:
ROW        REMARKS
========== =======

row 1      Table 1
row 2      Table 1
row 3      Table 1
row 4      Table 1
row 5      Table 1
row 6      Table 1
INPUT:
SELECT *
FROM TABLE2
OUTPUT:
ROW        REMARKS
========== ========

row 1      table 2
row 2      table 2
row 3      table 2
row 4      table 2
row 5      table 2
row 6      table 2

To join these two tables, type this:

INPUT:
SELECT *
FROM TABLE1,TABLE2
OUTPUT:
ROW        REMARKS    ROW        REMARKS
========== ========== ========== ========

row 1      Table 1    row 1      table 2
row 1      Table 1    row 2      table 2
row 1      Table 1    row 3      table 2
row 1      Table 1    row 4      table 2
row 1      Table 1    row 5      table 2
row 1      Table 1    row 6      table 2
row 2      Table 1    row 1      table 2
row 2      Table 1    row 2      table 2
row 2      Table 1    row 3      table 2
row 2      Table 1    row 4      table 2
row 2      Table 1    row 5      table 2
row 2      Table 1    row 6      table 2
row 3      Table 1    row 1      table 2
row 3      Table 1    row 2      table 2
row 3      Table 1    row 3      table 2
row 3      Table 1    row 4      table 2
row 3      Table 1    row 5      table 2
row 3      Table 1    row 6      table 2
row 4      Table 1    row 1      table 2
row 4      Table 1    row 2      table 2
row 4      Table 1    row 3      table 2
row 4      Table 1    row 4      table 2
row 4      Table 1    row 5      table 2
row 4      Table 1    row 6      table 2
row 5      Table 1    row 1      table 2
row 5      Table 1    row 2      table 2
row 5      Table 1    row 3      table 2
row 5      Table 1    row 4      table 2
row 5      Table 1    row 5      table 2
row 5      Table 1    row 6      table 2
row 6      Table 1    row 1      table 2
row 6      Table 1    row 2      table 2
row 6      Table 1    row 3      table 2
row 6      Table 1    row 4      table 2
row 6      Table 1    row 5      table 2
row 6      Table 1    row 6      table 2

Thirty-six rows! Where did they come from? And what kind of join is this?

ANALYSIS:

A close examination of the result of your first join shows that each row from TABLE1 was added to each row from TABLE2. An extract from this join shows what happened:

OUTPUT:
ROW        REMARKS    ROW        REMARKS
=====     ==========  =========  ========

row 1      Table 1    row 1      table 2
row 1      Table 1    row 2      table 2
row 1      Table 1    row 3      table 2
row 1      Table 1    row 4      table 2
row 1      Table 1    row 5      table 2
row 1      Table 1    row 6      table 2

Notice how each row in TABLE2 was combined with row 1 in TABLE1. Congratulations! You have performed your first join. But what kind of join? An inner join? an outer join? or what? Well, actually this type of join is called a cross-join. A cross-join is not normally as useful as the other joins covered today, but this join does illustrate the basic combining property of all joins: Joins bring tables together.

Suppose you sold parts to bike shops for a living. When you designed your database, you built one big table with all the pertinent columns. Every time you had a new requirement, you added a new column or started a new table with all the old data plus the new data required to create a specific query. Eventually, your database would collapse from its own weight--not a pretty sight. An alternative design, based on a relational model, would have you put all related data into one table. Here's how your customer table would look:

INPUT:
SELECT *
FROM CUSTOMER
OUTPUT:
NAME       ADDRESS    STATE  ZIP        PHONE       REMARKS
========== ========== ====== ========== ========= ==========

TRUE WHEEL 55O HUSKER NE     58702      555-4545    NONE
BIKE SPEC  CPT SHRIVE LA     45678      555-1234    NONE
LE SHOPPE  HOMETOWN   KS     54678      555-1278    NONE
AAA BIKE   10 OLDTOWN NE     56784      555-3421    JOHN-MGR
JACKS BIKE 24 EGLIN   FL     34567      555-2314    NONE
ANALYSIS:

This table contains all the information you need to describe your customers. The items you sold would go into another table:

INPUT:
SELECT *
FROM PART
OUTPUT:
    PARTNUM DESCRIPTION                PRICE
=========== ==================== ===========

         54 PEDALS                     54.25
         42 SEATS                      24.50
         46 TIRES                      15.25
         23 MOUNTAIN BIKE             350.45
         76 ROAD BIKE                 530.00
         10 TANDEM                   1200.00

And the orders you take would have their own table:

INPUT:
SELECT *
FROM ORDERS
OUTPUT:
  ORDEREDON NAME           PARTNUM    QUANTITY REMARKS
=========== ========== =========== =========== =======

15-MAY-1996 TRUE WHEEL          23           6 PAID
19-MAY-1996 TRUE WHEEL          76           3 PAID
 2-SEP-1996 TRUE WHEEL          10           1 PAID
30-JUN-1996 TRUE WHEEL          42           8 PAID
30-JUN-1996 BIKE SPEC           54          10 PAID
30-MAY-1996 BIKE SPEC           10           2 PAID
30-MAY-1996 BIKE SPEC           23           8 PAID
17-JAN-1996 BIKE SPEC           76          11 PAID
17-JAN-1996 LE SHOPPE           76           5 PAID
 1-JUN-1996 LE SHOPPE           10           3 PAID
 1-JUN-1996 AAA BIKE            10           1 PAID
 1-JUL-1996 AAA BIKE            76           4 PAID
 1-JUL-1996 AAA BIKE            46          14 PAID
11-JUL-1996 JACKS BIKE          76          14 PAID

One advantage of this approach is that you can have three specialized people or departments responsible for maintaining their own data. You don't need a database administrator who is conversant with all aspects of your project to shepherd one gigantic, multidepartmental database. Another advantage is that in the age of networks, each table could reside on a different machine. People who understand the data could maintain it, and it could reside on an appropriate machine (rather than that nasty corporate mainframe protected by legions of system administrators).

Now join PARTS and ORDERS:

INPUT/OUTPUT:
SELECT  O.ORDEREDON, O.NAME, O.PARTNUM,
P.PARTNUM, P.DESCRIPTION
FROM ORDERS O, PART P

  ORDEREDON NAME           PARTNUM     PARTNUM DESCRIPTION
=========== ========== ===========   ========= ============

15-MAY-1996 TRUE WHEEL          23          54 PEDALS
19-MAY-1996 TRUE WHEEL          76          54 PEDALS
 2-SEP-1996 TRUE WHEEL          10          54 PEDALS
30-JUN-1996 TRUE WHEEL          42          54 PEDALS
30-JUN-1996 BIKE SPEC           54          54 PEDALS
30-MAY-1996 BIKE SPEC           10          54 PEDALS
30-MAY-1996 BIKE SPEC           23          54 PEDALS
17-JAN-1996 BIKE SPEC           76          54 PEDALS
17-JAN-1996 LE SHOPPE           76          54 PEDALS
 1-JUN-1996 LE SHOPPE           10          54 PEDALS
 1-JUN-1996 AAA BIKE            10          54 PEDALS
 1-JUL-1996 AAA BIKE            76          54 PEDALS
 1-JUL-1996 AAA BIKE            46          54 PEDALS
11-JUL-1996 JACKS BIKE          76          54 PEDALS
...
ANALYSIS:

The preceding code is just a portion of the result set. The actual set is 14 (number of rows in ORDERS) x 6 (number of rows in PART), or 84 rows. It is similar to the result from joining TABLE1 and TABLE2 earlier today, and it is still one statement shy of being useful. Before we reveal that statement, we need to regress a little and talk about another use for the alias.

Finding the Correct Column

When you joined TABLE1 and TABLE2, you used SELECT *, which returned all the columns in both tables. In joining ORDERS to PART, the SELECT statement is a bit more complicated:

SELECT  O.ORDEREDON, O.NAME, O.PARTNUM,
P.PARTNUM, P.DESCRIPTION

SQL is smart enough to know that ORDEREDON and NAME exist only in ORDERS and that DESCRIPTION exists only in PART, but what about PARTNUM, which exists in both? If you have a column that has the same name in two tables, you must use an alias in your SELECT clause to specify which column you want to display. A common technique is to assign a single character to each table, as you did in the FROM clause:

FROM ORDERS O, PART P

You use that character with each column name, as you did in the preceding SELECT clause. The SELECT clause could also be written like this:

SELECT  ORDEREDON, NAME, O.PARTNUM, P.PARTNUM, DESCRIPTION

But remember, someday you might have to come back and maintain this query. It doesn't hurt to make it more readable. Now back to the missing statement.

Equi-Joins

An extract from the PART/ORDERS join provides a clue as to what is missing:

30-JUN-1996 TRUE WHEEL          42          54 PEDALS
30-JUN-1996 BIKE SPEC           54          54 PEDALS
30-MAY-1996 BIKE SPEC           10          54 PEDALS

Notice the PARTNUM fields that are common to both tables. What if you wrote the following?

INPUT:
SELECT  O.ORDEREDON, O.NAME, O.PARTNUM,
P.PARTNUM, P.DESCRIPTION
FROM ORDERS O, PART P
WHERE O.PARTNUM = P.PARTNUM
OUTPUT:
  ORDEREDON NAME           PARTNUM     PARTNUM DESCRIPTION
=========== ========== ===========   ========= ==============

 1-JUN-1996 AAA BIKE            10          10 TANDEM
30-MAY-1996 BIKE SPEC           10          10 TANDEM
 2-SEP-1996 TRUE WHEEL          10          10 TANDEM
 1-JUN-1996 LE SHOPPE           10          10 TANDEM
30-MAY-1996 BIKE SPEC           23          23 MOUNTAIN BIKE
15-MAY-1996 TRUE WHEEL          23          23 MOUNTAIN BIKE
30-JUN-1996 TRUE WHEEL          42          42 SEATS
 1-JUL-1996 AAA BIKE            46          46 TIRES
30-JUN-1996 BIKE SPEC           54          54 PEDALS
 1-JUL-1996 AAA BIKE            76          76 ROAD BIKE
17-JAN-1996 BIKE SPEC           76          76 ROAD BIKE
19-MAY-1996 TRUE WHEEL          76          76 ROAD BIKE
11-JUL-1996 JACKS BIKE          76          76 ROAD BIKE
17-JAN-1996 LE SHOPPE           76          76 ROAD BIKE
ANALYSIS:

Using the column PARTNUM that exists in both of the preceding tables, you have just combined the information you had stored in the ORDERS table with information from the PART table to show a description of the parts the bike shops have ordered from you. The join that was used is called an equi-join because the goal is to match the values of a column in one table to the corresponding values in the second table.

You can further qualify this query by adding more conditions in the WHERE clause. For example:

INPUT/OUTPUT:
SELECT  O.ORDEREDON, O.NAME, O.PARTNUM,
P.PARTNUM, P.DESCRIPTION
FROM ORDERS O, PART P
WHERE O.PARTNUM = P.PARTNUM
AND O.PARTNUM = 76

  ORDEREDON NAME           PARTNUM     PARTNUM DESCRIPTION
=========== ========== ===========  ========== ============

 1-JUL-1996 AAA BIKE            76          76 ROAD BIKE
17-JAN-1996 BIKE SPEC           76          76 ROAD BIKE
19-MAY-1996 TRUE WHEEL          76          76 ROAD BIKE
11-JUL-1996 JACKS BIKE          76          76 ROAD BIKE
17-JAN-1996 LE SHOPPE           76          76 ROAD BIKE

The number 76 is not very descriptive, and you wouldn't want your sales people to have to memorize a part number. (We have had the misfortune to see many data information systems in the field that require the end user to know some obscure code for something that had a perfectly good name. Please don't write one of those!) Here's another way to write the query:

INPUT/OUTPUT:
SELECT  O.ORDEREDON, O.NAME, O.PARTNUM,
P.PARTNUM, P.DESCRIPTION
FROM ORDERS O, PART P
WHERE O.PARTNUM = P.PARTNUM
AND P.DESCRIPTION = 'ROAD BIKE'

  ORDEREDON NAME           PARTNUM     PARTNUM DESCRIPTION
=========== ========== ===========  ========== ============

 1-JUL-1996 AAA BIKE            76          76 ROAD BIKE
17-JAN-1996 BIKE SPEC           76          76 ROAD BIKE
19-MAY-1996 TRUE WHEEL          76          76 ROAD BIKE
11-JUL-1996 JACKS BIKE          76          76 ROAD BIKE
17-JAN-1996 LE SHOPPE           76          76 ROAD BIKE

Along the same line, take a look at two more tables to see how they can be joined. In this example the employee_id column should obviously be unique. You could have employees with the same name, they could work in the same department, and earn the same salary. However, each employee would have his or her own employee_id. To join these two tables, you would use the employee_id column.

EMPLOYEE_TABLE EMPLOYEE_PAY_TABLE
employee_id employee_id
last_name salary
first_name department
middle_name supervisor
  marital_status

INPUT:
SELECT E.EMPLOYEE_ID, E.LAST_NAME, EP.SALARY
FROM EMPLOYEE_TBL E,
     EMPLOYEE_PAY_TBL EP
WHERE E.EMPLOYEE_ID = EP.EMPLOYEE_ID
  AND E.LAST_NAME = 'SMITH';
OUTPUT:
E.EMPLOYEE_ID  E.LAST_NAME  EP.SALARY
=============  ===========  =========
        13245  SMITH         35000.00


TIP: When you join two tables without the use of a WHERE clause, you are performing a Cartesian join. This join combines all rows from all the tables in the FROM clause. If each table has 200 rows, then you will end up with 40,000 rows in your results (200 x 200). Always join your tables in the WHERE clause unless you have a real need to join all the rows of all the selected tables.

Back to the original tables. Now you are ready to use all this information about joins to do something really useful: finding out how much money you have made from selling road bikes:

INPUT/OUTPUT:
SELECT SUM(O.QUANTITY * P.PRICE) TOTAL
FROM ORDERS O, PART P
WHERE O.PARTNUM = P.PARTNUM
AND P.DESCRIPTION = 'ROAD BIKE'

      TOTAL
===========

   19610.00
ANALYSIS:

With this setup, the sales people can keep the ORDERS table updated, the production department can keep the PART table current, and you can find your bottom line without redesigning your database.


NOTE: Notice the consistent use of table and column aliases in the SQL statement examples. You will save many, many keystrokes by using aliases. They also help to make your statement more readable.

Can you join more than one table? For example, to generate information to send out an invoice, you could type this statement:

INPUT/OUTPUT:
SELECT C.NAME, C.ADDRESS, (O.QUANTITY * P.PRICE) TOTAL
FROM ORDER O, PART P, CUSTOMER C
WHERE O.PARTNUM = P.PARTNUM
AND O.NAME = C.NAME

NAME       ADDRESS          TOTAL
========== ========== ===========

TRUE WHEEL 55O HUSKER     1200.00
BIKE SPEC  CPT SHRIVE     2400.00
LE SHOPPE  HOMETOWN       3600.00
AAA BIKE   10 OLDTOWN     1200.00
TRUE WHEEL 55O HUSKER     2102.70
BIKE SPEC  CPT SHRIVE     2803.60
TRUE WHEEL 55O HUSKER      196.00
AAA BIKE   10 OLDTOWN      213.50
BIKE SPEC  CPT SHRIVE      542.50
TRUE WHEEL 55O HUSKER     1590.00
BIKE SPEC  CPT SHRIVE     5830.00
JACKS BIKE 24 EGLIN       7420.00
LE SHOPPE  HOMETOWN       2650.00
AAA BIKE   10 OLDTOWN     2120.00

You could make the output more readable by writing the statement like this:

INPUT/OUTPUT:
SELECT C.NAME, C.ADDRESS,
O.QUANTITY * P.PRICE TOTAL
FROM ORDERS O, PART P, CUSTOMER C
WHERE O.PARTNUM = P.PARTNUM
AND O.NAME = C.NAME
ORDER BY C.NAME

NAME       ADDRESS          TOTAL
========== ========== ===========

AAA BIKE   10 OLDTOWN      213.50
AAA BIKE   10 OLDTOWN     2120.00
AAA BIKE   10 OLDTOWN     1200.00
BIKE SPEC  CPT SHRIVE      542.50
BIKE SPEC  CPT SHRIVE     2803.60
BIKE SPEC  CPT SHRIVE     5830.00
BIKE SPEC  CPT SHRIVE     2400.00
JACKS BIKE 24 EGLIN       7420.00
LE SHOPPE  HOMETOWN       2650.00
LE SHOPPE  HOMETOWN       3600.00
TRUE WHEEL 55O HUSKER      196.00
TRUE WHEEL 55O HUSKER     2102.70
TRUE WHEEL 55O HUSKER     1590.00
TRUE WHEEL 55O HUSKER     1200.00


NOTE: Notice that when joining the three tables (ORDERS, PART, and CUSTOMER) that the ORDERS table was used in two joins and the other tables were used only once. Tables that will return the fewest rows with the given conditions are commonly referred to as driving tables, or base tables. Tables other than the base table in a query are usually joined to the base table for more efficient data retrieval. Consequently, the ORDERS table is the base table in this example. In most databases a few base tables join (either directly or indirectly) all the other tables. (See Day 15, "Streamlining SQL Statements for Improved Performance," for more on base tables.)

You can make the previous query more specific, thus more useful, by adding the DESCRIPTION column as in the following example:

INPUT/OUTPUT:
SELECT C.NAME, C.ADDRESS,
O.QUANTITY * P.PRICE TOTAL,
P.DESCRIPTION
FROM ORDERS O, PART P, CUSTOMER C
WHERE O.PARTNUM = P.PARTNUM
AND O.NAME = C.NAME
ORDER BY C.NAME

NAME       ADDRESS          TOTAL DESCRIPTION
========== ========== =========== ==============

AAA BIKE   10 OLDTOWN      213.50 TIRES
AAA BIKE   10 OLDTOWN     2120.00 ROAD BIKE
AAA BIKE   10 OLDTOWN     1200.00 TANDEM
BIKE SPEC  CPT SHRIVE      542.50 PEDALS
BIKE SPEC  CPT SHRIVE     2803.60 MOUNTAIN BIKE
BIKE SPEC  CPT SHRIVE     5830.00 ROAD BIKE
BIKE SPEC  CPT SHRIVE     2400.00 TANDEM
JACKS BIKE 24 EGLIN       7420.00 ROAD BIKE
LE SHOPPE  HOMETOWN       2650.00 ROAD BIKE
LE SHOPPE  HOMETOWN       3600.00 TANDEM
TRUE WHEEL 55O HUSKER      196.00 SEATS
TRUE WHEEL 55O HUSKER     2102.70 MOUNTAIN BIKE
TRUE WHEEL 55O HUSKER     1590.00 ROAD BIKE
TRUE WHEEL 55O HUSKER     1200.00 TANDEM
ANALYSIS:

This information is a result of joining three tables. You can now use this information to create an invoice.


NOTE: In the example at the beginning of the day, SQL grouped TABLE1 and TABLE2 to create a new table with X (rows in TABLE1) x Y (rows in TABLE2) number of rows. A physical table is not created by the join, but rather in a virtual sense. The join between the two tables produces a new set that meets all conditions in the WHERE clause, including the join itself. The SELECT statement has reduced the number of rows displayed, but to evaluate the WHERE clause SQL still creates all the possible rows. The sample tables in today's examples have only a handful of rows. Your actual data may have thousands of rows. If you are working on a platform with lots of horsepower, using a multiple-table join might not visibly affect performance. However, if you are working in a slower environment, joins could cause a significant slowdown.

We aren't telling you not to use joins, because you have seen the advantages to be gained from a relational design. Just be aware of the platform you are using and your customer's requirements for speed versus reliability.


Non-Equi-Joins

Because SQL supports an equi-join, you might assume that SQL also has a non-equi-join. You would be right! Whereas the equi-join uses an = sign in the WHERE statement, the non-equi-join uses everything but an = sign. For example:

INPUT:
SELECT O.NAME, O.PARTNUM, P.PARTNUM,
O.QUANTITY * P.PRICE TOTAL
FROM ORDERS O, PART P
WHERE O.PARTNUM > P.PARTNUM
OUTPUT:
NAME           PARTNUM     PARTNUM       TOTAL
========== =========== =========== ===========

TRUE WHEEL          76          54      162.75
BIKE SPEC           76          54      596.75
LE SHOPPE           76          54      271.25
AAA BIKE            76          54      217.00
JACKS BIKE          76          54      759.50
TRUE WHEEL          76          42       73.50
BIKE SPEC           54          42      245.00
BIKE SPEC           76          42      269.50
LE SHOPPE           76          42      122.50
AAA BIKE            76          42       98.00
AAA BIKE            46          42      343.00
JACKS BIKE          76          42      343.00
TRUE WHEEL          76          46       45.75
BIKE SPEC           54          46      152.50
BIKE SPEC           76          46      167.75
LE SHOPPE           76          46       76.25
AAA BIKE            76          46       61.00
JACKS BIKE          76          46      213.50
TRUE WHEEL          76          23     1051.35
TRUE WHEEL          42          23     2803.60
...
ANALYSIS:

This listing goes on to describe all the rows in the join WHERE O.PARTNUM > P.PARTNUM. In the context of your bicycle shop, this information doesn't have much meaning, and in the real world the equi-join is far more common than the non-equi-join. However, you may encounter an application in which a non-equi-join produces the perfect result.

Outer Joins versus Inner Joins

Just as the non-equi-join balances the equi-join, an outer join complements the inner join. An inner join is where the rows of the tables are combined with each other, producing a number of new rows equal to the product of the number of rows in each table. Also, the inner join uses these rows to determine the result of the WHERE clause. An outer join groups the two tables in a slightly different way. Using the PART and ORDERS tables from the previous examples, perform the following inner join:

INPUT:
SELECT P.PARTNUM, P.DESCRIPTION,P.PRICE,
O.NAME, O.PARTNUM
FROM PART P
JOIN ORDERS O ON ORDERS.PARTNUM = 54
OUTPUT:
PARTNUM DESCRIPTION                PRICE NAME           PARTNUM
======= ==================== =========== ========== ===========

     54 PEDALS                     54.25 BIKE SPEC           54
     42 SEATS                      24.50 BIKE SPEC           54
     46 TIRES                      15.25 BIKE SPEC           54
     23 MOUNTAIN BIKE             350.45 BIKE SPEC           54
     76 ROAD BIKE                 530.00 BIKE SPEC           54
     10 TANDEM                   1200.00 BIKE SPEC           54


NOTE: The syntax you used to get this join--JOIN ON--is not ANSI standard. The implementation you used for this example has additional syntax. You are using it here to specify an inner and an outer join. Most implementations of SQL have similar extensions. Notice the absence of the WHERE clause in this type of join.
ANALYSIS:

The result is that all the rows in PART are spliced on to specific rows in ORDERS where the column PARTNUM is 54. Here's a RIGHT OUTER JOIN statement:

INPUT/OUTPUT:
SELECT P.PARTNUM, P.DESCRIPTION,P.PRICE,
O.NAME, O.PARTNUM
FROM PART P
RIGHT OUTER JOIN ORDERS O ON ORDERS.PARTNUM = 54

PARTNUM DESCRIPTION          PRICE   NAME           PARTNUM
======= ==================== ======= ============== =======

 <null> <null>                <null> TRUE WHEEL          23
 <null> <null>                <null> TRUE WHEEL          76
 <null> <null>                <null> TRUE WHEEL          10
 <null> <null>                <null> TRUE WHEEL          42
     54 PEDALS                 54.25 BIKE SPEC           54
     42 SEATS                  24.50 BIKE SPEC           54
     46 TIRES                  15.25 BIKE SPEC           54
     23 MOUNTAIN BIKE         350.45 BIKE SPEC           54
     76 ROAD BIKE             530.00 BIKE SPEC           54
     10 TANDEM               1200.00 BIKE SPEC           54
 <null> <null>                <null> BIKE SPEC           10
 <null> <null>                <null> BIKE SPEC           23
 <null> <null>                <null> BIKE SPEC           76
 <null> <null>                <null> LE SHOPPE           76
 <null> <null>                <null> LE SHOPPE           10
 <null> <null>                <null> AAA BIKE            10
 <null> <null>                <null> AAA BIKE            76
 <null> <null>                <null> AAA BIKE            46
 <null> <null>                <null> JACKS BIKE          76
ANALYSIS:

This type of query is new. First you specified a RIGHT OUTER JOIN, which caused SQL to return a full set of the right table, ORDERS, and to place nulls in the fields where ORDERS.PARTNUM <> 54. Following is a LEFT OUTER JOIN statement:

INPUT/OUTPUT:
SELECT P.PARTNUM, P.DESCRIPTION,P.PRICE,
O.NAME, O.PARTNUM
FROM PART P
LEFT OUTER JOIN ORDERS O ON ORDERS.PARTNUM = 54

PARTNUM DESCRIPTION              PRICE NAME           PARTNUM
======= ================== =========== ========== ===========

     54 PEDALS                   54.25 BIKE SPEC           54
     42 SEATS                    24.50 BIKE SPEC           54
     46 TIRES                    15.25 BIKE SPEC           54
     23 MOUNTAIN BIKE           350.45 BIKE SPEC           54
     76 ROAD BIKE               530.00 BIKE SPEC           54
     10 TANDEM                 1200.00 BIKE SPEC           54
ANALYSIS:

You get the same six rows as the INNER JOIN. Because you specified LEFT (the LEFT table), PART determined the number of rows you would return. Because PART is smaller than ORDERS, SQL saw no need to pad those other fields with blanks.

Don't worry too much about inner and outer joins. Most SQL products determine the optimum JOIN for your query. In fact, if you are placing your query into a stored procedure (or using it inside a program (both stored procedures and Embedded SQL covered on Day 13, "Advanced SQL Topics"), you should not specify a join type even if your SQL implementation provides the proper syntax. If you do specify a join type, the optimizer chooses your way instead of the optimum way.

Some implementations of SQL use the + sign instead of an OUTER JOIN statement. The + simply means "Show me everything even if something is missing." Here's the syntax:

SYNTAX:
SQL> select e.name, e.employee_id, ep.salary,
            ep.marital_status
     from e,ployee_tbl e,
          employee_pay_tbl ep
     where e.employee_id = ep.employee_id(+)
      and e.name like '%MITH';
ANALYSIS:

This statement is joining the two tables. The + sign on the ep.employee_id column will return all rows even if they are empty.

Joining a Table to Itself

Today's final topic is the often-used technique of joining a table to itself. The syntax of this operation is similar to joining two tables. For example, to join table TABLE1 to itself, type this:

INPUT:
SELECT *
FROM TABLE1, TABLE1
OUTPUT:
ROW        REMARKS    ROW        REMARKS
========== ========== ========== ========

row 1      Table 1    row 1      Table 1
row 1      Table 1    row 2      Table 1
row 1      Table 1    row 3      Table 1
row 1      Table 1    row 4      Table 1
row 1      Table 1    row 5      Table 1
row 1      Table 1    row 6      Table 1
row 2      Table 1    row 1      Table 1
row 2      Table 1    row 2      Table 1
row 2      Table 1    row 3      Table 1
row 2      Table 1    row 4      Table 1
row 2      Table 1    row 5      Table 1
row 2      Table 1    row 6      Table 1
row 3      Table 1    row 1      Table 1
row 3      Table 1    row 2      Table 1
row 3      Table 1    row 3      Table 1
row 3      Table 1    row 4      Table 1
row 3      Table 1    row 5      Table 1
row 3      Table 1    row 6      Table 1
row 4      Table 1    row 1      Table 1
row 4      Table 1    row 2      Table 1
...
ANALYSIS:

In its complete form, this join produces the same number of combinations as joining two 6-row tables. This type of join could be useful to check the internal consistency of data. What would happen if someone fell asleep in the production department and entered a new part with a PARTNUM that already existed? That would be bad news for everybody: Invoices would be wrong; your application would probably blow up; and in general you would be in for a very bad time. And the cause of all your problems would be the duplicate PARTNUM in the following table:

INPUT/OUTPUT:
SELECT * FROM PART

    PARTNUM DESCRIPTION                PRICE
=========== ==================== ===========

         54 PEDALS                     54.25
         42 SEATS                      24.50
         46 TIRES                      15.25
         23 MOUNTAIN BIKE             350.45
         76 ROAD BIKE                 530.00
         10 TANDEM                   1200.00
         76 CLIPPLESS SHOE             65.00 <-NOTE SAME #

You saved your company from this bad situation by checking PART before anyone used it:

INPUT/OUTPUT:
SELECT F.PARTNUM, F.DESCRIPTION,
S.PARTNUM,S.DESCRIPTION
FROM PART F, PART S
WHERE F.PARTNUM = S.PARTNUM
AND F.DESCRIPTION <> S.DESCRIPTION

   PARTNUM DESCRIPTION              PARTNUM DESCRIPTION
========== ======================== ======= ============

        76 ROAD BIKE                     76 CLIPPLESS SHOE
        76 CLIPPLESS SHOE                76 ROAD BIKE
ANALYSIS:

Now you are a hero until someone asks why the table has only two entries. You, remembering what you have learned about JOINs, retain your hero status by explaining how the join produced two rows that satisfied the condition WHERE F.PARTNUM = S.PARTNUM AND F.DESCRIPTION <> S.DESCRIPTION. Of course, at some point, the row of data containing the duplicate PARTNUM would have to be corrected.

Summary

Today you learned that a join combines all possible combinations of rows present in the selected tables. These new rows are then available for selection based on the information that you want.

Congratulations--you have learned almost everything there is to know about the SELECT clause. The one remaining item, subqueries, is covered tomorrow (Day 7, "Subqueries: The Embedded SELECT Statement").

Q&A

Q Why cover outer, inner, left, and right joins when I probably won't ever use them?

A A little knowledge is a dangerous thing, and no knowledge can be expensive. You now know enough to understand the basics of what your SQL engine might try while optimizing you queries.

Q How many tables can you join on?

A That depends on the implementation. Some implementations have a 25-table limit, whereas others have no limit. Just remember, the more tables you join on, the slower the response time will be. To be safe, check your implementation to find out the maximum number of tables allowed in a query.

Q Would it be fair to say that when tables are joined, they actually become one table?

A Very simply put, that is just about what happens. When you join the tables, you can select from any of the columns in either table.

Workshop

The Workshop provides quiz questions to help solidify your understanding of the material covered, as well as exercises to provide you with experience in using what you have learned. Try to answer the quiz and exercise questions before checking the answers in Appendix F, "Answers to Quizzes and Exercises."

Quiz

1. How many rows would a two-table join produce if one table had 50,000 rows and the other had 100,000?

2. What type of join appears in the following SELECT statement?

 select e.name, e.employee_id, ep.salary
 from employee_tbl e,
      employee_pay_tbl ep
 where e.employee_id = ep.employee_id;
3. Will the following SELECT statements work?

a. select name, employee_id, salary
from employee_tbl e,
employee_pay_tbl ep
where employee_id = employee_id
and name like '%MITH';

b. select e.name, e.employee_id, ep.salary
from employee_tbl e,
employee_pay_tbl ep
where name like '%MITH';

c. select e.name, e.employee_id, ep.salary
from employee_tbl e,
employee_pay_tbl ep
where e.employee_id = ep.employee_id
and e.name like '%MITH';

4. In the WHERE clause, when joining the tables, should you do the join first or the conditions?

5. In joining tables are you limited to one-column joins, or can you join on more than one column?

Exercises

1. In the section on joining tables to themselves, the last example returned two combinations. Rewrite the query so only one entry comes up for each redundant part number.

2. Rewrite the following query to make it more readable and shorter.

INPUT:
      select orders.orderedon, orders.name, part.partnum,
               part.price, part.description from orders, part
               where orders.partnum = part.partnum and orders.orderedon
               between '1-SEP-96' and '30-SEP-96'
               order by part.partnum;
3. From the PART table and the ORDERS table, make up a query that will return the following:
OUTPUT:



ORDEREDON             NAME               PARTNUM     QUANTITY
==================    ================== =======     ========

2-SEP-96              TRUE WHEEL              10            1


Previous chapterNext chapterContents


© Copyright, Macmillan Computer Publishing. All rights reserved.