Now that you understand the Data Definition Language (DDL) portion of SQL, it's time to apply that new knowledge to a lesson on database theory. Today you learn about the concept of data normalization. You develop a working definition of data normalization and learn about the advantages of normalizing your databases. You also explore each of the five rules of data normalization, including reasons for applying these rules. When you have completed today's lesson, you will be able to identify ways to use data normalization to improve database integrity and performance.
Throughout today's lesson, you normalize a real database using the data definition SQL statements you learned about on Day 13 ("Creating Databases with SQL") and Day 15 ("Updating Databases with SQL"), and by using Visual Basic's Visdata application that you learned about in the first week (see Day 7, "Using the Visdata Program").
The topic of data normalization could easily take up an entire book--and there are several excellent books on it. This lesson approaches data normalization from a practical standpoint rather than a theoretical standpoint.Here you focus on two particular questions: What are the rules? How can these rules help me improve my Visual Basic database applications? To start, let's develop a working definition of data normalization and talk about why it can improve your Visual Basic applications.
Data normalization is a process of refining database structures to improve the speed at which data can be accessed and to increase database integrity. This is not easy. Very often, optimizing a table for speed is not the same as optimizing for integrity. Putting a database together involves discovering the data elements you need and then creating a set of tables to hold those elements. The tables and fields you define make up the structure of the database. The structure you decide upon affects the performance of your database programs. Some database layouts can improve access speed. For example, placing all related information in a single table allows your programs to locate all needed data by looking in one place. On the other hand, you can lay out your database in a way that improves data integrity. For example, placing all the invoice line item data in one table and the invoice address information in another table prevents users from deleting complete addresses when they remove invoice line items from the database. Well-normalized databases strike a balance between speed and integrity.
High-speed tables have few index constraints and can have several, sometimes repetitive, fields in a single record. The few constraints make updates, insertions, and deletions faster. The repetitive fields make it easier to load large amounts of data in a single SQL statement instead of finding additional, related data in subsidiary tables linked through those slower index constraints.
Databases built for maximum integrity have many small data tables. Each of these tables can have several indexes--mostly foreign keys referencing other tables in the database. If a table is built with high integrity in mind, it is difficult to add invalid data to the database without firing off database error messages. Of course, all that integrity checking eats precious ticks off the microchip clock.
Good data normalization results in data tables that make sense in a fundamental way. Well-normalized tables are easy to understand when you look at them. It is easy to see what kind of data they store and what types of updates need to be performed. Usually, it is rather easy to create data entry routines and simple reports directly from well-normalized tables. In fact, the rule of thumb is this: If it's hard to work with a data table, it probably needs more normalization work.
For the rest of this lesson, you use the Visdata application to build data tables. If you have not already looked at the lesson on Day 7, turn there first for information on how to use Visdata to maintain relational databases.
To illustrate the process of normalization, let's start with an existing database
table. The database NORMDAT1.MDB can be found in the TYSDBVB5\SOURCE\DATA
directory of the CD that shipped with this book. Load this into the Visdata application
and open the Table1 data table in design mode. Your screen should look something
like the one in Figure 16.1.
Figure
16.1. Displaying Table1 before normalization.
This data table holds information about employees of a small company. The table contains
fields for the employee ID and employee name, and the ID, name, and location of the
department to which this employee is currently assigned. It also includes fields
for tracking the employee's job skills, including the skill code, the name, the department
in which the skill was learned, and the ability level that the employee has attained
for the designated skill. Up to three different skills can be maintained for each
employee.
This table is rather typical of those you find in existing record-oriented databases. It is designed to quickly give users all the available information on a single employee. It is also a fairly simple task to build a data entry form for this data table. The single form can hold the employee and department fields at the top of the form and the three skill field sets toward the bottom of the form. Figure 16.2 shows a simple data form for this table generated by Visdata.
Access to the information in the table is fast and the creation of a data entry
screen is easy. So this is a well-normalized table, right? Wrong. Three of the five
rules of normalization that you learn in the rest of this lesson are broken, and
the other two are in jeopardy! Some of the problems are obvious, some are not. Let's
go through each of the five rules of normalization and see how applying these rules
can improve the data table.
Figure
16.2. The data entry form for Table1.
The first area in which Table1 needs some work is in the repeating skill fields. Why include columns in the data table called SkillCode1, SkillCode2, SkillCode3, or SkillName1, SkillName2, SkillName3, and so forth? You want to be able to store more than one set of skills for an employee, right? But what if you want to store data on more than three skills acquired by a single employee? What if most of the employees only have one or two skills, and very few have three skills? Why waste the blank space for the third skill? Even more vexing, how easy will it be to locate all employees in the data table that have a particular skill?
NOTE: The first rule of data normalization states that you should make a separate table for each set of related columns and give each table a primary key. Databases that adhere to this first rule of normalization are said to be in the First Normal Form.
The first rule of data normalization is to eliminate repeating groups of data in a data table. Repeating groups of data, such as the skill fields (SkillCodeX, SkillNameX, SkillDeptIDX, and SkillLevelX), usually indicates the need for an additional table. Creating the related table greatly improves readability and allows you to keep as few or as many skill sets for each employee as you need without wasting storage space.
The fields that relate to employee skills need to be separated from the others
in the table. You don't need to put all 12 skill fields in the new table, though.
You only need one of each of the unique data fields. The new database now has not
one, but two data tables. One, called Skills, contains only the skill fields. The
other table, called Employees, contains the rest of the fields. Table 16.1 shows
how the two new tables look.
Table 16.1. Eliminating repeating data.
Skills Table | Employees Table |
EmpID | EmpID |
SkillCode | EmpName |
SkillName | DeptID |
SkillDeptID | DeptName |
SkillLevel | DeptLocation |
Each record in the Skills table must contain a valid value in the EmpID field or it should be rejected. In other words, each time a record is added to the Skills table, the value in the EmpID field should be checked against values in the EmpID field of the Employees table. If no match is found, the Skills record must be corrected before it is written to the database. You remember from the discussion of SQL Data Definition Language statements on Day 13 that this is a FOREIGN KEY CONSTRAINT. The field EmpID in the Skills table is a foreign key that references the field EmpID in the Employees table. Also, the EmpID field in the Employees table should be a primary field to make sure that each record in the Employee table has a unique EmpID value.
Now that you know the fields and index constraints you need, you can use SQL DDL to create two new tables. If you have not already done so, start the Visdata application and open the NORMDAT1.MDB database. Now you create two new tables that bring the database into compliance with the first rule of data normalization.
First, create the table that holds all the basic employee data. This table has all the fields that were in the Table1 table, minus the skill fields. Using the information in Table 16.1 as a guide, enter an SQL DDL statement in the SQL window of Visdata that creates the Employees data table. Your SQL statement should resemble Listing 16.1.
CREATE TABLE Employees (EmpID TEXT(5), EmpName TEXT(30), DeptID TEXT(5), DeptName TEXT(20), DeptLocation TEXT(20),
CONSTRAINT PKEmpID PRIMARY KEY (EmpID)); The EmpID field has been designated as a primary key field. This guarantees that no two records in the Employees data table can have the same EmpID value. You can use the EmpID field in the next table you create (the Skills table) as the reference field that links the two tables. Because you are using the EmpID field as a link, it must be a unique value in the Employees table in order to maintain database integrity. What you are doing here is setting up a one-to-many relationship between the Employees table (the one-side) and the Skills table (the many-side). Any time you establish a one-to-many relationship, you must make sure that the reference field (in this case, the EmpID field) is unique on the one-side of the relationship.
Now that you have built the Employees table, you can create the table that holds all the skills data. Use the information in Table 16.1 to write an SQL DDL statement that creates a table called Skills. Make sure the new table has the field EmpID and that the EmpID field is built with the correct index constraint to enforce one-to-many database integrity. Your SQL statement should look like the one in Listing 16.2.
CREATE TABLE Skills (EmpID TEXT(5), SkillCode TEXT(5), SkillName TEXT(20), SkillDeptID TEXT(5), SkillLevel INTEGER, CONSTRAINT PKSkills PRIMARY KEY (SkillCode,EmpID),
CONSTRAINT FKEmpID FOREIGN KEY (EmpID) REFERENCES Employees(EmpID)); You can see in Listing 16.2 that you have used the FOREIGN KEY_REFERENCES syntax to establish and maintain the table relationship. As you remember from the SQL lessons on Day 13 and Day 15, the FOREIGN KEY_REFERENCES syntax makes sure that any entry in the Skills.EmpID field can be found in the related Employees.EmpID field. If users enter a value in the Skills.EmpID field that cannot be found in any Employees.EmpID field, Visual Basic automatically issues a database error message. This message is generated by Visual Basic, not by your program.
That is how you build tables that adhere to the first rule of data normalization.
To see how these tables look when they have live data in them, use Visdata to load
the TYSDBVB5\ SOURCE\DATA\NORMDAT2.MDB database. This database contains
the Employees and Skills tables with data already loaded into them. Figure 16.3 shows
how Visdata displays the two new tables that have live data.
Figure
16.3. The new Employees and Skills tables
from NORMDAT2.MDB.
NOTE: Before continuing with today's lesson, load the NORMDAT2.MDB database into Visdata.
Another aspect of the Skills table also needs attention. Although moving the repeating skills fields into a separate table improves the database, you still have work to do. The Skills table contains redundant data. That is, data is stored in several places in the database. Redundant data in your database can lead to serious database integrity problems. It's best to eliminate as many occurrences of redundant data as possible.
NOTE: The second rule of data normalization states that if a column depends only on part of a multivalued key, you remove it to a separate table. In other words, if you need to fill in two fields in order to truly identify the record (JobID and JobName), but only one of those fields is needed to perform a lookup in the table, you need a new table. Databases that conform to this rule are said to be in the Second Normal Form.
For example, the Skills table includes a field called SkillCode. This field contains a code that identifies the specific skill (or skills) each employee has acquired. If two employees have gained the same skill, that skill appears twice in the Skills file. The same table also includes a field called SkillName. This field contains a meaningful name for the skill represented by the value in the SkillCode field. This name is much more readable and informative than the SkillCode value. In essence, these two fields contain the same data, represented slightly differently. This is the dreaded redundant data you have to eliminate!
Before you jump into fixing things, first review the details regarding redundant data and how it can adversely affect the integrity of your database.
When you keep copies of data elements in several rows in the same table or in several different tables (such as job names to go with job ID codes), you have a lot of work ahead of you when you want to modify the copied data. If you fail to update one or more of these copies, you can ruin the integrity of your database. Redundant data can lead to what are known as update integrity problems.
Imagine that you have built a huge database of employee skills using the tables you built in the preceding section. All is going great when, suddenly, the Human Resources Department informs you that it has designed a new set of names for the existing skill codes. You now have to go through the entire database and update all the records in the Skills table, searching out the old skill name and updating the SkillName field with the new skill name. Because this is an update for the entire data table, you have to shut down the database until the job is complete in order to make sure no one is editing records while you're performing this update. Also, you probably have to change some Visual Basic code that you built to verify the data entry. All in all, it's a nasty job. If that isn't enough, how about a little power outage in the middle of your update run? Now you have some records with the old names, and some with the new names. Things are really messed up!
Although the update integrity problem is annoying, you can suffer through most of those problems. In fact, almost all database programmers have had to face similar problems before. The more troublesome integrity problem resulting from redundant data comes not during updates, but during deletes. Let's assume you have properly handled the mass update required by the Human Resources Department. Then you discover that there is only one employee in the entire database that has the SkillCode S099 (Advanced Customer Service course). No other employee has attained this high level of training. Now, that employee is leaving the organization. When you delete the employee record from the file, you delete the only reference to the Advanced Customer Service course! There is no longer any record of the Advanced Customer Service course in your entire database, which is a real problem.
The way to reduce these kinds of data integrity problems is to pull out the redundant data and place it in a separate table. You need a single table, called SkillMaster, that contains only the SkillCode and the SkillName data fields. This table is linked to the Skills table through the SkillCode field. Now, when the Human Resources department changes the skill names, you only need to update a single record--the one in the SkillMaster table. Because the Skills table is linked to the SkillMaster table, when you delete the employee with the certification for SkillCode S099, you don't delete the last reference to the skill. It's still in the SkillMaster table.
TIP: Another plus to this type of table separation is in speeding data entry. With only one field to enter, and especially a brief code, data entry operators can more quickly fill in fields on the table's form.
Also, you now have a single table that lists all the unique skills that can be acquired by your employees. You can produce a Skills list for employees and managers to review. If you add fields that group the skills by department, you can even produce a report that shows all the skills by department. This would be very difficult if you were stuck with the file structure you developed in the preceding section.
So let's redefine the Skills table and the SkillMaster table to conform to the
second rule of data normalization. Table 16.2 shows the fields you need for the two
tables.
Table 16.2. The field list for the Skills and SkillMaster tables.
EmpSkills | SkillMaster |
EmpID | SkillCode |
SkillCode | SkillName |
SkillDeptID | |
SkillLevel |
Listing 16.3 shows the two SQL DDL statements that create the EmpSkills and the SkillMaster data tables. Note the use of FOREIGN KEY constraints in the EmpSkills table.
CREATE TABLE SkillMaster (SkillCode TEXT(5), SkillName TEXT(20), CONSTRAINT PKSkillMaster PRIMARY KEY (SkillCode)) CREATE TABLE EmpSkills (EmpID TEXT(5), SkillCode TEXT(5), SkillDeptID TEXT(5), SkillLevel INTEGER, CONSTRAINT PKSkills PRIMARY KEY (SkillCode,EmpID), CONSTRAINT FKEmpID2 FOREIGN KEY (EmpID) REFERENCES Employees(EmpID), CONSTRAINT FKSkillCode FOREIGN KEY (SkillCode) REFERENCES SkillMaster(SkillCode));
Use Visdata to add these two new tables to the NORMDAT2.MDB database. The
database TYSDBVB5\SOURCE\DATA\NORMDAT3.MDB contains a complete database
with the data tables Employees, EmpSkills, and SkillMaster fully populated with data.
This is demonstrated in Figure 16.4.
You now have a database that conforms to the first two rules of data normalization.
You have eliminated repeating data and redundant data. You have one more type of
data to eliminate from your tables. You handle that in the following section.
Figure
16.4. The new Employees, EmpSkills, and
SkillMaster tables.
NOTE: Before continuing with the lesson, load the NORMDAT3.MDB database into Visdata.
By now, you're probably getting the idea. You are looking for hints in the table structure that lead you into traps further down the road. Will this table be easy to update? What happens if you delete records from this table? Is it easy to get a comprehensive list of all the unique records in this table? Asking questions like these can uncover problems that are not so apparent when you first build a table.
When you are building a data table, you should also be concerned about whether a field describes additional information about the key field. In other words, is the field you are about to add to this table truly related to the key field? If not, the field in question should not be added to the table. It probably needs to be in its own table. This process of removing fields that do not describe the key field is how you make your data tables conform to the third rule of data normalization--eliminate columns not dependent on keys.
NOTE: The third rule of data normalization states that if a column does not fully describe the index key, that column should be moved to a separate table. In other words, if the columns in your table don't really need to be in this table, they probably need to be somewhere else. Databases that follow this rule are known to be in the Third Normal Form.
In these database examples, you have data describing the various departments in
the company stored in the Employees table. Although the DeptID field is important
to the Employees description (it describes the department to which the employee belongs),
the department-specific data should not be stored with the employee data. Yes, you
need another table. This table should contain only department-specific data and be
linked to the Employees table through the DeptID field. Table 16.3 lists the modified
Employees table and the new Departments table.
Table 16.3. The modified Employees table and the new Departments
table.
Employees | Departments |
EmpID | DeptID |
EmpName | DeptName |
DeptID | DeptLocation |
CREATE TABLE Departments (DeptID TEXT(5), DeptName TEXT(20), DeptLocation TEXT(20), CONSTRAINT PKDeptID PRIMARY KEY (DeptID))
Now alter the Employees table. You need to do two things:
Listing 16.5 contains the SQL DDL statements to create the modified Employees table.
CREATE TABLE Employees (EmpID TEXT(5), EmpName TEXT(30), DeptID TEXT(5), CONSTRAINT PKEmpID PRIMARY KEY (EmpID), CONSTRAINT FKEmpDept FOREIGN KEY (DeptID) REFERENCES Departments(DeptID))
Now you need to modify the EmpSkills table to add the referential integrity check
on the EmpSkills.SkillDeptID field. The new SQL DDL should look like Listing 16.6.
CREATE TABLE EmpSkills2 (EmpID TEXT(5), SkillCode TEXT(5), SkillDeptID TEXT(5), SkillLevel INTEGER, CONSTRAINT PKEmpSkill2 PRIMARY KEY (SkillCode,EmpID), CONSTRAINT FKSkillMast FOREIGNKEY (SkillCode) REFERENCES SkillMaster(SkillCode), CONSTRAINT FKSkillDept FOREIGN KEY (SkillDeptID) REFERENCES Departments(DeptID));
The database NORMDAT4.MDB contains a complete set of tables that conform
to the third rule of data normalization. Use Visdata to load NORMDAT4.MDB
and review the data tables. Attempt to add some data that does not follow the integrity
rules. Try deleting records. This shows you how Visual Basic issues database error
messages when you try to save a record that breaks the referential integrity rules.
The first three rules of data normalization involve the elimination of repeating,
redundant, or unrelated data fields. The last two rules involve isolating multiple
relationships to improve overall database integrity. The first three rules are usually
all that you need to produce well-designed databases. However, there are times when
additional normalization can improve the quality of your database design. In the
next two sections, you learn rules 4 and 5 of data normalization.
Figure
16.5. The Departments table added to NORMDAT4.MDB.
It is important to note here that one of the results of the third rule of data normalization is that you should not store calculated fields in a data table. Calculated fields are fields that contain derived data such as year-to-date totals, a line in the invoice table that contains the totals of several other rows in the invoice table, and so forth. Calculated fields do not describe the primary key. Calculated fields are derived data. It is a bad practice to store derived data in live data tables.
Derived data can easily fall out of sync with the individual rows that make up the total data. What happens if the individual rows that add up to the total are altered or deleted? How do you make sure the row that holds the total is updated each time any line item row is changed? Storing derived data might seem to be faster, but it is not easier. And dealing with derived data opens your database to possible update and delete integrity problems each time a user touches either the prime data rows or the total data rows. Calculated data should not be stored. It should always be computed using the prime data at the time it is needed.
NOTE: Before continuing with this lesson, load the NORMDAT4.MDB database into Visdata.
The fourth rule of data normalization concerns the handling of independent multiple relationships. This rule is applied whenever you have more than one one-to-many relationship on the same data table. The relationship between the Employees table and the EmpSkills table is a one-to-many relationship. There can be many EmpSkills records related to one Employee record. Let's add an additional attribute of employees to create a database that has more than a single one-to-many relationship.
Assume that the Human Resources Department has decided it needs more than just the skill names and skill levels attained for each employee. Human Resources also wants to add the level of education attained by the employee for that skill. For example, if the employee has an accounting skill and has an associate's degree in bookkeeping, Human Resources wants to store the degree information, too. If an employee has been certified as an electrician and works in the Maintenance Department, the Human Resources group wants to know that.
The first thing you might want to do is add a new column to the EmpSkills table--maybe a field called Degree, maybe even a field for YearCompleted. This makes sense because each skill might have an associated education component. It makes sense, but it is not a good idea. What about the employee who is currently working in the Customer Service Department but has an accounting degree? Just because the employee has a degree does not mean that employee has the skills to perform a particular job or is working in a position directly related to his or her degree. The degree and the job skills are independent of each other. Therefore, even though the skills data and the degree data are related, they should be isolated in separate tables and linked through a foreign key relationship.
NOTE: The fourth rule of data normalization dictates that no table can contain two or more one-to-many or many-to-many relationships that are not directly related. In other words, if the data element is important (the college degree) but not directly related to other elements in the record (the customer service rep with an accounting degree), you need to move the college degree element to a new table. Databases that follow this rule are in the Fourth Normal Form.
Table 16.4 shows a sample Training table that can be used to hold the education
information for each employee. Now the Human Resources department can keep track
of education achievements independent of acquired job skills. Note that the EmpID
directly connects the two relationships. If the Training table has only one entry
per employee, the two relationships are a one-to-one relationship between the Employees
table and the Training table, and a one-to-many relationship between the Employees
table and the EmpSkills table. Of course, if any employee has more than one degree,
both relationships become one-to-many.
Table 16.4. The sample training data table.
EmpID
Degree
YearCompleted InstitutionName
Listing 16.7 is a sample SQL DDL statement that creates the Training data table with
the proper relationship constraint. Enter this statement in the SQL window of Visdata
while you have the NORMDAT4.MDB database open. Check your results against
Figure 16.6.
CREATE TABLE Training (EmpID TEXT(5), Degree TEXT(20), YearCompleted INTEGER, InstitutionName TEXT(30), CONSTRAINT PKTraining PRIMARY KEY (EmpID,Degree), CONSTRAINT FKEmpTrn FOREIGN KEY (EmpID) REFERENCES Employees (EmpID))
The database NORMDAT5.MDB contains a complete version of the database normalized
up to the fourth rule of data normalization. Use Visdata to open the database and
review the table structure.
NOTE: Before continuing with the lesson, load the NORMDAT5.MDB database into Visdata.
Figure
16.6. The Training table shows the degree
achievements for the Employees table.
The last remaining rule of data normalization covers the handling of related multiple relationships in a database. Unlike the fourth rule, which deals with independent, one-to-many, multiple relationships, the fifth rule is used to normalize related, many-to-many multiple relationships. Related, many-to-many multiple relationships do not occur frequently in databases. However, when they do come up, these types of data relations can cause a great deal of confusion and hassle when you're normalizing your database. You won't invoke this rule often, but when you do it pays off!
Imagine that the Maintenance Department decides it wants to keep track of all the large equipment used on the shop floor by various departments. It uses this data to keep track of where the equipment is located. The Maintenance Department also wants to keep a list of suppliers for the equipment in cases of repair or replacement. When you were a novice, you might have decided to design a single table that held the department ID, equipment name, and supplier name. But, as I'm sure you have guessed by now, that is not the correct response. What if the Maintenance Department has more than one supplier for the same type of equipment? What if a single supplier provides more than one of the types of equipment used in the plant? What if some departments are restricted in the suppliers they can use to repair or replace their equipment?
NOTE: The fifth rule of data normalization dictates that you should isolate related multiple relationships within a database. In other words, if several complex relationships exist in your database, separate each of the relationships into its own table. Databases that adhere to this rule are known to be in the Fifth Normal Form.
The following list shows the relationships that have been exposed in this example:
Although each of the preceding business rules are simple, putting them all together in the database design is tough. It's the last item that really complicates things. There is more than one way to solve this kind of puzzle. The one suggested here is just one of the many possibilities.
First, you need to expose all the tables that you need to contain the data. The
preceding list describes two one-to-many relationships (department to equipment,
and department to supplier, with restrictions) and one many-to-many relationship
(equipment to supplier, supplier to equipment). Each of those relationships can be
expressed in simple tables. Two additional tables not mentioned, but certainly needed,
are a table of all the equipment in the building (regardless of its location) and
a table of all the suppliers (regardless of their department affiliation). Table
16.5 shows sample field layouts for the required tables. The Equipment and Supplier
tables are shortened in this example. If you were designing these tables for a real
database project, you would add several other fields.
Table 16.5. The Fifth Rule sample data tables.
Equipment | Supplier |
EquipID | SupplierID |
EquipName | SupplierName |
DatePurchased | SupplierAddress |
CREATE TABLE Equipment (EquipID TEXT (10), EquipName TEXT(30), DatePurchased DATE, CONSTRAINT PKEquipID PRIMARY KEY (EquipID)) CREATE TABLE Supplier (SupplierID TEXT (10), SupplierName TEXT(30), SupplierAddress MEMO, CONSTRAINT PKSupplier PRIMARY KEY (SupplierID))
Figure
16.7. Supplier and Equipment tables in
NORMDAT6.MDB.
The next two data tables describe the relationships between Supplier and Equipment
and between Supplier and Departments. You remember that departments can be restricted
to certain suppliers when repairing or replacing equipment. By setting up a table
such as the DeptSupplier table described next, you can easily maintain a list of
valid suppliers for each department. Similarly, as new suppliers are discovered for
equipment, they can be added to the EquipSupplier table. Refer to Table 16.6 for
a sample list of fields.
Table 16.6. EquipSupplier and DeptSupplier tables.
EquipSupplier | DeptSupplier |
EquipID | DeptID |
SupplierID | SupplierID |
CREATE TABLE EquipSupplier (EquipID TEXT(10), SupplierID TEXT(10), CONSTRAINT PKEqSpl PRIMARY KEY (EquipID,SupplierID), CONSTRAINT FKEqSplEquip FOREIGN KEY (EquipID) REFERENCES Equipment(EquipID), CONSTRAINT FKEqSplSupplier FOREIGN KEY (SupplierID) REFERENCES Supplier(SupplierID)) CREATE TABLE DeptSupplier (DeptID TEXT(5), SupplierID TEXT(10), CONSTRAINT PKDeptSpl PRIMARY KEY (DeptID,SupplierID), CONSTRAINT FKDptSplDept FOREIGN KEY (DeptID) REFERENCES Departments(DeptID), CONSTRAINT FKDptSplSupplier FOREIGN KEY (SupplierID) REFERENCES Supplier(SupplierID))
Notice that, in these two tables, the CONSTRAINT definitions are longer
than the field de-finitions. This is common when you begin to use the power database
integrity aspects of SQL databases.
Finally, you need a single table that expresses the Equipment-Supplier-Department
relationship. This table shows which department has which equipment supplied by which
supplier. More importantly, you can build this final table with tight constraints
that enforce all these business rules. Both the Department-Supplier relationship
and the Equipment-Supplier relationship are validated before the record is saved
to the database. This is a powerful data validation tool--all without writing any
Visual Basic code! Table 16.7 and the SQL DDL statement in Listing 16.10 show how
this table can be constructed. See Figure 16.9 to review the results of executing
these statements.
Table 16.7. The Department-Equipment-Supplier data table.
DeptID
EquipID SupplierID
CREATE TABLE DeptEqpSuplr (DeptID TEXT(5), EquipID TEXT(10), SupplierID TEXT(10), CONSTRAINT PFDeptEq PRIMARY KEY (DeptID, EquipID), CONSTRAINT FKEqSupl FOREIGN KEY (EquipID,SupplierID) REFERENCES EquipSupplier(EquipID,SupplierID), CONSTRAINT FKDeptSupl FOREIGN KEY (DeptID,SupplierID) REFERENCES DeptSupplier(DeptID,SupplierID))
The Microsoft Access database NORMDAT6.MDB contains a set of live data for
the tables described in this section. Use Visdata to open the database and review
the table structure. Try adding or deleting records in ways that would break integrity
rules. Notice that none of the last three tables defined (EquipSupplier, DeptSupplier,
and DeptEqpSuplr) allow edits on any existing record. This is because you defined
the primary key as having all the fields in a record. Because you cannot edit a primary
key value, you must first delete the record, and then add the modified version to
the data table.
Figure
16.9. The EquipSupplier, DeptSupplier,
and DeptEqpSuplr tables.
In today's lesson, you learned how to improve database integrity and access speed using the five rules of data normalization. You learned the following five rules: