When you build a house, you need a blueprint to tell you what the finished product will look like and to define the steps of construction. Without the blueprint, the results of the construction effort will not yield the desired results. Building a computer program is the same way: you need a good design if you are obtain a good final product. This is especially important in building database applications.
In designing a database application, not only must you set up the program's routines for maximum performance, you must also pay attention to the physical and logical layout of the data storage as well. A good database design provides minimum search times when locating specific records. It also stores the data in the most efficient manner possible to keep the database from growing too large. It makes data updates as easy as possible. A good design should also be flexible enough to allow inclusion of new functions required of the program.
This chapter covers the basics of good database design and points out the key areas for making your database application work as well as possible. This chapter is not an exhaustive discussion—several entire books have been written on the subject of database design. However, the information presented here provides you with what you need to know to begin writing database applications.
This chapter uses several examples of "databases." It also presents a sample case that is used in other chapters of this section as you work through designing and developing a database application.
In this chapter, you learn the following:
In the broadest definition, a database is a collection of information, usually organized in a particular order. An familiar example of a database is a phone book. It is a collection of names, addresses, and phone numbers organized in alphabetical order. As many people know, some phone books list phone numbers by street address. This presentation contains the same data as the original phone book, but presents it in a different order.
A phone book is an example of a “flat file” database, a where a single record stores all the information for each entry, and a single table contains the records. This is in contrast to relational databases, where the information is stored in multiple tables that are related by key fields.
A relational database management system (RDBMS) is used to store information in a manner that allows people to look at it in different ways. An RDBMS consists of a database, tables, records, fields, indexes, queries, and views. Table 1.1 defines the key elements of a database.
Table 1.1 Elements of a Relational Database Management System
Element | Description |
Database | A group of data tables that contain related information. Note that a database may consist of only a single table. |
Table | A group of data records, each containing the same type of information. In the example of the phone book, the book itself is a data table. |
Record | A single entry in a table; the entry consists of a number of data fields. In a phone book, a record is one of the single-line entries. |
Field | A specific item of data contained in a record. In a phone book, at least four fields can be identified: last name, first name, address, and phone number. |
Index | A special type of table that contains the values of a key field or fields (defined by the user) and pointers to the location of the actual record. These values and pointers are stored in a specific order (again defined by the user) and may be used to present the data in the database in that order. For the phone book example, one index may be used to sort the information by last name and first name; another index may be used to sort the information by street address. If you want, you can also create an index to sort the information by phone number. |
Query | A SQL command designed to retrieve a certain group of records from one or more tables or to perform an operation on a table. Although SQL commands can be executed directly from a program, a query allows you to name the command and store it in the database itself—useful if the SQL commands are used often, as are commands that retrieve records for a specific monthly report. When a query is stored in the database, it is usually compiled. Compiling queries gives your program a performance improvement over just issuing the SQL statement because the database engine does not have to interpret (or parse) the SQL command. |
Filter | A filter is not actually a part of the database but it is used in conjunction with indexes and sort orders to determine what data is displayed or processed. A filter is a condition imposed on the data, such as "Last name starts with M." |
View | A view of the data consists of the number of records seen (or processed) and the order in which they are displayed (or processed). A view is typically controlled by filters and indexes. Two examples of views of phone book data are an alphabetical listing of all people who live on Main Street and the first ten people whose last names start with S. |
Structured Query Language (SQL, pronounced "SEE-kwel") is a set of commands that manipulate information in a database. A single SQL command can replace many lines of traditional program code.
Database applications can be used to handle a wide variety of data processing needs. Most programs require a mix of user interface functions and calculational functions. How each of these functions uses the data in the database impacts the design in areas of table relations, index expressions, and data validation.
Functions in which the user directly manipulates the data in the database are defined as user-intensive functions. An example is an order entry function, in which the user enters a customer's name, address, purchase information, and payment information. The design objective for this type of function is to make the data entry as quick and error free as possible.
Process-intensive functions are those in which data from a table or tables is used to perform a series of calculations, with the results stored in an output table or printed to a report. Process-intensive functions usually require little or no user interaction, other than to possibly start the processes. The design objective for this type of function is to make calculations as quick as possible and to store output data as efficiently as possible for later retrieval and further processing. An example of this is a bank transaction. When you make a withdrawal, the teller enters your account number and the amount. The computer then looks up your account number, verifies that you have sufficient funds, and debits your account for the amount requested. These functions take place without any further interaction from the teller.
To help explain the concepts involved in database design, this chapter uses a sample application. The task is to develop a database system for an aquarium supply store called Triton's Treasures. The owner of Triton's Treasures, Mr. Herman Crabb, wants to handle sales, inventory tracking, special order processing, and a customer mailing list. Mr. Crabb has also decided to expand his business beyond the walk-in retail outlet. He is setting up a mail order and local delivery portion of the business for people who are outside his local area, or are in the area but too busy to come to his shop. The database design must take into account this business expansion.
Modeling the application involves determining the functions to be performed, the inputs and outputs of the application, and the performance requirements of the application. A well-defined application model leads to a good database design.
The first step in designing the application is to determine what the user wants the application to do. This is often called determining the functional specifications. Functional specifications can usually be expressed in terms of an action, such as "produce a report of all items sold during the last week" or "calculate the amortization schedule of a loan given the loan amount, interest rate, and length of the loan." From this type of information, the developer can determine the data necessary to achieve the desired functions.
In designing an application, the information for the functional specifications is gathered by talking to the users of the application. This usually involves not only talking to the key person on the project (the department manager or store owner), but also to others who will work with the application, such as the department employees or store clerks. In some cases, it may also be useful to talk to other people with similar business needs. For example, to learn about data handling for local deliveries, you or Mr. Crabb may want to talk to the person who owns May's Flowers, a floral delivery service.
The easiest place to start determining the functional specifications is frequently with the desired output of the application. Once the outputs are known, you can start constructing the data and processes required to achieve that output.
Often, clients have examples of reports that they currently use, such as the one shown in figure 1.1. Using these reports as a starting point, clients can show you what information must be added to the reports, or request different ways of displaying the information so that they can do their work more efficiently. This kind of investigation provides valuable insight into the needs of the clients and the information they currently have and will need in the future. It also provides the developer with a first look at how the database may have to be organized.
Information for functional specifications can often be obtained from sample reports.
As you may guess, the process of determining the functional specifications for an application is mainly about observation and listening. It is also necessary to see possibilities for providing additional functions that the database can perform and presenting these to the client.
After you determine what functions the application must perform, it is useful to work through each function to determine the processes involved in performing the function. This involves determining where the user input is required, any table lookups that may be needed, calculations to be performed, and outputs to be generated.
Many people use a process flow diagram (also called a flow chart) to show where all of the pieces of the process fit in. Once the processes are defined, this diagram can be used to add the types of data required for each step to produce a data flow diagram. The data flow diagram can help the developer determine the organization of data tables and what indexes (if any) are required for the tables. Figure 1.2 shows a generic process flow diagram.
A process flow diagram aids in the design of the database.
For the sample case, suppose that you have determined that Mr. Crabb must get the following outputs from the system:
To find out more about handling deliveries, Mr Crabb talked to the owner of May's Flowers and found that it would be beneficial to have the delivery list grouped into geographic regions. This arrangement allows the delivery drivers to reduce travel time by making all the deliveries in a specific region before moving on to the next region. Grouping the delivery list can make the drivers' work more efficient—and Mr. Crabb can perceive some real value from the application. To achieve the desired results, you may decide to group the local deliveries by ZIP code (an already defined set of geographic regions) or break down the ZIP-code groups further if a number of businesses are in one particular location, such as a mall or office tower. Figure 1.3 shows the process flow diagram for this task. You must also provide some sort of regional grouping for the mail order portion of the business to help determine shipping charges, because many charges are based on the distance between shipper and receiver.
The process flow diagram for the delivery scheduling task helps isolate the data required to achieve the task.
Part of the design process is defining the data that must be in the database. Of equal importance is determining what data does not have to be kept. Superfluous data can bloat the database and diminish performance.
Using the functional specifications for the application, the developer can begin to define the data that goes into the database. This is done by examining each task to be performed by the application and finding the pieces of data required for that task.
Begin by looking at one of the tasks of the sample case, the scheduling of local deliveries. Each day, the delivery driver needs a report of the locations of each delivery and a list of the items to be delivered to each customer. Obtaining the list of items involves not only printing the order for each customer, but also printing the list by item number to make retrieval from inventory more efficient.
To group the deliveries by geographic region, you need to know what region the customer is in. You also need the customer's phone number and address so that you can notify the customer of the timing of the delivery and get directions to the location if necessary.
As you can see, a lot of different data may be required just to fulfill one task of the application. Fortunately, as you see later, the data required for many tasks can be placed in similar formats and in the same data tables.
Formatting data basically consists of determining whether the item of data is a character, a number, a date, or a logical (yes/no) value. Beyond the basic description of the data, certain other criteria must be determined, such as whether a number is an integer or a decimal number and whether a character string is of a fixed or variable length. Further definition of the formats may be required depending on the database engine being used. The native database engine for Visual Basic (the Jet engine) is capable of supporting all these data types. A detailed description of the Jet database engine formats is presented in Chapter 2, "Looking at the Capabilities of the Jet Engine."
Table 1.2 shows the types of data formats used for various types of information.
Table 1.2 Various Data Formats Used To Handle Information in the Database
Data Format | Used To Contain |
Text | Variable-length character strings up to 255 characters long. Examples of this are names, addresses, and alphanumeric item codes. Although it is true that not every name has the same number of characters, a length can be set that can accommodates most names. Character strings are often used for index and filter expressions. When text fields are created, they are assigned a maximum length. The data stored in the field can contain any number of characters up to that maximum. |
Memo | Large variable-length character strings. Examples of memo fields are product descriptions or doctor's notes. They are used instead of the character data format either because of limitations on the size of the character format in some database engines or because the information is stored more efficiently. |
Integer | Numerical data that has no decimal component. Examples are quantities of items and populations. |
Decimal | Numerical quantities such as fractional amounts and money. |
Date | Month, day, and year information. Although this is usually in the mm/dd/yyyy (06/28/1995) format, most database engines support date formats for countries other than the United States. Most also allow you to input only the last two digits of the year and assume that the first two digits are 19. |
Logical/Boolean | Items with a true/false or yes/no value or any items that have only two possible choices. For example, is a person married or is the phone number a business phone? |
Binary | Pictures, sound clips, or OLE objects. In the sample case, you can use the binary format to store pictures of the fish available for sale. |
Although the necessary format for most data items is quite obvious, some items require additional consideration. A good example is a person's social security number. The social security number is made up of nine numeric digits but is often stored in databases in a character format. ZIP codes are also typically stored in a character format instead of a numeric format. The reason you may want to store numbers in a character format is typically because you want to sort and index the numbers, usually when the field is used in conjunction with other fields. Consider these two items when you choose formatting:
After determining what data you need for the application, you need to organize the data in a manner that allows for easy maintenance and retrieval of the data. Within a database, data is stored in one or more tables. For most database applications, efficient data management is accomplished by storing data in multiple tables and establishing relationships between these tables. The following sections will describe how you can determine what data belongs in each table of your database.
A table is a collection of information related to a particular topic. By thinking of a key topic for the table, you can determine whether a particular piece of data fits into the table. For example, if a store owner wants to track information about both customers and employees, the owner may be tempted to put both in the same table (because both groups refer to people). However, look at the data required for each group. Although both groups require information about a person's name, address, and possibly phone number, the employee group also requires information about the person's social security number, job category, payroll, and tax status. If you were to create just one table, many of the entries would be blank for the customers. You would also have to add a field to distinguish between a customer and an employee. Clearly, this technique would result in a lot of wasted space. It could also result in slower processing of employee transactions or customer transactions because the program would have to skip a number of records in the table. Figure 1.4 shows a database table with the two groups combined. Figure 1.5 shows the reduction in the number of fields in a customer-only database table.
Combining the employee and customer tables creates wasted space.
A separate database table for customers has only the required fields and is more efficient.
By thinking of the topic to which a table relates, it is easier to determine whether a particular piece of information belongs in the table or not. If the information results in wasted space for many records, the data belongs in a different table.
Data normalization is the process of eliminating redundant data within the database. Taking data normalization to its fullest extent results in each piece of information in a database appearing only once.
Consider the example of the customer order function. For each item ordered, you need the item number, description, price, order number, order date, and customer name, address, and phone number. If you place all this information in one table, the result looks like the table shown in figure 1.6.
Nonnormalized data produces a large, inefficient data table.
As you can see, much of the data in the table is repeated over and over. This introduces two problems. The first problem is wasted space because you repeat information such as the customer name, address, and phone number. The second problem is one of data accuracy or currency. If, for example, one of the customers changes her phone number, you have to change it for all the records that apply to that customer—with the possibility that you will miss one of the entries. In the table in figure 1.7, notice that Martha Smith's phone number was changed in the latest entry, but not in the two earlier entries. If an employee looked up Martha Smith and used an earlier entry, that employee would not find Martha's updated phone number.
A better solution for handling the data is to put the customer information in one table and the sales order information in another table. You assign each customer a unique ID and include that ID in the sales order table to identify the customer. This arrangement yields two tables with the data structure shown in figure 1.7.
Normalized customer and order tables eliminate data redundancy.
With this type of arrangement, the customer information only appears in one place. Now, if a customer changes his or her phone number, you have to change only one record.
You can do the same thing to the items sold and order information. This thinking leads to the development of four tables, but the organization of the tables is much more efficient and you can be sure that when information must be changed, it has to change in only one place. This arrangement is shown in figure 1.8. With the four-table arrangement, the Orders table and the Items Ordered table provide the links between the customers and the retail items they purchased. The Items Ordered table contains one record for each item of a given order. The Orders table relates the items to the date of purchase and the customer making the purchase.
Complete normalization of the tables provides the greatest efficiency.
When information is moved out of one table and into another, you must have a way of keeping track of the relationships between the tables. This is done through the use of data keys. This topic is discussed in depth later in this chapter.
Another part of the design process is estimating the maximum desirable size of your database and the rate at which the size of the database will increase. This is important for getting an idea of the performance and space requirements of a database. The more data in the database, the more disk space required for it, and the slower database operations will be. The maximum desirable size will be determined by a number of factors such as the size of the user's disk and the data transfer rates of their network, if they use one. For example, if the user has only a 70 megabyte hard drive and the application is expected to be creating two megabytes of data per week, the user will quickly run out of available space. This fact may indicate the need to redesign the database.
By analyzing the data volume, you may see areas where the database can grow much larger than desired, indicating the need for a change to the design. Remember that it is much easier to implement design changes in the early phases of the project than it is later. Often, you may have to discuss a change with the client and may even have to make a change to the functional specifications of the application.
One of the things the sample application has to do is produce a sales receipt for each sale. Originally, it was determined that a record should be entered into the database for each item sold. This record contains the customer number, item number, price, and date sold. Mr. Crabb wants this so that he can look at each customer's buying habits and notify them of appropriate sales or special events that may interest them. After you talk to Mr. Crabb about his sales volume (which is directly related to the number of records that go into the database), you determine that the item sales table would grow at a rate of 1.5 MB per month. This growth rate will rapidly chew up space on the disk and has the potential to slow down many database transactions.
Because this information isn't needed for anything else (you use the sales information for inventory processing but don't have to retain it), you suggest to Mr. Crabb that a better approach may be to add a preference field to the customer database and simply ask the customer what particular areas of the aquarium hobby he or she is interested in. You may even suggest that a customer information response card be produced with predefined interest categories on it. This card, shown in figure 1.9, can be filled out by the customer and the information entered into the system. If properly designed, the card can be directly translated into a data entry form in the application to make it easy for employees to enter the data.
A customer information card can eliminate the tracking of some data.
If Mr. Crabb insists that the information is necessary for some other needs of his store, you might consider archiving the sales history on a monthly basis. This can be done using a query to transfer the data to another database, which can then be stored either on tape or on floppies.
If you had not analyzed the data volume, a problem may have occurred within a few months when Mr. Crabb started complaining that his system was running slowly or that he got an Out of disk space error message. Once the problem arises, consolidating the data and changing the database structure is much more difficult. This example also shows that database design requires an understanding of the user's intended application as well as the principles of programming.
A child table is a table in which all the entries share some common information, and the common information is stored in another table. A simple example of this is a membership directory: the family shares a common last name, address, and phone number but each family member has a different first name. The table containing the common information is called the parent table, and the table containing the member's first names is the child table. Figure 1.10 shows a parent table and its related child table. The use of child tables is a form of data normalization.
Parent and child tables are a form of data normalization.
A lookup table is typically a table used to store valid data entries (for example, a state abbreviations table). When a person enters the state code in an application, the program looks in the abbreviations table to make sure that the code exists.
A lookup table can also be used in data normalization. If you have a large mailing list, many of the entries use the same city and state information. In this case, you can use a ZIP code table as a related table to store the city and state by ZIP code (remember that each ZIP code corresponds to a single city and state combination). Using the ZIP code table requires that the mailing list use only the ZIP code of the address and not the city and state. During data entry, you can have the program check an entered ZIP code against the valid entries.
As stated above, the guidelines for defining tables are not hard and fast rules. There are times when it makes sense for the developer to deviate from the guidelines.
Performance Considerations
One of the most frequent reasons for deviating from the guidelines just given is to improve performance. For example, if obtaining a total sales figure for a given salesperson requires summing several thousand records, it may be worthwhile to include a Total Sales field in the salesperson table that is updated each time a sale is made. This way, when reports are generated, the application doesn't have to do large numbers of calculations and the report process is dramatically faster.
Another reason to deviate from the guidelines is to avoid opening a large number of tables at the same time. Because each open table uses a file handle and takes up memory, having too many open tables can slow down your application.
There are two major consequences of deviating from the guidelines. The first is increasing the size of the database because of redundant data. The second is the possibility of having incorrect data in some of the records because a piece of data was changed and not all the affected records were updated.
There are trade-offs between application performance and data storage efficiency. For each design, the developer must look at the trade-offs and decide what the optimum design is.
When data is normalized and information is moved from one table to another, a method must exist to relate the two tables. The method of relating tables is the use of data keys. This section discusses the two types of table relationships and how data keys are established.
Data keys are usually referred to as either primary keys or foreign keys. A primary key is the one that uniquely identifies a record in a table. For example, in the Customers table, each record contains information about a specific customer. The primary key then provides a unique identifier for each customer record. In the case of the sales order tables shown earlier in figure 1.9, the Custno field in the Customers table is the primary key. A foreign key is one used to relate a record in one table to a specific record in another table. There may be multiple records in the second table relating to a single record in the primary table. In the order system, the customer key in the Orders table is a foreign key, linking the order records back to the Customers table.
A one-to-many relationship occurs when a record in one table is related to one or more records in a second table, but each record in the second table is related to only one record in the first table. One-to-many relationships comprise the majority of the table relations in a database system.
In the sales order application example, a customer may make many purchases, but each purchase is made by only one customer (see fig. 1.11). In the membership directory example, each family record is related to one or more member name records, but each member name record is tied to only one family record.
A one-to-many relationship between tables shows the use of key fields.
Many-to-many relationships occur when each record from the first table relates to each record in the second table, and vice versa. When this occurs, an intermediate table is usually introduced that provides a one-to-many relationship with each of the other two tables.
An example of a many-to-many relationship is the items purchased by customers. Each customer may purchase many items, and each item can be purchased by many customers. Figure 1.12 shows how the data is structured with all the information in a single table. Figure 1.13 shows how the data is structured as separate item and customer tables with intermediate tables.
Item and customer information in a single table is an inefficient means of handling the data.
Separate item and customer tables with intermediate tables show the resolution of the many-to-many relationship.
Tables are related to each other through key fields. A key field is one that uniquely identifies a record. A key field may be one that has meaningful data in it or it may be a created field that serves the sole purpose of providing a unique identifier for the record. The main criteria for the key field is that it must be unique. Figure 1.14 shows a table with a key field added to provide a unique ID for each record.
A table showing an added key field to ensure unique record IDs.
The key field is present in both databases of the relationship. For the membership directory, you can assign a unique identifier to each family record. You then include the same identifier in each of the name records to indicate the family to which the name belongs. If the key-field value is not unique, there is confusion about the family information for a member.
If you are developing an employee database, it is possible that several people have the same name. One possible unique identifier is the social security number. However, because this nine-digit number must be stored in every related record, it may be better to create a smaller, unique employee ID. If you know that there will never be more than 9,999 employees, for example, a four-digit ID can be used, saving five digits in every related record. Depending on the number of related records, the space savings can be significant.
One way to ensure unique IDs is to use a counter field for the primary key. A counter field is an integer number field the database engine automatically increments when a new record is added. The counter field takes the responsibility of creating unique keys away from the user and places it on the database engine. One drawback in using a counter field is that the ID has no intrinsic meaning to the user.
Indexes, filters, and sorts are used to control the appearance and processing of tables in a database. An index or sort controls the order of the database; a filter controls the range of records accessed.
When information is entered into a table, records are stored in the order in which they are entered. This is the physical order of the data. However, you usually want to view or process data in an order different from the order of entry. You also frequently have to find a specific record in a table. Doing this by scanning the table in its physical order is quite time consuming.
An index provides a method of showing a table in a specific order. An index is a special table that contains a key value (usually derived from the values of one or more fields) for each record in the data table stored in an order requested by the user. The index also contains pointers that tell the database engine where the actual record is located.
The structure of an index allows for rapid searches of the data. If you have a table of names indexed alphabetically, you can rapidly retrieve the record for John Smith by searching the index. To get an idea of the value of such an index, imagine a phone book that lists the customer names in the order in which they signed up for phone service. If you live in a large city, it could take forever to find a person's number because you have to look at each line until you find the one you want.
A table can have a number of different indexes associated with it to give you several different organizations of the data. An employee table may have indexes on last name, date of birth, date of hire, and pay scale. Each index shows the same data in a different order, for a different purpose.
Although it may be desirable to have many different views of the data, keeping multiple indexes can take a toll on performance. Once again, there are trade-offs in the database design.
You can also create different views of the information in a table by sorting the records or by specifying an order using a SQL statement.
The most common type of index is the single-key index, an index based on the value of a single field in a table. Examples of this type of index are social security number, ZIP code, employee ID, and last name. If multiple records exist with the same index key, those records are shown in physical order within the sort order imposed by the single-key index. Figure 1.15 shows the physical order of a names table and how the table appears after being indexed on the last name field.
The physical and logical order of a table may be different. Logical order depends on indexes.
Although single-key expressions are valuable in presenting data in a specific order, it is often necessary to impose an even more detailed order on the table. This can be done through the use of multiple-key indexes. As you can infer from the name, a multiple-key index is based on the values of two or more fields in a table. A prime example is to use last name and first name when indexing a membership list. Figure 1.16 updates the view of the table shown in figure 1.15 to show how using the first name field to help sort the records changes the order of the table. As with single-key indexes, if the key values of several records are the same, the records are presented in physical order within the index order.
Multiple-key indexes further refine the logical order of a table.
The order of the fields in the index expression has a dramatic impact on the order of the records in the table. Although this may be obvious, the point should be stressed. Indexing on first name and then last name produces different results than indexing on last name and then first name. Figure 1.17 shows the undesirable results of using a first name/last name index on the table used in figure 1.16.
Improper index field order yields undesirable results.
Use care when using multiple numeric fields as an index expression. Because the key value is a combination of the individual keys, a key value obtained adding two numbers may yield a different order than a key value obtained by concatenating the character representation of the two numeric fields. Figure 1.18 shows a simple example of this phenomenon.
Different ways of combining numeric fields in an index expression can result in different arrangements of information.
The functional specifications for the application helps determine which fields should be used as key fields for an index in a table. For lookup tables, an index is often included on the field being searched. Using an index speeds up the search for a specific value; as a table gets larger, the speed advantage of using an index increases. If a function calls for data from a table to be displayed or processed in a particular order, include an index for the table that results in that order.
In the sample case, you determine that you need two indexes on the table containing customer information. The first index is on the customer name (so that you can quickly locate a particular customer in the table, and so that you can prepare a list of all customers presented in alphabetical order). The second index is on ZIP code (so that you can sort deliveries by geographic area, and so that you can prepare mailing lists sorted by ZIP code to use discounted mailing rates).
Another of the tables is the inventory items table, which includes item numbers, inventory levels, prices, and item descriptions. You decide to use an index on the item number for this table (so that you can find the item quickly to update the inventory quantity, and so that you can look up the price of the item for the point-of-sale portion of the application).
Another way to control the information an application displays or processes is with the use of filters. A filter limits the number of records to those that meet a certain condition (for example, people with last names beginning with S or people who live in Florida). A filter is expressed as a logical condition in which the value of a field is compared to the filter value. Operators typically used for filters are equal to, not equal to, greater than, less than, and so on. For filter conditions, the comparison value must be of the same data type as the fields being compared. The most common error of mismatched data types occurs when a number is compared to the character representation of another number.
As with indexes, filter conditions can use comparisons of multiple fields (for example, "people named Smith who live in Florida). When you use multiple filter conditions, the individual conditions are joined by logical operators such as AND and OR. Obviously, the joined condition has a big impact on the records that meet the filter condition. For example, the condition Birthdate<01/01/65 AND Birthdate>12/31/54 yields a list of people aged 30 to 39; the condition Birthdate>01/01/65 OR Birthdate<12/31/54 yields a list of people that excludes those in the 30 to 39 age bracket.
When multiple conditions are used, each condition can use a different data type. Therefore, a filter condition of Lastname='Smith' AND Age>30 is perfectly valid.
A sort is used to create a recordset presented in a specific order from one or more base recordsets. In Visual Basic, a sort is used when you have a dynaset and want to change the presentation order of the records. Because an index cannot be assigned to a dynaset, you must set a sort condition and create a second dynaset from the first one. A sort is also implied when you use the ORDER BY clause of a SQL statement.
The term recordset is used to refer to any collection of records from one or more tables. A dynaset refers to a specific type of recordset used in Visual Basic. These terms are further defined in Chapter 4, "Manipulating Data with a Program."
When setting up index and filter expressions, it is not always possible to achieve the desired results with direct comparisons to field values. Therefore, most database engines allow the use of calculated values or functions in the expression. For example, if you want to find the people whose names start with S, you actually create a condition such as Firstletter(Lastname)='S', where Firstletter is a function that returns the first letter of the string (the Lastname) passed to it.
In the sample case, you must look for service calls that should be performed when the period of time since the last service exceeds the recommended service frequency. The data you have available is the date of the last service call and the current date. Therefore, you can generate a filter condition similar to this one: CurrentDate-ServiceDate>ServiceFrequency.
Another common use of functions in filter and index expressions is to put character data in uppercase (or lowercase). This change ensures that the index or filter is independent of the case of the information in the field. Some people have last names that begin with lowercase letters. If you do not use a function to convert the last name to uppercase letters in the index, these names are sorted out of order with other names that start with the same letter (see fig. 1.19).
An example of not using a function in an index expression.
Two other issues in database design are data validation and data integrity. The first is concerned with making sure that data entered into the database is correct. The second refers to maintaining proper relationships between tables.
Data validation is the process of ensuring that the data stored in a field meets certain criteria. Common validation checks are dates (to make sure that someone hasn't entered February 31, for example), numbers (someone's age can't be less than 0), or lookup table values (to ensure that a ZIP code does exist). Validation checks can also be used to make sure that the data entered meets the design criteria of a database or table (not allowing a blank social security number in a personnel table).
Depending on the database engine in use, the validation of data can be handled by the rules embedded in the table, or it may have to be handled by rules set in the program that is processing the data. In either case, the determination of the validation rules is part of the design process, and the rules are developed from the functional specifications.
The Jet database engine supports the use of validation rules embedded in the tables. This is known as engine-level validation. These rules are stored in the database itself. Validation rules can also be defined within the program commands used to access the database.
Data integrity refers to making sure that references between tables remain valid (also known as referential integrity). A common problem arises when a record in one table is deleted, and that record is related to other records in a dependent table. Consider the case of the membership directory: if a family name is deleted from the family table, information contained in that record is no longer available to the records in the member name table. The member names left without a family record are referred to as orphan records.
Determining whether or not to allow orphan records is one of the decisions to be made during database design. As with data validation, some database engines can automatically handle the maintenance of referential integrity. For other database engines, referential integrity is the responsibility of the programmer.
The Jet database engine supports referential integrity. This support is optional and must be turned on when relationships between tables are defined.
If you perform operations on a repetitive basis, you may want to create a query to perform the operation and store the query in the database. For example, in the sales order function, you want to be able to produce a billing report for each customer every month. This report is shown in the following table:
Order Number | Date | Total Cost | Customer Name |
10001 | 9/1/94 | 4.25 | Martha Smith |
10002 | 9/2/94 | 2.15 | Bill Jones |
10003 | 9/2/94 | 5.50 | Bob Brown |
10004 | 9/5/94 | 1.35 | Jim White |
To generate the report, you need the orders for the specific time period, total them by customer number, and add the customer information to the report. This can easily be done with the following SQL statement:
This SQL statement asks the database engine to return a recordset containing specific data fields (the names between the SELECT and FROM keywords). The statement also defines in which tables the records are found (the names after the FROM keyword), and a condition that the returned records should meet (the text following the WHERE keyword). A complete description of SQL statements is in Chapter 6, "Understanding Structured Query Language (SQL)."
Because this query is run on a regular basis, you can store the query in the database. Creating a query and storing it is covered in Chapter 3, "Implementing the Database Design."
After you complete the initial design of the database, review the design to look for any omissions or any inefficiencies that should be resolved. The best way to review the design is to input sample data for the application. This data is preferably a subset of the real data that will be used once the application is completed.
The first question to be addressed is, "Does the design meet the objectives?" In other words, can it perform all the functions asked for by the client? Common causes of missing an objective are omission of a piece of data, not creating a specific index, or not linking a table correctly to other tables. To determine this, the developer should walk through the processes involved in each function using the data in the sample database. If a process dead ends or takes a wrong turn, some redesign is in order.
Here is a list of several other things you can look for in the design that may show up when you use the sample data:
Tables 1.4 through 1.9 list the structures of the data tables developed for the sample case. The Length column in these tables has different meanings for the different data types. For character fields, the length indicates the maximum number of characters contained in the string. For numbers, the length column indicates the specific type of number, which in turn defines the number of bytes used to represent the number. For other field types, the number of bytes for the field is predefined, and no further information is needed. This is summarized in table 1.3.
Table 1.3 Use of Length Column for Data Types
Data Type | Meaning |
Character | The number of characters in the text string. |
Numeric | Indicates whether a number is an integer (2 bytes), long integer (4 bytes), or a single or double precision real number (4 or 8 bytes, respectively). |
Memo | Length predefined. |
Binary | Length predefined. |
Long Binary | Length predefined. |
Date | Length predefined. |
Yes/No | Length predefined. |
Table 1.4 Data Structure of the Customer Data Table
Field Name | Type | Length | Description |
Custno | Numeric | Long Integer | Unique customer ID |
Lastname | Character | 30 | Last name |
Firstname | Character | 30 | First name |
Address | Character | 40 | Street Address |
City | Character | 30 | City |
State | Character | 2 | State abbreviation code |
ZIP | Character | 5 | Postal ZIP code |
Phone | Character | 13 | Phone number |
Interests | Memo | Customer interests | |
SalesID | Character | 6 | ID of salesperson assigned to customer |
Table 1.5 Data Structure of the Salesperson Data Table
Field Name | Type | Length | Description |
SalesID | Character | 6 | ID of salesperson |
SalesLast | Character | 30 | Last name of salesperson |
SalesFirst | Character | 30 | First name of salesperson |
Table 1.6 Data Structure of the Retail Items Data Table
Field Name | Type | Length | Description |
Item Code | Numeric | Integer | Unique item ID |
Item Description | Character | 50 | Item description |
Product Category | Character | 10 | Product category |
Wholesale | Numeric | Single | Wholesale price |
Retail | Numeric | Single | Retail price |
Min Quantity | Numeric | Integer | Desired minimum inventory |
On Hand | Numeric | Integer | Current inventory level |
SupplierID | Numeric | Long Integer | Supplier ID for item |
Photo | Picture | Long Binary | Picture of product |
Table 1.7 Data Structure for Orders Data Table
Field Name | Type | Length | Description |
---|---|---|---|
Orderno | Numeric | Long Integer | Unique order ID |
Custno | Numeric | Long Integer | Unique customer ID |
SalesID | Character | 6 | ID of salesperson |
OrderDate | Date | Date order was placed | |
Totcost | Numeric | Single | Total cost of order |
Table 1.8 Data Structure for Sales Data Table
Field Name | Type | Length | Description |
Orderno | Numeric | Long Integer | Unique order ID |
Item Code | Numeric | Integer | Unique item ID |
Table 1.9 Data Structure for Suppliers Data Table
Field Name | Type | Length | Description |
SupplierID | Numeric | Long Integer | Supplier ID |
Name | Character | 50 | Name of supplier |
Contact | Character | 50 | Name of contact person |
Address | Character | 50 | Supplier's address |
City | Character | 30 | City |
State | Character | 2 | State abbreviation |
ZIP | Character | 5 | ZIP code |
Phone | Character | 13 | Phone number |
This chapter introduced the concepts used in designing a database. You learned about data modeling, the organization of data into tables, and the use of relations and data normalization to prevent redundant data. The chapter also explained the workings of indexes, sorts, and filters.
To learn more, refer to these chapters:
© 1996, QUE Corporation, an imprint of Macmillan Publishing USA, a Simon and Schuster Company.