Today you learn about creating databases. Day 9 covers the CREATE DATABASE, CREATE TABLE, ALTER TABLE, DROP TABLE, and DROP DATABASE statements, which are collectively known as data definition statements. (In contrast, the SELECT, UPDATE, INSERT, and DELETE statements are often described as data manipulation statements.) By the end of the day, you will understand and be able to do the following:
You now know much of the SQL vocabulary and have examined the SQL query in some detail, beginning with its basic syntax. On Day 2, "Introduction to the Query: The SELECT Statement," you learned how to select data from the database. On Day 8, "Manipulating Data," you learned how to insert, update, and delete data from the database. Now, nine days into the learning process, you probably have been wondering just where these databases come from. For simplicity's sake, we have been ignoring the process of creating databases and tables. We have assumed that these data objects existed currently on your system. Today you finally create these objects.
The syntax of the CREATE statements can range from the extremely simple to the complex, depending on the options your database management system (DBMS) supports and how detailed you want to be when building a database.
NOTE: The examples used today were generated using Personal Oracle7. Please see the documentation for your specific SQL implementation for any minor differences in syntax.
The first data management step in any database project is to create the database. This task can range from the elementary to the complicated, depending on your needs and the database management system you have chosen. Many modern systems (including Personal Oracle7) include graphical tools that enable you to completely build the database with the click of a mouse button. This time-saving feature is certainly helpful, but you should understand the SQL statements that execute in response to the mouse clicks.
Through personal experience, we have learned the importance of creating a good SQL install script. This script file contains the necessary SQL code to completely rebuild a database or databases; the script often includes database objects such as indexes, stored procedures, and triggers. You will see the value of this script during development as you continually make changes to the underlying database and on occasion want to completely rebuild the database with all the latest changes. Using the graphical tools each time you need to perform a rebuild can become extremely time-consuming. In addition, knowing the SQL syntax for this procedure enables you to apply your knowledge to other database systems.
The syntax for the typical CREATE DATABASE statement looks like this:
CREATE DATABASE database_name
Because the syntax varies so widely from system to system, we will not expand on the CREATE DATABASE statement's syntax. Many systems do not even support an SQL CREATE DATABASE command. However, all the popular, more powerful, relational database management systems (RDBMSs) do provide it. Instead of focusing on its syntax, we will spend some time discussing the options to consider when creating a database.
The syntax for the CREATE DATABASE statement can vary widely. Many SQL texts skip over the CREATE DATABASE statement and move directly on to the CREATE TABLE statement. Because you must create a database before you can build a table, this section focuses on some of the concepts a developer must consider when building a database. The first consideration is your level of permission. If you are using a relational database management system (RDBMS) that supports user permissions, you must make sure that either you have system administrator-level permission settings or the system administrator has granted you CREATE DATABASE permission. Refer to your RDBMS documentation for more information.
Most RDBMSs also allow you to specify a default database size, usually in terms of hard disk space (such as megabytes). You will need to understand how your database system stores and locates data on the disk to accurately estimate the size you need. The responsibility for managing this space falls primarily to system administrators, and possibly at your location a database administrator will build you a test database.
Don't let the CREATE DATABASE statement intimidate you. At its simplest, you can create a database named PAYMENTS with the following statement:
SQL> CREATE DATABASE PAYMENTS;
NOTE: Again, be sure to consult your database management system's documentation to learn the specifics of building a database, as the CREATE DATABASE statement can and does vary for the different implementations. Each implementation also has some unique options.
Designing a database properly is extremely important to the success of your application. The introductory material on Day 1, "Introduction to SQL," touched on the topics of relational database theory and database normalization.
Normalization is the process of breaking your data into separate components to reduce the repetition of data. Each level of normalization reduces the repetition of data. Normalizing your data can be an extremely complex process, and numerous database design tools enable you to plan this process in a logical fashion.
Many factors can influence the design of your database, including the following:
Disk space is always an important factor. Although you may not think that disk space is a major concern in an age of multigigabyte storage, remember that the bigger your database is, the longer it takes to retrieve records. If you have done a poor job of designing your table structure, chances are that you have needlessly repeated much of your data.
Often the opposite problem can occur. You may have sought to completely normalize your tables' design with the database and in doing so created many tables. Although you may have approached database-design nirvana, any query operations done against this database may take a very long time to execute. Databases designed in this manner are sometimes difficult to maintain because the table structure might obscure the designer's intent. This problem underlines the importance of always documenting your code or design so that others can come in after you (or work with you) and have some idea of what you were thinking at the time you created your database structure. In database designer's terms, this documentation is known as a data dictionary.
A data dictionary is the database designer's most important form of documentation. It performs the following functions:
Many computer-aided software engineering (CASE) tools aid the database designer in the creation of this data dictionary. For instance, Microsoft Access comes prepackaged with a database documenting tool that prints out a detailed description of every object in the database. See Day 17, "Using SQL to Generate SQL Statements," for more details on the data dictionary.
NOTE: Most of the major RDBMS packages come with either the data dic-tionary installed or scripts to install it.
Along with documenting your database design, the most important design goal you should have is to create your table structure so that each table has a primary key and a foreign key. The primary key should meet the following goals:
Regarding the second goal, the column that has completely unique data throughout the table is known as the primary key field. A foreign key field is a field that links one table to another table's primary or foreign key. The following example should clarify this situation.
Assume you have three tables: BILLS, BANK_ACCOUNTS, and COMPANY. Table 9.1 shows the format of these three tables.
Bills | Bank_Accounts | Company |
NAME, CHAR(30) | ACCOUNT_ID, NUMBER | NAME, CHAR(30) |
AMOUNT, NUMBER | TYPE, CHAR(30) | ADDRESS, CHAR(50) |
ACCOUNT_ID, NUMBER | BALANCE, NUMBER | CITY, CHAR(20) |
BANK, CHAR(30) | STATE, CHAR(2) |
Take a moment to examine these tables. Which fields do you think are the primary keys? Which are the foreign keys?
The primary key in the BILLS table is the NAME field. This field should not be duplicated because you have only one bill with this amount. (In reality, you would probably have a check number or a date to make this record truly unique, but assume for now that the NAME field works.) The ACCOUNT_ID field in the BANK_ACCOUNTS table is the primary key for that table. The NAME field is the primary key for the COMPANY table.
The foreign keys in this example are probably easy to spot. The ACCOUNT_ID field in the BILLS table joins the BILLS table with the BANK_ACCOUNTS table. The NAME field in the BILLS table joins the BILLS table with the COMPANY table. If this were a full-fledged database design, you would have many more tables and data breakdowns. For instance, the BANK field in the BANK_ACCOUNTS table could point to a BANK table containing bank information such as addresses and phone numbers. The COMPANY table could be linked with another table (or database for that matter) containing information about the company and its products.
Let's take a moment to examine an incorrect database design using the same information contained in the BILLS, BANK_ACCOUNTS, and COMPANY tables. A mistake many beginning users make is not breaking down their data into as many logical groups as possible. For instance, one poorly designed BILLS table might look like this:
Column Names | Comments |
NAME, CHAR(30) | Name of company that bill is owed to |
AMOUNT, NUMBER | Amount of bill in dollars |
ACCOUNT_ID, NUMBER | Bank account number of bill (linked to BANK_ACCOUNTS table) |
ADDRESS, CHAR(30) | Address of company that bill is owed to |
CITY, CHAR(15) | City of company that bill is owed to |
STATE, CHAR(2) | State of company that bill is owed to |
The results may look correct, but take a moment to really look at the data here. If over several months you wrote several bills to the company in the NAME field, each time a new record was added for a bill, the company's ADDRESS, CITY, and STATE information would be duplicated. Now multiply that duplication over several hundred or thousand records and then multiply that figure by 10, 20, or 30 tables. You can begin to see the importance of a properly normalized database.
Before you actually fill these tables with data, you will need to know how to create a table.
The process of creating a table is far more standardized than the CREATE DATABASE statement. Here's the basic syntax for the CREATE TABLE statement:
CREATE TABLE table_name ( field1 datatype [ NOT NULL ], field2 datatype [ NOT NULL ], field3 datatype [ NOT NULL ]...)
A simple example of a CREATE TABLE statement follows.
SQL> CREATE TABLE BILLS ( 2 NAME CHAR(30), 3 AMOUNT NUMBER, 4 ACCOUNT_ID NUMBER); Table created.
This statement creates a table named BILLS. Within the BILLS table are three fields: NAME, AMOUNT, and ACCOUNT_ID. The NAME field has a data type of character and can store strings up to 30 characters long. The AMOUNT and ACCOUNT_ID fields can contain number values only.
The following section examines components of the CREATE TABLE command.
When creating a table using Personal Oracle7, several constraints apply when naming the table. First, the table name can be no more than 30 characters long. Because Oracle is case insensitive, you can use either uppercase or lowercase for the individual characters. However, the first character of the name must be a letter between A and Z. The remaining characters can be letters or the symbols _, #, $, and @. Of course, the table name must be unique within its schema. The name also cannot be one of the Oracle or SQL reserved words (such as SELECT).
NOTE: You can have duplicate table names as long as the owner or schema is different. Table names in the same schema must be unique.
The same constraints that apply to the table name also apply to the field name. However, a field name can be duplicated within the database. The restriction is that the field name must be unique within its table. For instance, assume that you have two tables in your database: TABLE1and TABLE2. Both of these tables could have fields called ID. You cannot, however, have two fields within TABLE1 called ID, even if they are of different data types.
If you have ever programmed in any language, you are familiar with the concept of data types, or the type of data that is to be stored in a specific field. For instance, a character data type constitutes a field that stores only character string data. Table 9.2 shows the data types supported by Personal Oracle7.
Data Type | Comments |
CHAR | Alphanumeric data with a length between 1 and 255 characters. Spaces are padded to the right of the value to supplement the total allocated length of the column. |
DATE | Included as part of the date are century, year, month, day, hour, minute, and second. |
LONG | Variable-length alphanumeric strings up to 2 gigabytes. (See the following note.) |
LONG RAW | Binary data up to 2 gigabytes. (See the following note.) |
NUMBER | Numeric 0, positive or negative fixed or floating-point data. |
RAW | Binary data up to 255 bytes. |
ROWID | Hexadecimal string representing the unique address of a row in a table. (See the following note.) |
VARCHAR2 | Alphanumeric data that is variable length; this field must be between 1 and 2,000 characters long. |
NOTE: The LONG data type is often called a MEMO data type in other database management systems. It is primarily used to store large amounts of text for retrieval at some later time.The LONG RAW data type is often called a binary large object (BLOB) in other database management systems. It is typically used to store graphics, sound, or video data. Although relational database management systems were not originally designed to serve this type of data, many multimedia systems today store their data in LONG RAW, or BLOB, fields.
The ROWID field type is used to give each record within your table a unique, nonduplicating value. Many other database systems support this concept with a COUNTER field (Microsoft Access) or an IDENTITY field (SQL Server).
NOTE: Check your implementation for supported data types as they may vary.
SQL also enables you to identify what can be stored within a column. A NULL value is almost an oxymoron, because having a field with a value of NULL means that the field actually has no value stored in it.
When building a table, most database systems enable you to denote a column with the NOT NULL keywords. NOT NULL means the column cannot contain any NULL values for any records in the table. Conversely, NOT NULL means that every record must have an actual value in this column. The following example illustrates the use of the NOT NULL keywords.
SQL> CREATE TABLE BILLS ( 2 NAME CHAR(30) NOT NULL, 3 AMOUNT NUMBER, 4 ACCOUNT_ID NOT NULL);
In this table you want to save the name of the company you owe the money to, along with the bill's amount. If the NAME field and/or the ACCOUNT_ID were not stored, the record would be meaningless. You would end up with a record with a bill, but you would have no idea whom you should pay.
The first statement in the next example inserts a valid record containing data for a bill to be sent to Joe's Computer Service for $25.
SQL> INSERT INTO BILLS VALUES("Joe's Computer Service", 25, 1); 1 row inserted.
SQL> INSERT INTO BILLS VALUES("", 25000, 1); 1 row inserted.
Notice that the second record in the preceding example does not contain a NAME value. (You might think that a missing payee is to your advantage because the bill amount is $25,000, but we won't consider that.) If the table had been created with a NOT NULL value for the NAME field, the second insert would have raised an error.
A good rule of thumb is that the primary key field and all foreign key fields should never contain NULL values.
One of your design goals should be to have one unique column within each table. This column or field is a primary key field. Some database management systems allow you to set a field as unique. Other database management systems, such as Oracle and SQL Server, allow you to create a unique index on a field. (See Day 10.) This feature keeps you from inserting duplicate key field values into the database.
You should notice several things when choosing a key field. As we mentioned, Oracle provides a ROWID field that is incremented for each row that is added, which makes this field by default always a unique key. ROWID fields make excellent key fields for several reasons. First, it is much faster to join on an integer value than on an 80-character string. Such joins result in smaller database sizes over time if you store an integer value in every primary and foreign key as opposed to a long CHAR value. Another advantage is that you can use ROWID fields to see how a table is organized. Also, using CHAR values leaves you open to a number of data entry problems. For instance, what would happen if one person entered 111 First Street, another entered 111 1st Street, and yet another entered 111 First St.? With today's graphical user environments, the correct string could be entered into a list box. When a user makes a selection from the list box, the code would convert this string to a unique ID and save this ID to the database.
Now you can create the tables you used earlier today. You will use these tables for the rest of today, so you will want to fill them with some data. Use the INSERT command covered yesterday to load the tables with the data in Tables 9.3, 9.4, and 9.5.
SQL> create database PAYMENTS; Statement processed. SQL> create table BILLS ( 2 NAME CHAR(30) NOT NULL, 3 AMOUNT NUMBER, 4 ACCOUNT_ID NUMBER NOT NULL); Table created. SQL> create table BANK_ACCOUNTS ( 2 ACCOUNT_ID NUMBER NOT NULL, 3 TYPE CHAR(30), 4 BALANCE NUMBER, 5 BANK CHAR(30)); Table created. SQL> create table COMPANY ( 2 NAME CHAR(30) NOT NULL, 3 ADDRESS CHAR(50), 4 CITY CHAR(30), 5 STATE CHAR(2)); Table created.
Name | Amount | Account_ID |
Phone Company | 125 | 1 |
Power Company | 75 | 1 |
Record Club | 25 | 2 |
Software Company | 250 | 1 |
Cable TV Company | 35 | 3 |
Account_ID | Type | Balance | Band |
1 | Checking | 500 | First Federal |
2 | Money Market | 1200 | First Investor's |
3 | Checking | 90 | Credit Union |
Name | Address | City | State |
Phone Company | 111 1st Street | Atlanta | GA |
Power Company | 222 2nd Street | Jacksonville | FL |
Record Club | 333 3rd Avenue | Los Angeles | CA |
Software Company | 444 4th Drive | San Francisco | CA |
Cable TV Company | 555 5th Drive | Austin | TX |
Most major RDBMSs have default settings for table sizes and table locations. If you do not specify table size and location, then the table will take the defaults. The defaults may be very undesirable, especially for large tables. The default sizes and locations will vary among the implementations. Here is an example of a CREATE TABLE statement with a storage clause (from Oracle).
SQL> CREATE TABLE TABLENAME 2 (COLUMN1 CHAR NOT NULL, 3 COLUMN2 NUMBER, 4 COLUMN3 DATE) 5 TABLESPACE TABLESPACE NAME 6 STORAGE 7 INITIAL SIZE, 8 NEXT SIZE, 9 MINEXTENTS value, 10 MAXEXTENTS value, 11 PCTINCREASE value);
Table created.
In Oracle you can specify a tablespace in which you want the table to reside. A decision is usually made according to the space available, often by the database administrator (DBA). INITIAL SIZE is the size for the initial extent of the table (the initial allocated space). NEXT SIZE is the value for any additional extents the table may take through growth. MINEXTENTS and MAXEXTENTS identify the minimum and maximum extents allowed for the table, and PCTINCREASE identifies the percentage the next extent will be increased each time the table grows, or takes another extent.
The most common way to create a table is with the CREATE TABLE command. However, some database management systems provide an alternative method of creating tables, using the format and data of an existing table. This method is useful when you want to select the data out of a table for temporary modification. It can also be useful when you have to create a table similar to the existing table and fill it with similar data. (You won't have to reenter all this information.) The syntax for Oracle follows.
CREATE TABLE NEW_TABLE(FIELD1, FIELD2, FIELD3) AS (SELECT FIELD1, FIELD2, FIELD3 FROM OLD_TABLE <WHERE...>
This syntax allows you to create a new table with the same data types as those of the fields that are selected from the old table. It also allows you to rename the fields in the new table by giving them new names.
SQL> CREATE TABLE NEW_BILLS(NAME, AMOUNT, ACCOUNT_ID) 2 AS (SELECT * FROM BILLS WHERE AMOUNT < 50); Table created.
The preceding statement creates a new table (NEW_BILLS) with all the records from the BILLS table that have an AMOUNT less than 50.
Some database systems also allow you to use the following syntax:
INSERT NEW_TABLE SELECT <field1, field2... | *> from OLD_TABLE <WHERE...>
The preceding syntax would create a new table with the exact field structure and data found in the old table. Using SQL Server's Transact-SQL language in the following example illustrates this technique.
INSERT NEW_BILLS 1> select * from BILLS where AMOUNT < 50 2> go
(The GO statement in SQL Server processes the SQL statements in the command buffer. It is equivalent to the semicolon (;) used in Oracle7.)
Many times your database design does not account for everything it should. Also, requirements for applications and databases are always subject to change. The ALTER TABLE statement enables the database administrator or designer to change the structure of a table after it has been created.
The ALTER TABLE command enables you to do two things:
The syntax for the ALTER TABLE statement is as follows:
ALTER TABLE table_name <ADD column_name data_type; | MODIFY column_name data_type;>
The following command changes the NAME field of the BILLS table to hold 40 characters:
SQL> ALTER TABLE BILLS 2 MODIFY NAME CHAR(40); Table altered.
NOTE: You can increase or decrease the length of columns; however, you can not decrease a column's length if the current size of one of its values is greater than the value you want to assign to the column length.
Here's a statement to add a new column to the NEW_BILLS table:
SQL> ALTER TABLE NEW_BILLS 2 ADD COMMENTS CHAR(80); Table altered.
This statement would add a new column named COMMENTS capable of holding 80 characters. The field would be added to the right of all the existing fields.
Several restrictions apply to using the ALTER TABLE statement. You cannot use it to add or delete fields from a database. It can change a column from NOT NULL to NULL, but not necessarily the other way around. A column specification can be changed from NULL to NOT NULL only if the column does not contain any NULL values. To change a column from NOT NULL to NULL, use the following syntax:
ALTER TABLE table_name MODIFY (column_name data_type NULL);
To change a column from NULL to NOT NULL, you might have to take several steps:
2. Deal with any NULL values that you find. (Delete those records, update the column's value, and so on.)
3. Issue the ALTER TABLE command.
NOTE: Some database management systems allow the use of the MODIFY clause; others do not. Still others have added other clauses to the ALTER TABLE statement. In Oracle, you can even alter the table's storage parameters. Check the documentation of the system you are using to determine the implementation of the ALTER TABLE statement.
SQL provides a command to completely remove a table from a database. The DROP TABLE command deletes a table along with all its associated views and indexes. (See Day 10 for details.) After this command has been issued, there is no turning back. The most common use of the DROP TABLE statement is when you have created a table for temporary use. When you have completed all operations on the table that you planned to do, issue the DROP TABLE statement with the following syntax:
DROP TABLE table_name;
Here's how to drop the NEW_BILLS table:
SQL> DROP TABLE NEW_BILLS; Table dropped.
Notice the absence of system prompts. This command did not ask Are you sure? (Y/N). After the DROP TABLE command is issued, the table is permanently deleted.
WARNING: If you issueSQL> DROP TABLE NEW_BILLS;you could be dropping the incorrect table. When dropping tables, you should always use the owner or schema name. The recommended syntax is
SQL> DROP TABLE OWNER.NEW_BILLS;We are stressing this syntax because we once had to repair a production database from which the wrong table had been dropped. The table was not properly identified with the schema name. Restoring the database was an eight-hour job, and we had to work until well past midnight.
Some database management systems also provide the DROP DATABASE statement, which is identical in usage to the DROP TABLE statement. The syntax for this statement is as follows:
DROP DATABASE database_name
Don't drop the BILLS database now because you will use it for the rest of today, as well as on Day 10.
NOTE: The various relational database implementations require you to take diff-erent steps to drop a database. After the database is dropped, you will need to clean up the operating system files that compose the database.
Create a database with one table in it. Issue the DROP TABLE command and the issue the DROP DATABASE command. Does your database system allow you to do this? Single-file-based systems, such as Microsoft Access, do not support this command. The database is contained in a single file. To create a database, you must use the menu options provided in the product itself. To delete a database, simply delete the file from the hard drive.
Day 9 covers the major features of SQL's Data Manipulation Language (DML). In particular, you learned five new statements: CREATE DATABASE, CREATE TABLE, ALTER TABLE, DROP TABLE, and DROP DATABASE. Today's lesson also discusses the importance of creating a good database design.
A data dictionary is one of the most important pieces of documentation you can create when designing a database. This dictionary should include a complete description of all objects in the database: tables, fields, views, indexes, stored procedures, triggers, and so forth. A complete data dictionary also contains a brief comment explaining the purpose behind each item in the database. You should update the data dictionary whenever you make changes to the database.
Before using any of the data manipulation statements, it is also important to create a good database design. Break down the required information into logical groups and try to identify a primary key field that other groups (or tables) can use to reference this logical group. Use foreign key fields to point to the primary or foreign key fields in other tables.
You learned that the CREATE DATABASE statement is not a standard element within database systems. This variation is primarily due to the many different ways vendors store their databases on disk. Each implementation enables a different set of features and options, which results in a completely different CREATE DATABASE statement. Simply issuing CREATE DATABASE database_name creates a default database with a default size on most systems. The DROP DATABASE statement permanently removes that database.
The CREATE TABLE statement creates a new table. With this command, you can create the fields you need and identify their data types. Some database management systems also allow you to specify other attributes for the field, such as whether it can allow NULL values or whether that field should be unique throughout the table. The ALTER TABLE statement can alter the structure of an existing table. The DROP TABLE statement can delete a table from a database.
A CREATE DATABASE varies because the actual process of creating a database varies from one database system to another. Small PC-based databases usually rely on files that are created within some type of application program. To distribute the database on a large server, related database files are simply distributed over several disk drives. When your code accesses these databases, there is no database process running on the computer, just your application accessing the files directly. More powerful database systems must take into account disk space management as well as support features such as security, transaction control, and stored procedures embedded within the database itself. When your application program accesses a database, a database server manages your requests (along with many others' requests) and returns data to you through a sometimes complex layer of middleware. These topics are discussed more in Week 3. For now, learn all you can about how your particular database management system creates and manages databases.
Q Can I create a table temporarily and then automatically drop it when I am done with it?
A Yes. Many database management systems support the concept of a temporary table. This type of table is created for temporary usage and is automatically deleted when your user's process ends or when you issue the DROP TABLE command. The use of temporary tables is discussed on Day 14, "Dynamic Uses of SQL."
Q Can I remove columns with the ALTER TABLE statement?
A No. The ALTER TABLE command can be used only to add or modify columns within a table. To remove columns, create a new table with the desired format and then select the records from the old table into the new table.
The Workshop provides quiz questions to help solidify your understanding of the material covered, as well as exercises to provide you with experience in using what you have learned. Try to answer the quiz and exercise questions before checking the answers in Appendix F, "Answers to Quizzes and Exercises."
2. True or False: The DROP TABLE command is functionally equivalent to the DELETE FROM <table_name> command.
3. True or False: To add a new table to a database, use the CREATE TABLE command.
4. What is wrong with the following statement?
CREATE TABLE new_table ( ID NUMBER, FIELD1 char(40), FIELD2 char(80), ID char(40);
ALTER DATABASE BILLS ( COMPANY char(80));
7. If data in a character column has varying lengths, what is the best choice for the data type?
8. Can you have duplicate table names?
2. With the five tables that you have created--BILLS, BANK_ACCOUNTS, COMPANY, BANK, and ACCOUNT_TYPE--change the table structure so that instead of using CHAR fields as keys, you use integer ID fields as keys.
3. Using your knowledge of SQL joins (see Day 6, "Joining Tables"), write several queries to join the tables in the BILLS database.
© Copyright, Macmillan Computer Publishing. All rights reserved.