Teach Yourself Database Programming
with Visual C++ 6 in 21 days

Day 13
Melding Object-Oriented Programming
with Relational Databases

Relational Databases and Object-Oriented Languages, the Oil and Vinegar of Modern Application Development
- How the Relational Model and the Object Model Differ
Object Databases Versus Relational Databases
The Costs and Benefits of Using Relational Databases with C++
Techniques for Integrating Relational Databases with Object-Oriented C++ Programs
Summary
Q&A
Workshop
- Quiz
- Exercises

Relational database technologies have revolutionized the way data is managed. With relational databases, you can manage huge quantities of data that are impossible to manage using other, less capable database technologies.

Object-oriented programming techniques have revolutionized the way software is developed. With object-oriented programming techniques, you can create large and complex software systems that are impossible to create using other, less capable programming methodologies.

Relational databases enable powerful data management, and object-oriented programming enables powerful software development. By combining the object model and the relational model, you can write large and complex software systems that manage huge amounts of data.

Today you will learn

The characteristics of the relational model and the object model that make them difficult to integrate
The factors that determine whether an application is suitable for an object/relational mix or for a pure object database
The costs and the benefits of combining the relational model with the object model
The techniques for integrating relational databases with object-oriented programs

Because today's work deals primarily with design issues, you won't be writing any code today. Also, it is very difficult to write a day's worth of code on this topic. The topic is so large that to write a little bit of code, you end up having to write a lot.

Relational Databases and Object-Oriented Languages, the Oil and Vinegar of Modern Application Development

Some things don't mix easily. Their characteristics are so different that they do not blend with each other naturally. Relational databases and object-oriented programming languages are like that.

Combining the object model and the relational model can enable you to write complex applications that manage huge amounts of data. However, the relational model and the object model come from different worlds and are quite dissimilar. The differences between the two models prevent them from blending easily with each other easily.

The interface between the relational model and the object model is not straightforward, but there are proven techniques for combining the two models. As a C++ database programmer, you possess knowledge of both the relational model and the object model. You are in a unique position to learn how to fuse these two disparate models to build advanced software applications.

How the Relational Model and the Object Model Differ

Relational database systems:

Data is represented in a collection of two-dimensional tables.
Relationships between the data in the tables are expressed by values stored in the tables.
Using SQL, you can create relationships between the data in a dynamic fashion on-the-fly.
The relationships between the data entities do not necessarily need to be defined or envisioned when the database is first created.

Object-oriented systems:

The application consists of a set of objects that contain data and the code to act on that data.
These objects are data types defined by the programmer when the system is first created.
These data types can have complex relationships with each other through techniques such as inheritance and encapsulation.
The relationships between data types cannot be modified easily or created dynamically after the system is built.
The instances (or objects) of these data types can hold references to other objects and can also be stored in collections for enumerating them easily.

Relational systems and object-oriented systems differ from each other fundamentally. These differences can be categorized as

Differences in type systems
Differences in language
Differences in paradigms
Differences in basic data entities

Differences in Type Systems

The type system of a relational database is relatively simple. The database vendor establishes the data types. These data types typically consist of simple numeric, date/time, text, and binary types. Developers cannot create their own types, as they can in the object model.

The data types in a relational database are used to define fields (or columns) in the database tables. The data that goes into each column must match the data type defined for that column.

For a relational database to perform properly, the data in the database tables must conform to the first normal form. The data in each row of each column must be atomic.

The object model is based on objects, each of which can contain many data attributes and the code to act on them. Objects are instances of these data types. The application developer can create new data types as needed.

Because of the complexity of objects, you cannot create usable objects that would be considered atomic types in the relational model. In other words, you cannot store objects in relational tables because objects have many attributes and do not fit the relational definition of an atomic data type.

In addition, the type system of a relational database is established by the database itself. Data of the supported types are the all that you can store in a relational database.

The basic properties of objects are encapsulation, inheritance, and polymorphism. There is no real support for encapsulation, inheritance, and polymorphism in the type systems of relational databases.

Differences in Language

In the relational model, SQL is the only language you can use, and SQL does everything. SQL code expresses the functionality of the database. SQL statements create the tables in which the data is stored. Relationships in the data are expressed through a combination of data in key fields and SQL code. All access to the data in the database is done through SQL.

Each relational database vendor implements SQL for its own database system. Each relational database system contains a SQL engine that executes its particular implementation of SQL. In relational systems, SQL is the only language you can use, and you can use only your database's particular brand of SQL.

SQL is not an object-oriented language. It has no object-oriented constructs or capabilities. There is no real support for encapsulation, inheritance, and polymorphism in SQL. SQL's sole purpose is to store and retrieve data in tables in a relational database.

In the object model, you can use a language such as C++, which supports object-oriented mechanisms such as encapsulation, inheritance, and polymorphism. Using C++, you can define your own data types. Instances of data types (objects) can contain multiple attributes as well as code to operate on those attributes. You can use mechanisms such as encapsulation, inheritance, and polymorphism to create complex hierarchies and networks of interrelated objects.

With C++, you can implement data models that are highly complex and intricate. However, C++ provides only primitive mechanisms for persisting and retrieving objects from a permanent store such as a hard disk.

SQL was built to store and retrieve data from tables in a database and has no object capabilities. C++ was built to create complex object systems and has extremely meager data storage capabilities.

These two languages are fundamentally different from each other, and no facility is built in to either language for interfacing with the other. The two languages are as different as English and music.

Differences in Paradigms

In the relational model, the database server dictates the model, the language, and the type system. Applications must use the database server on the database server's terms. If the application wants to use a model that is different from the model provided by the database, the application must provide the interface that maps between the models.

The necessity of using a database server on its terms has been the driving force behind the creation of database interfaces such as ODBC and OLE DB. These APIs are an effort to provide an interface between databases (which use the relational model) and C++ applications (which can use either the procedural model or the object model). At the very least, what these APIs do is translate between the data types in relational databases and data types in C++.

The relational model sees the world as a set of two-dimensional tables. Data that cannot be easily represented in a tabular form is difficult to fashion in a relational database.

The field is the basic unit of the relational database. The field is the smallest data element in a relational database. All data access in a relational database is performed on a field basis. All data access is done through SQL, which is built to work with fields.

The relational model is built to handle large quantities of data. If you can mold your data so that it conforms to the relational model, a relational database can manage enormous quantities of data quite effectively.

By contrast, object-oriented systems try to mimic objects in the real world. You don't have to try to mold your data to fit the object model. Instead, you create objects that mimic or reproduce the attributes and characteristics of real-life objects.

This is why the object model can handle more complexity than the relational model. With the relational model, you have to make the data fit the model. With the object model, you can make the model fit the data.

Differences in Basic Data Entities

A common mapping between the object model and the relational model is the direct mapping of a class to a relational table, as shown in Figure 13.1.

Figure 13.1 : Mapping a C++ class to a relational table.

As you can see in Figure 13.1, a C++ class corresponds to a relational table. The data members of the class correspond to fields in the table. Instances of the class correspond to records in the table.

One thing that makes it difficult to interface the relational model with the object model is that the basic data entities of the two models do not correspond to each other. In the C++ object model, the basic data entity is the class. In the relational model, the basic data entity is the field. C++ classes do not map to fields. C++ classes map to tables.

C++ and SQL are built around their basic data entities. C++ is built to work with classes. SQL is built to work with fields. When you try to map between the two models, you end up having to make your SQL code work with tables and your C++ code work with the data members.

Object Databases Versus Relational Databases

With all the difficulty in trying to map between two such disparate models, it might seem better to simply use one model for all your application development. If you were to use an object database management system (ODBMS) instead of a relational database management system (RDBMS), you could avoid the muss and the fuss of mapping between the object model and the relational model.

However, some applications lend themselves to object databases, and some do not. Your choice of database technology should depend on the particular requirements of your application.

Object databases and relational databases each have specific characteristics, and each will provide particular capabilities in your applications.

As you learned in Day 1, "Choosing the Right Database Technology," C++ object databases directly support the type system of the C++ language. In other words, you can use a C++ object database to store instances of C++ classes right in the database.

Refer to Listing 1.5 to see how well and how easily C++ databases can integrate with C++ applications. By using simple overrides of a few C++ operators, you can easily persist and retrieve your C++ objects in an object database.

C++ object databases support

The C++ type system, including classes that are defined by the application developer
Inheritance, where the database understands the class hierarchy and manages the object store appropriately
Polymorphism, where objects can be read from the database without knowing in advance their complete type information
Object identities, where the database assigns a unique ID to each object, can readily retrieve each object by ID, and can determine whether the object has already been loaded into program memory
References to other objects, where the database understands pointer references between objects and can store and retrieve linked objects appropriately

Using an object database means that your database will directly mirror the objects in your application. The objects in the database will be the same as the objects in your C++ code. Having a database that mirrors the objects in your C++ code is a two-edged sword with two potentially negative consequences.

First, the only data that can be retrieved from your database is the data encapsulated in the objects in your C++ source code. All the data, relationships, and uses of that data must be defined in the source code for your application. You might not be able to foresee all the potential uses of this data during your application development. This means that your application and its database could miss opportunities to be useful in the future.

Second, the only applications that will be able to access the database will be C++ programs that have an intimate knowledge of the objects in your application source code. The database will be a closed, proprietary database. This means that your database could miss opportunities to be useful in the future.

NOTE

If your data is important to you, it is probably important to others, also. They will want access to the data through more than just your application.

It is important to remember what makes most database applications valuable is the information that they provide. Your application processes the data from the data source into useful information and presents its view of the information. However, people will invariably want access to the data source themselves so that they can perform additional analyses for use in different applications.

TIP

Over time, applications often depreciate, while databases appreciate. The value of a given application will decline as business needs change, but information is always valuable.

Because information is always valuable, its value often justifies the additional time and effort required to place it in a relational database and create an object-to-relational mapping layer for your application.

Figure 13.2 illustrates the closed nature of a C++ object database and the openness and availability of a relational database.

Figure 13.2 : The relative openness of object databases and relational databases.

Some applications lend themselves to object databases, whereas other applications lend themselves to relational databases. Whether an application should use an object database or a relational database is largely determined by the need for flexibility in using the data.

A relational database will be open to other uses and other applications. It will also be open to future versions of your application that might need to apply new analyses to data from the database. A relational database enables you to create new relationships and collections of data that you had not envisioned at the time the database was first created. If your database conforms to the normal forms, you can simply write some new SQL code to create new relationships and to perform new analyses on the data. Relational databases provide a high degree of flexibility in defining new uses for the data.

Object databases lend themselves to applications in which the data model is complex and the relationships are well defined at the time the database is created. Some data models are too complex to be molded into a relational database. For specialized applications like this, you will need to take advantage of the object-oriented features of C++ to model the complexities of the data. An object database will enable you to easily persist those C++ objects to a data store. Defining new uses for that data and making that data available to other applications might not be easy, however.

The Costs and Benefits of Using Relational Databases with C++

To use a relational database with a C++ object-oriented application, you must create a mapping layer that translates between the object model and the relational model. Creating this mapping layer can be a costly endeavor in terms of time and effort, but can also result in significant benefits and synergies for your application.

The costs and difficulties of creating an object-to-relational mapping layer include

Dealing with two semantic spaces where the logic for your application is mani-fested in two different programming models
Creating two separate designs: a relational design and an object design
Building an interface to translate between the relational model and the object model
Developing the code in each object to communicate with the translation layer
Your application not being able to handle highly complex data models because of the relational database and the need for the translation layer

Using a relational database with an object-oriented application enables you to

Create new uses for the data in the database because the data itself is valuable.
Use the strengths of the relational model, by using the relational database to obtain aggregate totals directly from the server without having to instantiate lots of objects and access each of the data members of each object.
Use the strengths of the object model by using C++ objects to handle nested relationships that are difficult to model in a relational database, such as a bill of materials.
Take advantage of the many server management tools available for relational database servers.

Techniques for Integrating Relational Databases with Object-Oriented C++ Programs

There are a few general strategies you can use to build an object-oriented application that uses a relational database.

Begin by Designing the Relational Database

In general, when you are creating the object-oriented application that uses a relational database, it is not a good idea to create the object model first and then try to map it to relational tables. Object models can be significantly more complex than relational models. If you build the object model first, you might not be able to create a relational model that matches it.

Even in relatively simple object models, an object might need to be stored across several relational tables, and many joins might be required to construct the objects. This can cause application performance to deteriorate.

Because the relational database is the foundation of your application, and because the relational model might ultimately limit the degree of complexity that your application can handle, you should begin with the design of the relational database.

If you are able to model the data for your application effectively in a relational database, you will know that your application can indeed handle the level of complexity that is required. Refer to Day 7, "Database Design," for information on designing relational databases.

Relational models tend to be simpler than object models. Therefore, entities in a relational model can usually be converted to entities in an object model quite readily. If you use this approach, the relational database schema becomes the basis for the object model.

Create Simple Object/Relational Mappings

Simple mappings between the relational and object models provide the best performance. Remember KISS (Keep It Simple, Sweetie).

Map Tables to Classes

As shown in Figure 13.1, the simplest and most straightforward mapping is to map a class to each relational table. The class absolutely must contain the primary key and foreign key fields from the table as data members and would probably contain data members for all the fields in the table as well.

Routines need to be developed for reading and persisting objects to and from the database. These routines will need to have intimate knowledge of the C++ classes and of the database. These routines could exist in each class, or they could be implemented in friend classes.

Encapsulating disconnected ADO Recordsets (which you learned about in Day 12, "Using Microsoft Transaction Server to Build Scalable Applications") in your classes can go a long way toward encapsulating the field data members, as well as the reading and persisting routines that each of your classes needs.

Use Primary and Foreign Keys to Map Relationships Between Objects

The simplest approach for defining a relationship between two objects is to embed the primary key of one object as a data member of the other object. (As you will recall from Day 2, "Tools for Database Development in Visual C++ Developer Studio," and Day 3, "Retrieving Data Through Structured Query Language (SQL)," a primary key from another table is called a foreign key.) This approach works well for one-to-one and one-to-many relationships.

For many-to-many relationships, it is necessary to have a link table in the database that contains the primary keys of the two related entities. This is illustrated in Figure 13.3.

Figure 13.3 : Many-to-many relationships in a relational database.

Figure 13.3 shows a relational database schema. It has a Customers table, with CustomerNumber as the primary key, a Products table, with ProductNumber as the primary key, and a Purchases table, with a composite key of CustomerNumber and ProductNumber. The Purchases table's CustomerNumber and ProductNumber fields are also foreign keys from the other two tables. The Purchases table defines a many-to-many relationship between records in the Customers table and the Products table. A customer can purchase many products, and a product can be purchased by many customers.

Modeling many-to-many relationships in an object system might involve storing a collection (or disconnected Recordset) containing foreign keys inside the objects involved in the relationships.

Don't Try to Map Object Inheritance in Relational Databases

Object inheritance is all but impossible to implement in relational databases. The best approach generally is to map only the leaf classes (the concrete classes at the bottom of the hierarchy) to tables in the database. For example, refer to the C++ classes declared in Listing 13.1

Listing 13.1 Creating C++ Classes with Inheritance

 1:  class shoe
 2:  {
 3:  public:
 4:    int sole;
 5:    int upper;
 6:  };
 7:
 8:  class athleticshoe : public shoe
 9:  {
10:  public:
11:    int tongue;
12:    int laces;
13:  };
14:
15:  class basketballshoe : public athleticshoe
16:  {
17:  public:
18:    int archsupport;
19:  };

In this case, it would probably be best to create a single table in the database, called BasketballShoes. The BasketballShoes table would contain fields for sole, upper, tongue, laces, and archsupport. You would also need to have an additional field for the primary key in the BasketballShoes table. You should also add a data member to the shoe class to hold the primary key from the database table.

If other classes derive from athleticshoe, such as crosstrainer, you would want to create a CrossTrainers table that contains fields for sole, upper, tongue, laces, the primary key, and whatever data members are in the crosstrainer class.

Sometimes, however, it might be necessary to get a count all of the shoes. That would mean that you would have to tell the database to count the records in the BasketballShoes table, count the records in the CrossTrainers table, and add those two numbers together. That would not be very elegant and will become uglier if the number of different types of shoes increases.

One possible solution is to create a Shoes table that contains fields for the primary key, sole, and upper and remove the sole and upper fields from the BasketballShoes table and the CrossTrainers table. You would need to add a field to the Shoes table to indicate the shoe type.

The idea then would be to add a record for each shoe to the Shoes table and also add a record to whatever additional table is appropriate.

For instance, a basketball shoe would have a record in the Shoes table, which indicates its sole and upper. It would also have a record in the BasketballShoes table that indicates its tongue, laces, and archsupport. The Shoes table and the BasketballShoes table would have a one-to-one relationship with each other, based on the primary key in each table. Likewise, a cross-trainer shoe would have a record in the Shoes table and a record in the CrossTrainers table.

The database schema would look like the one shown in Figure 13.4.

Figure 13.4 : A relational database schema to model a single level of inheritance.

Data in the database would look something like the data shown in Figure 13.5.

Figure 13.5 : Data in a relational database schema that models a single level of inheritance.

As you can see in Figure 13.5, there are two records in the Shoes table and one record each in the BasketballShoes table and the CrossTrainers table.

The class definition for the shoe type would have two data members added to it, as shown in Listing 13.2.

Listing 13.2 Changes to the Base SHOE Class

 1:  class shoe
 2:  {
 3:  public:
 4:    int shoeID;
 5:    int shoeType;
 4:    int sole;
 5:    int upper;
 6:  };

To get a count of all the shoes using this schema, you needn't count the records from multiple tables and add up the counts to get the total number of instances. You merely get a count of the records in the Shoes table.

This schema also enables you to look up a shoe by its ID (without knowing its type in advance) and discover its type. You can then use its type to execute the proper SQL query to perform a join with the appropriate table to get all the shoe's attributes.

Note that in relational database servers, you cannot use a variable for a table name in a compiled stored procedure. Therefore, in a stored procedure you could not put the name of the table from the ShoeType field in a variable and use that variable in the FROM clause of a SELECT statement to get the rest of the shoe's attributes. However, you could use that variable as a flag in a case-type or switch-type statement in a SQL query to execute the appropriate query to retrieve the attributes from the appropriate table.

As you can see, mapping object-oriented concepts to the relational model requires imagination and potentially lots of code.

Create a Live Object Cache

The biggest performance hit in database applications is database access. If you can minimize the number of times the application must access the database, the performance will be dramatically faster than if the application has to hit the database frequently.

To reduce the number of database hits, applications can use an object cache. The idea is to cache objects read from the database in RAM so that the next time the objects are needed, they can be read from the cache instead of the database.

Using a cache provides significant performance benefits because accessing data in RAM is much faster than accessing data in a database. A cache can also reduce the number of costly network roundtrips between a client application and a database server.

When the client application asks for an object, the translation layer should look to see whether that object already exists in the cache. The translation layer can use the primary key as the object identifier. If the object is in the cache, the translation layer can give the application a pointer to the existing object without having to query the database. This is a huge optimization, but requires a bit of code.

Unfortunately, describing the code required to implement an object cache is beyond the scope of the book. Some technical white papers on this topic are available from programming tool vendors who specialize in this kind of work.

Use the Strengths of Both Models

Take advantage of objects when you can, and take advantage of the RDMS server when you can-use both.

For example, if you need to get the total number of instances in the database, do not count them by instantiating every object inside a loop in the client code. Instead, ask the database server to simply count the records and return the figure to the client application that needs it. The performance will be much better with this approach.

Another example is if you need to traverse a tree of nested objects, such as in a bill of materials. It would probably be inefficient to have the relational database server traverse the tree. Instead, you should instantiate the appropriate objects and have the object code traverse the tree.

Summary

Relational databases and object-oriented programming languages are powerful tools for managing data and writing software. Unfortunately, melding these two technologies is not straightforward. This is because relational databases were not designed to store objects, and objects were not designed to be stored in relational databases.

Melding an object-oriented application with a relational database requires you to write a translation layer between the object code and the relational database. Writing this translation layer can be difficult and time-consuming. However, your application and your database can derive great benefits from the synergies of these two technologies.

Q&A

Q	Are any software tools available that make the task of writing the translation layer easier?
A	Yes. There are independent software vendors who produce tools for just this purpose. You can find them by perusing the advertisements in the various relational database or C++ technical journals. You can also search the Web for terms such as object database, RDBMS, ODBMS, persistence, mapping, translation layer, and so on.
Q	Aren't the vendors of relational databases extending their databases to support the storage of objects?
A	Yes. Relational database vendors such as Informix, Oracle, and others have made efforts to extend their databases to support object storage. However, there is no clear indication of significant market acceptance of any of their approaches so far. Let the buyer beware.
Q	Can't I just create a set of C++ base classes that talk to the translation layer and derive the classes in my application from these base classes to get easy communication with a relational database?
A	If only it were that simple. One of the problems you will encounter is that a C++ base class will have trouble persisting an instance of a derived class, because the derived class might contain data that the base class does not know about. The derived classes themselves will probably need to participate in some way in their being persisted to the database.
Q	What is Microsoft's approach to object storage?
A	Microsoft does not seem to be trying to extend its SQL Server database to make it store objects. Rather, Microsoft has provided OLE DB as an object-oriented API that can communicate with relational as well as object-oriented data stores.

Workshop

The Workshop quiz questions test your understanding of today's material. (The answers appear in Appendix F, "Answers.") The exercises encourage you to apply the information you learned today to real-life situations.

Quiz

What prevents you from being able to store C++ objects in relational database fields?
Why can't you use SQL for object-oriented programming tasks?
What are the primary differences between C++ object databases and relational databases?
When designing an application that will use object and relational technology, where do you start?
What are the benefits of a live object cache?

Exercises

Write a SELECT statement that retrieves the shoe type based on the shoe ID from the Shoes table shown in Figure 13.5.
Write a SELECT statement that retrieves all the attributes of basketball shoes from the tables shown in Figure 13.5.

Teach Yourself Database Programming with Visual C++ 6 in 21 days

Day 13 Melding Object-Oriented Programming with Relational Databases