Handling Numbers

28
- From There to Here: Database Migration

-- 28 --

From There to Here: Database Migration

The world is filled with legacy database applications that need to be rightsized for a number of reasons. Mainframe-based applications are often downsized to provide greater functionality, lower connectivity costs, and lower maintenance costs. Small applications that support a small number of users--;or even a single user--;are upsized so that more users can benefit from the features that they offer. Another reason for database migration is the "year 2000 problem"--;a legacy database may use only two digits to store the year portion of a date, whereas Oracle stores both century and year.

Data migration is, indeed, an unending activity. When 90 percent of the legacy applications that were implemented in the early 1970s have been rightsized, programmers will face the task of migrating legacy applications from later in the decade. In fact, developing database applications is the process of creating tomorrow's legacy applications. Of course, unless your own money is funding the migration of an application, you must build a solid business case that justifies a return on the investment; the availability of a faster CPU or a glitzier user interface is not a sufficient reason for making the significant investment that is required to rightsize an information system.

The intent of this chapter is to focus on the issues that pertain to migrating an application to an Oracle database. You can't rely on a standard recipe to accomplish this job; every migration is somewhat unique. Instead of a recipe, I use this chapter to outline the major tasks that you'll face regardless of the application. Of course, the details of migrating/reengineering the application code is beyond the scope of this book.

What Is Migration?

When I use the term migration, I am referring to the process of converting a database application to the Oracle database. Obviously, I'm assuming that the application is using some other database for information storage. This database might fit one of the following categories:

A desktop-based database such as dBASE or Access
A server-based database such as Informix or Sybase
A mainframe-based, nonrelational database such as IMS
A proprietary file management system

For greater efficiency, you should have plenty of free disk space on your target platform and a network connection between the legacy platform and the target platform.

Migrating an application to an Oracle database consists of three main tasks:

Migrating the database structures
Transferring the data into the Oracle structures
Reengineering the application logic

The following sections look at each subtask in greater detail.

Migrating Database Structures

Migration is the first critical task that you must accomplish; reengineering is the second. Migration consists of creating an Oracle table that is equivalent in structure to each legacy table or file. I refer to these Oracle tables as the migrated tables. The role of the migrated tables is that of an intermediary--;they store the legacy data before it is cleaned up and restructured.

Reengineering consists of the following tasks:

Developing a thorough understanding of the legacy data model. Understanding is the starting point for migrating a database application. Without a complete grasp of the legacy data model, you will have difficulty identifying any opportunities for restructuring the data model.
Restructuring the legacy data model so that it is in third normal form (3NF). Many legacy fields are composed of several pieces of information and many legacy files contain a repeating field. Please refer to Chapter 8, "Creating and Modifying Tables," for more information on database design.
Changing the datatypes used by the migrated tables. For instance, a legacy file or table might use a character field for storing date or time information. The reengineered table would use a DATE column for storing the same information.

Figure 28.1 provides a graphical representation of the migration and reengineering tasks.

Figure 28.1 The migration and reengineering tasks.

The legacy database structures that must be migrated depend on the type of database:

For a file management system, you need to migrate records and fields.
For a desktop database, you need to migrate records and fields.
For a relational database, you need to migrate tables, columns, primary/foreign keys, views, and indexes.

I highly recommend the use of a data modeling tool for the migration of a legacy database. You'll want a tool that offers broad support for desktop and server databases, particularly if you're working with a variety of database products. (Your improved productivity will quickly cover the cost of a data modeling tool.) You can think of a data modeling tool as "light" CASE. In other words, a data modeling tool provides valuable support during the analysis phase of a project without forcing you to make a commitment to an entire CASE life cycle.

In the next section, I use ERwin, produced by Logic Works, to support the task of database migration. Although many similar products are available, I've found ERwin to be particularly useful.

Migrating Database Structures Using ERwin ER Data Modeler

ERwin is a Windows-based tool and supports the IDEF1X data modeling standard. In IDEF1X, the logical view of a data model consists of entities, attributes, primary and foreign keys, relationships, and role names. The physical view consists of the tables, columns, and datatypes. ERwin enables you to define domains that can then be shared by several columns. You can switch between the logical and physical views of the data model. Based on both the logical and physical models, ERwin can generate a schema for the target database. The generated schema can be stored to a file or processed directly by the target database.

ERwin provides support for both reverse and forward engineering. Reverse engineering is the process of connecting to a database, capturing data dictionary information, and constructing a graphical representation of the data model. Forward engineering is the process of developing a data model and generating a schema for the target database.

The first step is to identify the target server by selecting the appropriate radio button. (See Figure 28.2.) To illustrate this process, I've selected one of the sample Watcom databases that comes with PowerBuilder.

Figure 28.2 Selecting the target server.

Once you've identified the target server, ERwin must connect to that database. Select Watcom Connection from the Server menu. ERwin displays a dialog box in which you specify the username, password, and database. (See Figure 28.3.)

Figure 28.3 Connecting to the target server.

Once ERwin is connected to the target server, synchronization may begin. ERwin uses the term synchronization to refer to the process of aligning the ERwin data model with that of the target server. The default option is to synchronize ERwin with the target server. Select Synchronize ERwin with Watcom from the Server menu. A list of unsynchronized Watcom tables appears on the right side of the window. (See Figure 28.4.)

Figure 28.4 Synchronizing ERwin with the WATCOM tables.

Highlight both tables--;sales_rep and sales_summary--;and click Import. The tables appear in the list box on the bottom of the window. (See Figure 28.5.) Click Execute to begin synchronization.

Figure 28.5 Selecting the tables to reverse engineer.

ERwin automatically generates a graphical layout of the target server's tables and captures any relationships that exist in the target server. Figure 28.6 illustrates how the sales_rep and sales_summary tables are related by the sales rep ID.

Figure 28.6 Reverse engineering layout of data model.

Now that you've captured the data model from the Watcom database, you may begin the process of generating the schema for Personal Oracle. Select Target Server from the Server menu. This time, select Oracle as the target server. (See Figure 28.7.)

Figure 28.7 Selecting Oracle as the target server.

To connect to Personal Oracle, select Oracle Connection from the Server menu. ERwin displays a dialog box in which you specify the username, password, and database. (See Figure 28.8.) For database, remember to specify oracle7.

Figure 28.8 Connecting to Personal Oracle.

To generate the schema, select Oracle Schema Generation from the Server menu. ERwin displays a window containing many options for schema generation. Using the default settings, click Generate to begin creating the sales_rep and sales_summary tables for Personal Oracle. (See Figure 28.9.)

Figure 28.9 Selecting the schema generation options.

ERwin displays each SQL statement submitted to Personal Oracle for execution. When it has finished generating the schema, ERwin displays the total number of statements that were processed successfully. (See Figure 28.10.)

Figure 28.10 Personal Oracle schema is successfully generated.

Moving the Data

Once you've migrated the database structures, the task of data conversion begins. This activity is typically a two-step process, and it is rarely straightforward. First, you transfer the contents of the legacy structures to the migrated Oracle tables. Second, you copy the contents of the migrated Oracle tables to the target Oracle tables. Along the way, however, you'll discover erroneous data, illegal values, foreign keys whose values don't exist as primary key values--;in short, it's a messy job, but somebody has to do it!

Many methods are available for moving the data. This chapter presents two different tools for accomplishing this task:

SQL*Loader
PowerBuilder's Data Pipeline

Using SQL*Loader

As you read in Chapter 24, "Saving and Loading Data," SQL*Loader is one of the Database Administration Tools that loads the contents of flat or ASCII files into an Oracle database. To use SQL*Loader for data migration, here is a procedure you can follow:

Unload the data from the legacy database into one or more flat files using whatever legacy tools are at your disposal. Remember that the flat files might require a considerable amount of disk storage.
Transfer the flat files to location that is accessible to SQL*Loader. One option is to store the files on a network file server.
Use SQL*Loader to load the contents of each file into the Personal Oracle database.

Using the Data Pipeline in PowerBuilder 4.0/5.0

The data pipeline tool, which was introduced in version 4.0 of PowerBuilder, is flexible and intuitive. Because the data pipeline is a PowerBuilder system object, it has its own attributes, events, and functions; in other words, you can control the creation and execution of a data pipeline from a PowerBuilder application.

To create a data pipeline, click the Data Pipeline icon; it looks like two aqua cylinders connected by a pipe. A window appears from which you can select an existing data pipeline. (See Figure 28.11.) Instead, click New.

Figure 28.11 PowerBuilder: Creating a new data pipeline.

Using its existing database profiles, PowerBuilder displays two lists: a list of source databases and a list of destination databases. Because you're interested in transferring data from the Watcom database to Personal Oracle, select ABNC Sales DB (v4) as the Source Connection and PO7 as the Destination Connection; then click OK. (See Figure 28.12.)

Figure 28.12 Choosing the source and destination database profiles.

Another window appears with a list of the tables that exist in the source connection. Highlight the sales_rep table and click Open. (See Figure 28.13.)

Figure 28.13 Selecting a table for the data pipeline.

PowerBuilder displays a box containing the columns belonging to the sales_rep table. (See Figure 28.14.) By default, no columns are selected for the data pipeline. Click each column to select it; the column name appears at the top of the window in the Selection List. (See Figure 28.15.)

Figure 28.14 The table structure is displayed.
Figure 28.15 Choosing the columns to be included.

Select File [vb] Design. The next window contains the data pipeline description along with execution parameters. (See Figure 28.16.) For example, the Commit field contains the value 100, signifying that a SQL Commit will be performed for every 100 rows that are inserted. To execute the data pipeline, select Options [vb] Execute. When the data pipeline execution finishes, PowerBuilder displays the execution statistics at the bottom of the window. (See Figure 28.17.)

Figure 28.16 The data pipeline options are displayed.
Figure 28.17 The pipeline transfer is finished.

Transforming the Application Logic

Be warned: You cannot buy or build a silver bullet that will automatically and flawlessly translate the legacy application into a client-server application. True, some specialized tools address a specific category of legacy environments, but they can be very expensive; in addition, these tools might impose a methodology that doesn't match the needs of your development environment.

If you merely translate the legacy application logic to the target application development environment, you will have missed an important opportunity to reengineer the application. To provide a simple example, consider a legacy information system that can produce 90 different management reports. Instead of blindly converting each of these reports to the new development environment, take some time to investigate how many of these reports are actually read. The number could be much lower than you expect. In addition, you might need to reengineer some of the reports that are used to make them more useful for the system's users.

Obviously, the first step in migrating the legacy application logic is to become thoroughly familiar with it. Identify the business rules of the legacy application and, where appropriate, use PL/SQL to implement these rules in the form of database triggers and stored programs. Please refer to Chapter 23, "Enforcing Business Rules with Database Triggers," for more information.

Migrating from Another Oracle Database to Personal Oracle

Migrating from another Oracle database--;whether it's from the Oracle Enterprise Server or Oracle Workgroup Server--;is a fairly straightforward process. Use the Export tool to extract the database objects that you want to migrate and use the Import tool to load those database objects into the Personal Oracle database. Be aware that because Personal Oracle doesn't support distributed database operations (except for the Enterprise Edition), you won't be able to use objects such as database links or snapshots in Personal Oracle.

Application Development Environments:
Making an Appropriate Choice

I'm not suggesting that only the tools described in this book are worthy of consideration; other excellent products should also be evaluated. However, the viability of a development tool--;the quality of each new release, its compatibility with new releases of operating systems and databases, and the existence of add-on products from third-party vendors--;depends greatly on its acceptance by the market.

You should consider the following factors in choosing an application development environment:

Support for other server and desktop databases: If you need to support other databases in addition to Oracle, you should probably consider using PowerBuilder, Oracle Developer/2000, Oracle Power Objects, or Visual Basic. This list is by no means exhaustive. Oracle Developer/2000 can access non-Oracle data sources via ODBC. An optional component, Oracle Client Adapter, also works with Developer/2000 to access non-Oracle database servers. Oracle Power Objects supports access to DB2, SQL Server, Sybase, and ODBC data sources.
PL/SQL support: Not surprisingly, the components of Oracle Developer/2000 offer the best support for PL/SQL. Oracle Forms and Reports both have built-in PL/SQL engines, which means that you can use the same scripting language for both the client and server. (However, the current release of Developer/2000 components doesn't use the same version of PL/SQL that is used in the Oracle7 Server--;PL/SQL version 2.1.) The runner-up for this category is PowerBuilder. Although PowerBuilder doesn't have a built-in PL/SQL engine, it can invoke an Oracle stored procedure.
Object-oriented programming: Of the products mentioned in this book, PowerBuilder offers the best object-oriented environment; it provides good support for creating classes, user-defined events, and inheritance. Also, the latest version of PowerBuilder supports the deployment of distributed application objects. Oracle Power Objects provides object-oriented features somewhat similar to those of PowerBuilder. However, if you compare the scripting languages of PowerBuilder and Developer/2000--;PowerScript and PL/SQL, respectively--;PL/SQL is quite limited in its capability to handle more than primitive datatypes. Nevertheless, Developer/2000 has many other features that you should consider when you evaluate application development environments.
Support for multiple platforms: Oracle's Developer/2000 is the only development environment that currently offers production versions for the Macintosh and Motif environments. In July 1995, Oracle released a Macintosh version of Oracle Power Objects. Powersoft is also planning to release a Macintosh and Motif version of PowerBuilder in the near future.
Availability of third-party add-on products: The clear winner in this category is Visual Basic, which has a tremendous number of reasonably priced add-on products. You can find trial versions of many products on the Internet. PowerBuilder also has a substantial number of add-on products, mostly in the form of class libraries. PowerBuilder supports the use of OCXs (OLE controls).

Summary

The focus of this chapter is database migration.

Database migration consists of three principal subtasks: migrating the database structures, moving the data, and transforming the application logic.
The use of a data modeling tool, such as ERwin from Logic Works, supports the task of migrating the legacy data model to an Oracle database.
You can use several methods to move the legacy data into an Oracle database. SQL*Loader can load legacy data from flat files into Oracle tables. PowerBuilder provides a data pipeline tool that copies data from a source database to a destination database.