If you are reading this blog, that means you have reached the final stage of moving data from Oracle to PostgreSQL after schema analysis and object conversion as described in “Oracle to PostgreSQL Migration: Phases, Differences & Challenges.”
What is Data Migration?
In this post, we will discuss data migration and the available strategies. Data migration is the process of selecting, preparing, extracting, transforming, and applying data from one database to another. Since this migration is between two different RDBMSs (Oracle and PostgreSQL), it can be a challenging and time-consuming process due to their heterogeneous structures and data types if it is not tackled with the right tools.
What Are Data Migration Strategies?
Most data migration strategies fall into three categories:
- Snapshot (one big bang)
- Snapshot - Parallel in chunks (trickle)
- Change Data Capture (cdc/data sync)
A good data migration strategy ensures that data movement completes with minimal risk, stays on budget (commercial tools/resources), keeps downtime to a minimum, and includes a rollback plan in case of failure.
What To Do Before Data Migration
After verifying the source and target database compatibility, the following prerequisites should be met for data migration.
- Server Resources(Memory/Disk Space/network ports opened between source and destination)
- Operating system
- Install/configure data migration software and related drivers
Let's discuss data migration strategies in detail.
Data Migration Strategy: Snapshot (One Big Bang)
In this approach, a snapshot of the source database state is taken and applied to the target database; data is moved from Oracle to PostgreSQL in one pass. During the snapshot process, no WRITE operations are allowed on the source database. It is one of the cleanest and easiest methods of data migration.
Pros:
- Data movement is completed at once
- No data type challenges (LOBs)
- After the snapshot, the application starts accessing the target database
- No special configuration required; easy to manage
- READ ONLY users can access the source database (in some cases)
Cons:
- The application will be down during the snapshot
- If the snapshot is interrupted, re-initiation is required
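To make the one-pass idea concrete, here is a minimal, hypothetical Python sketch of the transform step in a snapshot: rows extracted from a source table are rendered into PostgreSQL COPY text format (tab-separated, `\N` for NULL), which `COPY ... FROM STDIN` can load in a single pass. The sample rows and the `employees` table named in the comment are illustrative, not from any real migration.

```python
def to_copy_text(rows):
    """Render rows as PostgreSQL COPY text format:
    tab-separated columns, \\N for NULL, one row per line."""
    def fmt(value):
        if value is None:
            return r"\N"
        # Escape characters that are special in COPY text format.
        return (str(value)
                .replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n"))
    return "\n".join("\t".join(fmt(v) for v in row) for row in rows) + "\n"

# Rows as they might come from a source cursor (illustrative data).
rows = [(1, "Alice", None), (2, "Bob", "2024-01-01")]
copy_payload = to_copy_text(rows)
print(copy_payload)
# The payload could then be streamed to PostgreSQL, e.g. with psycopg2:
#   cursor.copy_expert("COPY employees FROM STDIN", io.StringIO(copy_payload))
```

Streaming in COPY format rather than issuing row-by-row INSERTs is what makes the big-bang load fast, but the application must stay off the source for the whole pass.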
Data Migration Strategy: Snapshot - Parallel in Chunks (Trickle)
It's another type of snapshot approach, in which data objects are broken into chunks and snapshots are taken in parallel. Most tools support snapshot processes invoked in tandem. There are two ways to perform a snapshot in chunks: 1) table by table, or 2) splitting a large table into small sets using a Primary Key or any other row-uniqueness criterion. This approach reduces the snapshot duration and the downtime window to a great extent. Good scripting skills are required to prepare the data migration tools for table-by-table or large-table migration.
Pros:
- One-time data move with less downtime
- Data moved in parallel: table by table, or a large table in small sets
Cons:
- Application downtime required (less than with the big bang approach)
- For large-table small sets, a Primary Key or row uniqueness is mandatory
- Scripting is required to adjust the parallel approach
- If a snapshot is interrupted, re-initiation is required
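The chunking step described above can be sketched in a few lines: split a table's primary-key range into fixed-size chunks, then hand each chunk to a parallel worker. The bounds and the `orders` table in the comment are hypothetical; real tooling would derive them from `MIN(id)`/`MAX(id)` on the source table.

```python
from concurrent.futures import ThreadPoolExecutor

def key_ranges(min_id, max_id, chunk_size):
    """Split an inclusive primary-key range into (lo, hi) chunks."""
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

def copy_chunk(bounds):
    lo, hi = bounds
    # In a real migration this worker would run something like
    #   SELECT * FROM orders WHERE id BETWEEN :lo AND :hi
    # on Oracle and stream the result into PostgreSQL.
    return f"copied ids {lo}..{hi}"

ranges = key_ranges(1, 1000, 250)
with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(copy_chunk, ranges):
        print(result)
```

This is why a Primary Key (or some row-uniqueness criterion) is mandatory for the small-sets variant: without it, there is no safe way to partition the table into non-overlapping chunks.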
Data Migration Strategy: Change Data Capture (CDC - Data Sync)
Various traditional Change Data Capture (CDC) approaches have been available for decades. In the CDC model of data migration, software is designed to determine, track, and capture data that has changed on the source database in real time and replay it on the target database. CDC tools are in high demand today because they distribute data between heterogeneous databases with low latency, reliability, and scalability. The most common CDC approaches for migrating Oracle databases to PostgreSQL are:
- Trigger-based
- Transaction log-based
Trigger-Based (CDC) - a trigger is created on the source to capture DMLs (insert/update/delete) before or after they occur; the sequence of transaction events is recorded in a change log (shadow tables) and then processed by the software to be replayed on the target database.
Transaction Log-Based (CDC) - every database has transaction logs (redo logs) that store all database transactions/events sequentially and are used for crash recovery. Using native database transaction log plug-ins, the transactions (DMLs) can be captured into a change log with optional filters, transforms, and aggregations. The sequence of captured DMLs is then replicated to the target database.
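Whichever way the changes are captured, both variants end with the same step: replaying an ordered change log on the target. Below is a minimal sketch, with a hypothetical in-memory target and a resume checkpoint (analogous to an Oracle SCN or PostgreSQL LSN position) so an interrupted sync can continue where it left off. The event shapes are illustrative, not a real tool's format.

```python
# Captured change events, ordered by a log sequence number (illustrative).
change_log = [
    {"seq": 1, "op": "insert", "key": 10, "row": {"name": "Alice"}},
    {"seq": 2, "op": "update", "key": 10, "row": {"name": "Alicia"}},
    {"seq": 3, "op": "delete", "key": 10, "row": None},
    {"seq": 4, "op": "insert", "key": 11, "row": {"name": "Bob"}},
]

def replay(events, target, checkpoint=0):
    """Apply DML events with seq > checkpoint; return the new checkpoint."""
    for ev in events:
        if ev["seq"] <= checkpoint:
            continue                      # already applied before interruption
        if ev["op"] in ("insert", "update"):
            target[ev["key"]] = ev["row"]
        elif ev["op"] == "delete":
            target.pop(ev["key"], None)
        checkpoint = ev["seq"]
    return checkpoint

target = {}
ckpt = replay(change_log[:2], target)      # first batch, then "interruption"
ckpt = replay(change_log, target, ckpt)    # resume: events 1-2 are skipped
print(target)   # → {11: {'name': 'Bob'}}
```

Persisting the checkpoint after each applied event is what gives CDC its resumability advantage over the snapshot approaches, where an interruption forces a full re-initiation.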
Both CDC approaches can be used to replicate data from Oracle to PostgreSQL. However, each has its own merits and limitations; depending on the requirements, we can choose one of them. There are very good software tools available that support both CDC approaches.
Pros:
- Data is loaded continuously to the target database (after an initial snapshot)
- Users can access the source database while data loads into the target database
- Data sync control (if interrupted, it can be resumed)
- Partial replication (a set of tables can be replicated)
Cons:
- Replication software is needed
- In trigger-based CDC, there can be a slight performance overhead on the source
- No Large Object (LOB) support
- Partial to small application downtime (switchover time)
- Only commercial/free-to-use tools available; no open source
What Are the Free Tools Available for Data Migration?
There are open and free-to-use tools available for data migration in the three categories we have discussed in this blog. Below is the set of tools we came across in those categories:
- Snapshot
  - Others (Full Convert, Ispirer)
- Snapshot - Parallel in chunks
- Change Data Capture
  - EDB Replication Server
  - Other tools (Oracle GG, DBConvert)
I hope this blog helps you better understand the data migration strategies and tools available.
For a deeper dive, watch our on-demand webinar, Conquering Data Migration from Oracle to PostgreSQL!
Raghavendra Rao is a Managing Consultant in the Professional Services Division at EnterpriseDB (Headquarters). He has been working with PostgreSQL since 2009 and has over 20 years of experience with relational database systems. He architects, migrates, manages, consults, speaks, teaches, and blogs about PostgreSQL and provides enterprise solutions. Prior to EDB, he worked with companies to model data and to visualize and deliver critical information for informed business decisions using relational database systems. Raghavendra holds a Master's Degree in Computer Information Systems from Harvard University. He holds several certifications, including EDB PostgreSQL Certified Professional and Certified Training Instructor.