Kuali Cloud Replication
Overview
Tufts is moving to Kuali Coeus Cloud but wants to retain the on-prem Award Budget Tool (ABT), whose data was previously co-located with on-prem Kuali Coeus. The data therefore needs to be replicated from Kuali Coeus Cloud into the on-prem database used by the ABT.
This replication is performed using AWS Database Migration Service (DMS).
Data is replicated with a delay of only a few seconds. If replication fails, notification emails are sent out.
Design
Components
kuali-cloud-replication (DMS Terraform)
Error Recovery
In the Terraform code, the parameters RecoverableErrorInterval and RecoverableErrorCount determine, respectively, how long the task waits before retrying and how many times it retries before failing when it encounters a recoverable error. If a fatal error is encountered, or the RecoverableErrorCount is reached, the task fails and must be manually restarted via the console. Below are examples of recoverable and fatal errors.
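As a rough illustration of the two retry parameters described above, the following Python sketch builds the kind of ErrorBehavior fragment that DMS task settings use and estimates how long a task would keep retrying. The numeric values are hypothetical placeholders, not the values in the Terraform code, and the estimate assumes a fixed wait between attempts; DMS can lengthen the wait when retry throttling is enabled, so treat the result as a lower bound.

```python
import json

# Hypothetical values for illustration only; the real ones live in the Terraform code.
RECOVERABLE_ERROR_INTERVAL = 5   # seconds to wait between retry attempts
RECOVERABLE_ERROR_COUNT = 10     # retry attempts before the task fails


def error_behavior_settings(interval_s: int, count: int) -> str:
    """Return a task-settings JSON fragment holding the two retry parameters."""
    settings = {
        "ErrorBehavior": {
            "RecoverableErrorInterval": interval_s,
            "RecoverableErrorCount": count,
        }
    }
    return json.dumps(settings, indent=2)


def min_retry_window_s(interval_s: int, count: int) -> int:
    """Lower bound on seconds spent retrying, assuming a fixed wait per attempt."""
    if interval_s < 0 or count < 0:
        # This sketch only covers a finite, non-negative retry budget.
        raise ValueError("interval_s and count must be non-negative")
    return interval_s * count


print(error_behavior_settings(RECOVERABLE_ERROR_INTERVAL, RECOVERABLE_ERROR_COUNT))
print(min_retry_window_s(RECOVERABLE_ERROR_INTERVAL, RECOVERABLE_ERROR_COUNT))
```

The retry window matters because an outage on either endpoint that lasts longer than it will leave the task in a failed state that needs a manual restart.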
Recoverable Errors
Oracle error code is '12541' ORA-12541: TNS:no listener
Oracle error code is '1033' ORA-01033: ORACLE initialization or shutdown in progress
Fatal Errors
2024-12-03T22:54:33 [SOURCE_CAPTURE ]E: Error 1236 (Could not find first log file name in binary log index file) reading binlog [1020493] (mysql_endpoint_capture.c:1263)
This usually occurs when the task has fallen too far behind in reading the CDC logs (the retry interval was too long).
Oracle error code is '1017' ORA-01017: invalid username/password; logon denied
The user is locked out or was not re-created after a refresh.
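When scanning task logs, the examples above can be turned into a small lookup. The sketch below is a hypothetical helper, not part of the Terraform code or of DMS itself. It only knows the error codes listed on this page and reports anything else as unknown, and the two regular expressions are assumptions based on the sample log lines shown here.

```python
import re

# Classification taken from the examples on this page; not an exhaustive list.
RECOVERABLE_ORACLE = {"12541", "1033"}
FATAL_ORACLE = {"1017"}
FATAL_MYSQL = {"1236"}

# Patterns modelled on the sample log lines above (assumed format).
ORACLE_RE = re.compile(r"Oracle error code is '(\d+)'")
MYSQL_RE = re.compile(r"\bError (\d+) \(")


def classify(line: str) -> str:
    """Return 'recoverable', 'fatal', or 'unknown' for a single log line."""
    match = ORACLE_RE.search(line)
    if match:
        code = match.group(1)
        if code in RECOVERABLE_ORACLE:
            return "recoverable"
        if code in FATAL_ORACLE:
            return "fatal"
        return "unknown"
    match = MYSQL_RE.search(line)
    if match and match.group(1) in FATAL_MYSQL:
        return "fatal"
    return "unknown"


print(classify("Oracle error code is '12541' ORA-12541: TNS:no listener"))
```

A result of unknown should be read as "check the log by hand", not as evidence that the error is harmless.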
When restarting a failed task, it is always best to restart rather than resume. A restart ensures the task performs a full load of all tables and then begins replicating from the latest CDC logs. This is the least error-prone approach.
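For anyone who prefers scripting the restart over using the console, a minimal sketch with the boto3 DMS client might look like the following. It assumes that the start type "reload-target" corresponds to the console's restart option (a fresh full load followed by CDC) and that "resume-processing" corresponds to resume; the function name and the task ARN are placeholders, not names taken from the Terraform code.

```python
def restart_task(dms_client, task_arn: str) -> dict:
    """Restart (reload) a DMS replication task rather than resuming it.

    dms_client is expected to behave like boto3.client("dms").
    """
    return dms_client.start_replication_task(
        ReplicationTaskArn=task_arn,
        # Assumption: "reload-target" matches the console's restart option.
        StartReplicationTaskType="reload-target",
    )


# Example usage (requires boto3 and AWS credentials; the ARN is a placeholder):
#   import boto3
#   restart_task(boto3.client("dms"), "<replication-task-arn>")
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at the real task.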