Kuali Cloud Replication

Overview

Tufts is moving to Kuali Coeus Cloud but wants to retain the on-prem Award Budget Tool (ABT), whose data was previously co-located with on-prem Kuali Coeus. The data from Kuali Coeus Cloud therefore needs to be replicated into the on-prem database used by the ABT.

The replication is performed using AWS Database Migration Service (DMS).

Data is replicated with a delay of only a few seconds. If replication fails, email notifications are sent.

Design

Components

kuali-cloud-replication (DMS Terraform)

Error Recovery

In the Terraform code, two parameters control how the task handles recoverable errors: RecoverableErrorInterval sets how long the task waits before retrying, and RecoverableErrorCount sets how many times it retries before failing. If the task encounters a fatal error, or reaches RecoverableErrorCount, it fails and must be restarted manually via the console. Examples of recoverable and fatal errors are listed below.
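As a rough illustration of how these two parameters fit together, the Python sketch below builds the ErrorBehavior fragment of a DMS task settings JSON document, which is the kind of string a Terraform DMS task resource would receive. The function names and the values 30 and 10 are hypothetical and are not taken from the kuali-cloud-replication code. The retry window calculation assumes a fixed interval between retries; if DMS retry throttling is enabled, the real window will be longer.

```python
import json


def error_behavior_settings(interval_seconds: int, retry_count: int) -> str:
    """Build the ErrorBehavior fragment of a DMS task settings document.

    Hypothetical helper for illustration only. A retry_count of -1 is
    documented by AWS as "retry indefinitely".
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if retry_count < -1:
        raise ValueError("retry_count must be -1 (unlimited) or >= 0")
    settings = {
        "ErrorBehavior": {
            # Seconds to wait between retry attempts.
            "RecoverableErrorInterval": interval_seconds,
            # Number of retries before the task fails.
            "RecoverableErrorCount": retry_count,
        }
    }
    return json.dumps(settings, indent=2)


def fixed_retry_window_seconds(interval_seconds: int, retry_count: int) -> int:
    """Time spent retrying before failure, assuming a fixed interval.

    Assumption: retry throttling is off, so every wait equals the interval.
    """
    if retry_count < 0:
        raise ValueError("window is unbounded when retries are unlimited")
    return interval_seconds * retry_count


if __name__ == "__main__":
    # Example values only; the real ones live in the Terraform code.
    print(error_behavior_settings(30, 10))
    print(fixed_retry_window_seconds(30, 10), "seconds of retrying")
```

Comparing the retry window against how long the source keeps its CDC logs may help avoid the situation where the task falls too far behind to continue.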

Recoverable Errors

  • Oracle error code is '12541' ORA-12541: TNS:no listener

  • Oracle error code is '1033' ORA-01033: ORACLE initialization or shutdown in progress

Fatal Errors

  • 2024-12-03T22:54:33 [SOURCE_CAPTURE ]E: Error 1236 (Could not find first log file name in binary log index file) reading binlog [1020493] (mysql_endpoint_capture.c:1263)
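    This usually occurs when the task has fallen too far behind in reading the CDC logs (for example, because the retry interval was too long), so the binlog file it needs is no longer available on the source.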

  • Oracle error code is '1017' ORA-01017: invalid username/password; logon denied
    The user is locked out, or was not re-created after a refresh.
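When triaging a failed task, the examples above can be turned into a small lookup. The following Python sketch classifies a log line using only the error codes listed on this page. It is a hypothetical triage helper, not a reproduction of how DMS itself decides whether an error is recoverable, and any code not listed here is reported as unknown.

```python
import re

# Error codes taken from the examples on this page only.
RECOVERABLE_ORACLE_CODES = {12541, 1033}
FATAL_ORACLE_CODES = {1017}
FATAL_MYSQL_CODES = {1236}

ORACLE_PATTERN = re.compile(r"ORA-(\d{5})")
MYSQL_PATTERN = re.compile(r"\bError (\d+) \(")


def classify_log_line(line: str) -> str:
    """Return 'recoverable', 'fatal', or 'unknown' for a DMS log line."""
    oracle_match = ORACLE_PATTERN.search(line)
    if oracle_match:
        code = int(oracle_match.group(1))  # "01033" becomes 1033
        if code in RECOVERABLE_ORACLE_CODES:
            return "recoverable"
        if code in FATAL_ORACLE_CODES:
            return "fatal"
    mysql_match = MYSQL_PATTERN.search(line)
    if mysql_match and int(mysql_match.group(1)) in FATAL_MYSQL_CODES:
        return "fatal"
    return "unknown"


if __name__ == "__main__":
    print(classify_log_line("Oracle error code is '12541' ORA-12541: TNS:no listener"))
```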

When restarting a failed task, always choose restart rather than resume. A restart performs a full load of all tables and then starts replicating from the latest CDC logs, which makes it the least error-prone approach.

image-20241204-154227.png
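The restart described above is done through the console, but the same choice exists in the DMS API. The Python sketch below builds the request for a restart versus a resume, assuming that the console's restart option corresponds to the `reload-target` start type and resume to `resume-processing`. The helper names and the task ARN are hypothetical, and the client is passed in so the example stays independent of any installed SDK; in practice it would be a boto3 DMS client.

```python
from typing import Any, Dict

# StartReplicationTaskType values accepted by the DMS API.
RESTART = "reload-target"      # full load of all tables, then CDC
RESUME = "resume-processing"   # continue from the last recorded position


def start_task_request(task_arn: str, full_reload: bool = True) -> Dict[str, str]:
    """Build the arguments for a DMS StartReplicationTask call.

    Defaults to a full reload, matching the recommendation on this page.
    """
    if not task_arn:
        raise ValueError("task_arn is required")
    return {
        "ReplicationTaskArn": task_arn,
        "StartReplicationTaskType": RESTART if full_reload else RESUME,
    }


def restart_task(dms_client: Any, task_arn: str) -> Any:
    """Restart a failed task with a full reload.

    dms_client is assumed to expose start_replication_task, as a boto3
    DMS client does: boto3.client("dms").
    """
    return dms_client.start_replication_task(**start_task_request(task_arn))
```

With boto3 installed and credentials configured, usage would look like `restart_task(boto3.client("dms"), task_arn)`, where `task_arn` is the ARN of the failed replication task.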