...
This replication is being performed using Amazon Data Database Migration Service (DMS).
Data is replicated with mere seconds delay. In the event of failure, emails are sent out.
...
kuali-cloud-replication (DMS Terraform)
Error Recovery
In the Terraform code, the parameters RecoverableErrorInterval
and RecoverableErrorCount
determine how long the task will wait before retrying and how many times it will retry before failing when encountering a recoverable error. If a fatal error is encountered, or the RecoverableErrorCount
is reached, the task will fail and need to be manually restarted via the console. Below are examples of recoverable and fatal errors.
Recoverable Errors
Oracle error code is '12541' ORA-12541: TNS:no listener
Oracle error code is '1033' ORA-01033: ORACLE initialization or shutdown in progress
Fatal Errors
2024-12-03T22:54:33 [SOURCE_CAPTURE ]E: Error 1236 (Could not find first log file name in binary log index file) reading binlog [1020493] (mysql_endpoint_capture.c:1263)
This usually occurs when the task is too far behind in reading the CDC logs (retry interval was too long)
When restarting a failed task, it’s always best to restart instead of resume. This will ensure the task performs a full load of all tables, then starts replicating for the latest CDC logs. This is the least error prone approach.
...