Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

This replication is being performed using Amazon Data Database Migration Service (DMS).

Data is replicated with mere seconds delay. In the event of failure, emails are sent out.

...

kuali-cloud-replication (DMS Terraform)

Error Recovery

In the Terraform code, the parameters RecoverableErrorInterval and RecoverableErrorCount determine how long the task will wait before retrying and how many times it will retry before failing when encountering a recoverable error. If a fatal error is encountered, or the RecoverableErrorCount is reached, the task will fail and need to be manually restarted via the console. Below are examples of recoverable and fatal errors.

Recoverable Errors

  • Oracle error code is '12541' ORA-12541: TNS:no listener

  • Oracle error code is '1033' ORA-01033: ORACLE initialization or shutdown in progress

Fatal Errors

  • 2024-12-03T22:54:33 [SOURCE_CAPTURE ]E: Error 1236 (Could not find first log file name in binary log index file) reading binlog [1020493] (mysql_endpoint_capture.c:1263)
    This usually occurs when the task is too far behind in reading the CDC logs (retry interval was too long)

When restarting a failed task, it’s always best to restart instead of resume. This will ensure the task performs a full load of all tables, then starts replicating for the latest CDC logs. This is the least error prone approach.

...