This document explains replication give-up, a condition that can occur when using replication, and its effects.
Altibase replication is log-based. In other words, it keeps the data on both servers consistent by transferring the logs created on the active side to the standby side.
Altibase does not reuse a fixed number of log files in a circular fashion. Instead, it keeps creating new log files for as long as log records are written. Therefore, unnecessary log files must be deleted periodically to keep the disk from filling up.
Log files are deleted when the checkpoint is executed, and log files under the following conditions cannot be deleted.
(1) Log file referenced by the transaction currently in progress
(2) Log file that needs to be referenced in replication because replication transmission is not possible
(3) Log file referenced by a CLR (Compensation Log Record, the type of log record created when a transaction is rolled back)
Therefore, if replication data transmission is slow for some reason, log files cannot be deleted and the disk may fill up.
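The three retention conditions above can be sketched as a small model. This is only an illustration, not Altibase's implementation: the CheckpointState class, its field names, and the assumption that every log file from the oldest referenced one onward must be kept are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckpointState:
    """Hypothetical snapshot of who still needs old log files (values are log file numbers)."""
    newest_logfile: int
    oldest_active_txn_logfile: Optional[int]    # condition (1); None if no transaction is in progress
    oldest_replication_logfile: Optional[int]   # condition (2); None if replication has sent everything
    oldest_clr_logfile: Optional[int]           # condition (3); None if no CLR references an old file


def deletable_logfiles(existing: List[int], state: CheckpointState) -> List[int]:
    """Return the log files a checkpoint could delete in this simplified model."""
    holders = [
        n for n in (
            state.oldest_active_txn_logfile,
            state.oldest_replication_logfile,
            state.oldest_clr_logfile,
        )
        if n is not None
    ]
    # Assumption: everything from the oldest referenced file onward is kept.
    keep_from = min(holders) if holders else state.newest_logfile
    return [n for n in existing if n < keep_from]


if __name__ == "__main__":
    files = list(range(100, 110))
    # Replication is lagging at file 103, so only files 100..102 can go.
    lagging = CheckpointState(109, 107, 103, None)
    print(deletable_logfiles(files, lagging))
```

In this model a slow replication sender is the holder with the smallest log file number, which is why it alone can stop old log files from being deleted.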
To guard against log files piling up because of a replication problem, the user can specify the maximum number of log files that may be retained for replication.
If the number of undeleted log files exceeds this limit, the untransmitted replication data is abandoned and the log files are deleted to keep the disk from filling up.
Deleting log files that replication still needs, even though their data has not been transmitted, is called replication give-up.
When replication give-up occurs, the data on the active and standby servers may become inconsistent, which can lead to serious problems.
To prevent replication give-up, the user should keep the network between the servers fast and stable, because network speed directly affects replication performance.
In addition, to avoid give-up even in abnormal conditions such as a power failure or network disconnection, consider the worst case and set the REPLICATION_MAX_LOGFILE property high enough that give-up does not occur.
REPLICATION_MAX_LOGFILE = 400
This property is expressed as a number of log files. With the setting of 400 shown above, replication give-up occurs once 400 or more log files remain undeleted.
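The threshold rule, and one rough way to size the property for a worst-case outage, might be sketched as follows. The function names, the safety factor, and the idea of estimating from a log generation rate and a log file size are illustrative assumptions, not an Altibase API or an official sizing formula. The sketch also assumes a positive limit.

```python
import math


def should_give_up(undeleted_logfiles: int, replication_max_logfile: int) -> bool:
    """Give-up rule as described above: triggered at or beyond the limit.

    Assumes replication_max_logfile is a positive number of log files.
    """
    return undeleted_logfiles >= replication_max_logfile


def estimate_max_logfiles(log_bytes_per_sec: float,
                          outage_seconds: float,
                          log_file_size_bytes: int,
                          safety_factor: float = 1.5) -> int:
    """Hypothetical sizing helper: log files produced during a worst-case outage.

    All inputs are values the operator must measure or choose; the default
    safety factor of 1.5 is an arbitrary illustration.
    """
    needed = log_bytes_per_sec * outage_seconds * safety_factor / log_file_size_bytes
    return math.ceil(needed)


if __name__ == "__main__":
    print(should_give_up(399, 400))   # below the limit
    print(should_give_up(400, 400))   # at the limit
    # Example inputs: 1 MiB/s of log, a one-hour outage, 10 MiB log files.
    mib = 1024 * 1024
    print(estimate_max_logfiles(1 * mib, 3600, 10 * mib))
```

Whatever value such an estimate suggests, the disk holding the log files must have room for that many files, otherwise the limit only trades give-up for a full disk.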
The user can also choose how replication resumes, after a brief pause, once the value set by the REPLICATION_MAX_LOGFILE property has been exceeded. This is controlled by the following property.
REPLICATION_SENDER_START_AFTER_GIVING_UP = 1 (default)
If set to 0, the replication "Restart SN" (that is, the value of the XSN column of the SYS_REPLICATIONS_ meta table) is initialized to -1 and replication is stopped. The value of the IS_STARTED column of the SYS_REPLICATIONS_ meta table is then changed to 0.
If set to 1, the "Restart SN" is changed to the last (largest) SN of the current log file, and replication resumes from this "Restart SN".
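The two settings can be summarized in a short sketch. The ReplicationMeta class and the apply_give_up function are hypothetical names used only to restate the behavior described above. They are not part of Altibase.

```python
from dataclasses import dataclass


@dataclass
class ReplicationMeta:
    """Hypothetical view of two SYS_REPLICATIONS_ columns."""
    xsn: int          # the "Restart SN"
    is_started: int   # 1 = started, 0 = stopped


def apply_give_up(start_after_giving_up: int,
                  last_sn_of_current_logfile: int) -> ReplicationMeta:
    """Model the meta table state after give-up for each property value."""
    if start_after_giving_up == 0:
        # Restart SN is reset and replication is stopped.
        return ReplicationMeta(xsn=-1, is_started=0)
    if start_after_giving_up == 1:
        # Restart SN jumps to the newest SN; replication keeps running from there.
        return ReplicationMeta(xsn=last_sn_of_current_logfile, is_started=1)
    raise ValueError("REPLICATION_SENDER_START_AFTER_GIVING_UP must be 0 or 1")


if __name__ == "__main__":
    print(apply_give_up(0, last_sn_of_current_logfile=987654))
    print(apply_give_up(1, last_sn_of_current_logfile=987654))
```

In both cases the changes between the old and the new "Restart SN" are never sent, so the standby can miss data either way. The setting only decides whether the sender keeps running afterwards.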
Whether give-up is needed is checked when a checkpoint is performed, because log files are deleted at checkpoint time.
When give-up occurs, the give-up time is recorded in the replication meta table.
iSQL> set vertical on;
iSQL> select replication_name, is_started, give_up_time from SYSTEM_.SYS_REPLICATIONS_;
REPLICATION_NAME : REP1
IS_STARTED       : 1
GIVE_UP_TIME     :
1 row selected.
REPLICATION_NAME : Replication name
IS_STARTED : Whether the replication has started (start 1, stop 0)
GIVE_UP_TIME : The date and time at which the replication most recently gave up.
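For monitoring, the vertical output of the query above could be checked by a small script. The sketch below only parses text already captured from iSQL. It assumes a single row in vertical format and assumes that an empty GIVE_UP_TIME means no give-up has been recorded. The function names are hypothetical.

```python
import re
from typing import Dict

# Matches lines such as "IS_STARTED       : 1" or "GIVE_UP_TIME     :".
_FIELD = re.compile(r"^([A-Z_][A-Z0-9_]*)\s*:\s*(.*)$")


def parse_vertical_row(text: str) -> Dict[str, str]:
    """Parse one row of iSQL vertical output into a column-to-value mapping."""
    row: Dict[str, str] = {}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if match:
            row[match.group(1)] = match.group(2).strip()
    return row


def has_given_up(row: Dict[str, str]) -> bool:
    """Assumption: a non-empty GIVE_UP_TIME means give-up occurred at least once."""
    return row.get("GIVE_UP_TIME", "") != ""


if __name__ == "__main__":
    captured = """REPLICATION_NAME : REP1
IS_STARTED       : 1
GIVE_UP_TIME     :
1 row selected."""
    row = parse_vertical_row(captured)
    print(row["REPLICATION_NAME"], has_given_up(row))
```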