Failover
pgagroal can fail over a PostgreSQL instance if clients can't write to it.
Configuration
In pgagroal.conf define:
failover = on
failover_script = /path/to/myscript.sh
The script will be run as the same user as the pgagroal process so proper permissions (access and execution) must be in place.
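The permission requirement can be checked before relying on the script. The following is a minimal sketch, not part of pgagroal: `check_failover_script` is a hypothetical helper that only confirms the file exists and is readable and executable by whichever user runs it.

```shell
#!/bin/bash
# Sketch: pre-flight check for a failover script.
# check_failover_script is a hypothetical helper, not a pgagroal command.
check_failover_script() {
    local script=$1
    # The path must point at a regular file.
    if [ ! -f "$script" ]; then
        echo "missing: $script"
        return 1
    fi
    # The current user must be able to read and execute it.
    if [ ! -r "$script" ] || [ ! -x "$script" ]; then
        echo "not readable/executable: $script"
        return 1
    fi
    echo "ok: $script"
}
```

Because the test reflects the permissions of the calling user, it is only meaningful when run under the same account as the pgagroal process, for example by sourcing the file in a shell started as that user and calling `check_failover_script /path/to/myscript.sh`.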
Failover Script
The following information is passed to the script as parameters:
- Old primary host
- Old primary port
- New primary host
- New primary port
Example Script
A basic failover script could look like:
#!/bin/bash
OLD_PRIMARY_HOST=$1
OLD_PRIMARY_PORT=$2
NEW_PRIMARY_HOST=$3
NEW_PRIMARY_PORT=$4
# Promote the new primary
ssh -tt -o StrictHostKeyChecking=no postgres@${NEW_PRIMARY_HOST} pg_ctl promote -D /mnt/pgdata
if [ $? -ne 0 ]; then
    exit 1
fi
exit 0
Script Requirements
- The script is assumed successful if it has an exit code of 0
- Otherwise both servers will be recorded as failed
- The script should handle promotion of the new primary server
- Consider implementing proper error handling and logging
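One way to approach the error handling and logging point is to validate the arguments before doing anything else and to log every failure. The sketch below is an illustration only: `log_msg` and `validate_args` are hypothetical helpers, and the `pgagroal-failover` log prefix is an assumption.

```shell
#!/bin/bash
# Sketch: argument validation and logging for a failover script.
# log_msg and validate_args are hypothetical helpers; the log prefix
# "pgagroal-failover" is an assumption.

log_msg() {
    # Timestamped message on stderr; replace with `logger -t ...` for syslog.
    echo "$(date '+%Y-%m-%d %H:%M:%S') pgagroal-failover: $*" >&2
}

validate_args() {
    # Expect: old primary host, old primary port, new primary host, new primary port.
    if [ $# -lt 4 ]; then
        log_msg "expected at least 4 arguments, got $#"
        return 1
    fi
    local port
    for port in "$2" "$4"; do
        case $port in
            ''|*[!0-9]*)
                # Empty or containing a non-digit character.
                log_msg "invalid port: '$port'"
                return 1
                ;;
        esac
    done
    return 0
}

# Usage inside the failover script:
#   validate_args "$@" || exit 1
```

Exiting non-zero on invalid input keeps the script consistent with the rule that only exit code 0 counts as success.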
Advanced Failover Scenarios
Multiple Replica Configuration
When multiple replicas are available, the failover script can implement logic to:
- Check replica lag to select the best candidate
- Ensure proper promotion sequence
- Update DNS or load balancer configuration
- Notify monitoring systems
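Selecting a candidate by lag can be sketched as follows. This assumes the lag of each replica has already been collected into lines of the form `host port lag_bytes`; how that lag is measured is left out, and `pick_best_replica` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: choose the replica with the lowest lag.
# pick_best_replica is a hypothetical helper. It reads lines of the form
#   host port lag_bytes
# on stdin and prints "host port" for the smallest lag. Lines that do not
# have three fields with a numeric lag are ignored. On a tie, the first
# line wins. Returns non-zero if no valid candidate was found.
pick_best_replica() {
    awk 'NF == 3 && $3 ~ /^[0-9]+$/ {
             if (!found || $3 + 0 < best) {
                 best = $3 + 0; host = $1; port = $2; found = 1
             }
         }
         END { if (found) print host, port; else exit 1 }'
}

# Usage:
#   printf 'db2 5432 2048\ndb3 5432 512\n' | pick_best_replica
```

The failover script could then promote the selected host instead of the new primary passed in by pgagroal, although whether that suits a given deployment depends on how the servers are defined in the pgagroal configuration.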
Automatic Failback
Consider implementing automatic failback when the original primary becomes available:
#!/bin/bash
# Original primary, taken here from the first two script arguments
OLD_PRIMARY_HOST=$1
OLD_PRIMARY_PORT=$2
# Check if original primary is healthy
if pg_isready -h "$OLD_PRIMARY_HOST" -p "$OLD_PRIMARY_PORT"; then
    # Implement failback logic
    echo "Original primary is healthy, considering failback"
fi
Failover Notification Script
After a successful failover, you can optionally configure a notification script to reconfigure standby servers to follow the new primary.
Configuration
In pgagroal.conf add:
failover_notify_script = /path/to/notify_script.sh
Important: The failover_notify_script requires failover_script to be configured. The notification script runs only after a successful failover.
The script runs with the same permissions as the pgagroal process and must be executable by that user.
Script Parameters
The notification script receives server information as separate arguments:
- Old primary host
- Old primary port
- New primary host
- New primary port
- Standby host (if any)
- Standby port (if any)
- Additional standby host/port pairs...
Arguments 5 onwards repeat in pairs (host, port) for each standby server that needs reconfiguration.
Example Notification Script
A basic notification script to reconfigure standbys:
#!/bin/sh
#
# This is an example script for pgagroal failover notification.
#
# The script receives information about the old primary, new primary,
# and remaining standby servers that need reconfiguration:
# $1 = old primary hostname
# $2 = old primary port
# $3 = new primary hostname
# $4 = new primary port
# $5 onwards = standby hostname/port pairs
#
# This script reconfigures each standby to follow the new primary by:
# 1. Stopping the standby
# 2. Attempting pg_rewind to sync with new primary
# 3. Falling back to pg_basebackup if pg_rewind fails
# 4. Restarting the standby
#
# Adjust PGDATA_DIR to match your PostgreSQL data directory.
# Ensure the postgres user can SSH without password to standby hosts.
#
# To configure pgagroal:
# failover_notify_script = /path/to/notify_standbys.sh
#
LOGGER_TAG=pgagroal
PGDATA_DIR=/mnt/pgdata # Adjust this to your data directory
REPLICATION_USER=replicator # Adjust if different
OLD_PRIMARY_HOST=$1
OLD_PRIMARY_PORT=$2
NEW_PRIMARY_HOST=$3
NEW_PRIMARY_PORT=$4
shift 4
logger -s -t $LOGGER_TAG "Failover notification: ${OLD_PRIMARY_HOST}:${OLD_PRIMARY_PORT} -> ${NEW_PRIMARY_HOST}:${NEW_PRIMARY_PORT}"
# Process each standby (host/port pairs)
while [ $# -gt 1 ]; do
    STANDBY_HOST=$1
    STANDBY_PORT=$2
    shift 2
    logger -s -t $LOGGER_TAG "Reconfiguring standby ${STANDBY_HOST}:${STANDBY_PORT} to follow new primary..."
    # Try pg_rewind first
    ssh postgres@${STANDBY_HOST} "
        pg_ctl stop -D ${PGDATA_DIR} -m fast &&
        pg_rewind -D ${PGDATA_DIR} --source-server='host=${NEW_PRIMARY_HOST} port=${NEW_PRIMARY_PORT} user=${REPLICATION_USER}' &&
        pg_ctl start -D ${PGDATA_DIR}
    " 2> /tmp/standby_rewind_${STANDBY_HOST}_$$.log
    if [ $? -eq 0 ]; then
        logger -s -t $LOGGER_TAG "Standby ${STANDBY_HOST}:${STANDBY_PORT} reconfigured successfully"
        continue
    fi
    # Fallback to pg_basebackup if rewind fails
    ssh postgres@${STANDBY_HOST} "
        rm -rf ${PGDATA_DIR}/* &&
        pg_basebackup -h ${NEW_PRIMARY_HOST} -p ${NEW_PRIMARY_PORT} -U ${REPLICATION_USER} -D ${PGDATA_DIR} -Fp -Xs -P -R &&
        pg_ctl start -D ${PGDATA_DIR}
    " 2> /tmp/standby_basebackup_${STANDBY_HOST}_$$.log
    if [ $? -ne 0 ]; then
        logger -s -t $LOGGER_TAG "ERROR: Failed to reconfigure standby ${STANDBY_HOST}:${STANDBY_PORT}"
        logger -s -t $LOGGER_TAG < /tmp/standby_basebackup_${STANDBY_HOST}_$$.log
        exit 1
    else
        logger -s -t $LOGGER_TAG "Standby ${STANDBY_HOST}:${STANDBY_PORT} reconfigured successfully"
    fi
done
logger -s -t $LOGGER_TAG "All standbys reconfigured successfully"
exit 0
Script Requirements
- The script must exit with code 0 for success
- Non-zero exit codes are logged as errors but don't affect failover completion
- The script should handle cases where no standbys exist (only 4 arguments passed)
- Consider implementing idempotent operations in case the script runs multiple times
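The no-standby case and the pairwise argument layout can be handled in one place. The sketch below is an illustration under assumptions: `for_each_standby` and `reconfigure_standby` are hypothetical names, and `reconfigure_standby` is a placeholder that only prints what a real script would act on.

```shell
#!/bin/bash
# Sketch: walk the standby host/port pairs passed to a notification script.
# for_each_standby and reconfigure_standby are hypothetical names.

reconfigure_standby() {
    # Placeholder: a real script would reconfigure the standby here.
    # $1/$2 = standby host/port, $3/$4 = new primary host/port.
    echo "reconfigure $1:$2 -> $3:$4"
}

for_each_standby() {
    # $1..$4 describe the old and new primary; the rest are standby pairs.
    if [ $# -lt 4 ]; then
        echo "expected at least 4 arguments, got $#" >&2
        return 1
    fi
    local new_host=$3 new_port=$4
    shift 4
    # Only 4 arguments were passed: nothing to do, and that is a success.
    if [ $# -eq 0 ]; then
        echo "no standbys to reconfigure"
        return 0
    fi
    while [ $# -ge 2 ]; do
        reconfigure_standby "$1" "$2" "$new_host" "$new_port" || return 1
        shift 2
    done
    # A trailing host without a port is reported and skipped.
    if [ $# -eq 1 ]; then
        echo "ignoring standby host without a port: $1" >&2
    fi
    return 0
}

# Usage inside the notification script:
#   for_each_standby "$@" || exit 1
```

Keeping the per-standby work in a single function also makes it easier to make that step idempotent, for example by having it check the standby's current state before changing anything.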
When Notification Runs
The notification script executes only when:
- Failover completes successfully
- failover_notify_script is configured
- At least one standby server exists in the configuration
If no standbys need reconfiguration (only the old primary and the new primary exist), the script may still run but receives only the first 4 arguments.
Monitoring Failover
Monitor failover events through:
- Log files: Check pgagroal logs for failover events
- Prometheus metrics: Monitor server status changes
- External monitoring: Implement alerts for failover events
Best Practices
- Test failover scripts regularly in non-production environments
- Monitor replica lag to ensure replicas are suitable for promotion
- Implement proper logging in failover scripts for troubleshooting
- Consider network partitions and split-brain scenarios
- Document failover procedures for operational teams
- Use configuration management to ensure consistent failover scripts across environments