You can create schedules in CockroachDB for periodic backups. Once a backup schedule is created, you can do the following:
- Set up monitoring for the backup schedule
- View scheduled backup details
- View and control the backup schedule
- View and control a backup initiated by a schedule
- Restore from a scheduled backup
Supported products
The feature described on this page is available in CockroachDB Advanced, CockroachDB Standard, CockroachDB Basic, and CockroachDB Self-Hosted clusters when you are running self-managed backups. For a full list of features, refer to Backup and restore product support.
Considerations
Protected timestamps and scheduled backups
Scheduled backups ensure that the data to be backed up is protected from garbage collection until it has been successfully backed up. This active management of protected timestamps means that you can run scheduled backups at a cadence independent from the GC TTL of the data. This is unlike non-scheduled backups that are tightly coupled to the GC TTL. See Garbage collection and backups for more detail.
The data being backed up will not be eligible for garbage collection until a successful backup completes. At this point, the schedule will release the existing protected timestamp record and write a new one to protect data for the next backup that is scheduled to run. It is important to consider that when a scheduled backup fails there will be an accumulation of data until the next successful backup. Resolving the backup failure or dropping the backup schedule will make the data eligible for garbage collection once again.
You can also use the exclude_data_from_backup
option with a scheduled backup as a way to prevent protected timestamps from prolonging garbage collection on a table. See the example Exclude a table's data from backups for usage information.
Backup collection storage URI and schedule backups
You will encounter an error if you run multiple backup collections to the same storage URI. Each collection's URI must be unique.
For example, if you have a backup schedule running backups for the database users
the full backup and incremental backup should have the same storage URI for the full and incremental schedule. (CREATE SCHEDULE FOR BACKUP
will automatically create two schedules for the full and incremental backup to the same storage URI.) If there is another backup schedule, for the database accounts
, the full and incremental backups for accounts
should have the same storage URI. However, the storage URI for the accounts
backup collection should be different to the storage URI for the users
backup collection.
Create a new backup schedule
To create a new backup schedule, use the CREATE SCHEDULE FOR BACKUP
statement. For example:
> CREATE SCHEDULE schedule_label
FOR BACKUP INTO 's3://test/backups/test_schedule_1?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
WITH revision_history
RECURRING '@daily'
WITH SCHEDULE OPTIONS first_run = 'now';
In this example, a schedule labeled schedule_label
is created to take daily (incremental) backups with revision history in AWS S3, with the first backup being taken now. A second schedule for weekly full backups is also created by default. Both schedules have the same label
(i.e., schedule_label
).
For more information about the different options available when creating a backup schedule, see CREATE SCHEDULE FOR BACKUP
.
Further guidance on connecting to Amazon S3, Google Cloud Storage, Azure Storage, and other storage options is outlined in Use Cloud Storage.
Set up monitoring for the backup schedule
We recommend that you monitor your backup schedule with Prometheus, and alert when there are anomalies such as backups that have failed or no backups succeeding over a certain amount of time—at which point, you can inspect schedules by running SHOW SCHEDULES
.
Metrics for scheduled backups fall into two categories:
Backup schedule-specific metrics, aggregated across all schedules:
schedules.BACKUP.started
: The total number of backups started by a schedule.schedules.BACKUP.succeeded
: The number of backups started by a schedule that succeeded.schedules.BACKUP.failed
: The number of backups started by a schedule that failed.When
schedules.BACKUP.failed
increments, runSHOW SCHEDULES
to check which schedule is affected and to inspect the error in thestatus
column. If a backup job encounters too many retryable errors, it will enter afailed
state with the most recent error, which allows subsequent backups the chance to succeed. Refer to the Backup and Restore Monitoring page for metrics to track backup failures.schedules.BACKUP.protected_age_sec
: The age of the oldest protected timestamp record protected by backup schedules.schedules.BACKUP.protected_record_count
: The number of protected timestamp records held by backup schedules.
Scheduler-specific metrics:
schedules.round.reschedule-wait
: The number of schedules that were rescheduled due to a currently running job. A value greater than 0 indicates that a previous backup was still running when a new scheduled backup was supposed to start. This corresponds to theon_previous_running=wait
schedule option.schedules.round.reschedule-skip
: The number of schedules that were skipped due to a currently running job. A value greater than 0 indicates that a previous backup was still running when a new scheduled backup was supposed to start. This corresponds to theon_previous_running=skip
schedule option.
schedules.round.reschedule-wait
and schedules.round.reschedule-skip
are gauge metrics and can be graphed. A continual positive value for either of these metrics may indicate a misconfigured backup cadence, and you should consider adjusting the cadence to avoid waiting for or skipping the next backup.
For a tutorial on how to use Prometheus to set up monitoring and alerting, see Monitor CockroachDB with Prometheus.
View scheduled backup details
When a backup is created by a schedule, it is stored within a collection of backups in the given location. To view details for a backup created by a schedule, you can use the following:
SHOW BACKUPS IN collectionURI
statement to view a list of the full backup's subdirectories.SHOW BACKUP FROM subdirectory IN collectionURI
statement to view a list of the full and incremental backups that are stored in a specific full backup's subdirectory.- Use the Schedules page in the DB Console to view a list of created backup schedules and their individual details.
For more details, see SHOW BACKUP
.
View and control the backup schedule
Once a backup schedule is successfully created, you can view the schedule, pause the schedule, resume the schedule, or drop the schedule.
View the schedule
> SHOW SCHEDULES FOR BACKUP;
For more information, see SHOW SCHEDULES
.
Pause the schedule
To pause a schedule, you can either specify the schedule's id
:
> PAUSE SCHEDULE 589963390487363585;
Or nest a SELECT
clause that retrieves id
(s) inside the PAUSE SCHEDULES
statement:
> PAUSE SCHEDULES WITH x AS (SHOW SCHEDULES) SELECT id FROM x WHERE label = 'schedule_database';
For more information, see PAUSE SCHEDULES
.
Resume the schedule
To resume a paused schedule, you can either specify the schedule's id
:
> RESUME SCHEDULE 589963390487363585;
Or nest a SELECT
clause that retrieves id
(s) inside the RESUME SCHEDULES
statement:
> RESUME SCHEDULES WITH x AS (SHOW SCHEDULES) SELECT id FROM x WHERE label = 'schedule_database';
For more information, see RESUME SCHEDULES
.
Drop the schedule
To drop a schedule, you can either specify the schedule's id
:
> DROP SCHEDULE 589963390487363585;
Or nest a SELECT
clause that retrieves id
(s) inside the DROP SCHEDULES
statement:
> DROP SCHEDULES WITH x AS (SHOW SCHEDULES) SELECT id FROM x WHERE label = 'schedule_database';
When DROP SCHEDULES
removes a full backup schedule, it removes the associated incremental backup schedule, if it exists. For more information, see DROP SCHEDULES
.
DROP SCHEDULE
does not cancel any in-progress jobs started by the schedule. Before you drop a schedule, cancel any in-progress jobs first, as you will not be able to look up the job ID once the schedule is dropped.
View and control a backup initiated by a schedule
After CockroachDB successfully initiates a scheduled backup, it registers the backup as a job. You can view, pause, resume, or cancel each individual backup job.
View the backup job
To view jobs for a specific backup schedule, use the schedule's id
:
> SHOW JOBS FOR SCHEDULE 590204387299262465;
job_id | job_type | description | statement | user_name | status | running_status | created | started | finished | modified | fraction_completed | error | coordinator_id
---------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+-----------+---------+----------------+----------------------------------+---------+----------+----------------------------------+--------------------+-------+-----------------
590205481558802434 | BACKUP | BACKUP INTO '/2020/09/15-161444.99' IN 's3://test/scheduled-backup-0915?AWS_ACCESS_KEY_ID=x&AWS_SECRET_ACCESS_KEY=redacted' AS OF SYSTEM TIME '2020-09-15 16:20:00+00:00' WITH revision_history, detached | | root | running | NULL | 2020-09-15 16:20:18.347383+00:00 | NULL | NULL | 2020-09-15 16:20:18.347383+00:00 | 0 | | 0
(1 row)
You can also view multiple schedules by nesting a SELECT
clause that retrieves id
(s) inside the SHOW JOBS
statement:
> SHOW JOBS FOR SCHEDULES WITH x AS (SHOW SCHEDULES) SELECT id FROM x WHERE label = 'test_schedule';
job_id | job_type | description | statement | user_name | status | running_status | created | started | finished | modified | fraction_completed | error | coordinator_id
---------------------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+-----------+-----------+----------------+----------------------------------+---------+----------------------------------+----------------------------------+--------------------+-------+-----------------
590204496007299074 | BACKUP | BACKUP INTO '/2020/09/15-161444.99' IN 's3://test/scheduled-backup-0915?AWS_ACCESS_KEY_ID=x&AWS_SECRET_ACCESS_KEY=redacted' AS OF SYSTEM TIME '2020-09-15 16:14:44.991631+00:00' WITH revision_history, detached | | root | succeeded | NULL | 2020-09-15 16:15:17.720725+00:00 | NULL | 2020-09-15 16:15:20.913789+00:00 | 2020-09-15 16:15:20.910594+00:00 | 1 | | 0
590205481558802434 | BACKUP | BACKUP INTO '/2020/09/15-161444.99' IN 's3://test/scheduled-backup-0915?AWS_ACCESS_KEY_ID=x&AWS_SECRET_ACCESS_KEY=redacted' AS OF SYSTEM TIME '2020-09-15 16:20:00+00:00' WITH revision_history, detached | | root | succeeded | NULL | 2020-09-15 16:20:18.347383+00:00 | NULL | 2020-09-15 16:20:48.37873+00:00 | 2020-09-15 16:20:48.374256+00:00 | 1 | | 0
(2 rows)
For more information, see SHOW JOBS
.
Pause the backup job
To pause jobs for a specific backup schedule, use the schedule's id
:
> PAUSE JOBS FOR SCHEDULE 590204387299262465;
PAUSE JOBS FOR SCHEDULES 1
You can also pause multiple schedules by nesting a SELECT
clause that retrieves id
(s) inside the PAUSE JOBS
statement:
> PAUSE JOBS FOR SCHEDULES WITH x AS (SHOW SCHEDULES) SELECT id FROM x WHERE label = 'test_schedule';
PAUSE JOBS FOR SCHEDULES 2
For more information, see PAUSE JOB
.
Resume the backup job
To resume jobs for a specific backup schedule, use the schedule's id
:
> RESUME JOBS FOR SCHEDULE 590204387299262465;
RESUME JOBS FOR SCHEDULES 1
You can also resume multiple schedules by nesting a SELECT
clause that retrieves id
(s) inside the PAUSE JOBS
statement:
> RESUME JOBS FOR SCHEDULES WITH x AS (SHOW SCHEDULES) SELECT id FROM x WHERE label = 'test_schedule';
RESUME JOBS FOR SCHEDULES 2
For more information, see RESUME JOB
.
Cancel the backup job
To cancel jobs for a specific backup schedule, use the schedule's id
:
> CANCEL JOBS FOR SCHEDULE 590204387299262465;
CANCEL JOBS FOR SCHEDULES 1
You can also CANCEL multiple schedules by nesting a SELECT
clause that retrieves id
(s) inside the CANCEL JOBS
statement:
> CANCEL JOBS FOR SCHEDULES WITH x AS (SHOW SCHEDULES) SELECT id FROM x WHERE label = 'test_schedule';
CANCEL JOBS FOR SCHEDULES 2
For more information, see CANCEL JOB
.
Restore from a scheduled backup
To restore from a scheduled backup, use the RESTORE
statement:
> RESTORE
FROM '2020/08/19-035600.00' IN 's3://test/backups/test_schedule_1?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
AS OF SYSTEM TIME '2020-08-19 03:50:00+00:00';
To view the backups stored within a collection, use the SHOW BACKUP
statement.