prune_backups
is a small tool to tidy (prune) incremental backups created with rsync (and other backup tools) to the typical pattern of one per hour for a day, one per day for a month, and then one per month.
The tool prune_backups
takes one directory name as command line argument. It looks for subdirectories in that directory matching the naming pattern YYYY-MM-DD_HH-mm. The tool interpretes these directory names as dates and keeps exactly one of these directories for the current hour, one for the last hour and so on. The tool will always keep the latest and move all other directories into a subdirectory 'to_delete'. The tool ONLY MOVES directories and DOES NOT ACTUALLY DELETE anything!
- Install the golang-compiler. It is free and available on a wide variety of operating systems and processor architectures, including x86, x64, ARM32, ARM64, Windows, Linux, MacOS, etc.. See https://go.dev/dl/.
- Download source code:
git clone https://github.com/TomTonic/prune_backups.git
. - To compile the source code, change into the subdirectory that git just created and run
go build
. - The executable is called
prune_backups
orprune_backups.exe
, depending on your system. You can put it anywhere you want - it is fully self-contained. - You are ready to go!
On the command line: prune_backups -dir=/mnt/backups
In a script (with context):
#!/bin/sh
# this is the target directory where all backups shall be stored. adapt to your needs
my_backup_storage_dir=/srv/backup/mywebserver
# create the directory name for the current backup. this must match the naming scheme of prune_backups
current_snapshot_dir=$(date +%Y-%m-%d_%H-%M)
# the rsync command will do the actual backup of the directory /var/www from the server mywebserver.example.com, logging into this machine with the user backupuser.
# --link-dest=$my_backup_storage_dir/latest will ensure rsync creates hardlinks for identical files, so more diskspace is only needed for new/changed files
# see rsync documentation.
# to make sure we can identify incomplete backups by their directory name, we start the directory name with an underscore character (_).
rsync -avR --checksum --delete --link-dest=$my_backup_storage_dir/latest backupuser@mywebserver.example.com:/var/www $my_backup_storage_dir/_$current_snapshot_dir
cd $my_backup_storage_dir
# to make sure we can identify incomplete backups by their directory name, we started the directory name with an underscore character (_). now rename it to indicate it was complete.
mv _$current_snapshot_dir $current_snapshot_dir
# make the latest snapshot easily referable for next incremental backup. see above
ln -nsf $current_snapshot_dir latest
# prune old backups
prune_backups -dir=$my_backup_storage_dir
# uncomment the following line if you really want to delete the old backups
# rm -rf $my_backup_storage_dir/to_delete
You would run this script hourly via cron on your backup server to backup your web server.
Scenario: This example assumes you have a cron-job running hourly in the 49th minute, each creating a separate backup directory (for example with rsync --link-dest
). It is the 17th of June 2024 today, 09:54 in the morning when you run prune_backups
. It will leave your backup directory (-dir
parameter) with the following directory layout:
Directory name | Directory name (cntd.) | Directory name (cntd.) | Directory name (cntd.) |
---|---|---|---|
🟨 2024-06-17_09-49/ | 🟨 2024-06-16_18-49/ | 🟦 2024-06-09_23-49/ | 🟦 2024-05-25_23-49/ |
🟨 2024-06-17_08-49/ | 🟨 2024-06-16_17-49/ | 🟦 2024-06-08_23-49/ | 🟦 2024-05-24_23-49/ |
🟨 2024-06-17_07-49/ | 🟨 2024-06-16_16-49/ | 🟦 2024-06-07_23-49/ | 🟦 2024-05-23_23-49/ |
🟨 2024-06-17_06-49/ | 🟨 2024-06-16_15-49/ | 🟦 2024-06-06_23-49/ | 🟦 2024-05-22_23-49/ |
🟨 2024-06-17_05-49/ | 🟨 2024-06-16_14-49/ | 🟦 2024-06-05_23-49/ | 🟦 2024-05-21_23-49/ |
🟨 2024-06-17_04-49/ | 🟨 2024-06-16_13-49/ | 🟦 2024-06-04_23-49/ | 🟦 2024-05-20_23-49/ |
🟨 2024-06-17_03-49/ | 🟨 2024-06-16_12-49/ | 🟦 2024-06-03_23-49/ | 🟦 2024-05-19_23-49/ |
🟨 2024-06-17_02-49/ | 🟨 2024-06-16_11-49/ | 🟦 2024-06-02_23-49/ | 🟦 2024-05-18_23-49/ |
🟨 2024-06-17_01-49/ | 🟨 2024-06-16_10-49/ | 🟦 2024-06-01_23-49/ | 🟦 2024-05-17_23-49/ |
🟨 2024-06-17_00-49/ | 🟦 2024-06-15_23-49/ | 🟦 2024-05-31_23-49/ | 🟩 2024-04-30_23-49/ |
🟨 2024-06-16_23-49/ | 🟦 2024-06-14_23-49/ | 🟦 2024-05-30_23-49/ | 🟩 2024-03-31_23-49/ |
🟨 2024-06-16_22-49/ | 🟦 2024-06-13_23-49/ | 🟦 2024-05-29_23-49/ | 🟩 2024-02-29_23-49/ |
🟨 2024-06-16_21-49/ | 🟦 2024-06-12_23-49/ | 🟦 2024-05-28_23-49/ | 🟪 to_delete/ |
🟨 2024-06-16_20-49/ | 🟦 2024-06-11_23-49/ | 🟦 2024-05-27_23-49/ | 🟫 some_other_directory/ |
🟨 2024-06-16_19-49/ | 🟦 2024-06-10_23-49/ | 🟦 2024-05-26_23-49/ | 🟫 latest -> 2024-06-17_09-49/ |
- 🟨 Your backup directory will contain (up to) 24 directories for the last 24h. If there are multiple directories for a certain hour,
prune_backups
will keep the latest directory (determined by name, not by metadata) and prune the other directories for this hour. If there is no backup directory for a certain hour, that hour will simply be skipped; i.e. there won't be any extra hourly backups appended for compensation after the 24h mark. - 🟦 Your backup directory will contain (up to) 30 directories for the last 30 days. If there are multiple directories for a certain day,
prune_backups
will keep the last hourly directory (determined by name, not by metadata) and prune the other directories for this day. If there is no backup directory for a certain day, that day will simply be skipped; i.e. there won't be any extra daily backups appended for compensation after the 30 days mark. - 🟩 Your backup directory will contain directories for each month before that. If there are multiple directories for a certain month,
prune_backups
will keep the last daily directory (determined by name, not by metadata) and prune the other directories for this month. If there is no backup directory for a certain month, that month will simply be skipped; i.e. there won't be any extra daily backups kept for compensation in neighboring months or any magic like that. - 🟪 The directory
to_delete
is created byprune_backups
in the backup directory; and it moves all pruned directories here. You can change the name of this directory with theto_directory
parameter. Please note that this directory should reside in the same filesystem as your backup directory. - 🟫 Files, softlinks, or directories with other naming schemes than
YYYY-MM-DD*
will remain untouched.
The exact naming pattern is YYYY-MM-DD_HH-mm, where
- YYYY is the 4-digit year,
- MM the 2-digit† month,
- DD the 2-digit† day,
- HH the 2-digit† hour (24h format), and
- mm the 2-digit† minute of the time the backup was created.
You cannot change this pattern unless you change the golang code. However, the tool will also work when you don't have the minutes or hours in your directory names, i.e. a naming pattern of YYYY-MM-DD is sufficient. The tool will simply will not prune hourly backups in this case.
† Please be aware that the tool needs a trailing zero.