Set up the Amazon S3 Bucket:
To be written.
About Duply and Duplicity:
Duplicity is a Python-based command-line application that makes encrypted incremental backups to remote storage locations. Duply is a frontend wrapper for Duplicity, designed to simplify setting up, managing, and running server backup and recovery activities.
FYI, Duplicity can create full and incremental backups, but it cannot create differential backups. This can result in long recovery chains, since recovery requires every increment since the last full backup to be available. Long recovery chains mean more potential recovery errors and much slower recoveries.
This means that Duply and Duplicity might not be the ideal solution, but my backup needs are modest so I’m okay with this situation for now. To get around ‘the long chain problem’, I will force a full backup fairly frequently.
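One way to force regular full backups is through the profile's conf file. The fragment below is a sketch that assumes your Duply version ships the MAX_FULLBKP_AGE setting in its conf template; the one-week interval is only an example value.

```shell
# Assumed example value: start a new full backup once the last one is older than a week.
MAX_FULLBKP_AGE=1W
DUPL_PARAMS="$DUPL_PARAMS --full-if-older-than $MAX_FULLBKP_AGE "
```

With this in place, a plain backup run becomes a full backup whenever the newest full backup is older than the interval, which keeps the incremental chain short.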
Install Duply and Duplicity:
First, let’s install the necessary software on our server (all commands are run as the root user):
# yum install duplicity duply python-boto
Generate the GPG Encryption Keys:
Next, we’ll need to generate the GPG keys so that Duply/Duplicity can encrypt and sign the backups. If you don’t already have a key, the next few commands will get you set up.
First, install a random number generator to make sure there are enough random bytes from which to generate a key. (FYI, this is needed for CentOS 6+ servers but might not be required for others.)
# yum install rng-tools
When installation is complete, open ‘/etc/sysconfig/rngd’ and add the options line for the rngd service.
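The right options depend on your hardware. A setting often used on CentOS 6 machines that have no hardware random number generator is sketched below; treat it as an assumption, not as the exact line intended above. The -r option tells rngd which device to read from.

```shell
# Assumed setting: feed the entropy pool from /dev/urandom.
EXTRAOPTIONS="-r /dev/urandom"
```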
Now start the random number generator service…
# service rngd start
From this point forward, you should NOT be logged in as root and you should NOT use sudo to execute any of the GPG commands. GPG behaves strangely when sessions are nested, such as when you log in as userX and use sudo -s. (More on this below.)
Next, make sure that gpg-agent is running. Type…
$ gpg-agent -s --daemon --write-env-file --use-standard-socket
After a moment or two, you will see something like the following, where username is your Network ID username and Machine is the name of your system…
GPG_AGENT_INFO=/N/u/username/Machine/.gnupg/S.gpg-agent:22743:1; export GPG_AGENT_INFO;
Now generate the key…
$ gpg --gen-key
After a few moments, something like the following will appear…
gpg (GnuPG) 2.0.14; Copyright (C) 2009 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
gpg: keyring `/N/u/username/Machine/.gnupg/secring.gpg' created
gpg: keyring `/N/u/username/Machine/.gnupg/pubring.gpg' created
Please select what kind of key you want:
(1) RSA and RSA (default)
(2) DSA and Elgamal
(3) DSA (sign only)
(4) RSA (sign only)
Enter 1 to select the default key.
Next, GPG will prompt you to choose a keysize (in bits). Enter 2048.
You will see…
Requested keysize is 2048 bits
Please specify how long the key should be valid.
0 = key does not expire
<n> = key expires in n days
<n>w = key expires in n weeks
<n>m = key expires in n months
<n>y = key expires in n years
Key is valid for? (0)
Enter how long the key should remain valid. GPG will prompt for confirmation; enter y or n as appropriate.
GPG now asks for information to be used to construct a user ID that will identify the key. At the prompts, enter a name, email address, and a comment.
GPG will now prompt to confirm or correct the information provided…
You selected this USER-ID:
"Full Name (comment) <email@example.com>"
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit?
Enter o to accept the user ID. To correct errors or quit the process, enter the appropriate alternative (n, c, e, or q).
If the user ID is accepted, GPG will prompt for a passphrase. Choose a strong one. Once the passphrase has been entered and confirmed, GPG will begin generating the key. You’ll see…
We need to generate a lot of random bytes. It is a good idea to
perform some other action (type on the keyboard, move the mouse,
utilize the disks) during the prime generation; this gives the
random number generator a better chance to gain enough entropy.
This process may take a moment to wrap up, but when it’s done you’ll see something like…
gpg: key 09D2B839 marked as ultimately trusted
public and secret key created and signed.
gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0 valid: 4 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 4u
gpg: next trustdb check due at <expiration_date>
pub 2048R/09D2B839 2015-06-25 [expires: <expiration_date>]
Key fingerprint = 6AB2 7763 0378 9F7E 6242 77D5 F158 CDE5 09D2 B839
uid Full Name (comment) <firstname.lastname@example.org>
sub 2048R/7098E4C2 2015-06-25 [expires: <expiration_date>]
In this example, 09D2B839 is the ID of the main signing key, and we have one subkey, 7098E4C2, for encryption. These keys are enough to create encrypted and signed backups.
Make a copy of this data as well as the passphrase and store it somewhere safe. You won’t be able to decrypt the backups without these.
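One way to make that copy is to export the key pair to ASCII-armored files. This is a minimal sketch: the function name export_backup_keys and the file names are made up for illustration, and it should be run as the same non-root user that created the keys.

```shell
# export_backup_keys KEY_ID DEST_DIR
# Writes ASCII-armored copies of the public and secret key into DEST_DIR.
# (Hypothetical helper; the name and file layout are assumptions.)
export_backup_keys() {
    local key_id="$1" dest="$2"
    mkdir -p "$dest" && chmod 700 "$dest" || return 1
    gpg --armor --output "$dest/$key_id-public.asc" --export "$key_id" || return 1
    gpg --armor --output "$dest/$key_id-secret.asc" --export-secret-keys "$key_id" || return 1
    # The secret key file must not be readable by other users.
    chmod 600 "$dest/$key_id-secret.asc"
}
```

For example, `export_backup_keys 09D2B839 ~/key-backup` would write both files into ~/key-backup, which you can then move to offline storage.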
Create a Revocation Certificate:
Next, we’ll create a way of invalidating the GPG key pair in case of a security breach or loss of the secret key.
This should be done as soon as the key pair is made, not later. Keep the revocation certificate in a secure, separate location in case the server is compromised or becomes inoperable.
(FYI: This won’t work if you created the key-pair as ‘root’, such as when using ‘sudo’ to execute GPG. GPG won’t prompt for the required password if you’re in a ‘nested session’. In short, you should create the keys and the revocation certificate when logged in as a non-root user – some folks even recommend creating a specific user just for this purpose.)
$ gpg --gen-revoke email@example.com
Choose any of the available options, although since this is being done ahead of time, some specifics won’t be available. You will then be asked to provide a comment, and finally, to confirm the selections.
A revocation certificate will be printed to the screen…
Revocation certificate created.
Please move it to a medium which you can hide away; if Mallory gets
access to this certificate he can use it to make your key unusable.
It is smart to print this certificate and store it away, just in case
your media become unreadable. But have some caution: The print system of
your machine might store the data and make it available to others!
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: A revocation certificate should follow
-----END PGP PUBLIC KEY BLOCK-----
Copy and paste this certificate to a secure location, or print it for later use.
Create & Customize Backup Config Files:
Next, we need to create a ‘configuration profile’ (aka. ‘backup config’) for each backup job that we want done.
By default, this backup config file will be stored as ‘~/.duply/profile_name‘, where ~ is the current user’s home directory. Instead, we want to store these in the folder ‘/etc/duply’. If this folder exists prior to creating a config file, Duply will automatically create profiles for the super user root there instead of the user’s home folder.
Check to see if this folder already exists. If not, create it…
# mkdir /etc/duply/
Now we can go ahead and create a new backup config profile for each backup job we want to run…
# duply profile_name create
This will create a generic backup config file at ‘/etc/duply/profile_name/conf’ that can now be tweaked. (It also creates a file called ‘exclude’ which I’ll discuss later.)
# cd /etc/duply/profile_name
# nano conf
Now look for the statement GPG_KEY and add your GPG key ID, then set GPG_PW to the key’s passphrase…
Next, look for GPG_OPTS and set the compression defaults…
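As a sketch, using the key ID from the example output earlier and a placeholder passphrase (both are assumptions to replace with your own values):

```shell
# Example values only: the key ID shown by gpg --gen-key and its passphrase.
GPG_KEY='09D2B839'
GPG_PW='your_gpg_passphrase'
```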
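One possible setting is shown here. The choice of bzip2 at its highest level is an assumption made for illustration; it trades CPU time for smaller uploads.

```shell
# Assumed example: compress with bzip2 at maximum level before encrypting.
GPG_OPTS='--compress-algo=bzip2 --bzip2-compress-level=9'
```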
Next, search for TARGET='scheme://user[:password]@host[:port]/[/]path', and modify it to point to your S3 bucket’s ‘endpoint’, like so…
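A sketch of what this can look like. The bucket name my-backup-bucket and the folder server1 are hypothetical, and the host part depends on the region your bucket lives in.

```shell
# Hypothetical bucket and folder; format is s3://host/bucket_name[/prefix].
TARGET='s3://s3.amazonaws.com/my-backup-bucket/server1'
```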
Although you can include the user and password in the S3 URL, this is not considered safe. Instead, uncomment the lines for TARGET_USER and TARGET_PASS, and modify like so…
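For S3, these two settings hold the AWS access key ID and secret access key. The values below are placeholders, not real credentials.

```shell
# Placeholders: substitute the credentials of an AWS user allowed to write to the bucket.
TARGET_USER='YOUR_AWS_ACCESS_KEY_ID'
TARGET_PASS='YOUR_AWS_SECRET_ACCESS_KEY'
```

Since the conf file now holds both the GPG passphrase and the AWS secret, it is worth checking that only root can read it.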
Now look for the line SOURCE='/path/to/source', and change it as follows…
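A common choice, assumed here, is to point SOURCE at the filesystem root and let the exclude file decide what is actually backed up.

```shell
# Assumed example: back up from /, filtered by the profile's exclude file.
SOURCE='/'
```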
Next search for and uncomment the line, #MAX_AGE=1M, and set the interval value (to be used by the purge command) like so…
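For instance, to let the purge command remove backups older than six months (the interval is an example value, not a recommendation):

```shell
# Example value: backups older than six months become candidates for purge.
MAX_AGE=6M
```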
You can use any of several formats to specify MAX_AGE:
- The string “now” (refers to the current time)
- A sequence of digits, like “123456890” (indicating the time in seconds after the epoch)
- A string like “2002-01-25T07:00:00+02:00” in datetime format
- An interval, which is a number followed by one of the characters s, m, h, D, W, M, or Y (indicating seconds, minutes, hours, days, weeks, months, or years respectively), or a series of such pairs. In this case the string refers to the time that preceded the current time by the length of the interval. For instance, “1h78m” indicates the time that was one hour and 78 minutes ago. The calendar here is unsophisticated: a month is always 30 days, a year is always 365 days, and a day is always 86400 seconds.
- A date format of the form YYYY/MM/DD, YYYY-MM-DD, MM/DD/YYYY, or MM-DD-YYYY, which indicates midnight on the day in question, relative to the current time zone settings. For instance, “2002/3/5”, “03-05-2002”, and “2002-3-05” all mean March 5th, 2002.
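To make the interval arithmetic concrete, here is a small sketch that converts an interval string to seconds using the simplified calendar described above. The function interval_to_seconds is a hypothetical helper written for this post; it is not part of Duply or Duplicity.

```shell
# interval_to_seconds INTERVAL
# Converts an interval such as "1h78m" or "2W3D" to seconds, using
# month = 30 days and year = 365 days. Returns non-zero on bad input.
interval_to_seconds() {
    local rest="$1" total=0 number unit factor
    while [ -n "$rest" ]; do
        # Leading digits form the count; the next character is the unit.
        number="${rest%%[!0-9]*}"
        [ -n "$number" ] || return 1
        rest="${rest#"$number"}"
        unit="${rest%"${rest#?}"}"
        rest="${rest#?}"
        case "$unit" in
            s) factor=1 ;;
            m) factor=60 ;;
            h) factor=3600 ;;
            D) factor=86400 ;;
            W) factor=604800 ;;
            M) factor=2592000 ;;
            Y) factor=31536000 ;;
            *) return 1 ;;
        esac
        total=$((total + number * factor))
    done
    echo "$total"
}

interval_to_seconds 1h78m   # prints 8280
```

The “1h78m” example works out to 3600 + 78 × 60 = 8280 seconds, so it points to a moment 2 hours and 18 minutes in the past.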
Next, search for and uncomment the line, #MAX_FULL_BACKUPS=1, and set the maximum number of full backups to keep…
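For example, to keep the two most recent full backups along with their increments (the number is an assumed example):

```shell
# Example value: keep the two newest full backups and their increments.
MAX_FULL_BACKUPS=2
```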
Now find and uncomment the statements ‘#VOLSIZE=50’ and DUPL_PARAMS="$DUPL_PARAMS --volsize $VOLSIZE ", then reset the VOLSIZE to something larger than the 25MB default, like this…
DUPL_PARAMS="$DUPL_PARAMS --volsize $VOLSIZE "
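The VOLSIZE line itself goes just above the DUPL_PARAMS line. The value below is an assumed example, in megabytes; larger volumes mean fewer files in the bucket but bigger individual uploads.

```shell
# Example value: 100 MB volumes instead of the 25 MB default.
VOLSIZE=100
```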
Create PRE and POST Scripts:
Duply lets you use ‘pre’ and ‘post’ scripts when doing backups. The pre script is executed just before the backup, while the post script executes immediately after the backup completes.
These scripts allow you to do such things as create MySQL database dumps that can be included in the backup. The pre and post files must be placed in the ‘/etc/duply/profile_name/’ directory together with the conf and exclude files.
The following example pre and post scripts create MySQL database dumps, place them in the /tmp folder before the backup procedure starts, and then delete the dump files once the backup is finished.
… Pre Script Example
/usr/bin/mysqldump --all-databases -u root -l > /tmp/sqldump-$(date '+%F')
-- or --
/usr/bin/mysqldump --databases blog engine live mysql -u root -l > /tmp/sqldump-$(date '+%F')
-- or --
/usr/bin/mysqldump --databases blog -u root -l > /tmp/sqldump-blog-$(date '+%F')
/usr/bin/mysqldump --databases engine -u root -l > /tmp/sqldump-engine-$(date '+%F')
/usr/bin/mysqldump --databases live -u root -l > /tmp/sqldump-live-$(date '+%F')
/usr/bin/mysqldump --databases mysql -u root -l > /tmp/sqldump-mysql-$(date '+%F')
In the first example, all databases are dumped into a single file. In the second example, only the selected databases are dumped into a single file. In the third, each selected database is dumped into its own file. In all the examples, the ‘-l’ parameter locks the database tables while the dump is being performed.
… Post Script Example
/bin/rm /tmp/sqldump-$(date '+%F')
-- or --
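For the per-database dumps in the third pre script example, a matching cleanup might look like this. It is a sketch: the database names and the file name pattern are taken from that example and need to match whatever your pre script writes.

```shell
# Remove today's per-database dump files that the pre script left in /tmp.
for db in blog engine live mysql; do
    /bin/rm -f "/tmp/sqldump-${db}-$(date '+%F')"
done
```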
In the above examples, the dump files are deleted from the ‘/tmp’ directory.
It’s also important to back up the configuration for your profile, since this is needed in order to recover the backup files. Technically this only needs to be done once, usually right after the configuration file is finished.
Some schools of thought believe it best to back up the conf file each time a backup is performed. You can do this automatically by adding the following commands to the post script to create a tar of the profile immediately after each backup is finished…
# Archive the profile stored in the /etc/duply directory.
# $profile_name and $backup_file must be set earlier in the script.
tar -cvzf "$backup_file" -C /etc/duply "$profile_name"
chmod 600 "$backup_file"
You will need to copy the *.tar.gz file to a secure storage location, preferably portable offline storage such as a CD, DVD, or USB thumb-drive. This way backups can be recovered from another machine or server in the event the current unit is lost or destroyed.
(NOTE: The above script still needs some work – as is, it copies over the entire directory each time, including the tar files that were left from last time. The trick is to use the script once each time the configuration changes, then delete the tar file from the profile_name directory. Once you’re done, just comment out this script.)
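One way around that problem is to write the tar file outside the profile folder and skip any old tar files found inside it. The following is a sketch under those assumptions; archive_profile is a hypothetical helper, and the default destination /root is only a suggestion.

```shell
# archive_profile PROFILE_NAME [CONFIG_ROOT] [DEST_DIR]
# Packs one Duply profile folder into a dated tarball stored outside the
# profile folder, skipping any *.tar.gz files left over inside it.
# Prints the path of the new tarball.
archive_profile() {
    local profile_name="$1"
    local config_root="${2:-/etc/duply}"
    local dest_dir="${3:-/root}"
    local backup_file="$dest_dir/duply-$profile_name-$(date '+%F').tar.gz"

    tar --exclude='*.tar.gz' -czf "$backup_file" -C "$config_root" "$profile_name" || return 1
    chmod 600 "$backup_file"
    echo "$backup_file"
}
```

Calling `archive_profile profile_name` from the post script would then produce one dated archive per day in /root, and the profile folder itself stays clean.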
Create the Backup Whitelist:
Next, we want to tell Duply which folders and files should be backed up. We do this by editing the file, /etc/duply/profile_name/exclude. Here’s an example:
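As an illustration, the snippet below writes a small whitelist to a scratch file so it can be tried without touching a real profile. The listed paths are assumptions (they presume SOURCE is / and that the pre script leaves its dumps in /tmp); the same lines would go into /etc/duply/profile_name/exclude.

```shell
# Write a sample whitelist; EXCLUDE_FILE defaults to a scratch location.
EXCLUDE_FILE="${EXCLUDE_FILE:-/tmp/exclude.example}"
cat > "$EXCLUDE_FILE" <<'EOF'
+ /etc
+ /home
+ /var/www
+ /tmp/sqldump-*
- **
EOF
```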
The format is simple. Duplicity checks each file against the rules in this file, starting from the top, until it finds a rule that matches. If the rule is preceded by a ‘+’ the file will be backed up, and if it is preceded by a ‘-‘ it will be ignored. The last rule ‘- **’ tells duplicity to ignore all files that didn’t match an earlier rule.
Don’t forget the ‘- **’ at the bottom. Without it, Duplicity backs up every file under SOURCE that was not matched by an earlier rule.
Okay, we’re finally ready to test Duply!
At the command prompt, type…
# duply profile_name backup
FYI, if you have a large amount of data, this could take a while to run.
Once the backup has completed, go to the AWS S3 control panel to check that files have in fact been uploaded to the right place.
To make sure things worked properly from the server side, type this at the prompt to get a complete listing of all the files and folders that were backed up…
# duply profile_name list
Backup & File Recovery:
Naturally, the point of doing a backup in the first place is so that files and folders can be recovered if for any reason the originals are corrupted or lost.
There are two ways to do this. The first is to ‘fetch’ an individual file, the second is to ‘restore’ a complete backup:
- Typing ‘duply <profile> fetch <src_path> <target_path> [<age>]’ will restore a single file/folder from backup [as it was at <age>].
- Typing ‘duply <profile> restore <target_path> [<age>]’ will restore the complete backup to <target_path> [as it was at <age>].
Here’s an example of how to fetch a specific file:
# duply profile_name fetch var/www/live/app/webroot/resources/95/work/opening.jpg /mnt/restore/opening.jpg
And here’s an example of how to restore a complete backup:
# duply profile_name restore /mnt/restore
For either of these to work, you’ll need to make sure the target destination (e.g. /mnt/restore) already exists – Duply won’t create new directories for you.
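A small wrapper can take care of the directory for you. This is a sketch: restore_profile is a hypothetical helper, and the optional third argument is passed through as the age.

```shell
# restore_profile PROFILE TARGET_DIR [AGE]
# Creates TARGET_DIR if needed, then restores the complete backup into it,
# optionally as it was at AGE (for example 3D for three days ago).
restore_profile() {
    local profile="$1" target="$2" age="${3:-}"
    mkdir -p "$target" || return 1
    if [ -n "$age" ]; then
        duply "$profile" restore "$target" "$age"
    else
        duply "$profile" restore "$target"
    fi
}
```

For example, `restore_profile profile_name /mnt/restore 3D` would restore the backup as it stood three days ago.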
All that’s left now is to set up a CRON job to execute the backup task automatically. But first, we need to take a deeper look at Duply parameters.
Duply uses a variety of command line parameters for backup, maintenance, and recovery of data. (Check out Duply’s man page for details.) You can ‘chain’ several parameters within a single command by separating them with an underscore (_).
For example, ‘duply /root/.duply/test full_verify_purge --force’ creates a full backup, verifies it, and deletes any old backups. Backups where MAX_AGE is exceeded are listed by purge and deleted only when the additional option --force is given.
Setting up the cron job involves editing the ‘/etc/crontab’ file (or whatever file you use to manage crontabs) by adding…
15 4 * * * root /usr/bin/duply profile_name backup_cleanup_purge --force > /dev/null
This example will run the profile_name backup job at 4:15am each day of each month. It will also clean up and delete any files that have gone past their expiry date.
Note that it’s important to make sure there’s a line return after the last cron job in the list – I put a comment hash to make sure this is the case.