Distribute your backups using git-annex

Backing up a whole bunch of photos and videos can be a difficult task. Keeping those backups consistent is even harder. On top of that, you don't want to keep your backups in a single place, so that one failure can't cost you all of your data.

I've been using git-annex for some years, not as a power user but in an "it just works" way. Meanwhile I heard of ownCloud and liked it because you can access your data via the web, a mobile client or whatever. Combined with a VPN (I prefer OpenVPN), you get a secure way of accessing your data remotely from anywhere.

This post is about the link between git-annex and ownCloud: use ownCloud as your "frontend" tool for accessing the data while letting git-annex do the "backend" (aka backup) job. While this might sound like a pretty easy task, it has some peculiarities that need to be taken into consideration.

Backup setup

There is one centralized repo on my Raspberry Pi, which has my HDD attached. Usually I push stuff to the HDD using rsync from my laptop, or via ownCloud from my mobile clients. Afterwards the encrypted repo and the data itself are pushed to an external server. The data is encrypted using my GPG key. From the server I could then replicate the repo and the data to some cloud provider like AWS, Dropbox or whatever.

Additionally one could push the git repo to GitHub in an encrypted form - without the data itself. It will then only contain the git information (symlinks) but no data (annexed data).

Using ownCloud with git-annex

ownCloud acts as a front-end and can be used with any ownCloud client. The data itself is managed by git-annex, which basically acts as a back-end. You can access the data using ownCloud, but there are some restrictions:

  • already annexed data can't be deleted
  • you can add files/folders and delete them only if these weren't added to the git-annex repo yet

That means: newly added data (not yet committed to the repo) can be deleted by the client which added it. Data that has already been annexed can't be deleted through ownCloud; you'll have to work with git-annex to do that.
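
The restriction follows from how git-annex stores files: once a file has been added, the working tree only holds a symlink into `.git/annex/objects`. A minimal sketch for checking whether a path has already been annexed, assuming the repo uses the default symlink (locked) mode; `is_annexed` is a hypothetical helper name, not a git-annex command:

```shell
#!/bin/sh
# is_annexed PATH: succeed if PATH is a symlink whose target points into
# .git/annex/objects (assumption: default locked/symlink mode).
is_annexed() {
    [ -L "$1" ] || return 1
    case "$(readlink "$1")" in
        *.git/annex/objects/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (the file name is just an example):
#   is_annexed holiday.jpg && echo "already annexed, use git-annex to remove it"
```

Anything for which the check fails is still a regular file and can be removed by the ownCloud client that uploaded it.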

Encryption

One of the important constraints before pushing my backups into the cloud was security. In order to be able to encrypt my stuff before pushing it into the cloud, I had to

  • generate a GPG key
  • create a special remote (see below) which offers encryption

GPG

Generating a GPG key was the easiest step. Afterwards I had to make sure that my keychain was available to git-annex for a specific period of time. Here are my configuration files:

$ tail -n 3 ~/.gnupg/gpg.conf

use-agent
pinentry-mode loopback

and then the GPG agent configuration:

$ cat ~/.gnupg/gpg-agent.conf

pinentry-program /usr/bin/pinentry 
default-cache-ttl 180000
max-cache-ttl 864000
allow-loopback-pinentry
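
Both TTL values are given in seconds, which makes them hard to read at a glance. A small sketch that converts them to hours, using only the two numbers from the configuration above; `ttl_hours` is a hypothetical helper:

```shell
#!/bin/sh
# ttl_hours SECONDS: print a gpg-agent TTL (given in seconds) as whole hours.
ttl_hours() {
    echo $(( $1 / 3600 ))
}

echo "default-cache-ttl: $(ttl_hours 180000) hours"   # 50 hours
echo "max-cache-ttl:     $(ttl_hours 864000) hours"   # 240 hours
```

So a passphrase stays cached for 50 hours after it was last used, and for at most 240 hours (10 days) no matter how often it is used.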

Keychain

Then I found keychain, which helps you manage your SSH and GPG keys in a secure manner. Adding this to your bashrc/zshrc/whatever

$ eval `keychain --inherit any --eval --agents ssh,gpg id_rsa_backup xxx`

* keychain 2.8.2 ~ http://www.funtoo.org
 * Inheriting ssh-agent (7268)
 * Inheriting gpg-agent (3528)
 * Known ssh key: /home/cyneox/.ssh/id_rsa_backup

 * Known gpg key: xxxx

will cache your GPG and SSH keys for a specific period of time.
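
If the same startup file is shared with machines where keychain is not installed, an unguarded `eval` line fails in every new shell. A minimal sketch of a guarded variant; `load_backup_keys` is a hypothetical function name, and `id_rsa_backup` and `xxx` are the placeholders from the command above:

```shell
# Load the backup SSH and GPG keys through keychain, if it is installed.
load_backup_keys() {
    if command -v keychain >/dev/null 2>&1; then
        # --quiet suppresses the banner; --eval prints shell code to evaluate
        eval "$(keychain --quiet --inherit any --eval --agents ssh,gpg "$@")"
    else
        echo "keychain not found, skipping key loading" >&2
        return 1
    fi
}

# Never let a missing keychain abort the shell startup.
load_backup_keys id_rsa_backup xxx || true
```

The key names are passed as arguments so the same function works for other key sets.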

git-annex reference

Create git repo

$ git init

Create git annex repo

$ git-annex init <name>

Add remotes

Create bare git repo on the server

$ git init --bare oc-encrypted 

Add special remote (ssh+rsync)

$ git-annex initremote oc-encrypted type=gcrypt gitrepo=ssh://backup.ext/home/backup/oc-encrypted keyid=xxxx

Synchronize data

Sync only git repository to ssh remote

$ git-annex sync oc-encrypted

Sync git repo + content to ssh remote

$ git-annex sync --content oc-encrypted
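
The local setup and sync steps so far can be collected into one script. This is only a sketch under assumptions: the URL and key ID are the placeholders used above, `laptop` is a made-up repository description, and `run`, `backup_setup` and `backup_sync` are hypothetical helper names. With `DRY_RUN=1` (the default here) the commands are only printed, not executed:

```shell
#!/bin/sh
# Sketch: setup and sync steps as one script (placeholders, see text above).
NAME="${NAME:-laptop}"                      # hypothetical repo description
REMOTE="${REMOTE:-oc-encrypted}"
REMOTE_URL="${REMOTE_URL:-ssh://backup.ext/home/backup/oc-encrypted}"
KEYID="${KEYID:-xxxx}"                      # placeholder GPG key ID
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One-time setup of the local repo and the encrypted remote.
backup_setup() {
    run git init
    run git-annex init "$NAME"
    run git-annex initremote "$REMOTE" type=gcrypt "gitrepo=$REMOTE_URL" "keyid=$KEYID"
}

# Recurring step: push git metadata and annexed content.
backup_sync() {
    run git-annex sync --content "$REMOTE"
}

backup_setup
backup_sync
```

Running it with `DRY_RUN=0` executes the commands for real; the bare repo on the server still has to be created beforehand.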

If you have a look at the repo oc-encrypted on the external server, you'll see only encrypted stuff:

% tree | head -n 20 
.
|-- HEAD
|-- annex
|   |-- keys.lck
|   |-- objects
|   |   |-- 000
|   |   |   |-- 32a
|   |   |   |   `-- GPGHMACSHA1--0222fb5a96702da2e0e0d763f3893d0a97897d32
|   |   |   |       `-- GPGHMACSHA1--0222fb5a96702da2e0e0d763f3893d0a97897d32
|   |   |   |-- d88
|   |   |   |   `-- GPGHMACSHA1--55d8e2ea7626a3958b0182192e7cf34c8be09fd5
|   |   |   |       `-- GPGHMACSHA1--55d8e2ea7626a3958b0182192e7cf34c8be09fd5
|   |   |   `-- db9
|   |   |       `-- GPGHMACSHA1--2b2a121170d00fdb06a04d8df80b6135c4c51d7e
|   |   |           `-- GPGHMACSHA1--2b2a121170d00fdb06a04d8df80b6135c4c51d7e
|   |   |-- 001
|   |   |   |-- 380
|   |   |   |   `-- GPGHMACSHA1--6dbdddddde415496c429c9788428fffb358e55fa
|   |   |   |       `-- GPGHMACSHA1--6dbdddddde415496c429c9788428fffb358e55fa
|   |   |   |-- 590

% file annex/objects/000/32a/GPGHMACSHA1--0222fb5a96702da2e0e0d763f3893d0a97897d32/GPGHMACSHA1--0222fb5a96702da2e0e0d763f3893d0a97897d32 
annex/objects/000/32a/GPGHMACSHA1--0222fb5a96702da2e0e0d763f3893d0a97897d32/GPGHMACSHA1--0222fb5a96702da2e0e0d763f3893d0a97897d32: GPG symmetrically encrypted data (AES cipher)
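
To check the whole remote rather than a single file, one option is to look for object files whose names lack the `GPGHMACSHA1--` prefix seen in the listing above. A sketch, assuming the layout shown above (`annex/objects` below the bare repo); `list_plain_objects` is a hypothetical helper:

```shell
#!/bin/sh
# list_plain_objects REPO_DIR: print annex object files whose name does not
# carry the GPGHMACSHA1-- prefix used for encrypted keys.
list_plain_objects() {
    find "$1/annex/objects" -type f ! -name 'GPGHMACSHA1--*' -print
}

# Usage on the server, from inside the bare repo:
#   list_plain_objects .
```

No output means every stored object has an encrypted key name. It says nothing about the content itself; for that, `file` as used above remains the check.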

Troubleshooting

Find broken symlinks

$ find . -xtype l

or

$ find . -type l ! -exec test -e {} \; -print
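
In a git-annex working tree a broken symlink usually just means that the file's content is not present in this repository. To see what such a check reports, here is a throwaway sketch that builds one valid and one dangling link in a temporary directory (all file names are made up for the demo):

```shell
#!/bin/sh
# Build a scratch directory with one working and one dangling symlink.
demo=$(mktemp -d)
echo "content" > "$demo/present.txt"
ln -s present.txt "$demo/good-link"
ln -s missing.txt "$demo/broken-link"

# A symlink is broken when `test -e` fails on it, since test follows the link.
broken=$(cd "$demo" && find . -type l ! -exec test -e {} \; -print)
echo "$broken"        # prints ./broken-link

rm -rf "$demo"
```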

Garbage collection

Delete unused (annexed) data

$ git-annex unused 
unused . (checking for unused data...) (checking master...) 
  Some annexed data is no longer used by any files:
    NUMBER  KEY
    1       SHA256E-s1048577--dd8a6196a5a42dc394ed277191024ba51149167f2afd577557e29d4495ce107b.this-is-a-test-key
    2       SHA256E-s11--1fd9176b4dc46b02de28fc850c160d9d0bf71ebd3cddac52b83b288d73645d89
    3       SHA256E-s1048575--b877cbd76972eabf53837edf24af92f3567ff9dc6cc42c420f5ebbcb911d0ad5.this-is-a-test-key
    4       SHA256E-s2097152--be41ea1dc3c13e45848717d213bf64d11171f221b86be4b91c56baa17193ee6e.this-is-a-test-key
  (To see where data was previously used, try: git log --stat -S'KEY')

  To remove unwanted data: git-annex dropunused NUMBER


  Some partially transferred data exists in temporary files:
    NUMBER  KEY
    5       GPGHMACSHA1--5b029d5db5dde1c7e12e347580e732c00de22f6e

  To remove unwanted data: git-annex dropunused NUMBER

ok

Now you can drop the unused data:

$ git-annex dropunused 1-4
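
Before dropping anything it can help to know how much space is at stake. Key names of the form `BACKEND-sSIZE--HASH` carry the file size in bytes, so the listing can be summed up. A sketch that parses lines shaped like the output above; `unused_bytes` is a hypothetical helper, and keys without a size field (such as the `GPGHMACSHA1` one) are skipped:

```shell
#!/bin/sh
# unused_bytes: read `git-annex unused` output on stdin and print the total
# size in bytes of all listed keys that carry an -sSIZE-- field.
unused_bytes() {
    sed -n 's/^ *[0-9][0-9]* *[A-Z0-9]*-s\([0-9][0-9]*\)--.*$/\1/p' |
        awk '{ total += $1 } END { print total + 0 }'
}

# Usage:
#   git-annex unused | unused_bytes
```

For the four keys listed above this comes to 4194315 bytes, roughly 4 MiB.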

Delete unused remote

First you'll have to mark the repo as dead:

$ git-annex dead <remote name>

Then you'll have to forget the dead repo:

$ git-annex forget --force --drop-dead

And finally you can remove the remote using git:

$ git remote remove <remote name>

Published:
2016-06-16 00:00