Backup system progress (by alaric)
I mentioned an intent to build a backup system this Christmas. Well, I've not had two days to work on it yet; I've just had an hour and a half one evening, in which I've written the local-filesystem backend...
But last night I had a dream about it. In particular, I dreamt of visiting an old friend I've neglected to keep in touch with lately; and at her house she had a rather snazzy little mini-laptop, like an EeePC or something, which had an eSATA connector into which she plugged some humungous (in terms of gibibytes; it was the size of a double CD case, physically) external hard disk upon which she had a filesystem that seemed to work very much like Fossil/Venti, the original inspiration for my backup system - or perhaps like a giant personal Git repository.
In particular, one feature that cropped up in the dream was that the filesystem had a magic file on it which was an RSS feed of recent changes to files.
Which got me thinking about features for version 2 of my backup system (if I get to finish version 1, that is!). My focus is on offline use, for batched backups, but the filesystem in my dream was being used online.
For a start, we could have a log of changes to tags, as in my system, tags are the "roots" of the archive system. The creation of new tags, the deletion of tags, or the updating of a tag to point to a new snapshot would all be logged. This could then be used to create the RSS feed.
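Something along these lines, say; a minimal sketch, assuming the log is just a list of (timestamp, action, tag, snapshot id) tuples, which is purely my guess at a format for illustration:

```python
# Hypothetical sketch: turning a tag-change log into an RSS 2.0 feed.
# The log tuple format (unix time, action, tag, snapshot id) is assumed,
# not anything the system actually defines yet.
from email.utils import formatdate
from xml.sax.saxutils import escape

def log_to_rss(log_entries, archive_name="archive"):
    """log_entries: iterable of (unix_time, action, tag, snapshot_id)."""
    items = []
    for when, action, tag, snapshot in log_entries:
        title = escape("%s tag '%s'" % (action, tag))
        if action == "delete":
            desc = escape("Tag '%s' deleted" % tag)
        else:
            desc = escape("Tag '%s' now points at snapshot %s" % (tag, snapshot))
        items.append(
            "<item><title>%s</title><description>%s</description>"
            "<pubDate>%s</pubDate></item>" % (title, desc, formatdate(when)))
    return ('<?xml version="1.0"?><rss version="2.0"><channel>'
            "<title>Recent changes to %s</title>%s</channel></rss>"
            % (escape(archive_name), "".join(items)))
```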
Secondly, it shouldn't be too hard to write a FUSE interface to the thing for read-only access, presenting a root directory containing an RSS file generated from the log, along with a directory for each tag, which in turn contains a current subdirectory containing the current value of the tag, as well as dated subdirectories containing all the past contents of the tag, in ISO date format so they sort correctly. And perhaps an RSS file just listing the history of that tag.
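A rough sketch of that layout using the fusepy library; the archive accessor, and its list_tags, snapshot_dates, read_file and rss_for methods, are entirely made up for illustration:

```python
import stat
from fuse import FUSE, Operations  # fusepy

class ArchiveFS(Operations):
    def __init__(self, archive):
        # 'archive' is a hypothetical read-only accessor for the backup store
        self.archive = archive

    def getattr(self, path, fh=None):
        if path.endswith(".rss"):
            # feeds are presented as regular read-only files
            return dict(st_mode=stat.S_IFREG | 0o444, st_nlink=1,
                        st_size=len(self.archive.rss_for(path)))
        # a real implementation would ask the archive whether 'path' is a
        # file or a directory; the sketch treats everything else as a dir
        return dict(st_mode=stat.S_IFDIR | 0o555, st_nlink=2)

    def readdir(self, path, fh):
        if path == "/":
            # root: the global change feed plus one directory per tag
            return [".", "..", "changes.rss"] + self.archive.list_tags()
        # each tag: 'current', a per-tag feed, and ISO-dated past snapshots
        # (listing of deeper directories omitted for brevity)
        tag = path.lstrip("/").split("/")[0]
        return [".", "..", "current", "history.rss"] + \
            self.archive.snapshot_dates(tag)

    def read(self, path, size, offset, fh):
        # delegate actual content lookups to the archive accessor
        return self.archive.read_file(path)[offset:offset + size]

# Mounting it might then look like:
# FUSE(ArchiveFS(archive), "/mnt/backups", foreground=True, ro=True)
```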
But then the next cool thing would be to allow write access, by using a local disk directory as a staging area, so the current subdirectory can be written to (with the changes being spooled to the local disk). Then when a commit command is given, those changes are merged into current. Which would require a new underlying operation to merge changes in, rather than taking a new snapshot; the difference being that any files or directories missing in the directory tree being snapshotted are 'inherited' from a previous snapshot already in the system, with some mechanism to reflect deletions.
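In the simplest terms, treating a snapshot as a mapping from paths to block identifiers (just an assumption for the sketch), the merge might boil down to:

```python
# Sketch of the 'merge' operation: entries present in the staging area win,
# anything untouched is inherited from the previous snapshot, and an explicit
# deletion marker (a 'whiteout') removes an inherited entry. Representing
# snapshots as dicts of path -> block id is my assumption for illustration.
DELETED = object()  # whiteout marker recorded by the staging layer

def merge_snapshot(previous, staged):
    """previous, staged: dicts mapping path -> block id (or DELETED)."""
    merged = dict(previous)          # start by inheriting everything
    for path, block in staged.items():
        if block is DELETED:
            merged.pop(path, None)   # deletion marker hides the old entry
        else:
            merged[path] = block     # new or changed content wins
    return merged
```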
Anyway, unrelated to the dream, it also occurred to me that it'll be neat to support replicated archives; my implementation of the backend architecture will make it easy to take a set of backend instances and merge them into one, with every write going to all of them and reads serviced from the first one that succeeds. It'll also be easy, with another backend adapter, to support staged archives, where a number of read-only backends are checked for existing blocks, but all new blocks go to a nominated writable backend. That will allow for generational backup systems, where a local disk is filled up with backups until it reaches a size limit, whereupon its contents are shipped off to a DVD writer (keeping a local cache of which blocks were on there, so the DVD need not be put in unless the contents of blocks are actually needed).
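As a sketch of those two adapters, assuming a minimal backend interface of put(key, block) and get(key) returning a block or None (the real interface will doubtless differ):

```python
class ReplicatedBackend:
    """Every write goes to all backends; reads come from the first that succeeds."""
    def __init__(self, backends):
        self.backends = backends

    def put(self, key, block):
        for b in self.backends:
            b.put(key, block)

    def get(self, key):
        for b in self.backends:
            block = b.get(key)
            if block is not None:
                return block
        return None

class StagedBackend:
    """Existing blocks are found in read-only generations; new blocks go to one writable backend."""
    def __init__(self, readonly, writable):
        self.readonly = readonly
        self.writable = writable

    def put(self, key, block):
        # only write if no earlier generation already holds this block
        if self.get(key) is None:
            self.writable.put(key, block)

    def get(self, key):
        for b in self.readonly + [self.writable]:
            block = b.get(key)
            if block is not None:
                return block
        return None
```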
But, all idle speculation for now. I still need another day and three quarters to implement the core of the thing...