Ugarit update

Jean woke us up at 6am today, wanting to be in Mummy and Daddy's Bed, but I couldn't get back to sleep after this so lay there thinking about things.

I was pondering the cache-deletion issue in Ugarit that I mentioned before.

The cache in Ugarit serves an important function: in order to know whether it needs to upload a block of data or not, Ugarit first checks to see if a block with the same Tiger hash already exists. This operation is performed on everything in your entire filesystem every time you do a snapshot (although I'm scheming up some optimisations to avoid this, based on caching the hashes of files by their modtime and all that). If you have a slow remote archive, such as one in S3, then each check for the existence of a block might easily take a tenth of a second or more - and since the largest filesystem I need backing up will contain at least a million blocks, that'd be over a day spent just checking what blocks already exist. So the cache contains a list of known-existing blocks, stored in a B-Tree on the local filesystem. Whenever a block is uploaded, it's added to the B-Tree, as we know it exists in the archive. When a check for the existence of a block comes in, we check the cache; if we see it, then we know instantly that the block exists. Otherwise we check the archive, and if the block is found, we note this fact in the cache so that we can use it in future. And, of course, if we are asked to unlink a block, we pass that request to the archive, and if it reports that it finally deleted the block rather than just decrementing its reference count, we remove the block's hash from the cache.
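
To make that concrete, here's a minimal sketch of the existence check in Scheme - not Ugarit's actual code - with an in-memory SRFI-69 hash table standing in for the on-disk B-Tree, and a hypothetical backend-exists? procedure standing in for the real archive check:

```scheme
;; Minimal sketch of the caching idea, not Ugarit's actual code.
;; An in-memory SRFI-69 hash table stands in for the on-disk B-Tree,
;; and backend-exists? is a hypothetical "ask the real archive" procedure.
(use srfi-69)   ; Chicken 4 style import of SRFI-69 hash tables

(define (make-caching-exists? backend-exists? cache)
  (lambda (key)
    (cond
      ((hash-table-ref/default cache key #f) #t)   ; cache hit: known to exist
      ((backend-exists? key)                       ; cache miss: ask the archive
       (hash-table-set! cache key #t)              ; remember the answer
       #t)
      (else #f))))                                 ; genuinely absent: caller uploads it
```

The unlink path would do the mirror image: pass the request through to the archive, and hash-table-delete! the key if the archive reports the block really went away.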

However, if you have multiple computers sharing a single archive - which is a perfectly sensible thing to do, as the shared archive means that all the different filesystems being backed up to it will share copies (and thus not waste upload bandwidth) of files that appear in more than one of them (such as most of everything outside of /var, /home, and */etc on a NetBSD system) - then deletion with caches poses an issue: if you delete a file and update your local cache, but somebody else is also using a cache on the same archive, then their cache will not know about the deletion. This is dangerous, since it means their cache will then falsely report the existence of a block that's not actually in the archive, which means the contents of that block won't be uploaded - and since it was deleted from the archive, that block won't be backed up anywhere. Danger, danger!

But as I lay there thinking, a solution came to me.

I should make the cache backend maintain an intent log of deletions it proposes to make. When this log is about to become too big to fit in a block itself, it should:

  1. Upload it to the storage backend
  2. Request the key of the block pointed at by a special **DELETION CHAIN HEAD** tag
  3. Upload a small block containing that key, and the key of the block full of deleted block keys.
  4. Update the **DELETION CHAIN HEAD** tag to point to the new block
  5. Process the actual deletions

That way, it keeps a log of deleted references in the archive, and makes sure that log is safely recorded before doing any actual deletes.
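
A rough sketch of that flush step, under assumed primitives (put-block!, get-tag, set-tag!, really-delete!, serialise) that merely stand in for whatever the real storage API ends up looking like:

```scheme
;; Rough sketch only: put-block!, get-tag, set-tag!, really-delete! and
;; serialise are assumed stand-ins, not Ugarit's real API.
(define (flush-deletion-log! archive pending-keys)
  (let* ((log-key  (put-block! archive (serialise pending-keys)))              ; 1. upload the log
         (old-head (get-tag archive 'deletion-chain-head))                     ; 2. current chain head
         (new-head (put-block! archive (serialise (list old-head log-key)))))  ; 3. small link block
    (set-tag! archive 'deletion-chain-head new-head)                           ; 4. advance the tag
    (for-each (lambda (key) (really-delete! archive key)) pending-keys)))      ; 5. now really delete
```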

Then the cache just needs to store a local copy of the **DELETION CHAIN HEAD** tag. When it starts up (or wants to do a periodic check) it can fetch the one from the archive; if they differ, then it should follow the chain of deletion blocks, removing them from the cache, until it reaches the deletion block with the key it has stored, or the end of the list, at which point it can stop and update its local copy of the tag.
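
The catch-up walk on the cache side might look something like this sketch, using the same assumed primitives plus get-block and deserialise, and an SRFI-69 hash table as before:

```scheme
;; Sketch of the cache catching up with deletions made elsewhere.
;; Chain blocks are assumed to deserialise to (previous-head log-key),
;; matching the flush sketch above; get-tag/get-block return #f when absent.
(define (catch-up-cache! archive cache local-head)
  (let ((remote-head (get-tag archive 'deletion-chain-head)))
    (let loop ((head remote-head))
      (unless (or (not head) (equal? head local-head))
        (let* ((entry   (deserialise (get-block archive head)))
               (deleted (deserialise (get-block archive (cadr entry)))))
          (for-each (lambda (key) (hash-table-delete! cache key)) deleted)
          (loop (car entry)))))
    remote-head))   ; the caller stores this as its new local copy of the tag
```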

There are still potential races - another deletion operation running in parallel might race over the **DELETION CHAIN HEAD** tag, although I've tried to keep only a very small block upload within the window between getting and setting the tag - so I've added tag-lock! and tag-unlock! primitives to the storage engine API, to avoid that entirely.

More excitingly, if a deletion is running in parallel with a snapshot, then the cache being used by the snapshot might not realise a block has been deleted, and will happily report straight away that it still exists.

Perhaps I need to extend tag-lock! and tag-unlock! to be a full-fledged semaphore system, so I can maintain archive invariants, such as that there may be N snapshots running or 1 deletion, like a read/write lock. But I don't like locks - doing it without locks would be much better!

Currently, the archive storage engine API won't quite allow the intent log, anyway, since it just has an unlink! function that returns false if the block still has references, and returns the contents of the block if it does not (so that the caller can then recursively unlink! everything mentioned from within that block). So there's no easy way of asking the storage engine we're wrapping the cache around whether an unlink! would delete a block without actually deleting it. But, we can make do without, at the cost of a little less safety; we can instead make the cache store an after-the-fact log of deleted block keys, and just upload them when it gets full.
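
That weaker variant could be a thin wrapper around unlink!, something like this sketch (note-deleted-key! being an assumed hook that appends to the log and flushes it, minus step 5 above, once it reaches roughly a block's worth):

```scheme
;; Sketch of the after-the-fact deletion log: wrap the backend's unlink!
;; and record keys that were actually deleted. note-deleted-key! is an
;; assumed hook, not part of the real API.
(define (make-logging-unlink! backend-unlink! note-deleted-key!)
  (lambda (key)
    (let ((contents (backend-unlink! key)))  ; #f = references remain
      (when contents                         ; otherwise: the block really went away
        (note-deleted-key! key))
      contents)))                            ; pass the contents back for recursive unlinking
```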

So, I'm still not sure if we need the complexity of safe deletions. Are frequent deletions actually a good idea anyway? The neat thing about a content-addressed store is that it does work well as a continual archive, as it essentially stores differences. I decided to implement deletion since I know there will be situations where the thought of freeing up a hundred gigabytes will be more inviting than having a year's snapshots from a decade ago lying around; if I don't implement deletion, then users will forever be pestering me about it. So perhaps I should just leave deletion in as is, along with a warning (automatically output when a cache spots its first deletion request of a session) to the user that doing a deletion will invalidate any other caches around the same archive on different machines.

Still, one cheery thought struck me: if you're running a snapshot, and your computer crashes, then you can just start the snapshot again. We'll only update the snapshot tag when a snapshot is completed, so you won't get a partial snapshot; but when you start again, all the blocks you'd already uploaded will not need to be uploaded again, so it'll just pick up about where it left off last time. Excellent!

Backup system progress update

I forgot to mention before... I chose a name for it. Ugarit, after the site of the oldest human archive yet found.

Anyway, I had another few hours at it last night (I think my total's up to about a working day now; one more working day to go before I know if my two day estimate was correct!).

I've implemented two fundamental abstractions. In a limited-block-size, reference-counted, content-addressed store, a large file has to be stored as lots of chunks, with the keys of the chunks stored in one or more indirect chunks full of keys. If there are too many keys to fit in one indirect chunk, then you need lots of indirect chunks, and in turn one or more doubly-indirect chunks to store the keys of THEM in, until you finally have a single chunk at the top of a big tree, whose key you can use to refer to the file.
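
To give a feel for the shape of that tree, here's a throwaway sketch (with made-up numbers, not Ugarit's real chunk or key sizes) that works out how many levels of indirection a file needs:

```scheme
;; Throwaway illustration only: how many levels of indirect chunks sit
;; above the data chunks, for a hypothetical fan-out.
(define (indirect-levels data-chunks keys-per-indirect)
  (let loop ((n data-chunks) (levels 0))
    (if (<= n 1)
        levels
        (loop (ceiling (/ n keys-per-indirect)) (+ levels 1)))))

;; e.g. a million data chunks with, say, 2000 keys per indirect chunk:
;; (indirect-levels 1000000 2000) => 2
;; i.e. a layer of indirect chunks, topped by a single doubly-indirect chunk.
```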

Reference counting complicates this. As you write the file's chunks, some may already exist in the archive, and some might not. For those that already exist, you need to bump their reference count, so we know they're being used; for those that don't, you upload the actual content to create them.

But if you reuse all of the blocks of a file - because it's not changed - then you'll find you can reuse the indirect blocks, too, since they won't have changed. In which case you mustn't increment the refcounts of the data blocks, since they're still only referenced from the one indirect block, even though that indirect block now has two references.

So, what I've done is to implement a "key stream writer" API. You create a key stream writer and feed keys into it, but along with every key, you feed a 'reused?' flag which is true if the key is a reused one rather than a new one.

At the end, you finish the key stream writer, and it returns you a key for the entire tree, plus a reused? flag.

And it does all the magic internally, incrementing the refcounts on reused keys passed to it if and only if it creates a brand new indirect block referring to that key, and correctly handling refcounts on any indirect blocks it uses. It does this with a lovely recursion; the key stream writer accepts keys into a buffer and flushes the buffer out to a key stream indirect block when it gets full, at which point it creates a parent key stream writer to manage the next "level" up in the tree, writing the key of the newly created indirect block to that writer. I'm quite taken aback by how simple and elegant the code is, for what it does 🙂
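
Here's a condensed sketch of how that recursion might look - not the real code. The primitives are assumptions: put-block! stores a block and returns a (key . reused?) pair saying whether it already existed, add-reference! bumps an existing block's refcount, and serialise turns a list of keys into block data.

```scheme
;; Condensed sketch of the key-stream-writer idea; edge cases (such as an
;; empty stream) are glossed over, and the primitives are assumed.
(define (make-key-stream-writer max-keys-per-block)
  (let ((buffer '())      ; (key . reused?) pairs, newest first
        (parent #f))      ; writer for the next level up, created on demand
    (define (flush!)
      (let* ((entries  (reverse buffer))
             (indirect (put-block! (serialise (map car entries)))))
        (set! buffer '())
        ;; only a brand-new indirect block creates new references to its children
        (unless (cdr indirect)
          (for-each (lambda (e) (when (cdr e) (add-reference! (car e)))) entries))
        (unless parent
          (set! parent (make-key-stream-writer max-keys-per-block)))
        ((parent 'write!) (car indirect) (cdr indirect))))
    (define (write! key reused?)
      (set! buffer (cons (cons key reused?) buffer))
      (when (>= (length buffer) max-keys-per-block) (flush!)))
    (define (finish!)
      (if (and (not parent) (= (length buffer) 1))
          (car buffer)                          ; single key: no indirection needed
          (begin
            (when (pair? buffer) (flush!))
            ((parent 'finish!)))))              ; recurse upwards; yields (key . reused?)
    (lambda (msg)
      (case msg ((write!) write!) ((finish!) finish!)))))
```

In this sketch, a stream of exactly one key is handed straight back, so a single-chunk file costs no indirect blocks at all.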

Then, on top of that, I implemented an sexpr-stream writer, which accepts sexprs (Lisp data records, in effect) and buffers them, writing them to the archive when there's too much for a single block, and using a key stream writer to track the blocks written. This is to be used to implement directories, which will be a list of records giving details of all the files in the directory. The sexpr stream writer needs to know about keys mentioned in the sexprs, so it can do the same refcount-tracking magic to handle sharing correctly. Luckily, in spite of those requirements, it too works beautifully, and the code is just as elegant!

Then I just need to implement storing files (which should be quite easy; split the file into chunks, check to see if each chunk is already in the archive and upload if not, write keys to key stream), then I can implement storing directories (recursively store files and directories, use an sexpr stream writer to log each file created). Directory storage will need a few cases handled, since I plan to store funny filesystem objects like FIFOs, device specials, symlinks, and so on.
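
As a sketch of how that file-storing step might drive the key stream writer (content-hash, block-exists? and upload-block! being assumed stand-ins for the real hashing and storage calls):

```scheme
;; Sketch only: content-hash, block-exists? and upload-block! are assumed
;; stand-ins; refcounts for reused chunks are left to the key stream writer,
;; as described above. read-string is from Chicken 4's "extras" unit
;; (or (chicken io) in Chicken 5).
(define (store-file! port chunk-size)
  (let ((writer (make-key-stream-writer 256)))   ; 256 keys per block: arbitrary here
    (let loop ()
      (let ((chunk (read-string chunk-size port)))
        (unless (or (eof-object? chunk) (equal? chunk ""))
          (let ((key (content-hash chunk)))
            (if (block-exists? key)
                ((writer 'write!) key #t)         ; reused chunk: just record the key
                (begin
                  (upload-block! key chunk)       ; brand-new chunk: upload the data
                  ((writer 'write!) key #f)))
            (loop)))))
    ((writer 'finish!))))    ; => (key . reused?) referring to the whole file
```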

Then I'll need the main driver that does a snapshot of a directory tree, then creates a snapshot object with metadata in and links it into the snapshot chain for a chosen tag.
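
Purely for illustration (the metadata format isn't pinned down yet, so every field name here is invented), a snapshot object might end up as an sexpr along these lines, with its key then hung off the chosen tag's chain:

```scheme
;; Hypothetical shape only - field names and values are made up:
(snapshot
  (previous "key-of-prior-snapshot-or-nothing")   ; links snapshots into a chain per tag
  (root     "key-of-top-level-directory-stream")  ; the directory tree just stored
  (hostname "my-machine")
  (time     "2008-12-28T06:00:00Z"))
```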

I reckon that lot will take me half of the next day - then I'll implement deletion (I've been implementing primitive deletion operators for key streams and sexpr streams as I go along, so this won't take long) and extraction (again, I've written fold operators for key and sexpr streams, so this won't take long), then I'll have the core done!

What will remain then? A command-line wrapper with decent option parsing and command line ops to do a snapshot to a tag, retrieve all or part of a snapshot, list snapshots on a tag, delete a snapshot, and delete a tag, which should just be a matter of plumbing. Then I'll be done!

I still think I'm on for two days... but mainly because I'm punting on writing an actual S3 backend for now. That'll take a while, dealing with fiddly Web Services authentication stuff!

Other things I ought to get around to doing at some point are:

  1. LZMA compression. For now, I'm using a deflate implementation for Chicken Scheme that I found, but I would like to wrap liblzma as a Chicken egg, since it's thought to have tighter compression.
  2. Encryption. Right now I have a stub in the code for encrypt/decrypt operations, applied to the compressed blocks, but for now it just returns the block unchanged. I need to either use Chicken's libcrypt egg (which looks complicated) or find a lightweight implementation of a good cipher that I can wrap as an egg.
  3. More storage backends.
    1. A replication backend that wraps a list of backends, doing writes and deletes to all of them, and servicing reads from the first backend that succeeds in providing a block (see the sketch after this list). That way I could back up to a local disk and to S3, and restores would come from the local disk if possible, or from S3 if not. Even if I lost part of the local disk in a filesystem corruption incident, anything I lost would still be available from S3.
    2. A generational backend, that wraps a list of backends. Writes are only done to the first, but reads (and deletes) are serviced from the first one that responds. That way, I can fill up one hard disk with archives, then start filling another, and so on, building up a chain of archive partitions.
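
A rough sketch of the replication wrapper, assuming for illustration that a backend is just a procedure taking an operation symbol plus arguments (the real storage engine API will differ):

```scheme
;; Illustration only - assumes a backend is (lambda (op . args) ...),
;; which is not necessarily how the real storage engine API looks.
(define (make-replicated-backend backends)
  (lambda (op . args)
    (case op
      ((put! unlink!)                        ; writes and deletes go to every backend
       (let loop ((bs backends) (result #f))
         (if (null? bs)
             result
             (loop (cdr bs) (apply (car bs) op args)))))
      ((get exists?)                         ; reads: first backend that comes up with an answer
       (let loop ((bs backends))
         (cond ((null? bs) #f)
               ((apply (car bs) op args) => (lambda (answer) answer))
               (else (loop (cdr bs)))))))))
```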

Care must be taken when doing complex things with backends, though, in order to preserve the correctness of reference counts. A group of archives that are used together by replication or generations will have refcounts that are correct across the whole group, not necessarily individually within each component archive.

More forebodingly, my existing cache backend wrapper won't mix well with deletion of snapshots when the same backend is shared between multiple machines, as it basically caches the existence of blocks; but if a block is deleted by another system, the local cache won't know, and will incorrectly report that the block is already known. Which means it won't get uploaded to the archive, which means data could be lost.

For now I'll recommend that people who want to delete from shared repositories don't use the cache, or flush the caches after a deletion, but in the long run I'd like to think about how to improve the cache. Perhaps by logging deleted keys to a special area in the backend, and when a cache starts up, it can fetch the log and look for any deletions it doesn't already have. Or just compress the entire cache file and upload it when closing the connection, and download it when starting a connection, as long as no two cache-using machines are snapshotting at the same time...

Backup system progress

I mentioned an intent to build a backup system this Christmas. Well, I've not had two days to work on it yet; I've just had an hour and a half one evening, in which I've written the local-filesystem backend...

But last night I had a dream about it. In particular, I dreamt of visiting an old friend I've neglected to keep in touch with lately; and at her house she had a rather snazzy little mini-laptop, like an EeePC or something, which had an eSATA connector into which she plugged some humungous (in terms of gibibytes; it was the size of a double CD case, physically) external hard disk, upon which she had a filesystem that seemed to work very much like Fossil/Venti, the original inspiration for my backup system - or perhaps like a giant personal Git repository.

In particular, one feature that cropped up in the dream was that the filesystem had a magic file on it which was an RSS feed of recent changes to files.

Which got me thinking about features for version 2 of my backup system (if I get to finish version 1, that is!). My focus is on offline use, for batched backups, but the filesystem in my dream was being used online.

For a start, we could have a log of changes to tags, as in my system tags are the "roots" of the archive system. The creation of new tags, the deletion of tags, or the updating of a tag to point to a new snapshot would all be logged. This could then be used to create the RSS feed.

Secondly, it shouldn't be too hard to write a FUSE interface to the thing for read-only access, presenting a root directory containing an RSS file generated from the log, along with a directory for each tag, which in turn contains a current subdirectory containing the current value of the tag as well as dated subdirectories containing all the past contents of the tag, in ISO date format so they sort correctly. And perhaps an RSS file just listing the history of that tag.

But then the next cool thing would be to allow write access, by using a local disk directory as a staging area, so the current subdirectory can be written to (with the changes being spooled to the local disk). Then when a commit command is given, those changes are merged into current. Which would require a new underlying operation to merge changes in, rather than taking a new snapshot; the difference being that any files or directories missing in the directory tree being snapshotted are 'inherited' from a previous snapshot already in the system, with some mechanism to reflect deletions.

Anyway, unrelated to the dream, it also occurred to me that it'll be neat to support replicated archives; my implementation of the backend architecture will make it easy to take a set of backend instances and merge them into one, with every write going to all of them and reads serviced from the first one that succeeds. It'll also be easy to support staged archives, where a number of read-only backends are checked for existing blocks, but all new blocks go to a nominated writable backend, with another backend adapter. That will allow for generational backup systems, where a local disk is filled up with backups until it reaches a size limit, whereupon its contents are shipped off to a DVD writer (keeping a local cache of what blocks were on there, so the DVD need not be put in unless the contents of blocks are actually needed).

But, all idle speculation for now. I still need another day and three quarters to implement the core of the thing...

Design patterns

I thought I'd had this little rant before, but when I went to find the post to link to it from a big ranty post I'm working on, I couldn't find it.

So here it is.

My opinion of design patterns.

Design patterns are common structures that appear in computer programs, but that can't themselves be abstracted into a reusable component.

That's fair enough. I have the "gang of four" book of design patterns that started the craze, and I think they're all good design patterns in mainstream OO languages; things that can't actually be encapsulated as a class or whatever, but that keep cropping up.

However, the problem is that the patterns community has somehow ended up with some dogmatic folks in it. Who take phrases such as "Design patterns are common structures that appear in computer programs, but that can't themselves be abstracted into a reusable component" - and then attack anybody who claims to have encapsulated a design pattern into a reusable component.

Because design patterns are really only valid in relation to a set of languages. For example, in assembly language (or even traditional BASIC with GOSUB), procedure calls are a design pattern. There is no language construct for a procedure; instead, you have a GOSUB or CALL to a given address, that explicitly saves the calling location somewhere and then jumps, and a RETURN to go back to that saved location. The GOSUB...RETURN pairing is put in there by the programmer; it's a design pattern.

But then higher-level languages have explicit procedure abstractions, and all the fun they entail.

Even object orientation is a design pattern in languages like C. "Put function pointers in a struct, and call them with the struct pointer passed as the first argument". Or include a pointer to a struct of function pointers in every data struct, and call it a virtual method table. But go to Java and that's an explicit language abstraction facility.

The gang-of-four book is taken as the authoritative book of common design patterns (although it is, in my opinion, a book of design patterns for mainstream OO languages). And design patterns are things that cannot be expressed as native abstractions in code. THEREFORE, fools think, it's impossible to write a design pattern in code. And anybody who claims to have done so needs shouting at.

Sadly, as demonstrated in Peter Norvig's presentation on design patterns in dynamic languages, more advanced languages can express abstractions that others can't, and there are plenty of more advanced languages than C++ and Java.

In fact, I think that the appearance of design patterns in a language is probably a hint that the language needs to be more powerful - and suggests how to make it so. What sort of abstraction would make it possible to encode your design patterns?

And so, as I read "Modern C++ Design" by Andrei Alexandrescu (a book on doing good things with C++ templates), I come across the following phrase:

Eventually, Andrei turned his attention to the development of template-based implementations of popular language idioms and design patterns, especially the GoF [gang of four] patterns. This led to a brief skirmish within the Patterns community, because one of their fundamental tenets is that patterns cannot be represented in code. Once it became clear that Andrei was automating the generation of pattern implementations rather than trying to encode patterns themselves, the objection was removed...

Bleargh. It sucks when people get all religious about the interpretation of a given piece of "scripture". Even when people manage to work around it by using illogical arguments ("No, no, it's not a representation of the pattern in code, it's a bit of code that automates the creation of implementations of the pattern..."). Sadly, in other realms, this kind of thing can lead to people being burnt at the stake.

Sump plug woes

It's time I changed the oil (and the oil filter) in the van, since it's been over a year now.

So I bought myself some new oil, and a new oil filter, and prepared to do the deed according to the instructions in my Haynes manual.

The first step is removing the plug in the bottom of the oil sump, that lets the oil out. It looks like this:

The sump and its drain plug

Naturally, when you remove it, lots of hot oil comes running out (you do this with the engine warm so the oil flows well). So you do it with a bucket underneath!

However, I've not gotten as far as that step yet, because the head of the sump plug (which is hexagonal, like a bolt) is rounded. You can see in the pictures how the corners of the hexagon have all come off. They're shiny where I've been trying to grab them with a spanner, which just rotates when I apply enough force... and it doesn't take very much force.

A closeup of the sump plug; another closeup of the sump plug; a direct side view of the sump plug

So I went out and bought a special thing for undoing damaged nuts and bolts. It'll make a mess of the sump plug, so I bought a new sump plug, too. The magical nut remover looks like this:

The magical nut remover

The spiral grooves inside are rifled so that when you put it onto a nut and turn it (such as with the big adjustable spanner in the background), the points grip into the metal of the nut, and the spiral pulls the socket onto the nut as it twists it out, so that it can't push the socket off. Most stripped nuts and bolt heads are rounded towards the top, so tend to push spanners and sockets off as they are twisted; this thing pulls itself onto the nut with the spiral shape, thus counteracting this.

However, my sump plug bolt head is already far enough gone that the special socket rotates on it (shaving off little curls of shiny metal as it goes) when I turn it, again with surprisingly little force - but the next size of socket down doesn't fit onto it...

Perhaps the sump plug was made out of lead or aluminium or something?!?

I guess the next thing to try would be getting under there with a file and shrinking the head down to the next size by putting new flats on it.

But, since it's freezing outside and I've nowhere inside to work on the van, and we need the van ready to do a long drive in a couple of days (which I'd rather not do with the current black oil in, in this punishing weather), I think I'm going to take it to a garage and ask them to sort it out for me... I can give them the oil, the oil filter and the replacement sump plug I've already bought and just ask them to do the hard part!
