Category: Computing

Ugarit archive mode manifest maker

When I last wrote about Ugarit progress, I had developed archive mode to the point where one could import a list of files with metadata from a "manifest file", and then search for files based on the metadata from the manifest and stream out chosen files. I gave an example of using this to play MP3s matching a search pattern:

[alaric@ahusai ugarit]$ for i in `ugarit search test.conf music '(= ($ artist) "UNKLE")' keys`;
do ugarit archive-stream test.conf music $i | mpg123 -;
done

Well, that was all based on hand-written manifest files, which are no fun to produce (our music collection is large). As such, I've been working on a "manifest maker" that takes a list of files and directories and makes a manifest file from them, recursing down through directories to list all the files. And for each, it automatically extracts metadata into the manifest file, which can then be hand-edited if required, and then used to import from.
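To make the recursion step concrete, here is a minimal Python sketch of the directory-walking part only. It is not Ugarit's actual code: the function names (`quote`, `list_files`, `manifest_entry`, `make_manifest`) and the string-escaping rule are assumptions for illustration, and it emits only the metadata that can be read from the filesystem.

```python
import mimetypes
import os


def quote(s):
    """Quote a string for the manifest (this escaping rule is an assumption)."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def list_files(paths):
    """Yield every file under the given files and directories, recursively."""
    for path in paths:
        if os.path.isdir(path):
            for root, dirs, files in os.walk(path):
                dirs.sort()  # make the traversal order deterministic
                for name in sorted(files):
                    yield os.path.join(root, name)
        else:
            yield path


def manifest_entry(path):
    """Build one (object ...) form from filesystem metadata only."""
    st = os.stat(path)
    lines = ["(object " + quote(path),
             "  (filename = " + quote(os.path.basename(path)) + ")"]
    mime, _ = mimetypes.guess_type(path)
    if mime is not None:
        lines.append("  (mime-type = " + quote(mime) + ")")
    lines.append("  (mtime = %r)" % float(st.st_mtime))
    lines.append("  (ctime = %r)" % float(st.st_ctime))
    lines.append("  (size = %d))" % st.st_size)
    return "\n".join(lines)


def make_manifest(paths):
    """Return manifest text for all files found under the given paths."""
    return "\n\n".join(manifest_entry(p) for p in list_files(paths))
```

Calling `print(make_manifest(["./test-data"]))` would print one `(object ...)` form per file found; the result could then be hand-edited before importing.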

The idea is that the manifest maker will have support for a number of file types it knows how to extract additional metadata from, and the first one I've implemented is ID3 tag extraction from MP3s. I've implemented the ID3 V2.2 and ID3 V2.3 specs, as those were the two that I found present in the subset of my MP3 collection I'm testing against!
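As a rough illustration of what reading those two tag versions involves, here is a Python sketch that splits an ID3 v2.2 or v2.3 tag into frames. It is a simplification rather than the manifest maker's real code: the function names are made up, and it assumes no extended header, no unsynchronisation, and no compressed or encrypted frames.

```python
def synchsafe(b):
    """Decode a 4-byte synchsafe integer (7 useful bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def decode_text(payload):
    """Decode a text frame body: one encoding byte, then the string."""
    if not payload:
        return ""
    encoding, body = payload[0], payload[1:]
    if encoding == 1:
        text = body.decode("utf-16", errors="replace")  # BOM-prefixed UTF-16
    else:
        text = body.decode("latin-1")                   # ISO-8859-1
    return text.rstrip("\x00")


def read_id3v2_frames(data):
    """Return a list of (frame_id, payload) pairs from an ID3 v2.2/v2.3 tag."""
    if len(data) < 10 or data[:3] != b"ID3":
        return []
    major = data[3]
    if major not in (2, 3):
        return []  # other versions are out of scope for this sketch
    end = min(len(data), 10 + synchsafe(data[6:10]))
    # v2.2 frames: 3-byte ID, 3-byte size. v2.3: 4-byte ID, 4-byte size, 2 flag bytes.
    id_len, size_len, flag_len = (3, 3, 0) if major == 2 else (4, 4, 2)
    header_len = id_len + size_len + flag_len
    frames = []
    pos = 10
    while pos + header_len <= end:
        frame_id = data[pos:pos + id_len]
        if frame_id[0] == 0:
            break  # zero bytes mean we have reached the padding
        size = int.from_bytes(data[pos + id_len:pos + id_len + size_len], "big")
        start = pos + header_len
        frames.append((frame_id.decode("latin-1"), data[start:start + size]))
        pos = start + size
    return frames
```

Frames whose IDs begin with `T` (such as `TIT2` for the title in v2.3, or `TT2` in v2.2) are text frames that `decode_text` can turn into strings; anything unrecognised could be reported as a comment for a human to inspect.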

For example, here's the output it produced for one of my MP3s:

(object "./test-data/THE HOLLIES - He Ain't Heavy, He's My Brother.mp3"
  (filename = "THE HOLLIES - He Ain't Heavy, He's My Brother.mp3")
  (mime-type = "audio/mpeg")

   ;; Unknown ID3 tag "COMM"="engiTunNORM\x00 00000402 00000000 00001B59 00000000 00004E65 00000000 000040EC 00000000 00015FD5 00000000"
  (keyword = "Pop")
  (name = "He Ain't Heavy, He's My Brother")
  (creator = "THE HOLLIES")
  (creation-date = "2002")
  #;(featuring = "")
  (collection-name = "Legends CD2")
  #;(collection-volume = "")
  #;(collection-volumes = "")
  (volume-index = 16)
  (volume-size = 18)

  (mtime = 1428948696.0)
  (ctime = 1428948696.0)
  (size = 4063360))

It prints out unknown ID3 tags as comments, in case a human can glean some useful information from them to put into the metadata. It also suggests, as commented-out entries, the names of metadata tags it hasn't found but that I might be able to provide by hand: in this case, a tag for other people featured in the music, and two for indicating that this album is part of a set. As it happens, it is, as the "CD2" in the name suggests, but that wasn't indicated in the ID3, so I'll have to hand-edit it. Likewise, the date of 2002 from the MP3 is clearly for the production of the album, not that classic track... ID3 metadata is often a bit shabby! Also included are the file's mtime, ctime, and size in bytes.

I hope to add Ogg Vorbis metadata next; I'd like to add EXIF support to parse information out of the JPEGs in our vast family photo library, but it looks much harder, and I'm not sure how useful it will actually be!

How I’m managing my life with emacs org-mode

It's no secret that I'm a busy person; often, when I decide to do something, a few years pass before I actually get to do it. So the only way I keep afloat is by the judicious use of computers to track my task lists. I rely on automatic systems to make sure I always know what I need to do today, and what are the most important things I need to do "sometime" that I can do right now. There's no way I could keep all that in my head without forgetting about things and letting people down, or feeling stressed because I'm juggling too much in my mind, or not being able to find the best thing to do when I have a free moment.

As I've mentioned before on here, I want a personal information management system based on predicate logic, so I can express complex relationships between things easily, and tell the system how to infer knowledge automatically. However, "build one of those" has yet to hit the top of my TODO list, so for now I'm using emacs' legendary org-mode. This lacks the rich semantic power of my proposed PIM, but it's already implemented and has a nice editing interface 🙂

A few people have asked about what I've done, so here's my attempt to document it.


Goal-based artificial intelligence for home automation (and maybe piloting a network of autonomous killbots)

For a while, I've been mulling the idea of writing zmiku, a daemon that can be programmed to automatically control various kinds of systems. My application is home automation, and maybe automating the management of servers (restarting and failing over services, dealing with overload situations, gracefully handling disks being full, that sort of thing); but it occurs to me that the same basic problem also applies to controlling autonomous robots such as space probes, industrial processes, and that sort of thing. A good solution to all these problems in one would be quite useful!

You might say that this is a non-problem; normally, people would just write programs from scratch to control these kinds of things, sitting in a loop reading inputs and updating state variables and choosing what output actions to generate, but the complexity of the resulting program tends to increase rapidly as the problem complexity rises.

Rather than a traditional programming language, a better notation for such a reactive system is a state machine. The Wikipedia article on UML state machines gives a good introduction to one version of this notation, including some discussion of ways to extend the most basic version to increase its expressiveness and modularity.

I'd like to base zmiku on a textual version of the UML statecharts, but today I've had a horrible stomach ache, so I've been unable to do much more than lie around and think about stuff, and what my mind settled on was the interesting question of how to integrate state machines with goal-based programming, which is also useful for controlling complex systems. In a goal-based system, various goals are known to the system, each with a priority; for instance, a flying robot may have a destination demanded by the user, which the navigation system tries to fly the robot towards; but a collision-avoidance system may sometimes override the navigation system when it detects that a collision will result otherwise, with a higher-priority goal for the steering system. And when the collision has been avoided, that goal will disappear, and the earlier goal of getting to the destination will take over once more. And if the robot's batteries are running low, then flying towards a charging place (or a place where the solar panels are in sunlight) might be a higher priority than the user's chosen destination, but not a higher priority than avoiding collisions. And so on.
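The flying-robot example can be sketched in a few lines of Python. This is only an illustration of priority-based goal selection, not zmiku's design: the `Goal` class, the state keys, and the numeric priorities are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Goal:
    name: str
    priority: int                   # a larger number wins
    active: Callable[[Dict], bool]  # is this goal relevant in the current state?


def select_goal(goals, state):
    """Return the highest-priority goal that is currently active, or None."""
    candidates = [g for g in goals if g.active(state)]
    return max(candidates, key=lambda g: g.priority, default=None)


# Hypothetical goals for the flying robot described above.
GOALS = [
    Goal("fly to destination", 1, lambda s: s.get("destination") is not None),
    Goal("recharge", 2, lambda s: s.get("battery", 1.0) < 0.2),
    Goal("avoid collision", 3, lambda s: s.get("obstacle_ahead", False)),
]
```

With a destination set and nothing else going on, `select_goal` picks "fly to destination"; once `obstacle_ahead` becomes true, "avoid collision" takes over, and when it clears, the lower-priority goal is selected again without anyone having to restore it explicitly.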

So I came up with a model for integrating the two, using the "blackboard" model from artificial intelligence, which gives a number of concurrent subsystems a shared global state. And this blog post is the result of me writing up my scribbled notes. I'm still in a lot of stomach pain, so I'm afraid it's going to be a bit rambly 🙂


Folding history

Ugarit is a content-addressed store; the vault is a series of blocks, identified by a hash, that cannot change once they are written.
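For readers unfamiliar with the idea, a content-addressed store can be sketched in Python as a dictionary keyed by the hash of each block. This is a toy model under stated assumptions, not Ugarit's vault format: the `BlockStore` class is hypothetical, and SHA-256 merely stands in for whatever hash a real vault is configured to use.

```python
import hashlib


class BlockStore:
    """A toy in-memory content-addressed store."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        """Store a block and return its key, which is the hash of its content."""
        key = hashlib.sha256(data).hexdigest()
        # Identical content always hashes to the same key, so storing it
        # again adds nothing: deduplication falls out for free.
        self._blocks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        """Fetch a block by key; blocks are never modified once written."""
        return self._blocks[key]

    def __len__(self):
        return len(self._blocks)
```

Because the key is derived from the content, changing a block's content would change its key, which is why blocks are effectively immutable once written.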

But logically, the vault appears as a set of "tags", each of which points to either an archive (a set of files with associated metadata, which can be added to, or have the metadata of existing files changed) or a chain of snapshots (each a snapshot of a filesystem at a point in time).

So in a store where objects cannot be modified, how do we create the illusion of mutable state in these "tags"?

Further progress on Ugarit archival mode

Further to my last post on the matter, I've been working on the basic user interface to accessing archive metadata.

As before, let's do an import to an archive tag in a vault. I've made a manifest file with three MP3s in it. All of this data could be extracted from the ID3 tags, and I plan to write a tool to automate the generation of manifests by examining file contents in exactly that manner, but for now I had to hand-write one:

[alaric@ahusai ugarit]$ cat test.manifest
(object "/home/alaric/archive/sorted-music/UNKLE/Psyence Fiction/13 Be There.mp3"
        (title = "Be There")
        (track = 13)
        (artist = "UNKLE")
        (album = "Psyence Fiction"))

(object "/home/alaric/archive/sorted-music/UNKLE/Psyence Fiction/11 Rabbit in Your Headlights.mp3"
        (title = "Rabbit in Your Headlights")
        (track = 11)
        (artist = "UNKLE")
        (album = "Psyence Fiction"))

(object "/home/alaric/archive/sorted-music/Led Zeppelin/Remasters/1-09 Celebration Day.mp3"
        (title = "Celebration Day")
        (track = 9)
        (volume = 1)
        (artist = "Led Zeppelin")
        (album = "Remasters"))

As before, I import it, loading the files into the content-addressable storage of the vault, automatically deduplicating, and possibly storing the data on a cluster of remote servers (although in this case, I'm just using a local vault). This was done with Ugarit revision [80b324f3af]:

[alaric@ahusai ugarit]$ ugarit import test.conf music test.manifest
Loading manifest file test.manifest...
Importing from test.manifest to tag music...
Importing /home/alaric/archive/sorted-music/Led Zeppelin/Remasters/1-09 Celebration Day.mp3...
...imported with key 4d64e4650333741cb56c3e6a785b6de4d23324cb1055e529
Importing /home/alaric/archive/sorted-music/UNKLE/Psyence Fiction/11 Rabbit in Your Headlights.mp3...
...imported with key 370bee7debb458357a2b879014d4abbeb409215ed269c1c6
Importing /home/alaric/archive/sorted-music/UNKLE/Psyence Fiction/13 Be There.mp3...
...imported with key 39df8bafd530a66614ad60ab323033b1385cdd842528dbd2
Committing import...
Imported successfully to tag music with import key ac26354ccfb0530109932c1aaddd414b59d4394d44ec43cd
Written 16MiB to the vault in 24 blocks, and reused 0B in 1 blocks (before compression)

But now it's in, we can query the metadata. Firstly, let's see what properties are available - a combination of the ones we wrote in the manifest, and automatically-generated ones such as a MIME type and the original import path:

[alaric@ahusai ugarit]$ ugarit search-props test.conf music
album
artist
filename
import-path
mime-type
title
track
volume

Let's see what values there are for the "artist" property:

[alaric@ahusai ugarit]$ ugarit search-values test.conf music artist
UNKLE
Led Zeppelin

(they're sorted by popularity, and we have two UNKLE tracks, so that comes first)
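The popularity ordering can be mimicked in Python with a counter. This is a sketch of the idea only: `property_values` is a made-up name, the track list is a cut-down copy of the manifest above, and how Ugarit breaks ties between equally popular values is not something this example claims to reproduce.

```python
from collections import Counter


def property_values(objects, prop):
    """Return the distinct values of a property, most common first."""
    counts = Counter(obj[prop] for obj in objects if prop in obj)
    return [value for value, _ in counts.most_common()]


tracks = [
    {"title": "Be There", "artist": "UNKLE"},
    {"title": "Rabbit in Your Headlights", "artist": "UNKLE"},
    {"title": "Celebration Day", "artist": "Led Zeppelin", "volume": 1},
]

print(property_values(tracks, "artist"))  # ['UNKLE', 'Led Zeppelin']
```

Objects that lack the property are simply skipped, so asking for `volume` here would return just `[1]`.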

Let's see what UNKLE albums we have, by filtering for objects with an artist property of "UNKLE" and asking what values of the "album" property are available:

[alaric@ahusai ugarit]$ ugarit search-values test.conf music '(= ($ artist) "UNKLE")' album
Psyence Fiction

Let's see what we know about music by UNKLE:

[alaric@ahusai ugarit]$ ugarit search test.conf music '(= ($ artist) "UNKLE")'
object 39df8bafd530a66614ad60ab323033b1385cdd842528dbd2
    (album = "Psyence Fiction")
    (artist = "UNKLE")
    (filename = "13 Be There.mp3")
    (import-path = "/home/alaric/archive/sorted-music/UNKLE/Psyence Fiction/13 Be There.mp3")
    (mime-type = "audio/mpeg")
    (title = "Be There")
    (track = 13)
object 370bee7debb458357a2b879014d4abbeb409215ed269c1c6
    (album = "Psyence Fiction")
    (artist = "UNKLE")
    (filename = "11 Rabbit in Your Headlights.mp3")
    (import-path = "/home/alaric/archive/sorted-music/UNKLE/Psyence Fiction/11 Rabbit in Your Headlights.mp3")
    (mime-type = "audio/mpeg")
    (title = "Rabbit in Your Headlights")
    (track = 11)
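To show how a filter such as `(= ($ artist) "UNKLE")` might be evaluated, here is a small Python sketch with the expression written as nested tuples. The semantics are an assumption inferred from the examples above (`$` looks up a property of the object, `=` tests equality); the real query language may support more operators and behave differently at the edges.

```python
def evaluate(expr, props):
    """Evaluate a search expression against one object's properties."""
    if isinstance(expr, tuple):
        op, *args = expr
        if op == "$":
            return props.get(args[0])  # property lookup; None if absent
        if op == "=":
            left, right = (evaluate(a, props) for a in args)
            return left == right
        raise ValueError("unsupported operator: %r" % (op,))
    return expr  # a literal such as "UNKLE" or 13


def search(objects, expr):
    """Return the keys of all objects whose properties match the expression."""
    return [key for key, props in objects.items() if evaluate(expr, props)]


# Keys and properties copied from the import above (trimmed for brevity).
objects = {
    "39df8bafd530a66614ad60ab323033b1385cdd842528dbd2":
        {"artist": "UNKLE", "title": "Be There", "track": 13},
    "370bee7debb458357a2b879014d4abbeb409215ed269c1c6":
        {"artist": "UNKLE", "title": "Rabbit in Your Headlights", "track": 11},
    "4d64e4650333741cb56c3e6a785b6de4d23324cb1055e529":
        {"artist": "Led Zeppelin", "title": "Celebration Day", "track": 9},
}

unkle_keys = search(objects, ("=", ("$", "artist"), "UNKLE"))
```

Here `unkle_keys` holds the two UNKLE object keys, which is the same information the `keys` output mode of the search command prints one per line.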

Ok, let's listen to all our music by UNKLE (the extra "keys" parameter to the search command says to just output the object keys, one per line, and the "archive-stream" command streams the contents of an archived file to standard output):

[alaric@ahusai ugarit]$ for i in `ugarit search test.conf music '(= ($ artist) "UNKLE")' keys`;
do ugarit archive-stream test.conf music $i | mpg123 -;
done

...music by UNKLE plays...

We're slowly moving towards having a usable and useful archival filesystem, backed by a modular content-addressable storage system! Isn't that neat? Of course, it's not amazingly useful as it stands - at first sight, it's like a very crude version of the browser found in any modern music collection management app; but this is the seed of something much more interesting. For a start, it can categorise files using any user-defined schema. The backend storage can be encrypted, and accessed remotely over a network (and, in future, replicated over a cluster, or mirrored between your laptop and a home fileserver, and automatically synchronised when they're connected). The same storage can be used to store backup snapshots as well as archives, and if a file exists in any combination of archives and snapshots, then only one copy of it will be stored (or even need uploading); most files in an archive will have started off in a backed-up directory tree, or will be extracted into one.

There are many interesting use cases for Ugarit, but my personal one is to have a fault-tolerant vault of all the data that matters to me, neatly organised so I can find things quickly, and so I can access things from different locations (even when offline). Rather than having files scattered over different disks on different machines, and having to move things around to make space, and remember where they are, I can add more disks to the vault when I need more capacity, and have Ugarit manage everything for me. With the amount of data I manage, that'll be a great weight off my mind!


Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales