My friend Ben Summers has been busy developing SpotMeta, which is a nifty tool for Mac OS X (Tiger).
The hole that SpotMeta fills is a basic omission in how Apple have implemented Spotlight, their system for indexing files for rapid searching.
Spotlight works by being notified whenever a file is modified. It then looks at the file type and invokes an appropriate "importer plugin" for that file type, which extracts relevant searchable information from the file. For example, the PDF importer extracts the page count and page size from the file. The result of the importer is a bunch of key=value pairs, plus a summary of the plain text content. Spotlight merges this with basic information from the filesystem - file size, last modified timestamp, creation timestamp, file name, that sort of thing - and indexes the result.
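To make that pipeline concrete, here is a minimal Python sketch of it. This is not Spotlight's actual API: the importer registry, the key names, and the hard-coded values returned by `pdf_importer` are all invented for illustration, and the creation timestamp is left out for brevity.

```python
import os
import tempfile

def pdf_importer(path):
    """Hypothetical importer: a real one would parse the PDF.
    The values here are hard-coded purely for illustration."""
    return {"page_count": 12, "page_size": "A4",
            "text_content": "quarterly report"}

# Hypothetical registry: exactly one importer per file type.
IMPORTERS = {".pdf": pdf_importer}

def filesystem_attributes(path):
    """Basic information taken from the filesystem rather than the file body."""
    st = os.stat(path)
    return {"file_name": os.path.basename(path),
            "file_size": st.st_size,
            "modified": st.st_mtime}

def index_record(path):
    """Pick the importer for the file type, then merge in filesystem info."""
    ext = os.path.splitext(path)[1].lower()
    importer = IMPORTERS.get(ext)
    record = importer(path) if importer else {}
    record.update(filesystem_attributes(path))
    return record

def demo_record():
    """Build a record for a throwaway placeholder file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "report.pdf")
        with open(path, "wb") as f:
            f.write(b"%PDF-1.4 placeholder")
        return index_record(path)
```

Calling `demo_record()` gives a single dictionary holding both the importer's pairs and the filesystem's, which is roughly the shape of what ends up in the index.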
You can either do a quick search for any given string, which will be looked for in the indexed content as well as in all the indexed key=value pairs, or you can set up a structured search query - telling the system to find files with a modified date in the last week and whose content contains a certain phrase, for example.
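The difference between the two kinds of search can be sketched in a few lines of Python over a toy in-memory index. Everything here is an assumption made for illustration (the record layout, the key names, the sample files); it models the idea, not Spotlight's real query engine or syntax.

```python
from datetime import datetime, timedelta

def quick_search(index, needle):
    """Match the string against every indexed value, content included."""
    needle = needle.lower()
    return [rec for rec in index
            if any(needle in str(value).lower() for value in rec.values())]

def structured_search(index, modified_since, phrase):
    """Match on specific keys: recent modification date AND a content phrase."""
    phrase = phrase.lower()
    return [rec for rec in index
            if rec["modified"] >= modified_since
            and phrase in rec["text_content"].lower()]

# A fixed "now" keeps the example reproducible.
NOW = datetime(2005, 6, 15, 12, 0)

# Invented sample records standing in for the index.
INDEX = [
    {"file_name": "contract.pdf", "modified": NOW - timedelta(days=2),
     "text_content": "Terms of the licence agreement", "page_count": 4},
    {"file_name": "notes.txt", "modified": NOW - timedelta(days=30),
     "text_content": "licence agreement draft notes"},
    {"file_name": "holiday.png", "modified": NOW - timedelta(days=1),
     "text_content": "", "comment": "beach at sunset"},
]
```

With this data, `quick_search(INDEX, "sunset")` finds `holiday.png` through its comment field, while `structured_search(INDEX, NOW - timedelta(days=7), "licence agreement")` returns only `contract.pdf`, because `notes.txt` contains the phrase but was modified too long ago.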
However, this is rather limited in some ways. For a start, only one importer is applied to a given file. That's a problem, since many file formats support optional, application-specific information inside them. The PNG importer, for example, might extract the standard comments field from a PNG file. But the authors of a scientific visualisation application that extends PNG files with extra information about the units of the axes must either write a whole new importer that provides the standard PNG fields as well as their own, or just live with their users not being able to find images that show a time axis with an accuracy better than 3 ms.
Meanwhile, Tiger's file system allows you to store arbitrary key-value pairs for any file - "extended attributes", in computing parlance. Yet Spotlight doesn't bother indexing these attributes, and it's not easy for a normal user to view or edit them.
SpotMeta does three things that, together, rather neatly solve these problems.
- It provides a nice user interface to edit extended attributes. A metadata schema editor is included, which lets you define your own keys - a choice field with a given list of options, a multi-choice field with a specified list of checkboxes, plain text, date, or numeric fields, and a few other types. Then you can use the Finder context menu or a keyboard shortcut (from within the Finder or an application) to edit the attributes of the selected file, according to your definitions.
- It provides a framework for extending the Spotlight importing process. Basically, it allows other importers to run alongside the "native" importer for a file, and merges their results with what the native importer provides. So the author of the PNG-based sci vis package could write an importer that just reports on their extensions to PNG files, and have it run as well as the native PNG importer.
- It provides an extension importer that reads the user-defined extended attributes and indexes them, so they can be searched for with Spotlight.
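The merging idea in the last two points can be sketched in Python as follows. This is only a model of the concept, not SpotMeta's actual code: the three importer functions, their key names, and the rule that the native importer wins when two importers report the same key are all assumptions made for the example.

```python
def native_png_importer(path):
    """Stand-in for the standard PNG importer (values invented)."""
    return {"comment": "Figure 3", "width": 800, "height": 600}

def sci_vis_importer(path):
    """Stand-in for an extension importer that reports only
    the application's own additions to the PNG file."""
    return {"x_axis_unit": "ms", "x_axis_resolution_ms": 2.5}

def user_attributes_importer(path):
    """Stand-in for the importer that reads user-defined
    extended attributes (values invented)."""
    return {"stage": "documentation", "requires_action": True}

def run_importers(path, native, extensions):
    """Run the native importer, then merge in each extension's results.
    Assumption: on a key clash the native importer's value is kept."""
    merged = dict(native(path))
    for extension in extensions:
        for key, value in extension(path).items():
            merged.setdefault(key, value)
    return merged

merged = run_importers("plot.png", native_png_importer,
                       [sci_vis_importer, user_attributes_importer])
```

Here `merged` carries the standard PNG fields, the sci vis additions, and the user's tags in one record, so a single search can match on any of them.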
The net result is that you can define (and subsequently refine as you wish) a tagging system for your files. I already organise my directory structure by client and project, so I tag my files with the project lifecycle stage they relate to (legal, requirements, specification, implementation, documentation, ...) and whether they still require action. Then you can search for files based on your requirements. I can find all the legal documents I've not dealt with and, what's more, I can create a 'smart folder' that will always show the result of that search, changing in real time as I alter my files.
Neat stuff!