Generic Functions (by )

Clearly, in the context of a given program, you'll define a generic function (perhaps in a library), then define lots of methods on it. Those method definitions need to find the generic function definition somehow, so they can all group together and be associated with that GF in some way, so when somebody calls the GF, the implementation can know the set of methods on that GF and do some combination of compile-time and run-time analysis to figure out which one to call.

That's not too hard; a logical approach would be lexical scoping. The method must be in the scope of the GF definition, and it can then find it by looking up the GF name in the lexical environment.

But then a piece of code that uses that GF therefore has, indirectly, the ability to call any method on it, even though it might not be in the lexical scope of the method definition; say library system.core defined the addition generic function, library mathslib.matrices defined a method on that GF that added matrices (and imported system.core), and then library datalib.collections defined a List class, and a generic function to find the sum of a group of things, and a method implementing sum on lists by using the addition generic function which it imports from system.core to add the contents of the list together and return the result.

Then a bit of code might import mathslib.matrices and datalib.collections (but not, necessarily, system.core), create a list of matrices, and ask for their sum. datalib.collections doesn't have any lexical visibility of mathslib.matrices, but when it calls the addition GF on the matrices in its list, it invokes the addition method from mathslib.matrices. Which is all well and good, and rather the point of the thing; datalib.collections has no need to know about how to add things (defined in mathslib.matrices, mathslib.integers, etc); all it needs to know is that addition exists (defined in system.core).

Now, in a dynamic language with mutation like Scheme, this is usually implemented like so: when you load a library that defines some methods, the act of loading that library looks up the generic functions being added to, and imperatively appends the methods to some kind of list inside the generic function. So the act of loading the module affects the global state of the system in order to make the methods available. If you add methods to a GF that's already being used in some previously-loaded code, then the precise semantics of that existing code changes, because you've assigned something into a global variable that code depends upon.

But how do we model this in a pure functional language? Normally, there's only two ways for a value to "flow" somewhere in such a language; lexical scoping from definition to use, or by being passed in as an argument to a function. The latter mechanism lets you implement dynamic scope, by passing an implicit parameter into every function call that contains a stack of dynamic bindings, and passing this on (unchanged, unless a new dynamic binding is made) into every function application. Indeed, dynamic scoping and passing arguments to function calls are really just different ways of expressing each other, that are more or less pleasant to use in different situations.

However, neither quite model the scoping of methods, which all flow magically upwards into the generic function, which then flows lexically or dynamically (it can be a first-class function) on to be used.

One approach is to do a whole-program analysis for resolving method scope. With lexical scoping one can compile each library in turn, then use what it exports to compile the next library; then dynamic scope happens at run time by passing arguments around. But a whole-program analysis means that the entire set of libraries used in the program has to be considered once to find all the generic functions and methods, in order to actually fully define each generic function by collecting all its methods into it, then it can be compiled on a library-by-library basis.

This has two issues: firstly, it can make separate compilation a bit of a lie in a pure functional system. If you want to compile datalib.collections once and for all on your computer and have the compiled version loaded in by different programs as they need them, then you need to complicate matters by making the compiled version not really be the final compiled version, as when you link a program together you still need to do the first pass to define all the GFs; we can do separate compilation by compiling templates for libraries that are just missing the detailed definitions of the GFs, which are filled in at link time (which, apart from precluding inlining, is what happens in the implementation layer anyway; but it needs to be modelled differently in your pure functional world; a precompiled library is really a function from a program's generic function structure to a finished library). But it also raises a subtle issue: what's the boundary of the program? We have to statically define this; and no methods can 'migrate' across this program boundary. This is fine when we're compiling a program to run as a POSIX process, where the only communication outside is byte strings and syscalls. But what if we're writing an operating system based around a purely functional model? Or writing some dynamic application that will need to be able to load plugins? Both of these tasks become tricky in one way or another if we require whole-program analysis.

Another approach is to shoehorn it into dynamic state somehow. In our program that sums up a list of matrices, the top-level "main function" that organises this has to import the matrices and collections libraries, in order to create a collection of matrices. So it could create some sort of data structure that lists the methods it knows about from libraries its imported, or defined itself; and it could pass this as an implicit parameter to every function call within it. And other functions can then just pass on the method-environment they are given when invoked.

This means that when the code to sum the contents of a list in the collections library is invoked, it's actually passed the method to add matrices in this implicit parameter. So when it calls the addition GF from system.core, it passes the method environment in, and the implementation of the GF can look up the method from the environment that matches its arguments. And so it'll find matrix addition defined.

This means that methods are inherited according to dynamic scope - which is in turn inherited from the methods that are lexically visible (eg, imported from libraries or defined there and then) at some "start point", such as our main function. In principle this mechanism could be exposed, with a function that accepts a single function as its argument, but discards the method-environment passed to it and calls the argument function with a new one created from the list of methods lexically visible at that point (or, even, a list of methods passed in as a real first-class list).

And so if our pure-functional OS implements shared libraries (eg, has a single compiled implementation of datalib.collections that all running programs share), then the list-sum method called in a dynamic context that has matrix addition defined will know how to do it; but it could also be called from a dynamic context that has a different matrix addition library loaded, that has different time/space trade-offs or a different licence or something, and it would use the correct one.

That looks like the most elegant approach; but as the problem always is, how to implement it efficiently? As described, it sounds horribly run-time-heavy, but in practice, there's no constraint that the implementation of a pure functional system work the way you'd think, as long as it produces the same result; Uniqueness typing lets you implement efficient array updates in a pure functional world, by making it easy for the compiler to spot that it can get away with mutating the array without anybody noticing, for example. Also, you can do a tremendous amount of optimisation of runtime dispatches such as generic functions (again, by slyly mutating things under the covers).

Pages: 1 2

1 Comment

  • By Peter Bex, Tue 18th Aug 2009 @ 2:15 pm

    I agree that generic functions are an interesting idea, but I think you've not given enough credit to prototype based systems, or "open class" systems like Ruby.

    In a prototype-based system you can manipulate any object as you please, so you can add new methods which become available to all clones of that object (unless the clone already overrides the method with its own).

    In the "open class" systems you can mix in any number of modules into any class you wish.

    Of course, with both systems you pay the same price as with generic functions; because everything is so open, it is hard to compile things efficiently because anything can happen at run-time.

    A more interesting question is; do generic methods get you into trouble as easily as mixins do? (the dreaded "monkey-patching" problems)

    I don't know if prototype systems can get you into these troubles so easy either, since I haven't done any big systems hacking with such an object system.

Other Links to this Post

RSS feed for comments on this post.

Leave a comment

WordPress Themes

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales
Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales