Uniqueness Typing (by alaric)
Ever since I was a kid, I've been interested in exploring Uniqueness typing as a paradigm for mutation in a programming language.
The principle is simple: mutating operations - assignment, I/O, etc - are a pain. Both for the implementers of the language, who are limited in what optimisations can be performed when the values of things can shift around beneath them and when any given part of the program may have side effects so order of execution must be preserved, and for the programmers in the language, who have to deal with bugs and complex behaviour that just don't happen when everything is referentially transparent.
However, languages without mutation - pure functional languages - have two big problems: Although they're in principle more efficiently implementable than other languages since the compiler can do so much more rearrangement and analysis, they lose efficiency since modifications to data structures have to be handled by making a copy of the data structure with the change in place, rather than being able to actually change things. If you're doing image processing on a large 2D pixel array, then you can end up copying the entire array for every pixel updated, which is clearly not good. The second problem is that interacting with the outside world is a mutating operation: when you output something to a network socket, you're causing a change, just as if you are assigning to a variable. Unless you only want to write boring programs, you need some way around that!
There are a few approaches to this problem that I've seen.
- Be "mostly functional", like ML. Write a functional language, then tack on mutation as a "side dish". ML has a kind of object called a 'box' which is like a little mutable cell; one can call a function on it to get its value out, and another function that (naughtily) updates its inner value. I can't remember offhand, but I think they do I/O in a similar way. Since the language is mainly functional the compiler is allowed to do exciting rearrangements, and the programmer has to be a bit careful to make sure their mutating operations happen in the order they want, by using data flow to enforce orderings.
- Use monads, like Haskell. Monads are mathematical objects that people seem to have trouble explaining and understanding, but the principle is quite simple: they're values that consist of basic atoms that can be constructed, and then concatenation operations that join them together in an order-preserving way. Therefore, they can be used to generate an ordered list of mutating operations; generally, one of the concatenation operations provided is one that takes a monad value and a function from something to a new monad value. The 'something' is extracted from within the monad. In practice, this is used to handle input, or reading from mutable variables; one takes an operation that reads some value from somewhere, and it puts the value "into" the monadic value that represents the operation; then one concatenates it with a function that takes the inputted value and then returns a monadic value that represents the operation performed with it. Eg, input a string, then output the string "Hello, " followed by the inputted string. In a way, your program returns a kind of lazy list of operations to perform, which the runtime system then performs.
- Use uniqueness typing to enforce referential transparency in the face of mutation. This means that your language allows you to specify that certain values may only ever have a single reference to them. Imagine you have a pure functional update function, that accepts (for instance) an array, an index into the array, and a new value to put at that element, and returns an array that's the same as the old array but with that one element changed. One way to implement that is to copy the array and then make the change in the new copy. If you just changed the array in-place, then any other bits of the application that have references to that array will "see" the change in future - which breaks referential transparency, meaning that evaluation order matters. But if there are no other references - if the one passed into the update function is the ONLY reference - then it can mutate the array in-place and just return it. From the perspective of the purely functional language, there's no way of knowing that it's the same region of memory with a slightly different array in it, so all the pure functional properties are preserved (with one small exception relating to termination behaviour, but we can gloss over that). But the implementation can efficiently update the array in-place without copying. Also, one can model interaction with the outside world, if the world is represented as a special value which one can operate on with a set of functions that accept a unique world and return a new unique world in which some change has occurred. For example, an input function might accept a world and a return an inputted string, plus a world in which some time has passed. It sort of turns your program inside-out, making the outside world just another value you pass about, rather than something actually outside the program. Since the entire program has to be of type !World -> !World (read: function from unique world to unique world), and all the world-affecting operations accept one world and return one world, your program has to contain an unbroken ordered single chain of world affecting operations, or it won't type-check. The type checker therefore prevents your app from destroying the world, cloning it, or creating new ones from scratch... which is comforting!
I've always been keen on uniqueness typing, since it's very powerful and general. For example, it's hard to interleave two monads: if your program does some I/O and has a few mutable states it deals with, then operations on those different mutable things can't be interleaved unless they're all handled by the same monad, since they won't concatenate into a single list of mutations to do. So you really need a big complex all-mutating-operations monad with everything bundled into one.
While with unique values, you can have as many as you like. Some functions will operate on one, some on another, some on two or more at once. Indeed, two sequences of operations on independent linear values will be able to proceed in parallel within the same program, and the compiler can trivially (with dataflow analysis) identify potential parallelism and split the independent computations into threads, safe in the knowledge that it is impossible for the two threads to have race conditions.
But there are downsides... I'm not sure how one would express a circular data structure like an efficient double-ended queue without being able to have more than one reference to anything (sweep it into the implementation and have a black box deque type that happens to use a doubly linked list under the hood, hiding the detail from the purely functional world?). And the syntax can get ugly. And exception handling is a pain if your exceptions have to carefully wrap the state of the linear values you were working on at the time and pass them up so the exception handler can continue with that state.
How to handle mutation is a big issue for CHROME. I was leaning heavily towards uniqueness typing, but I'm beginning to worry now. I don't like monads, for the reasons listed above - but is my faith in uniqueness typing justified? Should I just play it simple and dirty, and go for the ML approach?
By Ben, Fri 2nd May 2008 @ 8:11 am
I am coming to the opinion that mutable strings by default is a design flaw. Ruby is annoying, even a comparison to a constant string has to create a new string each time (I think). It's freeze method doesn't help much, because it's not on by default.
Cocoa has data structures which are immutable in the base class, and have mutable derived classes. Everything takes and returns immutable things by default. It works well, but it is down to trust -- I think you want to enforce this in the compiler or at runtime, so it's not good enough.
C++ has the const modifer, which again works very well but is the wrong way round by default. It's too easy to forget it, then find that you can't practically use something as a const because a non-modifying access function is missing the modifier. A mutable modifier would be better.
Still doesn't enforce it though. How about a const/mutable modifier that on the first cast, made it irrevocably const or mutable, and using the wrong way exceptions?
This is probably more pragmatic than correct, though.
By alaric, Fri 2nd May 2008 @ 10:39 am
Yeah, Python's good in that respect. Strings and tuples (both major data structures) are immutable. There's mutable lists which work like a subclass of tuples. And so on.
Functional programming has demonstrated that mutation isn't quite as useful as was once thought; and ongoing experiments in programming language implementation (particularly with relation to increasing parallelism) are showing that mutation isn't quite as cheap as was once thought 🙂