One Size Fits All (by alaric)
On Monday, I happened to be discussing some ARGON stuff with a friend, and he pointed out that what I'm trying to do, in many ways, is to find a one-size-fits-all solution for a lot of problems, and that this is often dangerous since you can end up making a nasty compromise.
He's right - part of the challenge in designing ARGON has been to find ways to avoid nasty compromises. So I thought I'd describe a few techniques I've been using.
The best way of finding a one-size-fits-all solution without compromising is finding a general solution to the problem. There are right and wrong ways of doing this.
Take, for example, communication between entities. The problem is that bits of software all over the world will want to use the services of other bits of software, be they within the same cluster (=administrative domain), or elsewhere.
The current conventional approach to this is to provide applications with, on the one hand, access to TCP and UDP sockets, and on the other hand, distributed computing toolkits such as RMI, CORBA, HTTP, or ONC RPC.
Sockets are too low-level for applications, really. To do anything useful on top of TCP or UDP one has to define a lot of generic functionality, such as matching responses to requests, authentication, error signalling, and so on. Yet the distributed computing toolkits are too specialised; they emphasise certain styles of interaction (generally, RPC) while making others harder. Applications like media streaming tend to drop down to the raw UDP layer and build up their own infrastructure from scratch, in order to avoid the inefficiencies of attempting to operate within a model that's hostile to their requirements.
So, in designing MERCURY, I forgot all I knew about 'conventional' ways of designing protocols that cover the Presentation layer to the Transport layer, and sat down and thought about what application developers want, and what the network can provide.
Applications, it would appear, want a variety of things: byte-oriented full-duplex streams, ordered message streams, request/response handlers, unordered message passing, reliable delivery, unreliable delivery, store-and-forward, and so on.
Networks, it would appear, offer one of three kinds of service:
- Point-to-point links. Although unpopular between hosts in general, they are nonetheless sometimes useful for simple connections (plugging one's laptop into one's PC, a dedicated 'failover' link between two critical servers that share a function, etc). They offer a totally uncontested fixed-bandwidth link that the peers are in total control of. The link may split the data into frames of some kind, or just be a byte stream link like RS232, but since there is no congestion the two are fairly interchangeable.
- Best effort packet switching networks, such as the Internet, which work by letting hosts inject packets with a destination host address; the packets may or may not arrive at the destination, and may be duplicated or re-ordered in transit, but in general the chance of the packet ever arriving drops sharply as the time since it was transmitted increases, and the chance of a packet arriving is high unless the network is overloaded.
- Networks with bandwidth reservation, which (possibly in combination with best effort packet-switching) support 'virtual circuits' that are explicitly set up with an agreed bandwidth reservation. Packets sent along a virtual circuit arrive in the same order at the far end, and are very, very likely to arrive as long as they are not sent faster than the agreed rate, since the network will reserve capacity for them. Congestion in these networks is evident when attempts to set up virtual circuits are refused; existing circuits should never suffer from congestion.
Now, the network services can be easily abstracted into a more general form. One can define an abstract network, which provides the following services:
- Send a message to a given endpoint at a given host, requesting either guaranteed delivery or a 'drop priority' that the network may use to decide which messages to discard in the event of congestion.
- Send a request to a given endpoint at a given host, with a return address specifying the sending host and a reply endpoint; guaranteed delivery is then implied, since the sender is waiting for an answer.
- Open a virtual circuit, with an optional minimum and desired bandwidth reservation; the network must refuse the request if it cannot guarantee the minimum bandwidth reservation, and should try to obtain the desired reservation if it can; either way, it will report the bandwidth reserved back to the application.
- Within the context of a virtual circuit, perform any of the above operations; e.g., send a message along a virtual circuit, or issue a request along it; the virtual circuit ID serves as the destination address, and the same virtual circuit is used to route the response to a request back to the sender.
- Close a virtual circuit that is no longer required.
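To make the shape of that abstract network a bit more concrete, here is a rough sketch of the five services as a Python interface. It is only an illustration: the names (AbstractNetwork, Address, Circuit, CircuitRefused), the parameter names and the choice of units are all assumptions made for this example, not the actual MERCURY API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Address:
    """A given endpoint at a given host (hypothetical representation)."""
    host: str
    endpoint: int


@dataclass(frozen=True)
class Circuit:
    """Handle for an open virtual circuit; its ID doubles as a destination."""
    circuit_id: int
    reserved_bandwidth: int  # what the network actually granted (units assumed)


class CircuitRefused(Exception):
    """The network could not guarantee the requested minimum bandwidth."""


# A destination is either a host/endpoint pair or an open virtual circuit.
Destination = Union[Address, Circuit]


class AbstractNetwork(ABC):
    @abstractmethod
    def send_message(self, dest: Destination, payload: bytes, *,
                     guaranteed: bool = False, drop_priority: int = 0) -> None:
        """Send a message, either with guaranteed delivery or with a drop
        priority the network may use when discarding under congestion."""

    @abstractmethod
    def send_request(self, dest: Destination, reply_to: Address,
                     payload: bytes) -> None:
        """Send a request carrying a return address for the response."""

    @abstractmethod
    def open_circuit(self, host: str, *, minimum_bandwidth: int = 0,
                     desired_bandwidth: int = 0) -> Circuit:
        """Open a virtual circuit, raising CircuitRefused if the minimum
        reservation cannot be met; the result reports what was reserved."""

    @abstractmethod
    def close_circuit(self, circuit: Circuit) -> None:
        """Close a virtual circuit that is no longer required."""
```

Passing a Circuit where an Address would otherwise go is how this sketch expresses "perform any of the above operations within a virtual circuit"; a real design might well prefer separate methods for that.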
This generic API can be implemented for best-effort networks by using ARQ to perform guaranteed delivery, simply denying any request for a virtual circuit that has a non-zero minimum bandwidth reservation, and giving any other virtual circuit a zero bandwidth reservation. Virtual circuits can then be implemented by having the hosts agree upon a virtual circuit ID to tag all packets carrying messages for that circuit. MTUs can be handled with fragmentation.
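The three ingredients of the best-effort mapping can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the real protocol: BestEffortCircuits, fragment, reassemble and send_with_arq are names invented here, the fragment header is just a tuple, and transmit is a hypothetical callable that returns True when an acknowledgement came back before some timeout.

```python
import itertools


class CircuitRefused(Exception):
    """Raised when a non-zero minimum reservation is requested."""


class BestEffortCircuits:
    """Circuit bookkeeping for a network that cannot reserve bandwidth."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.open = set()

    def open_circuit(self, minimum_bandwidth=0, desired_bandwidth=0):
        if minimum_bandwidth > 0:
            raise CircuitRefused("best-effort network cannot reserve bandwidth")
        circuit_id = next(self._ids)  # the ID both hosts tag packets with
        self.open.add(circuit_id)
        return circuit_id, 0  # granted reservation is always zero

    def close_circuit(self, circuit_id):
        self.open.discard(circuit_id)


def fragment(circuit_id, message, max_payload):
    """Split a message into (circuit_id, index, total, chunk) fragments."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    chunks = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)] or [b""]
    return [(circuit_id, i, len(chunks), c) for i, c in enumerate(chunks)]


def reassemble(fragments):
    """Rebuild a message, tolerating re-ordered and duplicated fragments."""
    fragments = list(fragments)
    if not fragments:
        raise ValueError("no fragments")
    total = fragments[0][2]
    by_index = {index: chunk for _, index, _, chunk in fragments}
    if set(by_index) != set(range(total)):
        raise ValueError("fragments missing")
    return b"".join(by_index[i] for i in range(total))


def send_with_arq(transmit, packet, max_attempts=5):
    """Stop-and-wait ARQ: retransmit until acknowledged; returns attempts used."""
    for attempt in range(1, max_attempts + 1):
        if transmit(packet):
            return attempt
    raise TimeoutError("no acknowledgement received")
```

Stop-and-wait is the simplest ARQ variant and is used here only to keep the example short; a sliding-window scheme would be the more likely choice in practice.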
For networks with real bandwidth reservation, network-level virtual circuits can be used to implement abstract virtual circuits. If the network ONLY offers virtual circuits, then non-virtual-circuit traffic must be handled using techniques like AAL5.
And for point-to-point links, the API can be provided by considering the link as a bandwidth-reservation network connecting only two hosts. Virtual circuits may be reserved by having both hosts use a protocol to agree on the allocation of bandwidth in each direction, and using the unallocated bandwidth for non-virtual-circuit messages.
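As a rough illustration of the bookkeeping each host would do for one direction of such a link, here is a small Python sketch. The class name LinkDirection, its methods and the bits-per-second units are assumptions for the example; the negotiation protocol by which the two hosts agree on an allocation is not shown.

```python
class LinkDirection:
    """Tracks bandwidth reservations for one direction of a two-host link."""

    def __init__(self, capacity):
        self.capacity = capacity   # total link bandwidth (assumed bits/s)
        self.reservations = {}     # circuit ID -> reserved bandwidth
        self._next_id = 1

    @property
    def unallocated(self):
        """Bandwidth left over for non-virtual-circuit messages."""
        return self.capacity - sum(self.reservations.values())

    def reserve(self, minimum, desired):
        """Return (circuit_id, granted), or None if the minimum cannot be met."""
        if minimum > desired:
            raise ValueError("minimum exceeds desired")
        available = self.unallocated
        if minimum > available:
            return None            # refuse: the guarantee cannot be honoured
        granted = min(desired, available)
        circuit_id = self._next_id
        self._next_id += 1
        self.reservations[circuit_id] = granted
        return circuit_id, granted

    def release(self, circuit_id):
        """Return a closed circuit's bandwidth to the unallocated pool."""
        self.reservations.pop(circuit_id, None)
```

Note that this sketch will happily hand the whole link to a single greedy circuit; holding back a share for non-circuit traffic would be a policy decision layered on top.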
The danger with abstracting over different implementations is loss of efficiency. But in the above API, apart perhaps from a few bytes of header information (which any protocol will require), the only particularly inefficient case is providing non-virtual-circuit functionality over a virtual-circuit-only network. At best, that will involve having to set up a transient virtual circuit between any two communicating hosts, forcing a circuit setup overhead onto every communication (consider a DNS server for an example of a service for which this would be annoying). I've chosen an API that provides the functionality any network provides, while not providing any functionality that is hard to emulate on top of any network (with the possible exception of connectionless operation over a connection-only network).
Ok, in the simple summary above I've skipped how to handle multicast communications, but they're also quite easily integrated if one considers that multicast can be implemented on top of non-multicast networks by forming a spanning tree of members of the multicast group and forwarding messages along the tree.
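The spanning-tree idea is easy to sketch in Python. The following is a toy, in-memory model, assuming the group's connectivity is known up front as an adjacency map; spanning_tree and multicast_hops are names made up for this example, and a real implementation would have to build and repair the tree in a distributed fashion.

```python
from collections import deque


def spanning_tree(links, root):
    """Breadth-first spanning tree over the group members.

    links maps each member to the set of members it can reach directly;
    the result maps each member to its neighbours within the tree.
    """
    tree = {root: set()}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(links.get(node, ())):
            if peer not in tree:
                tree[peer] = {node}
                tree[node].add(peer)
                queue.append(peer)
    return tree


def multicast_hops(tree, sender):
    """List the unicast (from, to) sends needed to reach every member.

    Each member forwards to all of its tree neighbours except the one
    the message arrived from, so nobody receives it twice.
    """
    hops = []
    queue = deque([(sender, None)])
    while queue:
        node, came_from = queue.popleft()
        for peer in sorted(tree[node]):
            if peer != came_from:
                hops.append((node, peer))
                queue.append((peer, node))
    return hops


links = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
tree = spanning_tree(links, "a")
hops = multicast_hops(tree, "d")
```

Because the tree has no cycles, a group of n members always needs exactly n - 1 unicast sends per multicast message, whichever member originates it.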
The result is MERCURY.