NEON (by alaric)
NEON is the user interface system for ARGON. I've not designed what the UI will "look like", since I think that would be a stupid thing to do - different people need different UIs, especially when they have differing hardware (think mobile devices, wearable computers, and interaction devices for the blind, in particular). But what I have been designing is the UI architecture, by which applications will expose their interface in such a way that different UIs can map them to their available hardware.
Anyway, I've had some pretty similar ideas to Tuomo Valkonen - who has an idea called VIS. So I finally dropped him an email detailing my thoughts, focussing on how they differ from VIS:
I also came up with the idea that the UI system itself should pull more weight; rather than giving applications free rein on a two-dimensional display surface with keyboard and mouse, allowing them to all act inconsistently and encouraging developers to make their applications DEPEND on the existence of a two-dimensional display surface with keyboard and mouse, the application should instead just expose some kind of abstract interface, which different UI systems can interpret as they (or, rather, as their users) see fit; command line, 2D textual, 2D graphical, 3D graphical, chord-keypad-in-one-hand-with-voice-synth-in-ear wearable computer, direct neural link, etc...
My motives for this were:
Device independence. It all began with the fact that I secretly yearned to replace my PC keyboard with one of those funky Unix workstation ones with loads of function keys - Help, Do, Undo, Menu - but I knew that my applications would be deaf to these unheard-of scancodes. It struck me that the commands we feed applications could just be registered by the app as a list, with menus and hotkeys defined to the UI as mappings to those commands, while at the same time keyboards would be programmed to handle function keys by emitting a symbolic command name rather than a meaningless scancode (there's a little sketch of this idea just after these motives). By convention, apps would try to reuse common names for common commands, while keyboard manufacturers tried to comply with the same list, so an app with a Delete Paragraph command would call it "text/delete/para" or some such, thus integrating seamlessly with a keyboard with a Delete Para button. As soon as interfaces like USB started to implement this approach rather than icky scancodes, it would open the door for specialist keyboards with lots of function keys aimed at a certain type of user - either as separate pads of function-only keys or as part of a conventional keyboard - and for programmable keyboards with either paper key inserts to write on or little displays in each key, which could be programmed along with the command name to generate when that key is pressed. Then I quickly began to wonder about tiny screens, speech synthesis, voice recognition, joypads versus mice, ...
Helping the blind. I had some very interesting email conversations with a guy who was interacting with me solely through touch typing and a speech synth. Until he mentioned this fact to me, I couldn't have told, yet I (as a human being) am a vastly more complex and featureful application than MS Word. That gave me pause for thought.
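To make the command-naming idea concrete, here's a rough sketch in TypeScript - every name and shape here is invented for illustration, not a proposed API:

```typescript
// Hypothetical sketch: applications register commands by symbolic name,
// and the UI maps physical keys to those same names.

type CommandName = string; // e.g. "text/delete/para"

interface CommandRegistry {
  commands: Set<CommandName>;
  invoke(name: CommandName): void;
}

// An application declares which commands it understands.
const wordProcessor: CommandRegistry = {
  commands: new Set(["text/delete/para", "edit/undo", "app/help"]),
  invoke(name) {
    console.log(`word processor executing ${name}`);
  },
};

// The UI (not the application) owns the mapping from keys to command names.
const keymap: Record<string, CommandName> = {
  "F1": "app/help",
  "Ctrl+Z": "edit/undo",
  "DeleteParaKey": "text/delete/para", // a dedicated key on a fancy keyboard
};

// When a key arrives, the UI looks up the symbolic name and forwards it
// only if the application actually registered that command.
function onKey(key: string, app: CommandRegistry): void {
  const name = keymap[key];
  if (name && app.commands.has(name)) {
    app.invoke(name);
  }
}

onKey("DeleteParaKey", wordProcessor); // -> "word processor executing text/delete/para"
```

The important property is that the application never sees a scancode; it only ever sees command names it has itself registered.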
Rather than preaching to the converted about the points where we agree, I'll tell you where my ideas have differed from yours - I can't really call them disagreements; I've just come up with a few ideas different from yours, with no real judgement on which is better.
Similarly to VIS, I broke applications down into a few categories.
One uses a dialogue as the basic unit of interaction - a question is asked of the user, perhaps a complex question involving tens of fields - and the answer to the question, when it arrives, triggers the application to do something. These I thought of as verb-based applications. They DO something, by accepting one or more commands, where each command may take parameters. The application can provide annotations (the same idea as your stylesheets) to specify things like help text for each parameter of the command or for the command itself, and to suggest a logical layout of the parameters on a 2D surface (not in pixel coordinates, but in terms of "wouldn't it be nice to arrange these four numeric parameters, which specify the widths of margins, like a compass rose, so it's obvious which edges they correspond to?").
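As a rough illustration of a verb-based command description (again, every name here is made up), the margins example might look like this:

```typescript
// Hypothetical description of one command of a verb-based application.
interface Parameter {
  name: string;
  type: "number" | "string" | "boolean";
  help?: string;                       // annotation: help text for this parameter
  layoutHint?: string;                 // annotation: e.g. "compass/north" for a margin field
}

interface Command {
  name: string;                        // symbolic command name
  help?: string;
  parameters: Parameter[];
}

// "Set margins" as a verb: four numeric parameters, arranged like a compass
// rose so it is obvious which edge each one controls.
const setMargins: Command = {
  name: "page/set-margins",
  help: "Set the page margins in millimetres",
  parameters: [
    { name: "top",    type: "number", layoutHint: "compass/north" },
    { name: "bottom", type: "number", layoutHint: "compass/south" },
    { name: "left",   type: "number", layoutHint: "compass/west" },
    { name: "right",  type: "number", layoutHint: "compass/east" },
  ],
};

// A command-line UI might ignore the layout hints entirely and just prompt for
// each parameter in order; a WIMP UI might honour the compass layout.
```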
The second is more based around a chunk of information, which the user edits, by some combination of direct interaction (dragging things about, typing text) and triggering commands (as defined above, but now optionally including implicit 'this' parameters, since they are invoked in the context of a particular 'object'). This is perhaps where our ideas really start to differ. I decided that such apps would directly expose their data model, in a manner reminiscent of AppleScript; the word processor document is a series of block-level items, one type of which is the paragraph, which is a series of inline items, which can in turn be spans of text (with a style), etc. Each object in the data model has a list of available commands, and some basic editability flags; the document object will state that it's legal to create new objects of any class implementing the "block-level item" interface within it, etc.
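Here's a tiny sketch of what an exposed data model might look like - the field names are invented, and a real version would need rather more, but it shows the shape I have in mind:

```typescript
// Hypothetical sketch of an exposed data model: every object names its class,
// lists the commands it accepts, and says what may legally be created inside it.
interface ModelObject {
  cls: string;                       // e.g. "document", "paragraph", "text-span"
  commands: string[];                // commands invokable with an implicit 'this'
  allowedChildClasses: string[];     // interfaces legal to instantiate within
  editable: boolean;
  children: ModelObject[];
  text?: string;                     // leaf content, where applicable
}

const docModel: ModelObject = {
  cls: "document",
  commands: ["doc/check-spelling", "doc/generate-pdf"],
  allowedChildClasses: ["block-level-item"],   // paragraphs, headings, tables...
  editable: true,
  children: [
    {
      cls: "paragraph",
      commands: ["text/delete/para"],
      allowedChildClasses: ["inline-item"],
      editable: true,
      children: [
        {
          cls: "text-span",
          commands: [],
          allowedChildClasses: [],
          editable: true,
          children: [],
          text: "Hello, world.",
        },
      ],
    },
  ],
};
```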
Given no annotations, the system would then present a basic browser; perhaps, on a WIMP interface, a collapsible tree view on the left and detail on the selected object on the right - generally just a bit of editable text.
But one might annotate the paragraph objects as "rich text", hinting to the UI that it should use a text flowing rendering engine; the spans can then be annotated to specify that (depending on the user's taste) they should either display their contained text in the chosen style, or put "<em>" and "</em>" tags around the contained text to express the style; if the latter option is chosen, then the UI may, at the user's option, either make the tags special objects or just let the user edit them as free text, with the object model being reconstructed and validity-checked when the user moves away.
So much for word processors. A spreadsheet might annotate the top-level document object, which contains a few items of document-wide metadata such as a title and is dominated by a large 2D array, so that the 2D array is displayed as a scrollable grid with row and column headings; the system would know that anyway, since that would probably be the default display form for a 2D array, but left to its own devices it wouldn't know how to display the cells properly. A formula such as "B3+C5" needs to be passed to the application to get back a value to display, and this is where the object-oriented nature of the object model comes into play; the Cell class would subclass String, since that's what it is beneath the hood, but would implement an optional getDisplay() method that, in this case, computes the value of the expression and returns it as a Number object for proper display; the UI should, when encountering an object that implements this method, call the method and display the result in its place. However, if the cell is edited, then since getEditable() and setEditable(newval) are not overridden, it would edit the underlying formula string.
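The Cell idea, sketched as toy code - the getDisplay/getEditable/setEditable names follow the text above; everything else is invented:

```typescript
// Hypothetical sketch: a Cell is "really" a string (the formula), but offers
// a display hook that evaluates it; with no editing hooks overridden, the UI
// edits the underlying formula text.
interface Displayable {
  getDisplay(): string | number;     // optional hook: what to show instead of the raw value
}

class StringValue {
  constructor(public value: string) {}
  getEditable(): string { return this.value; }          // default: edit the raw string
  setEditable(newval: string): void { this.value = newval; }
}

class Cell extends StringValue implements Displayable {
  constructor(formula: string, private evaluate: (formula: string) => number) {
    super(formula);
  }
  // The UI, on seeing getDisplay(), shows the computed number...
  getDisplay(): number {
    return this.evaluate(this.value);
  }
  // ...but since getEditable()/setEditable() are not overridden,
  // editing the cell edits the formula string itself.
}

// Toy evaluator standing in for the spreadsheet engine.
const cell = new Cell("B3+C5", () => 42);
console.log(cell.getDisplay());   // 42       (what the grid shows)
console.log(cell.getEditable());  // "B3+C5"  (what the user edits)
```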
Annotations would be important in expressing the meaning of a list in the object model. Does the list have a fixed length, or can new elements be added, and existing ones removed? Are there properties of the elements of the list that can be treated as an icon and coordinates for each child, as a 2D draggable icon set?
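Those list questions could be answered by annotations along these lines (purely illustrative):

```typescript
// Hypothetical annotations describing what a UI may do with a list in the model.
interface ListAnnotation {
  fixedLength: boolean;              // if false, elements may be added and removed
  // If set, these name properties of each element that let the UI render the
  // list as a set of draggable icons on a 2D surface.
  iconProperty?: string;             // e.g. "thumbnail"
  xProperty?: string;                // e.g. "posX"
  yProperty?: string;                // e.g. "posY"
}

// A desktop-like folder: variable length, drawn as draggable icons.
const folderContents: ListAnnotation = {
  fixedLength: false,
  iconProperty: "thumbnail",
  xProperty: "posX",
  yProperty: "posY",
};
```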
I wanted a specific "container class" in the object model for a numeric data set. Such a data set declares one or more axis units - pixels, latitude, longitude, degrees Kelvin, metres, etc - and the elements of the container must be identifiable as Points, Lines, Polygons, SampleMaps (sampled data grids covering a region, like bitmaps), etc. A chart showing some data would contain a bunch of Point elements, with appropriate axis units. A map would contain Points, Polygons, and Lines with lat/long axes. The UI would be responsible for scrolling and zooming, and if editability was specified, editing as well. The fun thing is, such abstract data sets could be composed; two maps could easily be overlaid in the UI by checking for axis unit compatibility (perhaps permuting the order of axes to make them match) and then overlaying them.
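A rough sketch of the data-set container, with an overlay-compatibility check - the unit names and element shapes are just placeholders:

```typescript
// Hypothetical sketch of the numeric data-set container: declared axis units
// plus typed geometric elements; two sets can be overlaid if their axis units
// can be matched up (possibly after permuting the axes).
type AxisUnit = "pixels" | "latitude" | "longitude" | "kelvin" | "metres";

type DataElement =
  | { kind: "point"; coords: number[] }
  | { kind: "line"; coords: number[][] }
  | { kind: "polygon"; coords: number[][] }
  | { kind: "samplemap"; origin: number[]; spacing: number[]; samples: number[][] };

interface DataSet {
  axes: AxisUnit[];          // one unit per axis, e.g. ["longitude", "latitude"]
  elements: DataElement[];
}

// Two data sets are overlayable if one's axis units are a permutation of the other's.
function overlayable(a: DataSet, b: DataSet): boolean {
  if (a.axes.length !== b.axes.length) return false;
  const sorted = (axes: AxisUnit[]) => [...axes].sort().join(",");
  return sorted(a.axes) === sorted(b.axes);
}

const roads: DataSet = {
  axes: ["longitude", "latitude"],
  elements: [{ kind: "line", coords: [[-0.1, 51.5], [-1.9, 52.5]] }],
};
const rainfall: DataSet = {
  axes: ["latitude", "longitude"],
  elements: [{ kind: "point", coords: [51.5, -0.1] }],
};
console.log(overlayable(roads, rainfall)); // true: same units, axes merely permuted
```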
Normally, points and lines and polygons and so on would just be displayed with a default style (SampleMaps get converted into lots of little polygon squares), but annotations can either give them styles, or nominate properties of them that should be used to derive styles.
3D and higher-dimensional data sets could be handled in a number of ways, from OpenGL rendering to picking a pair of axes to project onto or to take slices along.
The UI should be modular, with any given UI type (2D WIMP, etc) having a module API to allow external modules to handle any given object type, with the option of switching between multiple modules that can handle the same object at run time.
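The module API might boil down to something like this little registry sketch (invented names, and the render hook is deliberately vague):

```typescript
// Hypothetical sketch of the module API: each UI type keeps a registry of
// modules keyed by the object class (or annotation) they know how to present,
// and the user can switch between competing handlers at run time.
interface UIModule {
  id: string;
  handles: string;                     // object class or annotation, e.g. "FlowedText"
  render(obj: unknown): void;          // present the object on this UI's hardware
}

class ModuleRegistry {
  private modules = new Map<string, UIModule[]>();
  private active = new Map<string, string>();   // class -> currently chosen module id

  register(mod: UIModule): void {
    const list = this.modules.get(mod.handles) ?? [];
    list.push(mod);
    this.modules.set(mod.handles, list);
    if (!this.active.has(mod.handles)) this.active.set(mod.handles, mod.id);
  }

  // The user may switch handlers for a class at run time.
  select(cls: string, moduleId: string): void {
    this.active.set(cls, moduleId);
  }

  handlerFor(cls: string): UIModule | undefined {
    const chosen = this.active.get(cls);
    return this.modules.get(cls)?.find(m => m.id === chosen);
  }
}

const wimp = new ModuleRegistry();
wimp.register({ id: "fancy-text", handles: "FlowedText", render: o => console.log("WYSIWYG view", o) });
wimp.register({ id: "tag-text",   handles: "FlowedText", render: o => console.log("tagged view", o) });
wimp.select("FlowedText", "tag-text");   // this user prefers to see the <em> tags
```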
So, under my model, a lot of present-day applications would end up being split. A word processor would split into several components; one would be a 'class library' of paragraphs and headings and so on, with abstract commands like Check Spelling and Generate PDF, plus code that notices changes to heading objects and updates the table of contents; the others would be a suite of UI modules that register to handle "Flowed Text"-annotated objects with particularly pleasant tty/wimp/direct neural interface/etc viewing and editing.
I'm a bit torn about things like bitmap editors. To what extent should they be a dedicated application, presenting an object model dominated by a SampleMap with commands such as Paint and Fill and so on, as opposed to them being a UI module for editing SampleMaps of Intensity (a subtype of Number) and SampleMaps of RGBColour, which would then be usable wherever an editable image appeared in any application?
But what about things like:
- Computer games
- Remote Desktop Connection / VNC / Xnest
- Audiovisual chat
Some things, I decided, are fundamentally tied to particular interface hardware. For these, I decided not to worry about expanding the object model, although I think quite a few computer games might be doable with it; the object model for VNC would just be a 2D pixel/pixel data set containing a single SampleMap being updated in real time, with annotations saying that it wants raw mouse clicks, motions, and key presses mapped to the MouseDown, MouseUp, MouseMove, KeyDown, and KeyUp commands - which strikes me as a bit kludgy.
So for these very hardware-dependent things, I came up with a different idea; a standard interface for "media dialogues". The application emits a setup request listing the channels it provides - eg, "two video channels (640x480 and 320x200), one stereo audio channel" - and the list of return channels it wants - eg, "one raw keystroke channel (a mixture of Unicode characters and function-key command names, as discussed above); one pointer channel; one mono audio channel". The UI decides how to map these to the software it has; the keyboard and mouse and microphone can cover the return channels, and the two video channels can go into two side-by-side windows, and the stereo audio channel can go to the sound card.
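A setup request might be little more than two lists of channel descriptions; here's the example above written out as a sketch (hypothetical types, not a wire format):

```typescript
// Hypothetical sketch of a media-dialogue setup request: the application lists
// the channels it will provide and the return channels it wants, and the UI
// decides how to satisfy them from its own hardware.
type ChannelSpec =
  | { kind: "video"; width: number; height: number }
  | { kind: "audio"; layout: "mono" | "stereo" | "surround" }
  | { kind: "raw-keystrokes" }       // Unicode characters plus named commands
  | { kind: "cooked-strings" }
  | { kind: "pointer" };

interface DialogueSetup {
  provides: ChannelSpec[];           // channels the application will emit
  wants: ChannelSpec[];              // return channels it asks the UI for
}

// The example from the text: two video channels and stereo audio out;
// keystrokes, a pointer, and mono audio back.
const setup: DialogueSetup = {
  provides: [
    { kind: "video", width: 640, height: 480 },
    { kind: "video", width: 320, height: 200 },
    { kind: "audio", layout: "stereo" },
  ],
  wants: [
    { kind: "raw-keystrokes" },
    { kind: "pointer" },
    { kind: "audio", layout: "mono" },
  ],
};
```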
The fun thing is, this media dialogue system (especially when considered in an X11-like context as a network protocol) can be used for:
- Text, audio, and video chat (as well as the raw keystroke channel, there would, like in VIS, be a "cooked strings" channel, useful for command lines and for instant messaging)
- Remote desktop work
- VoIP - one can connect to another user's agent and get bidirectional audio; or one can connect to a PSTN gateway and get bidirectional audio plus a raw keystroke channel each way for DTMF and signalling, down which you send the number to dial, and in return you get any DTMF the far end sends, plus commands to notify of HANGUP, ENGAGED, etc. conditions.
- Watching streaming video (the far end gives you one, or maybe more (camera angles?) video streams with an associated audio stream, and maybe a cooked-strings channel of subtitles)
The available channel types might be:
Raw keystrokes. This consists of Unicode characters, plus commands identified by name for when function keys are pressed. Sequences such as Ctrl+a shouldn't be mapped to Unicode control characters, as is the Unix convention, since in practice they're used as commands rather than as characters. I'd be inclined to have them all mapped to commands by a mapping table stored in the UI settings, rather than per-application, so Ctrl+a always maps to cursor-move.to-beginning-of-line, F1 to help, and so on. I suspect that many function keys would be captured by the UI system itself; I'd have F5-F12 selecting my virtual desktops, personally, leaving F1-F4 as application commands - "Help", "Save", "Undo", "Redo" or some such. In fact, I'd disallow any Unicode control characters - not even CR, LF, and Tab; those should be mapped to commands.
When a UI is asked to provide a raw keystroke channel, it simply forwards key events from its hardware. When it's given a raw keystroke channel from an application, it gets more interesting; it would presumably treat it something like a console, outputting characters in a scrolling area, handling commands like backspace and return appropriately. But I don't want it to be abused as a 2D console output protocol like VT100 - it would handle cursor movement commands by shifting the cursor, so an interactive chat with another user will work sensibly when the user moves the cursor back to insert a character to correct a spelling mistake, but it wouldn't let them change colours or anything like that. Any unsupported command would be displayed as [Command Name], in a distinctive style.
Cooked strings. These consist of Unicode strings, again with the control characters banned, but this time without any way of including commands. A UI asked to provide this can do so with some kind of text editing input control, submitting when the user hits Enter. A UI given one of these as a return channel can just spit the strings out, one to a line.
2D video. This one's easy. A UI asked to provide this can do so if it has a webcam, or the user nominates an MPEG file to stream in. A UI given one of these gives the application access to a drawing context. Done! Whether the video is in the form of a lossy-encoded MPEG stream, a sequence of vector graphics commands, OpenGL commands, or a sequence of VT100 text canvas commands, is a minor detail.
2D Pointer. Providing this from a UI is easy in a WIMP environment; you stream in mouse events when the pointer is within one of the 2D video windows being displayed. If the application isn't providing any 2D video, then it can't have pointer events, since there's no frame of reference for the co-ordinates to make sense. The event message sent to the application consists of the ID of the video stream the coordinates pertain to, coordinates in the appropriate co-ordinate system for that video stream, a bit-mask detailing the state of buttons, and a time-stamp.
There's an interesting question of how a UI given one of these as an output stream might react. It should probably just ignore the channel.
Audio. Mono, stereo, or the various types of surround sound. As with video, whether it's encoded as raw samples, MP3, or MIDI/MOD/S3M commands is a detail. The UI is in charge of converting an offered audio stream to whatever audio output hardware it has, and can use either microphones or a user-supplied recording file as the source of an input stream.
In future, one might have to add binocular video streams (for telepresence or VR or augmented reality), dataglove/body motion sensor streams (how to display those if a UI is offered one as an OUTPUT stream is interesting, as with 2D pointer channels), direct neural interface streams...
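To pin down what flows over these channels, here's a sketch of the keystroke and pointer event shapes described above, plus a toy console-style consumer of a raw keystroke channel - all invented names, just to show the idea:

```typescript
// Hypothetical shapes for the channel events described above, plus a toy
// illustration of a UI consuming a raw keystroke channel console-style:
// printable characters are appended, a couple of commands are honoured, and
// any unsupported command is shown as [Command Name] rather than obeyed.
type KeystrokeEvent =
  | { kind: "char"; char: string }                 // a printable Unicode character
  | { kind: "command"; name: string };             // e.g. "help", "backspace", "return"

interface PointerEvent2D {
  videoStreamId: number;   // which video stream the coordinates refer to
  x: number;
  y: number;
  buttons: number;         // bit-mask of button state
  timestamp: number;       // milliseconds
}

function renderKeystrokes(events: KeystrokeEvent[]): string {
  let line = "";
  for (const ev of events) {
    if (ev.kind === "char") {
      line += ev.char;
    } else if (ev.name === "backspace") {
      line = line.slice(0, -1);
    } else {
      line += `[${ev.name}]`;          // unsupported command, displayed distinctively
    }
  }
  return line;
}

console.log(renderKeystrokes([
  { kind: "char", char: "h" },
  { kind: "char", char: "u" },
  { kind: "command", name: "backspace" },
  { kind: "char", char: "i" },
  { kind: "command", name: "set-colour" },   // not supported: shown, not obeyed
])); // -> "hi[set-colour]"

const click: PointerEvent2D = { videoStreamId: 0, x: 120, y: 80, buttons: 1, timestamp: 1234 };
console.log(click.x, click.y);
```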
What I really need to do is to catalogue all the applications I have on my computers, and all those I've seen others use, and decide how I'd map them into my system - to find its weaknesses! How well would a CAD application work when split into a 2D data set object full of lines, and (if necessary) a bunch of UI modules that specialise the editing of lines?
I eagerly await his response!