An actually usable voice assistant interface (by alaric)
Ok, this is going to sound a bit weird, but I had a dream last night and part of that dream was an actually usable voice assistant system (as in, a Google Assistant / Alexa type of thing). I woke up all excited about this, so I need to write it up so that I can see if it's actually a good interface when I come back to it in a few days, or just dream-hubris...
So the dream began with me turning up to work in an office, on my first day in a new job (I've recently started a new job, which is probably where that came from). The company (I never found out what they actually did) had a rambling campus of varied buildings, and I was told to just find an unclaimed desk in a building I liked, and claim it.
All the unclaimed desks were clean and empty, apart from a computer and a desk phone. And the desk phone is where it gets interesting.
So I sat down at a desk I fancied the location of, and looked at this phone. It was sort of the size of a paperback book; it didn't invite lifting it to your ear, but instead you held it the way you'd hold a book in front of you, comfortably in one or both hands. The front of it was a screen, although seemingly not a touch screen; it was more like an e-ink display, reflective and not drawing power when quiescent, so it was already displaying an image when I picked it up. And what was on the screen was the following text:
"New here? Just say 'Hi, I'm new'."
...and below that some other instructions for existing employees getting a new desk and so on, that didn't apply to me. So I said "Hi, I'm new" as instructed and the phone welcomed me, both with voice and on-screen, and asked me a few questions (mainly, my name).
At all times during the process:
- The voice asked me a question or told me something, and what it was saying was on the screen as well.
- The screen also listed extra information beyond that, including other command phrases I could speak instead of answering the question (eg, "Just say 'Please go back to the last question' if you realise something isn't right").
- When I said something, the screen would show what it had heard and how it was interpreting it, and tell me what to say to correct it if it had got something wrong.
This is really interesting compared to a normal voice assistant. It was always clear on-screen what voice commands were available, so I knew what options there were and how to trigger them (feature discoverability is perhaps my BIGGEST frustration with these things), while the speech from the device didn't need to waste time telling me information I might not need: it could be short and to the point, with the extra information left to the screen. Having what it was saying echoed on the screen meant I had a second reference if I misheard anything, and having what I said echoed back on the screen meant I could double-check it without it having to tediously repeat everything back to me.
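(This is pure speculation rather than anything the dream actually revealed, but if I were to sketch how one step of such a guided dialogue might be represented, it'd be something like the Python below - all the names and fields are my own invention.)

    from dataclasses import dataclass

    @dataclass
    class DialogueStep:
        # One question in a guided dialogue; everything here is my own guess,
        # not anything the dream-phone actually showed me.
        spoken_prompt: str              # the short sentence the device says aloud
        screen_text: str                # the fuller version shown on the display
        expected_answers: list[str]     # the handful of phrases it will accept as an answer
        extra_commands: dict[str, str]  # phrase -> action, e.g. "please go back to the last question" -> "back"

        def render_screen(self) -> str:
            # The display always shows the prompt plus every phrase you could say,
            # which is what made the features discoverable.
            lines = [self.screen_text, "", "You can say:"]
            lines += [f"  '{p}'" for p in self.expected_answers]
            lines += [f"  '{p}'" for p in self.extra_commands]
            return "\n".join(lines)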
But perhaps most significantly, because it prompted me on-screen with the command phrases I could use, the speech recognition inside it was much simpler and more robust than in real-life voice assistants. I didn't have to worry about learning the correct phrasing because I was prompted for it on-screen, and it didn't need to try and handle a vast variety of different phrasings. And a minor implementation detail that came up later in the dream was that some parts of the speech recognition were actually offloaded to humans anyway - it turned out that my answers to a bunch of questions it had asked me during onboarding were just recorded and turned into a kind of voicemail message the phone sent to the staff members responsible for setting things up for me, and didn't require computer speech processing at all.
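To be concrete about why that simplifies things: instead of open-ended transcription, the recogniser only has to pick the best match from the handful of phrases currently on screen. A minimal, purely illustrative sketch (using difflib from the Python standard library; none of this is from the dream):

    import difflib

    def match_command(heard_text, allowed_phrases, cutoff=0.6):
        # Match a rough transcription against the few phrases currently shown on
        # screen, rather than trying to understand arbitrary speech.
        heard = heard_text.lower().strip()
        lowered = [p.lower() for p in allowed_phrases]
        best = difflib.get_close_matches(heard, lowered, n=1, cutoff=cutoff)
        if not best:
            return None  # nothing close enough - ask again, showing the phrases once more
        return allowed_phrases[lowered.index(best[0])]

    # e.g. match_command("hi im new", ["Hi, I'm new", "I already work here"])
    # would return "Hi, I'm new".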
When it asked my name, it asked me both how to spell it and how to pronounce it - it used the spelling to register me in the company directory and to display my name back to me, and it used my pronunciation to say my name back to me (in its own voice, however). It seemed to avoid the need to actually understand arbitrary text; it worked by recognising a manageable, context-specific list of words (including the names of letters). The most sophisticated thing it did in the dream was that trick of hearing how I pronounced my name and recording that alongside the spelling, so it could say my name back to me (and, presumably, to other people looking me up in the directory as well).
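Again speculating well beyond what the dream showed: the directory would only need two things per person - the spelled name, assembled from a fixed vocabulary of letter names, and a recording of the owner saying their name. The dream-phone went further and said the name back in its own voice, which is a harder trick; this sketch just keeps the recording alongside the spelling.

    from dataclasses import dataclass

    # A tiny, made-up subset of the letter-name vocabulary.
    LETTER_NAMES = {"ay": "A", "bee": "B", "see": "C", "dee": "D", "ee": "E"}

    @dataclass
    class DirectoryEntry:
        spelled_name: str        # e.g. "ALARIC", built from recognised letter names
        pronunciation_clip: str  # path to the recording of the owner saying their name

    def spell_from_letters(heard_letters):
        # Only the names of the letters need to be recognised - a small, fixed vocabulary.
        return "".join(LETTER_NAMES[letter] for letter in heard_letters)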
Perhaps due to the simplicity of its speech recognition, it was fast - no waiting for processing before it responded. Conversations with it were snappy and efficient, and could be had hands-free; it actually felt like a time saving compared to doing things on a conventional computer.
From that point onwards, the "home screen" of the phone when it was idle had my name at the top and a list of command phrases I could use to do things, which neatly summarised the system's features. It provided a load of information about the offices (but, still, nothing about what the company actually did), a directory of all the staff, and it managed workflow - requests came to me on this device as messages from other people, usually the result of them having a dialogue with their own phone, which then synthesised a combined voice-and-text message that popped up on mine. Not once in the dream did anybody actually call anybody else on the phones, although the directory interface offered that function; instead, the system presented structured dialogues for making requests and produced a message to send to the right person.
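If I were guessing at how the workflow side hangs together (and it is a guess - the dream didn't show me the plumbing), each structured dialogue would just produce a small message object routed to whoever handles that kind of request, carrying both the text summary for the recipient's screen and a spoken version:

    from dataclasses import dataclass

    @dataclass
    class RequestMessage:
        sender: str
        recipient: str
        summary_text: str   # the structured text shown on the recipient's screen
        voice_clip: str     # path to the spoken version that plays aloud (placeholder)

    def route_request(handlers, sender, task, details):
        # 'handlers' maps a kind of task to the person responsible for it;
        # the guided dialogue has already collected 'details' as structured fields.
        summary = f"{task}: " + ", ".join(f"{k}={v}" for k, v in details.items())
        return RequestMessage(sender=sender,
                              recipient=handlers[task],
                              summary_text=summary,
                              voice_clip=f"/recordings/{sender}-{task}.ogg")

    # e.g. route_request({"food delivery": "kitchen staff"}, "alaric",
    #                    "food delivery", {"order": "sandwich", "desk": "B3"})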
I quickly learnt the common voice prompts and needed to look at the screen less and less, but it was still there in my pocket if I needed it; I just had to prefix my commands with "Phone:" if I wasn't making "eye contact" with it as I spoke.
The dream itself was mainly about exploring the company campus, on which everyone lived as well as worked and which was full of really interesting and beautiful buildings, while I tried hard to find out what my job actually was (while still managing to do it, simply by doing the things asked of me through messages that popped up on the phone; the reasons why I had to do these things never became clear). The company was tremendously efficient - when I was hungry I ordered food through a dialogue with my phone, and somebody delivered it within minutes, because a message had popped up on their phone asking them to. Before long, the phone instructed me to marry one of my colleagues, and our phones performed the ceremony for us; my dream ended with the two of us sat in the shared accommodation we had picked from the many little apartments dotted around the buildings, telling the apartment's phone that the two of us now lived there and going through its welcome dialogue to set the apartment up to our liking.
So, yeah, weird dream, but I think that "combined display and voice interface with significantly simplified speech recognition" thing might actually be a usable voice assistant, if the speech recognition could be made as real-time as I dreamt it. I think a small camera on the device would also enable it to know when my eyes were pointing at it, so I could address it without the "Phone:" prefix.
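The addressing rule, at least, is simple enough to write down: a command counts as being for the device if you're looking at it when you speak, or if you prefix it with "Phone:". (A sketch under my own assumptions, not anything the dream spelled out.)

    PREFIX = "phone:"

    def addressed_to_device(utterance, eye_contact):
        # 'eye_contact' would come from the small camera guessing whether the
        # speaker is looking at the device; otherwise the spoken prefix is required.
        return eye_contact or utterance.lower().lstrip().startswith(PREFIX)

    def strip_prefix(utterance):
        # Remove the "Phone:" prefix, if present, before matching the command.
        text = utterance.lstrip()
        return text[len(PREFIX):].lstrip() if text.lower().startswith(PREFIX) else text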