It's a weekday. I'm standing in my kitchen, jumping around between the counter-tops. Jam on toast, school uniforms, packed lunches, a crazy juggle to get everyone ready. In a moment of clarity, I realise I need to hear something to set a calm mood for the day. "Alexa, play 'Michael Bolton' on Spotify". The kids groan, and in seconds 'Can I touch you there?' fills the room with Michael's dulcet tones and 1980s ideas about romance. Aha! Mood perfection and child annoyance are complete. No screen, no looking, no physical movement, and most importantly, no touching. No touching, Mr Bolton.
This is now possible with Amazon's Echo, an always-listening Wi-Fi speaker. It's one of those magical, once-in-a-blue-moon products like the iPhone. It recognises my words with remarkable reliability, even when shouted from the next room. It's become pretty central to our family life. And it's got me thinking about the changing nature of design and interaction.
A narrow context
When Apple launched Siri in 2011, I experimented with some random tasks to see what it could do. But as I understood its limits I stopped trying new things. There's no efficient way for a voice interface (VUI) to tell you what it's capable of. You spend time guessing and trying things out, which then leads to disappointment.
This isn't how I feel about the Echo. It is incredibly limited in what it can do, but unlike other VUIs, Amazon has designed it with a narrow context in mind, and it's done an awesome job. It fits perfectly in the kitchen or the living room, where we generally need our hands and eyes devoted to other things. It's always on: no extra interaction, like holding your home button, is needed to trigger it.
It now manages our oh-so-middle-class shopping list: "Alexa, add low-fat hummus to the shopping list" (it sends anything you list to the Alexa app for later reference while you're shopping). We also time our cooking with it. Plus it plays podcasts, radio stations and Spotify, so I can pretty much ask for any audio I like.
My kids love it: "Speaking to Alexa's a bit like a Jedi mind trick, isn't it Daddy? Tell her what to do, she repeats it, then does it" (Ellie, 7 years old). They also love adding things like Lego nunchuks to the shopping list in the vain hope I'll buy them.
For years users mostly hated any VUI they encountered (think phone 'help'-lines that made you feel like you were Basil Fawlty trying to communicate with Manuel) but now it seems technology has caught up. A product called Hound will now let you add very smart voice searching into your app. And Google is getting into the game with Google Home. Google's TV ads feature voice search heavily these days: it wants users to understand what's possible and to feel comfortable doing it.
So how does this affect us? I've started to examine use cases where VUI would be better than a screen, and there are quite a few. Why do I need an app to tell me the local weather? What about playing music – is all the complicated screen interaction really necessary? If you count the number of steps involved in playing music via a screen, it's pretty ridiculous compared to simply asking Alexa or Siri.
There are particular contexts where a VUI might be a useful replacement for a traditional app:
- Complex interactions: Are you making something where the user has to take a lot of steps to reach a result that is relatively simple to express in words ('Show me all the red dresses in size 12')?
- Hands-free: Is being able to maintain use of both your hands advantageous?
- Platform integration: iOS and Android now have APIs that allow you to make your app's content and functions available to use via voice.
Voice is definitely not the answer to every interaction, but it's one of humanity's oldest technologies, and arguably one of the most humane. Avoiding a screen lets us preserve our attention for things that matter, like chopping onions, or looking at other people. Try making something with it. Get out of your screen-based comfort zone, because change is coming: it's not if, but when.