
8 tips for designing voice interfaces

Whatever you want to call them (voice assistants, voice services, voice UIs), one thing is clear: the artificial intelligence behind voice technology is already advanced enough to make voice the most efficient way to perform many common tasks. The user experience is improving at an impressive rate.

In fact, the machines behind voice services such as Amazon Alexa, Google Assistant, and Microsoft Cortana are growing smarter by the day. They can understand different utterances, variations in syntax and different accents – and thanks to cloud computing they can access vast amounts of data in the blink of an eye.

To train the next generation of voice designers, tech school CareerFoundry has built a comprehensive Voice User Interface Design course in collaboration with Amazon Alexa, and in the process discovered these eight important things you need to consider when designing for voice...

01. Users don’t talk the way they type

Users have developed a specific way of instructing devices

If we want to find a good sushi restaurant in Berlin, we might type ‘best sushi Berlin’ or ‘top sushi restaurant Berlin.’ In contrast, in speech with a friend, we might say something like, ‘Do you know any good sushi restaurants in Berlin?’ or ‘What’s your favourite sushi restaurant in Berlin?’

When using a voice service, we tend to be aware that we’re talking to an artificially intelligent machine – so we don’t necessarily say, ‘Alexa, what’s your favourite sushi restaurant?’ but rather, ‘Alexa, find me a good sushi restaurant in Berlin’ or ‘Alexa, where should I eat sushi?’

A new pattern of commands is emerging, in which the voice service takes on the persona of a helpful assistant. We speak to it in natural language, but not as we would to a friend.

Over time, this may change. Who knows, maybe Alexa will respond, ‘Say please’, to ensure we don’t lose sight of our good manners. Otherwise, we might end up in a future where we talk to our partners with the same directness as we do our devices.

02. Personalisation is paramount

Voice interfaces that remember preferences add to the ease of use

Voice-operated devices already show a clear tendency towards personalisation. It’s all part of creating a quick, efficient experience, and a great way to achieve it is for the device to remember your preferences so you don’t have to enter the same information every time you use it.

For example, if you use the Deutsche Bahn Alexa skill to check your train times to work, you don’t have to name your departure and arrival stations each day: the device remembers your route to work.

Similarly, you might not want to order from Amazon and have to spell out your address and postcode through speech (‘8TK’ ‘80K?’). However, once you’ve placed an order online, Amazon will remember your delivery address and simply ask you to confirm it when you place repeat orders via voice.
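The underlying pattern, ask once and then reuse the stored answer, can be sketched in a few lines. This is a minimal illustration, not how the Deutsche Bahn skill or the Alexa platform actually works: `PreferenceStore`, `handle_train_times`, the user IDs and the station names are all hypothetical, and a plain in-memory dictionary stands in for whatever persistent storage a real skill would use.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PreferenceStore:
    """Hypothetical in-memory stand-in for a skill's persistent per-user storage."""
    routes: dict[str, tuple[str, str]] = field(default_factory=dict)

    def remember_route(self, user_id: str, origin: str, destination: str) -> None:
        self.routes[user_id] = (origin, destination)

    def saved_route(self, user_id: str) -> Optional[tuple[str, str]]:
        return self.routes.get(user_id)


def handle_train_times(store: PreferenceStore, user_id: str,
                       origin: Optional[str] = None,
                       destination: Optional[str] = None) -> str:
    """Answer a 'when is my train?' request, reusing a saved route if none is given."""
    if origin and destination:
        # The user named both stations: use them and remember them for next time.
        store.remember_route(user_id, origin, destination)
    else:
        saved = store.saved_route(user_id)
        if saved is None:
            # Nothing stored yet, so the skill has to ask, but only this once.
            return "Which station are you leaving from, and where are you going?"
        origin, destination = saved
    return f"Looking up trains from {origin} to {destination}."
```

After the first request names both stations, a bare "When is my train?" from the same user is answered straight from the stored route.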

03. We need new conventions for showing system status

Cortana has different animations to indicate what it's doing

When you’re waiting for a webpage to load, getting no feedback is incredibly frustrating, mainly because you can’t tell whether it has crashed or is just taking its time. Status feedback is an important part of good user experience: if a site has to search through a large database to find what you need, it should keep the user informed of its progress.

When you’re talking to a voice service, you’ll also want reassurance that it’s switched on and listening or performing a given action. However, you don’t necessarily want it to talk over you and disrupt your flow. This is where other indicators like lights or sound effects, which don’t disrupt your speech, can serve a valuable purpose.

Have you ever had the experience of being on the phone and suddenly wondering if the person on the other end is still there? Thankfully, we often use subtle audible cues to let the other person know we’re still on the line when they’re in the middle of a long anecdote.

Similarly, voice services can find subtle ways of letting us know they’re switched on and at our service. Amazon Alexa devices reassure their owners that they’re listening with flashing lights and non-disruptive sound effects. Maybe in future we’ll trust them so completely that we won’t need any reassurance.

04. Adapting for flat navigation is key

Users need to be able to direct voice assistants easily

When designing the UX of a website, the site navigation is crucial. What are the most common actions a user performs? What options should be available on the homepage? How many click-throughs does it take a user to perform a simple task?

When users interact with the web using voice, they’re likely to bypass many intermediary stages and go straight to the information they need. For example, a user who wants to order from Amazon will not say, ‘Alexa, go to Amazon, then go to my account, then view my history, then find coffee, then place the order again.’ They’ll simply go straight to the final step: ‘Alexa, re-order coffee.’
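Designing for this kind of flat navigation means mapping a whole spoken request directly to the action that completes it. The sketch below illustrates the idea under loud assumptions: the handlers `reorder` and `track_order` are hypothetical, and the regular expressions are a toy stand-in for the intent matching a real voice platform performs with trained language models.

```python
import re
from typing import Callable, Optional


# Hypothetical handlers: each one completes the user's task in a single step.
def reorder(item: str) -> str:
    return f"Re-ordering your usual {item}. Shall I go ahead?"


def track_order(item: str) -> str:
    return f"Checking where your {item} order is."


# Each pattern maps a whole spoken request straight to the final action,
# instead of walking the user through account -> history -> item -> order.
# Regexes are only a toy stand-in for a real platform's intent matching.
INTENTS: list[tuple[re.Pattern, Callable[[str], str]]] = [
    (re.compile(r"\bre-?order\s+(?:my\s+)?(?P<item>\w+)"), reorder),
    (re.compile(r"\bwhere is my\s+(?P<item>\w+)"), track_order),
]


def route(utterance: str) -> Optional[str]:
    """Return the response for the first matching intent, or None if nothing matches."""
    text = utterance.lower()
    for pattern, handler in INTENTS:
        match = pattern.search(text)
        if match:
            return handler(match.group("item"))
    # No match: a real skill would ask a clarifying question here.
    return None
```

With this structure, ‘Alexa, re-order coffee’ reaches the re-order step directly; there is no menu hierarchy for the user to traverse.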

05. Talking should come naturally 

The aim is to make the voice interaction as intuitive as possible

Users don’t want to memorise hundreds of commands to perform specific tasks. The whole point of voice services is to take our most natural communication style and apply it to computers, not to create something new that takes time to learn.

Graphical interfaces have a few codes we have come to understand. For example, if you can’t find what you’re looking for, it’s probably hiding in the hamburger menu. With voice, we may end up with a few established conventions, but in general the aim is to make voice interactions so intuitive that anyone could pick up a device and start using it.

This will be an exciting task for designers and programmers: understanding the natural cues in conversations and teaching computers to interpret us, so they can seamlessly provide an answer or perform a task.

The database of utterances that machines can understand is growing daily, and it’s very possible that we’ll reach a point where machines are better at deciphering our drunken slurs than our friends are.

06. Accessibility has different implications for voice

Voice interaction designers need to be aware of speech impediments and hearing impairments

As any UI designer will tell you, one of the most important things to worry about is accessibility. Fonts, colours and graphics are not just aesthetic matters; they also determine whether everyone can access the content. For example, is low contrast making your content illegible for people with visual impairments?

Considerations around accessibility are important in voice interactions, but they take a different form. Voice interactions rely on two things working successfully: the device understanding the person talking, and the person understanding the device. 

This means designers should plan for speech impediments (not just regional accents), hearing impairments, and any other factors that could influence the communication, such as cognitive disorders. 

07. It's difficult to curate answers while avoiding bias

We’re used to finding hundreds of results when we search on Google; however, when we’re interacting via voice, we often just want the device to intelligently pick the best answer for us rather than reel off a long list of search results.

This could get complicated quickly. For example, imagine asking your voice service to tell you what the best headphones on the market are. The information is quite subjective and could easily lend itself to commercial bias. 

Search engines will also have a huge role to play in determining which content is picked up by voice services, and this will no doubt result in some debates and accusations.

08. Privacy and fraud present a problem

We may need a way for our devices to adapt what they offer depending on who's talking

From your children talking to Amazon Alexa and re-ordering tons of chocolate, to people overhearing your confidential information – there are lots of considerations around privacy and security when it comes to voice interfaces.

An obvious solution is password-locking devices, but a spoken password is easy to overhear, so it probably won’t take the kids long to learn it. In the future, it’s likely that shared devices will recognise users by their voice and personalise their experience accordingly.

In the meantime, we’ll have to think carefully and recognise that there are some use cases where voice just isn’t appropriate.
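If shared devices do learn to recognise who is speaking, the permission check itself could be simple. The sketch below is purely illustrative and assumes the platform reports which enrolled voice it recognised, or `None` when it can't tell; the profile IDs, names, `handle_purchase` and the ‘confirm in the app’ fallback are all hypothetical, not a description of any real device’s behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    name: str
    can_purchase: bool


# Hypothetical household set-up, keyed by the (assumed) recognised-voice ID.
PROFILES = {
    "voice-parent": Profile("Alex", can_purchase=True),
    "voice-child": Profile("Sam", can_purchase=False),
}


def handle_purchase(speaker_id: Optional[str], item: str) -> str:
    """Allow a voice purchase only for a recognised profile with permission."""
    profile = PROFILES.get(speaker_id) if speaker_id is not None else None
    if profile is None:
        # Unrecognised voice: don't act on a spoken request alone.
        return "I didn't recognise your voice, so please confirm this order in the app."
    if not profile.can_purchase:
        return f"Sorry {profile.name}, purchases need a grown-up's approval."
    return f"OK {profile.name}, ordering {item}."
```

The key design choice is the default: when the speaker can’t be identified, the sketch refuses to act on voice alone, which is exactly the kind of use case where voice by itself isn’t appropriate.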

Embracing voice interactions 

Alexa is even on smart watches

As voice interaction evolves and machines become smarter and smarter, there will no doubt be loads of considerations that we haven’t even thought of today. 

Voice is potentially the next huge paradigm shift in technology, and with machine learning advancing at its current rate, it’s quite possible that over the next few years machines will become better than humans at deciphering human speech.

If you’re working in technology, you’d be silly to ignore voice. There are lots of exciting initiatives out there, and Amazon, Cisco, IBM and Slack are just some of the big companies investing heavily in voice startups. If you’re keen to learn more about designing for voice, check out CareerFoundry’s 8-week Voice User Interface Design with Amazon Alexa online course.

Florence is the blog editor at Berlin tech school CareerFoundry. Her interests include promoting a happier society and devising solutions for the impending age of technological unemployment.