Does Voice Augmented eXperience in apps help emergent users?

Slang Labs and Srishti Institute of Art and Design jointly carried out user research to find out: do Voice Augmented eXperiences actually help the Next Billion Users, or emergent users, complete their tasks faster and better?

Slang Labs was built on the assumption that people would love using their voice to interact with their mobile and web apps (what we call a Voice Augmented eXperience). What’s not to love about voice, right? It’s natural, fast and easy, with no training manuals needed. But as anyone who has ever assumed love would know: validate! Ask him/her first before you book that wedding hall.

And this is what we set out to do by partnering with Srishti Institute of Art and Design, a premier design institute in Bangalore.

External validation via research has always been important at Slang. It’s so fundamental to us that even our company’s name was chosen by asking people which name they could recall easily, pronounce easily, and which resonated with them more. It was a toss-up between Slang Labs and Polyglot Labs. Guess what won.

Research focus

Does voice enable users to complete their tasks better and faster?

Our research focused on two primary questions: if users could talk to their apps instead of only using touch, would it help them complete tasks better, and would it help them complete those tasks faster?

TL;DR

What did we find? Drop-offs reduced for certain types of tasks when they were voice-enabled. If the number of steps needed to complete a task was below a threshold, touch was faster; for the rest, talking was faster. There was also a lot of interesting observational data.


Segment of participants

We did the research with about 49 participants. We focused on emergent users, as they are a big part of our target audience. These are users who are non-technical, not highly educated, use a smartphone but are not very comfortable with it, use YouTube, and are not comfortable with English even though it’s aspirational for them.

Here is their gender and educational breakdown:

Gender breakdown: male 69.4%, female 30.6%

The majority of the participants had studied up to mid-senior school

Research Methodology

We built two English-language apps for the purpose of this research. The apps were functional, but the transactions themselves were simulated.

  • A banking app
  • An e-commerce app

We asked our participants to do the following tasks:

Using the banking app

  • Check the balance
  • Transfer money to someone
  • Reorder a checkbook

Using the e-commerce app

  • Track an order
  • Cancel an order

We asked them to do the above tasks twice:

  • Once using touch only
  • Once using voice (in either English or Hindi)

None of the participants had prior experience with these apps. We measured the following:

  • Did they complete the tasks?
  • Did they ask for help?
  • How much help was needed?
  • How long did it take them to complete the task (for those who did complete)?
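The four measurements above map naturally onto one record per participant, task and mode. Here is a minimal sketch of how such observations could be stored and how a completion rate could be computed from them. The field names, the help scale and the sample rows are all made up for illustration; they are not the study's actual data or tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskObservation:
    """One participant attempting one task in one mode (hypothetical schema)."""
    participant_id: int
    task: str                 # e.g. "track_order"
    mode: str                 # "touch" or "voice"
    completed: bool           # Did they complete the task?
    asked_for_help: bool      # Did they ask for help?
    help_level: int           # How much help was needed (0 = none; scale is assumed)
    seconds: Optional[float]  # Time to complete; None if the task was not completed


def completion_rate(observations: List[TaskObservation], mode: str) -> float:
    """Fraction of attempts in the given mode that were completed."""
    rows = [o for o in observations if o.mode == mode]
    if not rows:
        return 0.0
    return sum(o.completed for o in rows) / len(rows)


# Made-up sample rows, only to show the shape of the data.
sample = [
    TaskObservation(1, "track_order", "touch", False, True, 2, None),
    TaskObservation(1, "track_order", "voice", True, False, 0, 21.0),
    TaskObservation(2, "track_order", "touch", True, True, 1, 48.5),
    TaskObservation(2, "track_order", "voice", True, False, 0, 19.0),
]

print(completion_rate(sample, "touch"))
print(completion_rate(sample, "voice"))
```

Keeping the time as optional mirrors the fact that time was only measured for participants who actually completed the task.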

Learnings from the research

And without further ado, here is what we found.

Task completion rates of participants

We saw that when participants could talk to their apps to get things done, the task completion rate was 5.71% higher than with the touch-only experience.

Higher drop-off percentage observed in touch-only experience

Task completion rates varied with the degree of help we provided


Voice Augmented systems helped the participants complete the tasks without the extra cognitive load of “learning the interface”. With some help, completion rates for the touch-only experience improved too, but participants tended to forget what they had learned quickly.

Participants were more likely to finish their task when they could talk and get things done as opposed to touch only.

And the type of tasks had a big impact on the drop-off rates

Tasks with fewer visual cues (like tracking an order in our case) or tasks hidden below two layers of menus (ordering a checkbook) saw the most drop-offs. Discovering these capabilities took more cognitive effort, and it showed in the completion rates.

Poorly designed but easily accessible content (like checking the balance, which was just one click away) also saw drop-offs, because the text was in English and had no symbols to make it easier.

Similarly, among voice-based tasks, funds transfer saw a bigger drop-off because participants tried to transfer money to people who were not in the payee list. This could potentially have been reduced if we had added the option to add a payee in the same flow, but that was beyond the scope of the experiment.

Intuitive visual cues help, but training on what to speak is still easier than training on what to touch

Time to complete tasks by participants

We measured the time it took to complete each of the tasks (when they were actually completed). Here is a breakdown of the time for the various tasks.

What is a task time score?

It’s the relation between the help provided and the time taken. The less help needed, the lower the score, and the lower the score, the better.

For a task that could be completed in fewer than 3 clicks, touch was faster. For tasks that required 3 or more clicks, voice was faster.

3 is the magic number
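The finding can be written down as a simple rule of thumb. The sketch below is only an illustration of that rule as observed in this study. The function name and the constant are ours, and the threshold of 3 comes from this one experiment, so treat it as an assumption rather than a general law.

```python
# Threshold observed in this study; an assumption outside of it.
CLICK_THRESHOLD = 3


def faster_mode(clicks_needed: int) -> str:
    """Return the mode the study found faster for a task needing this many clicks."""
    if clicks_needed < 1:
        raise ValueError("a task needs at least one click")
    # Fewer than 3 clicks: touch was faster. 3 or more: voice was faster.
    return "touch" if clicks_needed < CLICK_THRESHOLD else "voice"


# Checking the balance was one click away in our banking app.
print(faster_mode(1))
# A hypothetical task buried deeper in the menus.
print(faster_mode(4))
```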

Other observations included the way participants spoke to their apps. They behaved as if they were talking to a human and started with a “hello”. Some training could therefore have brought the voice times down drastically. The same is not necessarily true for touch only, because even with repeated usage, the slowdown caused by clicks will more or less stay.

Other feedback gathered from the research

Once the participants had completed the tasks, we asked them a series of questions about their usage. Here are some of the results:

Did it feel natural to speak to your app?

87% of people felt it was natural to talk to an app; the remaining 13% did not

A vast majority (87%) said it was natural. It sounds high, but it’s probably because we mostly spoke with people who had trouble using complex features of an app.

How did you find talking to your app?

The majority of the participants had a positive experience using voice in their app

Most of the participants gave positive feedback on being able to talk to their apps. Participants wanted to use voice in other apps and asked the researchers how to get voice commands in them (we had to tell them not yet ;-)). The 3 people who said they didn’t like talking to their apps used languages other than those supported by Slang, which led to a bad experience.

Which apps would you like to have voice capabilities in?

9 people said that they wanted voice-enablement inside all their apps. 15 people said they wanted voice inside e-commerce apps and 9 said they wanted it in their banking apps.

It’s possible that seeing voice in e-commerce and banking demo apps biased the participants, leading to a higher number of people wanting voice in those particular domains.

Other observations

Tap vs Tap and Hold

While the voice experience could be triggered by just tapping the mic button inside the app, we observed that many participants tried to tap and hold the mic button while speaking. This seems to be heavily influenced by how they use voice messaging in apps like WhatsApp, where you hold the mic button to record a voice message.

Command vs Converse

We asked participants whether they would like to “talk” to the app (converse naturally, as with a human) or “command” it. Interestingly, many felt they would like to “command” their app. They did not seem to want to think of it as a “person” and preferred referring to it as an “it”, something that can be ordered around.

Fear of usage

Some of the comments we gathered seemed to indicate that participants felt less intimidated by voice than by touch. When they use touch to interact with an app, there is a fear of not knowing what they are doing (ambiguity) and a fear of making a mistake. With voice, even if they did not know exactly what to say, they did not feel intimidated by the experience.

Levels the playing field

“Voice is a natural form. For people who are not educated or even for those who are, the ability to talk to get things done is natural.”


Reduces dependency

“Voice is great as it enables one to get things done without needing help from others. If we depend on others’ help, they can potentially misuse it.”

Conclusion

The research helped us understand the value of, and the issues associated with, enabling people to talk to their apps. It also helped us improve our product (a platform to help developers build such Voice Augmented eXperiences inside their own apps). We came back more optimistic and energized about what we are building and the immense potential it could offer to the apps that are out there. We hope it helps you, the reader, as much as it did us.