Voice Assistants
Transforming our lives
one voice interaction at a time

We have always had to learn the language of technology, be it the keyboard, the mouse or the touchscreen. Voice assistants turn this around: with voice, technology has to learn to understand us and our language.

Our understanding of voice assistants, or voice-based products, is typically limited to general-purpose assistants like Siri, Alexa or Google Assistant. In reality, there is a bigger picture that many of us are not familiar with. In this blog post, we present a more holistic picture of voice assistants.

What are Voice Assistants?

Voice assistants have become synonymous with Google Home and Amazon Echo. This could not be more wrong. Those are smart speakers; the actual voice assistants are the technology powering them: Google Assistant and Amazon Alexa, respectively.

Today, voice assistants are not limited to smart speakers. They are also found in cars, household devices, smartphones and now in a lot of apps as well.

Voice assistants are actually a subset of virtual assistants, also called intelligent personal assistants. These virtual assistants can take input in different ways:

  1. Text: Text-based intelligent virtual assistants are called chatbots.
  2. Voice: These types of assistants are called voice assistants.
  3. Image: These assistants take an image as input, e.g. Google Lens and Bixby Vision.

We will leave chatbots and image-based assistants to other experts. Let’s get back to voice assistants.
A voice assistant is a virtual assistant that uses speech recognition, natural language processing and speech synthesis to take actions that help its users.

Essential Conditions for being a Voice Assistant:

Not every voice solution is a voice assistant, but every voice assistant is a voice solution. To be called a voice assistant, a voice solution needs to meet these conditions:

  1. Voice as input: The primary mode of input for a voice assistant should be voice.
  2. Conversational: Voice assistants should be able to hold natural, contextual two-way conversations with the user.
  3. Confirmational: Voice assistants should be able to confirm, clarify and answer the user with context.
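To make the conversational and confirmational conditions concrete, here is a minimal, hypothetical sketch of a dialog that keeps context across turns, asks a clarifying question when a detail is missing, and confirms before acting. The `DialogState` class, the `handle_turn` function and the `book_ticket` intent are illustrative assumptions, not any real assistant's API; a real assistant would extract the intent and details with an NLU model instead of receiving them pre-parsed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical details (slots) required before a ticket can be booked.
REQUIRED_SLOTS = ("source", "destination", "date")


@dataclass
class DialogState:
    """Context carried across the turns of one conversation."""
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def handle_turn(state: DialogState, intent: Optional[str], slots: Dict[str, str]) -> str:
    """Merge one pre-parsed user turn into the context and decide how to respond."""
    if intent:  # a follow-up turn may omit the intent; keep the earlier one
        state.intent = intent
    state.slots.update(slots)  # later turns add to, or correct, earlier details

    if state.intent != "book_ticket":
        return "Sorry, I can only book tickets right now."

    missing = [name for name in REQUIRED_SLOTS if name not in state.slots]
    if missing:
        # Clarify: ask for the first missing detail instead of failing.
        return f"Sure. What is the {missing[0]}?"

    # Confirm: repeat the understood request back before acting on it.
    s = state.slots
    return f"Booking a ticket from {s['source']} to {s['destination']} on {s['date']}. Shall I go ahead?"


state = DialogState()
print(handle_turn(state, "book_ticket", {"source": "Pune", "destination": "Mumbai"}))
print(handle_turn(state, None, {"date": "Friday"}))
```

The second turn ("Friday") only makes sense because the state remembers the first one; that retained context is what separates a conversational assistant from a one-shot command interface.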

History of Voice Assistants:

Voice assistant integration has taken two different routes: one on the smartphone, and the other on smart speakers (and other connected devices).
Apple, Google and Amazon have been focused on making their Voice Assistants ubiquitous through general-purpose assistants.

Apple has primarily focused on getting Siri onto all Apple devices. It launched the HomePod smart speaker, which has not gained much traction.
Google is uniquely positioned: it catapulted the adoption of Google Assistant by making it available, almost mandatorily, across Android smartphones, and it has also found success in the smart speaker market with its Google Home lineup.

Amazon had an early-mover advantage in smart speakers, arriving almost two years ahead of Google, and found success with Alexa on Echo devices.

The latter two have quickly moved to offer their voice assistants across a variety of third-party devices in different form factors, appealing to different user preferences and contexts.

We don’t want to bore you with history. We could have written paragraphs detailing what happened when, but it’s best to give you a quick graphical snapshot with this fantastic timeline created by our friends at voicebot.ai.

The voice assistant journey can be broken down into two phases:
Phase 1 was all about introducing consumers to the idea of using voice to perform tasks. Phase 2 is about voice becoming a pervasive interaction mode that has more capabilities and is used more frequently, on more devices and apps, and in different contexts.

How do Voice Assistants work?

Have you ever wondered how your smart speaker interprets a single command like ‘Alexa, how is the weather outside?’ Don’t worry. We will break down how this magic happens. We have abstracted away the nuances of how this works and simplified it into four stages:

  • Automatic Speech Recognition
  • Natural Language Processing
  • Desired business logic via hooks
  • Text To Speech
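The four stages above can be sketched as a chain of functions. In this minimal, hypothetical sketch every stage is a stub: `speech_to_text`, `understand`, `fulfil` and `text_to_speech` are illustrative names, and a real assistant would call an ASR engine, an NLU model, a backend such as a weather service (through a webhook or similar hook) and a speech synthesiser instead of the toy logic shown here.

```python
from typing import Dict, Tuple


def speech_to_text(audio: bytes) -> str:
    """Stage 1, ASR (stub): pretend the audio decodes to this transcript."""
    return "alexa how is the weather outside"


def understand(transcript: str) -> Tuple[str, Dict[str, str]]:
    """Stage 2, NLP (toy keyword rule): map the transcript to an intent and slots."""
    words = transcript.lower().split()
    if "weather" in words:
        return "get_weather", {"location": "current"}
    return "unknown", {}


def fulfil(intent: str, slots: Dict[str, str]) -> str:
    """Stage 3, business logic via a hook: a real system would call a weather API here."""
    hooks = {
        # Hypothetical canned reply standing in for a live weather service.
        "get_weather": lambda s: "It is 24 degrees and sunny outside.",
    }
    handler = hooks.get(intent)
    return handler(slots) if handler else "Sorry, I don't know how to help with that."


def text_to_speech(reply: str) -> bytes:
    """Stage 4, TTS (stub): a real engine would return synthesised audio."""
    return reply.encode("utf-8")


def assistant(audio: bytes) -> bytes:
    """Run one utterance through all four stages in order."""
    transcript = speech_to_text(audio)
    intent, slots = understand(transcript)
    reply = fulfil(intent, slots)
    return text_to_speech(reply)


print(assistant(b"<raw microphone audio>").decode("utf-8"))
```

Keeping the stages separate means that, in principle, one of them (say, the ASR model) can be replaced without touching the others.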

Voice Assistants gaining popularity

A report by Activate Forecast notes that, in the United States, smart speakers have been adopted faster than smartphones were. It’s perhaps the fastest adoption of any new technology in the modern world.
This illustrates the paradox of intelligent assistants: poor usability, yet high adoption.

However, here is another curveball for all of you: according to Voicebot.ai, there are twice as many monthly active voice assistant users on smartphones as on smart speakers, and voice usage in cars also exceeds usage on smart speakers.

Usage of Voice in the Developing world

Smart speakers have yet to take off in the developing world. They might be all the rage in the West, but here, voice assistants on smartphones are still king.
In India, most people cannot afford a stand-alone Echo Dot or Google Home, but they can use the same voice assistants through their existing smartphones.
According to the latest estimate by eMarketer, a quarter of India's population will use smartphones. Here are a few points from its report, published in 2018:

  • There were 291.6 million smartphone users in India by the end of 2017.
  • The number of smartphone users in India is estimated to hit 337 million by the end of 2018.
  • The number of smartphone users in India would reach 490.9 million by 2022.

Voice assistants will help India's 'Next Billion Users' (NBUs), who are coming online for the first time, to use your app. These NBUs are in India 2: non-metros (India 2) are catching up with metros (India 1) in internet usage. The three critical pillars that enable this transition are voice, vernacular and video.

Google's Year in Search report highlights the importance of targeting these NBUs. Advances in speech recognition have enabled a better understanding of vernacular Indian languages. Leveraging voice will help NBUs who are not familiar with smartphones interact with apps intuitively. According to data from Google Insights, 70% of users are bypassing laptops and desktops and going straight to smartphones.

So far, less than 1% of online content has been available in Indian languages, even though almost 90% of Indians can only speak, read and write in regional languages.

India has vast potential to become the biggest market for voice search, with expected YoY growth of 270%, according to Google.

Vernacular language users are expected to account for roughly 75% of India's internet user base by 2021. Research by KPMG and Google points out that by 2021, nine out of ten new internet users in the country will be native-language speakers, and Google search trends show a significant move in this direction as well. This presents an exciting opportunity for voice search to create disruption, especially in a country like India, where people prefer to interact in their own language.

According to a report by Recogn, the market research division of Dentsu Aegis Network (DAN) India, the market is expected to grow 40.47% YoY, from Rs. 149.95 Cr in 2019 to Rs. 210.63 Cr by the end of 2020. The report also states that 76% of users are familiar with speech and voice recognition technology.
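As a quick check, the quoted growth rate follows directly from the two market sizes in the Recogn report (both in Rs. Cr):

```latex
\text{YoY growth} = \frac{210.63 - 149.95}{149.95} = \frac{60.68}{149.95} \approx 0.4047 = 40.47\%
```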

Globally, the number of voice assistants in use is expected to roughly triple over the coming years, according to Juniper Research. The firm estimates that there will be around 8 billion digital voice assistants in use by 2023, a significant increase from 2.5 billion at the end of 2018.

Types of Voice Assistants

Different types of Voice Assistants present today.

The voice assistants available today can be divided broadly into three categories:

General Purpose Voice Assistants

Google Assistant, Siri and Bixby, on Android, iPhone and Samsung devices respectively, are great examples of general-purpose voice assistants present on smartphones, smart speakers and other smart devices.

All of these voice assistants help you with general-purpose tasks like setting alarms, scheduling events, making calls and launching apps, among other things.

Google Assistant also has a feature called App Actions, which can trigger actions inside specific apps using deep links. Recently, Amazon announced similar functionality with Alexa for Apps. Unfortunately, Siri currently has no such feature and performs poorly compared to both Alexa and Google Assistant.
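At its core, a deep link turns a recognised intent and its parameters into a URI that the app knows how to open. The sketch below shows only that last step, using Python's standard `urllib.parse.urlencode`; the `trainapp://open/...` scheme and host, the intent names and the screen paths are all hypothetical, and how App Actions or Alexa actually declare and resolve these mappings is platform-specific and not shown here.

```python
from typing import Dict
from urllib.parse import urlencode

# Hypothetical app scheme and host; a real app registers its own.
SCHEME = "trainapp"
HOST = "open"

# Hypothetical mapping from recognised intents to in-app screens.
INTENT_TO_SCREEN = {
    "search_trains": "search",
    "check_pnr": "pnr-status",
}


def build_deep_link(intent: str, params: Dict[str, str]) -> str:
    """Turn a recognised intent and its parameters into a deep link URI."""
    if intent not in INTENT_TO_SCREEN:
        raise ValueError(f"app has no screen for intent {intent!r}")
    query = urlencode(params)  # percent-encodes values; spaces become '+'
    return f"{SCHEME}://{HOST}/{INTENT_TO_SCREEN[intent]}?{query}"


link = build_deep_link("search_trains", {"from": "Pune", "to": "New Delhi", "date": "Friday"})
print(link)  # trainapp://open/search?from=Pune&to=New+Delhi&date=Friday
```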

Increasingly, Google Assistant and Alexa are shifting strategy towards performing tasks inside apps, encroaching on the niche carved out by in-app voice assistants.

In-app Voice Assistants

Seeing the popularity of voice assistants, some brands have started adding them to their primary channels of communication: their apps and websites. TrainMan, Flipkart and Bank of America have added in-app voice assistants, and yuyiii.com has added one to its website.

These voice assistants are present in an app to ease or elevate the customer experience.
They are generally of two types:

  • Owned

These in-app voice assistants are built in-house by businesses, mostly using services from component providers like Nuance, Google Cloud and others. Companies that have launched their own in-app voice assistants include Bank of America with Erica, Capital One, Flipkart, and Jio within the My Jio app.

  • White Labelled

Many small and large players have started adding white-labelled or managed in-app voice assistants to their apps. Some use full-stack solutions like Slang Labs, which provides domain-specific in-app voice assistants; the speed of execution and well-thought-out conversation design make this option attractive for businesses. Others add custom-built assistants to their apps on top of platforms like Houndify, Mycroft AI or Alan.

Stand Alone Voice Assistants

Smart Speakers and Smart Devices

First Party Smart Devices

These devices are sold, in most cases, by the companies that make the voice assistants: Amazon and Google. They have Alexa or Google Assistant built in.

Nest Home devices by Google and Echo devices by Amazon are some of the first-party products. Some of these devices are inexpensive, retailing for as little as $25.

Amazon has recently launched wearable Echo devices such as the Echo Loop ring and Echo Frames eyeglasses. Google and Amazon have also launched wireless earbuds with Google Assistant and Alexa built in, respectively, kicking off the race for smart wearables.

Second Party Smart Devices - Devices with built-in Voice Assistants

The key difference between second-party (2P) and first-party (1P) devices is that 2P devices are manufactured by other OEMs, which integrate general-purpose voice assistants like Alexa and Google Assistant into them.

Examples of such devices are smart TVs, smartwatches and smart refrigerators. Some companies, like Sonos, Harman Kardon and Bose, even make smart speakers with both Alexa and Google Assistant built in.

Third Party Smart Devices - Support for Voice Assistants

These third-party devices don’t have a voice assistant built in, but they can be controlled by one. Popularly called smart devices, they are internet-connected IoT devices, some of which support control through voice assistants.

Markets have been flooded with such devices in the last few years: smart bulbs, refrigerators, toilets, faucets, microwaves, dryers, air conditioners and more.
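With third-party devices, the voice assistant does the listening and understanding, and the device only receives a command. A minimal, hypothetical sketch of that dispatch step follows: the `SmartBulb` class, the `DEVICES` registry and the `turn_on`/`turn_off`/`set_brightness` intent names are illustrative assumptions rather than any vendor's smart-home API, and in practice the command would travel over the internet, typically via the device maker's cloud, rather than through a local method call.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SmartBulb:
    """Stand-in for an internet-connected bulb with no microphone of its own."""
    name: str
    on: bool = False
    brightness: int = 100  # percent


# Hypothetical registry of devices the user has linked to the assistant.
DEVICES: Dict[str, SmartBulb] = {
    "kitchen light": SmartBulb("kitchen light"),
    "bedroom light": SmartBulb("bedroom light"),
}


def dispatch(intent: str, device_name: str, value: Optional[int] = None) -> str:
    """Apply an already-understood voice command to a registered device."""
    device = DEVICES.get(device_name)
    if device is None:
        return f"I couldn't find a device called {device_name}."
    if intent == "turn_on":
        device.on = True
    elif intent == "turn_off":
        device.on = False
    elif intent == "set_brightness" and value is not None:
        device.brightness = max(0, min(100, value))  # clamp to a valid percentage
        device.on = True
    else:
        return "Sorry, that device doesn't support this command."
    return "OK."


print(dispatch("turn_on", "kitchen light"), DEVICES["kitchen light"].on)  # OK. True
```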

Domain Specific v/s Domain Agnostic Voice Assistants

Domain Agnostic Voice Assistant

Tech giants like Google, Amazon and Apple have created their own general-purpose assistants, also known as domain-agnostic voice assistants, with the help of the large amounts of data available to them. These assistants work across domains to perform generic tasks.

Domain Specific Voice Assistant

Domain-specific voice assistants, as the name suggests, are specific to a particular domain, e.g. grocery, healthcare or hospitality. They are purpose-built and optimised for these domains. These voice assistants can achieve much higher accuracy and support use cases relevant to the domain, leading to a better user experience. This is possible because of the boundaries of information that come with a particular domain.
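One way those boundaries help is that the assistant can check what it heard against a known vocabulary. The sketch below, a simple illustration rather than how any particular product works, snaps each word of a noisy transcript to the closest item in a hypothetical grocery catalogue using Python's standard `difflib.get_close_matches`; the catalogue contents and the 0.75 similarity cutoff are assumptions chosen for the example. A domain-agnostic assistant has no such closed list to fall back on.

```python
from difflib import get_close_matches
from typing import List

# Hypothetical catalogue of a grocery app: the bounded vocabulary of the domain.
CATALOGUE: List[str] = ["tomato", "potato", "onion", "rice", "dal", "paneer", "atta", "sugar"]


def correct_with_domain(transcript: str, vocabulary: List[str], cutoff: float = 0.75) -> str:
    """Replace each word that closely resembles a catalogue item with that item."""
    corrected = []
    for word in transcript.lower().split():
        # get_close_matches ranks candidates by difflib.SequenceMatcher similarity
        # and drops any whose score is below the cutoff.
        match = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)


print(correct_with_domain("Add two kilo tomatto and onyon", CATALOGUE))
# add two kilo tomato and onion
```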

Challenges faced by Voice Assistants

Yes, voice technology has problems. Call them challenges, or call them opportunities that one can tap into, pun intended:

Privacy

Privacy is a huge concern, especially with smart speakers, which are always listening for their wake word.

A crucial detail that is often missed is the difference between listening and recording. Only once these speakers or voice assistants are activated by the wake word do they start recording audio.
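The listening/recording distinction can be sketched as a loop over audio frames: every frame is checked for the wake word, but frames are kept only after the wake word has been heard and until the request ends. In this hypothetical sketch, audio frames are represented as text tokens, `detect_wake_word` is a placeholder for an on-device keyword-spotting model, and `"<silence>"` is an assumed end-of-request marker; real devices work on raw audio and use more sophisticated end-of-speech detection.

```python
from typing import Iterable, List

WAKE_WORD = "alexa"  # assumption: frames are toy text tokens, not real audio


def detect_wake_word(frame: str) -> bool:
    """Placeholder for an on-device keyword-spotting model."""
    return frame.lower() == WAKE_WORD


def capture_request(frames: Iterable[str], end_marker: str = "<silence>") -> List[str]:
    """Listen to every frame, but record only between the wake word and the next silence."""
    recorded: List[str] = []
    recording = False
    for frame in frames:
        if not recording:
            # Listening: the frame is checked, then discarded; nothing is kept.
            recording = detect_wake_word(frame)
            continue
        if frame == end_marker:
            break  # end of the request: stop recording
        recorded.append(frame)  # recording: these frames would be sent to the cloud
    return recorded


stream = ["private", "chat", "<silence>", "Alexa", "how", "is", "the", "weather", "<silence>", "more", "chat"]
print(capture_request(stream))  # ['how', 'is', 'the', 'weather']
```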

These audio clips are sent to Google or Amazon. These tech giants have exposed such recordings to human reviewers, albeit in an anonymised manner; this is still a significant infringement on privacy. There have already been numerous cases where such recordings have led to privacy issues.

Accuracy

Voice assistants don’t always understand what is spoken. There can be many reasons for this. Sometimes it is because of how we speak; our accent, for example, can cause it. At other times, the voice assistant simply doesn’t know what to do with your question because it has no instructions related to your query.

Lack of vernacular Support

Speech recognition, perhaps the most critical component of a voice assistant, is not available for many vernacular languages spoken around the world. In countries like India, with a massive Indic-speaking population, the lack of quality ASR models for vernacular languages is often a limiting factor in providing a good voice experience.

Currently, most natural language processing is done after translating spoken utterances from vernacular languages into English. In this process, a lot of contextual nuance is lost or changed.

Future of Voice Assistants

Advances in artificial intelligence and machine learning are truly revolutionising the way we use voice assistants in our daily lives.

With voice now establishing itself as the ultimate mobile experience, businesses are only beginning to understand how they can integrate voice into all their activities. A recent report by PwC reveals that adoption of voice assistant technology is highest among 18-24-year-olds, but the group that uses voice assistants most frequently is 25-49-year-olds.

The coming years present many opportunities for voice to grow by leaps and bounds, but a lack of skills and knowledge makes it difficult for businesses to get on board with a voice strategy. If you are in it for the long run, voice presents an opportunity to understand your consumers and provide them with experiences like never before. The question is: is your brand willing to seize this opportunity?

Slang Labs provides accurate, multilingual in-app voice assistants. These voice assistants can be used out of the box in Android and web apps with just a few lines of code.
