We have forever had to learn the language of technology, be it keyboard, mouse or touchscreen. Voice Assistants have turned this around. With voice, technology has to learn to understand us, understand our language.
Our understanding of voice assistants or voice-based products is typically limited to Siri, Alexa or Google Assistant style general-purpose assistants. The reality is that there is a bigger picture out there that many of us are not familiar with. In this blog, we will present you with a more holistic picture of Voice Assistants.
Voice Assistants have become synonymous with Google Home and Amazon Echo. This could not be more wrong. The technology powering these smart speakers is the actual voice assistant: Google Assistant and Amazon Alexa, respectively.
Today voice assistants are not limited to smart speakers but are also in cars, household devices, smartphones and now in a lot of apps as well.
Voice Assistants are actually a subset of Virtual Assistants, also called Intelligent Personal Assistants. These virtual assistants can take inputs in different ways:
We will leave chatbots and image-based assistants to other experts. Let’s get back to voice assistants.
A Voice Assistant is a virtual assistant that uses speech recognition, natural language processing and speech synthesis to take actions that help its users.
Not every voice solution is a voice assistant, but every voice assistant is a voice solution. To be called a Voice Assistant, a voice solution needs to meet these conditions:
Voice assistant integration has taken two different routes: one is on the smartphone, and the other is on smart speakers (and other connected devices).
Apple, Google and Amazon have been focused on making their Voice Assistants ubiquitous through general-purpose assistants.
Apple has primarily taken the route of getting Siri onto all Apple devices. It launched the HomePod smart speaker, which has not gained much traction.
Google is uniquely positioned: it catapulted the adoption of Google Assistant by making it available, almost mandatorily, across all Android smartphones, and it also found success in the smart speaker market with the Google Home lineup.
Amazon had an early-mover advantage in smart speakers, where it was almost two years ahead of Google, and it found success with Alexa in Echo devices.
The latter two have quickly moved to offer voice assistants across a variety of third-party devices in different form factors to appeal to different user preferences and contexts.
We don’t want to bore you with the history stuff. We could have written a bunch of paragraphs going into details of what happened when, but it’s best if we just give you a quick graphical snapshot with this fantastic timeline created by our friends at voicebot.ai.
The voice assistant journey can be broken down into two phases:
Phase 1 was all about getting consumers introduced to the idea of using voice to perform tasks. Phase 2 is about voice becoming a pervasive interaction mode that has more capabilities and is used more frequently on more devices, apps and in different contexts.
Do you ever wonder how a single command like ‘Alexa, how is the weather outside?’ is interpreted by your smart speaker? Don’t worry. We will break it down so you can see how this magic happens. We have abstracted away the nuances of how this works and simplified it to help you understand:
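As a rough illustration of that flow, here is a minimal Python sketch of the stages a voice assistant typically chains together: wake word detection, speech recognition, language understanding, fulfilment and speech synthesis. Everything in it is an assumption made for illustration: the function names are hypothetical, the "audio" is plain text standing in for a real recording, and the weather answer is hard-coded. It is not how Alexa or any other product is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Optional

WAKE_WORD = "alexa"  # assumption: a single fixed wake word


@dataclass
class Intent:
    """What the user wants, plus any details (slots) extracted from the request."""
    name: str
    slots: dict = field(default_factory=dict)


def heard_wake_word(audio: str) -> bool:
    """Stage 1: check whether the audio starts with the wake word."""
    return audio.strip().lower().startswith(WAKE_WORD)


def recognise_speech(audio: str) -> str:
    """Stage 2: stand-in for speech recognition; returns the command text."""
    text = audio.strip().lower()
    return text[len(WAKE_WORD):].strip(" ,?")


def understand(command: str) -> Intent:
    """Stage 3: toy language understanding based on a keyword match."""
    if "weather" in command:
        return Intent("get_weather", {"when": "now"})
    return Intent("unknown")


def fulfil(intent: Intent) -> str:
    """Stage 4: perform the action; the answer here is hard-coded."""
    if intent.name == "get_weather":
        return "It is sunny outside."
    return "Sorry, I can't help with that yet."


def synthesise_speech(text: str) -> str:
    """Stage 5: stand-in for text-to-speech; wraps the text in a marker."""
    return f"<speech>{text}</speech>"


def handle(audio: str) -> Optional[str]:
    """Run the full pipeline, or do nothing if the wake word was not heard."""
    if not heard_wake_word(audio):
        return None
    command = recognise_speech(audio)
    return synthesise_speech(fulfil(understand(command)))


print(handle("Alexa, how is the weather outside ?"))
```

In a real assistant each stage is a trained model or a cloud service rather than a few lines of string handling, but the order of the hand-offs is the part worth remembering.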
A report by Activate Forecast mentions that smart speaker adoption has been faster than smartphone adoption in the United States. It is perhaps the fastest adoption of any new technology in the modern world.
This shows the Paradox of Intelligent assistants: Poor Usability and High Adoption.
However, here is another curveball for all of you: according to Voicebot.ai, there are twice as many monthly active voice assistant users on smartphones as on smart speakers, and voice usage in cars also exceeds use on smart speakers.
Smart speakers might be all the rage in the West, but they are yet to take off in the developing world, where Voice Assistants on smartphones are still king.
In India, the masses cannot afford a stand-alone Echo Dot or Google Home, but they can use the same voice assistant through their existing smartphone.
According to the latest estimate by eMarketer, a quarter of India's population will use smartphones. Here are a few points from the report published in 2018.
Voice Assistants will help the 'Next Billion Users' (NBUs) in India, who will be coming onto the internet for the first time, to use your app. These NBUs are present in India 2. Non-metros (India 2) are catching up with metros (India 1) when it comes to internet usage. Three critical pillars enable this transition: voice, vernacular, and video.
Google's Year in Search report highlights the importance of targeting these NBUs. Advancements in speech recognition have enabled a better understanding of vernacular Indian languages. Leveraging voice will help NBUs who are not familiar with a smartphone to interact with the app intuitively. 70% of users are bypassing laptops and desktops and directly using smartphones (data from Google Insights).
India has a vast potential to be the biggest market for voice search with an expected YoY growth of 270% as revealed by Google.
Vernacular language users are going to account for roughly 75% of India's internet user base by 2021. A study by KPMG and Google points out that by 2021, nine out of ten new internet users in the country will be native language speakers, and Google search trends show a significant move in this direction as well. This presents an exciting opportunity that voice search can tap to create disruption, especially in a country like India, where people prefer to interact in their own language.
According to a report by Recogn, the market research division of an agency owned by Dentsu Aegis Network (DAN) India, the market is expected to grow 40.47% year on year, from Rs. 149.95 Cr in 2019 to Rs. 210.63 Cr by the end of 2020. The report also states that 76% of users are familiar with speech and voice recognition technology.
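For readers who want to check the growth figure, the short Python snippet below recomputes year-on-year growth from the two market sizes quoted above. It assumes both values are in crore rupees; the function name is ours, not something from the report.

```python
def yoy_growth_percent(previous: float, current: float) -> float:
    """Year-on-year growth: the change expressed as a percentage of the earlier value."""
    return (current - previous) / previous * 100


# Market sizes quoted from the report, assumed to be in Rs. crore.
growth = yoy_growth_percent(149.95, 210.63)
print(f"{growth:.2f}%")  # prints 40.47%
```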
This number is only going to grow by three times over the coming years, according to Juniper Research. What’s interesting to note here is that the firm also states there will be around 8 billion digital voice assistants in use by 2023, which is a significant increase from 2.5 billion assistants in use at the end of 2018.
If we try to understand the types of VAs available, there are broadly three categories into which we can divide them:
Google Assistant, Siri and Bixby, on Android, iPhone and Samsung devices respectively, are great examples of general-purpose Voice Assistants that are present on smartphones, smart speakers and other smart devices.
All of these Voice assistants help you with general-purpose things like setting the alarm, scheduling events, making calls, launching apps amongst other things.
Google Assistant also has a feature called App Actions, which can trigger actions inside specific apps using deep links. Recently, Amazon announced the same capability with Alexa for Apps. Unfortunately, Siri doesn’t have any such feature currently and performs poorly compared to both Alexa and Google Assistant.
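To make the deep-link idea concrete, here is a small Python sketch of how a recognised intent and its parameters could be turned into a link that opens a specific screen in an app. This is not the App Actions or Alexa API: the `exampleapp` scheme, the intent names and the path table are all hypothetical, and only `urllib.parse.urlencode` is a real library call.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical mapping from a recognised intent to a path inside the app.
INTENT_PATHS = {
    "order_food": "order",
    "track_order": "orders/track",
}


def build_deep_link(intent: str, params: dict, scheme: str = "exampleapp") -> Optional[str]:
    """Return a deep link for the intent, or None if the app does not support it."""
    path = INTENT_PATHS.get(intent)
    if path is None:
        return None
    link = f"{scheme}://{path}"
    query = urlencode(params)  # encodes parameters as key=value pairs joined by &
    return f"{link}?{query}" if query else link


print(build_deep_link("order_food", {"item": "pizza", "quantity": 2}))
# prints exampleapp://order?item=pizza&quantity=2
```

The assistant's job ends once the link is built; the app itself decides what to show when it is opened with that link.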
Increasingly, Google Assistant and Alexa are shifting strategy and focusing on doing tasks inside apps, encroaching on the niche carved out by in-app Voice Assistants.
Witnessing the popularity of Voice Assistants, some brands have started adding a Voice Assistant to their primary mode of communication: their apps and websites. TrainMan, Flipkart and Bank of America have added in-app voice assistants, and yuyiii.com has added one to its website.
These voice assistants are present in an app to ease or elevate the customer experience.
They are generally of two types:
These in-app Voice Assistants are built in-house by the businesses, mostly using services from component providers like Nuance, Google Cloud and others. Some of the companies that have launched their own in-app voice assistants are Bank of America with Erica, Capital One, Flipkart, and Jio within the My Jio app.
Many small and large players have started adding white-labelled or managed in-app Voice Assistants to their apps. Some use full-stack solutions like Slang Labs, which provides domain-specific in-app voice assistants; the speed of execution and well-thought-out conversation design make this option attractive for businesses. Others add custom-built assistants to their apps on top of platforms like Houndify, Mycroft AI or Alan.
Nest Home devices by Google and Echo devices by Amazon are some of the first-party products. Some of these devices are cheaply available and retail for as low as $25. Amazon has recently launched wearable Echo devices such as the Echo Loop ring and Echo Frames eyeglasses. Google and Amazon have also launched wireless earbuds with Assistant and Alexa built in, respectively, kicking off the race of smart wearables.
These devices are sold by the companies which make these voice assistants in most cases - Amazon and Google. They have Google Assistant and Alexa built-in.
The key difference between 2P and 1P devices is that 2P devices are manufactured by different OEMs and have integrated the general-purpose voice assistants like Alexa and Google Assistant in them.
Examples of such devices are Smart TVs, Smart Watches, Smart Refrigerators. Some companies like Sonos, Harman Kardon, Bose even have smart speakers with both Alexa and Google Assistant built-in.
Tech giants like Google, Amazon and Apple have created their own general-purpose assistants, also known as domain-agnostic voice assistants, with the help of the large amounts of data available to them. These assistants work across domains to do generic tasks.
Domain-specific voice assistants, as the name suggests, are specific to a particular domain, e.g. grocery, healthcare or hospitality. They are purpose-built for these domains and optimised for them. These voice assistants have much higher accuracy and support use cases relevant to the domain, leading to a better user experience. This is possible because of the boundaries on information that come with a particular domain.
Yes, voice technology has problems. Call them challenges, or call them opportunities that one can tap into (pun intended):
Privacy is a huge concern, especially when it comes to smart speakers, because these speakers are always listening for their wake word.
The crucial detail that is often missed is the difference between listening and recording. These speakers or voice assistants start recording audio only once they have been activated by the wake word.
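The difference between listening and recording can be pictured as a tiny state machine. The Python sketch below is a simplification under stated assumptions: each "frame" is a word of text standing in for a chunk of audio, an empty string marks the silence that ends a request, and the function name is hypothetical. Real devices run an on-device wake word model over raw audio, which is not shown here.

```python
WAKE_WORD = "alexa"  # assumption: a single fixed wake word
SILENCE = ""         # assumption: an empty frame marks the end of a request


def capture_requests(frames):
    """Return only the audio captured between a wake word and the next silence."""
    recorded = []      # completed requests that would be sent to the cloud
    current = None     # None means "listening only"; a list means "recording"
    for frame in frames:
        if current is None:
            # Listening: each frame is checked for the wake word and then discarded.
            if frame == WAKE_WORD:
                current = []
        elif frame == SILENCE:
            # The request has ended: keep it and go back to listening.
            recorded.append(current)
            current = None
        else:
            # Recording: keep everything said after the wake word.
            current.append(frame)
    return recorded


frames = ["private", "chat", "alexa", "what", "time", "is", "it", "", "more", "chat"]
print(capture_requests(frames))  # prints [['what', 'time', 'is', 'it']]
```

Speech before the wake word and after the request ends never reaches the `recorded` list, which is the distinction the paragraph above draws.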
These audio clips are sent to Google or Amazon. These tech giants have exposed the recordings to human reviewers; although this was done in an anonymised manner, it is still a significant infringement on privacy. There have already been numerous cases where such recordings have led to privacy issues.
Voice Assistants don’t always understand what is spoken. There could be many reasons behind this. Sometimes it is because of how we speak, such as our accent. At other times, the voice assistant simply doesn’t know what to do with your question because it has no instructions related to your query.
Speech recognition, perhaps the most critical component of a Voice Assistant, is not available for many of the vernacular languages spoken around the world. In countries like India, with a massive Indic-language-speaking population, the lack of quality ASR models for vernacular languages is often a limiting factor in providing a good voice experience.
Currently, most of the Natural Language Processing is done after translating spoken utterances from vernacular languages to English. In this process, a lot of contextual nuances are lost or are changed.
Advancements in Artificial Intelligence and Machine Learning are truly revolutionising the way we use voice assistants in our daily lives.
With voice now establishing itself as an ultimate mobile experience, businesses are only beginning to understand how they can integrate voice in all their activities. A recent report by PwC reveals that adoption of voice assistant technology is highest among the 18-24-year-olds. But the group that uses voice assistants most frequently is the 25-49-year-old group.
The coming time presents a lot of opportunities for Voice to grow by leaps and bounds, but lack of skills and knowledge makes it difficult for businesses to get on board with a voice strategy. If one is in it for the long run, voice will present an opportunity to understand and provide experiences to your consumers like never before. The question is, is your brand willing to jump on this opportunity?