Voice Assistants in Ecommerce Apps | Panel Discussion hosted by Slang Labs

An insightful panel discussion on Voice Assistants in eCommerce Apps with the top players of the Ecommerce Industry who have built and added custom voice assistants to their apps.

Slang Labs organised a panel discussion on ‘Voice Assistants in E-Commerce Apps’ on 16th September.

The objective was to explore how India's biggest eCommerce platforms, such as Flipkart, JioMart, Amazon and Big Basket, are adopting Voice Assistants to power the next generation of shopping experiences in India and the world, and to bring the Next Billion Users into the digital shopping world. So we got the who's who of the Voice Assistant domain to join us as panelists: people who have actually designed and implemented voice interaction technologies in some of the biggest eCommerce platforms in India!


Aakrit Jain - Co-Founder @Haptik. Haptik was recently integrated with Jio Platforms, where Aakrit is powering the adoption of voice-assisted shopping in JioMart.

Nandini Stocker - Senior Product Design Manager @Flipkart. Nandini is the former head of conversation design advocacy and partnerships at Google and currently heads the team tasked with onboarding the next 200 Million users onto the Flipkart Ecommerce Platform - and she sees voice as one of the ways to succeed.

Anupama Kumari - Senior Technical Product Manager @Amazon. Anupama heads the team that is powering Voice Shopping on the Amazon India Ecommerce platform.

Kumar Rangarajan - Co-founder & Obsessive Dictator @Slang Labs. Kumar co-founded Slang with a vision of helping digitally-challenged users cross the chasm.

Bret Kinsella - Editor @Voicebot.ai moderated the entire discussion between these stalwarts of the Voice Assistants in Ecommerce domain.

The Discussion:

Ved, the MC for the day, kickstarted the session with an introduction to the panelists and handed over to Bret to take the discussion further.

Bret opened the discussion with a question on

Whether there was a difference between Chatbots and Conversational AI?

The discussion went on between Aakrit, Nandini and Anupama on the fundamental differences in the way the two are designed, as well as the necessity of maintaining both user and agent contexts so that the system delivers an accurate response and a good conversation experience. You can read about the differences each of them described here.

From there, the challenges of designing conversational AI were discussed, and the topic then moved on to the difference between a Voice Assistant and adding a voice interaction capability to an app. The constraints of commonly available voice channels such as Google Assistant, Alexa and Siri were debated, and the conclusion was that the brand experience suffers at the hands of these channels and that the actual Voice Assistant would be more subservient to the brand experience from a user point of view.

To that Anupama mentioned that

the biggest hurdle in adding a Voice Assistant to Ecommerce Apps is that the experience has to be delivered on touch-first devices such as mobile phones, where a multimodal interaction design is necessary to make the voice addition seamless for end users.

More importantly, deciding which features need to be voice enabled and how to decide on such prioritizations becomes the biggest challenge for product managers as this is a totally new territory for product design and consumer experience.

This was corroborated by Nandini, who mentioned that the primary research performed by her teams at Flipkart showed that smartphone adoption is very high in the non-English speaking parts of the country, but that using touch & type interfaces is still a major hurdle for those users.

Aakrit also mentioned how most websites are built English-first and how their translated versions in regional languages are really hard for those users to use. So, overall, the panelists concurred that adding Voice Assistants can be a major advantage in a richly diverse market like India, with its many regional language users.

At this point, a poll of the webinar audience revealed that Voice Search is the most sought-after feature for 56% of those who voted!

56% voted saying Voice Search is the most important feature for a voice assistant.

Then the discussion moved towards why the USA was slow in adopting Voice Technology in eCommerce even though it was one of the first places where a voice interaction capability was introduced in shopping apps such as Amazon.

Further, the debate went on to question whether the Indian market was more amenable to faster adoption of voice assistants in eCommerce than the USA. All the panelists had some very good insights, and one of them was that the adoption of financial apps in India could have lowered the barrier to adopting voice-based commerce here, as it reduces consumers' hesitation around privacy. Others agreed that India's multilingual market, which represents a currently untapped base of more than 300 Million new customers, could be a driver for faster adoption of voice tech in shopping.

Bret chimed in that Google Assistant and Alexa would actually lead to the evolution of a million Voice Assistants soon. Aakrit mentioned how the whole voice integration was like a Vitamin v/s Painkiller kind of choice and that currently it is more like a Painkiller - and hence absolutely necessary for eCommerce!

Kumar also chimed in and said

Voice Assistant is neither a Vitamin nor a painkiller but is an antibiotic

It takes time for antibiotics to act, but it is necessary to take the whole course to see the results. Antibiotics are actually more important than even painkillers because they remove the problem from the root (the disease, in this analogy) rather than offering just a temporary solution.

The session was almost ending and then Aakrit bowled a googly to Bret by asking -

Why do you think that the Voice Assistant narrative in USA has become stagnant?

Bret had a very strong view that the Voice Assistant narrative had not really stagnated in the USA and that it was just taking a little time. He said that he had worked in retail for more than 15 years and that the underlying complexity of the industry was grossly underestimated. So, he was of the view that it will take time, but the voice assistant will have a major role to play in the future of Ecommerce and Retail in the USA, as it would in any other part of the world, like India.

Ved delivered his vote of thanks to everyone and the session was drawn to a close with great gratitude to the moderator and all the accomplished panelists on board.

Audience Questions:

Will the chatbot be quicker than voice?

Chatbots are typically used to get answers to questions rather than to explicitly complete transactions. Voice Assistants, on the other hand, are more focussed on helping the user complete operations. The primary input for a chatbot is text, while Voice Assistants are by nature voice-first. But chatbots can also have voice inputs and, similarly, Voice Assistants can also take text inputs.

The boundary between a chatbot and a Voice Assistant is vague, as both can technically do the other's job. But from an experience perspective, chatbots are typically implemented as a separate window within which the conversations happen. They don't interact with the main app itself. Voice Assistants, in contrast, are meant to help complete transactions on the app itself and are typically multimodal, i.e. some part of the journey is via Voice and some part of it is directly with the visual layer of the app itself.

How do we deal with failures during voice conversation with the app where users typically are not forgiving?

Users' behaviour is typically based on the expectation they have of the system. If the user understands the value of the feature or the system, they would be willing to tolerate its limitations, because the value it offers outweighs its limitations. 

With respect to Voice conversations, the biggest problem is the user not knowing the bounds of what is and is not possible. Since Voice, unlike visual inputs, allows unrestricted input, the chance of failure is quite high. So helping the user understand the scope of Voice is key. This can be achieved in the following ways:

  • The point or way the user triggers Voice is one good way to set the context. E.g. if the Voice Assistant is next to where the user typically searches for something, he or she knows this assistant is helping them search for something.
  • But many times, the placement of the Voice Assistant is global because the assistant can do multiple things. In those cases, the Assistant should still be optimized for helping the user with the most common task, at least initially when the user is getting used to the Assistant, i.e. the Voice Assistant should ask the user a very pointed question when they enable it. E.g. “Which place are you travelling to?” or “What item do you want to buy?”
  • Now even after asking a pointed question, there is no guarantee the user is going to speak an expected answer. When failures do occur, assistants could try to coax the users to retry their request, but this time guiding them more precisely about what they could say to have their request fulfilled. E.g. “Sorry I did not understand. Please mention the destination city, e.g. Bangalore”. These prompts can escalate over successive failures and should be very contextual so as not to appear canned and dumb.
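
The escalating prompts described above can be pictured as a small "prompt ladder". The following is only a minimal sketch under stated assumptions: the class name SlotDialog, the third prompt, and the tiny list of cities are all hypothetical and are not taken from any real assistant; only the first two prompts are adapted from the examples in the answer above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical prompt ladder for one slot (the destination city).
# Each failure moves the user to a more explicit, more guided prompt.
PROMPTS = [
    "Which place are you travelling to?",
    "Sorry, I did not understand. Please mention the destination city, e.g. Bangalore.",
    "You can also type the city name or pick one from the list on screen.",
]

# Illustrative vocabulary only; a real app would use its own catalogue.
KNOWN_CITIES = {"bangalore", "mumbai", "delhi", "chennai"}


@dataclass
class SlotDialog:
    failures: int = 0

    def prompt(self) -> str:
        # Clamp the index so repeated failures stay on the most explicit prompt.
        return PROMPTS[min(self.failures, len(PROMPTS) - 1)]

    def handle(self, utterance: str) -> Optional[str]:
        """Return the recognised city, or None after recording a failure."""
        for word in utterance.lower().split():
            word = word.strip(".,!?")
            if word in KNOWN_CITIES:
                self.failures = 0
                return word
        self.failures += 1
        return None
```

The last rung deliberately offers a non-voice way out, so a user who keeps failing is not stuck in a voice-only loop.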

Will voice-first apps have issues with respect to privacy?

Depending on how the Voice Assistants are implemented, and how much focus they have on preserving user privacy, voice-first apps may not necessarily have more privacy issues. There are two types of Voice-first apps. 

  • Apps which rely on an always listening, general purpose Assistant (like Alexa or Google Assistant) and integrate themselves inside that. Now these apps might have a higher degree of privacy issues because they might get triggered unexpectedly or because the entire transaction is being passed through the assistant. 
  • Apps which provide an In-App Voice Assistant (either built on their own or using third party Assistants like Slang). Here the customer is typically using Voice in a very purposeful way. Also, the Assistants are typically used only to understand what the user wants and help them perform the same actions they would normally have performed on the app. So there are no additional privacy issues here. But in some apps, the user might not feel comfortable talking to the app at certain points of his or her day. In those cases, ensuring that the user has a non-voice way of doing the same thing puts the much-needed choice in the user's hands.

In India, there are multiple languages and multiple dialects. Can this be solved, and if yes, how?

Of course, multiple languages and dialects continue to pose a challenge to speech recognition systems. However, two things are happening continuously - speech recognition systems are getting better at recognising more languages and more dialects, while users who are excited by voice also try to adjust their speech to the best of their abilities to allow the speech recognition system to work for them. Also by smartly using the context of the application, a Voice Assistant can improve its recognition ability significantly. 

Does the accuracy of ASR affect the overall adoption of voice in apps? For users with bad experiences, what are some ways to get them to retry voice as an input mechanism?

As mentioned in the previous answer, Voice recognition quality in general is getting better constantly. Also, the limitations of ASR can be managed by knowing the context of the app and improving the language model of the ASR to be biased towards the needs of the specific app.
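
To give a rough feel for what using app context can mean, here is a minimal sketch. It does not bias the ASR language model itself, which is what the answer above describes. It substitutes a simpler, related technique: re-ranking the recogniser's candidate transcripts by how well they fit the app's own vocabulary. The function names, the (text, confidence) input format and the 0.5 weight are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_catalog_match(word, vocabulary):
    """Return (term, similarity) for the vocabulary term closest to word."""
    scored = [(term, SequenceMatcher(None, word, term).ratio()) for term in vocabulary]
    return max(scored, key=lambda pair: pair[1])


def rescore(hypotheses, vocabulary, weight=0.5):
    """Pick the transcript with the best blend of ASR confidence and app fit.

    hypotheses: list of (text, asr_confidence) pairs, confidence in [0, 1].
    vocabulary: non-empty list of lower-case terms the app knows about.
    weight:     how much the app vocabulary counts against ASR confidence.
    """
    def score(hypothesis):
        text, confidence = hypothesis
        words = text.lower().split()
        if not words:
            return 0.0
        # Average similarity of each spoken word to its closest known term.
        fit = sum(best_catalog_match(w, vocabulary)[1] for w in words) / len(words)
        return (1 - weight) * confidence + weight * fit

    return max(hypotheses, key=score)[0]
```

With a grocery vocabulary, a transcript whose words are all known items can win over a slightly more confident transcript made of unrelated words. Setting weight to 0 falls back to plain ASR confidence.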

That said, some users will find that the Assistants do not work for them at all. For such users, Voice Assistants can help maximize their probability of success in multiple ways -

  • Provide conversational short-cuts that are clickable. “Check my order” can be shown as a bubble and the user can click on it and the Assistant can treat it similar to the user speaking out the same command.
  • Help them to speak with pauses and longer durations to get them comfortable talking.

How much impact does a voice-enabled system have on the e-commerce platform performance?

These are still early days, but from our usage across multiple apps, we are seeing an increased search-to-add-to-cart ratio for voice users vs users who type. The improved accuracy and multi-lingual aspects of Voice Assistants play a big role here.
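
For readers who want to track the same metric in their own app, the search-to-add-to-cart ratio can be computed per input modality from an event log. This is a minimal sketch: the (modality, event_type) tuple format and the labels "search" and "add_to_cart" are assumptions, not the schema of any particular analytics tool.

```python
from collections import Counter


def add_to_cart_ratio(events):
    """Return {modality: add_to_cart count / search count}.

    events: iterable of (modality, event_type) pairs, e.g. ("voice", "search").
    Modalities with no searches are left out to avoid dividing by zero.
    """
    searches, adds = Counter(), Counter()
    for modality, event_type in events:
        if event_type == "search":
            searches[modality] += 1
        elif event_type == "add_to_cart":
            adds[modality] += 1
    return {modality: adds[modality] / count for modality, count in searches.items()}
```

Comparing the value for "voice" against the value for "typed" over the same period gives the kind of comparison described above.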

What is the best way to get real feedback from the users on how they are receiving the voice conversation so we can learn along and make it better?

There are two types of feedback one can get - 

  • Implicit feedback: Tracking what the user does immediately after using voice gives feedback about whether the input was right or wrong. E.g. if the user searched for something and the result was wrong, the user's next operation (after retrying with voice) would be to type out the same input.
  • Explicit feedback: Users are also interested in making your app successful if they are sold on the value it provides. So the assistant can collect explicit feedback by showing them a “thumbs up/down” after a command is executed. This should not be overdone and should be collected minimally.
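
The implicit signal described above can be approximated with a simple rule over a session's events: a voice search immediately followed by a typed search within a short window probably failed. This is a minimal sketch; the Event type, the event names and the 30-second window are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since the session started
    kind: str         # e.g. "voice_search", "typed_search", "add_to_cart"


def likely_voice_failures(events, window_seconds=30.0):
    """Return voice searches whose very next event is a typed search
    within window_seconds. Assumes events are sorted by timestamp."""
    failures = []
    for current, following in zip(events, events[1:]):
        if (
            current.kind == "voice_search"
            and following.kind == "typed_search"
            and following.timestamp - current.timestamp <= window_seconds
        ):
            failures.append(current)
    return failures
```

The rule is a heuristic: a user may type for reasons unrelated to a recognition failure, so it is better read as a trend across many sessions than as a verdict on one.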

What principles are you thinking about when designing risk-rewards for the next billion shoppers using voice, given we are a culturally diverse nation?

From a Voice Assistant perspective, the reward is primarily as an enabler of transactions for the user and being a productivity tool. 

“This app is friendly and understands me well” is the positive emotion that Voice Assistants can evoke if they work well. 

If what used to take them X minutes can be brought down to Y minutes, the saving X - Y has to be big enough to matter. Communicating this benefit to the user and actually making them realize it is the key.

Slang Labs provides out-of-the-box, domain-specific Voice Assistants which can be integrated into any eCommerce or travel app in hours. These voice assistants are specific to the items and services offered by your brand.

Want a bigger piece of the pie? Head here: Voice Assistants in eCommerce Apps | Webinar