Voice Assistant in Apps: Udaan B2B Marketplace App

An in-depth analysis of how Slang's voice assistant in the Udaan app helped improve the user experience for customers in Tier 2, Tier 3 and smaller cities.

This blog is part of a long, descriptive and analytical series by Slang Labs called "Voice Assistants in Mobile Apps". Here we tear down the voice assistant and search functionalities that businesses have added to their mobile applications and analyse them in detail. In this edition, we break down the 'Udaan - B2B E-Commerce Platform' mobile application, which has integrated the Slang Retail Voice Assistant, powered by the world’s first Voice Assistant as a Service platform, Slang CONVA.

We believe that it's essential to recognise the trendsetters and understand how they are adding voice assistants inside their applications to cater to the next billion users. This analysis will also help businesses understand the usage of voice assistants in applications.

We have already broken down features of voice assistants in ‘My Jio’, ‘Gaana’, YouTube and Paytm Travel.

The Indian e-commerce market is expected to grow from US$ 38.5 billion in 2017 to US$ 200 billion by 2026, according to an industry report by IBEF. The report further states that the e-retail market in India registered a CAGR of over 35% to reach Rs. 1.8 trillion in FY20. This indicates a sharp rise in an already growing e-commerce customer base. According to a KPMG India and Google study, 9 out of every 10 new internet users in the country are likely to be Indian-language speakers. The study also reports that Indian-language internet users are expected to grow at a CAGR of 18%, compared with 3% for English users.

What does this mean for you?

These statistics imply that millions of users will come online over the next few years and become part of the e-commerce customer base. This trend is already underway, according to data from Amazon and Flipkart on their 2020 festive sales.

“Over 666 million visits on Flipkart recorded during the Big Billion Days with over 52% of these visits recorded from Tier III cities and beyond. Along with the momentum witnessed from metros and Tier 2 cities, Tier 3+ cities have seen an uptick of 50% new customers.” - DNA India

These users generally have difficulty interacting with complex UIs and the English-only experience of most applications. On top of that, the apps are touch-only, which makes things even harder for them. Top e-commerce brands like Flipkart, Amazon and BigBasket have already added voice assistants to reach these users.

“Amazon witnessed 91% of new customers and 66% of new Prime sign-ups from small towns; shopping in 5 Indian languages, and orders from over 98.4% of India’s pin-codes in just 48 hours of their festive sale.” - Amazon Blog

The Slang CONVA Approach!

At Slang, we recognised this early on and built a sophisticated, multilingual and multi-modal Voice Assistant for domain-specific needs. The voice assistant that we built enables your users to interact with your application via voice in regional languages. The assistant is smart enough to do more than just Voice Search. After all, it's 2021.

What does Slang Retail Voice Assistant do in the Udaan app?

1. Voice Search - Performs a search on your platform and takes users to the page with the search results.

“Show me chocolates”, “मुझे मेथी के बीज दिखाओ”

2. Voice Navigation

“Show my previous order” - takes the user to the page where they can see their last order.

3. Voice Action

“Add 2 kg potatoes to cart” - adds 2 kg of potatoes directly to the cart.
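The three capabilities above can be pictured as three kinds of intent that the host app receives and acts on. The following is a minimal sketch of that idea only. The `Intent` class, its field names and the `handle` function are hypothetical and are not taken from the Slang SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """A recognised user request. Field names are illustrative assumptions."""
    kind: str  # "search", "navigation" or "action"
    params: dict = field(default_factory=dict)


def handle(intent: Intent) -> str:
    """Describe what a host app might do for each kind of intent."""
    if intent.kind == "search":
        return f"open search results for '{intent.params['query']}'"
    if intent.kind == "navigation":
        return f"open the '{intent.params['target']}' page"
    if intent.kind == "action":
        p = intent.params
        return f"add {p['quantity']} {p['unit']} of {p['item']} to cart"
    # Unknown kinds fall through to a re-prompt.
    return "ask the user to rephrase"


print(handle(Intent("search", {"query": "chocolates"})))
print(handle(Intent("navigation", {"target": "previous order"})))
print(handle(Intent("action", {"item": "potatoes", "quantity": 2, "unit": "kg"})))
```

In this sketch the assistant is responsible for understanding the utterance, and the app only has to decide what each intent kind means on its own screens.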

Visual Breakdown

  • Mic Icon — Position and colour of the mic

Udaan has placed the Slang Retail Assistant, powered by the Slang CONVA platform, on the home screen at the bottom-centre of the app, making it easily accessible to the user.

The mic is in the characteristic Slang theme, which goes well with the app theme. The contrast makes sure the mic icon catches your eye. Placing the mic at such a prominent location in the app shows Udaan's commitment to in-app voice assistants.

  • On-boarding flow
Onboarding flow of Slang’s In-App Assistant in Udaan App
  • Coach mark

Once a new user signs up and opens the app for the first time, they are welcomed with a prompt that says “You can now talk to your app!” and directs them to tap the mic icon to begin.

Upon tapping the mic icon, the user is shown a welcome section by Slang. This allows the user to try the assistant now or later.

  • Language Selection

Once the user decides to try out the Voice Assistant, the app asks them to select a language. Currently, Udaan has enabled English and Hindi as per their business requirements.

Interaction with the Voice Assistant

Once a language is selected, the Voice Assistant speaks out the next steps. It asks the user to grant the mic permission, and then asks them to say what they are looking for.

All the steps in this process are narrated by the Retail Voice Assistant to improve the overall user experience. This spoken guidance makes the entire process interactive.

Slang Surface

Slang Surface on Udaan App

Slang Surface Breakdown

Once the user taps the mic icon, the voice assistant’s interface appears at the bottom of the screen. It has Slang's characteristic green colour and an animated wave movement, indicating that the assistant is listening to you. The surface is translucent, and the whole setup is designed so that the assistant feels like a part of the app.

The CONVA surface has four buttons on it to help users perform functions if, at any point, they want to interact directly with the assistant. Let's understand the functionalities of each of these buttons.

  • Language toggle

This button allows the user to choose the language in which they want to interact with the voice assistant. When pressed, it opens a pop-up asking the user to select English or Hindi.

  • Speaker Button

This button gives the user an option to mute/un-mute narration as per their convenience. This especially comes in handy when the user is present in a public setting.

  • Close Button

The cross button, when pressed, closes the surface. The assistant can be activated again by pressing the mic button.

  • Help Button

When clicked, this button with the question mark guides the user through the voice assistant and gives them a contextual reference for what they can do.

Slang Retail Voice Assistant Modes

  • Listening mode

While the user is speaking, a green wave moves at the bottom. When the user speaks out the voice command, the spoken text appears in green at the bottom in real-time.

  • Processing Mode

Once the Voice Assistant determines, based on built-in heuristic signals, that the user has stopped speaking, it goes into the processing state. In this state, the wave animation disappears and is replaced by straight, horizontally moving lines that signify processing.

  • Output Mode/Speaking mode

In this mode, CONVA shows the text to be spoken out on its assistant surface. While it speaks out the text, the words are also highlighted on the screen for visual purposes. This makes the whole experience intuitive.
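The three modes behave like a small state machine. The sketch below is one way to picture the transitions. It assumes a simple silence timeout as the end-of-speech heuristic, which is only one plausible signal and not a description of what CONVA actually uses. The names `Mode`, `next_mode` and `SILENCE_TIMEOUT_S` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    LISTENING = "listening"
    PROCESSING = "processing"
    SPEAKING = "speaking"


# Assumed threshold for illustration; the real heuristics are not documented here.
SILENCE_TIMEOUT_S = 1.5


def next_mode(mode: Mode, silence_s: float = 0.0, response_ready: bool = False) -> Mode:
    """Return the mode the surface should move to, or the same mode if nothing changes."""
    if mode is Mode.LISTENING and silence_s >= SILENCE_TIMEOUT_S:
        # The user appears to have stopped speaking.
        return Mode.PROCESSING
    if mode is Mode.PROCESSING and response_ready:
        # A reply is available, so it can be shown and spoken.
        return Mode.SPEAKING
    return mode


mode = Mode.LISTENING
mode = next_mode(mode, silence_s=2.0)
print(mode)
mode = next_mode(mode, response_ready=True)
print(mode)
```

Each mode maps to one visual state of the surface: the green wave, the horizontal lines, and the highlighted reply text.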

Dynamic Hints

Our user research found that when users are given an open-ended question like “What would you like to do?”, they suffer from choice paralysis.

This leads to drop-offs because users are not able to articulate what they want. To address this, we added dynamic hints to the voice assistant’s surface. These hints appear right at the top of the surface, so the user always knows what they can say.

These hints are dynamic and context-specific. They guide the user on how they can interact with the assistant.

“The assistant learns from the user’s behaviour and other signals and provides the most contextual hints.”
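As a rough illustration of context-specific hints, the sketch below picks hints from the current screen and the user's recent searches. Everything in it is an assumption made for illustration: the screen names, the hint texts, the `hints_for` function and the idea of using recent searches as the behavioural signal.

```python
from typing import Dict, List

# Hypothetical static hints per screen.
HINTS_BY_SCREEN: Dict[str, List[str]] = {
    "home": ["Show me chocolates", "Show my previous order"],
    "search_results": ["Add 2 kg potatoes to cart"],
}


def hints_for(screen: str, recent_searches: List[str], limit: int = 3) -> List[str]:
    """Combine personalised hints with screen-specific ones, most personal first."""
    personalised = [f"Show me {term}" for term in recent_searches]
    candidates = personalised + HINTS_BY_SCREEN.get(screen, [])
    unique: List[str] = []
    for hint in candidates:
        if hint not in unique:  # drop duplicates while keeping order
            unique.append(hint)
    return unique[:limit]


print(hints_for("home", ["chocolates"]))
print(hints_for("search_results", ["bhujia"]))
```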

Functional Breakdown

  • Multilingual Support

Slang’s voice assistant for retail supports 2 languages out of the box: Indian English and Hindi. Tamil, Kannada and Malayalam are in beta and can be enabled in less than a month based on client requirements. Udaan has started by enabling two languages, English and Hindi, for their pilot launch. This is primarily driven by their user demographics. The voice assistant in Udaan gives users the option to select between English and Hindi.

“A study by Slang shows nearly 25% of users who tried Voice Shopping used vernacular languages on the App’s Voice Assistant.”

India is a linguistically diverse nation. Supporting vernacular languages should be a high priority for apps that aim to target the Next Billion Users. Today, Udaan users can perform all queries in natural language in both Hindi and English. Udaan doesn’t have to make any additional effort to support this. We take care of it out of the box for all the functionalities. CONVA passes the entities, i.e. the input variables, to the app in English, regardless of the language the user speaks to the assistant.
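The practical effect of receiving entities in English is that the app needs only one code path for search. The sketch below illustrates that contract with a toy lookup table. A real assistant would rely on trained language models instead of a dictionary, and the names `HINDI_TO_ENGLISH`, `normalise_entity` and `on_search` are hypothetical.

```python
# Toy lookup used only to illustrate the idea; not how CONVA works internally.
HINDI_TO_ENGLISH = {
    "मेथी के बीज": "fenugreek seeds",
    "आलू": "potatoes",
}


def normalise_entity(value: str, language: str) -> str:
    """Return the entity in English, whatever language the user spoke."""
    if language == "hi":
        return HINDI_TO_ENGLISH.get(value, value)
    return value


def on_search(query: str) -> str:
    """The app's single search handler, which only ever sees English entities."""
    return f"search:{query}"


# A Hindi and an English request end up as the same call into the app.
print(on_search(normalise_entity("मेथी के बीज", "hi")))
print(on_search(normalise_entity("fenugreek seeds", "en")))
```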

  • Training the User

From the beginning of their voice assistant journey, users are presented with several coach marks. These help them understand how the assistant works and what it can do. In their pilot project, Udaan has started with just the two most used languages, English and Hindi. This decision is based on their primary market. CONVA supports 3 more regional languages, which can be enabled on request.

The voice assistant speaks out the reason for requesting the mic permission and then pops up the mic permission dialogue. This gives the user an idea of why the permission is being requested. While building products for India, it is essential to understand the user’s needs and equip them with options that build their confidence in the product.

  • Voice to Action

The Voice Assistant fills in the information provided by the user through voice and takes them to the page that matches their search criteria. For example, “Show me 1 kg Bikaji Bhujia” would directly apply a filter on quantity and show 1 kg packets of Bikaji Bhujia, if available.

This helps reduce the thought-to-action latency that is introduced by touch. It also helps reduce drop-offs by simplifying the user journey. The lack of voice to action was one of the issues we discussed in our breakdowns of Amazon’s Alexa in their shopping app and Paytm Travel’s Voice Search.
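To make the Bikaji Bhujia example concrete, here is a minimal sketch of turning an utterance into search filters. It uses a simple regular expression that only understands the English pattern "show me [amount unit] product", which is far narrower than a real natural language model. The function name and the filter keys are assumptions.

```python
import re

# Matches "show me", an optional amount with unit (kg or g), then the product name.
PATTERN = re.compile(
    r"^show me\s+(?:(\d+(?:\.\d+)?)\s*(kg|g)\s+)?(.+)$",
    re.IGNORECASE,
)


def parse_search(utterance: str) -> dict:
    """Extract product and optional quantity filters from a search utterance."""
    match = PATTERN.match(utterance.strip())
    if match is None:
        return {}
    amount, unit, product = match.groups()
    filters = {"product": product}
    if amount is not None:
        filters["quantity"] = float(amount)
        filters["unit"] = unit.lower()
    return filters


print(parse_search("Show me 1 kg Bikaji Bhujia"))
print(parse_search("Show me chocolates"))
```

With filters in this shape, the app can open the results page with the quantity filter already applied, instead of making the user tap through it.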

  • Confirmation Prompts

Slang’s internal user research surfaced another critical insight about the Next Billion Users. Many of these users feared doing something wrong while performing an action, which led them to not do it at all.

To solve this, we built confirmation prompts to inspire confidence in the users by telling them what we are searching for. 

While completing a search, our assistant speaks out what we are searching for. This instills confidence among these users and reduces the fear of doing something wrong. It makes them feel comfortable with the entire experience of the app.

  • Error Handling

If a user says something that the voice assistant isn't able to understand, it asks the user to repeat the request. If the user requests something that isn't in Udaan's product list, the assistant tells them so by voice.

For example, when a user searches for chocolates in the electronics category, it gives an error prompt and asks the user to search again.

When it is trying to work out which specific product a user wants out of several options, and several requests have failed, it asks the user to tap directly on the product they want to buy from the list.

Slang prompts for the required entities from the user automatically. There are no overheads on the app side. These contextual questions are built into the system and are different for different languages. All of them have been designed keeping linguistic nuances in mind.

Kumar, Slang Labs’ co-founder and obsessive dictator, wrote a blog post about it, describing the complexities faced while building a custom In-App Voice Assistant.
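The fallback from voice to touch described in this section can be sketched as a bounded retry loop. This is an illustration under stated assumptions: the article only says "several failed requests", so the limit of three attempts is invented, and the substring matching, the function name and the return values are all hypothetical.

```python
# Assumed retry limit; the source does not give a number.
MAX_ATTEMPTS = 3


def resolve_product(candidates, spoken_choices):
    """Try to pick exactly one product by voice, then fall back to asking for a tap."""
    for spoken in spoken_choices[:MAX_ATTEMPTS]:
        matches = [c for c in candidates if spoken.lower() in c.lower()]
        if len(matches) == 1:
            return ("selected", matches[0])
        # Zero or several matches: the assistant would re-prompt here.
    return ("ask_user_to_tap", None)


products = ["Bikaji Bhujia 1 kg", "Bikaji Bhujia 400 g"]
print(resolve_product(products, ["bhujia", "400 g"]))
print(resolve_product(products, ["bhujia", "bikaji", "the big one"]))
```

Capping the number of voice attempts keeps the user from getting stuck in a loop, which matters for users who already fear doing something wrong.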


We are glad to work with Udaan to enable voice functionalities with our sophisticated and multilingual Voice Assistant for Retail. If you would like to know more about this or other use cases that you can enable for your retail app, check out Slang for Retail.