Voice Assistant in Apps: Jio Mart
This blog is part of a long, descriptive and analytical series by Slang Labs called "Voice Assistants in Mobile Apps". In this series, we tear down the voice assistants and voice search functionalities that businesses have added to their mobile applications and analyse them in detail. In this blog, we break down Buzzo, the voice assistant that powers the JioMart application. Let’s get into the nitty-gritty; there is a lot to break down.
Mic icon in Standby Mode
The mic icon sits in the top right corner of the screen, beside the search bar. This placement makes the icon more visible and signals that the mic can be used as an alternative way to search.
JioMart Voice Assistant Surface:
There are three icons on the JioMart Voice Assistant surface -
- Settings
- Mic
- Keyboard
Jio has used the same icons as in the My Jio Voice Assistant. We did an in-depth breakdown of that app as well.
The keyboard icon is placed to the right of the mic button. It lets users type to interact with the voice assistant. It is important to note that, unlike chatbots where voice input is optional, Jio Assist is a voice assistant with optional keyboard input.
The settings icon is placed on the left side of the mic. The settings menu lets the user turn the assistant's sound on or off, give feedback on the assistant, seek help and close the Voice Assistant window.
On clicking the mic button, the voice assistant window pops up in the lower part of the screen. Upon interaction, the window’s height remains the same to maintain uniformity.
The user’s utterance, after cleaning, is displayed on the screen in bold.
JioMart’s Voice Assistant in its beta phase had a different approach to this. The voice assistant interface opened up its own window rather than continuing on the same screen. The current version is a better approach and helps the user stay on the current screen and doesn’t take away the context from them.
Comparison with Slang e-commerce Assistant’s Visual Experience
The experience provided by Slang’s e-commerce voice assistant differs here. Instead of taking a user to a completely new voice experience, we continue the conversation on the same screen.
This helps the user to be on the same page and still get the context from the app. Slang provides this experience to enable a more inline multimodal interaction allowing the user to interact with the app while the voice Assistant is active.
Fun fact: At Slang, the first Voice Assistant prototype that we built had a similar design too. It showed the entire conversation.
After months of on-field research and continuous iterations over three years, we think there is a better way to design an in-app voice assistant.
With list processing, users can say a list of items in one go. It saves a lot of time by cutting out repetitive searches. This functionality is currently not supported by JioMart's voice assistant. We think this is a super cool feature for your power users. These users know exactly what they are going to order and want to save time, so it can be super handy for them.
In the background, NLP has to break down a long sentence into the appropriate entities with precision. It’s quite a complicated feature to implement when you think of all the possible permutations of the different items.
We provide list processing out of the box with Slang e-commerce assistant as well.
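To give a feel for the breakdown involved, here is a minimal rule-based sketch in Python. It is not how Jio Assist or Slang implements list processing; a production assistant would rely on a trained NLP model. The names `OrderItem`, `parse_order` and the `UNITS` vocabulary are hypothetical and exist only for this illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical unit vocabulary, for illustration only.
UNITS = {"kg", "g", "litre", "litres", "packet", "packets", "dozen"}


@dataclass
class OrderItem:
    name: str
    quantity: float = 1.0
    unit: Optional[str] = None


def parse_order(utterance: str) -> List[OrderItem]:
    """Split a spoken shopping list into items with quantity and unit."""
    text = utterance.lower().strip()
    # Drop a leading verb ("add", "order", ...) and a trailing "to the cart".
    text = re.sub(r"^(please\s+)?(add|order|buy|get)\s+", "", text)
    text = re.sub(r"\s+to\s+(the\s+|my\s+)?cart\.?$", "", text)

    items = []
    # Items are assumed to be separated by commas or the word "and".
    for chunk in re.split(r"\s*,\s*(?:and\s+)?|\s+and\s+", text):
        tokens = chunk.split()
        if not tokens:
            continue
        quantity, unit = 1.0, None
        if re.fullmatch(r"\d+(\.\d+)?", tokens[0]):
            quantity = float(tokens.pop(0))
        elif tokens[0] in ("a", "an", "one"):
            tokens.pop(0)
        if tokens and tokens[0] in UNITS:
            unit = tokens.pop(0)
            if tokens and tokens[0] == "of":
                tokens.pop(0)
        if tokens:
            items.append(OrderItem(" ".join(tokens), quantity, unit))
    return items


for item in parse_order("Add 12 bananas, 2 kg rice and a packet of milk to the cart"):
    print(item)
```

Even this toy version shows where the permutations bite: a product such as "bread and butter pudding" would be wrongly split on "and", which is exactly the kind of case a real entity recogniser has to handle.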
We think conversation is one critical area where Jio Assist falls short. Voice assistants should be able to have natural and contextual two-way communication with the user. This is one of the essential conditions for being a voice assistant.
When searching for a product, the assistant shows all available items but does not carry the conversation forward. This leads to an awkward pause where users expect the assistant to ask “Which one would you like to buy?”
Jio Assist is instructional rather than conversational. It feels very robotic and transactional. Very little attention has been given to conversation design. Instead of being polite to the user, the assistant tells the user what to do.
Actions based on Quantity
The voice assistant is capable of understanding the quantity in the user's query. When we said “Add 12 bananas to the cart”, it performed the desired action accurately. This functionality was missing in the early looks of the assistant that surfaced a few months back.
Unique Item flow
If the user has mentioned enough details, such as the brand and the specific item name, Jio Assist is smart enough to show only one result. For example, JioMart carries only one type of papaya from one vendor, so it showed just one result.
Non-Unique Item Flow
When JioMart Voice Assistant is not able to uniquely identify one particular SKU, the user is asked to make a choice using touch. Alternatively, users could say “Add the first one to cart” to perform the desired action.
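The unique and non-unique flows described above boil down to a small decision on the number of matching SKUs, plus resolving a follow-up like “Add the first one to cart” against the options on screen. Here is a minimal sketch, assuming search results arrive as a simple list of SKU names. The function names and the `ORDINALS` table are hypothetical and not taken from JioMart.

```python
import re
from typing import List, Optional

# Hypothetical ordinal vocabulary, for illustration only.
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}


def resolve_search(results: List[str]) -> dict:
    """Decide the next step from the number of matching SKUs."""
    if not results:
        return {"action": "not_found"}
    if len(results) == 1:
        # Unique item flow: nothing to choose, add it directly.
        return {"action": "add_to_cart", "sku": results[0]}
    # Non-unique item flow: the user has to pick, by touch or by voice.
    return {"action": "ask_choice", "options": results}


def pick_by_ordinal(utterance: str, options: List[str]) -> Optional[str]:
    """Resolve 'add the first one' against the list currently on screen."""
    text = utterance.lower()
    if re.search(r"\blast\b", text):
        return options[-1] if options else None
    for word, index in ORDINALS.items():
        if re.search(rf"\b{word}\b", text) and index < len(options):
            return options[index]
    return None  # no usable ordinal found; fall back to touch
```

Returning `None` when no ordinal matches keeps the touch path available as the fallback, which mirrors the behaviour we observed in the app.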
Limitations of Jio Assist
Here is a list of things which we think are missing and should be added to Jio Assist.
- Good conversational design:
The assistant goes silent after performing an action, which leaves an awkward pause. After searching for products, the assistant doesn’t continue the conversation.
- Auto Prompts:
Today the user has to tap the mic button again to explicitly filter among a set of options. We built auto prompting in Slang, where Slang prompts the user for the missing fields.
- Support for other regional languages:
Today, the voice assistant supports English and Hindi. We hope to see it support regional languages in the future.
- Too slow:
Currently, the experience is too slow. It takes anywhere from 1 to 5 seconds for the voice assistant to launch, depending on internet connectivity. On mobile internet, the assistant window took the longest to launch.
- No onboarding:
There is no onboarding process to help the users understand the flow of the assistant and its functionalities. We suggest adding prompts when the user uses the assistant for the first time.
- Automatic disambiguation:
Jio Assist should be able to disambiguate items smartly. This is one of the crucial lessons we learnt at Slang from our user research. Touch-based systems have implicit confirmation, but the same does not hold for voice.
When users try to order using voice, they often leave out critical attributes of the products, especially if they order them often, for example the brand of milk or the size of the Maggi packet. While ordering via touch-based systems, users get to see the options and make these decisions.
While ordering multiple items in a list, the voice assistant should determine the correct item smartly using various heuristics and business logic. This will also cut down the time required to complete a transaction.
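As an illustration of the last two ideas working together, here is a minimal sketch of one possible heuristic: prefer the variant the user has ordered most often, and if there is no order history, fall back to an auto prompt for the attribute that is missing. This is an assumption-laden toy, not Slang's or Jio's actual logic. The `Product` fields, the `disambiguate` function and the shape of `past_orders` (a list of previously ordered SKU ids) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Product:
    sku: str
    name: str
    brand: str
    size: str


def disambiguate(
    candidates: List[Product], past_orders: List[str]
) -> Tuple[Optional[Product], Optional[str]]:
    """Return (chosen product, None) or (None, prompt for the assistant to speak)."""
    if not candidates:
        return None, "Sorry, I could not find that item."
    if len(candidates) == 1:
        return candidates[0], None

    # Heuristic (assumed): prefer the variant this user ordered most often.
    counts = Counter(past_orders)
    best = max(candidates, key=lambda p: counts[p.sku])
    if counts[best.sku] > 0:
        return best, None

    # No history: prompt for the first attribute that tells candidates apart.
    for attribute in ("brand", "size"):
        values = sorted({getattr(p, attribute) for p in candidates})
        if len(values) > 1:
            return None, f"Which {attribute} would you like: {', '.join(values)}?"
    return None, "Which one would you like?"


milk = [
    Product("m1", "milk", "Brand A", "500 ml"),
    Product("m2", "milk", "Brand B", "500 ml"),
]
print(disambiguate(milk, ["m2", "m2", "m1"]))  # picks the frequently ordered variant
print(disambiguate(milk, []))                  # falls back to a spoken prompt
```

Order frequency is only one signal. Price, availability or the last ordered size could be weighed in the same place without changing the overall shape.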
If you want to add an in-app voice assistant, we have built voice assistants for the e-commerce and travel domains. They can be integrated with your Android app and website in a matter of hours. If you want a voice assistant for any other domain, let’s talk.