Slang Retail Assistant Better than Google Voice Search - A Benchmarking Report Analysis
Voice is to this decade what mobile apps were to the last. More and more brands are adding Voice Assistants to their apps to solve the problems of different consumer segments.
These consumers include:
- Educated, older generations in urban areas who are not conversant with the ever-changing digital technologies and e-commerce apps
- Next 400M Users from Tier 2 and Tier 3 cities who are not conversant with the English language and are accessing the internet for the first time over a smartphone
- Urban, digital-savvy people who are unhappy with the complex and repetitive nature of the touch UI seen in online shopping and transaction apps across industries.
With the intuitive nature of a Voice Interface and with multilingual support, In-App Voice Assistants are helping brands to effectively tap into these Next Billion Users.
The most common use case brands add using In-App Voice Assistants is Voice Search. One of the most common questions that we’ve heard from our customers at Slang Labs is how Slang In-App Voice Assistant compares to Google Voice Search.
So far, there have been no comparative studies of these two offerings or of the impact they have on voice searches performed in brands’ apps. So we ran a comparative analysis of the two voice search offerings to try to answer these questions.
In this blog post, we briefly explain the methodology and the setup we used. A complete in-depth report that covers all the details can be downloaded from here.
Why we should care about the Benchmarking Report
- Voice eCommerce is growing
- Voice is the biggest unlock of the decade
- It is what mobile apps were to smartphones 10 years ago
In the benchmark report, we analyzed over 3000 audio samples, collected from 10+ states, of 100+ retail items spoken in different ways. The benchmarking found that the voice-search performance of Slang Retail Voice Assistant, compared to that of Google Voice Search, was:
- 46% better overall
- 45.9% better for Search Quality and
- 11.69% better at Speech recognition
The Slang Retail Assistant is optimized for the retail domain and has the ability to recognize product types and other details specific to the retail domain accurately.
It also uses a sophisticated built-in NLU engine to precisely map voice utterances for search into search queries. The Slang Retail Assistant is currently being used by multiple retail brands in India, such as Udaan and Big Basket.
While Google has no distinct offering called Google Voice Search, many Android apps commonly implement a voice-search widget. The widget collects voice utterances from users and converts them into text using Google’s speech recognition service. This widget and the recognition service powered by Google will be collectively referred to as Google Voice Search in this report.
Experiment and Analysis:
The experiment was designed so that we could supply the same input to both voice-search offerings under identical conditions, collect the output, measure the performance of each offering, and compare the results.
We used English utterances of 100 popular retail items as they would be spoken into a voice-search engine and collected 3000 samples of audio data from 10+ states to cover variability in dialect and accent. The utterances varied in length, descriptiveness and verbosity, so that both voice search offerings were tested against a diverse array of input.
Now, let’s talk about the setup -
We used a Mac Mini connected to an external speaker via Bluetooth. For the Android device, we used a Samsung SM-A260G. Lastly, we used an open-source grocery store demo app (VAMO) with both Slang Retail Assistant and Google Voice Search integrated. To run the experiment, we built a driver script that runs on the Mac and communicates with the app. The driver plays the audio on the external speaker, where it is picked up by the voice search. The app then collects the results of the search and sends them back to the driver for further processing.
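The driver loop described above can be sketched roughly as follows. This is only an illustration of the flow, not the actual driver script: the callables `start_listening`, `play_audio` and `collect_result`, the `TrialResult` record, and the shape of the result dictionary are all hypothetical stand-ins for however the real driver talks to the speaker and the app.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class TrialResult:
    """One (audio sample, engine) trial. Field names are hypothetical."""
    sample_path: str
    engine: str
    recognized_text: str
    search_succeeded: bool


def run_benchmark(
    sample_paths: Iterable[str],
    engines: Iterable[str],
    start_listening: Callable[[str], None],
    play_audio: Callable[[str], None],
    collect_result: Callable[[], Dict],
) -> List[TrialResult]:
    """Feed every audio sample to every engine and gather the app's results.

    The three callables are assumptions: they stand in for "tell the app
    which voice search to activate", "play the file on the external
    speaker", and "read back what the app reported".
    """
    engines = list(engines)  # allow any iterable, reuse it per sample
    results: List[TrialResult] = []
    for path in sample_paths:
        for engine in engines:
            start_listening(engine)   # app starts listening with this engine
            play_audio(path)          # same audio, same conditions
            raw = collect_result()    # app sends its result to the driver
            results.append(
                TrialResult(path, engine, raw["text"], raw["success"])
            )
    return results
```

Passing the I/O steps in as callables keeps the loop itself easy to test with fakes, independent of any particular audio or device tooling.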
Here’s a diagram describing the setup -
Our scoring formula needed to accurately reflect how well the voice search converts an input utterance into a valid search query for the app, which in turn results in a successful search. To achieve this, the formula needed to factor in two abilities of the search engine:
1. Accuracy of speech recognition:
Speech recognition is the very first step in converting voice input into something that can be searched, which is why high recognition accuracy is essential to any voice-search offering. For example, a speaker may commonly pronounce the term “corn flakes” in a way that sounds like “cornflex”.
This is where the Slang Retail Voice Assistant excels, because it is optimized for the retail domain. When dealing with retail-specific utterances such as the above-mentioned “corn flakes”, Slang RVA perfectly recognized 67% of the utterances, while Google Voice Search, which cannot be optimized for a specific domain, managed only 57%. Slang RVA also recognized all the audio utterances, while Google Voice Search did not recognize about 8% of them.
2. Effect on search quality:
Passing the recognized utterance directly to the search engine is inefficient. For example, it’s natural for someone to say “show me organic onions” while trying to search via voice. But if the entire utterance is passed, the search isn't optimized, because most search engines are unlikely to handle extraneous words such as “show me” well. The Slang RVA is aware of this and uses its built-in NLU engine to pass only the relevant part to the search engine. In the above example, only “organic onions” would be passed, resulting in an optimized search. This is why Slang RVA scores a staggering 89% in successful searches while Google Voice Search manages 67%.
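To make the idea concrete, here is a deliberately simple sketch of stripping carrier phrases before searching. It is not how the Slang NLU engine works; it is a small regex-based stand-in, and the phrase list and the function name `extract_search_query` are made up for illustration.

```python
import re

# Hypothetical carrier phrases; a real NLU engine would be far richer.
# Longer phrases come before their prefixes ("find me" before "find").
CARRIER_PHRASES = ("show me", "search for", "i want", "find me", "find", "please")

# Match one or more carrier phrases at the start of the utterance.
_LEADING_CARRIERS = re.compile(
    r"^(?:(?:%s)\s+)+" % "|".join(re.escape(p) for p in CARRIER_PHRASES),
    re.IGNORECASE,
)


def extract_search_query(utterance: str) -> str:
    """Drop leading filler such as "show me" and return the rest."""
    text = " ".join(utterance.split())  # collapse repeated whitespace
    return _LEADING_CARRIERS.sub("", text).strip()


print(extract_search_query("show me organic onions"))  # organic onions
```

Only leading phrases are removed, so product words that happen to appear later in the utterance are left untouched.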
With these requirements, the following scoring terms and formulas have been proposed:
- Speech Recognition Score (SRS): This score is a value between 0 and 1 that captures the accuracy of speech recognition.
- Search Quality Score (SQS): This score is a value between 0 and 1 that captures the effect of voice search on overall search quality.
- Voice Search Score (VSS): This is the final score that is a function of both SRS and SQS. The idea behind combining these two scores is that a search is likely successful only when the voice search performs well in both categories and not just one.
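This post does not spell out the formulas behind these scores, so the sketch below is only one plausible way to build them. It assumes SRS can be approximated by a character-level similarity between the expected and recognized text (using Python's `difflib`), and that VSS is the product of SRS and SQS, which is a simple function that is high only when both inputs are high. Both choices are assumptions, and the formulas used in the report may differ.

```python
from difflib import SequenceMatcher


def speech_recognition_score(expected: str, recognized: str) -> float:
    """Similarity in [0, 1] between expected and recognized text.

    Assumption: difflib's ratio is used as a stand-in for the report's SRS.
    """
    a = " ".join(expected.lower().split())
    b = " ".join(recognized.lower().split())
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return SequenceMatcher(None, a, b).ratio()


def voice_search_score(srs: float, sqs: float) -> float:
    """Combine SRS and SQS (assumed here to be their product)."""
    if not (0.0 <= srs <= 1.0 and 0.0 <= sqs <= 1.0):
        raise ValueError("scores must lie between 0 and 1")
    return srs * sqs


srs = speech_recognition_score("corn flakes", "cornflex")
print(round(srs, 2), voice_search_score(srs, 1.0) <= srs)
```

With a product, a perfect search on badly recognized text still scores low, and the same holds the other way round, which matches the intent that a search succeeds only when both stages perform well.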
Applying the scoring formulas described above to the results obtained from both voice-search offerings gives us the following:
Google Voice Search
Avg Speech Recognition Score: 0.77
Avg Search Quality Score: 0.61
Avg Voice Search Score: 0.54
Slang Retail Assistant
Avg Speech Recognition Score: 0.86
Avg Search Quality Score: 0.89
Avg Voice Search Score: 0.79
Based on the Avg VSS scores obtained above, we can conclude that the overall voice-search performance of Slang Retail Assistant is around 46% higher than that of Google Voice Search.
Slang RVA is also 11.69% higher in speech recognition and 45.9% higher in Search Quality. This means Slang RVA not only recognizes your voice better but also searches using your audio input in an optimized way so that you get a better overall search result.
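The percentages above are relative improvements over the Google Voice Search averages. A quick check of the arithmetic, using only the average scores listed earlier (the function and variable names are ours, for illustration):

```python
def relative_improvement_pct(baseline: float, candidate: float) -> float:
    """Percentage by which candidate exceeds baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


# Average scores from the results listed in this post.
google = {"SRS": 0.77, "SQS": 0.61, "VSS": 0.54}
slang = {"SRS": 0.86, "SQS": 0.89, "VSS": 0.79}

for metric in google:
    gain = relative_improvement_pct(google[metric], slang[metric])
    print(f"{metric}: {gain:.2f}%")
```

This prints roughly 11.69% for SRS, 45.90% for SQS and 46.30% for VSS, in line with the rounded figures quoted above.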
Want to read the full report to learn all the details? Click Here Now!