Bhārat Bhāṣā Stack will catalyze Voice Assistant and Conversational AI innovations for vernacular Indic languages as India Stack did for FinTech.
A decade ago, it was unimaginable.
That one would pay a street vendor in a nondescript small town in India by using a mobile phone to scan a QR code hung on his cart. Even for an amount as small as 50 rupees (less than a dollar).
That there would be many mobile apps and payment wallets from banks and non-banks. All seamlessly interoperable. Any two parties would transact by sharing an email-like wallet address. Without paying any transaction fee.
That myriad small businesses would send catalogs on WhatsApp. Deliver goods to your home. Accept digital payments at your doorstep. Without having to build a website or payment gateway.
A decade ago, cash was the king.
A decade ago, it was unimaginable.
But it happened. Thanks to the India Stack. The digital infrastructure for authentication, payment, and authorization. It started in 2009.
India Stack is a set of APIs that allows governments, businesses, startups and developers to utilize a unique digital Infrastructure to solve India’s hard problems towards presence-less, paperless, and cashless service delivery.
Unified Payments Interface (UPI) is at the heart of cashless, instantaneous payments. UPI became the digital payment gateway for small businesses. It eliminated an entry barrier that only big enterprises could afford. India Stack catalyzed FinTech innovations.
Bhārat Bhāṣā Stack can do for Indic language tech what India Stack did for FinTech.
Right now, it may be unimaginable.
That anyone at any remote corner of India will harness the power of the Internet. Across linguistic and socioeconomic groups. Even if they can’t read or write English. By talking to mobile apps. In their own language.
That any business will be able to bake voice assistants into their apps for the masses. In all Indic languages. With affordable data sets, AI models, and services.
Right now, type and touch are kings. But Bharat is discovering its voice. Voice searches in Indic languages have been growing.
So it can happen. If we build the Bhārat Bhāṣā Stack. The tech ecosystem for conversational AI in Indic languages.
Bhārat Bhāṣā Stack can become the voice and language API for small businesses. It can eliminate another entry barrier that only big enterprises can afford. It can catalyze Voice Assistant and Conversational AI innovations.
Bhārat Bhāṣā Stack can unleash the next wave of unimaginable innovations across India’s linguistic and socioeconomic boundaries to more than a billion people.
The rest of the article:
Conversational AI is essential, but takes huge investment.
Conversational AI makes machines communicate like a human. From research labs, it is now reaching consumers’ hands. It started as standalone voice assistants like Siri. It is progressing towards being in apps and devices in several forms.
Voice Assistants are intelligent virtual assistants that interact using voice. These are also called voice bots, especially when delivered through a voice-only interface. For example, interactive voice response (IVR) systems serve as customer support voice bots.
Siri was the first famous voice assistant. Then came Alexa in Amazon Echo devices which, among other things, made it possible to buy things on Amazon. The next entrant was Google Assistant. It first came in Google Home devices and later in Android phones.
It progressed from being amusing to being useful, even if in a closed and limited ecosystem.
The next logical progression was to make it available in apps. Allow programmers to integrate voice commands to trigger specific app actions. Amazon did this with Alexa for Apps, and Google with App Actions.
Voice Search has emerged as a common use case. Almost all apps have some kind of search:
Though all are a kind of search, each works with a different category of world knowledge.
Voice Actions in apps have serious limitations. The voice journey ends as soon as it starts. Once the assistant invokes an app, users can interact with the app only through touch. That prevents building rich voice experiences suitable for an app.
That’s why several apps built a Voice Assistant inside the app (instead of hiding their app behind Alexa or Google Assistant). Gaana, YouTube, Paytm Travel, My Jio, Amazon, and Flipkart apps have Voice Assistants optimised for their domain.
Building these optimised assistants requires deep pockets. It takes significant investment, effort, and time to build. Most of these apps support English and Hindi. Broad support for most Indic languages is missing. Bhārat Bhāṣā Stack can make these technologies accessible to smaller entrepreneurs.
An open Bhārat Bhāṣā Stack can lower communication barriers, cut costs, and spur innovation.
Bhārat Bhāṣā Stack should have a set of models, services, and SDKs for building conversational apps in Indic languages. It should include speech, language, and vision technologies needed for building voice applications. The stack layers should offer convenient entry points to use the technologies. That will make it easy to build chatbots, voice bots, voice assistants, and applications.
Voice Assistants mimic human actions:
Other Conversational AI tasks are:
All these rely on a machine learning technique called Deep Neural Networks (DNNs). Building and training DNNs is very expensive as it requires a lot of data and computing time.
To summarise, Bhārat Bhāṣā Stack spans all three types of problems that DNNs are good at solving:
Application developers should be able to focus on only their business logic. Bhārat Bhāṣā Stack should provide hooks to conversational AI tech for the rest of the steps in voice assistants. This section describes various layers at which applications can hook in.
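To make the idea of hooks concrete, here is a minimal sketch in Python of how an application might plug its business logic into a voice pipeline. Everything in it is an assumption made for illustration: the `SpeechToText`, `IntentParser`, and `TextToSpeech` interfaces, the `VoiceAssistant` class, and the toy stand-in components are hypothetical names, not part of any existing Bhārat Bhāṣā Stack API.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


# Hypothetical interfaces that a stack layer could implement.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes, language: str) -> str: ...


class IntentParser(Protocol):
    def parse(self, text: str, language: str) -> dict: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str, language: str) -> bytes: ...


@dataclass
class VoiceAssistant:
    stt: SpeechToText
    nlu: IntentParser
    tts: TextToSpeech
    handle_intent: Callable[[dict], str]  # the only part the app writes

    def respond(self, audio: bytes, language: str) -> bytes:
        text = self.stt.transcribe(audio, language)
        intent = self.nlu.parse(text, language)
        reply = self.handle_intent(intent)
        return self.tts.synthesize(reply, language)


# Toy stand-ins so the sketch runs without any real models.
class EchoStt:
    def transcribe(self, audio: bytes, language: str) -> str:
        return audio.decode("utf-8")  # pretends the audio is already text


class KeywordNlu:
    def parse(self, text: str, language: str) -> dict:
        name = "check_balance" if "balance" in text.lower() else "unknown"
        return {"intent": name, "text": text}


class BytesTts:
    def synthesize(self, text: str, language: str) -> bytes:
        return text.encode("utf-8")  # pretends the bytes are audio


def banking_logic(intent: dict) -> str:
    if intent["intent"] == "check_balance":
        return "Your balance is 500 rupees."
    return "Sorry, I did not understand."


assistant = VoiceAssistant(EchoStt(), KeywordNlu(), BytesTts(), banking_logic)
print(assistant.respond(b"what is my balance", "en-IN").decode("utf-8"))
```

In this sketch the application supplies only `banking_logic`. Each of the other three stages could be swapped for an on-device model or a hosted service without touching the business logic.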
Scripts for a language are encoded using the Unicode character set. Information exchange between Conversational AI technologies happens using these character sets.
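As a small illustration, the Python sketch below lists the Unicode code points behind a Devanagari word and encodes it as UTF-8 for exchange between components. The choice of UTF-8 as the wire encoding and the helper name `describe` are assumptions made for the example.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


word = "भाषा"  # "bhāṣā" (language) written in Devanagari
for code_point, name in describe(word):
    print(code_point, name)

# The same text serialised as bytes, as one service might send it to another.
payload = word.encode("utf-8")
print(len(word), "code points,", len(payload), "bytes in UTF-8")
```

Each Devanagari code point takes three bytes in UTF-8, so the four code points of this word become twelve bytes on the wire.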
Indic scripts are largely phonetic, i.e., a word is spelled much as it is pronounced. This characteristic might allow using speech data for training across similar languages.
The stack should utilize and address the uniqueness of Indic language speakers:
Bharati Script, developed at IIT Madras in Chennai, is designed to be a common script for Indic languages. The work shows that Indic languages can be transliterated using one-to-one character set mappings.
The availability and cost of data are the biggest obstacles for most entrepreneurs. Curated data sets for speech and language are the lowest layer in the stack.
Many renowned institutes have been collecting data on Indic languages for their research. These institutes have hundreds of hours of data. But the data, as well as the know-how to use it, is lost over time.
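The flavour of a one-to-one mapping can be sketched with Unicode itself. The Unicode blocks for the major Indic scripts were laid out in parallel, following the older ISCII standard, so many letters sit at the same offset in each block. The Python sketch below exploits that layout. It is a naive illustration and not the Bharati Script mapping: the `transliterate` function and the `BLOCK_START` table are hypothetical names, and real transliteration needs script-specific rules, for example for letters that one script has and another lacks.

```python
import unicodedata

# First code point of each script's Unicode block ("odia" is the block
# that Unicode names Oriya).
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "odia": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text: str, source: str, target: str) -> str:
    """Map each character to the same offset in the target script's block."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            candidate = chr(cp - src + dst)
            # Keep the original character when the target block has no
            # assigned character at this offset.
            if unicodedata.name(candidate, ""):
                ch = candidate
        out.append(ch)
    return "".join(out)


print(transliterate("भारत", "devanagari", "kannada"))  # ಭಾರತ
```

Characters outside the source block, such as spaces and Latin letters, pass through unchanged.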
After a student graduates, thesis data is like grandmother’s precious jewelry box. Few know where it is, and nobody ever opens it. — A professor @ IISc Bangalore
For Speech Recognition, some available data sets:
For Natural Language, some available data sets:
Consolidating and establishing a LibriSpeech-like data set for research and development in Indic languages will pay rich dividends.
Having data is the first mandatory step. But it requires a high level of expertise to train DNN models. It is costly too.
Making pre-trained, ready-to-use models available for Indic languages is the logical next step. Privacy-sensitive applications can either use these models on the device or host them as a service in an on-premises private cloud.
SaaS frees developers from hosting the model and managing the service infrastructure. It makes it easier to start building applications.
All major cloud infrastructure providers have SaaS for speech recognition, natural language understanding, and text-to-speech for some Indic languages.
These services are quite expensive (just as payment gateways were). Having more SaaS providers using these pre-trained models can bring the cost down.
SDKs in popular programming languages and OS platforms form the final layer. SDKs can use the models or SaaS.
Tuning a model or service for an application domain requires some ASR and NLU expertise. Domain-specific SDKs (e.g., for banking, e-commerce, agriculture) are needed to further reduce the entry barriers.
We at Slang Labs have learned this from our customers. We now offer Voice Assistant as a Service (VaaS) to improve customer adoption of Voice Assistants.
It will take systematic and sustained collaboration to design and build the Bhārat Bhāṣā Stack:
The government has been proactive in formulating AI policy:
NASSCOM and FICCI have been conducting workshops bringing companies and universities together. Slang Labs has been an avid participant.
Voice-enabled applications can bridge the internet divide across diverse linguistic and socioeconomic groups. Voice and natural language technologies are maturing, but remain prohibitively expensive.
Data and model training costs are formidable obstacles for entrepreneurs and small businesses. Bhārat Bhāṣā Stack for vernacular Indic languages will remove that entry barrier.
This article outlines the stack, and the components we need to build. We need to:
India Stack for FinTech succeeded because everyone, including the government, came together. Building Bhārat Bhāṣā Stack together for India’s unique needs is pivotal for its success and widespread adoption.
Here is a video of a domestic helper struggling to find her account balance in the SBI app. She hesitantly tries everything though the right button is prominently there on the first screen. Bhārat Bhāṣā Stack can make her life a tad easier.
Let’s do it!