Slang’s Assistant Builder - helps you forget about intents and entities

Slang's Assistant Builder is an abstraction over intents and entities that helps users set up an assistant without needing to understand too many NLP concepts. Part 2 of the Slang Assistant blog series.

This is Part 2 in a series about the Slang Assistant Builder.

Follow these links to read the previous part or to skip ahead:

Part 1: Natural Language Processing for Voice Assistants 101

Part 3: Slang Assistant Builder vs Building from Scratch

Part 4: How does the Assistant Builder Work?

Nowadays, the many frameworks for voice-enabling apps or IoT devices, or for building natural language interfaces such as chatbots, provide a similar layer of abstraction involving intents, entities and prompts. However, for someone who is not well versed in these concepts or doesn’t have a background in natural language solutions, it can be hard to eke out good performance from them. Beyond the most basic intents, getting good results requires mastery of cleaning up data, creating intents, listing possible utterances, mapping intents to use cases, creating prompts and multi-level prompts, and so on. All of this takes time to understand and implement. You can use off-the-shelf packaged language models, but they either don’t allow customization or require you to understand all of these concepts in order to customize them anyway. See the table below for all the processes that are needed to make a good voice-enabled app.

The steps involved in the traditional way of adding a voice assistant vs. the Slang Assistant Builder

We have learned that there is a lot of interest in voice-enabling apps, but there is too much overhead in learning all of these aspects of the actual implementation.

So the question we asked ourselves at Slang was: how do we help a customer voice-enable their apps in a professional manner without necessarily having to understand new paradigms such as intents and entities?

The answer we came up with is domain-specific assistants and the Assistant Builder. See this blog for the different e-commerce use cases that can be voice-enabled using our assistants. But how does Slang build an abstraction over the intents and entities layer? For the e-commerce space, we have identified the common use cases of an e-commerce platform, as shown in Table 1 of the previous blog. We have seen these patterns repeat across many e-commerce apps. In the pure intents and entities model, every customer is forced to reinvent the wheel and learn intents and entities from scratch. We have realized, however, that most aspects stay the same from customer to customer; only certain aspects change, such as the customer’s product list and the filters they support. Therefore, we have created an intelligent layer that ingests the customer’s data and a few configuration flags and auto-creates the intents, utterances and other elements consumed by the downstream NLP layers.
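
As a rough illustration of the idea, here is a minimal Python sketch of a layer that takes catalog rows and a few flags and derives intents and entity values from them. It is not Slang's actual implementation or API: the names AssistantConfig, build_schema, the flag names and the catalog fields are all hypothetical, chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AssistantConfig:
    """Hypothetical configuration flags a customer might set."""
    enable_search: bool = True
    enable_order_tracking: bool = False
    supported_filters: list = field(default_factory=list)


def build_schema(products, config):
    """Derive intents and entity values from catalog rows and flags."""
    # Product names become the values of a "product" entity.
    entities = {"product": sorted({p["name"].lower() for p in products})}

    # Only the filters the customer says they support become entities.
    for flt in config.supported_filters:
        values = {str(p[flt]).lower() for p in products if flt in p}
        if values:
            entities[flt] = sorted(values)

    intents = {}
    if config.enable_search:
        # A search may carry the product plus any supported filter.
        intents["search_product"] = list(entities)
    if config.enable_order_tracking:
        intents["track_order"] = []
    return {"intents": intents, "entities": entities}


# Made-up catalog rows, for illustration only.
catalog = [
    {"name": "Basmati Rice", "brand": "Acme", "size": "1 kg"},
    {"name": "Olive Oil", "brand": "Acme", "size": "500 ml"},
]
schema = build_schema(catalog, AssistantConfig(supported_filters=["brand"]))
print(schema)
```

In this sketch the customer never writes an intent by hand: changing the catalog or flipping a flag changes the generated schema, which is the part that would be handed to the downstream NLP layers.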

In a traditional interface to platforms for building conversational systems, one has to list the possible values of the entities and specify example utterances for the intents. The NLU system uses these utterances and entity values to perform classification and tagging. Since many of these use cases repeat, we have found that we can internally maintain a library of utterances that keeps getting better and share it across e-commerce apps. The shared library contains no company-sensitive data, yet every app benefits from utterances refined by serving many apps in the same domain. By its very nature, it is more elaborate than what any single app built as a one-off by a domain player would have.

By bringing most elements to the table and allowing customers to augment them with their own data, we benefit from both general data shared across apps in the same domain and data that is specific to each customer.
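
One way to picture this combination is a shared set of utterance templates that contain only placeholders, filled in with each customer's own entity values and extended with any utterances the customer adds. The Python sketch below works under that assumption; SHARED_TEMPLATES, expand_utterances and the sample data are hypothetical and do not describe how Slang stores or shares its library.

```python
import re

# Shared, domain-level templates: placeholders only, no customer data.
SHARED_TEMPLATES = {
    "search_product": ["show me {product}", "i want to buy {product}"],
}


def expand_utterances(templates, entity_values, customer_utterances=None):
    """Fill placeholders with customer values, then add customer extras."""
    extras = customer_utterances or {}
    result = {}
    for intent, patterns in templates.items():
        utterances = []
        for pattern in patterns:
            slots = re.findall(r"\{(\w+)\}", pattern)
            # This sketch handles single-placeholder templates only.
            if len(slots) != 1 or slots[0] not in entity_values:
                continue
            for value in entity_values[slots[0]]:
                utterances.append(pattern.replace("{" + slots[0] + "}", value))
        utterances.extend(extras.get(intent, []))
        # dict.fromkeys drops duplicates while keeping the original order.
        result[intent] = list(dict.fromkeys(utterances))
    return result


training = expand_utterances(
    SHARED_TEMPLATES,
    {"product": ["rice", "olive oil"]},  # customer-specific values
    {"search_product": ["get me rice from the pantry aisle", "show me rice"]},
)
for utterance in training["search_product"]:
    print(utterance)
```

Because the templates carry placeholders rather than product names, they can be improved in one place and reused by every app, while the values and extra utterances stay with the customer who supplied them.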

It is not just intents, entities and sample utterances

Although most people think first of intents and entities when building a conversational system, there are many more aspects to it:

What prompts should be asked when a user leaves out some necessary data?

What prompts would help users understand where they are making a mistake when using the voice system?

When interacting in another language, we need to use translation systems. However, many machine translation (MT) systems still need purpose-built translations to be added to improve accuracy.

The ASR system needs a purpose-built list of words that will help bias the models and improve the accuracy.

How would you interact with the SKU list or data list when searching for items?

How would you allow the user to interact using voice and touch at the same time? 

And many more.
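
To make two of these questions concrete, the Python sketch below shows one simple way to pick a prompt when required data is missing, and one way to derive a word list from SKU names that could be passed to a speech recognizer as hints. Everything in it (REQUIRED_SLOTS, MISSING_PROMPTS, next_prompt, asr_hint_words and the sample values) is a hypothetical illustration, not part of any Slang interface.

```python
# Hypothetical required data per intent and the prompt for each gap.
REQUIRED_SLOTS = {"search_product": ["product"], "track_order": ["order_id"]}
MISSING_PROMPTS = {
    "product": "Which product are you looking for?",
    "order_id": "Which order would you like to track?",
}


def next_prompt(intent, filled_slots):
    """Return the prompt for the first missing required slot, or None."""
    for slot in REQUIRED_SLOTS.get(intent, []):
        if not filled_slots.get(slot):
            # Fall back to a generic question if no prompt was written.
            return MISSING_PROMPTS.get(slot, f"Please tell me the {slot}.")
    return None


def asr_hint_words(sku_names):
    """Collect the unique lowercase words that appear in SKU names."""
    words = {word.lower() for name in sku_names for word in name.split()}
    return sorted(words)


print(next_prompt("search_product", {}))
print(next_prompt("search_product", {"product": "rice"}))
print(asr_hint_words(["Basmati Rice", "Brown Rice"]))
```

Even this small sketch needs a table of required data, a prompt for each gap and a vocabulary list kept in sync with the catalog, which is the kind of work beyond intents and entities that the list above points to.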

With our system, we have thought through many of these problems and offer simple interfaces and APIs that give you the best of both wider domain knowledge and data that is specific only to you. You don’t have to reinvent the wheel or rely only on off-the-shelf packaged language models that feel like they don’t really understand your domain or geography.

Read the next part here.