How does the assistant builder work?

Go in-depth into understanding how the assistant builder works. Blog 4 of the assistant builder series.

This is Part 4 in a series about the Slang Assistant Builder.

Follow these links to read the previous parts:

Part 1: Natural Language Processing for Voice Assistants 101

Part 2: Slang Assistant Builder helps you forget about intents and entities

Part 3: Slang Assistant Builder vs Building from Scratch

This blog goes into some of the low-level details of how the Slang eCommerce domain incorporates your specific data and allows for customization while retaining the larger domain knowledge. Feel free to skip it if the details get too deep. Also note that, at the time of writing, these ideas are implemented in tools that we use internally at Slang. We plan to expose many of them as friendly, self-serve interfaces, and this blog describes our experience with the internal tools.

Traditionally, to configure the NLU system, we create what we call a schema. The schema is a declarative structure that encompasses intents, entities, utterances, prompts, translation hints and ASR hints. See Part 1 for an explanation of these terms. Using a schema, we can run a general-purpose NLU processing system on the back end while easily serving multiple apps and domains at the same time. Most frameworks for conversational models use a similar concept.

The parts of the schema, as described above, are:

  • Intents
  • Entity Types
  • Utterances
  • Prompts
  • ASR hints
  • Translation hints
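To make the idea of a declarative schema more concrete, here is a minimal sketch in Python. It is not Slang's actual schema format: the class names, field names and sample values (`Intent`, `Schema`, `search_product`, and so on) are all assumptions chosen for illustration. It only shows how the six parts listed above could sit together in one structure.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    name: str
    utterances: list[str]  # example sentences for this intent
    required_entities: list[str] = field(default_factory=list)


@dataclass
class Schema:
    intents: list[Intent]
    entity_types: dict[str, list[str]]  # entity type -> known values
    prompts: dict[str, list[str]]       # prompt key -> prompt variants
    asr_hints: list[str] = field(default_factory=list)
    translation_hints: dict[str, str] = field(default_factory=dict)


# A tiny, hypothetical eCommerce schema with a single intent.
schema = Schema(
    intents=[
        Intent(
            name="search_product",
            utterances=["show me {product}", "find {product}"],
            required_entities=["product"],
        )
    ],
    entity_types={"product": ["rice", "milk"]},
    prompts={"missing.product": ["What would you like to search for?"]},
    asr_hints=["basmati"],
    translation_hints={"chawal": "rice"},
)
```

Because the structure is plain data, the same NLU back end could in principle load a different schema for each app without any code changes.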

The Slang Docs cover intents, entities and utterances in more depth than this blog series does. That said, Part 1 and Part 2 of this series give enough background to follow the rest of this blog.

The core of the NLU system is designed to accept the first three elements (intents, entity types and utterances), train on them, and create a model. When presented with a sentence at inference time, it classifies the sentence into an intent, tags the entities, and returns this information to the SDK. If the user has left out an entity, it also lets the Slang surface know which entity to prompt for.
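The inference flow described above can be sketched roughly as follows. This is a toy stand-in, not the real NLU: a keyword lookup replaces the trained model, and every name here (`infer`, `INTENT_KEYWORDS`, `REQUIRED`, `ENTITY_VALUES`, the intent names) is hypothetical. The point is only the shape of the reply: an intent, the tagged entities, and the entity to prompt for, if any.

```python
import re

# Hypothetical stand-ins for what a trained model would have learned.
INTENT_KEYWORDS = {
    "search_product": ["show", "find", "search"],
    "add_to_cart": ["add", "buy"],
}
REQUIRED = {
    "search_product": ["product"],
    "add_to_cart": ["product", "quantity"],
}
ENTITY_VALUES = {
    "product": ["rice", "milk"],
    "quantity": ["one", "two", "three"],
}


def infer(sentence):
    words = re.findall(r"[a-z]+", sentence.lower())

    # "Classify" the sentence: first intent with a matching keyword.
    intent = next(
        (name for name, kws in INTENT_KEYWORDS.items()
         if any(w in kws for w in words)),
        None,
    )

    # "Tag" entities: any word that is a known value of an entity type.
    entities = {
        etype: w
        for etype, values in ENTITY_VALUES.items()
        for w in words
        if w in values
    }

    # Work out which required entity, if any, the user left out.
    missing = [e for e in REQUIRED.get(intent, []) if e not in entities]
    return {
        "intent": intent,
        "entities": entities,
        "prompt_for": missing[0] if missing else None,
    }
```

For example, `infer("add rice")` reports the `add_to_cart` intent with a `product` entity and asks for `quantity`, while `infer("find milk")` has nothing left to prompt for.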

In the previous blogs, we showed the need for repeatability across apps in the same domain, while still allowing app-specific customization.

How do we manage this while still maintaining the schema model that is required by the NLU backend? The assistant builder is the answer.

The assistant builder takes in four inputs: something we call the metaschema, the knowledge base, the customer-specific data, and a customer assistant configuration. It outputs the schema that the NLU system can use.

As explained in Part 1, intents don’t necessarily map to a function, so for the user we have abstracted intents into a paradigm called use cases. For eCommerce, some of the use cases are:

For each of these use cases, the metaschema knows which intents and utterances the schema requires. Note that not every app needs or wants every use case implemented in its voice assistant. With most packaged solutions, a user who wants to remove certain use cases has to understand which intents and entities are linked to which use cases and remove them by hand. With the assistant builder, it is as simple as filling in a configuration and ticking off which use cases to include in the assistant. The assistant builder then takes care of including the correct intents and entities.
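Here is a small sketch of how a use-case configuration could drive the selection of intents and entities. The structure of `META_SCHEMA`, the use-case names other than search, and all intent and entity names are assumptions made up for this example. They are not the actual metaschema.

```python
# Hypothetical metaschema: use case -> the intents and entities it needs.
META_SCHEMA = {
    "search": {
        "intents": ["search_product"],
        "entities": ["product", "filter"],
    },
    "order_management": {
        "intents": ["view_order", "cancel_order"],
        "entities": ["order_index"],
    },
    "navigation": {
        "intents": ["navigate"],
        "entities": ["target"],
    },
}


def select_for_use_cases(meta_schema, config):
    """Collect intents and entities for the use cases ticked in config."""
    intents, entities = [], []
    for use_case, enabled in config.items():
        if not enabled:
            continue
        entry = meta_schema[use_case]  # KeyError on an unknown use case
        for name in entry["intents"]:
            if name not in intents:
                intents.append(name)
        for name in entry["entities"]:
            if name not in entities:
                entities.append(name)
    return intents, entities


# The customer only ticks the use cases they want.
config = {"search": True, "order_management": False, "navigation": True}
intents, entities = select_for_use_cases(META_SCHEMA, config)
```

The customer never touches individual intents. Unticking `order_management` drops `view_order`, `cancel_order` and `order_index` in one step.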

Those of you familiar with NLU systems may ask: how will the needed utterances be generated? The assistant builder generates them from meta-utterances, which are simply utterance templates used to create the example utterances that the NLU consumes.
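A minimal sketch of template expansion, assuming a simple `{slot}` placeholder syntax. The syntax, the function name `expand_meta_utterance` and the sample slot values are our own choices for this example and may differ from how meta-utterances are really written.

```python
import itertools
import re


def expand_meta_utterance(template, slots):
    """Expand one template into every combination of its slot values."""
    # Unique placeholder names, in order of first appearance.
    names = list(dict.fromkeys(re.findall(r"\{(\w+)\}", template)))
    combos = itertools.product(*(slots[name] for name in names))
    return [template.format(**dict(zip(names, combo))) for combo in combos]


slots = {
    "verb": ["show me", "find"],
    "product": ["rice", "milk"],
}
utterances = expand_meta_utterance("{verb} {product}", slots)
# utterances == ["show me rice", "show me milk", "find rice", "find milk"]
```

A handful of templates and slot lists can therefore produce a large set of training utterances, which is what makes per-customer generation practical.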

In the above set of use cases, two types of special data are required from the customer. The first is the list of SKUs (for the search use case), and the second is the list of filters and their possible values. Given this data, the assistant builder includes all of these values for recognition. It also looks up our knowledge base for alternate words, synonyms, and translations in other languages, and augments the schema with them for the NLU.
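The augmentation step might look something like the sketch below. The knowledge base here is a made-up dictionary with two entries, and the matching rule (a knowledge base head word appearing as a whole word in the SKU name) is a deliberately naive assumption. A real lookup would be far more sophisticated.

```python
# Hypothetical knowledge base: head word -> synonyms and translations.
KNOWLEDGE_BASE = {
    "rice": ["chawal"],
    "curd": ["yogurt", "dahi"],
}


def augment(customer_values, knowledge_base):
    """Attach known alternates to each customer-supplied value."""
    augmented = {}
    for value in customer_values:
        words = value.lower().split()
        # Collect alternates for every head word found in the value.
        extra = [
            alt
            for head, alternates in knowledge_base.items()
            if head in words
            for alt in alternates
        ]
        augmented[value] = [value] + extra
    return augmented


skus = ["Basmati Rice", "Curd", "Soap"]
entity_values = augment(skus, KNOWLEDGE_BASE)
```

Here "Basmati Rice" also becomes recognizable as "chawal", while "Soap", which has no knowledge base entry, is kept exactly as the customer supplied it.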

See the previous blog, if you haven’t already, for why augmenting the assistant with customer-specific data can increase accuracy.

Another important area of assistant building is conversation design. There are many facets to designing the prompts. 

Welcome prompts invite a user who may not have seen a voice system before and teach them how to use it and what to say.

Entity prompts ask a user for entities that they left out.

Error prompts inform the user that the assistant wasn’t able to understand them and help them learn how to speak to the assistant better. It is also important to have many versions of the same prompt, so as not to bore the user.

Along with multiple versions of the same prompt, we also have multiple levels of prompts. If the user makes the same mistake again after being presented with an error prompt, multiple levels of prompts let the assistant guide the user with increasing verbosity.
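Versions and levels can be pictured as a list of lists: the outer index is the level, and each level holds several interchangeable wordings. The prompt texts and the function `pick_error_prompt` below are invented for illustration and are not taken from the Slang prompt library.

```python
import random

# Outer list: levels of increasing verbosity.
# Inner lists: interchangeable versions of the same prompt.
ERROR_PROMPTS = [
    [
        "Sorry, I didn't get that.",
        "Could you say that again?",
    ],
    [
        "Sorry, I still didn't get that. Try saying the name of a product.",
        "I missed that again. Please tell me what you are looking for.",
    ],
    [
        "I'm having trouble understanding. You can say things like "
        "'show me rice' or 'find milk'.",
    ],
]


def pick_error_prompt(consecutive_errors, rng=random):
    """Pick a random version from the level matching the error count."""
    # First error -> level 0; cap at the most verbose level.
    level = min(max(consecutive_errors, 1), len(ERROR_PROMPTS)) - 1
    return rng.choice(ERROR_PROMPTS[level])
```

The random choice keeps repeated prompts from sounding identical, and the error count moves the user to a more explicit prompt each time the same mistake recurs.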

We have refined our conversation design process to optimize usage and engagement. We have also created a library of such prompts in other languages. With the assistant builder, you get all of this out of the box, and customers can still change and customize prompts if they need to.

Along with the data, our knowledge base also has a library of domain-specific ASR hints and translation hints. These help tune the ASR and translation systems to eke more accuracy out of them for the domain. Using the data the customer supplies along with the internal library, we can achieve not only high accuracy on the customer’s data but also high coverage of many words that the customer may have missed.