A case for semi-automated feedback for template improvements
Slang Blueprints is a blog series that covers systems that we, at Slang Labs, have been thinking about for some time now and are building currently.
The first blogs that should technically have been categorized under Slang Blueprints were the series that made a case for, and described, an assistant builder that helped create assistants from predefined templates while still allowing some customization, such as of the data and prompts. Within 4 months, that became Slang CONVA: the set of domain-specific assistants and the platform that goes with them. See Slang CONVA's introductory blog for an overview of what Slang CONVA is, and the assistant builder blog for some of the initial views of the design principles behind Slang CONVA.
One of the driving forces of Slang CONVA was that the templates that were available out-of-the-box should be as complete as possible for most cases. Even though we allowed for customization, our aim has always been that most customers should get what they need out-of-the-box and reasons for customization should be very specific. This directly meant that Slang was signing on to take on the complexity of building the actual template and ensuring its completeness.
The number of requests can easily run into the hundreds of thousands, or more, soon after launch. If a human had to go through each of them, they would first have to decide whether the outcome (the intents classified and the entities extracted) was correct, and then work out a correction for the utterance if it was wrong. The fix could be anything from correcting code in the NLP engine, to adding a missing data entity to the assistant’s template, adding synonyms to a given entity, or adding translations and transliterations to help the engines that power the alternate languages capability of Slang. Naturally, the time spent per utterance is not trivial, and the process does not scale easily. However, given the nature of the problem (language and language models), it is not easy to build a completely automated system without introducing the risk of noise and a gradual drift in accuracy.
Between a completely automated feedback system that can adversely affect the accuracy of the template, and an entirely manual system that can be slow and cumbersome, we envision a system that reduces the amount of work a human worker has to perform by cutting the time they spend troubleshooting the problem and making the fix, if any.
The inferencing part of the loop consists of the template and the assistants. In the beginning, when creating a new domain, a starting template is seeded with intents, entities, utterance data and so on. This material is collected from first principles, by crawling in the case of data, or by sourcing from multiple places in the case of translation data. It is all loaded into the template and the domain is released. Assistants are then created and released to the general public, who are referred to as users in the above diagram.
In the ‘feedback’ part of the loop, user activity is tracked with analytics events that are fired for certain actions. Among many other things, the analytics include the actual utterance spoken and the intent and entities recognized.
In a purely manual process, the human worker would use this data to walk through each utterance and the corresponding intent and entities that were recognized, search for pairs that were wrong, and apply fixes as they see fit.
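To make the shape of this data concrete, here is a minimal sketch of what one such analytics record might look like. The class name `UtteranceEvent` and all of its field names are assumptions made for illustration; they are not Slang's actual event schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class UtteranceEvent:
    """One hypothetical analytics record for a single user utterance."""
    user_id: str       # anonymized identifier of the user (assumed field)
    timestamp: float   # seconds since epoch when the utterance was spoken
    utterance: str     # text of what the user actually said
    intent: str        # intent the assistant recognized
    entities: Dict[str, str] = field(default_factory=dict)  # entity name -> value


# Example record: the user asked for onions and the assistant
# recognized a search intent with one product entity.
event = UtteranceEvent(
    user_id="u1",
    timestamp=1000.0,
    utterance="show me onions",
    intent="search",
    entities={"product": "onions"},
)
```

A record like this carries everything a reviewer needs to judge a single (utterance, intent, entities) pair.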
In a semi-automated system, we would have a processing subsystem built around a library of heuristics. These could be pure, traditional language heuristics, or heuristics that build on user behavior, for example, the number of times an utterance was repeated in close succession. This processing system would slice and dice the utterances into categories for which it thinks there is a common fix, and suggest that fix to the human worker. The same UI that displays this to the worker would also let the human confirm that the suggested fix is indeed correct and apply it automatically. This potentially saves time on the obvious fixes, while making sure the harder ones don’t lead to an unnecessary drift in accuracy.
We have been hand-wavy about some of the heuristics described above. We cover many of them in detail in another blog. Refer to Slang Blueprints: Heuristics for the semi-automated feedback system for more details and discussion.
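As an illustration of a user-behavior heuristic, the sketch below flags utterances that the same user repeated in close succession, which may indicate that the assistant got them wrong the first time. The `Event` class, the function names, the 10-second window and the `min_repeats` threshold are all assumptions chosen for the example, not details of Slang's system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Event:
    """Hypothetical analytics record (field names are assumptions)."""
    user_id: str
    timestamp: float  # seconds
    utterance: str
    intent: str


def repeated_in_succession(events: List[Event], window_s: float = 10.0) -> Dict[str, int]:
    """Count, per normalized utterance, how often a user repeated it
    within `window_s` seconds of their previous utterance."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e.user_id].append(e)

    counts: Dict[str, int] = defaultdict(int)
    for user_events in by_user.values():
        user_events.sort(key=lambda e: e.timestamp)
        # Compare each utterance with the one immediately before it.
        for prev, cur in zip(user_events, user_events[1:]):
            same_text = prev.utterance.strip().lower() == cur.utterance.strip().lower()
            if same_text and cur.timestamp - prev.timestamp <= window_s:
                counts[cur.utterance.strip().lower()] += 1
    return dict(counts)


def suggest_review_queue(events: List[Event], window_s: float = 10.0,
                         min_repeats: int = 2) -> List[Tuple[str, int]]:
    """Return utterances worth showing to the human worker, most repeated first."""
    counts = repeated_in_succession(events, window_s)
    flagged = [(u, n) for u, n in counts.items() if n >= min_repeats]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


events = [
    Event("u1", 0, "add tomatos", "search"),
    Event("u1", 4, "add tomatos", "search"),
    Event("u1", 7, "Add tomatos", "search"),
    Event("u2", 0, "show my cart", "view_cart"),
    Event("u2", 60, "show my cart", "view_cart"),  # too far apart to count
]
queue = suggest_review_queue(events)
```

Here the heuristic only groups and ranks candidates; deciding on the actual fix (say, adding a synonym) is still left to the human worker.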
The semi-automated system described above naturally leads on to the prospect of a fully automated system: as confidence builds in the heuristics used in the semi-automated system, we can gradually take the human out of the loop and directly apply the simple fixes that carry less risk of polluting the system.
We also plan to come up with metrics that track the performance of these fixes, so that we have more objective benchmarks for whether the fixes are working. We will describe those in another blog soon.
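One way to picture this gradual handover is to track how often the human worker accepts each heuristic's suggestions, and to auto-apply only low-risk fixes from heuristics with a long and clean track record. The sketch below is a hypothetical illustration: the `HeuristicStats` class, the `route` function, and the thresholds of 50 reviews and a 98% acceptance rate are all assumptions, not numbers from Slang.

```python
from dataclasses import dataclass


@dataclass
class HeuristicStats:
    """Running tally of human decisions on one heuristic's suggestions."""
    accepted: int = 0
    rejected: int = 0

    def record(self, accepted: bool) -> None:
        """Record one human decision on a suggested fix."""
        if accepted:
            self.accepted += 1
        else:
            self.rejected += 1

    @property
    def acceptance_rate(self) -> float:
        total = self.accepted + self.rejected
        return self.accepted / total if total else 0.0


def route(stats: HeuristicStats, low_risk: bool,
          min_reviews: int = 50, min_rate: float = 0.98) -> str:
    """Decide whether a suggested fix is applied directly or sent to a human.

    A fix is auto-applied only if it is low risk AND the heuristic has
    been reviewed often enough with a high enough acceptance rate.
    """
    total = stats.accepted + stats.rejected
    if low_risk and total >= min_reviews and stats.acceptance_rate >= min_rate:
        return "auto_apply"
    return "human_review"


synonym_heuristic = HeuristicStats()
for _ in range(100):
    synonym_heuristic.record(accepted=True)

decision = route(synonym_heuristic, low_risk=True)
```

Keeping the risk flag separate from the acceptance rate means that a well-trusted heuristic still goes through human review when the fix itself is risky.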
Slang CONVA provides the easiest and fastest way to add In-App Voice Assistants to mobile and web apps. Sign up for an account at https://www.slanglabs.in to create your own retail assistant, or see Introducing Slang Conva for more information.