In Training for Natural Language, you learned how to provide natural language utterances to your capsule that teach Bixby how to understand your use cases.
This guide describes the limitations of training for natural language, provides principles to make your training more effective, and suggests best practices.
Bixby is a powerful and unique tool, but as the famous quote says, "with great power comes great responsibility." With great power also come some limitations:
With these limitations, you might wonder how best to do training. Primarily, you will need to use iteration, experience, and common sense. Trial and error is the best teacher for understanding what works and what doesn't. However, relying solely on iteration is incredibly time-intensive. Having a community of developers is also helpful.
Since you might not have the time or expertise for effective training, we recommend following these three principles:
Using "Big Data" is usually sufficient to solve many natural language problems, but for the Bixby platform, you can inject smaller, representative, "smart" data and let it work as long as it follows these three principles: tightly-scoped, well-discriminated, and obviously-tagged.
Effective capsules minimize surface area.
If a capsule's scope is the surface area of the domain it covers, you should minimize this surface area. Every NL-facing feature that you add increases the amount and variety of data required to address it.
The solution to this is to model and train as little as possible and still have a useful experience. This means striving for a tight foundation:
Keeping your capsule tightly-scoped also means not worrying about utterances that are outside of your capsule's scope. Bixby is responsible for figuring out which NL space an utterance belongs to and will not send your capsule utterances that don't fit its scope. For example, if you have a recipe capsule and a user asks "How many calories are in an apple?", you don't need to train Bixby to say "I don't know how to do that," because Bixby will direct the utterance to a more appropriate capsule. Similarly, you shouldn't have to train something like "Where is the Taj Mahal located?" or "What is the weather like today?" if you're creating a recipe capsule, since these are wildly out of scope.
This circumstance might change with Natural Language categories. If your capsule falls under a specific category but doesn't handle certain requests, there are some utterances you might have to train for Bixby to gracefully handle those situations.
If you go out of scope of your capsule domain, you run into the following possible dangers:
In short, design and build capsules that you can repeatedly use. Your capsule should be robust and well-crafted, which in turn makes it trustworthy (you can predict what it does) and understandable (you can mentally build a model of how it works).
You might need to build different capsules to address different use cases. For example, a capsule that returns recipes should be tightly-scoped, but you can build a separate capsule that handles food questions and answers (such as "How many calories are in yogurt?"). This is one way you can teach Bixby more, even if your use cases are not that tightly scoped.
Effective capsules clearly claim their queries.
Finding the right capsule for a user query can sometimes be a challenge. If all user queries for a capsule obviously belong to that capsule, we can call that capsule well-discriminated. If not, you risk chaotic and unnatural failures, lowered confidence in query results, and a need for more training.
In general, you should reduce ambiguity by following these two recommendations:
If possible, make training queries well-discriminated based on their non-signal text alone. This is the best way for Bixby to classify queries to the right capsule, and it holds up best with speech recognition. In the following examples, if you remove the signal text (in brackets), you can still tell which capsule the queries belong to:
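The entries below are illustrative sketches rather than entries from a real capsule; the concept names AirportCode and RestaurantName are hypothetical placeholders:
Book me a flight to (SFO)[v:AirportCode]
Find a table for two at (Luigi's)[v:RestaurantName]
Even with the bracketed values removed, "Book me a flight to" clearly belongs to a flight capsule and "Find a table for two at" to a restaurant capsule.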
If vocabulary is required, make the vocabulary itself clean and well-discriminated. In the examples below, the vocabulary (in brackets) is limited to unambiguous choices, while more confusing options are left out:
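As a hypothetical sketch, an airline capsule might limit its trained vocabulary to codes and names that cannot be confused with ordinary words (AirportCode and AirlineName are placeholder concept names):
Show me flights to (SFO)[v:AirportCode]
Find flights on (Delta)[v:AirlineName]
Codes that collide with common English words would be left out of the vocabulary.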
Note that Bixby should be able to generalize user queries from these training entries and then catch more confusing instances, such as the airport code THE or the airline code NO. If you find that this isn't happening, then you should train the out-of-vocabulary cases as well, and Bixby will learn when those words mean things like airports or restaurants and when they don't.
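For instance, a hedged sketch of training such an out-of-vocabulary case directly, again using the placeholder AirportCode concept:
Book a flight to (THE)[v:AirportCode]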
For more information on using vocabulary, see Extending Training With Vocabulary Developers' Guide.
Effective capsules use easy-to-recognize signals.
Value annotation should be robust enough to handle rewording and, at times, speech recognition errors. It is often easier to show examples that aren't annotated well, so this section includes some examples of signals that are not obviously tagged, as well as guidance on annotating primitives rather than structures.
The following examples show utterances that are not annotated well, explain the issues with each example, and give recommendations on how to avoid these issues.
Example 1: Tip Calculator
Consider the following problematic utterances:
(Split)[r:ChangeBillSplit] the bill.
What is the total split (4)[v:SplitAmount] ways?
The issues with this example are as follows: the annotations are inconsistent. In the first utterance, "Split" is annotated as a route to ChangeBillSplit; in the second, 4 is mapped to SplitAmount, but "split" itself is not annotated at all.
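One way to make this training consistent, keeping the ChangeBillSplit and SplitAmount names from the example above, is to annotate "split" the same way in both utterances, for instance:
(Split)[r:ChangeBillSplit] the bill.
What is the total (split)[r:ChangeBillSplit] (4)[v:SplitAmount] ways?
This is only a sketch; the right annotation depends on how the capsule actually models bill splitting.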
Example 2: Weather
Consider the following problematic utterances:
Do I (need an umbrella)[v:WeatherCondition:rain] today?
Will it (rain)[v:WeatherCondition:rain] today?
This example has two major issues. The first is that it abuses enums: don't use enums unless they can be matched exactly with vocabulary, because the more words in a match, the more brittle the vocabulary. The second, as with the previous example, is that the training entry is unclear. Specifically, it is unclear where the match should start and end: should it start at "need", "an", or "umbrella"? If a human is unsure, then Bixby will be too!
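One possible fix, assuming the capsule's goal can answer the umbrella question without a WeatherCondition value, is to keep the enum match only where the word matches the vocabulary exactly and to leave the rephrased utterance unannotated:
Will it (rain)[v:WeatherCondition:rain] today?
Do I need an umbrella today?
This is a sketch rather than the only correct approach; the point is that multi-word enum matches like "need an umbrella" are brittle.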
Example 3: Recipes
Consider the following problematic utterances:
(Butter)[v:IngredientName] recipes
(Pot roast)[v:FoodName] recipes
This example highlights context issues. Specifically, it is difficult to determine that "Butter" is an IngredientName and "Pot roast" is a FoodName without using vocabulary. Also keep the following in mind:
It isn't feasible to add vocabulary for every possible IngredientName item.
The line between Ingredient and Food can easily be blurred. For example, consider asking for "chicken recipes": are you cooking a whole chicken, or do you want recipes that use chicken?
The solution to clarify the meanings in this capsule is to model all the food items as FoodName, annotate all out-of-vocabulary instances as FoodName, and then distinguish them using Roles when the language is very clear, for example when the utterance says "including Butter".
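A hedged sketch of that approach, assuming a hypothetical IncludedIngredient concept modeled as a role-of FoodName:
(Butter)[v:FoodName] recipes
(Pot roast)[v:FoodName] recipes
Recipes including (butter)[v:IncludedIngredient]
Here the role annotation is used only in the last entry, where "including" makes the ingredient reading unambiguous.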
You should annotate primitive concepts, not structure concepts, in utterances. In an utterance such as "find shirts under $40", you might have a Currency structure which includes CurrencySymbol and CurrencyValue. Instead of trying to annotate "$40" as Currency, annotate "$" as CurrencySymbol and "40" as CurrencyValue:
find shirts under ($)[v:CurrencySymbol](40)[v:CurrencyValue]
Provide a constructor action in your capsule to combine the two primitive concepts into the structure concept. Bixby will use that action when it needs to during execution.
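As a minimal sketch, assuming a hypothetical CreateCurrency action name, such a constructor could look like this in the modeling language:
action (CreateCurrency) {
  type (Constructor)
  collect {
    input (currencySymbol) {
      type (CurrencySymbol)
      min (Required) max (One)
    }
    input (currencyValue) {
      type (CurrencyValue)
      min (Required) max (One)
    }
  }
  output (Currency)
}
Bixby can then plan through this action whenever it needs a Currency value built from the two annotated primitives.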
The exception to this rule is structure concepts that have been trained as patterns, such as the core viv.time.DateTimeExpression. Patterns provide vocabulary to tell Bixby how to interpret structure concepts, so these structure concepts can be used directly in NL training.
When you are adding training examples, less is more. Let Bixby do as much work as possible. Set the goal, identify any values, and verify the plan.
Here are some other best practices to keep in mind.
Do:
Don't: