Bixby Developer Center

Merging and Equivalence

Equivalence Definitions

Suppose you are looking for the best restaurant in an area, and you have reviews available from two different sources. Rather than displaying some restaurants twice, you need a way to merge duplicates. Bixby handles this by setting up equivalence definitions for each concept.

Because real-world inputs can be messy, a simple comparison is not enough to decide whether or not two inputs are equivalent. For example, you might want to treat two locations as the same as long as they are close together. Or, you might want to accept business names that contain minor typos or variations. You might also have complex structures where equivalence depends on a subset of the structure's properties, such as the name and the author.

To handle this, use an equivalence definition, which specifies how the system should compare two instances of the same concept. If the comparison returns true for two concept instances, the system merges them and presents them as a single instance. If it returns false, they are treated as distinct instances. The comparison can also return uncertain, either when information is missing or when a fuzzy match meets the uncertain tolerance but not the true tolerance. At the top-most level, only values that are considered true matches will be merged. You can modify this behavior when comparing structures.

Note

The equivalence functions discussed below are used only for merging results. They are not used by the natural language understanding system and play no role in interpreting user input.

Primitive Equivalence

By default, comparing two primitive values returns true when they are identical and false otherwise. The fuzzy-string-equality equivalence function relaxes this rule for strings. Here's an example:

name (BusinessName) {
  description (The name of a business.)
  equivalence: fuzzy-string-equality {
    true-tolerance (0.9)
    uncertain-tolerance (0.7)
    similarity-measure (Edit)
  }
}
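The following minimal Python sketch shows how the two tolerances might behave in a three-valued fuzzy string comparison. It is not Bixby's implementation. The 0-to-1 similarity score (1 minus the Levenshtein distance divided by the longer string's length) is an assumption, and so are the names `Match`, `edit_similarity`, and `fuzzy_string_equality`. A score at or above true-tolerance yields true, a score at or above uncertain-tolerance yields uncertain, and anything lower yields false.

```python
from enum import IntEnum


class Match(IntEnum):
    # Hypothetical three-valued result, ordered from weakest to strongest.
    FALSE = 0
    UNCERTAIN = 1
    TRUE = 2


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, lower as edits grow."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def fuzzy_string_equality(a: str, b: str,
                          true_tolerance: float = 0.9,
                          uncertain_tolerance: float = 0.7) -> Match:
    """Hypothetical model of fuzzy-string-equality with similarity-measure (Edit)."""
    score = edit_similarity(a, b)
    if score >= true_tolerance:
        return Match.TRUE
    if score >= uncertain_tolerance:
        return Match.UNCERTAIN
    return Match.FALSE
```

With the tolerances from the BusinessName example, "Joe's Pizza" and "Joe's Pizzeria" differ by three insertions out of 14 characters. That gives a similarity of about 0.79, which falls in the uncertain band.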

You can also set tolerances for float values (primitive type decimal) using fuzzy-numeric-equality. The syntax is the same as for fuzzy-string-equality.

Note

You cannot use non-numeric concepts with fuzzy-numeric-equality.

You can learn more about primitive equivalence in the reference documentation.

Structure Equivalence

Comparing two structures is more complicated. By default, the system walks through all the properties and compares each, descending into sub-properties as needed. Each comparison uses any available equivalence definitions for the properties. Comparison of structures with any missing properties will always return uncertain. Otherwise, comparison returns true if and only if all property comparisons return true.
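The default rule can be modeled roughly as follows. This Python sketch is one possible reading of the paragraph above, not Bixby's code. It makes these assumptions:

- Structures are plain dicts, and a property that is absent or `None` counts as missing.
- Primitive properties use exact equality.
- When no property is missing but not every comparison is true, the result is the weakest property result. The source only specifies the all-true case.

The names `Match`, `exact`, and `compare_structures` are hypothetical.

```python
from enum import IntEnum


class Match(IntEnum):
    # Ordered so that min() picks the weakest result.
    FALSE = 0
    UNCERTAIN = 1
    TRUE = 2


def exact(a, b) -> Match:
    # Default primitive equivalence: true when identical, false otherwise.
    return Match.TRUE if a == b else Match.FALSE


def compare_structures(left: dict, right: dict) -> Match:
    """Hypothetical model of the default structure comparison."""
    keys = set(left) | set(right)
    # Any missing property on either side makes the whole comparison uncertain.
    if any(left.get(k) is None or right.get(k) is None for k in keys):
        return Match.UNCERTAIN
    results = []
    for k in keys:
        a, b = left[k], right[k]
        if isinstance(a, dict) and isinstance(b, dict):
            results.append(compare_structures(a, b))  # descend into sub-properties
        else:
            results.append(exact(a, b))
    # True only if every property comparison is true; otherwise the weakest result.
    return min(results, default=Match.TRUE)
```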

We can modify this behavior by defining equivalence as part of the concept structure. To do this, there are two primitive constraints and three conjunctions that join them together.

Here are the primitive constraints:

  • convertible-concepts: This returns true if two concept instances can be converted to each other. That is the case if both instances have the same concept type (such as when both are Business concepts), or if one is a sub-type of the other. For example, if Restaurant extends Business, then a Business and a Restaurant are convertible types. A Restaurant and a MovieTheater that both extend Business are not convertible types, because neither inherits from the other.

  • equivalent-values: This compares the specified property of the two concept instances, using that property's own equivalence definition if it has one. For instance, equivalent-values (name) returns true if both structures being compared have a name property and the two name values are equivalent. Because the property comparison may be fuzzy, this constraint can also return uncertain.
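The type test behind convertible-concepts maps naturally onto a class hierarchy. In this Python sketch, the `Business`, `Restaurant`, and `MovieTheater` classes and the `convertible_concepts` function are hypothetical stand-ins for Bixby concepts. Two instances are convertible when either type is a subclass of the other, and a type counts as a subclass of itself.

```python
class Business:
    """Hypothetical base concept."""


class Restaurant(Business):
    """Extends Business."""


class MovieTheater(Business):
    """Also extends Business, but has no inheritance link to Restaurant."""


def convertible_concepts(a: object, b: object) -> bool:
    """True when both instances share a type or one type inherits from the other."""
    ta, tb = type(a), type(b)
    return issubclass(ta, tb) or issubclass(tb, ta)
```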

We use joins to aggregate the results of other constraints:

  • join: This acts like a min function across the nested constraints, with false ranked below uncertain and uncertain ranked below true. If any nested constraint returns false, that is the result. Otherwise, if any nested constraint returns uncertain, that is the result. The result is true if and only if all the nested constraints return true.

  • optimistic-join: This modifies the behavior of a join by treating uncertain as true. It returns true if all the nested constraints return true or uncertain, and false otherwise. This conjunction never returns uncertain.

  • pessimistic-join: This modifies the behavior of a join by treating uncertain as false. It returns true if all the nested constraints return true, and false otherwise. This conjunction never returns uncertain.
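The three joins are simple rules over the values false, uncertain, and true, so they are easy to model. This Python sketch uses the hypothetical names `Match`, `join`, `optimistic_join`, and `pessimistic_join`. It orders the values so that `min` implements a plain join, and it codes the other two joins directly from their definitions. The source does not say how Bixby evaluates an empty join, so the sketch assumes true.

```python
from enum import IntEnum
from typing import Iterable


class Match(IntEnum):
    # Ordered weakest to strongest so min() behaves like a join.
    FALSE = 0
    UNCERTAIN = 1
    TRUE = 2


def join(results: Iterable[Match]) -> Match:
    # Any FALSE wins, then any UNCERTAIN; TRUE only if all are TRUE.
    return min(results, default=Match.TRUE)


def optimistic_join(results: Iterable[Match]) -> Match:
    # UNCERTAIN counts as TRUE, so the result is never UNCERTAIN.
    return Match.TRUE if all(r != Match.FALSE for r in results) else Match.FALSE


def pessimistic_join(results: Iterable[Match]) -> Match:
    # UNCERTAIN counts as FALSE, so the result is never UNCERTAIN.
    return Match.TRUE if all(r == Match.TRUE for r in results) else Match.FALSE
```

For the inputs [true, uncertain], the three joins return uncertain, true, and false respectively. This is the only case where their results differ.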

Here are some examples of equivalence definitions:

Comparison of Convertible Concepts

structure (Business) {
  property (address) {
    type (viv.geo.Address)
  }
  // ... more properties ...
  // Businesses are merged if their names and addresses match,
  // counting "uncertain" fuzzy matches as matches:
  equivalence: optimistic-join {
    convertible-concepts
    equivalent-values (name)
    equivalent-values (address)
  }
}

Concepts that extend Business, for instance a Restaurant concept, can be compared to a Business and return true because of the convertible-concepts constraint. Because of the equivalent-values constraints, only the name and address properties will be compared to determine whether the structures are equivalent. Finally, the join is optimistic, so the result will be true as long as the name and address comparisons return either true or uncertain.

This illustrates the utility of returning uncertain. It might not seem very useful when comparing two instances directly, but it can bubble up to any parent concept comparison. For example, a name comparison might return uncertain, while the address comparison returns true. The Business concept specifies an optimistic join across these two properties, so the result would be true.
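The following Python sketch models the Business definition to show an uncertain name result being absorbed by the optimistic join. All names are hypothetical. `compare_name` is a deliberately simplified stand-in for a fuzzy-string-equality definition: an exact match is true, and a match that differs only in letter case is uncertain. Real tolerances and similarity measures would come from the name concept's own equivalence definition.

```python
from enum import IntEnum


class Match(IntEnum):
    FALSE = 0
    UNCERTAIN = 1
    TRUE = 2


class Business:
    def __init__(self, name: str, address: str):
        self.name = name
        self.address = address


class Restaurant(Business):
    pass


def compare_name(a: str, b: str) -> Match:
    # Hypothetical stand-in for fuzzy-string-equality on the name concept.
    if a == b:
        return Match.TRUE
    if a.casefold() == b.casefold():
        return Match.UNCERTAIN
    return Match.FALSE


def compare_address(a: str, b: str) -> Match:
    # Exact comparison, standing in for the address concept's equivalence.
    return Match.TRUE if a == b else Match.FALSE


def convertible_concepts(a: object, b: object) -> Match:
    ta, tb = type(a), type(b)
    return Match.TRUE if issubclass(ta, tb) or issubclass(tb, ta) else Match.FALSE


def optimistic_join(*results: Match) -> Match:
    # UNCERTAIN is treated as TRUE.
    return Match.TRUE if all(r != Match.FALSE for r in results) else Match.FALSE


def business_equivalence(a: Business, b: Business) -> Match:
    # Mirrors: optimistic-join { convertible-concepts
    #   equivalent-values (name) equivalent-values (address) }
    return optimistic_join(
        convertible_concepts(a, b),
        compare_name(a.name, b.name),
        compare_address(a.address, b.address),
    )


cafe = Business("Blue Door Cafe", "12 Oak St")
same_place = Restaurant("blue door cafe", "12 Oak St")
print(compare_name(cafe.name, same_place.name).name)  # UNCERTAIN
print(business_equivalence(cafe, same_place).name)     # TRUE
```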

Comparison Using Latitude and Longitude Properties

structure (GeoPoint) {
  property (latitude) {
    type (geo.Latitude)
    min (Required)
  }
  property (longitude) {
    type (geo.Longitude)
    min (Required)
  }
  // The confidence for a point will be true, false or uncertain
  // depending on the specified location tolerances.
  equivalence: join {
    fuzzy-numeric-equality (latitude) {
      true-tolerance (0.00005)
      uncertain-tolerance (0.005)
    }
    fuzzy-numeric-equality (longitude) {
      true-tolerance (0.00005)
      uncertain-tolerance (0.005)
    }
  }
}

GeoPoint structures are compared using the latitude and longitude properties. Two points are equivalent (true) if and only if both properties are within their true tolerances. If either property comparison returns false, the points are not equivalent. Otherwise, the result is uncertain.
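Here is a Python sketch of the GeoPoint definition. It assumes that the numeric tolerances are maximum absolute differences, which fits the example, where true-tolerance is smaller than uncertain-tolerance. That reading and the helper names `fuzzy_numeric_equality` and `geopoint_equivalence` are assumptions, not documented behavior. Points are (latitude, longitude) tuples.

```python
from enum import IntEnum


class Match(IntEnum):
    FALSE = 0
    UNCERTAIN = 1
    TRUE = 2


def fuzzy_numeric_equality(a: float, b: float,
                           true_tolerance: float,
                           uncertain_tolerance: float) -> Match:
    # Assumed semantics: each tolerance is a maximum absolute difference.
    diff = abs(a - b)
    if diff <= true_tolerance:
        return Match.TRUE
    if diff <= uncertain_tolerance:
        return Match.UNCERTAIN
    return Match.FALSE


def geopoint_equivalence(p: tuple, q: tuple) -> Match:
    lat = fuzzy_numeric_equality(p[0], q[0], 0.00005, 0.005)
    lon = fuzzy_numeric_equality(p[1], q[1], 0.00005, 0.005)
    return min(lat, lon)  # plain join: the weakest result wins
```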

Geographic Distance

As a special case, you can define equivalence rules for GeoPoint properties using the distance-equality constraint. This returns true if two points are within a specified geographic distance of each other. In this example, the property centroid is a GeoPoint, and comparison returns true if two centroids are separated by 0.2 miles or less.

equivalence: join {
  distance-equality (centroid) {
    unit (Miles)
    magnitude (0.2)
  }
}

Note

You must use viv.core.BaseGeoPoint concepts with distance-equality.
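The distance check can be sketched with the haversine great-circle formula. This page does not describe how Bixby calculates distance. The formula, the mean Earth radius of 3958.8 miles, and the names `haversine_miles` and `distance_equality` are all assumptions of this Python sketch. Points are (latitude, longitude) pairs in degrees.

```python
import math

# Mean Earth radius in miles; an assumption of this sketch.
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def distance_equality(p: tuple, q: tuple, magnitude_miles: float = 0.2) -> bool:
    # Mirrors distance-equality (centroid) { unit (Miles) magnitude (0.2) }.
    return haversine_miles(*p, *q) <= magnitude_miles
```

At this radius, 0.001 degrees of latitude is roughly 0.07 miles, so two centroids that far apart would be merged. Points 0.01 degrees apart (about 0.69 miles) would not.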

You can learn more about structure equivalence in the reference documentation.

Deduplication of Multi-Value Nodes

By default, Bixby will ensure nodes with max (Many) cardinality have only unique elements by merging duplicate values. For example, imagine an Item structure concept:

structure (Item) {
  property (name) {
    type (Name)
    min (Required) max (One)
  }
  property (departments) {
    type (Department)
    min (Required) max (Many)
  }
}

The departments property can contain multiple Department values, but those values cannot be duplicates of one another. If departments had the values ["Hardware", "Toys", "Home Goods"], you could add the value "Kitchen" to it, but if you added the value "Toys", it would automatically be merged with the existing value "Toys" and the values would remain unique.
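The merge-on-add behavior can be modeled with a small class. In this Python sketch, `MultiValueNode` and its `auto_merge` parameter are hypothetical. Setting `auto_merge=False` stands in for the no-auto-property-value-merging flag. Equivalence here is plain equality, which matches the default for primitive values.

```python
class MultiValueNode:
    """Hypothetical model of a max (Many) node that merges duplicates on add."""

    def __init__(self, values=(), auto_merge: bool = True):
        self.auto_merge = auto_merge
        self.values = []
        for value in values:
            self.add(value)

    def add(self, value) -> None:
        if self.auto_merge and value in self.values:
            return  # merged into the existing equivalent value
        self.values.append(value)


departments = MultiValueNode(["Hardware", "Toys", "Home Goods"])
departments.add("Kitchen")
departments.add("Toys")
print(departments.values)  # ['Hardware', 'Toys', 'Home Goods', 'Kitchen']
```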

This behavior can be overridden with the no-auto-property-value-merging flag. When this override is set, Bixby allows multi-value nodes to contain the same value more than once. You can then use the Expression Language function dedupe to merge equivalent elements of a specified node. The following action, for example, takes a node with multiple strings and outputs a new node with any duplicates removed.

action (ReduceStrings) {
  type (Constructor)
  collect {
    input (strings) {
      type (String)
      min (Optional)
      max (Many)
    }
  }
  output (String) {
    evaluate {
      $expr(dedupe(strings))
    }
  }
}
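For comparison, an order-preserving dedupe can be sketched in Python. The `dedupe` function below is a hypothetical model, not the Expression Language function. It keeps the first value of each equivalent group and takes the equivalence test as a parameter. For that reason it compares values pairwise rather than using a set, since custom equivalence tests such as fuzzy matching cannot be expressed as hash keys. Whether Bixby's dedupe preserves order is not stated, so the sketch assumes it does.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def dedupe(values: List[T],
           equivalent: Callable[[T, T], bool] = lambda a, b: a == b) -> List[T]:
    """Keep the first of each group of equivalent values, preserving order."""
    result: List[T] = []
    for value in values:
        # Pairwise check, so any equivalence test works (O(n^2) comparisons).
        if not any(equivalent(value, kept) for kept in result):
            result.append(value)
    return result


print(dedupe(["Toys", "Kitchen", "Toys"]))  # ['Toys', 'Kitchen']
```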