And finally the documentation.
gusmith committed Nov 13, 2019
commit ec615d20604f0abea4437cb074f539ab1d9ad4fa
17 changes: 9 additions & 8 deletions backend/entityservice/api_def/swagger.yaml
@@ -509,25 +509,26 @@ paths:

### result_type = "similarity_scores"

-The list of the indices of potential matches and their similarity score
+The list of candidate pairs (potential matches) and their similarity scores
where the similarity score is greater than the mapping threshold.
Data is returned as a `json` object, e.g.,

{
"similarity_scores":
[
-[5, 27, 1.0],
-[14, 10, 1.0]
+{"group": [[0, 5], [1, 27]], "sim": 1.0},
+{"group": [[1, 10], [0, 14]], "sim": 1.0}
]
}


-The element in the list is of the following format `[indexA, indexB, score]`,
-where `indexA` refers to the index of entity from data provider 1, `indexB` is the index of entity
-from data provider 2 that is a potential match to entity in `indexA`, and `score` is the similarity score
-representing the likelihood that entity in `indexA` and entity in `indexB` is a match.
+Each element in the list has the format `{"group": [[party_id_0, row_index_0], [party_id_1, row_index_1]], "sim": score}`,
+where the value of `group` is a candidate pair in `group` format: `[party_id_0, row_index_0]`
+refers to the record at index `row_index_0` in the dataset of party `party_id_0` (and similarly for `[party_id_1, row_index_1]`),
+and `score` is the similarity score representing the likelihood that the records in the group are a match.

-`indexA` and `indexB` starts from 0.
+`party_id_0`, `row_index_0`, `party_id_1` and `row_index_1` start from 0, and `party_id_0 != party_id_1`, but
+the two records in a group are not necessarily ordered by party id.

The value of `score` is between 0.0 and 1.0, where 0.0 corresponds to no match
and 1.0 corresponds to total match.
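The relationship between the removed `[indexA, indexB, score]` format and the new `group` format can be made concrete with a short Python sketch. It assumes a two-party run whose party ids are exactly 0 and 1, and the function name `to_index_triples` is hypothetical, not part of the entity service API. Applied to the example response above, it recovers the triples `[5, 27, 1.0]` and `[14, 10, 1.0]` from the previous version of this documentation.

```python
import json

# Example response body in the new `group` format (same data as the documented example).
RESPONSE = """
{
  "similarity_scores": [
    {"group": [[0, 5], [1, 27]], "sim": 1.0},
    {"group": [[1, 10], [0, 14]], "sim": 1.0}
  ]
}
"""


def to_index_triples(body: str) -> list[tuple[int, int, float]]:
    """Convert group-format candidates into (index_a, index_b, score) triples.

    Hypothetical helper; assumes a two-party run with party ids 0 and 1.
    """
    triples = []
    for candidate in json.loads(body)["similarity_scores"]:
        (party_x, row_x), (party_y, row_y) = candidate["group"]
        if party_x == party_y:
            raise ValueError(f"group references a single party: {candidate['group']}")
        # The records in a group are not necessarily ordered by party id,
        # so look up each party's row by id instead of relying on position.
        rows_by_party = {party_x: row_x, party_y: row_y}
        triples.append((rows_by_party[0], rows_by_party[1], candidate["sim"]))
    return triples


print(to_index_triples(RESPONSE))  # [(5, 27, 1.0), (14, 10, 1.0)]
```

Keying by party id rather than by position is what makes the second candidate, whose group lists party 1 first, come out as `(14, 10, 1.0)`.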
6 changes: 3 additions & 3 deletions docs/concepts.rst
@@ -106,14 +106,14 @@ relationships.
The ``result_token`` (generated when creating the mapping) is required. The ``result_type`` should
be set to ``"similarity_scores"``.

-Results are a simple JSON array of arrays::
+Results are a JSON array of JSON objects::

[
-[index_a, index_b, score],
+{"group": [[party_id_0, row_index_0], [party_id_1, row_index_1]], "sim": score},
...
]

-Where the index values will be the 0 based row index from the uploaded CLKs, and
+Where each ``party_id`` is the 0-based index of a dataset, each ``row_index`` is the 0-based row index into that party's uploaded CLKs, and
the score will be a Number between the provided threshold and ``1.0``.

A score of ``1.0`` means the CLKs were identical. Threshold values are usually between
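The constraints stated above (two distinct parties per group, 0-based indices, scores bounded by the threshold and `1.0`) can be checked on the client side. The following Python sketch is illustrative only: `check_similarity_scores` is a hypothetical helper, not part of the entity service, and it treats both ends of the score range as inclusive, which is an assumption (the API description says scores are greater than the mapping threshold).

```python
def check_similarity_scores(results: list[dict], threshold: float) -> None:
    """Raise ValueError if a result violates the documented constraints.

    `results` is the parsed JSON array of {"group": ..., "sim": ...} objects.
    Hypothetical client-side helper, not part of the entity service.
    """
    for candidate in results:
        group, score = candidate["group"], candidate["sim"]
        if len(group) != 2:
            raise ValueError(f"expected a pair, got {group}")
        (party_0, row_0), (party_1, row_1) = group
        # Party ids and row indices are 0-based, so they must be non-negative.
        if min(party_0, row_0, party_1, row_1) < 0:
            raise ValueError(f"negative index in {group}")
        # A candidate pair always links records from two different datasets.
        if party_0 == party_1:
            raise ValueError(f"both records come from party {party_0}")
        # Assumed inclusive bounds: threshold <= score <= 1.0.
        if not threshold <= score <= 1.0:
            raise ValueError(f"score {score} outside [{threshold}, 1.0]")


results = [
    {"group": [[0, 5], [1, 27]], "sim": 1.0},
    {"group": [[1, 10], [0, 14]], "sim": 1.0},
]
check_similarity_scores(results, threshold=0.9)  # passes silently
```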