Building an End-User Identity Search with Verato APIs

Introduction

This guide offers suggestions for using Verato’s APIs when you are building a search or query process whose results will be displayed in a UI where users can review the results and pick from them.

Verato provides several different APIs designed to let you search or query your Verato MDM Cloud instance.

  • Demographics Search - this API is designed for running ‘loose’ searches against your data with minimal input criteria. As a result, it is also designed to return a list of search results (rather than just a single ‘best’ answer) that will vary in similarity to your input criteria. This API is most useful for scenarios where the list of search results will be displayed to a UI user rather than consumed by an automated process.

    • Suppose your users want to search for a person named “Rebecca Smith” who lives in Virginia - the Demographics Search is the right API in this scenario, and you should expect that the API response could include multiple search results, because there will likely be multiple people in your data set with names similar to “Rebecca Smith” who live in Virginia.

  • Demographics Query - this API is designed for running a high-confidence search against your data with a complete set of identity information. It is intended to return the single ‘best’ identity that matches your input criteria, and ONLY if that single ‘best’ identity is a high-confidence match that meets the same similarity threshold used to link together your source records into unique identities.

    • Suppose your users want to search for a person named “Rebecca Smith”, who was born on “November 11, 1980”, who lives at address “123 Main St, Vienna, VA 22101”, AND they want to be sure that they only get back a result if the identity in your Verato MDM Cloud instance is definitely the same person described by the input criteria - they don’t want to get back other people named “Rebecca Smith” who live in “Vienna, VA” but don’t have a similar birth date and address. The Demographics Query API is the right API in this scenario, and you will get back either 1 or 0 identities in your API response.

  • Native ID Query - this API is designed to retrieve a single identity that contains the combination of Source + Native ID that you provide as input criteria.

  • Identity ID Query - this API is designed to retrieve a single identity that contains the Verato Link ID that you provide as input criteria.

The input criteria and search behavior differ across these four APIs, but all of them return data in a consistent manner - each API returns a JSON object that contains one or, in the case of Demographics Search, many unique identities from your Verato MDM Cloud instance.

Verato’s UI portal is built to allow users to search using three of these APIs: Demographics Search, Native ID Query, or Identity ID Query.

If you are building a Verato search experience into your own application, you can use any of the 4 APIs mentioned above based on the search experience you are trying to create.

Planning For Your Use Case(s)

If you want to allow your UI users to search for and retrieve identities using the combination of Source and Native ID, the Native ID Query API is the API to use. The API request input is very simple - you must include a Source and Native ID value in the request, and you can optionally include a ‘Response Format’ parameter in the request. If a ‘Response Format’ parameter is omitted from your request, the DEFAULT format will be returned. Users will typically only search using Source and Native ID if the Native ID value is a data value that is commonly used or available for them. Here are two contrasting examples of Native ID value visibility.

  • Suppose one of your data sources is Salesforce CRM. The identities that are synched from your Salesforce instance to your Verato instance will have a Native ID that is a unique key generated for the underlying Salesforce object, such as a contact ID or an account ID. This unique key is unlikely to be a data value that your users ever see or know of, and therefore it’s unlikely your users would want to search for identities in Verato using Source and Native ID.

  • Suppose another one of your data sources is the Epic electronic health record application. Each patient health record in Epic will have a medical record number, or MRN, and this value is likely to be the value used as the Native ID for this source. In the case of Epic, the MRN IS a data value that is visible to users and known by users (even known by the patients themselves) - therefore it is possible that your users might want to search for identities in Verato using Source and Native ID for Epic patient records.

If you want to allow your UI users to search for and retrieve identities using the Verato-assigned Link ID, the Identity ID Query is the API to use. It works in a very similar manner as the Native ID Query API described above. However, the Verato Link ID value is not typically a data value that is visible to or commonly used by UI users, so it is uncommon that UI users want the ability to search by Link ID.

If you want to allow your UI users to search using any data values other than Source+Native ID or Link ID, in nearly all cases the Demographics Search API is the correct API to use. The Demographics Search API is designed to accommodate both thin and rich input criteria, and it is designed to return one or many search results across a range of similarity scores, which is likely what your UI users are expecting.

Verato’s data model treats the Source, Native ID, and Link ID values as ‘special’ values. Other types of identifiers, such as an SSN, a Drivers License Number, or a Physician NPI number, are stored as standard attributes in Verato’s data model - retrieving data using even just one of these identifiers is still considered a demographic attribute search.
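The routing guidance in this section can be sketched as a small helper in your own application. This is a minimal illustration only: the criteria keys "source", "native_id", and "link_id" are hypothetical names chosen for the sketch, not Verato field names, and the function returns only the name of the API to call rather than making any request. The Demographics Query API is deliberately left out, since this section recommends Demographics Search for nearly all other UI searches.

```python
from typing import Mapping


def choose_api(criteria: Mapping[str, str]) -> str:
    """Return the name of the Verato API that fits the supplied criteria.

    The keys "source", "native_id", and "link_id" are hypothetical names
    used only in this sketch.
    """
    # Ignore blank values so that empty form fields do not affect routing.
    supplied = {k: v for k, v in criteria.items() if v and v.strip()}
    if not supplied:
        raise ValueError("at least one search value is required")

    # Source and Native ID must be supplied together.
    if "source" in supplied and "native_id" in supplied:
        return "Native ID Query"
    if "link_id" in supplied:
        return "Identity ID Query"
    # Everything else, including identifiers such as an SSN or NPI,
    # is treated as a demographic attribute search.
    return "Demographics Search"
```

For example, a search form that supplies only an SSN would be routed to Demographics Search, because an SSN is stored as a standard attribute rather than as one of the ‘special’ values.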

Additional Guidance For Using The Demographics Search API

The input criteria and parameters available in the Demographics Search API are much broader than the other search or query APIs.

Demographic Search Criteria

The search input criteria itself is very flexible - you can include as few or as many input criteria as you want in the API request, and Verato will use whatever you provide to retrieve a set of the most similar identities.

The search process accounts for varying degrees of similarity - you might search for somebody named Rebecca, but your search results could include identities with similar names such as Becky or Rebeccah along with identities that have an exact match on the name Rebecca. The search process also calculates and returns a similarity score, with the search results sorted from highest to lowest similarity score. Going back to the example of searching for an identity with the name Rebecca - with all other things being equal, the search result with the exact-matching name of Rebecca will be at the top of the list with a slightly higher similarity score than the search results with names of Becky or Rebeccah.

The search process can also return unhelpful results to the user if the search input criteria are both few in number and non-specific in nature. For example, if you submit a Demographics Search using input criteria of only First Name = JOHN and Gender = M, you can expect to get back a long list of identities whose first name is JOHN, all of them with an equal similarity score - the list is unlikely to be useful for the user to look through, because they will need to enter more search criteria if they hope to narrow down to an identity that might be the one they are really looking for. By contrast, you could submit a Demographics Search using input criteria of just an SSN - while it is only a single input value, it is also a highly specific type of attribute, and you would expect to get back a very short list of search results.
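One way to act on this in a UI is to check the input before submitting the search and prompt the user for more criteria when the input is both thin and non-specific. The sketch below is an assumption-heavy illustration: the attribute names, the set of ‘highly specific’ attributes, and the minimum count of 3 are all hypothetical values that you would tune for your own data, not Verato rules.

```python
from typing import Mapping

# Hypothetical attribute names. Which attributes count as highly specific
# is an assumption for this sketch; adjust it to your own data.
HIGHLY_SPECIFIC = frozenset({"ssn"})


def needs_more_criteria(criteria: Mapping[str, str], min_general: int = 3) -> bool:
    """Return True when the search input is both thin and non-specific."""
    # Collect the names of the attributes that actually have a value.
    supplied = {name for name, value in criteria.items() if value and value.strip()}

    # A single highly specific attribute is enough on its own.
    if supplied & HIGHLY_SPECIFIC:
        return False
    # Otherwise require a minimum number of general attributes.
    return len(supplied) < min_general
```

With these settings, a search on First Name = JOHN and Gender = M would be flagged, while a search on just an SSN would not.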

Input Parameters and Their Meaning

  1. Similarity Score Threshold (matchScoreThreshold) - this is a numeric value ranging from 0 to 1 that defines the minimum score below which search results will be filtered out. If left blank, the default is 0.0, meaning all results (up to the maxSearchResults limit) will be returned. If you set the minimum score to 0.45, then the API response will only include search results with a similarity score greater than or equal to 0.45.

    • Verato’s similarity scores range from 0 to 1. A score of 0 indicates that no similarity weight at all was calculated between the search input data and the identity returned in the search. A score of 1 indicates complete certainty that the search result is THE SAME PERSON as your search input data. It’s important to note that a score of 1 is calculated taking all possible identity attributes into account. If you search using a name, birth date, SSN, and address, and the search result is very clearly the same person (because it strongly matches all of those data values), you could expect to see a score of 1.0. If you search using just a first and last name, you will get back search results with the exact same first and last name, but the similarity score will NOT be 1.0. The exact match on first and last name contributes some partial weight towards the maximum 1.0 score, but without any other identifying information to work with (such as a DOB or address) it is impossible to be sure that the result is THE SAME PERSON; it is just a person with some partial overall similarity.

    • The best use of the similarity score in the search results is as a relative sorting and comparison value. If you search for Rebecca Smith and you get back 10 results, the higher-scoring results will have more similarity to that name than the lower-scoring results. Results that score very close to each other (0.44 and 0.42 for example) have very similar degrees of similarity, while results that score far apart (0.44 and 0.13 for example) will have a much larger discrepancy in similarity, making the higher-scoring result a more likely candidate for the person you were looking for.

    • The search similarity score will also vary based on the amount of data you provide as input. If you search using just a first name, you will have lower possible scores - even an exactly-matching first name by itself will score quite low. If you search using first name, last name, birth date, and city, you will have higher possible scores, and you will see more separation of better and worse search results reflected in the score.

  2. Number of Results (maxSearchResults) - this is an integer value ranging from 1 to 100, and it controls the maximum number of search results that Verato will return in the API response. The upper limit of 100 can be adjusted on your behalf by Verato’s support staff if needed - but this will have performance implications for the API response times, and the value of search results to a UI user becomes very low if there are 100 or more results for them to look through. In most cases where 100 search results aren’t enough to distinguish between the ‘good’ and ‘bad’ search results, the only way to improve the results is to provide more search criteria. Most clients use a value in the 10-20 range as a reasonable limit for how many search results their UI users will realistically read through.

    • This parameter controls the maximum possible results returned, but you can have fewer results than the maximum. For example, you could search using just a single SSN value with a maxSearchResults setting of 50, but you are likely to only have a few identities in your Verato instance with some partial degree of similarity, so you might only see a few search results instead of 50.

    • The maximum search results limit is applied based on similarity score, but there is no tiebreaker at the end of the limit. If you search for Rebecca Smith with a maxSearchResults setting of 10, the highest-scoring identities in your Verato instance should be returned at the top of the list, but the 11th highest-scoring identity (which will not be returned because your limit was 10) might be tied with the 10th highest-scoring identity. This only becomes problematic when users are doing searches using just name values, or just a first name and birth date - in these cases, you might have 30 identities with similar or tied scores, but you’re only getting back 10 of them, so your user won’t know if the ‘right’ person they are trying to find was in the top 10 candidates or not. If you encounter this scenario, contact Verato support - there are other search behavior parameters they can adjust in your Verato instance.

  3. Response Format (responseIdentityFormatNames) - this is an array of string values that control which formats to use when returning identity data in the API response. You can request more than one format to be returned, in which case the API response will include multiple JSON objects within the body, one for each format. The response format controls the structure of, and the contents of, the data elements returned for each identity. Response formats are explained in more detail here (Missing Link).
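The three parameters above can be validated and assembled in your own code before a request is sent. The following is a minimal sketch, not Verato’s request schema: the parameter names matchScoreThreshold, maxSearchResults, and responseIdentityFormatNames come from this guide, but the flat dictionary layout and both function names are assumptions made for illustration. The second helper is a simple heuristic for the ‘no tiebreaker’ situation described above; it takes the list of scores in the order returned, which you would extract from the response yourself.

```python
import math
from typing import Any, Dict, Optional, Sequence


def build_search_parameters(
    match_score_threshold: float = 0.0,
    max_search_results: int = 10,
    response_formats: Optional[Sequence[str]] = None,
    upper_limit: int = 100,
) -> Dict[str, Any]:
    """Validate and assemble the search parameters (hypothetical layout)."""
    if not 0.0 <= match_score_threshold <= 1.0:
        raise ValueError("matchScoreThreshold must be between 0 and 1")
    # upper_limit defaults to 100; change it only if Verato support has
    # adjusted the limit for your instance.
    if not 1 <= max_search_results <= upper_limit:
        raise ValueError(f"maxSearchResults must be between 1 and {upper_limit}")

    params: Dict[str, Any] = {
        "matchScoreThreshold": match_score_threshold,
        "maxSearchResults": max_search_results,
    }
    # Only include the format list when the caller asked for specific formats.
    if response_formats:
        params["responseIdentityFormatNames"] = list(response_formats)
    return params


def limit_may_hide_ties(scores: Sequence[float], max_search_results: int) -> bool:
    """Heuristic: True when the result list is full and ends in a tie.

    `scores` must be in the order returned (highest to lowest). A full
    list whose last two scores are equal suggests that more identities
    with the same score may have been cut off by the limit.
    """
    if len(scores) < max_search_results or len(scores) < 2:
        return False
    return math.isclose(scores[-1], scores[-2], abs_tol=1e-9)
```

When limit_may_hide_ties returns True, a UI could prompt the user to add more search criteria rather than assume the right person is in the list.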

Helpful Tips

  • You may find that dynamically changing the matchScoreThreshold parameter based on the number and type of input values helps you get the best results. If your users sometimes search with 4 or more input values (for example, a first name, a last name, a birth date, and a phone number) but you are using a low matchScoreThreshold, you might get back more results than you want, with some of those results being of low similarity (for example, the 20th of 20 results might only have the same last name as your input criteria, with everything else dissimilar). In this case, a higher matchScoreThreshold in the range of 0.6 could be most useful. If your users also sometimes search using just a phone number or just a first name and city, those searches will need a lower matchScoreThreshold to ensure you don’t filter out the responses. In this second scenario, a 0.6 threshold would filter out ALL possible results, so you would need something lower, perhaps in the 0.4 range.

  • The search score will vary a lot when you add or remove entire attributes, but it will vary in smaller increments when the search results have partial similarity with your input criteria. For example, if you search for Rebecca Smith and your best result has a 0.52 score, you might find that there are many other results within the next 0.05 increment down that have similar names (nicknames or spelling variations).

  • The search behavior takes ‘fuzziness’ into account, both when selecting potential candidates to retrieve from the database and when calculating the final similarity score. This means that if you search for identities using just a phone number, for example, your search results can include identities with the exact phone number you searched for along with identities whose phone number is 1 or 2 digits off from your search criteria. The non-exact-matching results will have a lower similarity score than the exact-matching result. If you want to strictly filter out any non-exact-matching phones, you can do so by controlling the matchScoreThreshold, setting it in such a way that only the exact-matching phones will meet or exceed the threshold. Your Verato implementation and support team can help you determine appropriate scores to use.

  • In some cases there are attribute values that can be searched on, but they do not contribute any weight to the similarity score. For example, you might want to store a health insurance ID number in the ‘Alternate Identifiers’ attribute cluster in your Verato instance because you occasionally search for identities using that number, but you don’t want that health insurance number to influence the actual matching outcomes. In this case, Verato would assign zero ‘weight’ to the attribute in the scoring configuration. If you subsequently did a search for a specific health insurance ID, any identity with a matching health insurance ID would be returned in the list of search results, but it would be returned with a search score of 0.0. This is the only scenario in which you would expect to get back a search result with a 0.0 score.
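The first tip above, adjusting matchScoreThreshold to the richness of the input, can be sketched as follows. The 0.6 and 0.4 values and the cut-off of 4 input values are taken from the example in that tip and are illustrative starting points only; the function name and the shape of the criteria mapping are assumptions for this sketch, and your Verato implementation and support team can help you determine appropriate scores.

```python
from typing import Mapping


def pick_match_score_threshold(
    criteria: Mapping[str, str],
    rich_threshold: float = 0.6,
    thin_threshold: float = 0.4,
    rich_count: int = 4,
) -> float:
    """Pick a matchScoreThreshold from the number of supplied input values."""
    # Count only the values the user actually filled in.
    supplied = [value for value in criteria.values() if value and value.strip()]

    # Rich input can afford a higher threshold; thin input needs a lower
    # one so that legitimate results are not filtered out.
    return rich_threshold if len(supplied) >= rich_count else thin_threshold
```

Note that a search on an attribute configured with zero weight returns results with a score of 0.0, so a search of that kind would need a threshold of 0.0 rather than either value above.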