Automating Identity Search/Retrieval (No End-User)

Introduction

This guide provides suggestions for how to use Verato’s APIs when you are building a search or query process where the results of the search/query will be processed by an automated system with no end-user interaction.

Verato provides several different APIs designed to let you search or query your Verato MDM Cloud instance.

  • Demographics Search - this API is designed for running ‘loose’ searches against your data with minimal input criteria. As a result, it is also designed to return a list of search results (rather than just a single ‘best’ answer) that will vary in similarity to your input criteria. This API is most useful for scenarios where the list of search results will be displayed to a UI user rather than consumed by an automated process. However, there are cases where you might use this API in an automated process.

    • Suppose your automated process wants to retrieve the person or persons associated with the phone number 312.555.1212 - the Demographics Search is the right API in this scenario since it is designed to return results even if the search criteria are minimal. However, you should expect that the API response could include zero-to-many search results, because there could be multiple people in your data set associated with that phone number.
  • Demographics Query - this API is designed for running a high-confidence search against your data with a complete set of identity information. It is intended to return the single ‘best’ identity that matches your input criteria, and ONLY if that single ‘best’ identity is a high-confidence match that meets the same similarity threshold used to link together your source records into unique identities.

    • Suppose your automated process wants to find a person named “Rebecca Smith”, who was born on “November 11, 1980”, who lives at address “123 Main St, Vienna, VA 22101”…AND you want to be sure that you only get back a result if the identity in your Verato MDM Cloud instance is definitely the same person that matches your input criteria - you don’t want to get back other people named “Rebecca Smith” who live in “Vienna, VA” that don’t have a similar birth date and address. The Demographics Query API is the right API in this scenario, and you will get back either 1 or 0 identities in your API response.
  • Native ID Query - this API is designed to retrieve a single identity that contains the combination of Source + Native ID that you provide as input criteria.

  • Identity ID Query - this API is designed to retrieve a single identity that contains the Verato Link ID that you provide as input criteria.

The input criteria and search behavior differ across all four of those APIs, but all the APIs return data in a consistent manner - all four APIs return a JSON object that contains one unique identity from your Verato MDM Cloud instance or, in the case of Demographics Search, potentially many. Each identity returned in the API response is uniquely identified by a Verato-assigned ID called the Link ID. Each identity can be made up of one or more of your ‘source’ records, which are defined by the combination of Source + Native ID.

Verato’s UI portal is built to allow users to search using three of these APIs: Demographics Search, Native ID Query, or Identity ID Query.

If you are building a Verato search experience into your own application, you can use any of the 4 APIs mentioned above based on the search experience you are trying to create.

Planning For Your Use Case(s)

If you want to allow your automated process to search for and retrieve identities using the combination of Source and Native ID, the Native ID Query API is the API to use. The API request input is very simple - you must include a Source and Native ID value in the request, and you can optionally include a ‘Response Format’ parameter in the request. If a ‘Response Format’ parameter is omitted from your request, the DEFAULT format will be returned. Automated processes will typically only search using Source and Native ID if the Native ID value is a data value that is commonly used in business logic in your organization. Here are two contrasting examples of Native ID value visibility.
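The request described above can be sketched as a small builder function. This is a minimal illustration, not Verato’s documented request schema: the field names "source" and "nativeId" are assumptions, and reusing the responseIdentityFormatNames name for the ‘Response Format’ parameter on this API is also an assumption. Check the API reference for the actual names.

```python
from typing import List, Optional


def build_native_id_query(source: str, native_id: str,
                          response_formats: Optional[List[str]] = None) -> dict:
    """Build a request body for a Native ID Query.

    The keys "source" and "nativeId" are hypothetical placeholders.
    """
    if not source or not native_id:
        raise ValueError("Both Source and Native ID are required")
    body = {"source": source, "nativeId": native_id}
    # Send the format parameter only when the caller asked for one; when it is
    # omitted, the guide says the DEFAULT format is returned.
    if response_formats:
        body["responseIdentityFormatNames"] = list(response_formats)
    return body


# Example: an Epic MRN lookup that relies on the DEFAULT response format.
print(build_native_id_query("EPIC", "MRN-0001"))
```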

Suppose one of your data sources is Salesforce CRM. The identities that are synched from your Salesforce instance to your Verato instance will have a Native ID that is a unique key generated for the underlying Salesforce object, such as a contact ID or an account ID. This unique key is unlikely to be a data value that your users ever see or know of, but it could be a data value that is used in an automated process to synchronize data to or from Salesforce. Therefore it’s possible your automated process would want to search for identities in Verato using Source and Native ID.

Suppose another one of your data sources is the Epic electronic health record application. Each patient health record in Epic will have a medical record number, or MRN, and this value is likely to be the value used as the Native ID for this source. In the case of Epic, the MRN number IS a data value that is used frequently in interoperability integrations - therefore it is likely that your automated process might want to search for identities in Verato using Source and Native ID for Epic patient records.

If you want to allow your automated process to search for and retrieve identities using the Verato-assigned Link ID, the Identity ID Query is the API to use. It works in much the same way as the Native ID Query API described above. However, the Verato Link ID value is not typically a data value used by customer processes other than data warehousing, so it is uncommon that automated processes want the ability to search by Link ID.

If you want your automated process to search for and retrieve a single identity only when that identity is a high-confidence match to your search criteria, the Demographics Query API is the correct API to use. This API expects a full identity’s worth of input data to perform Verato’s referential matching process against both Verato’s reference data and your own data. Typically a full identity will contain a name, birth date, address or phone number, and sometimes even a social security number. The more input data used, the more likely you are to find the right matching identity above the high-confidence matching threshold set for your instance of Verato.

If you want your automated process to search for and retrieve identities using a single attribute value that is typically only associated with one person (other than Source+Native ID or Link ID), then the Demographics Search API is the correct API to use. Verato’s data model treats the Source, Native ID, and Link ID values as ‘special’ values. Other types of identifiers, such as an SSN, a Drivers License Number, or a Physician NPI number, are stored as standard attributes in Verato’s data model - retrieving data using even just one of these identifiers is still considered a demographic attribute search. While the Demographics Search API is often used for end-user based searching, it can be useful for automated processes if you are retrieving a single or small number of identities based on a fairly unique attribute value such as SSN or phone number.
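The planning guidance in this section can be summarized as a routing decision inside your automated process. The sketch below is one possible way to encode it. The dictionary keys (linkId, source, nativeId, firstName, lastName, dateOfBirth, ssn) are hypothetical names chosen for illustration, and treating "full name plus birth date or SSN" as the minimum for Demographics Query is a simplification that you may want to tighten.

```python
from enum import Enum


class VeratoApi(Enum):
    NATIVE_ID_QUERY = "Native ID Query"
    IDENTITY_ID_QUERY = "Identity ID Query"
    DEMOGRAPHICS_QUERY = "Demographics Query"
    DEMOGRAPHICS_SEARCH = "Demographics Search"


def choose_api(criteria: dict) -> VeratoApi:
    """Pick an API from the kinds of input available (key names are hypothetical)."""
    if criteria.get("linkId"):
        return VeratoApi.IDENTITY_ID_QUERY
    if criteria.get("source") and criteria.get("nativeId"):
        return VeratoApi.NATIVE_ID_QUERY
    has_full_name = bool(criteria.get("firstName") and criteria.get("lastName"))
    # A full name plus a birth date or SSN is treated here as enough for a
    # high-confidence query; anything thinner falls back to a loose search.
    if has_full_name and (criteria.get("dateOfBirth") or criteria.get("ssn")):
        return VeratoApi.DEMOGRAPHICS_QUERY
    return VeratoApi.DEMOGRAPHICS_SEARCH


print(choose_api({"phone": "312.555.1212"}).value)
```

A lone phone number or SSN falls through to Demographics Search, which matches the single-attribute case described above.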

Additional Guidance For Using The Demographics Query API

The Demographics Query API is a more targeted search API, designed to return either zero or one identity that is a high-confidence match to your input criteria. If you have an automated process that is searching for and retrieving an identity based on a set of identity data, this API will be most useful. One example of such an automated process would be a process that takes the identity information from a medical claim and uses it to search for the matching person in your data set loaded into Verato. The medical claim might contain a name, DOB, and address - and you want to retrieve the corresponding person from your Verato instance only if it is a high-confidence match, because you want to associate the medical claim to the right person.

Demographic Query Criteria

As noted above, this API expects a full identity’s worth of input data to perform Verato’s referential matching process against both Verato’s reference data and your own data. Typically a full identity will contain a name, birth date, address or phone number, and sometimes even a social security number. The ‘thinnest’ set of data that would typically return a high-confidence match would be either a full name plus birth date, or a full name plus SSN. Address and phone information is also very helpful in narrowing down the query to the correct identity. Your organization may have also incorporated other identifiers, such as a health insurance ID or a drivers license number, in which case these attribute values can also be very useful in the query process.

Input Parameters and Their Meaning

Other than the attribute values used as query input, there is only one other input parameter for Demographics Query. The API does not have an input parameter for score or number of identities, because the query is performed using the instance-wide parameter for the match threshold (typically a 0.8 threshold, though it could be adjusted per customer) and is only designed to return 0 or 1 results.

  • Response Format (responseIdentityFormatNames) - this is an array of string values that control which formats to use when returning identity data in the API response. You can request more than one format to be returned, in which case the API response will include multiple JSON objects within the body, one for each format. The response format controls the structure of, and the contents of, the data elements returned for each identity. Response formats are explained in more detail here.
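Because Demographics Query takes only the attribute values plus a response format, and returns zero or one identity, the calling code can stay small. The following sketch assumes a hypothetical "identity" wrapper key for the input attributes and assumes your code has already extracted a list of identities from the parsed JSON response. Neither detail comes from Verato’s API reference.

```python
from typing import List, Optional


def build_demographics_query(attributes: dict,
                             response_formats: Optional[List[str]] = None) -> dict:
    """Build a query body; the "identity" key is a hypothetical placeholder."""
    body = {"identity": dict(attributes)}
    # No score or result-count parameter is added: the guide says the query
    # uses the instance-wide match threshold and returns 0 or 1 results.
    if response_formats:
        body["responseIdentityFormatNames"] = list(response_formats)
    return body


def single_identity_or_none(identities: List[dict]) -> Optional[dict]:
    """Return the one matched identity, or None when there was no match."""
    if len(identities) > 1:
        raise ValueError("Demographics Query is expected to return 0 or 1 identities")
    return identities[0] if identities else None


print(single_identity_or_none([]))
```

Failing loudly on more than one identity is a defensive choice: it surfaces a wrong endpoint or a misparsed response early instead of silently picking a person.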

Additional Guidance For Using The Demographics Search API

The input criteria and parameters available in the Demographics Search API are much broader than those of the other search or query APIs. If not used carefully, the Demographics Search API can return a list of search results that is too long to be useful to an automated process with no human to review the results.

Demographic Search Criteria

The search input criteria itself is very flexible - you can include as few or as many input criteria as you want in the API request, and Verato will use whatever you provide to retrieve a set of the most similar identities.

The search process does account for varying degrees of similarity - you might search for somebody named Rebecca, but your search results could include identities with similar names such as Becky or Rebeccah along with identities that have an exact match on the name Rebecca. The search process also applies and returns a similarity score, with the search results sorted from highest to lowest similarity score. Going back to the example of searching for an identity with the name Rebecca - with all other things being equal, the search result with the exact-matching name of Rebecca will be at the top of the list with a slightly higher similarity score than the search results with names of Becky or Rebeccah.

This is also true when using Demographics Search to retrieve identities associated with a fairly unique attribute such as an SSN or phone number. If you use Demographics Search to search for identities associated with just a single SSN value, it’s possible that there is more than one identity in your data set with that SSN value, and it’s also possible that your data set includes an identity with a very similar SSN (which differs by only 1 or 2 digits) which will be returned in the search results.
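Since a single-attribute search can return several identities, including near matches, an automated process needs an explicit rule for what to do with the list. One conservative option, sketched below, is to accept a result only when exactly one identity scores at or above a chosen minimum. The "score" and "linkId" keys are hypothetical stand-ins for whatever fields your chosen response format provides.

```python
from typing import List, Optional


def pick_unambiguous(results: List[dict], min_score: float) -> Optional[dict]:
    """Return the only result scoring at or above min_score, otherwise None."""
    kept = [r for r in results if r["score"] >= min_score]
    # Zero matches and multiple matches are both treated as "no safe answer".
    return kept[0] if len(kept) == 1 else None


results = [
    {"linkId": "A", "score": 0.41},  # exact match on the searched value
    {"linkId": "B", "score": 0.33},  # similar value, differs by a digit
]
print(pick_unambiguous(results, min_score=0.40))
```

When the function returns None, the process can route the record to a manual review queue rather than guessing.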

Input Parameters and Their Meaning

  1. Similarity Score Threshold (matchScoreThreshold) - this is a numeric value ranging from 0 to 1 that defines the minimum score below which search results will be filtered out. If left blank, the default is 0.0, meaning all results (up to the maxSearchResults limit) will be returned. If you set the minimum score to 0.45, then the API response will only include search results with a similarity score greater than or equal to 0.45.
  • Verato’s similarity scores range from 0 to 1, where 0 indicates that no similarity weight at all was calculated between the search input data and the identity returned in the search, and where 1 indicates complete certainty that the search result is THE SAME PERSON as your search input data. It’s important to note that a score of 1 is calculated taking all possible identity attributes into account. If you search using a name, birth date, SSN, and address, and the search result is very clearly the same person (because it strongly matches all of those data values), you could expect to see a score of 1.0. If you search using just a phone number, you will get back search results with the exact same phone number, but the similarity score will NOT be 1.0. The exact match on phone number will result in some partial weight towards a maximum 1.0 score, but the fact that you didn’t have any other identifying information to work with (such as a DOB or address) makes it impossible to be sure that the search result is THE SAME PERSON as your input data, rather than just a person with some partial overall similarity.

  • The best use of the similarity score in the search results is as a filter to ensure your search results only contain exact matches. If you search for phone number 312.555.1212, you might get back 4 results - the highest scoring results will be exact matches to your input phone number, but there could also be slightly lower-scoring results with a similar, but not exactly matching, phone number. Setting an appropriate similarity score threshold is important if you want to use Demographics Search to retrieve only exact matches on single attribute values such as an SSN or phone number.

  2. Number of Results (maxSearchResults) - this is an integer value ranging from 1 to 100, and it controls the maximum number of search results that Verato will return in the API response. If you are using Demographics Search to target identities that exactly match a single attribute value such as a phone number or SSN, you should not need to use a high maxSearchResults parameter - a search that returns more than a few results is likely going to disrupt whatever automated process is trying to retrieve a single identity.

  3. Response Format (responseIdentityFormatNames) - this is an array of string values that control which formats to use when returning identity data in the API response. You can request more than one format to be returned, in which case the API response will include multiple JSON objects within the body, one for each format. The response format controls the structure of, and the contents of, the data elements returned for each identity. Response formats are explained in more detail here.
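The three parameters above can be validated before a request is sent, so that an automated process fails early on an out-of-range value. This is a minimal sketch: the parameter names come from this guide, but where they sit in the request (body or query string) is not specified here, and the default of 5 for maxSearchResults is an arbitrary choice for illustration.

```python
from typing import List, Optional


def build_search_params(match_score_threshold: Optional[float] = None,
                        max_search_results: int = 5,
                        response_formats: Optional[List[str]] = None) -> dict:
    """Validate and collect Demographics Search parameters."""
    params = {}
    if match_score_threshold is not None:
        if not 0.0 <= match_score_threshold <= 1.0:
            raise ValueError("matchScoreThreshold must be between 0 and 1")
        params["matchScoreThreshold"] = match_score_threshold
    # When the threshold is omitted, the guide says the default is 0.0.
    if not 1 <= max_search_results <= 100:
        raise ValueError("maxSearchResults must be between 1 and 100")
    params["maxSearchResults"] = max_search_results
    if response_formats:
        params["responseIdentityFormatNames"] = list(response_formats)
    return params


print(build_search_params(match_score_threshold=0.45, max_search_results=3))
```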

Helpful Tips

  • In the event that you use the Demographics Search API to search using a single attribute value such as an SSN or phone number, you will need to dynamically change the matchScoreThreshold parameter to ensure you only get back exact matches for your specific attribute value. The appropriate matchScoreThreshold for finding an exact SSN match will be different than for finding an exact phone match. Your Verato implementation and support team can assist you in finding the appropriate parameter values to use.

  • The Demographics Query API uses your input data to search against both your own data and Verato’s reference data. If a matching identity is found in Verato’s reference data set, that reference identity’s attribute values are used to enhance the search. This can result in search results that may seem to be lower-quality matches at first glance. For example, suppose you search for Rebecca Smith, born November 10, 1980, living at address 123 Main St. Verato may find a matching reference identity based on that information, and the reference identity will have multiple additional addresses known for Rebecca. These additional addresses will be added to the search input when searching within your own data. Further suppose that your own data contains a Rebecca Smith, born November 10, 1980, living at address 456 Oak Ln. If that address was present in Verato’s reference data, it will be used to determine that your Rebecca Smith identity in your data set is a high-confidence match to the input criteria. The search result returned will seem to have a different address than your input criteria, but that didn’t stop Verato from finding the right person.

  • The Demographics Search and Demographics Query behaviors take ‘fuzziness’ into account, both when selecting potential candidates to retrieve from the database and when calculating the final similarity score. This means that your search or query results can contain identities with less-than-perfect matches to your input criteria. In the case of Demographics Query, this is a good thing - a typo in an address or a nickname is not going to prevent you from finding the ‘right’ identity. In the case of Demographics Search, you need to take care to use the matchScoreThreshold setting as mentioned above if you want to only get exact matches.

  • In some cases there are attribute values that can be searched on, but they do not contribute any weight to the similarity score. For example, you might want to store a health insurance ID number in the ‘Alternate Identifiers’ attribute cluster in your Verato instance because you occasionally search for identities using that number, but you don’t want that health insurance number to influence the actual matching outcomes. In this case, Verato would assign zero ‘weight’ to the attribute in the scoring configuration. If you subsequently did a Demographics Search for a specific health insurance ID, any identity with a matching health insurance ID would be returned in the list of search results, but it would be returned with a search score of 0.0. This is the only scenario in which you would expect to get back a search result with a 0.0 score.

  • Some product behavior is controlled by configuration settings on a per-instance or per-user basis, but not on a per-API-call basis. One such example is the match threshold used for Demographics Query API calls - as mentioned earlier, the default configuration is set to 0.8. If you have a scenario where you want a different setting JUST FOR searches (but not for adds or updates), one option is to ask your Verato support team to create a second user account with a different setting just for that user. Then you can use two different user accounts to make API calls, resulting in two different settings - this is a way of getting some additional control on a per-API-call basis by using different users as a way of ‘simulating’ a per-API-call setting.
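The first tip above, changing matchScoreThreshold depending on which single attribute you search by, can be handled with a small lookup table in your automated process. The sketch below uses placeholder numbers only. The real values depend on the scoring configuration of your instance and should come from your Verato implementation and support team. The attribute names are also hypothetical.

```python
# Placeholder values for illustration only; they are NOT recommended settings.
PLACEHOLDER_THRESHOLDS = {
    "ssn": 0.50,
    "phone": 0.30,
    # A zero-weight attribute is returned with a 0.0 score, so any threshold
    # above 0.0 would filter out its matches.
    "healthInsuranceId": 0.0,
}


def threshold_for(attribute: str) -> float:
    """Look up the matchScoreThreshold to send for a single-attribute search."""
    if attribute not in PLACEHOLDER_THRESHOLDS:
        raise KeyError(f"No threshold configured for attribute {attribute!r}")
    return PLACEHOLDER_THRESHOLDS[attribute]


print(threshold_for("phone"))
```

Raising an error for an unconfigured attribute, rather than falling back to 0.0, keeps the process from silently accepting near matches for an attribute nobody has tuned.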