Entity Matching in the Wild: A Consistent and Versatile Framework to Unify Data in Industrial Applications
Entity matching is a fundamental operation that occurs in virtually all modern data management tasks. In this paper, we describe three main challenges that arise when deploying identity resolution systems in real-world, large-scale data applications.
These challenges include:
How to support clustering at multiple confidence levels to serve downstream applications with varying precision/recall trade-off requirements.
How to combine different sources of data to create a more comprehensive profile of each customer without incorrectly merging distinct entities.
How to cluster records over time and assign persistent cluster IDs that can be used for downstream use cases such as A/B tests or predictive model training.