This blog provides information, news, tips, and announcements about the SQL Server Data Quality Services (DQS) feature introduced in SQL Server 2012.
SQL Server Data Quality Services (DQS) enables you to match records using your matching rules defined in DQS Knowledge Base. There are two common use cases where you would like to match related records. One is to identify duplicate in your records, such as same customers appear as separate records in your dataset. Another common case is to identify related customers. For example, you would like to identify two separate customer records that belong to the same parent company. In this blog article, I am going to show how to create a simple matching rules and use them to match duplicate and related records.
For this article, I am using the same data set and the knowledge base created in the previous blog article: How to cleanse Customer Data using SQL Server Data Quality Services.
The following are the high level steps:
A. Add matching policy to DQS Knowledge Base
B. Create and execute a DQS matching project
In this blog article, I discussed how to create a Knowledge Base to define matching policy. You can then refer to the Knowledge Base to create Data Quality Projects to match your customer data. In this article, I use a simple example to walk through the most common functionalities. For more detailed explanation of DQS Matching, refer to the blog article wrote by my colleague, Gadi Peleg:
It is also recommended that you run a separate cleansing project on your data set before you run matching project. Although DQS matching process uses the same domain definition for the cleansing, it does not make the correction nor does it flag invalid values. In addition, DQS matching only use the internal domain definition and does not automatically send your data to external third party providers. You must explicitly run a cleansing project to send the data externally to the Reference Data Service providers. However, the Service Provider may include additional information about the data you send. For example, Dun & Bradstreet includes DunsNumber in the output. This may be another useful information to use as part of your matching rule. Refer to my previous article on Cleansing your Customer data using Dun & Bradstreet DQS Service for more information on how to use DQS to cleanse data using Reference Data Service.
In the next blog post, I will describe how to automate the DQS matching project using SSIS.