How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 7.250

Privacy‐preserving record linkage

It has been recognized that sharing data between organizations can be of great benefit, since it can help discover novel and valuable information that is not available in individual databases. However, as organizations are under pressure to better utilize their large databases through sharing, integration, and analysis, protecting the privacy of personal information in such databases is an increasingly difficult task. Record linkage is the task of identifying and matching records that correspond to the same real‐world entity in several databases. This task is a crucial infrastructure component in many modern information systems. Privacy and confidentiality concerns, however, commonly prevent the matching of databases that contain personal information across different organizations. In the past decade, research in privacy‐preserving record linkage (PPRL) has aimed to develop techniques that facilitate the matching of records across databases such that, apart from the matched records, no private or confidential information is revealed to any organization involved in the linkage or to any external party. We discuss the development of key techniques that solve the three main subproblems of PPRL, namely privacy, linkage quality, and scaling PPRL to large databases. We then highlight open challenges in this research area.

This article is categorized under:

Algorithmic Development > Association Rules
Commercial, Legal, and Ethical Issues > Social Considerations
Fundamental Concepts of Data and Knowledge > Data Concepts
Technologies > Data Preprocessing
Overview of the PPRL process
Example of a Bloom filter–based similarity calculation using two hash functions. The similarity between “john” and “johny” is calculated with the Dice coefficient. As can be seen, there is one collision (the seventh bit, shown in italics) where two hash values are mapped to the same bit, which affects the calculated similarity.
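A minimal Python sketch of the calculation described in the caption may help. It assumes bigram (q = 2) tokenization without padding, a filter length of 30 bits, and double hashing over SHA-1 and MD5 to derive the two hash functions; none of these choices come from the article, and the function names are hypothetical. Unkeyed hashes are used only for readability: a real PPRL deployment would use keyed hash functions (for example HMAC) with a secret shared by the database owners.

```python
import hashlib


def bigrams(value: str) -> list[str]:
    """Split a string into character bigrams (q-grams with q = 2), no padding."""
    return [value[i:i + 2] for i in range(len(value) - 1)]


def bloom_encode(value: str, length: int = 30, num_hashes: int = 2) -> set[int]:
    """Return the positions of the 1-bits after hashing every bigram of `value`
    into a Bloom filter of `length` bits with `num_hashes` hash functions.
    Assumption: the hash functions come from double hashing h1 + i * h2,
    with unkeyed SHA-1 and MD5 (illustration only, not secure)."""
    positions: set[int] = set()
    for gram in bigrams(value):
        data = gram.encode("utf-8")
        h1 = int(hashlib.sha1(data).hexdigest(), 16)
        h2 = int(hashlib.md5(data).hexdigest(), 16)
        for i in range(num_hashes):
            positions.add((h1 + i * h2) % length)
    return positions


def dice(bits_a: set[int], bits_b: set[int]) -> float:
    """Dice coefficient on the 1-bits of two Bloom filters:
    2 * |common 1-bits| / (|1-bits in A| + |1-bits in B|)."""
    if not bits_a and not bits_b:
        return 0.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


if __name__ == "__main__":
    a = bloom_encode("john")
    b = bloom_encode("johny")
    print("john: ", sorted(a))
    print("johny:", sorted(b))
    print(f"Dice similarity: {dice(a, b):.3f}")
```

Because every bigram of “john” also occurs in “johny”, the 1-bits of the first filter are a subset of those of the second, so the coefficient reduces to 2|A| / (|A| + |B|). Collisions like the one highlighted in the caption lower the number of distinct 1-bits and therefore shift this value.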
Example of a secure two‐party edit distance calculation. The initial empty split matrices are shown in the top row, and the bottom row shows the final calculated matrices. Values in italics in the two bottom matrices show the edit distance between “g” and “gae.” The final edit distance between “gail” and “gael” is given in the right‐most bottom cell.
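The split-matrix idea can be illustrated with a small sketch. It computes the ordinary, non-secure Levenshtein matrix and then splits it into two additive shares modulo 2^32, so that neither share alone reveals a cell value while their sum recovers it. This shows only the share representation under those assumptions: in an actual secure two-party protocol the parties fill in their shares cell by cell without ever seeing the plaintext matrix, which needs cryptographic sub-protocols (for example, a secure minimum) that are not shown here. The modulus and function names are hypothetical.

```python
import secrets

MODULUS = 2**32  # hypothetical share modulus, large enough for short strings


def edit_distance_matrix(s: str, t: str) -> list[list[int]]:
    """Plain Levenshtein dynamic-programming matrix; cell [i][j] holds the
    edit distance between the prefixes s[:i] and t[:j]. Not secure."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d


def split_matrix(matrix: list[list[int]]):
    """Split a matrix into two additive shares modulo MODULUS. Each share on
    its own is uniformly random; share_a + share_b (mod MODULUS) = matrix."""
    share_a = [[secrets.randbelow(MODULUS) for _ in row] for row in matrix]
    share_b = [[(v - a) % MODULUS for v, a in zip(row, row_a)]
               for row, row_a in zip(matrix, share_a)]
    return share_a, share_b


def reconstruct(share_a, share_b) -> list[list[int]]:
    """Recombine two additive shares into the original matrix."""
    return [[(a + b) % MODULUS for a, b in zip(ra, rb)]
            for ra, rb in zip(share_a, share_b)]


if __name__ == "__main__":
    d = edit_distance_matrix("gail", "gael")
    party_a, party_b = split_matrix(d)
    print("distance(g, gae) =", reconstruct(party_a, party_b)[1][3])
    print("distance(gail, gael) =", reconstruct(party_a, party_b)[-1][-1])
```

With this sketch, the bottom-right cell for “gail” and “gael” is 1 (a single substitution of “i” by “e”), and cell [1][3], the distance between “g” and “gae”, is 2.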
Basic steps involved in two‐ and three‐party PPRL protocols.
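As one deliberately simple instance of the three‐party setting, the sketch below has two database owners encode a quasi‐identifier with a keyed hash (HMAC‐SHA256) under a secret key they share, while a linkage unit that never learns the key compares only the encodings. The key, field values, normalization (trimming and lower‐casing), and function names are hypothetical, and exact matching on keyed hashes is only one possible encoding; it is not claimed to be the protocol shown in the article's figure.

```python
import hashlib
import hmac


def encode_records(records: dict[str, str], secret_key: bytes) -> dict[str, str]:
    """Database-owner step: replace each normalized quasi-identifier value
    with its HMAC-SHA256 digest. Only the owners hold `secret_key`."""
    return {
        rec_id: hmac.new(secret_key, value.strip().lower().encode("utf-8"),
                         hashlib.sha256).hexdigest()
        for rec_id, value in records.items()
    }


def linkage_unit_match(encoded_a: dict[str, str],
                       encoded_b: dict[str, str]) -> list[tuple[str, str]]:
    """Linkage-unit step: sees only record IDs and digests, and returns the
    ID pairs whose digests are identical (exact matching only)."""
    index: dict[str, list[str]] = {}
    for rec_id, digest in encoded_b.items():
        index.setdefault(digest, []).append(rec_id)
    return sorted((rec_a, rec_b)
                  for rec_a, digest in encoded_a.items()
                  for rec_b in index.get(digest, []))


if __name__ == "__main__":
    shared_key = b"hypothetical-shared-secret"  # never sent to the linkage unit
    db_a = {"a1": "John Smith 1970-01-02", "a2": "Gail Jones 1985-07-11"}
    db_b = {"b1": "john smith 1970-01-02", "b2": "Gael Jones 1985-07-11"}
    matches = linkage_unit_match(encode_records(db_a, shared_key),
                                 encode_records(db_b, shared_key))
    print(matches)  # [('a1', 'b1')]
```

The variant pair “Gail” and “Gael” is not matched, which illustrates why exact matching on hashed values falls short when data contain variations, and why approximate encodings such as Bloom filters are used.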