How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 2.111

# Privacy‐preserving record linkage


Sharing data between organizations can be of great benefit, because it can reveal novel and valuable information that is not available in any individual database. However, as organizations come under pressure to make better use of their large databases through sharing, integration, and analysis, protecting the privacy of the personal information in those databases becomes increasingly difficult. Record linkage is the task of identifying and matching records that correspond to the same real‐world entity in several databases. It is a crucial infrastructure component in many modern information systems. Privacy and confidentiality concerns, however, commonly prevent the matching of databases that contain personal information across different organizations. In the past decade, research in privacy‐preserving record linkage (PPRL) has aimed to develop techniques that allow records to be matched across databases such that, besides the matched records, no private or confidential information is revealed to any organization involved in the linkage or to any external party. We discuss the development of key techniques that address the three main subproblems of PPRL, namely privacy, linkage quality, and scalability to large databases. We then highlight open challenges in this research area.

• Algorithmic Development > Association Rules
• Commercial, Legal, and Ethical Issues > Social Considerations
• Fundamental Concepts of Data and Knowledge > Data Concepts
• Technologies > Data Preprocessing
Overview of the PPRL process
Example of a Bloom filter–based similarity calculation, with $l=14$ and 2 hash functions. Using the Dice coefficient, the similarity between “john” and “johny” is calculated as $2 \times 5/(5+7) = 10/12 \approx 0.83$. As can be seen, there is one collision (the seventh bit, shown in italics) where two hash values are mapped to the same bit, which affects the similarity being calculated.
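
The calculation in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the encoding used in the figure: it assumes that strings are split into bigrams and that the two hash functions are derived from SHA‐1 and SHA‐256 by double hashing, neither of which is stated in the caption. The function names `qgrams`, `bloom_bits`, and `dice` are hypothetical. Because the hash functions differ from those in the figure, the bit positions and the resulting similarity will generally differ from the 0.83 shown there.

```python
import hashlib


def qgrams(value, q=2):
    """Return the set of q-grams of a string (bigrams by default; an assumption)."""
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def bloom_bits(value, l=14, k=2):
    """Return the set of bit positions set to 1 in a Bloom filter of length l.

    Each q-gram is hashed k times using double hashing over SHA-1 and SHA-256.
    Two hash values may land on the same position (a collision).
    """
    bits = set()
    for gram in qgrams(value):
        data = gram.encode("utf-8")
        h1 = int(hashlib.sha1(data).hexdigest(), 16)
        h2 = int(hashlib.sha256(data).hexdigest(), 16)
        for i in range(k):
            bits.add((h1 + i * h2) % l)
    return bits


def dice(bits_a, bits_b):
    """Dice coefficient: 2 * (common 1-bits) / (1-bits in a + 1-bits in b)."""
    total = len(bits_a) + len(bits_b)
    if total == 0:
        return 0.0
    return 2 * len(bits_a & bits_b) / total


if __name__ == "__main__":
    a = bloom_bits("john")
    b = bloom_bits("johny")
    print(sorted(a), sorted(b), dice(a, b))
```

With the counts from the caption (5 common bits, 5 and 7 bits set), `dice` evaluates to $10/12$. Collisions reduce the number of distinct bits set, which is why they shift the similarity away from the value computed on the underlying bigram sets.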
Example secure two‐party edit‐distance calculation. The initial empty split matrices are shown in the top row, whereas the bottom row shows the final calculated matrices. Values in italics in the two bottom matrices show the edit distance between “g” and “gae”. The final edit distance between “gail” and “gael”, in the right‐most bottom cell, is $0.4+0.6=1.0$.
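
The idea of split matrices can be illustrated with the following Python sketch. It is deliberately not a secure protocol: the edit‐distance matrix is computed in the clear and only afterwards split into two additive shares, whereas a real two‐party protocol would compute the shares jointly without either party seeing the full matrix. The function names `edit_distance_matrix` and `split_matrix` and the use of uniform random shares are assumptions made for illustration.

```python
import random


def edit_distance_matrix(s, t):
    """Standard dynamic-programming (Levenshtein) matrix for strings s and t."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d


def split_matrix(matrix, rng):
    """Split each cell v into two shares (a, v - a) whose sum is v.

    Illustration only: the shares are produced from the known matrix here.
    """
    share_a = [[rng.random() for _ in row] for row in matrix]
    share_b = [[v - a for v, a in zip(row, row_a)]
               for row, row_a in zip(matrix, share_a)]
    return share_a, share_b


if __name__ == "__main__":
    d = edit_distance_matrix("gail", "gael")
    a, b = split_matrix(d, random.Random(0))
    # Neither share alone equals the distance; only their sum does.
    print(a[-1][-1], b[-1][-1], a[-1][-1] + b[-1][-1])
```

Each party would hold one of the two matrices, and the edit distance is recovered only by adding the two right‐most bottom cells, as in the $0.4+0.6=1.0$ example of the caption.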
Basic steps involved in two‐ and three‐party PPRL protocols.