Anonymizing transactional datasets
Abstract
In this paper, we study the privacy breach caused by unsafe correlations in transactional data, where individuals have multiple tuples in a dataset. We provide two safety constraints to guarantee safe correlation of the data: (1) the safe grouping constraint, which ensures that quasi-identifier and sensitive partitions are bounded by l-diversity, and (2) the schema decomposition constraint, which eliminates non-arbitrary correlations between non-sensitive and sensitive values to protect privacy while improving the utility of aggregate analysis. In our technique, values are grouped into unique partitions that enforce l-diversity at the level of individuals. We also propose an association-preserving technique that improves the ability to learn from and analyze the anonymized data. To evaluate our approach, we conduct a set of experiments to measure the privacy breach and to investigate the anonymization cost of safe grouping and association preservation.
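To make the l-diversity condition behind safe grouping concrete, the following is a minimal Python sketch of how such a check might be expressed when individuals contribute multiple tuples. The tuple layout `(individual_id, quasi_identifier, sensitive_value)` and the function name are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict

def is_l_diverse(partition, l):
    """Sketch of an individual-level l-diversity check: a partition is
    accepted only if it draws sensitive values from at least l distinct
    individuals and contains at least l distinct sensitive values, so
    one person with many tuples cannot inflate the diversity count."""
    # Map each individual to the set of sensitive values they contribute.
    by_individual = defaultdict(set)
    for individual_id, _qid, sensitive in partition:
        by_individual[individual_id].add(sensitive)
    # Pool the sensitive values across individuals in the partition.
    distinct_values = set()
    for values in by_individual.values():
        distinct_values |= values
    return len(by_individual) >= l and len(distinct_values) >= l

# Hypothetical transactional partition: one individual ("alice") has
# two tuples, which must not count twice toward diversity.
partition = [
    ("alice", "zip-53711", "flu"),
    ("alice", "zip-53711", "cold"),
    ("bob",   "zip-53712", "asthma"),
    ("carol", "zip-53715", "flu"),
]
print(is_l_diverse(partition, l=3))  # True: 3 individuals, 3 distinct values
```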