δ-Dependency for privacy-preserving XML data publishing

https://doi.org/10.1016/j.jbi.2014.01.013
Under an Elsevier user license
open archive

Highlights

  • Semantic relationships in XML make the data vulnerable to privacy attacks.

  • We propose a privacy approach for XML that considers semantic relationships.

  • We compare our model and algorithm against other common privacy approaches.

  • Diversification and dissection techniques can be used to protect privacy in XML.

Abstract

An ever-increasing amount of medical data, such as electronic health records, is being collected, stored, shared and managed in large online health information systems and electronic medical record (EMR) systems (Williams et al., 2001; Virtanen, 2009; Huang and Liou, 2007) [1], [2], [3]. From such rich collections, data is often published as census and statistical data sets to share knowledge and enable medical research. This brings an increasing need to protect individual privacy, which becomes an issue of great importance, especially when information about patients is exposed to the public.

While data privacy has been studied comprehensively for relational data, models and algorithms that address the distinct characteristics and complex structure of XML data have yet to be explored. Currently, the common compromise is to convert private XML data into relational data for publication. This ad hoc approach results in a significant loss of the useful semantic information carried by the private XML data. Health data often has a very complex structure, which is best expressed in XML. In fact, XML is the standard format for exchanging (e.g. HL7 version 3) and publishing health information. The lack of means to handle data directly in XML format is therefore a serious drawback.
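To make the information-loss argument concrete, the following minimal sketch flattens a small hierarchical health record into relational tuples, roughly what the ad hoc XML-to-relational conversion does. The element and attribute names (`patient`, `visit`, `diagnosis`, `age`, `zip`) and all values are hypothetical and are not taken from the paper's schema. After flattening, the patient attributes repeat on every row, and the patient/visit/diagnosis nesting survives only implicitly through repeated values such as the visit date.

```python
import xml.etree.ElementTree as ET

# Hypothetical private record; names and values are illustrative only.
PRIVATE_XML = """
<patients>
  <patient name="Alice" age="34" zip="3000">
    <visit date="2013-02-01">
      <diagnosis>flu</diagnosis>
      <diagnosis>asthma</diagnosis>
    </visit>
    <visit date="2013-05-10">
      <diagnosis>fracture</diagnosis>
    </visit>
  </patient>
  <patient name="Bob" age="51" zip="3052">
    <visit date="2013-03-15">
      <diagnosis>diabetes</diagnosis>
    </visit>
  </patient>
</patients>
"""


def flatten(xml_text):
    """Flatten patient/visit/diagnosis elements into one tuple per diagnosis."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for patient in root.iter("patient"):
        for visit in patient.iter("visit"):
            for diag in visit.iter("diagnosis"):
                # Patient attributes are copied onto every row; the visit
                # grouping is kept only as a repeated date value.
                rows.append((patient.get("name"), patient.get("age"),
                             patient.get("zip"), visit.get("date"), diag.text))
    return rows


rows = flatten(PRIVATE_XML)
for row in rows:
    print(row)
```

In the flat table, two diagnoses from one visit look the same as two separate visits on the same day. Only the XML hierarchy records which diagnoses belong together.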

In this paper we propose a novel privacy protection model for XML and an algorithm that implements it. We provide general rules both for transforming a private XML schema into a published XML schema and for mapping private XML data to new, privacy-protected published XML data. In addition, we propose a new privacy property, δ-dependency, which can be applied to both relational and XML data and which takes into account the hierarchical nature of sensitive data (as opposed to “quasi-identifiers”). Lastly, we provide an implementation of our model, algorithm and privacy property, and perform an experimental analysis to demonstrate the proposed privacy scheme in practice.
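The highlights and keywords mention dissection. For reference only, the following is a minimal sketch of a generic, Anatomy-style dissection over flat (relational) records. Records are grouped so that each group holds at least `l` distinct sensitive values. The quasi-identifiers and the sensitive values are then published as two separate tables, linked only by a group id. This is not the paper's algorithm and does not implement δ-dependency, which also accounts for the hierarchy of sensitive values in XML. The function name `dissect`, the parameter `l`, and the record layout are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def dissect(records, l):
    """Anatomy-style dissection (generic technique, not the paper's δ-dependency).

    records: list of (quasi_identifier, sensitive_value) pairs.
    Returns (qi_table, sensitive_table): qi_table maps each quasi-identifier to a
    group id; sensitive_table lists (group id, sensitive value, count).
    """
    buckets = defaultdict(list)
    for qi, s in records:
        buckets[s].append(qi)

    groups = []
    # Draw one record from each of the l largest non-empty buckets, so every
    # group holds l distinct sensitive values.
    while sum(1 for b in buckets.values() if b) >= l:
        largest = sorted((s for s, b in buckets.items() if b),
                         key=lambda s: len(buckets[s]), reverse=True)[:l]
        groups.append([(buckets[s].pop(), s) for s in largest])

    # Leftover records join the first group that lacks their sensitive value.
    for s, b in buckets.items():
        for qi in b:
            target = next((g for g in groups if all(v != s for _, v in g)), None)
            if target is None:
                raise ValueError(f"cannot place record with value {s!r}")
            target.append((qi, s))

    qi_table, sensitive_table = [], []
    for gid, group in enumerate(groups, start=1):
        for qi, _ in group:
            qi_table.append((qi, gid))
        for s, count in sorted(Counter(v for _, v in group).items()):
            sensitive_table.append((gid, s, count))
    return qi_table, sensitive_table


# Hypothetical records: quasi-identifier (age, zip) and a diagnosis.
records = [(("34", "3000"), "flu"), (("36", "3000"), "asthma"),
           (("51", "3052"), "diabetes"), (("55", "3052"), "flu")]
qi_table, sensitive_table = dissect(records, l=2)
print(qi_table)
print(sensitive_table)
```

Within a group, every sensitive value appears once. An attacker who knows a person's quasi-identifier can therefore link it only to a group, and in this simple setting can guess that person's value with probability at most 1/`l`.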

Keywords

Medical privacy
XML
Hierarchy
Dissection
Privacy-preserving healthcare data
