Re-identification of Clinical Data Through Diagnosis Information

Gkoulalas-Divanis, Aris; Loukides, Grigorios

doi:10.1007/978-1-4614-5668-1_3

Part of the book series: SpringerBriefs in Electrical and Computer Engineering

Abstract

In this chapter, we present an attack that can associate patients with their diagnosis and genomic information. The attack links the published data with external, identified datasets on the basis of diagnosis codes. After motivating the need to prevent the attack, we discuss the types of datasets involved in it in Sect. 3.2. Sect. 3.3 then presents a measure that quantifies the susceptibility of a dataset to the attack, together with a study of the attack's feasibility in an Electronic Medical Record (EMR) data publishing scenario. Last, Sect. 3.4 discusses a set of measures that capture the utility loss incurred when the data are published in a way that prevents the attack.
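The linkage described above can be illustrated with a minimal sketch. It is not the chapter's method or its measure: the record layout, the toy ICD-9 code sets, and the names `candidates`, `link`, and `fraction_unique` are all assumptions made for illustration. The sketch matches each identified individual in an external dataset to the published records whose diagnosis codes contain that individual's known codes, and it reports the share of published records with a unique code set as one simple indicator of susceptibility.

```python
from collections import Counter

# Hypothetical published (de-identified) records: a pseudonym, a set of
# ICD-9 diagnosis codes, and an opaque pointer to genomic data.
published = [
    {"pid": "r1", "codes": frozenset({"250.00", "401.9"}), "dna": "seq-A"},
    {"pid": "r2", "codes": frozenset({"250.00", "401.9"}), "dna": "seq-B"},
    {"pid": "r3", "codes": frozenset({"296.80", "401.9"}), "dna": "seq-C"},
]

# Hypothetical external dataset that carries identities and diagnosis codes.
external = [
    {"name": "Alice", "codes": frozenset({"296.80", "401.9"})},
    {"name": "Bob", "codes": frozenset({"250.00", "401.9"})},
]


def candidates(known_codes, records):
    """Return published records whose code set contains all known codes."""
    return [r for r in records if known_codes <= r["codes"]]


def link(external_records, published_records):
    """Map each identified individual to the consistent published records."""
    return {
        e["name"]: [r["pid"] for r in candidates(e["codes"], published_records)]
        for e in external_records
    }


def fraction_unique(records):
    """Share of published records whose full code set occurs exactly once."""
    counts = Counter(r["codes"] for r in records)
    return sum(1 for r in records if counts[r["codes"]] == 1) / len(records)


if __name__ == "__main__":
    print(link(external, published))
    print(fraction_unique(published))
```

In this toy data, an individual whose codes match a single published record is linked to that record's genomic pointer, while an individual whose codes match several records remains ambiguous among them.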

Notes

1. http://www.bbmri.eu/index.php/catalog-of-european-biobanks
2. We refer to ICD-9 codes, which are used to assign health insurance billing codes to diagnoses in the United States.

Author information

Authors and Affiliations

IBM Research - Ireland, Damastown Industrial Estate, Mulhuddart, Ireland
Aris Gkoulalas-Divanis
The Parade, Cardiff University, Cardiff, UK
Grigorios Loukides

About this chapter

Cite this chapter

Gkoulalas-Divanis, A., Loukides, G. (2013). Re-identification of Clinical Data Through Diagnosis Information. In: Anonymization of Electronic Medical Records to Support Clinical Analysis. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5668-1_3

DOI: https://doi.org/10.1007/978-1-4614-5668-1_3
Published: 13 September 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5667-4
Online ISBN: 978-1-4614-5668-1
eBook Packages: Engineering, Engineering (R0)
