
Re-identification of Clinical Data Through Diagnosis Information

Chapter in: Anonymization of Electronic Medical Records to Support Clinical Analysis

Abstract

In this chapter, we present an attack that can associate patients with their diagnosis and genomic information. The attack links the published data with external, identified datasets on the basis of diagnosis codes. After motivating the need to prevent the attack, we discuss the types of datasets involved in it in Sect. 3.2. Then, in Sect. 3.3, we present a measure that quantifies the susceptibility of a dataset to the attack, together with a study of the feasibility of the attack in an Electronic Medical Record (EMR) data publishing scenario. Last, Sect. 3.4 discusses a set of measures that capture the utility loss incurred when the data are shared in a way that prevents the attack.
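The linkage attack summarized above can be illustrated with a small sketch. It assumes, for illustration only, that each published record is a de-identified set of ICD-9 codes, that the attacker holds an identified dataset listing a name and the diagnosis codes known for each person, and that susceptibility is summarized as the fraction of published records whose exact code set occurs in no other record. This fraction is a hypothetical stand-in, not necessarily the measure defined in Sect. 3.3, and the helper names (`code_set_counts`, `fraction_unique`, `link`) and toy records are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical record layouts (illustrative only, not the chapter's data model):
#   published:  de-identified records, each a (record_id, set of ICD-9 codes) pair
#   identified: external records, each a (name, set of ICD-9 codes) pair
Codes = FrozenSet[str]
Records = List[Tuple[str, Codes]]


def code_set_counts(published: Records) -> Dict[Codes, int]:
    """Count how many published records carry each exact set of diagnosis codes."""
    return Counter(codes for _, codes in published)


def fraction_unique(published: Records) -> float:
    """Hypothetical susceptibility proxy: share of records whose code set is unique."""
    if not published:
        return 0.0
    counts = code_set_counts(published)
    unique = sum(1 for _, codes in published if counts[codes] == 1)
    return unique / len(published)


def link(identified: Records, published: Records) -> Dict[str, str]:
    """Link a name to a published record when exactly one record
    contains every diagnosis code known for that person."""
    matches: Dict[str, str] = {}
    for name, known in identified:
        candidates = [rid for rid, codes in published if known <= codes]
        if len(candidates) == 1:
            matches[name] = candidates[0]
    return matches


# Toy data for illustration only.
published = [
    ("r1", frozenset({"250.00", "401.9"})),
    ("r2", frozenset({"250.00", "401.9"})),
    ("r3", frozenset({"250.00", "272.4", "414.01"})),
    ("r4", frozenset({"493.90"})),
]
identified = [
    ("Alice", frozenset({"250.00", "272.4"})),
    ("Bob", frozenset({"250.00", "401.9"})),
]

print(fraction_unique(published))      # 0.5
print(link(identified, published))     # {'Alice': 'r3'}
```

In the toy data, Alice's known codes occur together in only one published record, so she is re-identified. Bob's codes match two records, so the sketch leaves him unlinked. Requiring a unique candidate is one simple way to model a successful re-identification under the stated assumptions.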


Notes

  1. http://www.bbmri.eu/index.php/catalog-of-european-biobanks

  2. We refer to ICD-9 codes, which are assigned to diagnoses for health insurance billing in the United States.



Copyright information

© 2013 The Author(s)

About this chapter

Cite this chapter

Gkoulalas-Divanis, A., Loukides, G. (2013). Re-identification of Clinical Data Through Diagnosis Information. In: Anonymization of Electronic Medical Records to Support Clinical Analysis. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5668-1_3


  • DOI: https://doi.org/10.1007/978-1-4614-5668-1_3

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-5667-4

  • Online ISBN: 978-1-4614-5668-1

  • eBook Packages: Engineering, Engineering (R0)
