Elsevier

Computers & Security

Volume 97, October 2020, 101939

Efficient determination of equivalence for encrypted data

https://doi.org/10.1016/j.cose.2020.101939

Highlights

  • This work provides an alternative approach to record linkage that may be preferred in many settings.

  • The approach is less computationally intensive than previous ones.

  • Precomputation significantly reduces the computational load during matching.

  • The integrity of the security model is also fully preserved.

Abstract

Secure computation of equivalence has fundamental applications in many areas, including healthcare. We study this problem in the context of matching an individual's identity to link medical records across systems, framed as the socialist millionaires’ problem: two millionaires wish to determine whether their fortunes are equal without disclosing their net worth (Boudot et al., 2001). In Theorem 2, we show that when a “greater than” algorithm is carried out on a totally ordered set, secure matching can be achieved without additional rounds of communication. We present this efficient solution for assessing equivalence using a set intersection algorithm designed for “greater than” computation, demonstrate its effectiveness on equivalence of arbitrary data values, and show how it meets regulatory criteria for risk of disclosure.

Section snippets

Introduction and related work

People may benefit from having their data stored by multiple parties combined. For example, a patient may want to share her medical records with a new provider so that her past medical history can inform her current medical care. Yet, sharing identifiers to match records may reveal private information about where the patient was treated and in some cases the nature of the patient's illness. To protect privacy, combining records must follow regulatory protocols. A common practice is to set up a

Problem statement and structural assumptions

We first present the problem statement and then the notation and structural assumptions made in the rest of this work.

Main result

We first review the result of Lin et al. (2005) for secure computation of strict inequality (Theorem 1). We then show how it implies equivalence under repeated evaluation (Corollary 1), although this is inefficient, requiring 4 rounds. Finally, we develop a 2-round protocol for equality using this special coding (Theorem 2). The following two definitions are a necessary prerequisite to the theorems:

Theorem 1

(Lin et al., 2005). Let x and y be positive integers. x > y if and only if the one-encoded set for
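As a plain, unencrypted illustration of the idea behind Theorem 1 and Corollary 1, the sketch below uses the 0-encoding and 1-encoding of fixed-width binary strings as they are commonly described for the Lin et al. (2005) construction. The function names, the default 3-bit width, and the two-test equality check are assumptions made for illustration; this is not the authors' implementation and it involves no cryptography.

```python
def to_bits(value, width):
    """Return value as a fixed-width binary string, most significant bit first."""
    if value < 0 or value >= (1 << width):
        raise ValueError("value does not fit in the given width")
    return format(value, "0{}b".format(width))


def one_encoding(bits):
    """Set of all prefixes of `bits` that end in a 1."""
    return {bits[: i + 1] for i, b in enumerate(bits) if b == "1"}


def zero_encoding(bits):
    """For every 0 bit, the prefix before it followed by a 1."""
    return {bits[:i] + "1" for i, b in enumerate(bits) if b == "0"}


def greater_than(x, y, width=3):
    """x > y exactly when the 1-encoding of x meets the 0-encoding of y."""
    return bool(one_encoding(to_bits(x, width)) & zero_encoding(to_bits(y, width)))


def equal_by_two_tests(x, y, width=3):
    """Equality via repeated evaluation: neither x > y nor y > x (hypothetical helper)."""
    return not greater_than(x, y, width) and not greater_than(y, x, width)
```

For example, with 3-bit strings the value 5 is `101`, whose 1-encoding is `{"1", "101"}`, and the value 3 is `011`, whose 0-encoding is `{"1"}`. The sets share `"1"`, so the test reports 5 > 3. Running the strict-inequality test in both directions is what makes this route to equality costly in rounds.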

An illustrative example

For simplicity, we give an example using an abbreviated 3-bit binary string, which generalizes to any string length. The example uses Theorem 2 together with the Paillier cryptosystem and an encryption table to identify matches. In this example, Alice and Bob each hold a case with the unique identifier “5” and would like to determine whether this is a matching record.

For each binary position in the 2 × 3 table, Alice uses the Paillier cryptosystem to encrypt the binary positions that match her string with the encrypted
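For readers unfamiliar with the Paillier cryptosystem, a toy sketch of its additive homomorphism may help. It uses the common simplification g = n + 1 and tiny primes that are insecure and chosen only for readability. All function names are hypothetical, and the way the paper fills and uses its encryption table is not shown in this excerpt, so none of that is reproduced here.

```python
import math
import secrets


def keygen(p, q):
    """Toy key generation from two small primes (insecure sizes, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid with g = n + 1; needs Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt m (0 <= m < n) as c = (1 + m*n) * r^n mod n^2 with fresh random r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover m as L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying two ciphertexts yields an encryption of the sum of the plaintexts."""
    return (c1 * c2) % (n * n)
```

With `n, priv = keygen(61, 53)`, the call `decrypt(n, priv, add_encrypted(n, encrypt(n, 5), encrypt(n, 7)))` recovers 12 without either summand being decrypted on its own. Encryption is randomized, so two encryptions of the same value generally differ.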

Methods

We implement the algorithm described above with respect to Theorem 2. However, to reduce the number of comparisons, we implement a modified Merge Sort algorithm (Goldreich et al., 1987; McCool et al., 2012) that reduces an a × b matrix of paired comparisons to a + b comparisons by sorting the data. The algorithm proceeds as follows:

  • (i) Alice and Bob each sort their data from least to greatest in separate arrays, noting ties.

  • (ii) They compare their least values that have not previously been compared.

  • (iii) If Alice's least previously not
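The comparison saving can be illustrated with a generic sorted-merge walk. This is a minimal sketch of the standard technique, not the authors' exact steps, which are truncated in this excerpt. The `compare` callback stands in for the secure comparison, both lists live in one process purely for demonstration, and ties are handled here by simply deduplicating each list; all of these are assumptions.

```python
def plain_compare(x, y):
    """Stand-in for the secure comparison: returns -1, 0, or 1."""
    return (x > y) - (x < y)


def merge_match(alice_values, bob_values, compare=plain_compare):
    """Find values present in both lists by walking two sorted lists in step."""
    a = sorted(set(alice_values))  # each party sorts locally; duplicates dropped
    b = sorted(set(bob_values))
    i = j = comparisons = 0
    matches = []
    while i < len(a) and j < len(b):
        comparisons += 1
        result = compare(a[i], b[j])
        if result == 0:      # equal: record the match and advance both sides
            matches.append(a[i])
            i += 1
            j += 1
        elif result < 0:     # Alice's value is smaller: it cannot match anything later
            i += 1
        else:                # Bob's value is smaller: advance Bob
            j += 1
    return matches, comparisons
```

Every comparison advances at least one pointer, so the number of comparisons is bounded by the combined length of the two deduplicated lists rather than by their product. For instance, `merge_match([5, 1, 9, 5, 3], [2, 5, 9, 10])` finds the matches 5 and 9.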

Protocol analysis

We first analyze the computation and communication cost of the proposed protocol and then analyze its security.

Discussion

We develop an approach to equivalence that may be useful for several applications. In addition to secure matching of buying and selling prices as well as authentication, the procedure may be used for identity matching in large medical data sets. Here there is the advantage that Alice's encrypted computations may be done “off-line”. In other words, Alice may build and store her encrypted data at any time prior to the matching exercise. In addition, the approach offers some efficiency

Conclusion

In conclusion, efficient determination of equivalence is possible using a set-intersection approach. This may be particularly useful in matching identifiers, for example across health systems, because the method allows for considerable precomputation in preparation for future efforts to identify matches.

Funding

This work was supported by pSCANNER, which is supported by the Patient-Centered Outcomes Research Institute (PCORI), Contract CDRN-1306-04819; by the National Science Foundation under award CNS-1564034; and by the National Institutes of Health under awards P30AG024968, R33AG057395, R01GM118574 and R35GM134927. The content is solely the responsibility of the authors and does not necessarily represent the official views of the agencies funding the research.

CRediT authorship contribution statement

Jason N. Doctor: Conceptualization, Methodology, Writing - original draft, Formal analysis, Validation, Software, Investigation. Jaideep Vaidya: Conceptualization, Methodology, Writing - review & editing, Formal analysis. Xiaoqian Jiang: Writing - review & editing. Shuang Wang: Writing - review & editing. Lisa M. Schilling: Writing - review & editing. Toan Ong: Data curation, Writing - review & editing. Michael E. Matheny: Writing - review & editing. Lucila Ohno-Machado: Funding acquisition,

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We thank Kayleigh Barnes for assistance in identifying the zero information leak function presented in Footnote 2.


References (25)

  • O. Goldreich et al.

    How to play ANY mental game

  • R. Ihaka et al.

    R: a language for data analysis and graphics

    J. Comput. Graph. Stat.

    (1996)

    Jason N. Doctor is Professor and Chair of the Department of Health Policy and Management at the University of Southern California's Price School of Public Policy. He also holds the Norman Topping Chair in Medicine and Public Policy and is the Director of Health Informatics at the USC Leonard D. Schaeffer Center for Health Policy & Economics. His research program centers on decision-making in healthcare and health informatics. Dr. Doctor specializes in behavioral economics and the use of choice architecture to affect policy in health and medicine.

    Jaideep Vaidya is the RBS Dean's Research Professor (tenured full professor) in the MSIS Department at Rutgers University. He received the B.E. degree in Computer Engineering from the University of Mumbai and the M.S. and Ph.D. degrees in Computer Science from Purdue University. His general area of research is in security, privacy, data mining, and data management. He has published over 150 technical papers in peer-reviewed journals and conference proceedings, and has received several best paper awards from the premier conferences in data mining, databases, digital government, security, and informatics. He has also received the NSF Career Award, the Rutgers Board of Trustees Fellowship for Excellence in Research, as well as the Dean's Meritorious Research Award.

    Xiaoqian Jiang is an assistant professor in the Department of Biomedical Informatics at the University of California San Diego. He received his PhD in Computer Science from Carnegie Mellon University. He is an associate editor of BMC Medical Informatics and Decision Making and serves as an editorial board member of the Journal of the American Medical Informatics Association. He works primarily in health data privacy and predictive models in biomedicine. Dr. Jiang is a recipient of an NIH K99/R00 award, and he won the distinguished paper award from the American Medical Informatics Association Clinical Research Informatics (CRI) Summit in 2012 and 2013.

    Shuang Wang (S’08–M’12) received the B.S. degree in applied physics and the M.S. degree in biomedical engineering from the Dalian University of Technology, China, and the Ph.D. degree in electrical and computer engineering from the University of Oklahoma, OK, USA, in 2012. He worked as a postdoctoral researcher with the Department of Biomedical Informatics (DBMI), University of California, San Diego (UCSD), CA, USA, from 2012 to 2015. Currently, he is an assistant professor at the DBMI, UCSD. His research interests include machine learning and healthcare data privacy/security. He has published more than 60 journal/conference papers, 1 book, and 3 book chapters. He was awarded an NHGRI K99/R00 career grant.

    Lisa M. Schilling, MD, MSPH is a general internist and informaticist at the University of Colorado School of Medicine, where she is a co-director of the Data Science to Patient Value Initiative. She is a co-PI with the pSCANNER Clinical Data Research Network.

    Toan C. Ong, PhD, is an assistant professor at the University of Colorado Anschutz Medical Campus. He has a PhD in Computer Science and Information Systems. He has been involved in large projects funded by AHRQ and PCORI such as SAFTINet, PEDSNet and PORTAL. He has extensive experience with record linkage methods, including privacy preserving record linkage (PPRL) and record linkage workflows. Dr. Ong's other research interests include data harmonization, schema mapping, machine learning and natural language processing.

    Michael Matheny is an Associate Professor of Biomedical Informatics with secondary appointments in Medicine and Biostatistics at Vanderbilt University Medical Center. He is board certified in Internal Medicine and Clinical Informatics. Since joining the Vanderbilt University Medical Center faculty in 2007, Dr. Matheny has become a nationally recognized investigator in predictive analytics, machine learning, automated medical device surveillance, and natural language processing. His work focuses on studying how best to leverage large electronic health record data for discovery of risk prediction using both structured data and NLP-derived data, as well as conducting automated adverse event surveillance in medical devices.

    Lucila Ohno-Machado, MD, MBA, PhD is Associate Dean for Informatics and Technology, and the founding chair of the Health System Department of Biomedical Informatics at UCSD. Dr. Ohno-Machado directs the patient-centered Scalable National Network for Effectiveness Research funded by PCORI (and previously AHRQ), a clinical data research network with over 24 million patients and 14 health systems, as well as the NIH/BD2K-funded Data Discovery Index Consortium. She was the director of the NIH-funded National Center for Biomedical Computing iDASH (integrating Data for Analysis, ‘anonymization,’ and Sharing) based at UCSD with collaborators in multiple institutions.

    Daniella Meeker, PhD is an Assistant Professor in the Department of Preventive Medicine and Director of the SC-CTSI Informatics Program. She leads the Keck-CHLA Research Data Warehousing Program and is the faculty sponsor of the Los Angeles Department of Health Services Informatics Core. She completed her PhD in Computation and Neural Systems at Caltech and Post-Doctoral training in Health Economics at the RAND Corporation.
