Secure Sharing Scheme of Sensitive Data in the Precision Medicine System

Abstract: Numerous industries, especially the medical industry, are likely to exhibit significant developments in the future. Ever since the announcement of the precision medicine initiative by the United States in 2015, interest in the field has considerably increased. The techniques of precision medicine are employed to provide optimal treatment and medical services to patients, in addition to the prevention and management of diseases via the collection and analysis of big data related to their individual genetic characteristics, occupation, living environment, and dietary habits. As this involves the accumulation and utilization of sensitive information, such as patient history, DNA, and personal details, its implementation is difficult if the data are inaccurate, exposed, or forged. Privacy is also a concern because massive amounts of data are collected; hence, ensuring the security of information is essential. Therefore, it is necessary to develop methods of securely sharing sensitive data for the establishment of a precision medicine system. An authentication and data sharing scheme is presented in this study on the basis of an analysis of sensitive data. The proposed scheme securely shares sensitive data of each entity in the precision medicine system according to its architecture and data flow.


Introduction
Owing to recent advancements in internet technology, all objects are connected through the Internet of Things (IoT) in various industries, leading to the evolution of increasingly intelligent societies. Numerous industries have emerged, of which the medical domain is of great significance; therefore, various studies are being conducted in this field [Liu, Li, Qu et al. (2017)]. In particular, subsequent to the announcement of the precision medicine initiative (PMI) by the United States in 2015 [Hudson, Lifton and Patrick-Lake (2015)], the global interest in precision medicine has increased considerably. In the medical field, genetic, clinical, and lifestyle data are required to implement precision medicine. Thus, sensitive data, such as patient history, deoxyribonucleic acid (DNA), and personal information, are considered to be key data. The success of efforts to collect a broad spectrum of patient information depends on broad public support and willingness to participate [Sankar and Parker (2017)]. As open science gains increasing support, the enormous benefits of making data collected and created for research purposes available for secondary uses are being recognized. However, there are also challenges relating to the collection, storage, and re-use of research data [Xafis and Labude (2019)]. In the near future, much larger and more complex datasets for precision medicine will be generated [Qian, Zhu and Hoshida (2019)]. However, publishing data may divulge sensitive data of individuals. Currently, many existing privacy protection schemes cannot balance utility and protection. Accordingly, research on measures to protect privacy in various fields is ongoing [Gu, Yang and Yin (2018); He, Zeng, Xie et al. (2017); Min, Yang, Wang et al. (2019); Yin, Shi, Sun et al. (2019)].
This is where the discussion of privacy and techniques often ends in the scientific health literature, even though internet-related technologies have made privacy a much more complex challenge with broad psychological and clinical implications [Aboujaoude (2019)]. In particular, an individual's genetic data form the bedrock of precision medicine [Beauvais and Knoppers (2020)]. Such data are recognized as sensitive for multiple social reasons, raising concerns about privacy and questions about best practices for governing access to personal genomics data [Rubin and Glusman (2019)]. In addition, technological advances require collecting and sharing massive amounts of data and thus generate concerns about privacy [Noorbakhsh-Sabet, Zand, Zhang et al. (2019)]. Patient health data are also often spread across various sources, whereas precision medicine and personalized care require access to the complete medical records [Chen, Jiang, Wang et al. (2018)]. Precision medicine data storage requirements are ever increasing, and long-term data protection schemes are becoming more complex. The assurance of sensitive data integrity has hardly been discussed yet, although sensitive data need to be secured against loss and forging [Buchmann, Geihs, Hamacher et al. (2019)]. Thus, data protection and privacy law are key determinants of the future of precision medicine [Beauvais and Knoppers (2020)]. With the advent of big data, cloud computing with protected patient privacy is expected to become a more routine analytic practice that fills the gaps in data integration. Integration of the multitudes of data generated for each individual, along with techniques tailored for big data analytics, may eventually enable us to achieve precision medicine [Qian, Zhu and Hoshida (2019)]. Thus, to implement precision medicine, studies must be conducted on the development of security techniques to protect privacy and share sensitive data.
In this paper, techniques are presented, based on related works, for securely sharing sensitive data among the entities participating in the precision medicine system. Section 2 presents the definition of precision medicine and an analysis of the infringement threats to sensitive data that can occur in a precision medicine system. The corresponding security requirements are also outlined. This section also describes the keyless signature infrastructure (KSI), which is applied to reduce the workload when sharing sensitive data that are considered to be big data. Section 3 presents an analysis of the sensitive data defined in the PMI of the United States. Section 4 details the scheme for securely sharing sensitive data, and Section 5 presents the conclusions of this study.
(AI), IoT, and cloud techniques, for storage, processing, and analysis. They require a system that provides a platform for storing and managing cloud-based genomic analysis data and enables data mining techniques to obtain meaningful results. Such a precision medicine system can be configured through entities including a cohort, which is a patient-specific group that donates the data, a data controller that collects the data, and a data processor that processes the collected data. The collected or processed data vary significantly between the different entities. They include omics/diagnostic data, clinical data, such as electronic medical record (EMR)/electronic health record (EHR), drug compliance, personal diet, wearable sensor data, environmental data, and information regarding personal preferences. These sensitive data can be classified into four types, namely healthcare, genetic, lifelog, and privacy data. Therefore, in contrast to the existing medical methods, precision medicine involves the collection and analysis of data using Information and Communication Technology (ICT). Sensitive personal information, such as genetic data, entails significant risk and potential ripple effects if disclosed to the public or misused. Moreover, an infringement of these data can result in penalties as stipulated in the patient privacy protection regulations of the Health Insurance Portability and Accountability Act (HIPAA), General Data Protection Regulation (GDPR), or Personal Information Protection Act (PIPA) [Klonoff and Price (2017)]. Owing to the economic value of the personal and medical information generated in the healthcare sector and the high profitability of cyber-attacks against medical institutions, cyber security infringements, such as Distributed Denial of Service (DDoS) attacks and ransomware, are steadily increasing.
User anxiety is also increasing owing to the handling of sensitive information such as patients' private data and disease and genetic information in cloud environments. Security can thus be viewed as a key factor in the implementation of precision medicine. Therefore, it is necessary to develop methods for securely sharing sensitive data to establish a precision medicine system.

Analysis of infringement threats to sensitive data and security requirements for a precision medicine system
This section presents an analysis of the infringement threats that may occur when sharing sensitive data that are collected and processed in a precision medicine system. The security requirements for preventing these threats are also presented.

Sensitive data infringement threats
• Threat 1: Data exposure. In a precision medicine system, when sharing data between entities, data may be exposed owing to data eavesdropping in sniffing attacks. Privacy may be compromised in such cases.
• Threat 2: Data forgery and falsification. Data may be forged or falsified due to a change in data in spoofing attacks or factors such as network errors. In such a situation, the results obtained through big data analysis may vary, and difficulties may be encountered in providing precision medicine services.
• Threat 3: Unauthorized entities. To provide precision medicine services, accurate data based on facts are required. If unknown or unreliable data are collected and processed by an unauthorized entity, the authenticity of the services may be compromised.
• Threat 4: Replay attack. If the same data are repeatedly collected owing to data retransmission, it may become difficult to process the collected data to establish accurate statistics and devise a classification system for the diseases.
• Threat 5: Repudiation. After the data are sent or received in the process of data sharing, the sending or receiving of the data may be repudiated. This may result in difficulties in providing precision medicine services.

Security requirements
• Requirement 1: Data confidentiality guarantee. To prevent the exposure of data during data sharing, an encryption technique that facilitates the sharing of data in the form of cipher text must be implemented. During this step, data confidentiality must be guaranteed by configuring a method for encrypted communication with entities that are authenticated.
• Requirement 2: Data integrity verification. To prevent data forgery or falsification, data integrity must be verified: the results obtained before and after sharing the data must be compared using a hash function. Recently, techniques such as blockchain and KSI have been used for data integrity verification.
• Requirement 3: Entity authentication. To prevent the sharing and processing of data by unauthorized entities, a mutual authentication technique must be implemented on the entities that are configured in a precision medicine system.
• Requirement 4: Data validity verification. To prevent data replay attacks, data must be validated by applying techniques such as the use of a sequence number or time stamp on the data transfer protocol.
• Requirement 5: Nonrepudiation. To prevent repudiation of data sharing, a digital signature technique must be applied in the data sharing process.
In addition, as the sensitive data used in a precision medicine system are considered to be big data, a method of securely and efficiently processing these data must be developed. The KSI technique can reduce the data processing load by reducing the cryptographic computations for protecting the data.
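Requirements 2 and 4 can be illustrated with a minimal Python sketch. It is not part of the proposed scheme: the choice of SHA-256, the 60-second freshness window, and the names `make_message` and `Receiver` are assumptions made only for illustration.

```python
import hashlib
import hmac

MAX_SKEW_SECONDS = 60  # hypothetical freshness window, not taken from the paper


def digest(payload: bytes) -> str:
    """Return the SHA-256 digest of the payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def make_message(payload: bytes, seq: int, now: float) -> dict:
    """Bundle a payload with its digest, a sequence number, and a time stamp."""
    return {"payload": payload, "digest": digest(payload), "seq": seq, "ts": now}


class Receiver:
    """Accepts a message only if it is intact, fresh, and not a replay."""

    def __init__(self) -> None:
        self.last_seq = -1

    def accept(self, msg: dict, now: float) -> bool:
        # Requirement 2: recompute the hash after sharing and compare it
        # with the hash computed before sharing.
        if not hmac.compare_digest(digest(msg["payload"]), msg["digest"]):
            return False
        # Requirement 4: reject stale time stamps and repeated sequence numbers.
        if abs(now - msg["ts"]) > MAX_SKEW_SECONDS:
            return False
        if msg["seq"] <= self.last_seq:
            return False
        self.last_seq = msg["seq"]
        return True
```

A bare digest only detects changes when the digest itself cannot be altered in transit, so in practice it would travel inside the encrypted or signed message required by Requirements 1 and 5.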

Keyless signature infrastructure
According to a recent survey conducted among the top management teams of medical institutions, approximately 89% of the respondents stated that data integrity is an important issue in precision medicine in terms of decision-making regarding data utilization when analyzing sensitive data on a big-data scale [Safavi and Kalis (2018)].
Thus, the data integrity assurance technique can be considered a key technique in providing precision medicine services. This section presents an analysis of KSI as a data integrity assurance technique [Buldas, Kroonmaa and Laanoja (2013)]. KSI was developed by Guardtime, a security company based in Estonia. It can be used as a replacement for the existing public key infrastructure (PKI) signatures. The term "keyless" in KSI does not mean that no cryptographic keys are required during the signature generation process; rather, keys are still required for authentication, but the signatures can be reliably generated and verified without assuming the continued secrecy of keys. Keyless signatures perform signer identification and integrity protection separately and are implemented as multisignatures. KSI-based research is therefore being conducted in various fields [Ra and Lee (2018); Mylrea, Gourisetti, Bishop et al. (2018)]. The signing process for the data is detailed below.
Step 1 Hashing: The data to be signed are hashed, and the hash values are used to represent the data in the rest of the process.
Step 2 Aggregation: A global temporary per-round hash tree is generated to represent all the data signed during one round.
Step 3 Publication: The root hash values of the aggregation trees from each round are collected into a perpetual hash tree, known as a hash calendar, and the root hash value of this tree is published as a trust anchor.
An infrastructure is established to implement such signature processes in practice. It consists of a hierarchy of aggregation servers that generate hash trees every round through collaboration. It comprises an aggregation network, a core cluster, and a gateway.
• Aggregation network. An aggregator is a system component that creates hash trees from the received requests and passes the root hash values to upstream aggregators. Further, upon receiving a response, the aggregator delivers the response to the child aggregators. As each aggregator has its own reserved spot in the hash tree, the servers involved in the creation of a specific signature token can be proved.
• Core cluster. The core cluster comprises top-level aggregators from each round. It is responsible for creating the hash calendar and propagating and synchronizing it through the aggregation network. To verify the integrity of the root hash values of the calendar, they are archived and distributed to verification servers through the archiving and caching layers. In addition, the roots of the intermediate aggregation trees are stored only in relevant signature tokens. During this process, the gateways copy their calendars from the cache servers using the hypertext transfer protocol, and these copies are then used for signature token verification.
• Gateway. The gateway operates as a protocol adapter that accepts the requests of an application and sends them to the designated aggregators. The first level of aggregation occurs at a gateway node. The gateway uses an extender service to validate the signature token.
The process of using the KSI is illustrated in Fig. 1.
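The hashing, aggregation, and publication steps can be sketched in Python as follows. This is a simplified illustration rather than the KSI implementation: SHA-256, the duplication of the last node on odd-sized levels, and the shape of the calendar tree are assumptions, and `aggregation_root` and `HashCalendar` are hypothetical names.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Step 1 (hashing): represent data by its SHA-256 digest."""
    return hashlib.sha256(data).digest()


def aggregation_root(leaves: list[bytes]) -> bytes:
    """Step 2 (aggregation): root of a per-round hash tree over hashed requests."""
    if not leaves:
        raise ValueError("a round needs at least one request")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


class HashCalendar:
    """Step 3 (publication): collect round roots and derive a trust anchor.

    Assumption: the calendar is modeled as a plain hash tree over all
    round roots seen so far, which simplifies the real calendar structure.
    """

    def __init__(self) -> None:
        self.round_roots: list[bytes] = []

    def publish(self, round_root: bytes) -> bytes:
        self.round_roots.append(round_root)
        return aggregation_root(self.round_roots)
```

Changing any signed record changes its leaf hash, the round root, and therefore the published anchor, which is the property that lets the anchor act as evidence of integrity.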

Figure 1: Keyless signature infrastructure
Step 1 The hash of the data to be signed is first computed by the application, and a request is sent to the gateway that provides services to the user.
Step 2 The received requests are aggregated during the period (round) by the gateway that received the request, and the aggregate request is sent to the upstream aggregation cluster to request the top hash value.
Step 3 The requests are aggregated through multiple layers of aggregators, and a globally unique top hash value is generated by the core cluster.
Step 4 The responses consisting of a verifiable hash tree path are sent back through the aggregation layer.
Step 5 The top hash values for each period are collected into the hash calendar archive layer and distributed through the calendar cache layer to the extender service, which is co-located with the gateway host.
Step 6 The application utilizes an extender service to verify the signatures.
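Steps 4 and 6 rely on a verifiable hash tree path. The following Python sketch shows how such a path could be produced for one request and checked against the top hash value. It assumes SHA-256 and a binary tree that duplicates the last node on odd-sized levels; `build_levels`, `hash_path`, and `verify` are hypothetical names, not KSI API calls.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, from the hashed requests up to the top hash."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = list(levels[-1])
        if len(cur) % 2 == 1:
            cur.append(cur[-1])  # assumption: pad odd levels with the last node
        levels[-1] = cur  # keep the padded level so every node has a sibling
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def hash_path(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True if the sibling is on the left."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(leaf: bytes, path: list[tuple[bytes, bool]], top_hash: bytes) -> bool:
    """Recompute the top hash from a leaf and its path, then compare."""
    node = leaf
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == top_hash
```

The path grows with the logarithm of the number of requests in a round, so verification needs only a few hash computations per record and no public-key operations.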

Analysis of sensitive data in a precision medicine system
In this section, we analyze the sensitive data collected and processed by each entity of the precision medicine system as regulated by the PMI project of the NIH of the United States [Hudson, Lifton and Patrick-Lake (2015)]. In the analysis, the data used to provide precision medicine services were classified into core and subgroup data that referred to essential and subsidiary data, respectively.
• Individual demographics and contact information. The individual demographics and contact information included twelve examples such as the participant's name, date of birth, gender, race, occupation, contact information, and income. These data were provided by the research participants and medical service providers and comprised core data that were collected and processed in the precision medicine system.
• Terms of consent and personal preferences for participation in the project. This information included details of the project participation options such as receiving the results of the research. These data were provided by the research participants and were core data.
• Self-reported measures. These comprised self-reported measurement information that included six examples, namely pain scales, disease-specific symptoms, functional capabilities, quality of life and well-being measures, gender identity, and family health history. These data were provided by the research participants and comprised core data as well as subgroup data necessary for specific research.
• Behavioral and lifestyle measures. These included information regarding behavior and lifestyle, with six examples, namely diet, physical activity, alternative therapies, alcohol consumption, smoking, and assessment of risk factors. These data were obtained from the participants of prospective or retrospective research studies and medical service providers and comprised core as well as subgroup data.
• Sensor-based observations through phones, wearables, and home-based devices. Sensor-based information obtained through mobile and home-based devices included four examples, namely location, activity monitoring, cardiac rate and rhythm, and respiratory rate. These data were obtained using mobile device sensors and commercial biomonitoring services and comprised core as well as subgroup data.
• Structured clinical data derived from EHRs. Structured clinical data derived from EHRs included four examples, namely international classification of diseases (ICD)/current procedural terminology (CPT) billing codes, clinical laboratory values, medication, and problem lists. These were core data obtained from several providers holding information regarding research participants or from direct or institutional management channels for information personally uploaded or downloaded by participants.
• Unstructured and specialized types of clinical data derived from EHRs. These included three examples, namely narrative documents, images, and electrocardiogram/electroencephalogram data, that were provided by multiple providers and not included in the core dataset. They were obtained by an integrated query and comprised subgroup data necessary for specific research. These types of data were only collected and processed in the precision medicine system.
• PMI baseline health examination. This included information related to three examples, namely vital signs, medication assessments, and past medical history, that was provided by the research participants interacting with the medical service providers, and it comprised core data.
• Healthcare claims data. The healthcare claims data included three examples, namely periods of insurance coverage for the patients participating in the research project and the charges and associated billing codes as received by public and private payers. These data were provided by public and private payers and pharmacy insurance coverage management organizations, and they comprised core data.
• Research-specific observations. These included four examples, namely research questionnaires, ecological momentary assessments, physical performance measures, and disease-specific monitoring. They were provided by the research participants and research organizations, and they comprised subgroup data that were necessary only for a specific study.
• Biospecimen-derived laboratory data. The biospecimen-derived laboratory data included eight examples such as genomics, proteomics, and cell-free DNA. These data were provided by the research participants, genetic information providers, and outsourced laboratories, and they comprised core data.
• Geospatial and environmental data. Geospatial and environmental data included seven examples, including weather, air quality, and food deserts. They were provided by the statistics of public and private information and comprised core as well as subgroup data.
• Other data. Other data included information obtained through social networks that was based on the statistics of public and private information, and it comprised subgroup data.
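The core/subgroup classification described above can be summarized as a small lookup structure. The Python sketch below only restates the classification given in this section; the identifier names and the use of a `Flag` enumeration are illustrative assumptions.

```python
from enum import Flag, auto


class Scope(Flag):
    CORE = auto()      # essential data
    SUBGROUP = auto()  # subsidiary data needed for specific research


BOTH = Scope.CORE | Scope.SUBGROUP

# Keys are hypothetical short names for the PMI data types listed above.
PMI_DATA_SCOPE = {
    "demographics_and_contact": Scope.CORE,
    "consent_and_preferences": Scope.CORE,
    "self_reported_measures": BOTH,
    "behavioral_and_lifestyle": BOTH,
    "sensor_based_observations": BOTH,
    "structured_clinical_ehr": Scope.CORE,
    "unstructured_clinical_ehr": Scope.SUBGROUP,
    "baseline_health_examination": Scope.CORE,
    "healthcare_claims": Scope.CORE,
    "research_specific_observations": Scope.SUBGROUP,
    "biospecimen_laboratory": Scope.CORE,
    "geospatial_environmental": BOTH,
    "other_social_network": Scope.SUBGROUP,
}


def types_in_scope(scope: Scope) -> list[str]:
    """List the data types that belong to the given scope."""
    return sorted(name for name, s in PMI_DATA_SCOPE.items() if scope in s)
```

For example, `types_in_scope(Scope.SUBGROUP)` lists the data types that would be requested only for specific studies.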

Proposed scheme
In this section, we develop a scheme for securely sharing sensitive data in a precision medicine system. To define the system and sensitive data to which the data sharing scheme is applied, we first propose a system architecture and data flow to provide the precision medicine service. The sensitive data, required to be restructured according to this scheme, are collected and processed. Next, the data structure and contents of the PMI project are restructured into the four major categories defined in this study, namely healthcare, genetic, lifelog, and privacy data, and mapped to the flow of the established precision medicine system. We then propose a secure data sharing scheme using the KSI-based technique.

Precision medicine system architecture and data flow process
In this section, the precision medicine system is established, and a process of data flow is proposed, according to which, the restructured sensitive data are collected and processed.
Government has an important role in helping to fund primary research in precision medicine and precision public health, defining and optimizing measures of health care quality and security, and ensuring data privacy standards and protections, interoperability, and integration with surveillance systems. Government partnership and collaboration with the non-profit and private sectors can optimize precision medicine and precision public health for the benefit of the global population [Whitsel, Wilbanks, Huffman et al. (2019)]. The proposed system is built around a centralized precision medicine data center and is divided into healthcare, genetic, lifelog, and privacy data, depending on the type of sensitive data used. Core techniques were applied for data management, processing, and security in the precision medicine system environment. The detailed definitions of the entities that comprise the precision medicine system are as follows:
• Cloud-based precision medicine data center (C-PMDC). A cloud-based centralized data center collects and processes the sensitive data from entities that constitute the precision medicine system. The analyzed data obtained from the entities can be used to provide precision medicine services.
• Healthcare data area. This area includes the cohort participating in the precision medicine projects and entities that provide and demand healthcare data.
• Cohort. This comprises a group that shares characteristics related to a specific subject investigated in the precision medicine projects. The cohort receives medical treatment and prescriptions from healthcare service providers and fills out self-reported and behavioral data. It creates lifelog data and provides them to the entities included in the lifelog data area.
• Healthcare service providers (HSPs). These include institutions such as hospitals and pharmacies that offer healthcare services. They provide clinical data obtained from the cohort through treatment and prescriptions and the collected healthcare data such as self-reported and behavioral data to the cloud-based cohort data center.
• Cloud-based cohort data center (C-CDC). This is a cloud-based data center that collects and manages data from the HSPs. Healthcare data related to cohorts provided by the HSPs are categorized according to specific subjects and the characteristics of each cohort. Therefore, for each cohort, the data center demands and uses clinical data, along with self-reported and behavioral data, from the HSPs. The provided healthcare data are then shared with the C-PMDC. In the United States, the NIH may assume this role.
• National institution curation resources (NICR). This provides the collected healthcare data, including health examination data related to the cohort, to the C-PMDC. In the United States, the Food and Drug Administration (FDA) may assume this role.
• Insurance institution (II). This provides the collected healthcare data, including claims data related to the cohort, to the C-PMDC. In the United States, the Center for Medicare and Medicaid Innovation (CMMI) and private insurance companies may assume this role.
• Genetic data area. This area consists of entities that provide genetic information.
• National institution genetic resources (NIGR). This provides genetic data related to the cohort to the precision medicine data center. In the United States, the National Cancer Institute (NCI) may assume this role.
• Lifelog data area. This area comprises entities that provide the lifelog data generated by cohorts and environmental factors.
• Healthcare information technology (IT) companies. These provide lifelog data, including sensor information generated from mobile and wearable devices of individuals in the cohorts, to the C-PMDC.
• Geo-spatial data center (GDC). This provides lifelog data, including geo-spatial data generated by environmental factors, to the C-PMDC. In the United States, the National Weather Service (NWS) may assume this role.
• Social network service (SNS) companies. These provide lifelog data, including SNS data generated from social media accounts (such as Twitter and Facebook) of individuals in the cohorts, to the C-PMDC.
• Data management and processing techniques. Six core techniques for managing and processing data are required to establish a precision medicine system: the IoT, cloud, big data, mobile, genetic analysis, and AI.
• Data security techniques. The data security techniques that must be implemented in a precision medicine system include standards, information security, data security policy principles and frameworks, and privacy policy principles and frameworks as defined by the Office of the National Coordinator for Health Information Technology (ONC). Through privacy-related techniques, sensitive data types related to personal information, collected and processed in a precision medicine system, are strictly regulated.
Fig. 2 shows a type of sensitive data flow in a precision medicine system that is composed of the defined entities.

Reestablishment of sensitive data of a precision medicine system
This section presents the reestablishment of the data structure and content of the PMI project into four categories, namely healthcare, genetic, lifelog, and privacy data. Fig. 3 illustrates the reestablished data for a precision medicine system.
• Healthcare data. Healthcare data are reestablished into clinical, self-reported and behavioral, PMI baseline health examination, and healthcare claims data.
• Clinical data. Clinical data consist of two datasets. The first is reestablished into nine examples, including EHRs, disease data, and medications. The second is reestablished into two examples, namely oral statement and other images data. The data are provided by the HSPs and include the core data demanded by the C-CDC and subgroup data required for specific studies.
• Self-reported and behavioral data. These are reestablished into twelve examples, including pain scales, gender identity, alcohol, and smoking. The data are provided by the HSPs and comprise core data demanded by the C-CDC and subgroup data required for specific studies.
• PMI baseline health examination data. The data are provided by the NICR providers and comprise core data demanded by the C-PMDC.
• Healthcare claims data. Claims data comprise a dataset that is reestablished into four examples, including periods of coverage, charges as received by public and private payers, and associated billing codes. The data are provided by the II and comprise core data demanded by the C-PMDC.
• Genetic data. Genetic data comprise a dataset that is reestablished into nine examples including DNA, proteomics, and histopathology. These data are provided by the NIGR and comprise core data demanded by the C-PMDC.
• Lifelog data. Lifelog data comprise a dataset that is reestablished into sensor data, geo-spatial data, and SNS data.
• Sensor data. Sensor data comprise a dataset that is reestablished into two examples, namely location and physical activity monitoring. These data are provided by IT companies and comprise core data demanded by the C-PMDC and subgroup data necessary for specific research.
• Geo-spatial data. Geo-spatial data comprise a dataset that is reestablished into seven examples including weather, air quality, and food deserts. These data are provided by the GDC and comprise core data demanded by the C-PMDC and subgroup data necessary for specific research.
• SNS data. SNS data are reestablished into five examples including location, emotional, and spirituality data. These data are provided by SNS companies and comprise subgroup data, demanded by the C-PMDC, that are necessary only for specific research.
• Privacy data. Privacy data comprise a dataset that is reestablished into demographics and consent data. The demographics data are reestablished into twelve examples including the name, contact details, occupation, and race of the patient. The consent data are reestablished into three examples including fine-grained consent for options to participate and receive the research results. In the case of privacy data, data providers and demanders are not specified because the data are collected and processed by all entities constituting the precision medicine system. Privacy data protection follows the data protection technique specified by the standard authority for medical information techniques.
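The provider and demander assigned to each reestablished dataset can be written down as a routing table. The Python sketch below restates the assignments given in this section; the key names, the `Route` tuple, and the `is_expected_flow` helper are hypothetical and only illustrate how a data center might reject a transfer that does not match the defined data flow.

```python
from typing import NamedTuple


class Route(NamedTuple):
    category: str  # healthcare, genetic, or lifelog
    provider: str  # entity that supplies the dataset
    demander: str  # entity that demands the dataset


# Privacy data are omitted because their providers and demanders are
# not specified: all entities collect and process them.
ROUTES = {
    "clinical": Route("healthcare", "HSP", "C-CDC"),
    "self_reported_behavioral": Route("healthcare", "HSP", "C-CDC"),
    "baseline_health_examination": Route("healthcare", "NICR", "C-PMDC"),
    "claims": Route("healthcare", "II", "C-PMDC"),
    "genetic": Route("genetic", "NIGR", "C-PMDC"),
    "sensor": Route("lifelog", "IT company", "C-PMDC"),
    "geo_spatial": Route("lifelog", "GDC", "C-PMDC"),
    "sns": Route("lifelog", "SNS company", "C-PMDC"),
}


def is_expected_flow(dataset: str, provider: str, demander: str) -> bool:
    """Check whether a transfer matches the provider/demander pair defined for a dataset."""
    route = ROUTES.get(dataset)
    return route is not None and (route.provider, route.demander) == (provider, demander)
```

The healthcare data collected by the C-CDC are subsequently shared with the C-PMDC; that second hop is not modeled in this table.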

Proposal of scheme for secure sharing of sensitive data in a precision medicine system
In this section, a scheme is proposed for securely sharing sensitive data in a precision medicine system. The entities that constitute the scheme are listed in Tab. 1. The scheme consists of three phases, namely the registration phase, authentication phase, and data transfer phase.
Fig. 4 presents the proposed registration phase for performing authentication between the entities constituting the precision medicine system.

Figure 4: Registration phase in the scheme
Step 1: <Provider sends 1 to Demander> The Provider enters , and generates a random nonce with (•). Accordingly, ( || ) is calculated, and 1 = { , ( || )} , encrypted with the Demander's public key, is sent to the Demander, along with the Provider's identity .
Step 2: <Demander checks 1 and sends 2 to KSI server> The Demander obtains plain text 1 by running { { , ( || )}} and decrypting cipher text 1 received from the Provider with its private key. The identity of the Provider is then checked, and , which is used as the authentication value, is calculated, as shown below, using its own identity and secret value . = ( || || ) (2) Then, ( || ) , which is obtained by decrypting the cipher text of the Provider, and the authentication value are stored. Next, the Demander sends 2 = { , , }, which denotes the encryption of the Provider's identity , Demander's own identity , and the authentication value with the public key of KSI server, to the KSI server.
Step 3: <KSI server checks P 2 and sends C 3 to Demander> The KSI server obtains plain text P 2 by executing D PR KSI {E PU KSI {ID Pro , ID Dem , ρ}}, that is, by decrypting cipher text C 2 sent from the Demander with its private key. The identity of the Demander is then checked, and a first authentication value is calculated using the KSI server's own identity ID KSI and secret value. A second authentication value is then calculated, as shown below, by using ρ, which is obtained by decrypting the Demander's cipher text.
= ( || ) (5)
= ⨁ (6)
The calculated authentication value is then stored, and C 3 = E PU Dem {ID Pro , ID KSI , }, which is the encryption of the Provider's identity ID Pro , the KSI server's own identity ID KSI , and the authentication value with the public key of the Demander, is sent to the Demander.
Step 4: <Demander checks P 3 and sends C 4 to Provider> The Demander obtains plain text P 3 by executing D PR Dem {E PU Dem {ID Pro , ID KSI , }}, that is, by decrypting cipher text C 3 sent from the KSI server with its private key. The identity of the Provider, ID Pro , and that of the KSI server, ID KSI , are then checked. Next, φ, which is used as the authentication value, is calculated as shown below using ( || ), stored in Step 2, and the authentication value obtained by decrypting the KSI server's cipher text.
φ = ( || ) ⨁ (9)
Then, C 4 = E PU Pro {ID Dem , φ}, which denotes the encryption of the Demander's own identity ID Dem and authentication value φ with the public key of the Provider, is sent to the Provider.
Step 5: <Provider checks P 4 and the end of the registration phase> The Provider obtains plain text P 4 by executing D PR Pro {E PU Pro {ID Dem , φ}}, that is, by decrypting cipher text C 4 sent by the Demander with its private key. The Demander's identity ID Dem is then checked, and the authentication value φ is stored to end the registration phase.
Fig. 5 presents the proposed authentication phase to perform authentication between the entities constituting the precision medicine system.
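The chain of values built in Steps 1 to 5 of the registration phase can be illustrated with a minimal sketch. Several details are assumptions, because the corresponding symbols are not legible in the text: the one-way function is taken to be SHA-256, the operands of Eq. (2) are taken to be the two identities and the Demander's secret, Eq. (6) is taken to combine the KSI server's value with ρ by XOR, and all function and variable names are hypothetical. The public key encryption of each message is omitted.

```python
import hashlib
import secrets


def h(*parts: bytes) -> bytes:
    """One-way function over concatenated parts (SHA-256 is an assumption)."""
    digest = hashlib.sha256()
    for part in parts:
        # Length-prefix each part so that ("ab", "c") and ("a", "bc") differ.
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length values."""
    if len(a) != len(b):
        raise ValueError("XOR operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def register(id_pro, id_dem, id_ksi, provider_input, demander_secret, ksi_secret):
    """Derive the registration values; every name here is hypothetical."""
    nonce = secrets.token_bytes(16)
    # Step 1: the Provider's value over its input and a random nonce.
    provider_value = h(provider_input, nonce)
    # Step 2, Eq. (2): the Demander's authentication value rho (operands assumed).
    rho = h(id_pro, id_dem, demander_secret)
    # Step 3, Eqs. (5) and (6): the KSI server's value, combined with rho (assumed).
    ksi_base = h(id_ksi, ksi_secret)
    ksi_value = xor(ksi_base, rho)
    # Step 4, Eq. (9): phi binds the Provider's value to the KSI server's value.
    phi = xor(provider_value, ksi_value)
    return {
        "provider_value": provider_value,
        "rho": rho,
        "ksi_value": ksi_value,
        "phi": phi,
    }
```

Because XOR is its own inverse, any party that holds φ and the Provider's value can recover the KSI server's value, which is the property the authentication phase relies on.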

Figure 5: Authentication phase in the scheme
Step 1: <Provider sends ℎ message to Demander> The Provider sends its own identity ID Pro and a request message ℎ to the Demander to request for authentication.

Step 2: <Demander checks the identity and sends C 1 to Provider> The Demander checks the received Provider's identity ID Pro and sends C 1 = E PU Pro {ID Dem , }, which denotes the encryption of its identity ID Dem and a time stamp with the Provider's public key, to the Provider.
Step 3: <Provider checks P 1 and sends C 2 to Demander> The Provider obtains plain text P 1 by executing D PR Pro {E PU Pro {ID Dem , }}, that is, by decrypting cipher text C 1 sent from the Demander with its private key. Then, the Demander's identity ID Dem is checked, and the authentication value ω is calculated using ( || ), calculated in Step 1 of the registration phase, and time stamp value ′. Then, C 2 = E PU Dem {ID Pro , φ, ω, ′}, which denotes the encryption of the Provider's identity ID Pro , authentication value φ stored in Step 5 of the registration phase, authentication value ω, and time stamp value ′ calculated in this phase with the public key of the Demander, is sent to the Demander.
Step 4: <Demander checks P 2 and sends C 3 to KSI server> The Demander obtains plain text P 2 by executing D PR Dem {E PU Dem {ID Pro , φ, ω, ′}}, that is, by decrypting cipher text C 2 sent from the Provider with its private key. Then, the Provider's identity ID Pro and the validity of time stamp value ′ are checked. A search on ID Pro is performed to obtain ( || ), which was stored in Step 2 of the registration phase. Using this, the verification value ω′ is calculated. The calculated ω′ and the ω obtained by decrypting the Provider's cipher text are compared for verification. The verification value ′ is then calculated, as shown below, using the Provider's authentication value and ( || ).
′ = ⨁ ( || ) (18)
Then, C 3 = E PU KSI {ID Pro , ID Dem , ′, ρ, }, which denotes the encryption of the Provider's identity ID Pro , Demander's identity ID Dem , calculated verification value ′, authentication value ρ stored in Step 2 of the registration phase, and time stamp with the public key of the KSI server, is sent to the KSI server.
Step 5: <KSI server checks P 3 and sends C 4 to Demander> The KSI server obtains plain text P 3 by executing D PR KSI {E PU KSI {ID Pro , ID Dem , ′, ρ, }}, that is, by decrypting cipher text C 3 sent from the Demander with its private key. The Provider's identity ID Pro and Demander's identity ID Dem are then checked. Using the verification value ′ and authentication value ρ obtained through decryption, the verification value ′ is calculated.
Step 7: <Provider checks P 5 and P 6 , and the end of the authentication phase> The Provider obtains plain texts P 5 and P 6 by decrypting cipher text C 5 sent from the Demander with its private key. The Demander's identity ID Dem is then checked, and using the authentication value and verification value ′, the verification value ′′ is calculated as follows:
′′ = ⨁ ′ (26)
Then, using ( || ) and the calculated verification value ′′, the verification value φ′ is calculated as follows:
φ′ = ( || ) ⨁ ′′ (27)
The calculated φ′ and the authentication value φ stored in Step 5 of the registration phase are compared for verification, and the authentication phase is ended.
Fig. 6 presents the proposed data transfer phase required to authenticate the entities that form a precision medicine system. The involved procedure is detailed below.
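The checks performed in the authentication phase can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the equation defining ω is not legible in the text, so ω is assumed here to be a SHA-256 hash over the Provider's registration value and the time stamp; the freshness window `max_delay` and all function names are hypothetical; the KSI server's intermediate computation and the encryption of each message are omitted.

```python
import hashlib
import hmac


def h(*parts: bytes) -> bytes:
    """One-way function over length-prefixed parts (SHA-256 is an assumption)."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()


def xor(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("XOR operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def is_fresh(sent_at: int, received_at: int, max_delay: int) -> bool:
    """Accept a message only if it is at most max_delay seconds old."""
    return 0 <= received_at - sent_at <= max_delay


def provider_omega(provider_value: bytes, t_prime: int) -> bytes:
    """Step 3: authentication value omega (its exact form is assumed)."""
    return h(provider_value, t_prime.to_bytes(8, "big"))


def demander_verify(stored_value, phi, omega, t_prime, now, max_delay):
    """Step 4: check the time stamp and omega, then apply Eq. (18)."""
    if not is_fresh(t_prime, now, max_delay):
        return None  # stale or replayed message
    omega_check = provider_omega(stored_value, t_prime)
    if not hmac.compare_digest(omega_check, omega):
        return None  # the Provider's value does not match the registered one
    return xor(phi, stored_value)  # recovers the value bound into phi


def provider_final_check(provider_value, recovered, stored_phi) -> bool:
    """Step 7: recompute phi' as in Eq. (27) and compare with the stored phi."""
    phi_check = xor(provider_value, recovered)
    return hmac.compare_digest(phi_check, stored_phi)
```

The comparisons use `hmac.compare_digest` so that the time taken does not depend on where two values first differ.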

Data transfer phase
Step 1: Access to the KSI server is provided to the Demander and the Provider. In addition, access to the Demander is provided to the Provider.
Step 2: <Provider sends _ to KSI server> For the KSI-based signature required to transfer the data value , the Provider uses a secret key for the personal signature and KSI to compute the following: The computed KSI-based signature values { _ ( ( ))} are then sent to the KSI server. Step 4 and the token value _ ( (( ))′ received from the KSI server in Step 6. It then verifies the data value using the signature value ( ) and token value _ ( (( ))′ . After a valid verification, the data value is stored, and the data transfer phase is terminated.

Security analysis
In this section, the proposed scheme is checked against the security requirements identified in Section 2.2 for each infringement threat to the sensitive data of the precision medicine system.
• Data exposure (Threat 1)-Data confidentiality guarantee (Requirement 1)
The data sent or received between the entities during the authentication or data-transfer phase can be exposed through a sniffing attack by an attacker. To prevent such attacks, the proposed scheme applies public key cryptography that uses {PU, PR} on each instance of data sent or received between entities during the authentication or data-transfer phase. Hence, the data sent or received represent a ciphertext {C = E PU {P}}. As only the valid entities encrypt and decrypt data for the authentication or data-transfer phase, data confidentiality is guaranteed.
• Data forgery and falsification (Threat 2)-Data integrity verification (Requirement 2)
Data can be forged or falsified by a forgery or falsification attack on the data sent and received between entities during the data-transfer phase, or by errors that occur during the data communication process. This leads to reliability issues with respect to the data collected from the precision medicine system. As a preventive measure, the proposed scheme utilizes authentication values { , , , , } during the registration and } is performed according to the Merkle hash tree. When the value obtained in this process is verified to be valid, the integrity of the signature value is also verified. Therefore, data forgery or falsification threats can be prevented by verifying the integrity of the data sent by the provider.
• Unauthorized entities (Threat 3)-Entity authentication (Requirement 3)
If data are shared by an unauthorized entity, reliability issues arise with respect to the data collected from the precision medicine system, and the availability of precision medicine services is infringed.
As a preventive measure, the proposed scheme performs mutual authentication in the registration and authentication phases. During the registration phase, an entity is registered using the authentication values generated with nonce values that are known only to the provider, authentication values generated by secret values that are known only to the demander, and secret values that are known only to the KSI server. Finally, the provider stores the authentication values {φ} generated with the initial authentication values during the registration phase. During the authentication phase, each entity is authenticated. Mutual authentication is carried out by comparing the final authentication values {φ} stored during the registration phase with the values {φ′} that are computed from the other authentication values generated during the authentication phase. Thus, threats from an unauthorized entity can be prevented.
• Replay attack (Threat 4)-Data-validity verification (Requirement 4)
By launching a replay attack on the data sent and received during the authentication or data-transfer phase, an attacker can threaten the authentication availability or make the precision medicine system collect data that are duplicated unnecessarily. Assuming that an attacker intercepts the ciphertexts sent or received between entities, the proposed scheme applies time stamps on the data shared between entities to validate the data { ′ ≤ ∆ }. Accordingly, a replayed message is rejected because its time stamp is no longer valid. In the data-transfer phase, the data can be protected from replay attacks using the time stamp values {Time: day of the week, month, day, HH:MM:SS, standard time, year}, aggregated or published during the data signature process via the KSI-based technique.
• Repudiation (Threat 5)-Nonrepudiation (Requirement 5)
The provider or the demander participating in the precision medicine system may deny having shared data. To prevent repudiation, the provider performs { ( ) ← ( ( ))} to generate signature values on the data using private keys during the data-transfer phase. The signature values { _ ( ( )} are then registered on the KSI server to use the KSI-based technique, and the corresponding signature token values { _ ( ( ))} are received from the KSI server. By sending the signature and token values with the data, non-repudiation is achieved against the provider. Upon receiving the data from the provider, the demander receives the signature token values { _ ( ( ))′} that correspond to the data from the KSI server. Then, the demander performs integrity verification (Requirement 2). As the KSI server owns a log that keeps track of the data receipts at this point, non-repudiation is achieved against the demander as well.
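The Merkle hash tree verification referred to under Requirement 2, which also underlies the signature tokens of Requirement 5, can be illustrated with a generic sketch. It is not the KSI server's interface: the function names are hypothetical, SHA-256 is an assumed hash function, and the leaf and node prefixes and the duplication of the last node on odd-sized levels are simplifying assumptions. The sketch builds a tree over hashed records, extracts the hash chain for one record, and recomputes the root from that chain.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    """Hash a record as a leaf (the 0x00 prefix separates leaves from nodes)."""
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """Hash two child digests into their parent digest."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(leaves):
    """Return every tree level, from the leaves up to the single root."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        if len(current) % 2:
            current = current + [current[-1]]  # pad odd levels (assumption)
        levels.append(
            [node_hash(current[i], current[i + 1]) for i in range(0, len(current), 2)]
        )
    return levels


def hash_chain(levels, index):
    """Collect the sibling digests on the path from leaf `index` to the root."""
    chain = []
    for level in levels[:-1]:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        chain.append((padded[sibling], sibling < index))
        index //= 2
    return chain


def verify(leaf: bytes, chain, root: bytes) -> bool:
    """Recompute the root from a leaf and its hash chain, then compare."""
    current = leaf
    for sibling, sibling_is_left in chain:
        if sibling_is_left:
            current = node_hash(sibling, current)
        else:
            current = node_hash(current, sibling)
    return current == root
```

A verifier needs only the record, its hash chain, and a trusted root, so a modified record fails verification without the verifier holding the other records.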

Conclusions
In precision medicine, big data related to advanced science and medical services are incorporated into existing medical techniques to establish treatment objectives. This process is followed by precise targeted therapy. As the key data in this case include sensitive data, such as patient history, DNA, and personal information, data security must be ensured while the data are shared. Therefore, in this study, possible infringement threats to sensitive data in a precision medicine system were outlined, and security requirements were established. Additionally, the sensitive data in the existing PMI were categorized and reestablished according to the proposed architecture and data flow of the precision medicine system. A scheme for securely sharing sensitive data was proposed, and security analyses were performed for the various infringement threats. The sensitive data used in precision medicine are considered to be big data; therefore, to reduce the workload of sharing sensitive data in such a system, a KSI-based technique was adopted to reduce the cryptographic computations required while processing the data. The results of this study are expected to help in determining a more secure and efficient method of sharing sensitive data when establishing a precision medicine environment.