Ethical challenges related to biomedical data sharing in China: A systematic literature review

Background: With the advancement of information and communication technology, sharing biomedical data across organizations has become more feasible. However, a large part of China’s biomedical data remains “silent” or “isolated”. Methods: To identify the ethical challenges considered to hinder biomedical data sharing, we performed a systematic literature review of publications on biomedical data sharing in China. A total of 1919 publications were initially identified, 56 of which were included in the final synthesis after full-text screening. We used the International Ethical Guidelines for Health-related Research Involving Humans (CIOMS) to identify the related ethical principles and norms. Results: We observed an abundance of ethical challenges, grouped under the following five overarching themes: capacity building; equitable distribution of benefits and burdens; scientific and social value; the data subjects’ rights; and public trust and engagement. Conclusion: Based on these analyses, we find that current sharing practice needs to balance the protection of privacy and confidentiality with the promotion of data sharing. We believe privacy concerns can be attenuated by different stakeholders’ responsible participation within a principled proportionate governance model.


Introduction
Sharing data is essential for responsible biomedical research practice and for translating biomedical findings into improved diagnostics, patient care and health service planning. The Chinese State Council issued the "Guiding Opinions on Promoting and Regulating the Development of Big Data Applications for Health Care" (Guiding Opinions) on June 21, 2016. The Guiding Opinions declare that biomedical big data is a fundamental national resource and that using and sharing biomedical big data has become a national priority. [1] Although stakeholders have taken steps to promote biomedical data sharing, a large part of China's massive data cannot flow freely across research teams and borders and be converted into "big data".
The evolving landscape of biomedical data also challenges the current governance frameworks for privacy protection. To provide some understanding of current concerns about how to balance biomedical data sharing and privacy protection, we conducted a review of the literature on these topics. The review describes how biomedical data sharing in China tests current ethical principles.

Methods
We conducted a systematic review of academic, peer-reviewed literature that reported on ethical issues of data sharing in China. Our workflow consisted of the following steps: (1) choosing the methodological approach; (2) searching for publications related to ethical issues of biomedical data sharing in China; (3) selecting the retrieved documents by applying predefined inclusion and exclusion criteria; (4) abstracting and synthesizing the relevant arguments.

Methodological approach
We used the International Ethical Guidelines for Health-related Research Involving Humans (CIOMS) to identify the related ethical principles and norms, which we grouped into five items: capacity building, interoperability, scientific and social value, the data subjects' rights, and public trust and engagement. To present the results in a more reproducible way, we refrained from addressing each collected ethical problem in detail and instead summarized the results according to the dimensions of CIOMS.

Information retrieval
We searched four databases to find related research articles. English articles were mainly searched in (1) PubMed and (2) EMBASE, and Chinese articles in (1) CNKI and (2) WANFANG. Google Scholar was searched for additional sources, including grey literature. Table 2 presents the number of results returned using the search terms.
The search strings were developed by the first author (XJL) in consultation with the second author (YLC). The final search strategies were reviewed and validated by an independent librarian from Peking University Health Science Center. The search was conducted in October 2019. We developed a search strategy including three main search terms: ("biomedical data sharing" OR "clinical data sharing") AND "China" AND "ethical challenges" OR "governance", applied within the databases that we used. Other relevant search terms pertaining to these terms were identified (see Additional file 1). Articles were selected based on a set of inclusion and exclusion criteria. The identified articles were managed in EndNote.
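As an illustration only, the way the three main search terms combine can be sketched in Python. The helper name build_query is hypothetical, and treating "ethical challenges" and "governance" as one OR group is an assumption made for this sketch; the strings actually run in each database are those given in Additional file 1.

```python
def build_query(groups):
    """Join synonyms with OR inside each group, then join the groups with AND.

    Hypothetical helper for illustration; not the tool used in the review.
    """
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        # Parenthesize only groups that contain more than one synonym.
        clauses.append(f"({quoted})" if len(terms) > 1 else quoted)
    return " AND ".join(clauses)


concept_groups = [
    ["biomedical data sharing", "clinical data sharing"],
    ["China"],
    ["ethical challenges", "governance"],  # assumed to form a single OR group
]

query = build_query(concept_groups)
print(query)
```

Keeping each concept in its own group makes it easy to add the further related terms per concept without changing how the groups are combined.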

Selection and Data extraction
A flow diagram of the selection process is presented in Figure 1.
First, the literature database search resulted in a total of 1919 records. Duplicates were removed manually during title-abstract screening. A total of 788 articles were removed at this stage.
Second, full-text screening was performed on the remaining 1131 articles.
For inclusion, publications were required to satisfy the following criteria: (1) the paper is a peer-reviewed primary study, with no limits regarding the research area; (2) the paper was written and published within the past ten years (2010-2019); (3) the paper evaluates the ethical challenges related to biomedical data sharing in China; (4) the paper discusses data sharing in the biomedical and health-care domains; (5) the paper is published in English or Chinese. Studies that did not focus on the sharing of biomedical data in China were excluded. For example, publications that were limited to IT infrastructures were not deemed relevant to our research. Ultimately, we included 56 articles for final review (Table 1).
Third, to make the ethical values and principles at stake more explicit, we ordered the ethical challenges presented in the studies using the CIOMS ethical guidelines as guidance. The CIOMS ethical guidelines are regarded as universal, and their virtues and protections are essential to reliably safeguard the rights and welfare of humans. We also wanted to find out what kinds of challenges these values face in the context of Chinese culture. We read the articles and highlighted the relevant parts. We then built a synthetic frame presenting the ethical problems or challenges that appear relevant to CIOMS, and grouped the extracted ethical challenges into a system of themes.
We followed the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Fig. 1).
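The counts reported for the selection steps can be checked with simple arithmetic. The following is a minimal sketch, assuming only the three figures stated in the text (1919 records identified, 788 removed at title-abstract screening, 56 included); the function name selection_flow and the dictionary keys are hypothetical, and the number excluded at full-text screening is not reported in the text but only derived here by subtraction.

```python
def selection_flow(identified, removed_at_title_abstract, included):
    """Return the record counts at each stage of the selection process."""
    full_text_screened = identified - removed_at_title_abstract
    # Derived value: the text does not report this count explicitly.
    excluded_at_full_text = full_text_screened - included
    if full_text_screened < 0 or excluded_at_full_text < 0:
        raise ValueError("stage counts are inconsistent")
    return {
        "identified": identified,
        "removed_at_title_abstract": removed_at_title_abstract,
        "full_text_screened": full_text_screened,
        "excluded_at_full_text": excluded_at_full_text,
        "included": included,
    }


flow = selection_flow(identified=1919, removed_at_title_abstract=788, included=56)
for stage, count in flow.items():
    print(f"{stage}: {count}")
```

With these inputs the sketch reproduces the 1131 articles sent to full-text screening; the 1075 full-text exclusions are implied by the reported figures rather than stated.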

Results
Fifty-six articles met our inclusion criteria. The great majority of the included articles were published in Chinese (n=53). Our results reveal various ethical challenges related to the sharing of biomedical data in China. We created an alluvial diagram to present the results. Because RAWGraphs can process only a limited amount of data at one time, the results for documents 1-28 and 29-56 are shown in two separate graphs. The diagram has four steps. The first step is the number of articles we collected. The second step is Year. Because the size of each node is proportional to the number of rows containing that specific value, the diagram reveals a growing body of literature on biomedical data over the period 2011-2019. The largest flow of published literature is in 2018. The third step is Research Method, which contains four nodes (case study, document analysis, questionnaire survey, interview). The fourth step is Ethical Challenges, which contains five nodes: capacity building; interoperability: balancing different stakeholders' interests; data subjects' rights; scientific and social value; public trust and engagement. The largest flow is capacity building, and interoperability: balancing different stakeholders' interests ranks second. The flows among nodes in the four steps represent the number of rows sharing the same pair of values. Taking the two graphs together, most documents from 2018 address capacity building. Case studies are often associated with capacity building, while questionnaire surveys mainly focus on the data subjects' rights and on public trust and engagement.
Thematic analysis (Table 1) identified a number of ethical challenges in the five overarching domains. These categories describe a landscape of challenges that are interconnected and dynamic.

Capacity building
There is a wide spectrum of literature concerned with capacity building for sharing biomedical data. The challenge of capacity building was mainly discussed from the following three aspects.
(1) Isolated information islands. Data are fragmented and distributed across different institutions, and are often analyzed in institutional silos. [2][9][12][25] For example, most hospitals in China use their own intranets, which involve dozens of manufacturers, and each manufacturer has its own schemas (e.g., data templates). [1][5] There is no data sharing platform that can support nationwide data sharing. [18][23][43]
(2) Administrative departments' poor overall planning abilities. Biomedical data sharing projects lack long-term and integrated plans, resulting in systems with poor extensibility and interactivity. Regulatory systems were neither designed nor updated to foster widespread collaboration. It is suggested that more investment is required in platforms that can standardize and curate data into uniform formats so that data can be shared effectively. [50]
(3) Insufficient policy support for training programs in biomedical and health informatics. Many biomedical scientists and physicians are not familiar with the ideas of data science. There are not enough training programs for biomedical scientists to narrow the gap between needed and existing biomedical data science skills. [12][20][41][49]

Interoperability: balancing different stakeholders' interests
The financial, organizational and technical structures and the data resources vary considerably across regions. While some developed regions benefit from sufficient financial support and solid infrastructure, less developed regions do not have a sustainable basis for data sharing activities. Even if they collect biomedical data, they do not have the skills and expertise to analyze them. Their sharing activities may be put at risk by participation (e.g., by deepening the gap). As such, both the distribution and the amount of benefits are important. According to our review, data sharing activities in China are mainly driven by two forces. One is executive order: under the requirements of government agencies, data controllers are obliged to share their data. The other is active data sharing: data controllers take part in sharing activities of their own accord because of the great potential of biomedical big data.
For data sharing driven by executive order, although the government has issued a series of policies, China lacks specific laws and ethical guidelines to balance the competing considerations. For example, some sources state that no law clearly defines the ownership of data and benefit sharing. The party bearing responsibility is not clear either. [10][20][48] Active data sharing also faces challenges. Many articles report that there are limited incentives for researchers and physicians to share data. They treat data as private property, keep the data they have obtained confidential, and are not willing to share. [3][5][6][9] Other studies point out that sharing cannot bring benefits to researchers in the short term, and that the benefit brought by data sharing is quite limited. Sharing also makes their research more transparent, which may harm their interests. Data controllers are also concerned about the inherently unknowable variables of data sharing. Researchers may be reluctant to share data they have spent years collecting because they worry that rewards and recognition will not be shared equitably among collaborators.

Social benefits and value
Maximizing public health benefits and social value is the fundamental justification for data sharing. The typical argument is that collaboration and sharing allow for more effective analysis of massive datasets and quicker translation of research findings into clinical practice and government decisions. According to the studies we collected, however, the utilization rate of biomedical data is still too low. For instance, some report that the utility of the data is low because data quality and validity have not been established, and that raw data are large and difficult to integrate. [58] Patients' data have not been widely shared between hospitals, which results in repeated examinations and burdens patients. [11][20][28][43] Doctors are unable to analyze patients' conditions comprehensively and accurately because they lack patients' complete diagnosis and treatment histories, and these data have not been integrated into clinical diagnosis and treatment. Biomedical data have not been integrated into public health services and infectious disease monitoring. [3][8][11][20]

The data subjects' rights
In general, most included studies indicate that the data subjects' rights receive little attention and that there is a substantial risk of data abuse. Some studies state that data subjects' rights are hard to protect because the government has no specific regulation on data sharing. [22][24] Some sources state that the data subjects' rights cannot be supported by the existing administrative and legal system. Data subjects do not have ownership of their data, which easily slip out of their control. The process of using these data is not transparent, and data subjects do not even have access to their own data. The laws tend not to be robust enough to adequately address the ethical issues related to data sharing that matter to data subjects and researchers. [4][10][12][20] Many sources state that the management system cannot technically guarantee the security of data. Poor data protection systems also limit the scope of sharing. According to the "Risk Investigation Report on Sensitive Data Leakage of Medical Internet Services" conducted by Tencent, there are serious logic flaws in the third-party medical service platforms accessed by domestic 3-A-Grade hospitals. These vulnerabilities can lead to the disclosure of patient information, including name, mobile phone number, ID number, home address, registration records, inspection reports, hospital records, medical reports and payment records. The report pointed out that many sensitive ports of medical Internet assets are open and that core business assets are directly exposed to the outside world, lowering the technical barriers to illegal hacking and unauthorized access. [57] Current informed-consent procedures may have reached their limits as data-sharing demands increase. Institutions do not have a governance system to obtain authorization for future use of these data in research. Some research emphasizes that data subjects lack a sense of privacy protection: patients do not consider informed consent and privacy protection important. According to interviews with doctors in 52 hospitals in five prefecture-level cities (Guangzhou, Dongguan, Shenzhen, Foshan and Qingyuan), patients have poor health literacy and lack the ability to judge. [12] On the other hand, physicians do not treat the rights and welfare of the individuals from whom the data were collected as important.

Public trust and engagement
According to CIOMS, the participatory process for potential participants and communities involves not only the informed consent process but also the design, development, implementation and monitoring of research and the dissemination of its results. It appears that reliance on informed consent has failed to secure trustworthiness and public engagement. Trustworthiness is tightly connected to transparency, the accessibility of information about management and accountability. However, many studies point out that the data sharing process is not transparent and that data subjects cannot even obtain their own clinical data. [10][25][28] An ineffective governance system is another important reason for data subjects' poor engagement. The governance structure does not clearly outline the responsibilities of the data processors involved (e.g., for checking whether a secondary use has met the intended purpose), and the managing agency acquiesces in the use of patients' biomedical data by third parties. [4] In a questionnaire survey of 397 domestic hospital patients, some indicated a preference for sharing biomedical data with their physicians and government agencies. They were convinced that physicians and the relevant government departments were associated with the "public". When it came to biopharmaceutical companies, their willingness to share dropped significantly. The patients were not willing to share their data with other hospitals, which might be due to a significant increase in risk as data flow to another hospital. Their preferred informed consent model was a one-time authorization that is confirmed periodically. [44] IRB approval for biomedical data sharing is lagging. None of the articles we reviewed mentioned the importance of review and approval procedures by an independent REC in the context of respect for individuals. Research ethics committees are regarded as gatekeepers and can limit the occurrence of unethical data sharing activities, yet they have not been given enough attention.

Scientific and social value
The ethical justification for undertaking health-related research involving humans is its scientific and social value: the prospect of generating the knowledge and the means necessary to protect and promote people's health
  Patients have to repeat examinations because biomedical data are not shared between hospitals, resulting in waste of time and money: 1
  Data utilization is low and data have not been used for medical decision making, prevention and monitoring: 4
  Low data utilization, not used for scientific research: 1
  Poor data quality misleading patients: (1)
  Wearable device data is idle and wasted: 2
  Primary medical institutions cannot meet the needs of patients: 3

The data subjects' rights
When data are stored, institutions must have a governance system to obtain authorization for future use of these data in research
  Management system cannot guarantee the security of data: 1
  The existing legal system cannot effectively protect the rights of data owners: 7
  The government acquiesces in the development and utilization of patient data by third parties: (1)
  Patients do not have ownership of their biomedical data, the data is out of their control: 3
  Excessive collection of data: (1)
  Government departments have not made clear provisions on residents' health information sharing: 2
Custodians of the data must arrange to protect the confidentiality of the information linked to the data, by sharing only anonymized or coded data with researchers, and limiting access to the material of third parties
  Information theft and data tampering: (1)
  Doctors have little awareness of protecting patients' privacy: (1)
  Data subjects' privacy is at risk: 4
  Technology cannot guarantee the security of data and faces privacy risks: 3
  Patients have little awareness of privacy: 4
Seek and obtain consent, but only after providing relevant information and ascertaining that the potential participant has adequate understanding of the material facts
  Poor interoperability, asymmetric doctor-patient information: (3)
  Citizens have low health literacy and lack the ability to safeguard their rights: 3
  The controllers share data without obtaining informed consent from the data subjects: (4)

Public trust and engagement
Researchers, sponsors, health authorities and relevant institutions should engage potential participants and communities in a meaningful participatory process that involves them in an early and sustained manner in the design, development, implementation, design of the informed consent process and monitoring of research
  It is still in the early stage of development and cannot guarantee the trust and benefit of data subjects: (4)
  Patients are reluctant to share their data: 1
  The data is not transparent, and the data subjects cannot obtain their clinical data: (1)
Strengthening research ethics review and oversight capacity in host communities
  It is difficult for data subjects to protect their rights and find responsible parties: 2
  One-way information sharing, lack of interaction with data subjects: 2

Discussion
This systematic review of the academic literature provides an overview of the current ethical challenges related to biomedical data sharing. As we found, China is now at an early stage of biomedical data sharing and faces the technical challenges of multiple standards, multi-party collection, multi-party use and the lack of an effective integration method. Patients' diagnosis and treatment records are fragmented and distributed across various medical institutions, and lack high-quality data management and descriptors. Data governance structures have not yet caught up with the pace of policy, so many established policies are not well implemented. At present, China's medical data mainly rely on platforms of different scales, and because corresponding operational and incentive mechanisms are lacking, medical and health institutions cannot deliver data to the platforms on time. Data quality also appears to be low, which makes it hard for the data to serve as a basis for scientific decision-making.
The uneven development of information infrastructure has made data integration and sharing difficult. Because medical data resources are unevenly distributed, there is a risk of widening the gap between low-resource and high-resource areas. Accessibility can be enhanced by harmonizing data access conditions and procedures and by communicating these to stakeholders. Enthusiasm for data sharing among medical institutions is insufficient, and there are no relevant constraint or incentive mechanisms. Advancing data sharing requires coordinating the needs, responsibilities and obligations of stakeholders such as governments, medical and health institutions, and enterprises.
Although expanding data sharing has become the main demand of stakeholders, it has also met the challenge of privacy protection. Many surveys found that data subjects were not willing to share their data because of privacy considerations. [6][14] Technologies and models that currently exist in China facilitate the dissemination of data without compromising privacy. The sensitive nature of biomedical data indicates that emphasis should be given to protecting data subjects' rights in order to motivate data subjects to participate. This relies on stakeholder-informed principles and policies that ensure the needs and concerns of data subjects are addressed. According to our review, the public and the government do not pay enough attention to issues surrounding the safety of personal information.
The complex interactions between these ethical challenges can severely limit the effectiveness of policies that promote data sharing. So how can we work toward a data sharing system that is cognizant of the need to incentivize data producers to share while addressing data subjects' concerns about privacy? This requires an ethical and regulatory framework that fosters collaborative, open data sharing and strikes a balance between data sharing and privacy protection.
To overcome these obstacles, a number of innovative solutions need to be proposed. At the technical level, data standardization should be strengthened, and data subjects' rights should be authenticated with respect to the parties receiving shared data. By aggregating and analyzing clinical data, clinicians can develop earlier and more targeted treatment strategies for their patients. More investment is needed in platforms that can standardize and curate data into usable formats.
Privacy rights necessitate that prospective contributors be directly involved in data sharing. Rather than being passive providers of clinical information, members of the general public need the opportunity to become more active partners in the data sharing and research process. Government agencies should pursue a balance between privacy protection and benefit sharing that respects internationally recognized fundamental values. It is necessary to clearly allocate responsibilities and interests between departments in the process of data collection, storage, transmission and use, and to achieve orderly sharing of data. Rules should stipulate that the sharing party is responsible for protecting the privacy of the data subject, so that the scope of use of the data can be controlled. The participation of different stakeholders can better address the data sharing obstacles identified in the previous section. Data sharing is not in opposition to privacy and should be conducted in a responsible way that does not infringe on individuals' privacy rights.
Respect for data subjects' rights is represented by an abundance of principles, rules and recommendations. In protecting the rights of data subjects, the informed consent process is intended to prevent unauthorized use of a data subject's data in ways unknown to them. At the same time, data subjects' awareness of privacy is enhanced if they participate in the whole process of sharing. This implies a promotion of autonomy: the ability to be self-governing and to make decisions in data sharing in accordance with one's own values.
Enhanced autonomy of data subjects is associated with empowerment. The broad contours of China's privacy protection regime have begun to take shape in recent years. So far, the data protection system consists of the Cybersecurity Law and a handful of accompanying measures. China's Cybersecurity Law, which came into effect in June 2017, established a set of data protection provisions that apply to personal data collected over the internet. It has laid out broad principles for the protection of personal information and important data; however, implementation issues were left unresolved. Therefore, the corresponding supervisory departments should formulate specific implementation rules to fill in the gaps. The administrative agencies should develop and endorse solid sharing plans and train medical informatics talent.

Limitations
Limitations of this review include the fact that the search strategy was limited to a small number of databases. We also acknowledge the potential bias of selecting only journal articles and excluding, for example, book chapters. However, we assume that we have captured the main ethical arguments, since our results were highly saturated. All the processes, from search methodology to data collection, exclusion and analysis, followed reproducible procedures that were explicitly noted.

Conclusion
This literature review demonstrates that there are relatively few studies focusing on ethics in connection with biomedical data sharing. This review was also limited to a selection of peer-reviewed literature on the topic of biomedical data sharing. To ensure ethical and responsible data sharing and to maximize the use of available data, engagement from researchers, physicians and data subjects needs to be encouraged.