Community-based Health Data Cooperatives Towards Improving the Immigrant Community Health: A Scoping Review to Inform Policy and Practice

Abstract
Background
In the case of immigrant health and wellness, data are the key limiting factor: comprehensive national knowledge on immigrant health and health service utilisation is limited. New data and data silos are an inherent response to the growing use of technology in the collection and storage of data. The Health Data Cooperative (HDC) model allows members to contribute, store, and manage their health-related information; members are the rightful data owners and the decision-makers on data sharing (e.g. with research communities, commercial entities, government bodies).
Objective
This review scopes the literature on HDCs with the following objectives: 1) identify and describe the type of literature that is available on the HDC model; 2) describe the key themes related to HDCs; and 3) describe the benefits and challenges related to the HDC model.
Methods
We conducted a scoping review using the five-stage framework outlined by Arksey and O'Malley to systematically map the literature on HDCs using two search streams: 1) a database and grey literature search; and 2) an internet search. We included all English-language records that discussed health data cooperatives and related key terms. We used thematic analysis to collate information into comprehensive themes.
Results
Through a comprehensive screening process, we found 22 database and grey literature records and 13 internet search records. Three major themes that are important to stakeholders are data ownership, data security, and data flow and infrastructure.
Conclusions
The results of this study are an informative first step towards the study of the HDC model, or the establishment of an HDC in immigrant communities.
Key words
community health, health data, cooperative, citizen data empowerment


Introduction
Canada is becoming an increasingly multicultural society, welcoming 250,000 immigrants each year; immigrants make up 20% of the Canadian population [1]. Immigrant communities add to the economic, labor, and cultural diversity of Canada. An immigrant, as defined by Statistics Canada, is any person residing in Canada who was born outside of the country, excluding temporary foreign workers, Canadian citizens born outside Canada, and those with student working visas [2]. Comprehensive national knowledge on immigrant health and health service utilisation is too limited to allow for meaningful comparisons either within immigrant sub-groups or between immigrants and the native-born population [3]. Key knowledge gaps in immigrant health include long-term health outcomes, preventable conditions, and chronic disease outcomes, especially amongst subgroups such as refugees and non-European immigrants [4]. Immigrants comprise a relatively small portion of the population: approximately 18% of the Canadian population were born outside of the country. Meaningful comparisons and generalisable results therefore require large-scale studies that sample a sufficient number of immigrants, which studies using Statistics Canada national population health survey data sets often do not do. These data sets include the National Population Health Survey (NPHS), the Canadian Community Health Survey (CCHS), the National Longitudinal Survey of Children and Youth (NLSCY), and the General Social Survey (GSS) [5,6].
In the case of immigrant health, data are the key limiting factor. There has been a push to increase available immigrant data by linking Canadian immigrant databases to administrative databases in order to explore patterns of mortality, cancer incidence, hospitalizations, physician claims, prescription medication use, and socio-demographic factors of health [4]. However, there are still too few immigrant data to carry out meaningful comparisons either within immigrant sub-groups or between immigrants and the native-born population. Indeed, to understand how the processes of immigration and acculturation affect health, data specifically on immigrants are needed. Further, immigrant groups have culturally based understandings of health and illness, and may differ in what they consider appropriate ways to treat illness. Health data that are sensitive to these differences would allow for a deeper understanding of health in this group, and may prove fruitful for public health research, policy, and tailored treatments [7].
'Health Data Cooperatives': Are they the solution?
The health data cooperative (HDC) model is that of a health data bank established to make health data available to society. The cooperative members can contribute to, store, and manage their health-related information and are the rightful data owners and the decision-makers on data sharing (e.g. with research communities, commercial entities, government bodies). Individuals and the community at large may profit from the value of the data, financially and/or through in-kind contributions, by providing access to the data to health-related establishments, including research organisations and government services. Combining the HDC members' demographic, socioeconomic and lifestyle information through a community-owned HDC is expected to create tremendous potential for community mobilisation and empowerment [8].
Further, new data and data silos are an inherent response to the increase in technology. For example, gene sequencing gives us access to personal information that may inform the effectiveness of drugs and the risk factors for certain diseases. With the rise of smartphones, there are more than 40,000 health applications that allow for longitudinal monitoring of health without a visit to the doctor's office. These sources, along with physician offices, hospitals, and laboratories, create data silos, where the data become isolated in a physically, judicially, or intellectually defined area and are unavailable for integration across multiple platforms. The result is a lack of autonomy and empowerment in the control of data by citizens, and unnecessary costs to healthcare systems because the data are inaccessible for analysis and research. Integrating such data can be especially useful for under-represented populations such as immigrants [9].
A health data cooperative is a collective where health-related data are integrated, stored, used, and shared under the control of the cooperative members. The brackets in the following examples simply show the type of data a potential cooperative can host, along with potential stakeholders who can use the data. Most HDCs follow a similar structure: 1) members, who are any persons who wish to join a cooperative and contribute their data (e.g. health condition, lab results, hospital admission information, 23andMe, data generated from apps, doctor offices, clinic visits); 2) a safe space for data storage; and 3) data sharing (e.g. authorities, doctors, research, patient group) [10]. The HDC model has been common in Europe, with the establishment of the Federation of National HDCs. This federation shares a common IT structure and common data storage. Further, the federation benefits from political support from Swiss stakeholders and encapsulates what HDCs stand for, including being citizen-owned, citizen-centered, and secure in the storage and management of data [10].
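The three-part structure described above (members, storage, sharing under member control) can be sketched in a few lines of Python. This is an illustrative sketch only, not a description of any existing HDC platform: the class names `Member` and `Cooperative`, the `consents` field, and the recipient labels are all hypothetical, and a plain dictionary stands in for secure storage.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    member_id: str
    # Data the member chose to contribute, keyed by source (e.g. "lab_results").
    data: dict = field(default_factory=dict)
    # Recipients the member has agreed to share with (e.g. "research").
    consents: set = field(default_factory=set)


@dataclass
class Cooperative:
    # A plain dict stands in for the cooperative's secure data store.
    members: dict = field(default_factory=dict)

    def join(self, member):
        """Register a member and the data they contribute."""
        self.members[member.member_id] = member

    def share(self, recipient):
        """Return data only from members who consented to this recipient."""
        return {
            m.member_id: dict(m.data)
            for m in self.members.values()
            if recipient in m.consents
        }


# Hypothetical members: one consents to research use, one to nothing.
coop = Cooperative()
coop.join(Member("m1", {"lab_results": "placeholder"}, {"research"}))
coop.join(Member("m2", {"app_data": "placeholder"}, set()))

shared_with_research = coop.share("research")
shared_with_authorities = coop.share("authorities")
```

The design choice worth noting is that sharing is filtered per member and per recipient, which mirrors the idea that members, not the platform, decide who receives their data.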

Research Objectives
The need for immigrant groups to contribute to an HDC model is an important target for public health research. The integration of and access to immigrant specific health data can empower individuals to better manage their health, help contribute to the care of family and friends, help professionals commission the most effective and efficient interventions to benefit the community, allow researchers to develop next generation medicine, and enable the innovation of health technology [10].
Given the importance of data in immigrant health research, it is imperative to scope the literature to understand the complexities of HDC at the national and international level. To systematically map the available literature on the topic of HDCs, the current scoping review will aim to: 1) identify and describe the type of literature that is available on the HDC model; 2) describe the key themes related to HDCs; and 3) describe the benefits and challenges of the HDC model. This scoping review will serve to inform the next stages of research and community engagement to develop a community-based health data cooperative in immigrant communities in Canada. To our knowledge, this is the first review undertaken to synthesize HDC literature.

Methods
We conducted a scoping review to systematically map and synthesize the information available on HDCs utilising the reporting guidelines of the PRISMA extension for scoping reviews (PRISMA-ScR) [11] (Supplementary Appendix 1). Further, we followed the framework outlined by Arksey and O'Malley in their methodological paper on scoping reviews [12]. In their paper, the authors discuss the purpose of scoping reviews as an attempt to examine the extent, range, and nature of the research question, without describing the findings of the studies in detail or doing an assessment of quality. The same approach is taken with this review, using the Arksey and O'Malley five-stage framework.

Stage 1. Identifying the research question
An effective scoping review requires a research question situated within a specific area, but it must also remain broad so as not to exclude potentially useful literature. For this review, we posed the following non-limiting questions: 1) what are the major themes associated with HDCs; and 2) what are the benefits and challenges related to the HDC model.

Stage 2. Identifying relevant studies
Searches of the published academic literature were conducted using the bibliographic databases presented in Table 1 to identify relevant records. An experienced librarian (MV) oversaw the development and execution of the database search strategies, which included a predefined list of keywords and medical subject heading (MeSH) terms (see Table 1, and Supplementary Appendix 2 for the MEDLINE (Ovid) search strategy). For grey literature, our search strategy included electronic institutional repositories, Canadian and international professional and government websites, online literature sites, and a manual review of reference lists of relevant publications (see Table 2).
We also conducted a literature search using the three most popular search engines, Google [13], Yahoo! [14], and Bing [15], as these search engines account for more than 96.4% of search engine use worldwide. In addition, meta search engines that blend Web results from Google, Yahoo and Bing, namely MetaCrawler [16] and Monster Crawler [17], were consulted. The Internet was used both to identify relevant Web-based information on HDCs and to identify references to non-Web-based information. Our Web search included all relevant webpages from government and non-government organisations, news sites, blogs, discussion boards, social media platforms, etc. We executed searches using the following keyword strings: (1) health data cooperative, (2) patient data cooperative and (3) personal health records. Adhering to the recommended search methodology of the Canadian Institute for Health Information (CIHI, 2011), only the first ten pages of search results were used for screening. Search results were downloaded and managed in EndNote software (Clarivate Analytics, Philadelphia, Pennsylvania, USA).

Stage 3. Study Selection
We limited studies to those published in the English language, with no publication date limits. For the grey literature and database search, after removal of duplicates, IN and NH reviewed titles and abstracts for each paper or document for inclusion. Abstracts were classified as relevant, potentially relevant or not relevant to HDCs. Relevant records were those that directly discussed the HDC model or health data, data sharing, stakeholder engagement in data, and data ownership. Records were excluded if they did not discuss any of the concepts above. Abstracts that did not provide enough information on outcomes to determine eligibility were included for further review. Full texts of the abstracts that met the eligibility criteria were obtained, read, and re-examined for relevance. Two researchers reviewed the full text of the remaining papers to determine eligibility using the same criteria listed above. If no agreement was reached between the two researchers, TCT arbitrated.
Owing to the dynamic nature of the Internet, the screening and full review of webpages identified in the internet search were conducted concurrently. NH ran the first search string, archived the list of results on the first ten pages (i.e. Web address, page title, brief description and date searched), and classified each as potentially useful or not useful based on the same criteria listed above for the database and grey literature search. The reviewer then opened all pages considered potentially useful, archived each full page and, should the page contain information about an HDC, classified it as eligible or ineligible for inclusion. Reasons for exclusion were recorded. The second reviewer (IN) then independently opened the archives created by the primary reviewer and assessed a random selection of ten percent of the websites classified by the primary reviewer as eligible and ten percent of those classified as ineligible. The second reviewer also archived the full page of each website opened for assessment (i.e. included and excluded) in case the content had been modified since the primary reviewer undertook screening and full review. If no agreement was reached between the two researchers, TCT arbitrated.
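The second reviewer's ten percent verification draw can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure as the authors ran it: the function name `verification_sample`, the fixed seeds, the rounding rule (round up, with at least one page per class), and the page lists are all hypothetical. The review states only that ten percent of the eligible and ten percent of the ineligible websites were randomly selected.

```python
import math
import random


def verification_sample(pages, percent=10, seed=0):
    """Draw a random subset of archived pages for the second reviewer.

    Assumption: the sample size is rounded up and is at least one page
    whenever the list is non-empty. The review does not state a rounding rule.
    """
    if not pages:
        return []
    size = max(1, math.ceil(len(pages) * percent / 100))
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return rng.sample(pages, size)


# Hypothetical archive of the first reviewer's classifications.
eligible = [f"eligible-page-{i}" for i in range(30)]
ineligible = [f"ineligible-page-{i}" for i in range(120)]

eligible_check = verification_sample(eligible)
ineligible_check = verification_sample(ineligible, seed=1)
to_check = eligible_check + ineligible_check
```

Sampling each class separately, as the text describes, ensures that both inclusion and exclusion decisions are checked rather than only the larger class.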

Stage 4. Charting the Data
Information was extracted on the citation, study location, study objective, the health-data-related variable, how the cooperative was established, the main outcome variables, how the outcome variables were measured, and benefits and challenges of HDC. Further, due to the scoping nature of this review, additional descriptive information was collected for selected records including country of origin, author affiliation, and target audience. Two reviewers (IN and NH) independently carried out the data extraction, and discrepancies were resolved by consensus.

Stage 5. Collating, Summarising and Reporting the Results
To explore our first research question, we conducted a content analysis of the abstracted data from the records to discover and generate themes, which were adjusted iteratively. For our second research question, the abstracted information on benefits and challenges related to HDC models was stratified by these themes (Tables 3 & 4).

• This report discusses a common community data set.
• Today, discharge data provide the full community of users with information that is relatively current, has provider identifiers, and is cost efficient.
• The information is used in public displays, such as websites, dynamic web query systems, and in traditional reports. It provides a broad array of information not found in individual registries and is more cost efficient to collect than other sources.
• It can also be de-identified to allow broader use than is possible with other clinical data sources. Because they are widely available and broadly used, hospital discharge data could serve as the backbone for a hybrid EMR/discharge "package" of information. Statewide discharge data combined with clinical data in an EMR can supply both the numerator and denominator for examining outcomes of care and cost effectiveness of treatments.
• In addition, the common structure and relative uniformity of hospital discharge data across providers and states allow for regional and national comparisons.

• A growing number of organisations are making progress in integrating health and care record data at the local level, but the complexities surrounding Information Governance (IG) modelling are impacting associated timescales as well as the potential for such data to be put to beneficial secondary uses.
• The process took those interviewed for this report up to twelve months to finalise, and none plans to integrate substantial information from social care home providers at present, which would almost certainly take more time.
• Only one of the interviewees used the data collected for purposes other than direct care and provided third party access for research based upon informed consent.

Are digital data co-ops an alternative to the commercialisation of health? (https://thisisnotasociology.blog/2017/02/27/are-digital-data-co-ops-an-alternative-to-the-commercialisation-of-health/)
• Individuals will be able to refuse access to their data to organizations or companies and the membership will be able to vote on who is accepted as a client.
• Dangers to having central repositories for all data. Certainly, if they did become the main repository for health data they will be a very big target for attacks by hackers.
• A safe and secure platform on which people can store, manage and actively share data on their terms.
• A not-for-profit cooperative organizational structure of the personal data platforms so that they are owned by the citizens.
• Revenues from citizen-controlled secondary use of data are invested in projects and services that benefit members and society at large.

This co-op lets patients monetize their own health data - Fast Company (https://www.fastcompany.com/90207550/this-co-op-lets-patients-monetize-their-own-health-data)
• "When people become members, they have a voice in what we do, and they also share in our profits."
• Any patient who wants to become a Savvy member pays a buy-in fee of $34.

Results & Discussions
We present our results separately for our two research streams: 1) academic database and grey literature search; and 2) internet search. We start by discussing the descriptive characteristics of the records found by these two search streams. Next, the benefits and challenges of HDCs are discussed in the context of three unifying themes that have emerged from the data: 1) data ownership and control; 2) data security; and 3) data flow and infrastructure.

Search Results
Our database and grey literature search yielded nine hundred and one records, which underwent two levels of screening before inclusion in the final synthesis: title and abstract screening, followed by full-text reading. The final number of records that were read in detail for data abstraction was twenty-two (Figure 1). For the internet search, we initially identified eight hundred and seventy webpages across the multiple search engines. After the removal of duplicates (either internal, or external, corresponding with the database and grey literature search) and a screening of the landing page against the inclusion and exclusion criteria, thirteen webpages were included in the final analysis (Figure 2).

Study Characteristics
Of the twenty-two records identified in the initial search stream, eight were grey literature records, and fourteen were academic literature published in journals. Of these twenty-two, no records were published before the year 2000, and the number of records increased after the year 2014. This is expected, as the HDC is a relatively unexplored topic and technological advances that generate more data are likely to increase interest in studying the HDC model. Only two records were written by authors with a non-academic affiliation. Most records appear to target a combined audience of academia, policy, and public health workers, or academia alone. The majority of the records originated from the United States, with Europe in second place. Only one record was identified from Canada (Table 5).
Of the thirteen internet search webpages identified, nine were of a private nature (e.g. blogs, news sites, opinion pieces, corporate sites), and four were of an academic nature (authors were affiliated with an educational institution) (Table 6). The main purposes of these webpages included: 1) homepages of the sites for HDC companies; and 2) academic research on the HDC model, such as conference presentations or professional opinion pieces. A significant number of the private webpages were for lay audiences, whereas all the academic records were for academic audiences. Most of the private webpages originated from the United States and Europe, with none being from Canada. All academic webpages were from Europe.

Major Themes and Benefits and Challenges of HDC
The HDC model seems to fill significant gaps in health data, health research, and community health. All records state that HDCs are citizen owned and the equal property of members, and that the model aims to enable meaningful collaboration and facilitate a transformation of community health data [8]. The records point to the integration and merging of new types of data (i.e. lifestyle data, quantified-self movement data, demographic characteristics, genetic data) that can drive improvements in health and healthcare by increasing the accuracy, accessibility, and utility of patient information [18]. Through thematic analysis, three major themes capture the data available on HDCs: 1) data ownership and control; 2) data security; and 3) data flow and infrastructure. The benefits and challenges are discussed within these themes.

Data control and ownership
The grey literature and database search records emphasised the benefits of the control that participants will have over their health data. HDCs can operate with minimal costs and without charging their participants, since participants can invest their valuable data in different industries to generate profits [18,19]. This transaction can involve multiple stakeholders including primary care physicians, beneficiaries, community members, health departments, social services, and universities. Some HDC models are able to provide and gain consent electronically and build a democratic structure, which establishes a link between control and ownership [20,21]. Ideally, participants of the cooperative can have control of data access, data use, and governance. Data control, in this case, is conducive to transparency, accountability, and trust [20,22]. The web search results looked at data ownership and control from a practical perspective, since most records were from HDC companies, mainly opinion pieces or news blogs. Here, the HDC corporations not only give patients control of their data, but also allow them to connect and pitch to interested stakeholders. Further, there seems to be a big push from private cooperatives to take control of data away from corporations and put it back into the hands of people [23,24].
Negative aspects of data control and ownership were focused on the political and bureaucratic hurdles to truly making data belong to participants. The first is establishing legal precedent, since treating health information as the private property of patients is hard to justify under the traditional labour theory of ownership [25][26][27][28]. Second, the HDC model would need to establish connections within government agencies, which may risk the privacy of participants if robust data governance is not in place [19,23]. For example, HDCs in countries hosting public healthcare systems may need to establish health data sharing procedures that follow the country's laws and policies. The web search results took a practical and patient-centered approach to these issues. The records show there is an inherent variability in a community's vision of how they want to use the data within the cooperative. This variability can be in who has access to the data, who is the data controller, and who outlines the data sharing agreements. This can lead to distrust, lack of respect, and insufficient patient control of the process [22,29,30].

Data security
Data security was as much a strength of the HDC model as it was a weakness. Briefly, the grey and database literature results found that HDC models are able to securely store data from multiple platforms within secure servers and cloud computing. This not only provides a financially sound solution but also offers an opportunity to collect data that are comparable. For example, the creation and management of a single server to hold information can be designed such that the information is ready to use by interested stakeholders [31]. However, there are concerns that no technology exists thus far that can absolutely guarantee trust, transparency, and data security [32]. Although a single space to store data has its benefits, it is also open to security threats [33,34]. Insufficient transparency may discourage patients from entering a cooperative due to the fear of an anonymity breach.
The web results show the use of government regulatory bodies and governance structures that improve data security and transparency related to access and control of patient data, which can legitimise the HDC operation. However, considering large data repositories and their value in the market, cyberattacks are a real threat [28,35]. This is especially true due to the ability of many cooperatives to access aspects of data through multiple modalities, including phone applications. As the access points to data repositories increase, so do security threats, and the potential costs to secure such repositories [36,37]. Indeed, to ensure data security, the HDC models must establish access and privacy standards that are communicated to HDC members and that may need to be regulated by not only good governance structures, but also independent government regulatory bodies [22].

Data flow and infrastructure
Records discussed data flow and infrastructure of the HDC model on multiple levels. Grey and database literature discuss the advantage of HDCs in that they can create longitudinal health data from various care settings, which can be accessed via mobile applications [38]. Data integration is a highly cited strength of the HDC model in both search streams. Integration allows the use of data from multiple modalities, reducing the costs of gathering and third-party consent. Cloud computing would then allow access to the data at any time [19,20,39]. The web results break down the process of data collection and integration. Patients can collect, store, and manage data that are lifestyle (e.g. running times, blood pressures, sleep time, number of steps taken in a day) or medical (e.g. lab results, genetic information) in nature and can input them using a simple interface on either a phone or computer. The data are then used by interested stakeholders and invested into projects that will benefit the community of patients [40][41][42].
The challenges to the data flow and infrastructure of the HDC include clashes with the socio-political system in the host country. For example, it is difficult to know the fate of an HDC in a publicly funded healthcare system [43]. Further, the HDC may function better when the patient population represents homogenous strata, which may include participants with similar disease profiles and risk factors, or similar demographic characteristics. If this is not the case, certain disease groups may take leading roles in decision making, dominating the voices of other legitimate cooperative stakeholders [44][45][46][47][48]. Finally, although data integration is an important strength of the HDC model, webpage records show that there is a risk of collecting too much data. Not only will this create waste, as some data may be unusable, but it can also make it difficult to maintain anonymity. The more personal data are combined for patients, the easier it becomes to re-identify a patient profile [49][50][51].
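The re-identification concern can be illustrated with a small group-size check in the spirit of k-anonymity. This technique is brought in here purely for illustration; the reviewed records are not stated to use it. The records, field names, and values below are made up, and the function name `smallest_group` is hypothetical.

```python
from collections import Counter

# Hypothetical, made-up member records; no real data.
records = [
    {"age_band": "30-39", "area": "A", "condition": "diabetes"},
    {"age_band": "30-39", "area": "A", "condition": "asthma"},
    {"age_band": "30-39", "area": "B", "condition": "diabetes"},
    {"age_band": "30-39", "area": "B", "condition": "diabetes"},
    {"age_band": "40-49", "area": "A", "condition": "asthma"},
    {"age_band": "40-49", "area": "A", "condition": "asthma"},
    {"age_band": "40-49", "area": "B", "condition": "diabetes"},
    {"age_band": "40-49", "area": "B", "condition": "asthma"},
]


def smallest_group(rows, fields):
    """Return the size of the smallest group of rows that share the same
    values on `fields`. A size of 1 means at least one member is unique
    on those fields and is therefore easier to re-identify."""
    groups = Counter(tuple(row[f] for f in fields) for row in rows)
    return min(groups.values())


# Combine progressively more attributes and watch the smallest group shrink.
by_age = smallest_group(records, ["age_band"])
by_age_area = smallest_group(records, ["age_band", "area"])
by_all = smallest_group(records, ["age_band", "area", "condition"])
```

In this toy data set the smallest group shrinks from four members to two to one as fields are added, which is the pattern the webpage records warn about: each additional linked attribute narrows the crowd in which a member can hide.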

Implications and Future Work
The work presented here can inform the development of HDCs in communities that face disparities in healthcare access, health outcomes, and exposure to social, cultural, and environmental influences. The key question in health research is how the health of immigrant communities compares with that of people born in Canada. To answer this, data on long-term health outcomes, preventable conditions, and chronic disease outcomes are essential. This review outlines the political, financial, and humanistic barriers that must be overcome to establish HDCs. First, there is a need for a transparent HDC infrastructure that works well with the political situation in Canada, which hosts a public healthcare system. Second, the collection and hosting of health data must overcome barriers in costs, legality, and privacy and security; the solutions must not only be effective, but also be effectively communicated to HDC members. Finally, the investment of health data must establish a democratic process in which cooperative members give informed consent and have a voice in the sharing of such data. Regardless of these barriers, HDC models would encourage the development of information sharing hubs or community-based labs where immigrant communities can understand the worth and consequences of health data, and would serve as a source of grassroots action to understand health data. Next steps to understand HDCs further, along with the nuances of establishing an HDC, could be to conduct an in-depth systematic review, environmental scans, and content analyses of prominent HDC models. Cooperatives are ultimately a community-level initiative; therefore, future work should involve community members in the exploration of HDCs in order to provide effective, ready-to-use evidence.

Strengths and limitations
This review is the first of its kind to map the literature on HDCs and related topics. It has multiple strengths, the first being an extensive search strategy that includes database, grey literature, and internet search streams. The review was also strong in collating the literature into comprehensive themes that represented the data appropriately. However, this review had multiple limitations. The inclusion criteria may have been broad; however, the authors felt this was necessary to capture information on HDCs and related topics appropriately. Further, the review was unable to assess the quality of the literature, due to the variability in the type of records identified. Finally, most records found came from Europe and the United States, which present different sociopolitical and health-related environments. The results of this scoping review should therefore be generalised to Canada with caution.

Conclusions
This extensive scoping review found the HDC to be a relatively unexplored topic, with records concentrated in the United States and Europe. Canada seems to be lacking in its use, discussion, or research of the HDC model. We found that the benefits and challenges of HDCs centre on three main themes: data control, data security, and data flow and infrastructure. The study of HDCs is multi-disciplinary, with themes in law, ethics, medicine, and public health, and presents a way to revolutionise the collection, storage, and use of health data that may be more sustainable than other models. The results of this study are an informative first step towards the study of the HDC model, or the establishment of an HDC in immigrant populations.