Lessons Learned: It Takes a Village to Understand Inter-Sectoral Care Using Administrative Data across Jurisdictions

Abstract
Cancer care is complex and exists within the broader healthcare system. The CanIMPACT team sought to enhance primary cancer care capacity and improve integration between primary and cancer specialist care, focusing on breast cancer. In Canada, all medically-necessary healthcare is publicly funded but overseen at the provincial/territorial level. The CanIMPACT Administrative Health Data Group’s (AHDG) role was to describe inter-sectoral care across five Canadian provinces: British Columbia, Alberta, Manitoba, Ontario and Nova Scotia. This paper describes the process used and challenges faced in creating four parallel administrative health datasets. We present the content of those datasets and population characteristics. We provide guidance for future research based on ‘lessons learned’. The AHDG conducted population-based comparisons of care for breast cancer patients diagnosed from 2007-2011. We created parallel provincial datasets using knowledge from data inventories, our previous work, and ongoing bi-weekly conference calls. Common dataset creation plans (DCPs) ensured data comparability and documentation of data differences. In general, the process had to be flexible and iterative as our understanding of the data and needs of the broader team evolved. Inter-sectoral data inconsistencies that we had to address occurred due to differences in: 1) healthcare systems, 2) data sources, 3) data elements and 4) variable definitions. Our parallel provincial datasets describe the breast cancer diagnostic, treatment and survivorship phases and address ten research objectives. Breast cancer patient demographics reflect inter-provincial general population differences. Across provinces, disease characteristics are similar but underlying health status and use of healthcare services differ.
Describing healthcare across Canadian jurisdictions assesses whether our provincial healthcare systems are delivering similar high quality, timely, accessible care to all of our citizens. We have provided a description of our experience in trying to achieve this goal and, for future use, we include a list of ‘lessons learned’ and a list of recommended steps for conducting this kind of work.

Key Findings
The conduct of inter-sectoral research using linked administrative health data requires a committed team that is adequately resourced and has a set of clear, feasible objectives at the start. Guiding principles include: maximization of sectoral participation by including single-jurisdiction expertise and making the most inclusive data decisions; use of living documents that track all data decisions and careful consideration about data quality and availability differences. Inter-sectoral research requires a good understanding of the local healthcare system and other contextual issues for appropriate interpretation of observed differences.


Introduction
The patient cancer experience is a trajectory, from understanding a new diagnosis, being involved in treatment decisions, dealing with the social and emotional effects of the diagnosis and, if all goes well, living life as a cancer survivor which can involve ongoing issues affecting quality of life. Cancer patients often have other health problems requiring them to manage their care across multiple health care settings. Consequently, cancer care changes by phase of disease, and necessarily exists within the broader health care system.
From both the patient and system perspectives, cancer care should be patient-centred, and integrated with the other health care a patient receives, to provide more effective, efficient, and acceptable care. But health care fragmentation is well documented and can be extreme (1)(2)(3). Family physicians, who are trained to have longstanding relationships with their patients and oversee the care of all their health conditions and preventive care, are one group of healthcare professionals that could help the system achieve whole-person, integrated care (1,4).
The Canadian Team to Improve Community-Based Cancer Care along the Continuum (CanIMPACT) was formed to 'improve cancer care together'. Its overarching objectives are to enhance primary cancer care capacity and improve integration between primary and cancer specialist care along the cancer care continuum (5). We focussed on breast cancer care as an exemplar of what can be done to support these aims. Starting in September 2013, Phase 1 of the CanIMPACT program of research involved the conduct of foundational studies using a multimethod approach to inform the development of interventions in Phase 2, which began in spring 2016 and will be completed by April 2020. As part of Phase 1, the CanIMPACT Administrative Health Data Group (AHDG) undertook a description of breast cancer patients, their diagnostic process, their treatment and survivorship care across five Canadian provinces (British Columbia, Alberta, Manitoba, Ontario and Nova Scotia) to understand care and inform improvement efforts. Specifically, we conducted inter- and intra-provincial comparisons, focusing on aspects of care that may be influenced by primary care, and investigated whether vulnerable subgroups were at risk of sub-optimal access and outcomes. The purpose of this paper is to describe the process used and challenges faced in creating four parallel administrative health datasets, to present the content of those datasets and the characteristics of the resulting population-based provincial breast cancer cohorts, and to provide guidance for future such work based on 'lessons learned'.

Context
In Canada, all medically-necessary health care is required to be publicly-funded, universal, comprehensive, and portable across provincial/territorial jurisdictions (6). Health care funding and delivery is the responsibility of the thirteen individual jurisdictions, and there are some differences in the actual health care by jurisdiction. Most outpatient physicians are "fee-for-service", and primary care physicians play a "gate-keeper" role in access to specialty care. Cancer services other than surgery are usually offered within designated provincial cancer facilities.

CanIMPACT Administrative Data Aims
The overall aim of the administrative health data component of the CanIMPACT research program was to conduct population-based comparisons of care for all breast cancer patients diagnosed from 2007 through 2011 (or latest available) in each of the five Canadian participating provinces. Provincial data sources included similarly structured population-based cancer registries that are linked to clinical and administrative health services data using individual encrypted health card numbers for research purposes. Details of these operations are found in provincial websites (7)(8)(9)(10)(11). All data sources used are stable and mature. They are linked routinely and used repeatedly in Canada to conduct health services research. The breast cancer outcomes we studied included detection method (screened or symptomatic), diagnostic interval length, use of adjuvant chemotherapy, chemotherapy toxicity and attendant use of emergency departments, survivorship care guideline adherence and use of primary care and oncology care across the continuum from diagnosis through survivorship. Comparisons were made across provinces and regionally within provinces, and by vulnerability indicators: age at diagnosis, rurality, area-level socioeconomic status, area-level immigration status, and comorbid disease status.

Data Management Approach
The CanIMPACT Administrative Health Data Group (AHDG) has twenty members, with combined expertise in primary care, surgery, medical oncology, epidemiology, biostatistics, data processing, economics, and cancer registries, and it includes three patient representatives. The AHDG membership includes a lead from each province with expertise in the use of their provincial health administrative data to provide informed data processing and interpretation and ensure adherence to provincial security/privacy rules. Our research methods were informed by the collective research experience of the members of our team, including the conduct of similar studies using administrative databases in individual provinces (12)(13)(14)(15)(16)(17)(18)(19)(20)(21). Analyses were conducted separately at designated research centres in each province using similar strategies guided by a common data processing and analysis plan. Knowledge of each province's policy environment and health care structure was also required in order to interpret study findings. This knowledge was provided by AHDG members and the wider CanIMPACT team.
The lead and some other AHDG members with expertise in their provincial data and/or analysis constituted a core working group that communicated regularly, with patient advisors and other AHDG members participating whenever possible. Bi-weekly conference calls have been the core communication strategy of the AHDG with more than sixty documented calls over a three-year period. Ongoing email communications and a number of face-to-face meetings attended by key members complete our communication strategy. Through this communication strategy we refined and operationalized the research objectives, identified and processed data elements using standardized definitions, and developed ten study objectives and analysis plans for publication (22)(23)(24).

AHDG Activities
In Canada, federal and provincial data protection laws provide insufficient guidance regarding data release for research purposes, leading to inconsistent data sharing policies across provinces (25). Although there have been some instances in which a country-wide research dataset has been created, the effort involved is considerable and was beyond the scope and time constraints of our project. We therefore produced separate, parallel project datasets and analogous analysis plans across provinces to meet the group's objectives. These datasets would contain information on the demographic, clinical and healthcare utilization characteristics of a cancer patient cohort from one to two years prior to diagnosis through up to ten years post-treatment. The tasks involved in creating these parallel project datasets included understanding overlap and gaps in file and data availability and determining common variable definitions. Creation of these data resources was an iterative process of refining research objectives and analysis plans as feasibility issues related to data quality and/or availability were identified.
Our understanding of the data nuances evolved throughout the course of the study. As part of the grant development and continuing after funding was received in April 2013, before data were available, decisions about variable capture relied on team members' knowledge, provision of variable frequencies from previously cut datasets and on country-wide reporting of data by government agencies. As a first step in the research process, we produced an inventory of data sources and potentially relevant data elements in each province. This allowed us to assess, at a high level, whether the provinces involved had access to similar data and identify obvious limitations. The initial data sources considered included the provincial cancer registry, provincial health insurance plan client registry, hospital discharge abstracts, outpatient physician service claims, hospital outpatient services including emergency services, continuing care data, mental health services data, elderly prescription drug data and/or population-wide prescription drug data, cancer treatment data, and immigration data. With the exception of immigration data which is probabilistically linked, all other data are deterministically linked at the individual level using provincial health insurance numbers. This inventory revealed that some provinces did not have access to hospital outpatient services, population-wide prescription data, and/or individual-level immigration data. These limitations informed the selection of our first draft set of key data elements.
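As a minimal sketch of what deterministic linkage on an encrypted identifier involves, the following Python example attaches claims to registry records by exact key match. The field names (`enc_id`, `dx_date`, `service_date`) and the records are hypothetical and do not reflect the layout of any provincial file.

```python
from collections import defaultdict


def link_deterministic(registry_rows, claim_rows, key="enc_id"):
    """Attach each person's claims to their registry record by exact key match."""
    claims_by_id = defaultdict(list)
    for claim in claim_rows:
        claims_by_id[claim[key]].append(claim)

    linked = []
    for person in registry_rows:
        # .get() avoids creating empty entries for people with no claims.
        linked.append({**person, "claims": claims_by_id.get(person[key], [])})
    return linked


# Hypothetical extracts: two registry cases and three claims,
# one of which belongs to someone outside the cohort.
registry = [
    {"enc_id": "A1", "dx_date": "2009-03-14"},
    {"enc_id": "B2", "dx_date": "2010-11-02"},
]
claims = [
    {"enc_id": "A1", "service_date": "2009-04-01"},
    {"enc_id": "A1", "service_date": "2009-05-20"},
    {"enc_id": "Z9", "service_date": "2009-06-01"},
]
linked = link_deterministic(registry, claims)
```

Claims whose identifier does not appear in the registry are simply left unattached, which mirrors restricting a claims file to the cancer cohort.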
Some of the key AHDG group members met in person for two days in September 2013 to further critique our experiences, present methodologies we had developed in our respective single-province studies, and compare findings. We discussed specific aspects of complex variable definitions, problems with missing data, and issues about data validity. Shortly after this workshop we started regular conference calls to continue the discussion.
We gathered further information about each key data element including: its description/definition, data source, coverage years and any relevant background information. One of the study investigators (MW) summarized this information and assessed feasibility for inclusion. Examples of issues that arose at this stage included age being defined based on month and year only in one province, area-level immigration data being available in only two provinces, and one province using less precise diagnostic codes in its physician billing data.
At this point we began to create four dataset creation plans (DCPs) for each phase of care we planned to study: baseline/diagnosis phase; treatment phase; and survivorship phase. We used a DCP template from ICES in Ontario which requires documentation of the study personnel, edition changes, study goals and objectives, datasets to be used, study timeframe and key dates, study variables and analysis plan. These living documents provided a road map for data programmer-analysts in each province and serve as reference documents for data dictionary development, key decisions, and our knowledge about inter-provincial data differences. Development of the DCPs focused our thinking on the details needed to address the phase-of-care-specific research questions. These details included specific inclusion/exclusion criteria needed for each phase, defining the diagnosis period and follow-up time, details involved in data processing and variable definitions including ensuring data comparability across phases to facilitate longitudinal analyses, descriptive statistics, and statistical modeling. Our discussions and decisions were informed by team members' previous work, as mentioned above. For instance, determination of the diagnostic interval (12)(13)(14)(15), chemotherapy toxicity codes (16,17), and censoring decisions during survivorship (18)(19)(20)(21) were imports from this previous work. We also took advice from a national cancer quality of care reporting agency regarding the best choice for characterizing area-level immigration status, rurality and area-level socioeconomic status (26).
At the one-year point and in preparation for a full-team face-to-face meeting in October 2014, we took a step back and developed a study framework that mapped our key data elements onto the dimensions of access and quality defined by Andersen (27) and the WHO (28). These dimensions included: coordinated care, effective care, efficient care, accessible care, acceptable/patient-centered care, equitable care and safe care. The framework considered the three phases of the cancer care continuum we were studying (diagnosis, treatment and survivorship) and the relevant, responsible healthcare providers. After having initially taken a broad perspective on what we could accomplish, this exercise, with the help of the full CanIMPACT team, helped us refocus on the big issues around coordinated high quality cancer care and further refine our plans.
We presented our preliminary findings at the CanIMPACT Consultative Workshop, held in March 2016 as our contribution to the CanIMPACT Phase 1 goals. The workshop included all members of the CanIMPACT team and others, including knowledge users and patients with 74 attendees in all. The outcome of that workshop was a decision on a direction for the intervention to be conducted in Phase 2 of the CanIMPACT study (29). Since then we have completed data processing and analyses addressing ten research objectives for publication.

Study Process Experiences
Overall, we found that frequent, meaningful communications and the commitment of the team members were key to our success and we had to allow for flexibility in the study process and analytic details as further data understanding occurred.
Inter-provincial data inconsistencies can be summarized across four dimensions: 1) system practice, 2) data source, 3) raw data element, and 4) variable definition. An example of system practice level variation was the "cancer diagnostic assessment program", which is an Ontario initiative that oversees the diagnostic process using a multidisciplinary team approach (30). No other province had a similar program so the impact of such programs was dropped from our objectives. Instead, the existence of this program serves as a contextual element that informs our interpretation of our study findings. Another potential system-level source of variation was the quality of claims data by method of physician payment (fee for service versus alternative payment plans). For instance, 5% of Ontario specialists and 50% of Ontario primary care physicians are remunerated under alternative payment plans. However, they are required to shadow bill and are often given cash incentives for doing so. Completeness and accuracy have been shown to be high for both payment forms in a recent study conducted in the Province of Alberta (31). We mitigated potential claims errors by: 1) emphasizing visit counts whenever possible (which only require the existence of a claim on a particular day); 2) using hospitalization data to assign surgery type; 3) grouping all imaging into a single variable; and 4) using established claims-based chronic disease algorithms which require more than one occurrence of a diagnostic code for assignment of disease status (32). At the data source level, two databases were not available to all provinces. Immigration, Refugees and Citizenship Canada's permanent resident data, which contain demographic information for every landed immigrant, were only available in British Columbia and Ontario. We are reporting on the immigrant experience in those two provinces.
National Ambulatory Care Reporting System data, which standardize reporting on emergency department visits across Canada, were available only in Ontario and Alberta. Fortunately, Nova Scotia and Manitoba had strategies for identifying emergency room visits in the absence of the National Ambulatory Care Reporting System data that they shared with British Columbia. Data quality and coverage are particularly high for the provincial cancer registries, which meet the certification quality standards required for North American reporting (except Ontario) (33) and WHO reporting (34). Quality is also high for hospital inpatient reporting, which is required for all Canadian jurisdictions using standardized data (35). The number of breast cancer patients leaving their home province is likely to be low since only 6.3% of internal migrants in Canada are over 65 years of age (36); thus large losses to follow-up are not a concern. Double-counting across provinces is not an issue either since we only studied incident cases. At the raw data element level, an example of data inconsistency was the "date of death" variable. All provinces had day/month/year information except for British Columbia, which only had month/year, so survival data are computed at that level of precision.
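Two of the claims-error mitigations described above (counting visit days rather than relying on claim content, and requiring more than one occurrence of a diagnostic code) can be sketched as follows. This is an illustration under stated assumptions, not the algorithms used in the study: the field names are hypothetical, the threshold of two distinct days is a simplification, and published algorithms typically add time windows and hospitalization criteria. The code "250" is the ICD-9 category for diabetes mellitus, used here only as an example.

```python
def count_visit_days(claims, person_id):
    """Count distinct days with at least one claim, ignoring codes on the claim."""
    return len({c["service_date"] for c in claims if c["enc_id"] == person_id})


def has_condition(claims, person_id, code_prefixes, min_occurrences=2):
    """Flag a condition only if a qualifying diagnostic code appears on
    at least `min_occurrences` distinct service days."""
    prefixes = tuple(code_prefixes)
    days = {
        c["service_date"]
        for c in claims
        if c["enc_id"] == person_id and c["dx_code"].startswith(prefixes)
    }
    return len(days) >= min_occurrences


# Hypothetical physician claims.
claims = [
    {"enc_id": "A1", "service_date": "2009-04-01", "dx_code": "250"},
    {"enc_id": "A1", "service_date": "2009-04-01", "dx_code": "401"},
    {"enc_id": "A1", "service_date": "2009-07-15", "dx_code": "250"},
    {"enc_id": "B2", "service_date": "2010-01-10", "dx_code": "250"},
]

visits_a1 = count_visit_days(claims, "A1")
diabetes_a1 = has_condition(claims, "A1", ["250"])
diabetes_b2 = has_condition(claims, "B2", ["250"])
```

Requiring repeated codes trades some sensitivity for specificity: a single miscoded claim, as for person B2 here, does not assign disease status.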
Similar data resources could contain fundamental differences that were only revealed once we were producing detailed data processing plans. For instance, in Canada we can use the "Postal Code Conversion File (PCCF)" (37) created by Statistics Canada to assign people to census areas based on their postal code. The PCCF is then able to identify many area-level data items relevant to that person from the census, such as socioeconomic status. But there are many consecutive versions of the PCCF that contain subtle variable definition differences or even different data. As it turned out, the PCCF version available in Ontario did not contain the area-level immigration tertile variable which we used for area-level immigration status, and the PCCF version available in British Columbia did not contain a geocode that was required to create a deprivation index.
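The two-step assignment described above (postal code to census area, then census area to area-level attribute) can be sketched as a pair of lookups. Everything in this example is hypothetical: the postal codes, area identifiers, attribute names and table layout are placeholders and not the actual PCCF structure. The point it illustrates is that an attribute absent from a given file version should surface as missing rather than raise an error or silently default.

```python
def area_attribute(postal_code, conversion_table, area_table, attribute):
    """Return an area-level attribute for a postal code, or None when the
    postal code, the area, or the attribute is not available."""
    normalized = postal_code.replace(" ", "").upper()
    area_id = conversion_table.get(normalized)
    if area_id is None:
        return None
    return area_table.get(area_id, {}).get(attribute)


# Hypothetical conversion table and census attributes.
conversion_table = {"A1A1A1": "DA-001", "B2B2B2": "DA-002"}
area_table = {
    "DA-001": {"income_quintile": 4, "immigration_tertile": 2},
    # This area's record comes from a version lacking the immigration variable.
    "DA-002": {"income_quintile": 2},
}

quintile = area_attribute("a1a 1a1", conversion_table, area_table, "income_quintile")
missing = area_attribute("B2B2B2", conversion_table, area_table, "immigration_tertile")
```

Returning `None` makes version-related gaps countable per province, which is the kind of difference the dataset creation plans were used to document.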
Data processing to create comparable data items varied based not only on variations in data structure and availability but also on variations in the structure of the respective provincial healthcare systems. For instance, whereas all mammography screening occurs and is documented in organized programs in the other provinces, in Alberta and Ontario, screening mammography can occur outside the organized screening programs, requiring the application of algorithms to other databases to identify those patients (13,15). Results interpretation also had to consider the provincial context. For instance, screening rate differences had to be interpreted with an understanding of screening age eligibility variations over time and across provinces. Health system structural differences could explain inter-provincial variation. For instance, in Nova Scotia, the diagnostic interval was similar for screened and symptomatic patients because of the centralized nature of their diagnostic services. In other provinces, the symptomatic patients waited longer for a diagnosis.
Even with population-wide data sources in a common cancer, sample size concerns dictated some decisions. Our aim to study the effect of a breast cancer diagnosis on chronic disease care was limited by small numbers of documented chronic disease in the smaller provinces and by incomplete data (see Supplementary Appendix 1 for details). We had to include cases as far back as 2007 to ensure enough numbers in the smaller provinces because we also had to end our recruitment (2011 diagnoses) with enough time to study the survivorship phase.
We used the following principles in the presence of data differences:
1. Maximize the number of provinces contributing by making the most inclusive choice. For instance, in Nova Scotia, chemotherapy data is known to be incomplete, but information about consultations with medical oncologists is available. Based on patterns of visits to medical oncology, we determined who received chemotherapy and the start date for chemotherapy receipt. We were, however, unable to determine a chemotherapy end date in Nova Scotia so an average chemotherapy treatment duration was used instead.
2. Use previously-developed methods and definitions whenever possible. For example, British Columbia did not have access to emergency room data but Manitoba had an algorithm previously developed and validated using hospital discharge data to find emergency room visits that British Columbia adopted for the study.
3. Track differences in key study variables between provinces in the DCPs to ensure this is considered when interpreting results. For example, stage information was collected differently across provinces, with variable use of clinic-assigned stage and use of cancer registrars to assign collaborative stage (38), requiring the use of only stage groups (I-IV) for consistency.

4. Track study variable quality in the DCPs to ensure this is considered when interpreting results. For example, area-level SES assignment depends on mapping census dissemination areas to postal codes. The error rate on this mapping is high in rural areas and needed to be considered when comparing SES effects.
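The stage example under the third principle amounts to collapsing province-specific stage values to a shared coarse grouping. The sketch below assumes, hypothetically, that raw stage arrives as Roman-numeral strings with optional substage suffixes such as "IIA" or "Stage IIIB"; the actual provincial formats are not described here, and values that do not fit (including in situ disease) are returned as missing.

```python
import re

# Roman-numeral group, optional substage letter and digit, then end of string.
_STAGE_PATTERN = re.compile(
    r"^\s*(?:STAGE\s*)?(IV|III|II|I)(?:[ABC]\d?)?\s*$", re.IGNORECASE
)


def to_stage_group(raw_stage):
    """Collapse a detailed stage string to a group I-IV, or None if unrecognized."""
    if raw_stage is None:
        return None
    match = _STAGE_PATTERN.match(raw_stage)
    return match.group(1).upper() if match else None


# Hypothetical raw values as they might differ between sources.
raw_values = ["IIA", "Stage IIIB", "iv", "I", "0", "In situ", None]
groups = [to_stage_group(value) for value in raw_values]
```

Keeping the mapping in one shared function, rather than in each province's code, is one way to make a harmonization decision recorded in a DCP reproducible across sites.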

Study Data Sources and Variables
Supplementary Appendix 1 provides details on the datasets, including study variable definitions, with source references when applicable, data sources, inter-provincial definitional differences and data availability. The main data sources we used included similarly structured cancer registries, census area-level demographic data and provincial administrative databases, including physician claims, ambulatory care and inpatient hospital data. Our population-based datasets contain information on patient socio-demographics, baseline health status, breast cancer disease characteristics, health care use across the cancer care continuum, the diagnostic method and timeliness, initial treatment and waits for chemotherapy, treatment toxicity, survivorship care guideline adherence, and survival. The methods used for data capture of all data sources used were stable across the period of the study. In Supplementary Appendix 1 we have also documented our attempts to identify chronic disease cohorts and chronic and preventive care.

Cohort Description
The datasets contain information on all histologically confirmed breast cancer patients (ICD 174) diagnosed in these provinces for the years listed in Table 1 as captured in our provincial, population-based cancer registries. The size and demographics of the study cohorts are described in Table 1.
The results reflect known inter-provincial differences in general population demographics (39). The median age (IQR) was 61 in British Columbia, 62 (52-72) in Manitoba, 60 (50-71) in Ontario, and 62 (52-72) in Nova Scotia. Median age was not available for Alberta but it included more patients in the 40-49 group and correspondingly fewer in the >74 group. Area-level socioeconomic status patterns were similar across provinces but with slightly fewer in the lowest income quintile in Manitoba and Ontario. In contrast, the pattern for area-level material deprivation for the three provinces reporting shows larger differences, with 49% of Ontario patients in the two least deprived groups compared to 38% in Manitoba and 28% in Nova Scotia. Conversely, 35% of Nova Scotia patients fell in the most deprived quintile for that province. The difference in the results of these two socioeconomic variables is explained by the fact that the income quintile boundaries were set using the provincial distribution while the deprivation quintile boundaries were set using the country-wide distribution. Therefore, larger differences for deprivation compared to income are due to inter-provincial SES differences. There are more immigrants in British Columbia than in the other two provinces reporting immigration tertile, and larger urban populations in British Columbia and Ontario.
Table 2 describes the disease characteristics and comorbid illness burden of these provincial breast cancer cohorts. Three provinces (Alberta, Manitoba, Nova Scotia) had almost complete information on breast cancer stage. The stage distributions for these three provinces are similar (if we exclude the carcinoma in situ group in Alberta to mimic the other cohorts) except that the Stage IV group is smaller in Alberta at 3.8% compared to Manitoba and Nova Scotia at 6.1%. Histologic grade distributions for provinces with reasonable completeness were similar, with the largest difference being a 6% lower rate of poorly differentiated cancers in Manitoba compared to Alberta and Nova Scotia. This information was missing for 50% of Ontario patients. Comorbid illness counts, as measured by the Johns Hopkins Adjusted Clinical Group (ACG) system (40), revealed a lower comorbid illness burden in British Columbia than in the other provinces, with 32% having a 0-3 count. Ontario also had more patients in this group at 26.4% compared to Manitoba at 23.6% and Nova Scotia at 21.9%. Although not directly comparable, we have Charlson comorbidity scores (41,42) for Alberta, with 72.4% having a score of 0 comorbidities on this scale, 19% with 1 and 8.6% with more than one. More patients in Nova Scotia were high users of the health care system at 18.3% compared to 13.6% in British Columbia, 16.9% in Manitoba and 16.5% in Ontario.
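The effect of setting quintile boundaries on a provincial versus a country-wide distribution, noted above for income and deprivation, can be illustrated with synthetic numbers. The scores below are invented for two hypothetical provinces and are not study data; the sketch only shows that within-province boundaries place about one fifth of each province in the lowest group by construction, whereas pooled boundaries expose between-province differences.

```python
from bisect import bisect_right
from statistics import quantiles


def quintile_cutpoints(values):
    """Return the four cut points that split `values` into quintiles."""
    return quantiles(values, n=5)


def assign_quintile(value, cutpoints):
    """Return the quintile (1 = lowest) that `value` falls into."""
    return bisect_right(cutpoints, value) + 1


def share_in_lowest(values, cutpoints):
    """Proportion of `values` assigned to the lowest quintile."""
    lowest = sum(assign_quintile(v, cutpoints) == 1 for v in values)
    return lowest / len(values)


# Synthetic area-level scores: province B is shifted lower than province A.
province_a = list(range(50, 150, 10))
province_b = list(range(10, 110, 10))
pooled_cuts = quintile_cutpoints(province_a + province_b)

# Boundaries set within each province.
a_own = share_in_lowest(province_a, quintile_cutpoints(province_a))
b_own = share_in_lowest(province_b, quintile_cutpoints(province_b))

# Boundaries set on the pooled ("country-wide") distribution.
a_pooled = share_in_lowest(province_a, pooled_cuts)
b_pooled = share_in_lowest(province_b, pooled_cuts)
```

With these synthetic inputs both provinces look identical under their own boundaries, while the pooled boundaries concentrate the lowest group in province B.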

Effective Practices
We have described a cross-province collaboration involving a strong, committed team of researchers, knowledge users and patients who worked together to describe and assess differences in inter-sectoral breast cancer care. The practices we adopted that proved effective included: concurrent data definition and development of detailed analysis plans across jurisdictions; frequent, structured communication within a core group; scheduled "check-ins" with the full group at key points in development of the research plans and utilization of previous study definitions and methods whenever possible. These strategies led to the creation of four comparable datasets that are allowing the reporting of breast cancer care and outcome patterns across Canada.
A critical component to maintaining good organization and documentation management, both essential given the complexity of the endeavor, was designating one research associate (LJ) to be responsible for keeping the dataset creation plans up-to-date based on decisions made during our conference calls and meetings. This associate also fielded all clarification questions from the provinces, forwarding them to appropriate investigators as needed. She kept track of action items and worked with the group's co-chairs to set conference call agendas. Additionally, to further clarify the data requests and analyses, she created templates for data tables needed with the agreed upon demographic, clinical and healthcare utilization factors specific to each data analysis plan.
The regular conference calls were critical to the successful completion of the analyses, and we continue our regular conference calls as manuscripts are being developed. These calls and ongoing email correspondence help with manuscript refinement and reconciliation of any further data inconsistencies that become evident as we are reporting on the results.

Challenges
Data privacy and ethics board requirements differed across provinces with regard to the amount of study information needed for approval. Data access processes varied across provinces, with it taking longer to receive a de-identified dataset in some provinces than in others. This complicated efforts to perform analyses in parallel and in some cases reduced the timeliness with which final results were made available for dissemination. Importantly, the process was most straightforward in the provinces with centralized linked data repositories collated for research purposes. Challenges varied depending on the number of jurisdictions involved. In the current study we were able to enroll five of the thirteen provincial/territorial jurisdictions in Canada.
The research plan evolved with the operationalization of our initial high-level objectives and increasing understanding about data availability and other feasibility issues. We initially thought we could use existing breast cancer cohort databases but since our data elements were different, or provincial access rules required that databases be recreated from scratch, these pre-existing datasets were useful only for some preliminary data analyses but not for the final work. Adding new objectives mid-course led to fairly large changes to the dataset development and analysis plans. Specifically, we added tumour markers, we added an objective to assess the association between time to chemotherapy and survival, and the result of one of the full-team face-to-face meetings was a decision to include quality of chronic and preventive disease indicators at baseline and during survivorship. These additions were not as successful as our original goals; only one province had near-complete tumour marker data, we were unable to run the survival analyses due to budget constraints, and the chronic disease indicator exercise was subsequently reduced to a focus on chronic obstructive pulmonary disease (COPD) and diabetes care due to small numbers in the smaller provinces. Budgeting was difficult due to the complexity, scope and changing nature of the project and due to variation in dataset readiness and available provincial infrastructure support. The planning phase was much longer than expected, leading to the loss of the study coordinator before the process was complete due to budgetary constraints.
Personnel availability can change over the course of a large study, which can disrupt continuity. In this study, the Alberta lead moved to the United States. Although she continues to be actively involved and one of her colleagues fortunately stepped in to continue to provide data access, in the end we were able to include Alberta in only some of the diagnostic phase analyses.

Other Similar Work
Other researchers have conducted similar studies of cross-jurisdictional healthcare quality and access (43). Within Canada, for instance, Barbera and colleagues created parallel datasets to study quality indicators of palliative care across four Canadian provinces (44). With regard to the effort involved, they concluded that conducting inter-provincial comparisons in the absence of data sharing agreements makes ongoing surveillance of palliative care quality indicators unlikely. Inter-country studies of healthcare patterns have also been conducted, such as that by Gigli and colleagues, who examined colorectal cancer care in Italy and the United States (45), and that by Warren and colleagues, who compared end-of-life care in Ontario and the United States (46). As Lipscomb points out, these studies were possible because the jurisdictions involved could link established cancer registries to administrative healthcare data longitudinally (43).

Lessons Learned
Based on our experiences with this project, we have drafted a list of principles and processes that could be used for future cross-jurisdictional research (Box 1) and a suggested checklist for undertaking similar studies in future (Box 2).

Conclusions and Future Directions
Documenting differences in health care across Canadian jurisdictions is crucial for understanding whether our provincial health care systems are delivering similar high quality, timely, accessible care to all of our citizens as mandated by the Canada Health Act (6). Restricting such description to single data sources cannot provide a comprehensive picture of health care delivery, so inter-sectoral projects such as this one, which link multiple data sources, are an important step toward studying the complete health care experience across the thirteen jurisdictional health care systems in Canada. The development of parallel linked datasets across national or international jurisdictions can also inform our understanding of whether factors associated with access and quality, such as vulnerable group status, are universal or specific to a healthcare context. We note that parallel datasets such as ours could in future be subjected to summary data meta-analysis for pooled effects (47), allowing quantitative assessment of the generalizability of observed effects across multiple jurisdictions in addition to summary effect estimates.
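As an illustration only, the summary data meta-analysis mentioned above can be sketched with a simple inverse-variance fixed-effect model. This is a minimal sketch under stated assumptions, not the method of reference 47 or of this study: the function name `pool_fixed_effect` is hypothetical, and the per-province estimates and standard errors are invented numbers used purely to show the arithmetic. Each jurisdiction contributes only an effect estimate and its standard error, so no patient-level data leave the province.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance fixed-effect pooling of per-jurisdiction estimates.

    Returns (pooled estimate, pooled standard error, Cochran's Q, I-squared).
    Hypothetical helper for illustration; not taken from the source study.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Weight each jurisdiction by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Cochran's Q measures between-jurisdiction heterogeneity;
    # I-squared is the share of Q beyond what chance alone would produce.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


if __name__ == "__main__":
    # Invented log hazard ratios and standard errors for four provinces.
    log_hr = [0.10, 0.25, 0.18, 0.05]
    se = [0.08, 0.12, 0.05, 0.15]
    pooled, pooled_se, q, i2 = pool_fixed_effect(log_hr, se)
    low = math.exp(pooled - 1.96 * pooled_se)
    high = math.exp(pooled + 1.96 * pooled_se)
    print(f"Pooled HR {math.exp(pooled):.2f} (95% CI {low:.2f} to {high:.2f})")
    print(f"Cochran's Q {q:.2f}, I-squared {100 * i2:.0f}%")
```

A large Q or I-squared would suggest that an effect differs between jurisdictions, in which case a random-effects model and local knowledge of each healthcare system would be needed to interpret the pooled estimate.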
Data resources and availability are ever-changing based on changes in the health care system, health care informatics, and privacy and research ethics and legislation. In future, we hope that the conduct of projects such as ours will become more streamlined with regard to data access and common data elements. The SEER-Medicare linked database in the United States has been used for many years to conduct cross-jurisdictional cancer-related health services research (48,49). The development of distributed data networks, in which similarly structured parallel datasets are subjected to the same analytic code, is a welcome development (47). The Canadian Network for Observational Drug Effect Studies (CNODES) is a Canadian example (50) that is supported by the Canadian Institutes of Health Research and is filling the need for a large data source to study rare adverse drug events.
The creation of a single national healthcare data source containing the level of detail that we were able to capture is a much greater challenge, especially in a country such as Canada in which healthcare is the responsibility of the provinces. Barriers to accessing and analyzing health information in Canada have been described (51). Recommendations for overcoming those barriers include the need for harmonized ethics approaches, legislation, policies and procedures for accessing and sharing data, the existence of strong federal-provincial-territorial partnerships and support, and more standardized data across the commonly-used data sources (51). Lipscomb specifies the need for partnerships between government agencies, professional organizations, provider organizations and researchers, and cautions that building and maintaining such a resource would be very challenging (43).
We faced many hurdles in creating parallel datasets for a single study and we thought it important to report the lessons from this effort for future research. In two of the provinces in this study the work was done at longstanding provincial data centers with robust infrastructure and data experience. It was clear to us that the existence of those centers simplified much of the required data access and processing in comparison with the provinces without those resources. If such resources existed in each of our provinces and territories, it would greatly enhance our ability to study healthcare delivery at both the provincial and national levels. We have also learned that local knowledge about health system structure, practices and policies is crucial to data use and interpretation. Local knowledge will always need to be an integral part of the use of such data, even if we are successful in creating a national resource.

Acknowledgments
The authors would also like to thank Emma Shu, Marlo Whitehead, and Yan Zhang for conducting data processing and statistical analyses.
This study was funded by the Canadian Institutes of Health Research (grant # 128272). The opinions, results and conclusions reported in this paper are those of the authors and are independent from the funder. This study is supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). No endorsement by ICES or the Ontario MOHLTC is intended or should be inferred. Parts of this material are based on data and information provided by Cancer Care Ontario (CCO). The opinions, results, views, and conclusions reported in this paper are those of the authors and do not necessarily reflect those of CCO. No endorsement by CCO is intended or should be inferred. Parts of this material are based on data and information compiled and provided by the Canadian Institute for Health Information (CIHI). However, the analyses, conclusions, opinions and statements expressed herein are those of the author, and not necessarily those of CIHI. We gratefully acknowledge CancerCare Manitoba for their on-going support and Manitoba Health for the provision of data. The results and conclusions presented are those of the authors. No official endorsement by Manitoba Health is intended or should be inferred. Nova Scotia data were provided by Health Data Nova Scotia and the Nova Scotia Department of Health and Wellness, however, the observations and opinions expressed are those of the authors and do not represent those of either Health Data Nova Scotia or the Department of Health and Wellness. Data for this study were also provided by Population Data BC and the BC Cancer Agency. All inferences, opinions, and conclusions drawn in this study are those of the authors, and do not reflect the opinions or policies of the BC Data Steward(s) (52)(53)(54)(55)(56).

Statement on Conflict of Interest
The authors declare that they have no conflicts of interest.