Research Protocol for an Observational Health Data Analysis on the Adverse Events of Systemic Treatment in Patients with Metastatic Hormone-sensitive Prostate Cancer: Big Data Analytics Using the PIONEER Platform

Take Home Message
Despite their efficacy, novel combination systemic therapies for metastatic hormone-sensitive prostate cancer (mHSPC) are associated with side effects. Our study reports drug-related adverse events in a large cohort of “real-life” patients treated with combination therapies for mHSPC.


Introduction
The treatment of metastatic hormone-sensitive prostate cancer (mHSPC) has evolved with the introduction of combination systemic therapies, which are more effective than androgen deprivation therapy (ADT) alone [1]. These combination therapies consist of ADT plus chemotherapy (docetaxel), ADT plus an androgen receptor signalling inhibitor (ARSI; enzalutamide, apalutamide, abiraterone acetate, or darolutamide), or ADT plus both (triplet therapy) [1][2][3]. The PEACE-1 and ARASENS trials showed a benefit of adding an ARSI to docetaxel plus ADT [4,5]. However, none of the existing trials can establish which of the doublet combination therapies is the most effective. Furthermore, in recent network meta-analyses, no doublet was significantly superior to the others in terms of efficacy [3], and triplet therapy outperformed doublets only in selected subgroups [1,2], while also being associated with a higher risk of severe adverse events (AEs).
Each of the novel therapies has its own set of toxicities and AEs [1]. For example, combination therapies are associated with a higher risk of cardiovascular [1], cognitive [2,6], and nervous system toxicity [2], as well as fatigue [1], compared with ADT monotherapy. Therefore, the safety and toxicity profiles of the available combinations become key considerations in the management of mHSPC. In other words, the pivotal question is whether the patient can tolerate the selected combination or would be better served by an alternative treatment with a more favourable safety and quality-of-life profile. Furthermore, in patients with multiple comorbidities, adding more drugs at baseline can lead to drug-drug and drug-condition interactions, resulting in more AEs and hospitalisations. All this increases the need to select the right therapy for the right patient at the right time.
Overall, previous reports show that registry-based real-world data on AEs differ from those obtained in clinical trial settings [7,8]. Most of the real-world evidence on ARSI and docetaxel combinations is derived from patients with metastatic castration-resistant prostate cancer (mCRPC) [9][10][11]. For example, using US administrative claims data, Conover et al [12] showed that in the mCRPC setting, abiraterone acetate is associated with a higher risk of heart failure, acute myocardial infarction, and ischemic stroke than enzalutamide. In addition, Bjartell et al [9] found that in patients treated with hormonal and chemotherapy sequencing for mCRPC, drug toxicity was the second most common cause of treatment discontinuation. However, there are few or no data on the safety and incidence of AEs in patients treated with combination therapies for mHSPC outside clinical trials. Therefore, we aimed to characterise and report the incidence of AEs in a large cohort of patients undergoing treatment for mHSPC with approved combination therapies.

Design
This is an observational study to report the rate of prespecified AEs among new users of ADT monotherapy, ARSI, docetaxel, or docetaxel plus ARSI for mHSPC, using the PIONEER platform [13][14][15][16][17][18]. Definitions of key terms used in the study are provided in Table 1. Proposed project-related studies are listed in Supplementary Table 1.

Objectives
Overall, this study aims to characterise and report the incidence of drug-associated AEs over a defined follow-up in four cohorts of real-world patients treated for mHSPC: ADT monotherapy, ARSI, docetaxel, and ARSI plus docetaxel (Table 2). The primary objective is to report the incidence of AEs of interest among patients with mHSPC receiving ADT monotherapy, ADT plus ARSI, ADT plus docetaxel, or ADT in combination with ARSI and docetaxel.
The secondary objectives are as follows:
1. To analyse time to AEs associated with ARSI, docetaxel, and docetaxel plus ARSI for mHSPC
2. To report baseline characteristics of patients treated with ARSI, docetaxel, and docetaxel plus ARSI for mHSPC
3. To report hospitalisation rates for the ARSI, docetaxel, and docetaxel plus ARSI cohorts for mHSPC

Data sources
Electronic health records (EHRs), registries, and administrative claims databases, all converted to the Observational Medical Outcomes Partnership (OMOP) Common Data Model, will be used (Supplementary Table 2). The OMOP Common Data Model provides a standard representation of healthcare experiences (such as information on drug utilisation and condition occurrence) as well as common vocabularies for coding clinical concepts, enabling consistent application of analyses across multiple disparate data sources (OHDSI, 2020; Voss, 2015). All analyses will be performed independently within each database to produce database-specific results.

Target cohorts
The target cohorts for this study are mHSPC patients treated with ADT monotherapy, ARSI, docetaxel, or docetaxel plus ARSI. Cohorts will be indexed on the first eligible treatment recorded within a (-30, 90) d window relative to the earliest metastasis diagnosis. For the ADT monotherapy cohort, no docetaxel or ARSI may be recorded at any time before, or up to 183 d after, the index ADT, unless it follows a censoring event. A cut-off of 6 mo was selected to minimise the risk of capturing treatment for localised disease or initial combination therapy, while still capturing treatment for progression on ADT. For the ARSI cohorts, no docetaxel may be recorded at any time before, or up to 183 d after, the index ARSI, unless it follows a censoring event. For the docetaxel cohort, no ARSI may be recorded at any time before, or up to 183 d after, the index docetaxel, unless it follows a censoring event. In addition, patients on ADT who have a record of chemotherapy without a named drug, and no record of ARSI, from 30 d before up to 183 d after the index date will be included in the docetaxel cohort, on the presumption that this chemotherapy is likely to be docetaxel; in the ADT-only and ARSI plus ADT cohorts, such a record is an exclusion criterion. A sensitivity analysis will compare the cohort treated with named docetaxel with the cohort with only a record of unspecified chemotherapy. For the docetaxel plus ARSI cohort, either docetaxel or ARSI may serve as the index event, with the other treatment required to occur within 183 d of the index treatment.
The following operational definitions will be applied to identify mHSPC patients. Patients must have a record indicating metastases plus a record of prostate cancer (PCa) at any time before or up to 30 d after the metastasis record. To ensure that metastases are due to PCa, patients must not have a record of any other primary cancer at any time before metastases, with the exception of nonmelanoma skin cancer (-1 to +30 d). To restrict the cohorts to hormone-sensitive disease, patients must not have a record of orchiectomy >30 d before metastasis or a record of chemical ADT within a (-365 to -31) d window relative to metastasis. Index events, and the inclusion and exclusion criteria of the target cohorts (mHSPC patients treated with ADT monotherapy, ARSI, docetaxel, or ARSI plus docetaxel; Fig. 1), are detailed below.
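The cohort-entry rules above can be sketched as a small decision procedure. This is a simplified illustration under stated assumptions, not the study package's implementation: the record container and function names are hypothetical, only the (-30, 90) d index window, the 183 d look-ahead, and the exclusion of prior alternative treatment are modelled, and censoring events, unnamed chemotherapy records, and the remaining inclusion and exclusion criteria are omitted. ADT is assumed to be concomitant for the ARSI and docetaxel cohorts, as the protocol's inclusion criteria state.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Windows taken from the protocol text; all other criteria are omitted (assumption).
INDEX_WINDOW = (-30, 90)    # days relative to the earliest metastasis record
INTENSIFICATION_DAYS = 183  # look-ahead window after the index treatment


@dataclass
class PatientRecords:
    """Hypothetical per-patient container of drug exposure start dates."""
    metastasis: date
    adt: List[date] = field(default_factory=list)
    arsi: List[date] = field(default_factory=list)
    docetaxel: List[date] = field(default_factory=list)


def first_in_index_window(dates: List[date], metastasis: date) -> Optional[date]:
    """Earliest record inside the (-30, 90) d window around the metastasis record."""
    lo, hi = INDEX_WINDOW
    eligible = [d for d in dates if lo <= (d - metastasis).days <= hi]
    return min(eligible) if eligible else None


def recorded_within(dates: List[date], index: date) -> bool:
    """True if any record falls from the index date up to 183 d after it."""
    return any(0 <= (d - index).days <= INTENSIFICATION_DAYS for d in dates)


def recorded_before_or_within(dates: List[date], index: date) -> bool:
    """True if any record is before the index or up to 183 d after it."""
    return any((d - index).days <= INTENSIFICATION_DAYS for d in dates)


def assign_cohort(p: PatientRecords) -> Optional[str]:
    """Return the (hypothetical) cohort label, or None if the patient is excluded."""
    arsi_idx = first_in_index_window(p.arsi, p.metastasis)
    doce_idx = first_in_index_window(p.docetaxel, p.metastasis)
    candidates = [d for d in (arsi_idx, doce_idx) if d is not None]

    if candidates:
        index = min(candidates)  # either agent may serve as the index event
        # Any ARSI or docetaxel record before the index excludes the patient.
        if any(d < index for d in p.arsi + p.docetaxel):
            return None
        has_arsi = recorded_within(p.arsi, index)
        has_doce = recorded_within(p.docetaxel, index)
        if has_arsi and has_doce:
            return "docetaxel plus ARSI"
        return "ARSI" if has_arsi else "docetaxel"

    adt_idx = first_in_index_window(p.adt, p.metastasis)
    if adt_idx is None:
        return None
    # ADT monotherapy: no ARSI or docetaxel before, or up to 183 d after, index ADT.
    if recorded_before_or_within(p.arsi + p.docetaxel, adt_idx):
        return None
    return "ADT monotherapy"
```

For example, a patient with metastases on 1 January 2020 who starts docetaxel on 15 January and an ARSI on 1 May (107 d later) falls into the docetaxel plus ARSI cohort under these rules.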

Index event
The earliest record of exposure for mHSPC to:

The study will perform a descriptive analysis of the ADT monotherapy, ARSI, docetaxel, and docetaxel plus ARSI cohorts; as such, no minimum sample size is required. For each analysis to be reported, a minimum of 5 patients/events is required.
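The reporting threshold can be illustrated with a minimal masking step applied to aggregate counts before they leave a site. This sketch assumes counts are held in a plain dictionary and that masked cells are displayed as "<5"; the function name and display format are hypothetical, zero counts are masked as well, and data owners may set a higher threshold through the parameter.

```python
from typing import Dict

MIN_CELL_COUNT = 5  # minimum reportable number of patients/events (protocol value)


def suppress_small_cells(counts: Dict[str, int], min_count: int = MIN_CELL_COUNT) -> Dict[str, str]:
    """Return display values, masking any count below min_count (hypothetical helper)."""
    return {
        label: str(n) if n >= min_count else f"<{min_count}"
        for label, n in counts.items()
    }
```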

Stratifications
Each target cohort (ADT monotherapy, ARSI, docetaxel, and docetaxel plus ARSI) will be analysed in full and stratified by the following baseline characteristics, assessed over the 1-yr preindex period; all strata are subject to meeting minimum reportable cell counts (as specified by data owners).
The baseline characteristics are as follows:
1. Age at index (raw data)
2. Comorbidities, reported separately and classified according to standardised systems (eg, Charlson Comorbidity Index [CCI]). Patients will be stratified into three groups:

Characterisation analysis
All analyses will be performed using code developed and adapted from the OHDSI Methods Library. The code for this analysis can be found at https://github.com/ahijazy/PioneerMetastaticAE. A single package executes both the cohort diagnostics, which assess the fitness for use of the phenotypes in each database, and the characterisation step. Baseline covariates will be extracted using an optimised SQL extraction script to quantify demographics, condition group eras, and drug group eras. Additional cohort-specific covariates will be constructed using OMOP Standard Vocabulary concepts [19].
At the time of execution, the package will create a data frame into which individuals' age and sex will be extracted. Individuals' medical conditions, procedures, measurements, and medications will be summarised over several time periods relative to the index date: all time (-1 to +1), all time prior (-1 to 0), a year prior (-365 to 0), a year to a month prior (-365 to -31), 6 mo prior (-180 to 0), and a month prior (-31 to 0); at the index date; and over the follow-up periods of 1 mo (0 to 31) and from 1 mo to a year (31 to 365). The numbers and proportions of persons with feature variables during time-at-risk (TAR) windows will be reported by target cohort and by stratification. Standardised mean differences will be calculated when comparing the characteristics of study cohorts, with plots comparing the mean values of characteristics. Baseline disease characteristics at diagnosis will be reported using medians for non-normally distributed continuous variables and proportions for categorical variables. The median follow-up will be computed for the overall study cohort. The absolute number of patients who experienced each AE will be reported.
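For reference, a common pooled-standard-deviation definition of the standardised mean difference (SMD) is sketched below, for binary covariates (compared as proportions) and for continuous covariates. This is an illustrative assumption; the exact formula used by the study package is not specified in this protocol, and the function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def smd_binary(p1: float, p2: float) -> float:
    """SMD for a covariate reported as a proportion in each of two cohorts."""
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return 0.0 if pooled_sd == 0 else (p1 - p2) / pooled_sd


def smd_continuous(x1: Sequence[float], x2: Sequence[float]) -> float:
    """SMD for a continuous covariate, using the average of the sample variances."""
    pooled_sd = math.sqrt((variance(x1) + variance(x2)) / 2)
    return 0.0 if pooled_sd == 0 else (mean(x1) - mean(x2)) / pooled_sd
```

As an example, a comorbidity present in 60% of one cohort and 40% of another gives an SMD of about 0.41.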

Estimation analysis
The objective of this analysis is to estimate the crude incidence rates (per 1000 person-years) and incidence proportions (per 1000 persons) of AEs across the ADT monotherapy, ARSI, docetaxel, and docetaxel plus ARSI cohorts. Reporting incidence rates per 1000 person-years rather than absolute numbers allows comparability across populations and time periods, because it accounts for differences in population size and exposure time. Nevertheless, both the absolute numbers of events and the rates will be reported in the Shiny app. Crude incidence rates will be estimated and stratified by database. For the time-to-event analysis, Kaplan-Meier estimates of event-free survival will be evaluated for each database. Furthermore, the incidence rate analysis will apply different TAR specifications to chronic and episodic events.
The TAR rules for chronic events are as follows:
1. Patients experiencing the AE before the start of the TAR are excluded.
2. A record of the chronic event during the observation period ends the TAR; the event is counted in the incidence rate, and the follow-up time is the time span between the cohort start date and the occurrence of the chronic event.
The TAR rules for episodic events are as follows:
1. Having the event before the TAR does not exclude the patient.
2. The follow-up time is the time span between the treatment cohort start date and the cohort end date, summed over observation periods if there is more than one. All occurrences of the AE during this TAR are counted in the incidence rate calculation.
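The chronic and episodic TAR rules can be illustrated with a simplified calculation of crude incidence rates (per 1000 person-years) and incidence proportions (per 1000 persons). This sketch assumes a single observation period per patient, AE days expressed relative to the cohort start date (negative values meaning before the TAR), and 365.25 d per year; the data structure and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

DAYS_PER_YEAR = 365.25  # assumption for converting person-days to person-years


@dataclass
class PatientTar:
    """Hypothetical per-patient summary: TAR length and AE days relative to index."""
    tar_days: int
    event_days: List[int]  # negative values = recorded before the TAR started


def incidence(patients: List[PatientTar], chronic: bool) -> Tuple[float, float]:
    """Return (rate per 1000 person-years, proportion per 1000 persons)."""
    events = persons_with_event = persons_at_risk = 0
    person_days = 0
    for p in patients:
        in_tar = sorted(d for d in p.event_days if 0 <= d <= p.tar_days)
        if chronic:
            if any(d < 0 for d in p.event_days):
                continue  # prevalent chronic AE: patient excluded
            persons_at_risk += 1
            if in_tar:
                events += 1
                persons_with_event += 1
                person_days += in_tar[0]  # TAR ends at the first chronic event
            else:
                person_days += p.tar_days
        else:
            persons_at_risk += 1  # earlier episodes do not exclude the patient
            events += len(in_tar)  # every episode within the TAR is counted
            persons_with_event += 1 if in_tar else 0
            person_days += p.tar_days
    person_years = person_days / DAYS_PER_YEAR
    rate = 1000 * events / person_years if person_years else 0.0
    proportion = 1000 * persons_with_event / persons_at_risk if persons_at_risk else 0.0
    return rate, proportion
```

Treating the same records as chronic or as episodic changes both the denominator (excluded patients, truncated follow-up) and the numerator (first event only versus all episodes), which is why the two TAR specifications are reported separately.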
It is important to note that the incidence analysis is descriptive and univariate, focusing on presenting and summarising the incidence rates of AEs. It aims to provide an overview of the occurrence of AEs within the target population. Therefore, direct comparisons of incidence rates between cohorts are not appropriate in this context. Incidence rates can be influenced by various factors, such as population demographics and confounding variables; hence, comparing incidence rates between groups or time periods may lead to misleading interpretations. A further analysis, such as a comparative effectiveness study with propensity score matching, is needed before comparisons can be made. Once sufficient data are available, we intend to conduct such a study to assess the comparative safety profiles of different treatments based on AEs, thereby informing treatment decisions and potentially improving patient outcomes. Another critical point is that the calculated incidence rates are not limited to treatment-caused events but encompass all events experienced by the patients. On the one hand, this offers a more comprehensive view of patients' health outcomes. On the other hand, no direct cause-and-effect relationship between treatments and specific outcomes (AEs) can be inferred; that is, it is not possible to determine which events are directly attributable to the treatment.
The third objective is to estimate the mean cumulative function (MCF) of episodic events. The MCF is used in the analysis of recurrent events, an extension of survival analysis, to describe the occurrence of events over time. It represents the expected cumulative number of events per patient by a given time point. The MCF shows how the event burden accumulates over time and provides insight into the occurrence of the AEs under study. The MCF will be estimated using the Nelson-Aalen estimator [18], a nonparametric estimator of the MCF at each time point that does not assume any underlying parametric model.
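A minimal sketch of the Nelson-Aalen estimate of the MCF for recurrent (episodic) events: at each distinct event time t, the number of events d(t) is divided by the number of patients still under observation n(t), and these increments are summed, giving MCF(t) = sum over s <= t of d(s)/n(s). The input format (end of follow-up and event times per patient, in days) and the function name are hypothetical assumptions, and no variance or confidence interval is computed.

```python
from collections import Counter
from typing import Dict, List, Tuple


def nelson_aalen_mcf(subjects: List[Tuple[float, List[float]]]) -> Dict[float, float]:
    """subjects: (end of follow-up, event times) per patient. Returns {time: MCF}."""
    # Count events at each time, ignoring any recorded after the patient's follow-up.
    event_counts = Counter(t for end, times in subjects for t in times if t <= end)
    mcf: Dict[float, float] = {}
    cumulative = 0.0
    for t in sorted(event_counts):
        at_risk = sum(1 for end, _ in subjects if end >= t)  # still observed at t
        cumulative += event_counts[t] / at_risk
        mcf[t] = cumulative
    return mcf
```

For three patients followed for 10, 4, and 8 d, with events on days 2 and 5, on day 2, and none, respectively, the estimate is 2/3 at day 2 and 2/3 + 1/2 = 7/6 at day 5, because only two patients remain under observation at day 5.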

Sensitivity analysis
Castration sensitivity is central to the definition of mHSPC and relies on well-captured prior ADT use in data sets. However, there is reason to believe that ADT may be underreported in some data sources, a bias that may be substantially greater in the periods before the agents obtained specific indications for mHSPC. To understand the risk of misclassifying mCRPC patients as mHSPC patients, an analysis will assess whether, in each data set, ADT is recorded together with agents that require it according to their label. Another sensitivity analysis will describe baseline characteristics and outcomes of the cohorts when restricted to the time periods during which the agents had received regulatory approval for mHSPC (Supplementary Table 4). The baseline analysis requires intensification within a period from index to 183 d, with the intent of limiting misclassification of mCRPC patients as mHSPC patients. While this approach supports target cohort specificity, it may limit the generalisability of the findings to routine clinical practice. An analysis will therefore be performed in which the intensification period is limited to 90 d, to reflect more general clinical practice. The baseline analysis requires the index treatment to occur within a window of 30 d before to 90 d after the earliest record of metastases. An analysis will be performed in which the index treatment is required to occur within 30 d of metastases.
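The baseline and sensitivity definitions differ only in a few window parameters, so they can be expressed as parameter sets applied to the same rule. The sketch below applies them to the intensification requirement for combination cohorts. It interprets "within 30 d of metastases" as a (-30, 30) d window, which is an assumption; the class, constant, and function names are hypothetical, and the calendar-period restriction is not modelled because the approval periods are given in Supplementary Table 4.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CohortSettings:
    """Hypothetical parameter set for one analysis variant."""
    name: str
    index_window: Tuple[int, int]  # days relative to the earliest metastasis record
    intensification_days: int      # max days from index to the second agent


BASELINE = CohortSettings("baseline", (-30, 90), 183)
SHORT_INTENSIFICATION = CohortSettings("intensification up to 90 d", (-30, 90), 90)
NARROW_INDEX = CohortSettings("index within 30 d", (-30, 30), 183)  # assumed window


def is_combination(index_day: int, second_agent_day: Optional[int], s: CohortSettings) -> bool:
    """Index inside the window and second agent started within the intensification period.

    Both days are counted relative to the earliest metastasis record.
    """
    lo, hi = s.index_window
    if not (lo <= index_day <= hi) or second_agent_day is None:
        return False
    return 0 <= second_agent_day - index_day <= s.intensification_days
```

Keeping the rule fixed and varying only the parameters makes it explicit that each sensitivity analysis changes one window at a time relative to the baseline analysis.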

Logistics of executing a federated analysis
We will use both centrally hosted and federated data sets to perform this analysis. The analytics will be developed using readily available data, either centrally hosted or directly accessible by study-a-thon members. Once a shareable version of the analytics package is ready, we will circulate the link to the GitHub repository hosting the package, which all data owners will download and run locally. We will then gather and collate the results in an online Shiny app available to all.

Strengths
The study is anticipated to include the largest patient-level cohort of metastatic PCa patients, allowing characterisation of relatively uncommon outcomes that would not be identifiable in smaller data sets. Data will be obtained from multiple centres and providers in multiple countries, which aids the generalisability of the findings and enables comprehensive characterisation of the study population, key baseline characteristics, and outcomes. Several sensitivity analyses will be conducted, which will aid interpretation and lend robustness to the results. Lastly, the use of routinely collected data from multiple sources supports the external validity of the findings.

Limitations
This study is carried out using data recorded in a collection of EHRs, claims databases, and tumour registries. As with any healthcare database used for secondary data analysis, patient records may be incomplete in many respects and may contain erroneous entries, leading to misclassification of study variables. Data on the diagnosis of metastatic PCa, treatments, pathology, imaging, and laboratory results, or baseline covariates prior to enrolment in the database, may not be available. Assessment of clinical progression based on radiological imaging is limited by the data collected. PCa-specific characteristics such as stage, grade at diagnosis, or the extent of the disease are not readily available in most EHRs and claims databases. Selection bias cannot be ruled out, as patient factors such as age, comorbidities, and clinical characteristics may influence treatment choice and subsequent outcomes. Treatment provided in hospitals or any other setting outside each participating institution is not included. Medical conditions may be underestimated, as they will be identified from the presence of condition codes, with the absence of such a record taken to indicate the absence of disease. Meanwhile, medication records indicate that an individual was prescribed or dispensed a particular drug, but this does not necessarily mean that the individual took the drug as prescribed. In routine practice, AEs are not documented as consistently as in clinical trials, which use systems such as the Common Terminology Criteria for Adverse Events. Additionally, the cohorts have not been matched or weighted to ensure comparable groups at baseline.

Protection of human individuals
The study uses only deidentified data. Confidentiality of patient records will be maintained at all times. Data custodians will remain in full control of executing the analysis and packaging results. There will be no transmission of patient-level data at any time during these analyses. Only aggregate statistics will be captured. Study packages will contain minimum cell count parameters to obscure any cells that fall below the allowable reportable limits. All study reports will contain aggregate data only and will not identify individual patients or physicians.

Management and reporting of AEs and adverse reactions
According to the guidelines on good pharmacovigilance practices (EMA/873138/2011) and the International Society for Pharmacoepidemiology (ISPE), there is no requirement for expedited reporting of adverse drug reactions from studies based on secondary use of data (such as electronic healthcare databases).

Plans for disseminating and communicating study results
The results of the study will be presented at international urological and oncological meetings in the form of abstracts.
The final results will be published as full-text papers in an international peer-reviewed urological journal. The results of this study will be published in accordance with the guidelines, including those for authorship, established by the International Committee of Medical Journal Editors. When reporting the results of this study, the appropriate Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist will be followed.

Sponsor
The study is supported by the Innovative Medicines Initiative 2 (IMI2) Joint Undertaking project PIONEER (grant agreement no. 777492) [13][14][15][16][17][18]. IMI2 receives support from the European Union's Horizon 2020 research programme and the European Federation of Pharmaceutical Industries and Associations (EFPIA). Many of the contributors are part of OHDSI, a multistakeholder interdisciplinary collaborative to bring out the value of health data through large-scale analytics, and may have other funding sources, which will be listed in the study manuscripts.
Continuous treatment was defined as follows:
ADT: until the end of the observation period
ARSI: until a gap of >60 d between records
Docetaxel: until the end of the observation period
ARSI: abiraterone acetate, apalutamide, enzalutamide, and darolutamide
ADT = androgen deprivation therapy; AE = adverse event; ARSI = androgen receptor signalling inhibitor; PCa = prostate cancer.
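The ARSI continuous-use rule (continuous until a gap of >60 d between records) can be sketched as follows. The sketch assumes that each record is a single date and that continuous use ends at the last record before the first gap exceeding 60 d; days' supply, which drug-era algorithms often add to each record, is deliberately ignored, and the function name is hypothetical.

```python
from datetime import date
from typing import List

MAX_GAP_DAYS = 60  # ARSI use is continuous until a gap of >60 d between records


def continuous_arsi_end(record_dates: List[date]) -> date:
    """Last record date of the continuous ARSI episode starting at the first record."""
    if not record_dates:
        raise ValueError("at least one ARSI record is required")
    dates = sorted(record_dates)
    end = dates[0]
    for d in dates[1:]:
        if (d - end).days > MAX_GAP_DAYS:
            break  # gap of more than 60 d: continuous use has ended
        end = d
    return end
```

A gap of exactly 60 d keeps the episode open under this reading of ">60 d"; only a longer gap closes it.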

10. Number of hospitalisations in the year prior to index
11. Time from index to intensification for doublet and triplet combinations
Report hospitalisation associated with treatment: number of hospitalisations of interest on treatment
AE = adverse event; ADT = androgen deprivation therapy; ARSI = androgen receptor signalling inhibitor; CCI = Charlson Comorbidity Index.
a Obesity: diagnostic codes associated with obesity.

3. Comorbidities recorded:
(a) At any time point before the index date
(b) 365 d before the index date
(c) 30 d before the index date
(d) At the index date

Table 1 - Definitions of key terms used in the study protocol

Table 2 - Summary of objectives
Primary objective: Report AEs per cohort. Endpoint: Number of AEs of interest on treatment (Table 2) per cohort
Secondary objective: Report time to AEs of interest per cohort. Endpoints: Episodic AEs: time from index date to first documentation of episodic AE of interest per cohort; Chronic AEs: time from index to the documentation of chronic AE of interest per cohort
Follow-up time: Time from index date to end of observation (ADT-only and docetaxel cohorts), end of continuous ARSI use (ARSI cohorts), censoring event (treatment switch, diagnosis of other cancer), or loss to follow-up

8. No alternative index treatment recorded any time before index (with the exception of chemical ADT recorded >365 d before the earliest diagnosis of metastases)
9. No radiotherapy recorded 1095 d prior to index
10. For the ARSI, docetaxel, and docetaxel plus ARSI cohorts, an assumption is made that ADT is concomitantly required in each agent's regulatory approval for mHSPC