Dataset of phase I and II immunotherapy clinical trials used for a meta-analysis to assess the role of biomarkers in treatment outcomes in diverse cancers

We performed a literature search in PubMed to identify phase I/II clinical trials with immunotherapy drugs approved by the Food and Drug Administration (FDA) (labeled, off-label, and/or combined with investigational immune checkpoint inhibitors or other treatment modalities) published from 2018 to 2020. We used the following key words: clinical trials, phase 1, phase 2; and the following filters: cancer, humans; and selected the checkpoint inhibitors that had been approved by the FDA by March 2021, i.e., “pembrolizumab”, “nivolumab”, “atezolizumab”, “durvalumab”, “cemiplimab”, “avelumab”, and “ipilimumab”. Clinical trials of these checkpoint inhibitors in their labeled indications, in off-label use, or in combination with investigational immune checkpoint inhibitors or other treatment modalities were included. Studies describing supportive care or locoregional treatments; cellular, viral, or vaccine therapy; studies in the adjuvant or neoadjuvant setting; and pediatric studies were excluded. Overall, 173 articles reporting on relevant studies were identified. Using these articles, we compiled a data file of study-specific covariates for each study. We recorded the immunotherapeutic agent, tumor type and biomarker, and clinical outcomes (objective response rate, and median values [point estimates] and confidence intervals for progression-free survival and overall survival). Using these data, we carried out meta-analyses for the three outcomes and meta-regression on study-specific covariates. The same data could be used for any alternative implementation of meta-analysis and meta-regression, using more structured inference models reflecting different levels of dependence based on the available study-specific covariates.


• Many immuno-oncology trials are conducted without biomarker selection. We performed a meta-analysis of phase I/II clinical trials evaluating immune checkpoint inhibitors (ICIs) to determine the association, if any, between biomarkers and clinical outcomes.
• This meta-analysis of data from phase I/II clinical trials with immunotherapy drugs published from 2018 to 2020 showed that the immune-related biomarker-positive cohort had higher response rates and longer progression-free and overall survival after immune checkpoint blockade compared with the biomarker-negative cohort.
• This is a unique database that provides all the raw data that were meticulously collected and recorded. To our knowledge, this is the first and the only publicly available database summarizing the results of phase I and II clinical trials with checkpoint inhibitors. New data and new publications can emerge from the analysis of this valuable resource.
• These data are a source of published clinical outcomes with and without biomarker use for patient selection. Raw data have not been previously published and can be valuable to the research community.
• The data file includes study-specific covariates for each study that describe the immunotherapeutic agent, tumor type and biomarker, and clinical outcomes (objective response rate, and median values [point estimates] and confidence intervals for progression-free survival and overall survival).
• The same data could be used for any alternative implementation of meta-analysis and meta-regression, using more structured inference models reflecting different levels of dependence based on the available study-specific covariates. Other investigators, healthcare professionals, the biotechnology industry, regulatory agencies, and patients could make use of the database.
• These results demonstrate the significance of using immune-related biomarkers to select patients with diverse tumor types for participation in clinical trials evaluating ICIs. Prospective clinical trials need to implement the use of composite biomarkers by incorporating genomic, transcriptomic, and immune profiles, host pharmaco-genome, and other factors.

Data Description
The spreadsheet shared at the indicated anonymous ftp site provides the complete secondary data used in the meta-analysis. In the spreadsheet, each line corresponds to one cohort of patients, with some items reported separately for marker-positive and marker-negative patients in the indicated cohort. In many cases, a single study includes data on multiple cohorts defined by different disease subtypes or by different biomarkers. In such cases, multiple lines in the spreadsheet are used to summarize the study. The headers in line 1 of the spreadsheet indicate the reported information.
Additionally, the analysis used two derived variables: a hazard ratio (HR) and weights (w). For the HR, we used a point estimate given by the ratio of median event times (this would be exact under exponential sampling models). For the study-specific weights, we used the sum of the inverse sample sizes (of the marker-positive and marker-negative cohorts).
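The two derived variables can be illustrated with a short Python sketch. It is not the code used for the analysis (which was carried out in R); the function names and the example numbers are hypothetical. Under an exponential model with rate λ, the median event time is ln 2 / λ, so the ratio of hazards equals the inverse ratio of medians. The direction of the ratio (marker-positive relative to marker-negative) is an assumption made here for illustration.

```python
import math


def hazard_ratio_from_medians(median_pos, median_neg):
    """Point estimate of the hazard ratio as a ratio of median event times.

    Assumed direction: hazard of marker-positive relative to marker-negative.
    With exponential event times, median = ln(2) / rate, hence
    rate_pos / rate_neg = median_neg / median_pos.
    """
    return median_neg / median_pos


def study_weight(n_pos, n_neg):
    """Sum of inverse sample sizes of the two marker cohorts."""
    return 1.0 / n_pos + 1.0 / n_neg


# Hypothetical cohort: median PFS of 12 months (marker-positive, n = 40)
# versus 6 months (marker-negative, n = 60).
hr = hazard_ratio_from_medians(12.0, 6.0)
log_hr = math.log(hr)
w = study_weight(40, 60)
print(hr, log_hr, w)
```

A ratio below 1 in this convention corresponds to longer median event times in the marker-positive cohort.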
Details of the data entries in the spreadsheet are reported in Table 1.

Experimental Design, Materials and Methods
Each of the articles identified in the PubMed search was reviewed by one of the authors. The search included the following terms: "clinical trials", "phase 1", "phase 2", and publication dates from January 1, 2018, to December 31, 2020. The following filters were also used: "cancer" and "humans". The checkpoint inhibitors that had been approved by the FDA by March 2021, i.e., "pembrolizumab", "nivolumab", "atezolizumab", "durvalumab", "cemiplimab", "avelumab", and "ipilimumab", were selected in the search. Clinical trials of these checkpoint inhibitors in their labeled indications, in off-label use, or in combination with investigational ICIs or other treatment modalities were included. Studies describing supportive care or locoregional treatments; cellular, viral, or vaccine therapy; studies in the adjuvant or neoadjuvant setting; and pediatric studies were excluded (Fig. 1). We first verified that the article did indeed report on a phase I or II immuno-oncology trial and that it included outcomes stratified into biomarker-positive and biomarker-negative cohorts. We then recorded all study-specific covariates and the outcome summaries. Before proceeding with the meta-analysis, we carried out a test for homogeneity (fixed effects only), so that a random effects model would be used only if the null hypothesis of common fixed effects could be rejected [1]. We used the method of DerSimonian-Laird [2] to estimate residual heterogeneity. For all three endpoints (ORR, PFS, and OS), homogeneity was rejected with p < 0.0001.
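The homogeneity test and the DerSimonian-Laird step described above can be sketched as follows. This is a minimal Python illustration of the standard textbook formulas, not the authors' R code; the function name and the input values are hypothetical. The statistic Q is compared against a chi-square distribution with k - 1 degrees of freedom (k being the number of studies) to test homogeneity; the p-value computation is omitted here to keep the sketch free of external libraries.

```python
import math


def dersimonian_laird(effects, variances):
    """Cochran's Q and the DerSimonian-Laird random-effects estimate.

    effects:   per-study effect estimates (e.g. log odds ratios)
    variances: per-study variances of those estimates
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect (inverse-variance) pooled estimate.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # Method-of-moments estimate of the between-study variance, truncated at 0.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return {"Q": q, "df": df, "tau2": tau2, "pooled": pooled, "se": se}


# Hypothetical two-study example.
result = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
print(result)
```

When Q does not exceed its degrees of freedom, the truncation sets the between-study variance to zero and the random-effects estimate coincides with the fixed-effect estimate.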
For ORR, we recorded the number of patients and the number of overall responses by marker status. Analyses of ORR were based on log odds ratios (ORs). For the event time outcomes, we recorded point estimates and 95% confidence intervals for the median PFS and OS durations, where available. When the upper limit of the confidence interval was not available, we recorded it as N/R (not reached). In some cases, we referred to Kaplan-Meier plots to identify sample sizes and estimates. Results for event times (PFS and OS) were based on differences of log median event times for biomarker-negative versus biomarker-positive groups [3]. For each study, we evaluated the study-specific variance as the sum of inverse sample sizes, as this was readily available for all studies in the meta-analysis [4]. Tests and coding of variables were performed as previously published. Statistical significance was defined as p < 0.05, and highly significant was defined as p < 0.0001.
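As an illustration of the ORR effect measure, the following Python sketch computes a log odds ratio from response counts by marker status, together with the sum-of-inverse-sample-sizes variance described above. The function names and counts are hypothetical, and the 0.5 continuity correction for empty cells is an assumption added for the sketch; the text does not state how such cases were handled.

```python
import math


def log_odds_ratio(resp_pos, n_pos, resp_neg, n_neg, correction=0.5):
    """Log odds ratio of response, marker-positive vs. marker-negative.

    If any cell of the 2x2 table is zero, `correction` is added to all four
    cells (an assumption of this sketch, not a documented choice).
    """
    a, b = resp_pos, n_pos - resp_pos   # responders / non-responders, positive
    c, d = resp_neg, n_neg - resp_neg   # responders / non-responders, negative
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return math.log((a * d) / (b * c))


def inverse_sample_size_variance(n_pos, n_neg):
    """Study-specific variance taken as the sum of inverse sample sizes."""
    return 1.0 / n_pos + 1.0 / n_neg


# Hypothetical cohort: 10/20 responders (positive) vs. 5/20 (negative).
log_or = log_odds_ratio(10, 20, 5, 20)
var = inverse_sample_size_variance(20, 20)
print(log_or, var)
```

A positive log odds ratio in this convention indicates a higher response rate in the marker-positive cohort.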
All reported inference was performed in the R statistical software environment [5]. We used the implementations of meta-analysis in the R packages metafor [6], meta [7], and mvmeta [8]. Summaries are displayed as forest plots [9]. All event times were converted to months, if needed.
Potential biases were addressed by (i) inspecting a funnel and trim-and-fill plot and (ii) carrying out meta-regression on agent, tumor type, monotherapy vs. combination therapy, and line of therapy [10][11][12]. The trim-and-fill method was proposed by Duval and Tweedie [11,12]. It estimates the number of studies missing from a meta-analysis due to lack of significant results. We implemented the method using the R package metafor [6]. Meta-regression describes how different study characteristics impact the overall treatment effect being reported in the meta-analysis. We used an implementation of meta-regression from the R package meta [13]. Neither of the two methods identified evidence for systematic biases. Finally, we note that the results in the related research article are based on the data file with one study erroneously included in duplicate (PMID: 30515672). The duplication is removed in the shared data set. Repetition of the analysis, after excluding the duplicate study, did not alter study results.
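To make the idea of meta-regression concrete, the sketch below fits a weighted least-squares line of study effects on a single study-level covariate, with inverse-variance weights. It is a deliberately simplified Python illustration, not the procedure implemented in the R package meta: it uses one covariate only and ignores between-study variance. The function name, the 0/1 coding of the covariate, and the numbers are all hypothetical.

```python
import math


def weighted_meta_regression(effects, variances, covariate):
    """Inverse-variance weighted regression of effects on one covariate.

    Returns (intercept, slope, se_slope). The slope estimates how the effect
    changes per unit of the covariate (e.g. 0 = monotherapy, 1 = combination).
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, covariate)) / sum_w
    y_bar = sum(w * y for w, y in zip(weights, effects)) / sum_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, covariate))
    sxy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, covariate, effects)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With known within-study variances, Var(slope) = 1 / sxx.
    se_slope = math.sqrt(1.0 / sxx)
    return intercept, slope, se_slope


# Hypothetical log effects for four cohorts; covariate coded 0/1.
intercept, slope, se_slope = weighted_meta_regression(
    effects=[1.0, 3.0, 1.0, 3.0],
    variances=[0.1, 0.2, 0.1, 0.2],
    covariate=[0, 1, 0, 1],
)
print(intercept, slope, se_slope)
```

A slope that is small relative to its standard error suggests that the covariate does not explain much of the variation in the reported effects.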

Ethics Statements
The work was carried out on data from published studies and did not involve animals or human subjects.

Declaration of Competing Interest
The authors declare that they have no conflict of interest from any competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

1. Value of the Data
© 2023 Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)

A PubMed search for phase I/II clinical trials with Food and Drug Administration (FDA)-approved ICIs (2018 to 2020) identified 173 articles reporting on relevant studies. Data collection was carried out by manual review of articles. Study-specific covariates and outcome data (ORR, PFS, and OS) were recorded. Confidence intervals were recorded for PFS and OS, if available. Selected articles reported summaries stratified by biomarker status for multiple biomarkers, while others reported summaries stratified by patient subgroups. Data were recorded for each biomarker or subgroup separately.

Table 1
Columns in the data spreadsheet.