Use of study-specific MOE-like estimates to prioritize health effects from chemical exposure for analysis in human health assessments

There are unique challenges in estimating dose-response with chemicals that are associated with multiple health outcomes and numerous studies. Some studies are more suitable than others for quantitative dose-response analyses. For such chemicals, an efficient method of screening studies and endpoints to identify suitable studies and potentially important health effects for dose-response modeling is valuable. Using inorganic arsenic as a test case, we developed a tiered approach that involves estimating study-specific margin of exposure (MOE)-like unitless ratios for two hypothetical scenarios. These study-specific unitless ratios are derived by dividing the exposure estimated to result in a 20% increase in relative risk over the background exposure (RRE20) by the background exposure, as estimated in two different ways. In our case study illustration, separate study-specific ratios are derived using estimates of United States population background exposure (RRB-US) and the mean study population reference group background exposure (RRB-SP). Systematic review methods were used to identify and evaluate epidemiologic studies, which were categorized based on study design (case-control, cohort, cross-sectional), various study quality criteria specific to dose-response analysis (number of dose groups, exposure ascertainment, exposure uncertainty), and availability of necessary dose-response data. Both case-control and cohort studies were included in the RRB analysis. The RRE20 estimates were derived by modeling effective counts of cases and controls estimated from study-reported adjusted odds ratios and relative risks. Using a broad (but not necessarily comprehensive) set of epidemiologic studies of multiple health outcomes selected for the purposes of illustrating the RRB approach, this test case analysis would suggest that diseases of the circulatory system, bladder cancer, and lung cancer may be arsenic health outcomes that warrant further analysis. 
This is suggested by the number of datasets from adequate dose-response studies demonstrating an effect with RRBs close to 1 (i.e., RRE20 values close to estimated background arsenic exposure levels).


Introduction
The evaluation of the potential human health effects of chemicals such as arsenic, which have an extremely large volume of datasets and wide variance in data quality across studies, would benefit from an efficient screening approach that helps narrow the focus of dose-response analyses to the studies and health outcomes of most concern for risk assessment. The screening approach described in this paper allows for the use of exposure- or dose-response data provided in studies without the need for resource- and time-consuming adjustments often performed on a set of critical studies, such as converting reported exposure metrics to a more relevant intake dose metric (e.g., μg/kg-day) and performing complex statistical dose-response modeling (e.g., model averaging or meta-regression). It is intended to preliminarily inform, not replace, a human health assessment. It offers an efficient approach for identifying studies that may be adequate for dose-response analysis and for approximating the relative potency of a chemical with respect to health effects that are known or presumed to be related to increases in exposure. This approach attempts to provide a portion of the information that can be used to prioritize studies and health effects for the more focused and in-depth dose-response analyses that would be performed as part of a full chemical health assessment.
This margin-of-exposure (MOE)-like analysis uses ratios of the study-specific estimates of exposures associated with a 20% increase in relative risk over a background exposure estimate (RRE 20) divided by the background estimate in the same study-specific exposure units (RRB). The RRB is, therefore, a unitless, study-specific exposure:exposure ratio (not a risk:risk ratio) that can be used to assess the potential potency of a chemical across health outcomes without the need for extensive adjustments to the exposure metrics and relative risk estimates reported by study authors. The large database of arsenic epidemiological studies is used as a case study to illustrate the approach. Two types of study-specific RRB estimates are derived in this case study: one that uses a background exposure estimate for the U.S. population (RRB-US) in the exposure units of the study (see Table 2) and the other that uses the study-specific background exposure estimated for the study population reference group (RRB-SP).
As defined, the RRB-US ratios are study-specific estimates of exposures associated with a 20% increase in relative risk over an estimated U.S. background exposure level (RRE-US 20) divided by the estimated U.S. background exposure level. While this provides a means of comparing approximations of potency for different health effects in relation to U.S. background levels, the derivation of the RRE-US 20 can involve considerable model extrapolation. For this arsenic case study, the estimated U.S. background exposure levels (Table 3) are often well below the background (i.e., the study-specific reference group) exposure levels observed in arsenic studies, which are frequently conducted in regions associated with high arsenic contamination (e.g., Taiwan, Bangladesh, Chile). Consequently, there is model uncertainty in the RRE-US 20 and, by extension, the RRB-US estimates from studies of populations with historically high arsenic exposures. A full assessment of all uncertainties associated with every RRB-US derivation presented here is beyond the scope of this screening approach and this paper. However, for comparison purposes, we have also reported RRB-SP values that relate reported relative risk to the study-specific background levels of exposure. RRB-SP estimates may not be as relevant to low-exposure populations like the U.S.; however, they provide study-specific estimates of potential use for comparing approximations of potency across health effects that will generally involve less extrapolation and, consequently, less model uncertainty and less RRB estimation variability (see further discussion in Section 4).

Overview
For the inorganic arsenic test case, studies of 11 health outcomes (Table 1) that have been relatively strongly associated with inorganic arsenic exposure were evaluated. The RRB analyses derive individual-study, preliminary (screening-level) relative risk estimates from analyses of grouped data (e.g., binned by exposure level) from individual epidemiological studies; for the database used in this inorganic arsenic case study, such datasets account for the bulk of exposure-response information. These analyses involve fitting standard parametric exposure-response models (e.g., logistic or Poisson regression) to the exposure or dose metrics reported by the study authors and deriving RRE 20 estimates based on the results of the best-fitting models. The overall process is detailed in the following sections and illustrated in Fig. 1, "Study Selection," and Fig. 2, "RRE 20 Derivation." Briefly, the methods used in this RRB analysis to screen, prepare, and model exposure-response data from studies and to derive RRE 20 and RRB estimates included the following:

- Pre-model screening and selection of datasets: a three-step strategy (described in Section 2.2) was used to select studies for modeling.

- Data preprocessing: group-level means were estimated, incidence rates were adjusted to account for covariates, background exposures for the U.S. population were estimated, outcomes were mapped to outcome domains, and author-performed trend tests were considered.

- Exposure-response modeling: case-control and cohort studies (see Section 2.2.1 for study selection criteria) were modeled to predict a relative risk exposure (RRE), defined as the exposure or dose level where the RR or OR has changed by a certain percentage (the benchmark relative risk, or BMRR) compared to the estimated RR or OR at the estimated U.S. background level of exposure (see Table 3 for estimated U.S. background exposure levels for the different exposure metrics) or at the estimated mean study population reference group level of exposure (see Section 2.3.1). The RRE estimates presented here are for a 20% increase in relative risk over background exposure estimates for the U.S. (RRE-US 20) and the study population reference group (RRE-SP 20). An increase in relative risk of 20% is deemed sufficient for the purposes of the RRB analyses in this evidence base (see Section 2.5); different RRE benchmarks may be more appropriate for other evidence bases or chemicals.

- Exclusion based on lack of fit: models were excluded from selection if they did not provide adequate fit (p-value < 0.05).

- Culling due to model uncertainty: models with a tendency to predict very steep, sometimes supralinear dose-response curves (i.e., Exponential 4 and Michaelis-Menten models) were excluded if their RRE 20 result was more than 10-fold below all other adequately fitting models. Further, to avoid including datasets with highly uncertain RRE 20 s, a dataset was excluded if the RRE 20 was more than a factor of three below the central estimate for the lowest dose group or above the central estimate for the highest dose group of the study. If studies reported an unexposed reference population (i.e., zero exposure), these RRE 20 s were not considered in the RRB analysis.

- Derivation of RRB ratios: to facilitate comparisons across exposure metrics, the RRE-US 20 and RRE-SP 20 estimates were divided by an estimated U.S. central tendency (background) exposure estimate (Table 3) or the estimated mean study population reference group exposure (Section 2.3.1) to derive the RRB-US and RRB-SP MOE-like estimates.
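The final ratio step above is simple arithmetic. A minimal sketch follows; the 0.51 mg/L-yr RRE-US 20 and 0.075 mg/L-yr U.S. background values are taken from the Fig. 3 example discussed later, while the study reference exposure is invented for illustration:

```python
# Minimal sketch of the RRB derivation step: divide an RRE20 estimate by a
# background exposure estimate in the same units. The 0.51 and 0.075 mg/L-yr
# values come from the Fig. 3 example; the study reference exposure is invented.

def rrb(rre20: float, background: float) -> float:
    """Unitless MOE-like ratio: RRE20 divided by background exposure."""
    if background <= 0:
        raise ValueError("background exposure must be positive")
    return rre20 / background

rrb_us = rrb(0.51, 0.075)   # RRE-US20 / U.S. central tendency background -> 6.8
rrb_sp = rrb(0.68, 0.2)     # RRE-SP20 / hypothetical study reference exposure
print(round(rrb_us, 1), round(rrb_sp, 1))
```

Because both numerator and denominator are in the study's own exposure units, the units cancel and RRBs can be compared across studies that report different metrics.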

Study/dataset selection process
This section describes the three-step "Pre-model Screening" process (see Fig. 1) used to select studies of those 11 health outcomes for modeling. The goals of this process were to efficiently identify studies that are potentially adequate for dose-response modeling by applying consistent and transparent preliminary study quality evaluation criteria.
2.2.1. Initial study screen-To begin the study selection process, the datasets for each health outcome were characterized as (1) ecological, (2) cohort/cross-sectional dichotomous, (3) case-control dichotomous, or (4) continuous. For comparison purposes, the screening focuses on evaluating all adequate exposure-response datasets for the exposure level associated with a 20% increase in relative risk of a disease (see Section 2.5). In some cases, study authors were contacted in an attempt to obtain missing data necessary for an exposure-response analysis. The data provided are described in the Supplemental Material for this article. If the necessary data were not provided, the study was not included in the RRB analysis.
Next, studies that provided no quantitative exposure- or dose-response data were set aside. This selection criterion excluded "ecological" studies, where location of residence or geographically averaged arsenic estimates based on a small number of measurements were the sole exposure metrics. Also removed at this step were studies that reported arsenic concentrations in hair, nails, or blood as the sole metrics of exposure, due to a relative lack of national background data for the approximation of RRBs from these biomarkers.
In this RRB analysis, the exposure-response metrics were compared across study types. Rothman and Greenland (2005) point out that studies that estimate prevalence (i.e., cross-sectional studies) should not be compared with studies that estimate incidence or mortality rates (i.e., case-control and cohort studies). In addition, the method for adjusting the disease counts for covariates, which differs for case-control, cumulative incidence, and incidence rate study metrics, does not have a readily available counterpart for cross-sectional studies. Therefore, we excluded cross-sectional studies from these RRB analyses.

Secondary screen:
Evaluate study elements-Studies were evaluated for suitability for exposure-response modeling. A series of quality rating criteria were used to characterize the potential utility of specific studies for exposure-response modeling (Table 2). The rating criteria were specifically tailored to identify studies that may have been relevant for qualitative information but did not provide suitable information to support exposure-response estimation (see example exposure-response study selection tables for 11 inorganic arsenic health outcomes in Supplemental Material).
For each quality rating criterion, each study was determined to be either "suitable," "less suitable," or "not suitable," with a score of 0, 1, or 2 given to those categorizations, respectively. Additionally, borderline cases between "suitable" and "less suitable" were given a score of 0.5. The scores across all rating criteria were then summed, and datasets that had a score greater than or equal to five were excluded. All rating criteria were weighted equally in this approach (i.e., no individual rating criterion was deemed more important than another). In general, however, the large majority of studies providing quantitative exposure- or dose-response data were retained. Table 1 shows the numbers of studies for each health outcome category that were originally identified as potentially useful, the numbers set aside (excluded) in the different study selection steps, and the numbers ultimately included in the RRB analysis. Sixty-eight total studies were included in the RRB analysis: 18 studies evaluating bladder cancer outcomes, 4 studies evaluating diabetes outcomes, 17 studies evaluating diseases of the circulatory system outcomes, 3 studies evaluating liver cancer outcomes, 16 studies evaluating lung cancer outcomes, 2 studies evaluating nonmalignant respiratory outcomes, 3 studies evaluating pregnancy outcomes, 6 studies evaluating renal cancer outcomes, 3 studies evaluating skin cancer outcomes, and 10 studies evaluating skin lesion outcomes. Full details of the study selection rating criteria and decisions are provided in the Supplemental Material for this article.
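The equal-weight scoring rule above can be sketched as follows. The category scores and the greater-than-or-equal-to-five exclusion threshold come from the text; the criterion ratings in the example are invented (the actual criteria are listed in Table 2):

```python
# Sketch of the secondary-screen scoring: each criterion is rated and scored
# (0 = suitable, 0.5 = borderline, 1 = less suitable, 2 = not suitable); the
# equally weighted scores are summed, and a total >= 5 excludes the dataset.
SCORES = {"suitable": 0.0, "borderline": 0.5, "less suitable": 1.0, "not suitable": 2.0}

def total_score(ratings):
    """Sum of equally weighted criterion ratings."""
    return sum(SCORES[r] for r in ratings)

def include_dataset(ratings) -> bool:
    """A dataset is retained only if its summed score is below 5."""
    return total_score(ratings) < 5.0

# A study rated mostly "suitable" across four hypothetical criteria is retained:
print(include_dataset(["suitable", "borderline", "less suitable", "suitable"]))  # True
```

A consequence of the equal weighting is that no single weakness excludes a study by itself; only an accumulation of "less suitable" and "not suitable" ratings does.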

Summary of datasets included in RRB analysis-
Within each study selected for analysis, multiple datasets could be present, including analyses of different outcomes (e.g., both stroke and ischemic heart disease for "diseases of the circulatory system"), different populations (e.g., the full study population versus individual susceptible subpopulations), and different exposure metrics (e.g., both water and urinary concentrations). In general, all datasets within each of the 68 selected studies were modeled based on the most-adjusted statistical model reported. These criteria led to the inclusion of 255 datasets (Table 1).

Data pre-processing
Once the datasets were identified, the data were pre-processed and the following data attributes were assessed to allow the RRE 20 and RRB analyses to be performed (see Fig. 2).

2.3.1.
Estimating group-level mean exposures-All of the datasets that served as inputs to the modeling associated with this RRB analysis are group-summarized results. An important implication is that exposure metrics are provided at the group level, where the groups are often defined based on exposure bins; for example, an exposure group may be defined as "arsenic concentration less than 3 μg/L". However, for dose-response modeling, a point estimate representing the group's exposure or dose is needed. A few studies provided mean or median exposure or dose estimates for each of the exposure groups; for those studies, the provided means or medians were used directly in exposure-response modeling. Where only exposure ranges and numbers of subjects were provided, it was necessary to estimate mean exposures in each group. For the modeling performed in this RRB analysis, the proportions and exposure ranges were fit to lognormal distributions. Distribution fitting was performed by maximum likelihood methods, assuming no truncation of the uppermost exposure range. Group mean estimates were then derived by drawing large Monte Carlo samples (10 million iterations) from the fitted distributions, and sampling randomly within each exposure range for the appropriate numbers of "subjects." In some cases, referent group exposures were described as "0" with no indication of the exposure range or variance. Such datasets were excluded from consideration for RRB derivation due to lack of a valid referent group background exposure estimate to use for RRB derivation.
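The group-mean estimation above can be sketched as follows. The bin edges and subject counts are invented; a lognormal distribution is fit to the binned counts by maximum likelihood (with no truncation of the uppermost range), and group means are then estimated from Monte Carlo draws (a much smaller sample here than the 10 million iterations used in the analysis):

```python
import numpy as np
from scipy import stats, optimize

# Hedged sketch of group-mean estimation: fit a lognormal to interval-censored
# (binned) exposure counts by maximum likelihood, then estimate each bin's mean
# exposure from Monte Carlo draws. Bin edges and counts are invented.
edges = np.array([0.0, 3.0, 10.0, 50.0, np.inf])  # e.g., arsenic in water, ug/L
counts = np.array([40, 30, 20, 10])               # subjects per exposure bin

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                     # keep sigma positive
    cdf = stats.lognorm.cdf(edges, s=sigma, scale=np.exp(mu))
    probs = np.clip(np.diff(cdf), 1e-12, None)    # bin probabilities
    return -np.sum(counts * np.log(probs))        # multinomial log-likelihood

res = optimize.minimize(neg_log_lik, x0=[np.log(5.0), 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# Monte Carlo: sample the fitted lognormal, then average the draws within each bin
rng = np.random.default_rng(0)
draws = stats.lognorm.rvs(s=sigma_hat, scale=np.exp(mu_hat), size=100_000, random_state=rng)
group_means = [draws[(draws >= lo) & (draws < hi)].mean()
               for lo, hi in zip(edges[:-1], edges[1:])]
print([round(m, 2) for m in group_means])
```

Note that the highest, open-ended bin receives a finite mean from the fitted distribution, which is the point of the procedure: it replaces an unbounded range with a defensible point estimate.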

2.3.2.
Adjusting incidence values to account for covariates-Outcome data were generally supplied only for groups; that is, total numbers of cancers and numbers of subjects/cases and controls/referents. "Crude" and/or covariate-adjusted statistical measures of risk, such as relative risks or hazard ratios (cohort studies) or odds ratios (case-control studies), were also provided. Because the exposure-response models for both case-control and cohort studies are fitted specifically to the numbers of cases and non-cases (or cases and controls), it is necessary to adjust the numbers of expected cases to account for covariate adjustment, similar to the way adjusted ORs account for covariates compared with crude ratios. The approach used for this adjustment involves generating effective counts (Allen et al., 2020a). The effective counts for both cases and controls are then used in the model fitting (see example input data in Tables 5 and 6).
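One simplified way to picture the effective-count idea (an illustrative sketch, not the exact Allen et al. (2020a) procedure) is to rescale each exposed group's cases and controls so that its crude odds ratio against the referent group matches the reported adjusted OR, while the group total is preserved:

```python
# Hedged sketch of the effective-count idea for one exposure group of a
# case-control dataset: choose effective cases/controls so the group's crude
# OR against the referent group equals the author-reported adjusted OR, with
# the group's total subject count held fixed. All numbers are invented.

def effective_counts(cases_ref, controls_ref, group_total, adj_or):
    """Return (effective cases, effective controls) for one exposed group."""
    ref_odds = cases_ref / controls_ref     # odds of being a case in the referent group
    k = adj_or * ref_odds                   # target odds in the exposed group
    eff_cases = group_total * k / (1 + k)
    return eff_cases, group_total - eff_cases

cases, controls = effective_counts(cases_ref=20, controls_ref=80,
                                   group_total=100, adj_or=2.0)
print(round(cases, 1), round(controls, 1))  # 33.3 66.7
```

The recovered counts need not be integers; they are pseudo-data whose crude contrast reproduces the covariate-adjusted estimate, so the adjusted effect carries through into the likelihood-based model fitting.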

2.3.3.
Estimating U.S. population background exposure for RRB-US derivations-An estimate of U.S. background exposure is needed for two purposes in the RRB-US derivation: for the derivation of the RRE-US 20 (to estimate the baseline RR or OR used in the RRE-US 20 derivation; see Sections 2.4.2 and 2.4.3) and for the derivation of the RRB (RRE-US 20 / U.S. background exposure estimate = RRB-US). To estimate the "background" arsenic level for each exposure metric, a U.S. central tendency and a U.S. high exposure level were estimated, as shown in Table 3. When needed, we assumed a body weight of 70 kg and intake rates from the 2011 EPA Exposure Factors Handbook (U.S. EPA, 2011) to convert between metrics. The U.S. central tendency values were then used as the baseline.
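Metric conversions of this kind reduce to simple arithmetic. In the sketch below, the 70 kg body weight is from the text, but the drinking-water intake rate is a placeholder value rather than the actual Exposure Factors Handbook rate:

```python
# Illustrative conversion between exposure metrics: water concentration
# (ug/L) to intake dose (ug/kg-day). The 70 kg body weight is from the text;
# the intake rate here is a placeholder, not the EPA Exposure Factors
# Handbook value actually used in the analysis.

def water_conc_to_dose(conc_ug_per_L, intake_L_per_day=1.0, body_weight_kg=70.0):
    """Convert a drinking-water concentration to an intake dose."""
    return conc_ug_per_L * intake_L_per_day / body_weight_kg

print(round(water_conc_to_dose(10.0), 4))  # 10 ug/L -> ~0.1429 ug/kg-day
```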

Mapping specific outcomes to outcome domains-
The datasets across the health outcome categories covered many different levels of outcomes. To facilitate comparing RREs across studies within one outcome category, the different outcomes were divided into categories based on what was reported by the authors.
- Outcome domains: subcategories within the health outcome categories used to group similar outcomes.

- Outcome name in study: the name used for the specific outcome in the study.

Table 4 shows the different outcome types, outcome domains, and outcome names for each of the health outcome categories. The table includes both fatal and non-fatal cancer and non-cancer outcomes. All of the outcomes included in the RRB analysis had at least one dataset where an acceptable model fit could be found.
In some cases, the four outcome types above were not entirely applicable for the health outcome category. For example, pregnancy outcomes that resulted in death to the fetus/baby were characterized as "perinatal mortality" to distinguish them from cancer mortality. Skin lesions, while a preclinical marker for increased skin cancer risk, were given a separate type of "precancer lesions." The outcome types were used to group results, which are presented in the Supplemental Material for this article.
2.4. Exposure-response analysis for case-control and cohort studies

2.4.1.
Input data and pre-analysis-Example input datasets are shown in Tables 5 and 6, with data as reported in the epidemiologic study included in the first 4 columns: exposure/dose, number of cases and controls, and adjusted odds ratio for case-control studies, and exposure/dose, number of cases, and adjusted relative risk for cohort studies. As discussed in Section 2.3.1, mean or median exposure/dose metrics for each group were used as reported by the authors or, alternatively, estimated by fitting distributions to the proportions of subjects in each exposure range. Modeling methods for reported and estimated means were the same. Given that all datasets were modeled individually, no attempt was made to convert to a common exposure metric. Instead, as covered in Section 2.3.3, background estimates for the author-reported exposure or dose metric were used in the benchmark relative risk calculations. In this way, the RRE 20 values for any dataset using any exposure or dose metric are comparable in terms of potential potency, and thus conversion of all datasets to a common dose metric is not necessary. The cases and controls were adjusted using the effective-count method described by Allen et al. (2020a) in a companion article. This adjustment, broadly speaking, ensures the cases and controls appropriately account for any covariates.

2.4.2.
Exposure-response modeling-The RRB is a ratio obtained by dividing a study-specific estimate of the exposure associated with a defined health outcome risk by an estimate of the exposure (in the same study-specific units) typically experienced in the U.S. (U.S. background exposure). This section describes the exposure-response modeling methods used to obtain the numerator of the RRB ratio.
As described in Section 2.3.3, the denominator of the RRB ratio, either the estimate of U.S. background or the study background (reference group) exposure, is also used in the derivation of the numerator of the RRB, the RRE 20. Specifically, the modeled study data are used to estimate the arsenic exposure level (or dose) corresponding to a RR or OR of 1.2, where the baseline in the modeled data (i.e., a RR or OR of 1.0) corresponds to an estimate of arsenic background exposure (for either the U.S. or the study population) in terms of the studied exposure metric. This was done to create RRE 20, and therefore RRB, estimates that are comparable across studies with widely different referent group exposure levels.
For case-control studies, adjusted case and control numbers are treated as dichotomous data and fitted by the logistic model:

f(dose) = 1 / (1 + exp(-(α + β·dose)))  (1)

where dose is either exposure or dose, depending on the data set. The logistic form was used because, under a logistic regression model, results from case-control studies can be analyzed as if they had been collected prospectively rather than retrospectively. That is, the likelihood contributions from each exposure group in such a study are the same binomial-based likelihoods encountered in other study designs (Prentice and Pyke, 1979). Parameter values in the logistic model were estimated using standard likelihood maximization techniques, similar to the methods incorporated in EPA's Benchmark Dose Software (U.S. EPA, 2012). Chi-squared p-values are reported and used to judge how well the model fit the data. Akaike information criterion (AIC) values are also reported but not used for comparing models because only one model is used for case-control data sets.
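The likelihood maximization for the case-control model can be sketched as below, with invented effective counts; a binomial log-likelihood over exposure groups is maximized for the two logistic parameters:

```python
import numpy as np
from scipy import optimize

# Sketch of the case-control fitting step: a two-parameter logistic model
# fit to effective case/control counts by maximum likelihood (binomial
# likelihood per exposure group). All data values are invented.
dose = np.array([0.1, 1.0, 5.0, 20.0])
cases = np.array([20.0, 25.0, 30.0, 45.0])     # effective cases per group
controls = np.array([80.0, 75.0, 70.0, 55.0])  # effective controls per group

def logistic(d, a, b):
    return 1.0 / (1.0 + np.exp(-(a + b * d)))

def neg_log_lik(params):
    p = np.clip(logistic(dose, *params), 1e-12, 1 - 1e-12)
    return -np.sum(cases * np.log(p) + controls * np.log(1 - p))

res = optimize.minimize(neg_log_lik, x0=[-1.0, 0.05], method="Nelder-Mead")
a_hat, b_hat = res.x
print(round(a_hat, 3), round(b_hat, 3))
```

A positive fitted slope indicates increasing odds of disease with exposure, which is what the invented counts above were constructed to show.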
For cohort study data, the fundamental assumption made for modeling is that the counts of cases in each exposure group follow a Poisson distribution:

o_i ~ Poisson(e_i · f(d_i))  (2)

where o_i and e_i are the observed and expected case numbers in the ith exposure group, respectively, and f(d_i) is an exposure- or dose-response function describing the relationship between exposure or dose and relative risk (which is a continuous measure). Seven continuous dose-response models were used for f(·): the linear model, power model, 2nd-degree polynomial model, Michaelis-Menten model, and the Exponential 2, 3, and 4 models. Models were again fit by maximum likelihood estimation, and chi-squared p-values and AICs were reported.
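The cohort fitting step can be sketched similarly; here the candidate exposure-response function f(d) is taken to be the linear relative-risk model, 1 + b·d (one of the seven model forms), and the observed and expected counts are invented:

```python
import numpy as np
from scipy import optimize

# Sketch of the cohort fitting step: observed case counts are modeled as
# Poisson with mean e_i * f(d_i), using the linear relative-risk form
# f(d) = 1 + b*d as an example candidate model. All values are invented.
dose = np.array([0.1, 1.0, 5.0, 20.0])
observed = np.array([12.0, 15.0, 22.0, 40.0])    # observed cases o_i
expected = np.array([11.0, 13.0, 14.0, 15.0])    # expected cases e_i at baseline risk

def neg_log_lik(b):
    mean = expected * (1.0 + b * dose)
    # Poisson log-likelihood, dropping the constant log(o_i!) terms
    return -np.sum(observed * np.log(mean) - mean)

res = optimize.minimize_scalar(neg_log_lik, bounds=(0.0, 10.0), method="bounded")
print(round(res.x, 3))
```

In the full analysis the same data would be fit by all seven model forms, with chi-squared p-values and AICs recorded for the fit assessment and model selection described in Section 2.4.4.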

2.4.3.
Predicted odds ratios, relative risks, and relative risk exposure estimation-For case-control studies, the model also estimates adjusted odds ratios (ORs) associated with each exposure/dose level in the data set based on the logistic model, using the following equation:

OR(d) = [f(d) / (1 - f(d))] / [f(d_0) / (1 - f(d_0))]  (3)

where f(·) is the logistic function specified in Eq. (1), d refers to the individual exposure/dose levels, and f(d_0) is the percentage of cases predicted at the estimated background arsenic exposure level (see Section 2.3.3). As previously described, the RRE is the study-specific modeled exposure or dose estimate where the OR has changed by a certain benchmark relative risk (BMRR) percentage compared to the modeled OR at the estimated background level of arsenic exposure (RRE BMRR or, for our purposes, the RRE 20). That is, the RRE is the exposure level that satisfies

OR(RRE) = 1 + BMRR  (5)

where OR(·) is Eq. (3). The predicted odds ratios and the RRE for each individual study are estimated relative to either a study-specific estimate of the OR at a common arsenic exposure level across studies (the estimated U.S. background exposure levels as described in Section 2.3.3), in the case of the RRE-US 20 derivations, or the estimated referent exposure level in the individual study, in the case of the RRE-SP 20 derivations. For example, in the Fig. 3 plot, the RRE-US 20 and RRE-SP 20 estimates of 0.51 mg/L-yr and 0.68 mg/L-yr are the model estimates of the cumulative exposure associated with an OR increase of 20% (i.e., an OR of 1.2) over the OR estimated at the U.S. background level (0.075 mg/L-yr; see Table 3) and the study referent exposure level, respectively. By defining the estimated U.S. background arsenic exposure level as the reference point in each individual study (where the odds ratio is 1), in the case of the RRE-US 20 derivations, differences in levels of exposure in the referent group across studies are not expected to strongly influence the final RRE-US 20 estimates.
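For the logistic model, the RRE has a closed form: since the fitted odds are exp(α + β·d), the OR relative to the background exposure d_0 is exp(β·(d − d_0)), and setting this equal to 1 + BMRR gives RRE = d_0 + ln(1 + BMRR)/β. A sketch with the estimated U.S. background cumulative exposure from Table 3 and a made-up fitted slope:

```python
import math

# Closed-form RRE for the logistic model: OR(d) = exp(b*(d - d0)), so the
# exposure with a (1 + BMRR)-fold OR increase over background d0 is
# d0 + ln(1 + BMRR)/b. The slope b below is a hypothetical fitted value;
# 0.075 mg/L-yr is the estimated U.S. background cumulative exposure (Table 3).

def rre(d0: float, b: float, bmrr: float = 0.20) -> float:
    """Exposure where the OR equals (1 + bmrr) relative to the OR at d0."""
    return d0 + math.log(1.0 + bmrr) / b

print(round(rre(0.075, 0.36), 3))  # hypothetical slope -> 0.581 mg/L-yr
```

For the nonlinear cohort models there is generally no closed form, and the RRE is found numerically as the root of Eq. (5) or (6).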
A bootstrap method was used to derive confidence intervals for the predicted ORs and RRE estimates. One thousand sets of adjusted cases and controls were randomly generated based on binomial distributions of adjusted cases and controls about the maximum likelihood estimates (MLEs).
For cohort studies, the fitted exposure-response functions f(·) were used to calculate predicted relative risk at each exposure level evaluated in each data set. In addition, RREs were calculated as the exposure levels that satisfy Eq. (6):

f(RRE) / f(d_0) = 1 + BMRR  (6)

As for case-control data, RREs were calculated for BMRRs that represent increases in relative risk compared to study-specific estimates of risk at estimated background exposure levels (results for the selected BMRR of 20% are shown in the Supplemental Material).
As for the case-control studies, bootstrap simulation was used to derive confidence intervals for the predicted relative risks and RRE estimates. Because the convention is to assume the relative risk is log-normally distributed, the lower and upper confidence bounds on the adjusted relative risks reported by the authors were used to estimate geometric standard deviations of relative risk within each exposure group. One thousand sets of simulated adjusted relative risks were randomly generated from the resulting lognormal distributions. All exposure-response models considered were fit to these data sets to derive corresponding confidence intervals.
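The cohort bootstrap step can be sketched as below: the geometric standard deviation is recovered from a reported 95% CI (assuming a lognormal RR, the half-width of the log-scale CI divided by 1.96), and simulated RRs are drawn from the resulting lognormal. The RR and CI values are invented:

```python
import numpy as np

# Sketch of the bootstrap step for one exposure group of a cohort dataset:
# treat the adjusted RR as lognormal, recover its geometric SD from the
# reported 95% CI, and draw simulated RRs. The RR and CI are invented.
rng = np.random.default_rng(1)
rr, lo, hi = 1.5, 1.1, 2.05                # reported adjusted RR and 95% CI

# Geometric SD from the CI width: the log-scale 95% CI spans 2 * 1.96 * log(GSD)
gsd = np.exp((np.log(hi) - np.log(lo)) / (2 * 1.96))

sim_rr = np.exp(rng.normal(np.log(rr), np.log(gsd), size=1000))
print(round(float(np.median(sim_rr)), 2))
```

In the full procedure this resampling is done for every exposure group, and all candidate models are refit to each of the 1000 simulated datasets to obtain confidence intervals on the predicted RRs and RREs.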

2.4.4.
Model fit assessment and model selection-For each dataset, the exposure-response model generated estimates of log-likelihood, AIC value (averaged across the bootstrap iterations) and χ2 p-value, estimates of model parameters and 95% upper and lower confidence limits, and predicted risks (ORs for case-control datasets and RRs for cohort datasets) at each exposure level, also with confidence limits (data not shown). During post-processing (see Fig. 2), models with chi-squared p-values less than 0.05 were rejected from consideration. Also, to address model uncertainty, we excluded results from the Michaelis-Menten and/or Exponential 4 models, both of which can have steep, sometimes supralinear exposure-response curves at low exposure levels, when RRE 20 s for these models were a factor of 10 lower than those of the other models. If multiple models remained after application of these model exclusion criteria (as can be the case for our analysis of cohort studies), the model with the lowest AIC was selected. Finally, to further address model uncertainty imparted by over-extrapolation, datasets that resulted in RRE 20 estimates more than a factor of three below the lowest or above the highest study group exposure central estimates were excluded from consideration.

Selection of a benchmark relative risk percentage
For the purposes of this comparative analysis, a 20% BMRR was selected as the primary risk metric. The 20% BMRR was used to estimate the exposure associated with a 20% increase in relative risk (RRE 20 ). It is used herein exactly the same way as a BMR is used to derive a BMD (U.S. EPA, 2012). Note that the RRE 20 represents a maximum likelihood estimate (MLE), and is not a lower confidence limit like a BMDL.
The 20% effect level was chosen after preliminary examination of the effect sizes and exposure ranges of the input data sets evaluated in this case study. A key consideration was minimizing extrapolation far outside the range of the data. Thus, the focus was on deriving RRE values that were in or near the range of the input data. Compared with other tested BMRRs, an RRE 20 estimate was in or near the range of the input data for a large majority of the datasets and was therefore considered acceptable for the purposes of this comparative RRB analysis.

Results
The overall database ultimately used for these RRB test case analyses included 255 datasets, resulting in 180 RRE-US 20 and 192 RRE-SP 20 estimates from 64 studies, and is heterogeneous (e.g., many different countries of origin, study types, exposure metrics, outcomes). A full list of the studies included in this RRB test case analysis and the individual RRE 20 and RRB modeling results from all 255 datasets are provided as Supplemental Material to this manuscript.
Not every dataset could be fit by the model forms employed. For some datasets, the models under consideration did not converge to a satisfactory solution and no outputs (coefficients, predicted risks, RREs) were generated, so no model met the selection criteria set forth in Section 2.4.4. Also, one dataset (Bates et al., 1995, never smokers) was fit only by models that predicted a non-positive dose-response, which cannot be used for the derivation of RRE 20 estimates.
Modeling results for the 255 datasets evaluated for RRE 20 derivation purposes are detailed in Table 1 and illustrated in Fig. 2. They can be categorized as follows:

1. 235 RRE-US 20 and 235 RRE-SP 20 estimations remained after removing RRE 20 s that were below background or infinite.

2. 202 RRE-US 20 and 202 RRE-SP 20 estimations remained after applying the model fit criteria described in Section 2.4.4.

3. 180 RRE-US 20 and 192 RRE-SP 20 estimations remained after excluding models that resulted in RRE 20 estimates more than a factor of three below the lowest or above the highest study group exposure central estimates.
Figs. 3 and 4 illustrate dose-response curves for two datasets: Fig. 3 from a case-control study and Fig. 4 from a retrospective cohort study (Chen et al., 2010a). Both studies produced positive and finite estimates of both the RRE-US 20 and the RRE-SP 20 from models with acceptable fit, and the RRE 20 estimates were within a factor of three of the range of exposure data reported in each study. Examples of model uncertainty, such as non-positive dose-response (Fig. S-47) and uncertainty with the Michaelis-Menten and Exponential models (Fig. S-48), are provided in the Supplemental Material with additional discussion. The counts of datasets included in the RRB-US and RRB-SP analyses are provided in Table 1. Figs. 5 and 6 show individual and median RRB-US (Fig. 5) and RRB-SP (Fig. 6) results for health outcome-specific preclinical/subclinical, clinical nonfatal, and clinical fatal health effect categories, organized from left to right by greatest to least number of supporting datasets. Medians are denoted with a cross hatch and shown only where the health outcome had more than one dataset with acceptable models fitted. As shown in Table 1 and Fig. 2, datasets with acceptable model fit and RRE 20 estimates that did not lie far outside the range of data in the underlying study population (i.e., where the RRE 20 was not more than three times below the lowest exposure group's mean or three times above the highest exposure group's) were used in the RRB analysis. Table 7 presents the results shown in Figs. 5 and 6 in tabular form, including the ranges of RRB estimates for each health outcome.

Discussion
The results of the RRB test case analysis suggest that diseases of the circulatory system, bladder cancer, and lung cancer are likely to be important outcomes of concern for evaluating the potential health effects of inorganic arsenic exposure. These outcomes have the most studies and datasets that meet the criteria for RRB derivation. In particular, they account for 121 (54 diseases of the circulatory system, 40 bladder cancer, and 27 lung cancer) of the 180 datasets that met all criteria for RRB-US derivation (67%), and 131 (55 diseases of the circulatory system, 49 bladder cancer, and 27 lung cancer) of the 192 datasets that met all criteria for RRB-SP derivation (68%) (see Table 1 and Supplemental Material). Notably, these outcomes also have a high percentage of RRB-US values below 10 (i.e., RRE-US 20 values within 10-fold of estimated U.S. central tendency background exposure levels): 50%, 52.5%, and 22%, respectively (see Supplemental Tables S-30A, S-26A, and S-33A). Skin cancer and skin lesions have fewer studies and datasets that meet the criteria for RRB derivation. They account for only 27 (7 skin cancer and 20 skin lesion) of the 180 datasets that met all criteria for RRB-US derivation (15%), and 27 (8 skin cancer and 20 skin lesion) of the 192 datasets that met all criteria for RRB-SP derivation (14%). They also resulted in lower percentages of RRB-US values below 10: 14% and 15%, respectively (see Supplemental Tables S-44A and S-42A).
All of the remaining six health outcomes (diabetes, renal cancer, liver cancer, immune effects, pregnancy outcomes, and non-malignant respiratory disease) have fewer studies and datasets that meet the criteria for RRB derivation. The number of datasets that meet all study inclusion, model fit, and RRE 20 reporting criteria ranges from 0 for immune effects to 10 for renal cancer RRB-US derivations (see Table 1 and Supplemental Material Sections 1.2 and 1.4). However, the percentage of RRB-US estimates below 10 was high for several of these health outcomes, including 5 of 7 (71%) for diabetes, 3 of 6 (50%) for liver cancer, 2 of 4 (50%) for nonmalignant respiratory disease, and 5 of 10 (50%) for renal cancer (see Supplemental Tables S-28, S-32, S-36 and S-40). These RRB-US results suggest that, based on this preliminary analysis of a limited number of datasets, diabetes, renal cancer, liver cancer, and nonmalignant respiratory disease may represent potentially sensitive health outcomes that could warrant more complex, higher tier dose-response analyses.
As stated previously, RRB-SP estimates may not be as relevant to low exposure populations like the U.S. However, they provide study-specific estimates, of potential use for comparing approximations of chemical potency across health effects, that involve less extrapolation and, consequently, less model uncertainty and less variability in RRB estimation. They also tend to be associated with higher background levels, resulting in lower RRB estimates. As can be seen from Table 7, RRB-SP estimates are below 10 for all 15 health outcomes for which RRB values were derived (100%), whereas only 8 of 15 RRB-US values are below 10 (53%). Rare exceptions can occur when the study background estimate used in the RRB-SP derivation is lower than our estimate of U.S. background (Table 3). Such is the case for two RRB-SP pregnancy outcome values derived from a study of fetal loss and infant death in relation to arsenic drinking water exposure among Bangladeshi women (Rahman et al., 2007). For these RRB-SP derivations, the study reference group (background) water concentration was 1 μg/L, below our estimated U.S. background of 1.5 μg/L, and the reported RRs did not increase by 20% (i.e., did not reach 1.2), even in the highest exposure group (500 μg/L). These two RRB-SP estimates were much higher than the other three RRB-SP estimates for pregnancy outcomes, resulting in a high mean, but low RRB-SP value for this health category. In general, however, the RRB-SP values are consistent with the RRB-US results with respect to what they suggest about potential differences in arsenic's relative potencies across health outcomes, with clinical non-fatal lung and skin cancer studies resulting in the highest and clinical non-fatal diabetes studies resulting in the lowest RRB medians.
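The relationship between the two ratios can be made concrete with a short sketch. The ratio itself (RRE 20 divided by a background exposure) and the U.S. background of 1.5 μg/L come from the text; the RRE 20 and study reference-group values below are hypothetical placeholders.

```python
# Sketch of the RRB derivation: RRB = RRE20 / background exposure.
# The U.S. background of 1.5 ug/L is from the text; the RRE20 and
# study reference-group exposures below are hypothetical.

US_BACKGROUND = 1.5  # ug/L, estimated U.S. central tendency background

def rrb(rre20, background):
    """Unitless margin-of-exposure-like ratio."""
    return rre20 / background

rre20 = 30.0             # hypothetical RRE20 (ug/L) from a fitted model
study_background = 10.0  # hypothetical study reference-group exposure (ug/L)

rrb_us = rrb(rre20, US_BACKGROUND)     # 30 / 1.5 = 20.0
rrb_sp = rrb(rre20, study_background)  # 30 / 10  = 3.0
print(rrb_us, rrb_sp)
```

Because study reference-group exposures are usually higher than the U.S. background, RRB-SP is typically the smaller ratio; when the study background falls below the U.S. estimate (as in the Rahman et al. example, 1 μg/L vs. 1.5 μg/L), the ordering reverses.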
When applying this type of approach, investigators should consider whether dataset selection criteria are inordinately biasing results towards or away from the null. In our approach, for example, excluding RRE 20 results from adequately fitting models that are below background, infinite, or more than 3-fold above the highest observed exposure mean (indicating extremely flat or negative slopes) has the potential to bias away from the null, while excluding RRE 20 results from adequately fitting models that are more than 3-fold below the lowest observed exposure mean but above background (indicating extremely steep slopes) has the potential to bias towards the null. Of the 255 datasets considered in our arsenic case study, 36 RRE-US 20 (14%) and 21 RRE-SP 20 (8%) estimates were excluded because they reflected extremely flat or negative slopes, and 4 RRE-US 20 (2%) and 0 RRE-SP 20 (0%) estimates were excluded because they reflected extremely steep slopes. Thus, while more RRE 20 values indicating flat or negative slopes were excluded, a large majority of the RRE 20 derivations are consistent with a positive dose-response, and there is little evidence that our exclusion criteria introduced a strong bias away from the null.
It should be recognized that this type of comparative screening approach, RRB analysis, is just one tool that can be used to help prioritize and focus analyses of large databases. Other considerations during the full assessment development process may play a prominent role in guiding the focus of human health assessments, including whether datasets from the same study should be treated independently, whether studies examined potentially susceptible populations or life stages, whether studies provide adequate information to address confounders of response (e.g., smoking), whether the exposure data are sufficient for estimating lifetime daily intake, and the importance of the evaluated health outcome(s) for cost-benefit analyses.
In addition, the RRB approach described here is not applicable to health outcomes that are generally characterized by continuous response measures (e.g., IQ) rather than RR or OR estimates. Further, the RRE 20 is not meant to represent an exposure associated with a "clinically significant" change in a health outcome, or to have any policy-relevant interpretation beyond purposes such as those described for this RRB analysis, particularly the identification of studies and health outcomes that may warrant further consideration for use in dose-response analysis. Depending on the severity of the health outcome being modeled and the established background risk of disease in the (unexposed) general population (i.e., not estimated in the RRB analysis), a 20% change in relative risk could have very different public health implications. For instance, a 20% increase in the relative risk of a health outcome with a high background lifetime risk might be viewed as having more serious public health implications than a 20% increase for a health outcome with a low background lifetime risk. Here, the background lifetime risks for most of the health outcomes under consideration for arsenic are comparable and below 5%, with the notable exceptions of the high U.S. background lifetime risk for diabetes, estimated at 32.8% for males and 38.5% for females born in 2000 (Narayan et al., 2003), and for diseases of the circulatory system such as cardiovascular disease, estimated at 66% (2/3) and 50% (1/2) for males and females, respectively, at 40 or 70 years of age (Go et al., 2014).
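The dependence on background lifetime risk can be illustrated with simple arithmetic: a 20% relative increase implies very different absolute increments at different baselines. The baseline values below are the lifetime risks quoted in the text; the function name is ours.

```python
# Absolute lifetime-risk increment implied by a 20% increase in relative
# risk, for different background lifetime risks (values quoted in the text).

def absolute_increment(background_risk, rr_increase=0.20):
    """Added absolute lifetime risk from a relative increase over background."""
    return background_risk * rr_increase

# Diabetes, U.S. males born in 2000 (Narayan et al., 2003): 32.8% baseline
print(absolute_increment(0.328))  # about 6.6 percentage points

# A rarer outcome with a 5% background lifetime risk
print(absolute_increment(0.05))   # about 1 percentage point
```

The same 20% relative change thus adds roughly six times more absolute lifetime risk for diabetes than for an outcome with a 5% baseline, which is why background risk matters when interpreting RRE 20 values.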
With respect to diseases of the circulatory system, the smaller RRBs in Figs. 5 and 6 for the more severe (e.g., fatal) endpoints compared to the less severe (e.g., preclinical) endpoints might seem counter-intuitive at first. However, there are at least two possible reasons for this observation. First, pre-clinical measures are just one of many factors associated with the risk of dying from diseases of the circulatory system. Second, the pre-clinical studies are generally in young to middle-aged adults for whom death from diseases of the circulatory system is not imminent. While the risk of an individual pre-clinical factor in a young or middle-aged adult might be low, the accumulation and progression of multiple factors as that individual ages can result in a relatively high risk of fatality over a lifetime. Thus, what the study-specific data are showing more explicitly is that, in some circumstances, the inorganic arsenic exposure potentially associated with a 20% increase in the relative risk of a young or middle-aged adult developing one of many pre-clinical markers of diseases of the circulatory system (e.g., cIMT, an indicator of atherosclerosis) can be higher (higher RRB) than the inorganic arsenic exposure potentially associated with a 20% increase in the lifetime relative risk of death from diseases of the circulatory system, a health outcome associated with a lifetime progression of multiple pre-clinical factors.

Conclusions
The RRB analysis described in this study represents a systematic and pragmatic method for analyzing large databases of exposure- or dose-response data. In applying this type of approach, it is important to achieve a useful balance between study quality, model fit criteria, and the number of RRB derivations. The study quality and model fit criteria applied in this arsenic case study resulted in at least six RRB derivations for 73% (8/11) of the health outcomes considered. Depending on the size and quality of a chemical's database and the purpose of the investigation, it might be reasonable to apply moderate adjustments to selection criteria to achieve an adequate number of RRB derivations. Limitations of our approach include the inability to address health outcomes typically characterized by response metrics other than ORs and RRs (e.g., continuous response measures such as IQ) and the potential for model uncertainty associated with extrapolating RRE-US 20 values from studies with substantially higher reference group exposures. This inorganic arsenic example illustrates how an RRB analysis can be helpful for identifying studies and health outcomes that may warrant more detailed and sophisticated analyses. As illustrated in companion case studies, more sophisticated dose-response methods can include model averaging approaches (Mendez et al., 2020) or multistudy Bayesian meta-regression (Allen et al., 2020a, 2020b). Although this RRB analysis used inorganic arsenic data as a test case, it can potentially be implemented for any chemical of concern with a large database of epidemiologic studies in order to prioritize health outcomes and datasets, and guide structured and/or tiered dose-response analyses.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Fig. 4.
Example plots of RRE-SP 20 and RRE-US 20 derivations for a cohort study evaluating drinking water exposure and lung cancer by Chen et al. (2010a).

Fig. 5.
Arsenic Study RRB-US Estimates by Health Outcome and Endpoint Category. Individual and median RRB-US estimates for health outcomes with > 25 and ≤ 25 datasets supporting the derivation of RRE-US 20 estimates from studies for which the estimated RRE-US 20 was not more than a factor of three below the central estimate for the lowest dose group or above the central estimate for the highest dose group of the study.

Fig. 6.
Arsenic Study RRB-SP Estimates by Health Outcome and Endpoint Category. Individual and median RRB-SP estimates for health outcomes with > 25 and ≤ 25 datasets supporting the derivation of RRE-SP 20 estimates from studies for which the estimated RRE-SP 20 was not more than a factor of three below the central estimate for the lowest dose group or above the central estimate for the highest dose group of the study.

Rating criteria for exposure- or dose-dichotomous response datasets.

Criteria:
Health outcome
Representativeness of referent group/controls: Well documented reports that compare referent to exposed groups for key variables are preferred over reports that do not provide such documentation or that document major differences between referent and exposed groups.
Sufficient number of subjects, cases: A sufficient number of cases to conduct reliable statistical analyses (most applicable to cohort cancer studies) is preferred; desirable to have > ~5 cases per exposure group.

a. Studies that report "0" for control exposures were excluded from consideration for RRB-SP derivations due to lack of a valid referent group background exposure estimate to use for the denominator of the RRB-SP equation.
b. An exception is when the health outcome studied is associated with renal impairment that could substantially impact clearance rates, resulting in higher blood creatinine but lower urinary creatinine in cases relative to controls.

Table 4
Outcome types, domains, and specific outcome names considered in the RRB analysis.

Table 5
Example input data from case-control studies.