Refined reference doses and new procedures for phthalate mixture risk assessment focused on male developmental toxicity

New procedures for phthalate mixture risk assessments (MRAs) focused on male developmental toxicity (antiandrogenicity) are overdue. Previous efforts suffer from several shortcomings: There is a lack of consistency in terms of the phthalates entered into the assessments, and in the choice of tolerable intakes. Many of these values do not reflect new evidence about low dose male developmental effects. Nearly all previous mixture risk assessments have focused solely on phthalates, with no regard for exposures to other chemicals that also induce male developmental toxicity, leading to underestimations of risks. Here, we address these weaknesses and inconsistencies by proposing criteria for the selection of phthalates for MRA based on structure-activity relationships. We suggest new reference doses for phthalates for use in MRA, as follows: DBP 6.7 μg/kg/d, DIBP 100 μg/kg/d, BBP 10 μg/kg/d, DEHP 10 μg/kg/d, DINP 59 μg/kg/d. We conclude that the fixation on the Hazard Index (HI) = 1 as signalling acceptable combined phthalate exposures is misguided as it ignores co-exposure to other anti-androgenic chemicals that also contribute to male developmental risks. Until more comprehensive assessments of phthalates in combination with other anti-androgens become feasible, we propose the use of a HI of 0.1–0.2 as a benchmark for interpreting phthalate mixture risk assessments.


Introduction
Phthalates (for abbreviations, trivial names and side chain lengths see Table 1) are priority pollutants widely used as additives and plasticisers in a multitude of consumer articles such as plastic bags, polyvinyl flooring and personal care products. They also occur in pharmaceuticals, medical devices, cleaning materials and children's toys. Due to their high production volume and extensive use, multiple phthalates have entered the indoor environment and the food chain, with widespread human exposure.
Exposure to some of the phthalates is a concern as these substances are developmental and reproductive toxicants. Studies with laboratory rodents exposed during gestation have shown that phthalates with a certain side chain length induce agenesis of testis and epididymis, together with often complete agenesis of the gubernacular cords and poor semen quality (Gray et al., 2004). This effect spectrum ("phthalate syndrome", also referred to as "anti-androgenicity") derives from the ability of phthalates to suppress InsL3 peptide hormone production and testosterone synthesis in foetal Leydig cells. Without InsL3, the gubernacular cord cannot develop properly, leading to the disruption of testis descent. Phthalates also down-regulate genes involved in the transport of cholesterol, a precursor required for androgen synthesis. The resulting diminished testosterone levels in foetal Leydig cells alter their developmental trajectory such that they continue to proliferate but fail to differentiate properly. Consequently, Leydig cells in phthalate exposed foetal testes typically appear in large clusters. The suppressed testicular testosterone levels also result in reduced sperm numbers and have knock-on effects on dihydrotestosterone (DHT) concentrations. Reduced DHT action leads to shortened (feminised) anogenital distance (AGD) and retained nipples in male rats exposed to phthalates in foetal life (Schwartz et al. 2019). Feminised AGDs have also been associated with phthalate exposures in humans (Dorman et al., 2018) and there are associations with reduced InsL3 levels (Chang et al., 2017). As is common with many endocrine disruptors, phthalates affect multiple endpoints that exist as a constellation and constitute a syndrome.
Several experimental studies have shown that multiple phthalates act together to produce adverse reproductive and developmental effects (reviewed in Howdeshell et al. 2017; Kortenkamp, 2019). The combinations usually show stronger effects than any single phthalate in the mixture on its own. In the light of such observations, the US National Academy of Sciences has called for mixture risk assessments of phthalates and suggested the use of the Hazard Index (HI) approach (USNAS, 2008).
https://doi.org/10.1016/j.ijheh.2019.113428 Received 19 September 2019; Received in revised form 28 November 2019; Accepted 2 December 2019

Previous phthalate mixture risk assessments
The HI is a simple screening tool in mixture risk assessments (MRA). It is the sum of risk quotients (RQ), each being the ratio of the exposure to a mixture component and its so-called reference dose (RfD) for a relevant health endpoint. The utility of the HI approach for combined phthalate exposures was first shown by Benson (2009) and then expanded by Kortenkamp and Faust (2010) to include exposures to other substances capable of producing reproductive and developmental toxicity by endocrine modes of action. Since then, many phthalate MRAs have been published. They all focused on effects related to gestational exposure and the phthalate syndrome and have, to varying degrees, revealed concerns about combined phthalate exposures. However, there are several problems with these efforts: First, comparisons of evaluation outcomes across studies are complicated due to a lack of consistency in terms of the phthalates entered into the assessments (Table 2). Some studies considered only 2 or 3 phthalates, most included DBP, DIBP, BBP, and DEHP, while some left out DINP. Yet others entered DEP and DIDP. Recent investigations of structure-activity relationships (Li et al., 2019) might provide systematic criteria for the choice of phthalates to be considered in MRAs. Second, similar inconsistencies concern the choice of RfD for the single phthalates. Most studies based their assessment on the tolerable daily intakes (TDI) published by EFSA or the RfD AA for developmental toxicity through endocrine modes of action proposed by Kortenkamp and Faust (2010). Christensen and colleagues (Christensen et al. 2014) also considered the TDIs proposed by the Danish EPA (2009) and introduced a revised value for DEHP. As there is no EFSA TDI for DIBP, most authors filled this gap by adopting the EFSA TDI for the straight chain analogue DBP (EFSA, 2005a). However, since 2010, new data about DIBP have become available which could be used to arrive at more accurate assessments (Yost et al., 2019). Furthermore, the EFSA TDIs for DINP and DIDP that were used in phthalate MRAs are based on effects unrelated to reproductive and developmental toxicity, such as hepatic or renal effects. This compromises the comparability and consistency of the risk quotients that are summed up when using the HI approach. The RfD for DINP requires an update in the light of evidence of low dose effects that has emerged since 2010 and the same applies to DEHP. Thus, it is timely to review the RfDs for phthalates and to derive new and harmonised values.
Table 1 Phthalates and their common names.
Grey: Phthalates active in inducing the phthalate syndrome, a form of male developmental disorders.
A. Kortenkamp and H.M. Koch International Journal of Hygiene and Environmental Health 224 (2020) 113428
Third, as argued by Wilkinson and colleagues (Wilkinson et al., 2000), the outcome of MRAs using the HI approach may be biased through the use of RfDs that were derived by application of differing uncertainty factors (UF). As shown in Table 3, UFs between 100 and 1000 were commonly used to establish phthalate RfDs. The overall UFs are the composite of two different kinds of sub-factors: Those for the adjustment of differences in the data quality of experimental values (e.g. LOAEL to NOAEL extrapolations) and those necessary for the realisation of protection goals enshrined in various chemical legislations (e.g. animal to human extrapolation, increased protection against carcinogens). The mixing of these different UFs may then yield RfDs that are a poor reflection of the potency of the various phthalates. The resulting distorting effect may not be so much on the quantitative outcome of the HI MRA, but rather on the rank order of the single risk quotients (RQ). This may lead to misguided decisions in terms of the phthalates to be prioritised for risk reduction measures in risk management. In mixture toxicology, by contrast, data that strictly reflect the toxic potency of mixture components must be used for the evaluation of experimentally observed combined effects. Although the problem does not arise when using the HI with RfDs based on the same UFs, it may be more appropriate to use mixture risk assessment methods that implement strict separations of the subclasses of UFs. One such method is the Point of Departure Index (PODI), which sums up RQs derived from Points of Departure (NOAELs, benchmark doses).
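For orientation, the two indices discussed above can be written compactly. The symbols used here (E_i for the exposure to mixture component i, POD_i for its point of departure, UF_i for the overall uncertainty factor applied to it) are notation assumed for this sketch and are not taken from the cited papers.

```latex
\[
\mathrm{HI} = \sum_{i=1}^{n} \frac{E_i}{\mathrm{RfD}_i},
\qquad
\mathrm{RfD}_i = \frac{\mathrm{POD}_i}{\mathrm{UF}_i},
\qquad
\mathrm{PODI} = \sum_{i=1}^{n} \frac{E_i}{\mathrm{POD}_i}
\]
\[
\mathrm{UF}_i = \mathrm{UF} \ \text{for all } i
\;\Longrightarrow\;
\mathrm{HI} = \mathrm{UF} \cdot \mathrm{PODI}
\]
```

The second line expresses why the choice between HI and PODI is immaterial when all RfDs are built with the same UF: the common factor can be taken out of the sum.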
Last, except for Kortenkamp and Faust (2010), all studies have focused solely on phthalates, without taking account of other chemicals capable of disrupting male sexual development through hormonal modes of action. The unspoken assumption behind this stance is the belief that only chemicals with a common mechanism of action qualify for inclusion in MRA. Without much further consideration, the practical implementation of this commonality criterion (which is mandated in certain legislation for certain chemicals, such as for pesticides in the US Food Quality Protection Act) then reduces to simply grouping chemicals with similar structural features, such as phthalates. However, this largely ignores the recommendations of USNAS (2008) which called for including other endocrine disruptors in phthalate MRAs. There is ample evidence that chemicals capable of inducing male developmental toxicity through a multitude of mechanisms can act together with phthalates to produce adverse effects (Howdeshell et al., 2017; Conley et al., 2018; Kortenkamp, 2019). This includes chemicals as diverse as anti-androgenic pesticides (vinclozolin, prochloraz, procymidone, linuron), painkillers (paracetamol, aspirin, ibuprofen), other pharmaceuticals (finasteride, ketoconazole, the lipid-lowering drug simvastatin), polychlorinated dibenzo-dioxins and other dioxin-like pollutants and phenolics (bisphenol A, butylparaben) (Kortenkamp, 2019). Consideration of co-exposures to such substances is critical when it comes to interpreting exceedances of cumulative acceptable exposures to phthalates. In all phthalate MRAs published thus far, an HI of up to 1 was judged acceptable. However, this leaves no room for co-exposures to any of the above chemicals.
Thus, new criteria are required for the quantitative evaluation of the outcome of phthalate MRAs.
To put the derivation of new phthalate reference doses for MRA on a sound footing, we will first consider data requirements for MRA from the viewpoint of the scientific principles of experimental mixture effects assessments. This will clarify the data needs that should ideally be met for MRA and will provide valuable orientations for the procedures and approaches we propose.

Data requirements for mixture risk assessments and the simplifying assumptions of the HI and PODI approaches
In experimental assessments of mixture effects (Howdeshell et al., 2008a,b;Christiansen et al., 2009), the expected combined toxicity is calculated based on information about the toxicity of all mixture components and their prevalence in the mixture. With the aim of establishing common effect doses, the effects of all components are determined for the same measurement endpoint and effect magnitude, in the same test under identical experimental conditions, usually by dose-response analysis.
These data requirements cannot be met in MRAs conducted in regulatory practice. Instead, MRAs usually have to rely on data from different experimental studies and with differing measurement endpoints. To overcome this problem and to make MRA viable with readily available data, the HI (Teuschler and Hertzberg, 1995) and PODI (Wilkinson et al., 2000) were introduced as pragmatic simplifications of the dose addition concept. These simplifications concern the following aspects: Dose addition relies on RQs derived from doses that correspond to the same effect magnitude (effect doses such as ED10 etc.). However, the RfDs that are used as input values to obtain the RQs in the HI do not always correspond to a uniform effect dose. As already discussed, UFs of differing magnitudes may have a distorting influence. In addition, the NOAELs and LOAELs that are needed to derive a RfD are not toxicological metrics associated with the same effect magnitude. Instead, these values are single point estimates that do not incorporate dose-response data and are artefacts of experimental design (dose selection, dose spacing). They may correspond to varying effect magnitudes of up to 30% (Moore and Caux, 1997).
To blur matters further, the NOAELs and LOAELs used to derive RfDs for phthalates may refer to different toxicity endpoints determined in different test species. For example, because the EFSA TDIs for DIDP and DINP were based on hepatic toxicity (for DIDP in the dog, DINP in the rat), any HI based on these EFSA values together with those for other phthalates mixes estimates of developmental and reproductive toxicity with those of hepatic toxicity measured in different species. While this is an extreme case, the phthalate RfDs derived for male developmental toxicity (the phthalate syndrome) also do not always correspond to the same measurement endpoints. Some rely on suppression of testosterone synthesis, others e.g. on AGD changes. To a certain degree, this cannot be avoided and is due to the multiple endpoints affected by phthalates which materialise as a syndrome.
Thus, the HI introduces four sources of uncertainty that impact on the accuracy of assessments of mixture risks: it mixes different UFs, inexactly defined effect sizes, different toxicity endpoints and different species. While it will be impossible to reach the consistency of the data that form the input for evaluating experimental mixture studies, efforts should be made to choose input values that approximate these standards as much as possible. For phthalate MRAs, this could be achieved in the following ways: First, to mitigate the potential distorting influence of different UFs, Wilkinson et al. (2000) suggested that UFs needed to even out differences in data quality should be distinguished from those required to realise protection goals. For example, UFs of 3-10 are applied to extrapolate from LOAELs to NOAELs (ECHA, 2012). Such adjustments are necessary to obtain toxicological metrics that better reflect potency differences between phthalates. By allowing the summing up of RQs proportional to the potency of the chemicals in the mixture, they achieve better alignment with the scientific principles of dose addition. In contrast, UFs applied for realising protection goals go far beyond matters related to potency. They are used to deal with numerous other issues, including animal to human extrapolations and protection from severe and irreversible toxicity such as carcinogenicity for which very large UFs (1000) can be applied. In principle, the HI is ill-suited to achieve the necessary separation of potency-adjustment issues from policy-driven considerations, as it aggregates the different UFs at the level of the RQ for a single mixture component. In this respect, the PODI is better equipped to accomplish the required approximation to the principles of dose addition.
By summing up RQs based on points of departure, it employs toxicological metrics more closely related to potency differences. The realisation of protection goals is achieved by applying an UF to the sum of RQs, the PODI, i.e. at the level of aggregation of mixture effects, and not at the level of a single mixture component. In this way, more transparency can be realised by detaching UFs employed for purposes of data adjustment from those dealing with other extrapolation issues. If, however, the UFs for all components are of the same magnitude, these distinctions are irrelevant. Under such conditions both HI and PODI will yield similar results. As shown in Table 3, UFs ranging from 100 to 1000 have been applied in the case of phthalates, and it remains to be assessed whether this has a distorting effect on HI calculations.
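The relationship between the two indices can be illustrated numerically. The following is a minimal sketch, not a reproduction of any published assessment: the three components "A", "B" and "C", their exposures, PODs and UFs are all hypothetical values chosen for illustration, and the function names are our own.

```python
# Hypothetical three-component mixture; every number is illustrative only.
# Exposures and PODs are in mg/kg/d; UFs are unitless.
equal_ufs = {
    "A": {"exposure": 0.002, "pod": 1.0, "uf": 100},
    "B": {"exposure": 0.004, "pod": 5.0, "uf": 100},
    "C": {"exposure": 0.001, "pod": 2.0, "uf": 100},
}

# Same mixture, but component C now carries a larger, policy-driven UF.
mixed_ufs = {name: dict(values) for name, values in equal_ufs.items()}
mixed_ufs["C"]["uf"] = 1000


def hazard_index(components):
    """Sum of exposure / RfD, where RfD = POD / UF for each component."""
    return sum(c["exposure"] / (c["pod"] / c["uf"]) for c in components.values())


def podi(components):
    """Sum of exposure / POD; no UF is applied at the component level."""
    return sum(c["exposure"] / c["pod"] for c in components.values())


def rank_by_rfd_quotient(components):
    """Component names ordered by RfD-based risk quotient, largest first."""
    return sorted(
        components,
        key=lambda n: components[n]["exposure"] * components[n]["uf"] / components[n]["pod"],
        reverse=True,
    )


def rank_by_pod_quotient(components):
    """Component names ordered by POD-based risk quotient, largest first."""
    return sorted(
        components,
        key=lambda n: components[n]["exposure"] / components[n]["pod"],
        reverse=True,
    )


if __name__ == "__main__":
    # With a common UF, the HI is simply UF times the PODI.
    print(hazard_index(equal_ufs), 100 * podi(equal_ufs))
    # With differing UFs, the rank order of the risk quotients can change.
    print(rank_by_pod_quotient(mixed_ufs), rank_by_rfd_quotient(mixed_ufs))
```

In this constructed case the potency-based ranking is unaffected by the change in UF, whereas the RfD-based ranking moves component C to the top, which is the kind of rank-order distortion described above.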
Second, to deal with the distorting influence of inexactly defined effect sizes associated with NOAELs and LOAELs, it is desirable to rely on benchmark doses as much as possible, if information suitable for the estimation of benchmark doses is available. For the RfDs proposed by Kortenkamp and Faust (2010) this could only be realised for DBP, DIBP and BBP. We recognise that it may be difficult to achieve this goal, due to limitations in data quality, especially with quantal response data (e.g. number of animals affected by spermatocyte dysgenesis).
Third, the ideal solution to the problem of mixing different toxicity endpoints during the summing up of RQ would be to select an outcome common to all phthalates. The suppression of foetal testicular testosterone levels seen after gestational exposure to phthalates would be a suitable candidate. However, exclusive reliance on this endpoint may lead to unsatisfactory situations. Some phthalates produce effects associated with the phthalate syndrome at lower doses than those associated with suppressions of testosterone synthesis. An example is DEHP: The lower one-sided 95% confidence limit of the benchmark dose associated with testicular testosterone suppression is 31 mg/kg/d (USNAS, 2008), calculated from the data by Howdeshell et al. (2008a,b). However, the NOAEL for retained nipples is 3 mg/kg/d (Christiansen et al., 2010), ten-fold lower. Even at the NOAEL of 3 mg/kg/d there was mild dysgenesis of the genitalia in gestationally exposed male rats, which means that this dose must be designated a LOAEL and not a NOAEL (Christiansen et al., 2010). This example shows that it is necessary to strike a balance between achieving consistency in the interest of approximating the scientific principles of dose addition and being protective by choosing the most sensitive endpoint. When establishing RfDs for use in MRA, we suggest resolving this issue by selecting toxicity measurement endpoints related to elements of the phthalate syndrome. For each phthalate, the most sensitive phthalate syndrome endpoint should be used. This will exclude the use of RfDs based on toxicity endpoints related to general toxicity, such as hepatic or renal toxicity. The EFSA TDIs for DINP and DIDP are therefore unsuitable for phthalate MRA with a focus on male developmental toxicity. In general, it should be emphasised that RfDs derived for use in MRA are not suited for general, single compound-based risk assessments.
To make this distinction, we will designate them as RfD AA (AA = anti-androgenicity), as previously in Kortenkamp and Faust (2010).

Structure of this paper
In this paper, we review the TDIs for phthalates used in previous MRAs and establish new revised RfDs and corresponding Points of Departure (POD) for male developmental toxicity through anti-androgenic modes of action (the phthalate syndrome). We apply the revised RfDs to phthalate intake estimates used in previous MRAs and compare the outcome with earlier evaluations. This is followed by a comparison between the HI approach and the PODI. Finally, we develop criteria that may help to better judge the point when acceptable combined exposures to phthalates are exceeded by taking account of potential co-exposures to other developmental toxicants.
To identify studies that appeared after 2010 which describe effects related to the phthalate syndrome below the doses of studies used to establish the TDIs by EFSA (EFSA, 2005b) and the RfD AA by Kortenkamp and Faust (2010) (Table 3), we relied on the systematic reviews by Dorman et al. (2018) and Yost et al. (2019). In addition, we conducted literature searches (Web of Science and PubMed) using the search string "(DINP OR DiNP OR diisononylphthal*) AND development*". Equivalent searches were conducted for DBP, DIBP, BBP and DEHP. The records were manually screened for relevance and only studies describing developmental toxicity relevant to the phthalate syndrome at doses lower than those used to derive existing reference doses in the rat or the mouse were included.

Benchmark dose estimations and derivation of reference doses
We conducted benchmark dose modelling using a benchmark response of 5% for suppression of testosterone synthesis, a continuous response endpoint. For quantal endpoints, relevant to DBP and DEHP (reduced spermatocyte development and testicular dysgenesis, respectively) we applied a benchmark response of 10%. However, due to limitations in data quality of the relevant studies (Lee et al. 2004; Christiansen et al. 2010) this returned benchmark doses (lower bounds) of 0. For the calculation of the benchmark dose confidence intervals for DIBP, BBP and DINP we employed model averaging, as recommended in EFSA guidance (EFSA et al., 2017), and used the lower bound benchmark dose (BMDL) as point of departure (PROAST tool version 66.39 (EFSA et al., 2017)).
To derive Reference Doses for anti-androgenicity (RfD AA) the BMDL were combined with an UF = 100. The outcome of the BMDL estimations is reported in Supplementary Information.

Phthalates to be included in mixture risk assessments
Not all phthalates can induce the phthalate syndrome, and among those that show effects, there are marked potency differences. According to the extensive investigations of structure-activity relationships conducted by Li et al. (2019) and Furr et al. (2014), potency depends on the ester side chain length and the degree of branching of the side chain. Phthalates with too short (C1-C3) or too long (> C8) linear side chains are of very low potency, while side chain lengths of between C4 and C7 confer strong activity. With branched side chains, a length of between C4 and C9 is associated with male developmental toxicity. Based on these insights, phthalates with linear side chains of C1-C3 and > C8 can be excluded from the MRA. The same applies to branched side chains with C1-C3 and > C9. Thus, the phthalates to be subjected to an MRA should comprise (arranged according to alkyl chain length): DIBP, DBP, DiPP, DPP, DNHP, DCHP, BBP, DIHP, DHP, DEHP and DINP (Table 1). To accommodate future shifts in usage patterns that might arise from phthalate substitutions, this selection is based entirely on criteria of hazard assessment and does not consider usage or human exposure. This list might not be exhaustive and might be extended by additional phthalates with mixed alkyl chain lengths such as Di-C7-11-(linear and branched)-alkyl phthalate (DHNUP) which contains several relevant structural motifs.

Revised and updated PODs and RfDs AA for use in MRA
Previous phthalate MRAs have employed mainly two sets of RfDs, the TDIs derived by EFSA (2005a, b, c, d) and the RfD AA suggested by Kortenkamp and Faust (2010) (Table 3). Some studies also used USEPA RfDs, but as these values are based on toxicity endpoints not relevant to the phthalate syndrome they are not considered here. Christensen et al. (2014) are the only authors who also relied on the anti-androgenicity TDIs derived by the Danish EPA (2009). As shown in Table 3, these sets of values vary to a certain degree.
Our goal is to propose a new set of values, derived by following clear and consistent decision rules that accommodate the requirements for MRAs, as outlined in the Introduction. We suggest achieving consistency by relying as much as possible on a common endpoint relevant to the phthalate syndrome measured in the rat. For most phthalates this will be suppression of testosterone synthesis in the foetal testis, a key event for the manifestation of the phthalate syndrome. If there is evidence of other, more sensitive effects related to the syndrome that occur at lower doses, the corresponding PODs will be chosen as the basis for RfDs.
The question of adversity deserves consideration. As discussed, antiandrogenicity in terms of the phthalate syndrome materialises as a constellation of effects, including poor semen quality, incomplete development of spermatocytes, abnormal clusters of Leydig cells, multinucleated gonocytes, suppression of testosterone synthesis in foetal testes, changes in AGD, retained nipples, non-descending testes, agenesis of the epididymis and malformations of the external genitalia. Among toxicologists (but not endocrinologists) there have been debates about the adversity of these effects, taking each of these in isolation. For example, changes in AGD are considered as markers of androgen insufficiency, but regarded by some as not adverse in themselves. However, OECD guidance classes AGD changes as adverse and recommends their use for estimating NOAELs (OECD, 2008). Similarly, the relevance of retained nipples in male rat offspring for human risk evaluations is sometimes dismissed with the argument that this effect does not occur in humans, but OECD guidance (OECD, 2013) stipulates that retained nipples should be evaluated similarly to AGD changes. In view of the underlying modes of action it is imperative to consider the effect spectrum as a whole and to regard any of its manifestations as adverse.

Dibutyl phthalate (DBP)
Our literature searches revealed that the RfDs for DIBP, BBP, DEHP and DINP require updates, while the value of 10 μg/kg/d or 6.7 μg/kg/d for DBP proposed by EFSA (EFSA, 2005a) and the Danish EPA (2009), respectively, can be adopted without changes. Both are based on the study by Lee et al. (2004) which estimated a LOAEL of 1-3 mg/kg/d for reduced spermatocyte development in the rat. Our searches did not identify a study that documented effects at lower doses. The only difference between the EFSA and Danish EPA TDIs concerns the choice of the UF (EFSA: 200; Danish EPA: 300). While EFSA did not itemise the subfactors that made up their UF of 200, it presumably is composed of a LOAEL to NOAEL extrapolation UF of 2 and a UF of 100 for animal-to-human extrapolation. In line with current ECHA and ECETOC guidance (ECHA, 2012; ECETOC, 2010) which regards UFs between 3 and 10 as appropriate, the Danish EPA (2009) employed 3 for LOAEL to NOAEL conversion (assuming a LOAEL of 2 mg/kg/d). The NOAELs for testosterone synthesis suppression listed by Dorman et al. (2018) range between 20 and 112 mg/kg/d, higher than the LOAEL reported by Lee et al. (2004). The NOAELs for AGD changes and hypospadias are between 50 and 500 mg/kg/d. Thus, the Lee study is still the critical one, and accordingly, we follow the Danish EPA and propose a POD of 0.67 mg/kg/d for use in calculating a PODI. Combined with the standard UF of 100, this translates into a RfD AA of 6.7 μg/kg/d (Table 4). We attempted to derive a lower bound benchmark dose (BMDL) from the Lee et al. data, but this resulted in an estimate of 0, due to the large variability in the data for reduced spermatocyte development (all benchmark dose estimations can be found in Supplementary Material).

Diisobutyl phthalate (DIBP)
We based our re-evaluation of RfDs for DIBP on the systematic review by Yost et al. (2019) which highlighted the studies by Saillenfait et al. (2008), Howdeshell et al. (2008a,b) and Hannas et al. (2011) as highly reliable. These three studies reported some of the lowest PODs. Saillenfait and colleagues established a NOAEL of 125 mg/kg/d for shortened AGD and retained nipples. In this study they did not evaluate suppressions of testosterone synthesis, as this can only be done by sacrificing foetuses which precludes subsequent measurements of AGD and retained nipples. With testosterone suppression as the endpoint, Hannas et al. and Howdeshell et al. estimated a similar, but slightly lower, NOAEL of 100 mg/kg/d. Using the data by Hannas et al., we calculated a BMD of 50 mg/kg/d, with a BMDL of 10 mg/kg/d and propose this value as a POD for phthalate MRA using the PODI. By application of a UF = 100 for animal-to-human extrapolation, this converts to a RfD AA of 100 μg/kg/d (Table 4).

Benzyl butyl phthalate (BBP)
For BBP, both EFSA (EFSA, 2005c) and the Danish EPA (2009) chose the study by Tyl et al. (2004) as the basis for their TDIs. Tyl et al. estimated a NOAEL of 50 mg/kg/d based on shortened AGDs. However, in relation to suppression of testosterone synthesis, the more recent study by Furr et al. (2014) estimated 33 mg/kg/d as the NOAEL. In the interest of achieving consistency in terms of endpoints, we selected testosterone suppression as the basis for deriving an RfD AA. From the Furr et al. data we estimated a BMD of 4 mg/kg/d with a BMDL of 1 mg/kg/d and suggest this value as a POD for use in phthalate MRAs. With a UF = 100 this converts to 10 μg/kg/d as a RfD AA (Table 4).

Di(ethylhexyl) phthalate (DEHP)
In deriving their TDIs for DEHP, EFSA (EFSA, 2005b) and the Danish EPA (2009) relied on a technical report by Wolfe and Layton (2003) which estimated a NOAEL of 5 mg/kg/d for reduced testis weight and effects on gametes in the rat. Many of the other endpoints that make up the phthalate syndrome are responsive at higher doses: Furr et al. (2014) saw significant reductions of testosterone synthesis at 100 mg/kg/d, the lowest dose for this endpoint among the studies collated by Dorman et al. (2018). Selection of testosterone suppression as the endpoint would make the DEHP POD comparable with DIBP and BBP, but as this dose is considerably higher than the NOAEL from Wolfe and Layton (2003) there are concerns about achieving sufficient protection. Effect doses related to AGD changes suffer from similar disadvantages: Dorman et al. identified the lowest reported NOAEL for AGD changes as 10 mg/kg/d based on the study by Christiansen et al. (2010). In agreement with Christensen et al. (2014) we identify Christiansen et al. as the critical study that detected effects below 5 mg/kg/d. These authors estimated a NOAEL of 3 mg/kg/d for retained nipples. At this dose, however, mild dysgenesis of genitalia became apparent which designates this dose as a LOAEL. Christensen et al. employed a LOAEL to NOAEL extrapolation factor of 10 which, together with the standard UF = 100, produced a RfD of 3 μg/kg/d. However, in line with ECHA and ECETOC guidance (ECHA, 2012; ECETOC, 2010), we propose a UF = 3 for a LOAEL to NOAEL conversion to yield a POD of 1 mg/kg/d. With the standard UF = 100 this gives a RfD AA of 10 μg/kg/d (Table 4). We attempted to derive a BMDL from the Christiansen et al. data, but due to the relatively large variability in the responses, this produced a BMDL of 0.
There was evidence of a clear statistical dose-related trend in the formation of dysgenesis of genitalia: the Akaike Information Criterion (AIC) of the best fitting model was lower by more than 2 units than the AIC of the null model (Supplementary Information).

Diisononyl phthalate (DINP)
The TDI proposed by the Danish EPA (2009) for DINP is 1600 μg/kg/d, based on a NOAEL of 276 mg/kg/d for reduced testes weights in mice. Since 2010, important studies have appeared which documented effects at lower doses (Boberg et al. 2011). Critical is the study by Clewell et al. (2013) who estimated a NOAEL of 50 mg/kg/d for testosterone synthesis suppression and multi-nucleated gonocytes. Based on the Clewell et al. data we estimated a BMD of 80 mg/kg/d with a BMDL of 5.9 mg/kg/d which we propose as a POD. This value translates into 59 μg/kg/d as the RfD AA by application of UF = 100 (Table 4). Effect doses based on other phthalate syndrome endpoints are higher than 50 mg/kg/d and therefore unsuitable as a basis for deriving a RfD AA (Dorman et al., 2018).
The remaining phthalates highlighted in the structure-activity analysis by Li et al. (2019) as active (e.g. DPP, DNHP, DHP, DIHP and DCHP) and therefore relevant for inclusion in MRA are not considered further here. These phthalates are commercially less relevant, are rarely measured in biomonitoring studies, and data suitable for derivations of POD and RfD are scarce. However, there are indications that at least some of these phthalates (such as DNHP) are more potent than DBP and DEHP in producing the phthalate syndrome (Furr et al., 2014).
A. Kortenkamp and H.M. Koch, International Journal of Hygiene and Environmental Health 224 (2020) 113428

Application of revised RfD AA to phthalate MRA
Our analysis revealed that, apart from DBP, all other phthalates considered here (DIBP, BBP, DEHP and DINP) required revisions of their RfD AA. Our new proposed RfD AA differ in some respects from previously used values. They are generally lower than the RfD AA developed by Kortenkamp and Faust (2010) and more similar to the TDIs from EFSA and the Danish EPA. However, the impact of the new RfD AA on the outcome of MRAs compared with previously derived HI estimates is not immediately obvious. We therefore chose some of the phthalate MRAs reported earlier as case studies to compare the HI calculated using the new RfD AA with the outcomes using EFSA TDIs and the Kortenkamp and Faust values (Table 5). Our selection and the associated comparisons of HI are arbitrary and are not intended to be conclusive.

Table 5. The application of revised RfD AA to selected previous mixture risk assessments, based on the 95th percentiles of estimated daily intakes (DI 95P).
Relative to the EFSA TDI, the new proposed RfD AA generally yielded higher HI estimates, even though the revised value for DIBP increased from 10 μg/kg/d to 100 μg/kg/d. The exception is Hartmann et al. (2015) where the HI decreased slightly from 0.49 to 0.44. Compared to the HI based on the RfD AA by Kortenkamp and Faust the new HI values are considerably higher.
In almost all MRAs that substituted the missing DIBP TDI with that for DBP from EFSA, DIBP was a main contributor to the HI. With the revised RfD AA, this is no longer the case.
Due to its low nominal value among the Kortenkamp and Faust RfD AA, DEHP was often the sole major contributor to the HI. With the revised RfD AA, DEHP and DBP now drive the HI, with the share of BBP and DINP to the HI correspondingly increased.

The PODI as an alternative to the HI: does it remove distortions?
To evaluate concerns about the possible distorting effects on the HI due to the use of differing UFs, we built RQs based on PODs, calculated the corresponding PODI and compared the resulting values with those obtained by using the HI (Table 6). As with the procedure used for deriving the revised RfD AA, we applied UFs to the EFSA values to produce corrected PODs better aligned with potency differences. Among the EFSA TDI, this concerned the LOAEL of 2 mg/kg/d for DBP which was converted to 1 mg/kg/d by using the factor of 2. Because the UFs applied for extrapolating from POD to TDI were consistently 100, this produced a PODI 100-times lower than the corresponding HI.
In contrast, the PODIs generated by using the Kortenkamp and Faust PODs were not equivalent to the HI, mainly due to the difference between the UFs of 200 used for DBP, DIBP and BBP and that for DEHP. The UF for DINP was the largest used, but due to the small value of the corresponding RQ, the impact on the PODI was small. In general, however, the distortive impact of the differing UFs was small.
As with the EFSA PODI, the PODIs obtained by using our revised POD were equivalent to the HI, because uniform UF were used for the conversion from POD to RfD AA used in connection with the HI method.
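The relationship between the two indices can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the PODs are back-calculated as the revised RfD AA multiplied by 100, the daily intakes are hypothetical numbers chosen only for illustration, and the set of differing UFs is invented to show the effect of non-uniform factors. The function names are likewise hypothetical.

```python
def hazard_index(intakes, pods, ufs):
    """Sum of risk quotients: each intake divided by its reference dose (POD / UF)."""
    return sum(intakes[c] / (pods[c] / ufs[c]) for c in intakes)

def pod_index(intakes, pods):
    """Point of departure index: sum of intake / POD, with no UF applied."""
    return sum(intakes[c] / pods[c] for c in intakes)

# PODs in ug/kg/d, assumed here to equal the revised RfD AA x 100
pods = {"DBP": 670.0, "DIBP": 10000.0, "BBP": 1000.0, "DEHP": 1000.0, "DINP": 5900.0}
# Hypothetical daily intakes in ug/kg/d, for illustration only
intakes = {"DBP": 1.0, "DIBP": 2.0, "BBP": 0.5, "DEHP": 3.0, "DINP": 4.0}

uniform_ufs = {c: 100.0 for c in pods}
# Hypothetical differing UFs: 200 for three phthalates, 100 for the rest
mixed_ufs = dict(uniform_ufs, DBP=200.0, DIBP=200.0, BBP=200.0)

podi_value = pod_index(intakes, pods)
ratio_uniform = hazard_index(intakes, pods, uniform_ufs) / podi_value
ratio_mixed = hazard_index(intakes, pods, mixed_ufs) / podi_value
print(ratio_uniform, ratio_mixed)
```

With a uniform UF of 100 the HI is 100 times the PODI regardless of the intakes. With differing UFs the ratio depends on which phthalates dominate the exposure, which is the distortion the PODI is meant to avoid.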

Discussion
In deriving revised phthalate RfD AAs appropriate for MRAs (Table 4), we struck a balance between the need for consistency in terms of common endpoints and the requirement of realising a reasonable degree of protection. We achieved this by relying as much as possible on data related to suppression of foetal testicular testosterone synthesis, an effect common to many phthalates. Our own literature searches and the outcomes of recent systematic reviews (Dorman et al., 2018; Yost et al., 2019) showed that for DIBP, BBP and DINP testosterone suppression is the critical toxicity (in relation to the phthalate syndrome). Other effects that make up the phthalate syndrome occurred at higher doses. However, in the case of DBP and DEHP, reliance on testosterone suppression would have been insufficiently protective, as there is evidence that reduced spermatocyte development (DBP) and mild dysgenesis of genitalia (DEHP), all components of the phthalate syndrome, occur at lower doses.
In the case of DBP and DEHP it was not possible to completely avoid the imbalances that arise from the inaccurately defined effect magnitudes associated with using NOAELs or LOAELs as toxicological metrics. This would have required high quality dose-response analyses for all the endpoints in question, but the data to support such efforts are not available. Until better data become accessible, this flaw and the uncertainties this introduces to phthalate MRA have to be accepted.
With DIBP, BBP and DINP we achieved the required consistency by estimating BMDLs for suppression of testosterone synthesis. We based these estimates on a benchmark response (BMR) of 5% as recommended as the default in EFSA guidance (EFSA et al., 2017). This recommendation is in line with analyses of 246 developmental and reproductive toxicity studies (Allen et al. 1994) and of experiments from the US National Toxicology Programme (Bokkers and Slob, 2007) which established that the BMDL for a BMR of 5% was on average close to the respective NOAELs. Our BMDLs were 10-fold (DIBP, DINP) or 30-fold (BBP) lower than the corresponding NOAELs, which fell in the window defined by the BMDL and BMDU. Fournier et al. (2016) have presented a BMD for BBP based on 50% testosterone suppression. We did not follow this approach, for two reasons: First, it would have violated the principle of utilising reference doses derived for similar effect magnitudes in mixture risk assessments. Secondly, it is at present unclear whether any reduction in foetal testosterone levels, however small, triggers adverse effects, or whether such effects materialise only when testosterone suppression falls below a certain critical value. To our knowledge, there are no data that support 50% testosterone suppression as being that critical value. Furthermore, the BMD philosophy is to safeguard against the occurrence of any adversity, and this is achieved with a BMR of 5%.
There is a relatively high degree of uncertainty in our BMDL estimates for testosterone synthesis suppression based on the three key studies we utilised (Clewell et al. 2013; Furr et al. 2014; Hannas et al. 2011). This is reflected by BMD:BMDL ratios of 5, 4 and 13 for DIBP, BBP and DINP, respectively (Supplementary Material). In principle, this uncertainty may be due to model uncertainty, data variance, or both (Haber et al. 2018). However, since we used model averaging in deriving our BMDL, the influence of any particular dose response model on the BMD:BMDL ratio is lessened considerably. We therefore believe that data variance is the main reason for the uncertainty of our estimates. This uncertainty could have been reduced somewhat had we been able to account for litter effects. However, this was not possible because the data in Clewell et al. (2013), Furr et al. (2014), and Hannas et al. (2011) were reported as overall means.
All in all, our proposed revised RfD AA are lower than the values derived by EFSA (2005a, b, c, d) and those from the Danish EPA (2009) (Table 3). The RfD AA in Kortenkamp and Faust (2010) could not take account of new evidence of low dose phthalate effects that appeared after 2010 and are therefore insufficiently protective. They should not be used in future phthalate MRAs.
For some known reprotoxic phthalates we did not derive RfD AAs because these phthalates have only recently been detected in exposure assessments (Bertoncello et al., 2018). In addition, the available toxicity data are limited or rather old. However, phthalates with alkyl chain lengths of C5-C7, between DBP and DEHP, are considered more potent than DBP or DEHP themselves. Preliminary RfD AAs for such phthalates should therefore be lower than 6.7 μg/kg/d or 10 μg/kg/d, respectively, in the range between 1 and 5 μg/kg/d. These preliminary values could be used for the assessment of exposure scenarios that include these phthalates.

HI or PODI?
Our comparisons of the outcome of the familiar HI-based MRAs with those produced by using the PODI did not reveal striking differences. This is due to the separation of UFs for adjusting data quality (LOAEL-NOAEL extrapolations) from those for realising protection goals. After adjustment for data quality, a UF = 100 was consistently applied to the resulting POD, which produces results equivalent to the HI, with no distortions. Thus, provided there is a clear separation of these two classes of UFs, either HI or PODI can be used, as both methods produce equivalent assessment outcomes. With the HI, a value larger than 1 signals that the combined exposures to the selected phthalates exceed exposures judged to be acceptable. The PODI is the inverse of the MoE between human exposures and doses associated with borderline effects in experimental animals. A PODI of 0.01 or smaller indicates MoEs of 100 or larger. If a clear separation of the two types of UF cannot be made, we follow the arguments advanced by Wilkinson et al. (2000) in favour of the PODI.

A call for retiring the fixation on HI = 1
As already pointed out, most previous phthalate MRAs have considered an HI below 1 as signalling acceptable combined exposures. This tacitly assumes that the "risk cup" for male developmental disorders related to disruption of androgen action is solely composed of phthalates. However, this stance disregards all the evidence from the experimental literature of combined effects between phthalates and a wide variety of other substances also capable of interfering with hormone action (Howdeshell et al. 2017; Kortenkamp, 2019). These studies show that the developing male reproductive system can be disrupted from multiple different points of entry. These include not only suppression of InsL3 and testosterone synthesis as with phthalates, but also androgen receptor antagonism (certain dicarboximide pesticides, parabens, bisphenol A), direct inhibition of steroidogenic and steroid-converting enzymes (certain imidazole and phenylurea pesticides, and the drug finasteride), interference with the transport of androgen precursors (lipid-lowering drugs), disruption of prostaglandin signalling by inhibition of Cox enzymes (certain analgesics and a wide variety of phenolic substances) and poorly defined pathways triggered by polychlorinated dioxins and biphenyls. Thus, common adverse outcomes overlapping with the phthalate syndrome can be induced through multiple interacting and converging pathways that involve numerous chemicals with diverse structural features. Human exposure to many of these substances is as widespread and common as to phthalates.
This evidence calls for retiring the fixation on HI = 1 as the benchmark for evaluating the outcome of phthalate MRAs. HI = 1 leaves no room for co-exposures to other substances that also disrupt the normal development of the male reproductive system. We suggest two, not mutually exclusive, options for addressing this challenge: First, future MRAs should be expanded to consider not only phthalates, but also other substances capable of inducing male developmental toxicity through hormonal mechanisms. If the selection of relevant chemicals is comprehensive, then HI = 1 (equivalent to PODI = 0.01) can remain unchanged as the evaluation criterion. However, this option will be difficult to implement immediately. There are considerable challenges in collating the required exposure information for all substances of concern. Similar problems are to be expected when it comes to compiling toxicity data of suitable quality.
Alternatively, phthalates can be grouped together as before and subjected to MRA but evaluated against a lowered HI or PODI. The advantage of this approach is in its pragmatism: As pursuit of the first option is likely to be a complicated and time-consuming effort, phthalate MRAs can continue until more comprehensive assessments are available. However, this raises the question as to the degree to which the HI (or PODI) should be lowered.

What is an acceptable HI or PODI for phthalate MRAs?
The answer to this question will depend on the contribution of phthalates to the HI (or PODI) relative to the entire "risk cup" of male developmental toxicants beyond phthalates. This requires that the number of chemicals contributing to an "anti-androgen" exposure scenario is known, together with the relative impact of each corresponding RQ to the HI or PODI. For other anti-androgens acting together with phthalates these figures are currently unknown. Thus, the choice of an acceptable HI below 1 is at present arbitrary. Nevertheless, informed guesses can be made. Considering that in addition to the group of phthalates as a whole at least 10 further substances are likely to contribute to the combined exposures (Kortenkamp, 2019), it is not unrealistic to assume that phthalates alone make up 10% of the risk cup. To accommodate the impact of these other substances, phthalates would have to be evaluated using a HI of 0.1. However, this assumes that the contribution of anti-androgenic non-phthalates is evenly distributed. If however some of these chemicals make a disproportionately large contribution to the HI, while others add little, the phthalate share of the risk cup could increase to perhaps 20%. Accordingly, the HI of phthalates alone would have to be evaluated against a HI of 0.2.
We emphasise that these considerations are speculative, but not implausible. We therefore recommend 0.1-0.2 as a point of orientation to guide the interpretation of phthalate MRAs until more information about exposures to other anti-androgens becomes available to allow upwards revisions of these values.
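The benchmark reasoning can be written out as a small Python sketch. It is illustrative only: the function names are hypothetical, the phthalate shares of 10% and 20% are the informed guesses discussed above, and the conversion to a PODI benchmark assumes the uniform UF of 100.

```python
def phthalate_benchmark(phthalate_share, full_cup_hi=1.0, uf=100.0):
    """Return the (HI, PODI) benchmark for phthalates given their assumed share of the risk cup."""
    hi_benchmark = phthalate_share * full_cup_hi
    return hi_benchmark, hi_benchmark / uf

def exceeds_benchmark(hi, phthalate_share):
    """True if a phthalate-only HI is above the benchmark for the assumed share."""
    return hi > phthalate_benchmark(phthalate_share)[0]

low = phthalate_benchmark(0.10)   # phthalates assumed to fill 10% of the risk cup
high = phthalate_benchmark(0.20)  # phthalates assumed to fill 20% of the risk cup

# HI of 0.44, as recalculated with the revised RfD AA for Hartmann et al. (2015)
flagged = exceeds_benchmark(0.44, 0.20)
print(low, high, flagged)
```

Under these assumptions, an HI of 0.44 would be judged acceptable against HI = 1 but exceeds both the 0.1 and the 0.2 benchmark.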

A framework and workflow for MRAs beyond phthalates
To realise MRAs beyond phthalates that also include other chemicals capable of producing male developmental toxicity via hormonal modes of action it is first necessary to gain a sense of which chemicals to include. Based on reviews of experimental mixture studies (Howdeshell et al. 2017; Kortenkamp, 2019) we suggest that a minimum set of chemicals to be assessed together with phthalates should comprise pesticides such as vinclozolin, prochloraz, procymidone and linuron; pain killers including paracetamol, aspirin and ibuprofen; pharmaceuticals such as finasteride, ketoconazole and the lipid-lowering drug simvastatin; polychlorinated dibenzodioxins and other dioxin-like pollutants; polybrominated diphenyl ethers; perfluorooctanoic acid and perfluorooctane sulfonate; and phenolics such as bisphenol A and butylparaben.
It is suggested to begin an MRA with a scoping analysis by application of the HI. The aim of this step would be to assess whether the sum of RQ exceeds HI = 1. At this stage of the analysis, TDI, ADI or RfD for the critical toxicity of each of the chemicals, regardless of the availability of RfD for male developmental toxicity may be used. A second step will be to establish whether simultaneous exposure to all the chemicals in the "risk cup" is likely, and, if appropriate, to focus only on those relevant to simultaneous exposures.
If this analysis reveals unacceptable combined exposures (HI > 1), the assessment should be refined by focusing strictly on effect doses and RfD related to male developmental toxicity by hormonal mechanisms. To avoid possible distortions by using differing UFs in the HI method, application of the PODI suggests itself.
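The tiered workflow outlined above could be sketched as follows. This is a hypothetical outline rather than a validated procedure: the function name, the stage labels and all numerical inputs are assumptions made for illustration, and the PODs are back-calculated as the revised RfD AA multiplied by 100.

```python
def tiered_mra(scoping_rqs, co_exposed, intakes, pods_male_dev):
    """Return (stage reached, index value) for a three-step mixture assessment sketch."""
    # Step 1: scoping HI from RQs based on any available TDI, ADI or RfD
    hi_all = sum(scoping_rqs.values())
    if hi_all <= 1.0:
        return "scoping", hi_all
    # Step 2: restrict to chemicals for which simultaneous exposure is likely
    hi_co = sum(rq for chem, rq in scoping_rqs.items() if chem in co_exposed)
    if hi_co <= 1.0:
        return "co-exposure", hi_co
    # Step 3: refine with PODs specific to male developmental toxicity, using the PODI
    podi = sum(intakes[c] / pods_male_dev[c] for c in sorted(co_exposed))
    return "refined", podi

# Hypothetical inputs; "chemical X" stands for any non-phthalate in the risk cup
scoping_rqs = {"DEHP": 0.6, "DBP": 0.5, "chemical X": 0.3}
co_exposed = {"DEHP", "DBP"}
intakes = {"DEHP": 6.0, "DBP": 3.35}          # ug/kg/d, illustrative
pods_male_dev = {"DEHP": 1000.0, "DBP": 670.0}  # ug/kg/d, assumed RfD AA x 100

stage, value = tiered_mra(scoping_rqs, co_exposed, intakes, pods_male_dev)
print(stage, value)
```

In this example the scoping HI exceeds 1 both before and after restricting to co-exposed chemicals, so the assessment proceeds to the refined step and reports a PODI, which would then be compared with the chosen evaluation criterion.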

Conclusions
A set of phthalate RfDs for use in MRA is required that incorporates new information about low dose effects and is commensurate with the scientific principles of mixture effect assessments under dose addition. We have proposed new RfDs that match these requirements. We conclude that the fixation on HI = 1 that has dominated previous phthalate MRAs should be retired. The idea that HI = 1 for the assessment group of phthalates signals minimal risks and no need for regulatory action is misguided. It ignores that there is co-exposure to other anti-androgenic chemicals that also contribute to risks. Until more comprehensive assessments of phthalates together with other anti-androgens become available, we propose the use of a HI of 0.1-0.2 (PODI = 0.001-0.002) as a provisional pragmatic solution.

Declaration of competing interest
The authors declare there are no conflicts of interest.