A quantitative structure-biodegradation relationship (QSBR) approach to predict biodegradation rates of aromatic chemicals.

The objective of this work was to develop a QSBR model for the prioritization of organic pollutants based on biodegradation rates from a database containing globally harmonized biodegradation tests using relevant molecular descriptors. To do this, we first categorized the chemicals into three groups (Group 1: simple aromatic chemicals with a single ring, Group 2: aromatic chemicals with multiple rings and Group 3: Group 1 plus Group 2) based on molecular descriptors, estimated the first order biodegradation rate of the chemicals using rating values derived from the BIOWIN3 model, and finally developed, validated and defined the applicability domain of models for each group using a multiple linear regression approach. All the developed QSBR models complied with OECD principles for QSAR validation. The biodegradation rates in the models for two of the groups (Group 2 and Group 3 chemicals) are associated with abstract molecular descriptors that provide little practical information for understanding the relationship between chemical structure and biodegradation rates. However, the molecular descriptors associated with the QSBR model for Group 1 chemicals (R2 = 0.89, Q2loo = 0.87) provided information on properties that can readily be scrutinised and interpreted in relation to biodegradation processes. In combination, these results lead to the conclusion that QSBRs can be an alternative tool to estimate the persistence of chemicals, and that some QSBRs can provide further insights into the factors affecting biodegradation.


Introduction
Microbial degradation is one of the important processes that determine the fate of anthropogenic chemicals in the environment. It can transform potentially hazardous chemicals into less or more harmful products and can, in some cases, ultimately lead to their complete mineralization into carbon dioxide, water and nutrients (Marchlewicz et al., 2017). Biodegradation tests have thus become an indispensable step in the regulation of chemicals, informing their classification and labelling, environmental risk assessment, and hazard assessment (Pavan and Worth, 2008; Rücker and Kümmerer, 2012). However, current laboratory testing methodologies used for the evaluation of biodegradation/persistence are expensive, time consuming and poorly reproducible (Goodhead et al., 2014; Kowalczyk et al., 2015). To date, only 4214 of the estimated 145,297 chemicals (pre-registered unique substances under EU REACH legislation; ECHA, 2017, accessed 23/06/2017) have been reliably screened for their biodegradability (OECD, 2017). Performing such tests for the remaining existing and new chemicals is a laborious, costly and perhaps unachievable task. Furthermore, factors affecting biodegradability are poorly understood and seldom studied (Rücker and Kümmerer, 2012; Kowalczyk et al., 2015; Wang et al., 2018). Therefore, an ability to reliably predict biodegradation rates would help to accelerate and improve hazard and environmental risk assessment of chemicals, while reducing time, monetary cost and potentially unnecessary animal testing requirements (Pavan and Worth, 2008; Rücker and Kümmerer, 2012; Martin et al., 2017a).
Quantitative Structure Activity Relationships (QSARs) are a widely used modelling technique in diverse fields where the physicochemical properties of chemicals are correlated to the strength of a given response or activity (Okey and Stensel, 1996;Roy et al., 2011;Lee and von Gunten, 2012;Mikolajczyk et al., 2015;Cvetnic et al., 2017). In recent years, the application of QSARs to biodegradation (Quantitative Structure Biodegradability Relationship, QSBR) has been advocated (Pavan and Worth, 2008;Rücker and Kümmerer, 2012), providing valuable prediction of relative or absolute biodegradation rates, and/or transformation products of biodegradable chemicals, strictly on the basis of chemical structures without having to undertake laboratory testing. Such models could provide a powerful tool to forecast the environmental fate of chemicals and assist in ranking them for in-depth evaluation. However, such techniques are either rarely used in practice or rarely receive regulatory acceptance (QSBRs currently provide reliable regulatory data for 20 of 15,873 registered chemicals in the ECHA database) [(ECHA, 2017); Accessed Date: 23/06/2017], largely due to a lack of high-quality experimental data (Rücker and Kümmerer, 2012).
Biodegradation models are broadly classified into qualitative and quantitative models. Qualitative models simply predict whether a chemical is biodegradable or not, mostly using structural fragments as molecular descriptors (Gamberger et al., 1996a, 1996b; Loonen et al., 1999; Tunkel et al., 2000), whereas quantitative models predict the rate and sometimes the biodegradation pathways of chemicals, mostly using different structural and physicochemical properties of a chemical as molecular descriptors (Arnot et al., 2005; Cvetnic et al., 2017). The Organization for Economic Co-operation & Development (OECD) principles for QSAR state that the rigorous validation of QSARs (or QSBRs) requires: (i) a defined endpoint, (ii) a defined domain of applicability, (iii) appropriate measures of goodness of fit, robustness and predictability, (iv) an unambiguous algorithm and (v) if possible, a potential mechanistic interpretation of biodegradation (Netzeva et al., 2005). Most of the currently available quantitative QSBR models do not comply with all these principles; they are built on a small set of congeners with an undefined, or at least uncertain, applicability domain [the theoretical area of the chemical space where, for the particular mechanism of biological action or function, the model's predictions are reliable (Puzyn et al., 2009)] (Paris et al., 1982, 1983; Banerjee et al., 1984; Paris and Wolfe, 1987; Pitter and Chudoba, 1990; Okey and Stensel, 1996; Arnot et al., 2005; Yang et al., 2006), and rarely provide information that can be used to better understand the mechanisms of the biodegradation process. This limits their application in predicting the biodegradability of chemicals with diverse structural properties.
There is a need for the development of new validated and mechanistic quantitative models (Pavan and Worth, 2008), which would be helped by the availability of high quality endpoint data for biodegradation (preferably quantitative) for a diverse set of chemicals. However, such data for biodegradation rates are currently lacking (Rücker and Kümmerer, 2012; Nolte et al., 2018); therefore, these must be estimated from standardized biodegradation screening tests (Arnot et al., 2005).
Advances in computational and statistical tools have motivated researchers to move towards the development of more sophisticated QSBR models by allowing the calculation of numerous 1D, 2D and 3D structural and quantum mechanical molecular descriptors that can provide information on a chemical's structural, physical, or electronic properties that influence the biodegradation process (Helguera et al., 2008; Pavan and Worth, 2008). The availability of databases containing harmonized biodegradation data further allows such models to be tested and validated (Mansouri et al., 2013; Ceriani et al., 2015). The objective of this work was to develop QSBR models able to provide a potential mechanistic interpretation for organic pollutants based on biodegradation rates from a database containing globally harmonized biodegradation tests using relevant molecular descriptors. Models were developed and validated following the OECD principles for QSAR validation. The developed models would not only improve our understanding of the biodegradation of such chemicals, but could also be used as a means of prioritising and classifying chemicals in regulatory hazard and risk assessments in place of expensive and time-consuming laboratory biodegradation tests.

Chemical selection and molecular descriptor calculation
A total of 140 organic chemicals (Table SI-5) categorized either as priority pollutants or emerging organic pollutants in the field of water policy (Decision_Number, 2001; EPA, 2003; Lee and von Gunten, 2012) were initially selected for model development. The identified chemicals encompass aliphatic, aromatic and cyclic chemicals.
In the QSBR approach, information on the structural features of a chemical is encoded as numerical values of molecular descriptors, each of which provides separate information about the chemical's structure.
Each chemical was characterized by 4897 molecular descriptors. Of these, 4885 descriptors were computed with the DRAGON software, which is used world-wide for the calculation of molecular descriptors [Version 6.0-2014, (Dragon)]. The DRAGON descriptors were computed from the optimized structure (i.e. the minimum energy conformation) of each chemical. The structures were optimized with the semi-empirical PM7 method implemented in MOPAC software (Stewart, 2007). Descriptors with constant values (i.e. descriptors that have the same value for all chemicals) and inter-correlated descriptors (i.e. correlation coefficient greater than 0.9) were excluded in a pre-reduction step, leaving a set of 2459 DRAGON molecular descriptors. Detailed information on DRAGON molecular descriptors can be found in the Handbook of Molecular Descriptors (Todeschini and Consonni, 2008). Another 12 descriptors, comprising 7 physicochemical descriptors (Pitter and Chudoba, 1990; Saterbak et al., 2007; Ballabio et al., 2009; Lee and von Gunten, 2012) and 5 quantum-chemical descriptors (Stewart, 2007), were obtained from online sources or computed with a quantum mechanical method on the optimized structure (i.e. the semi-empirical PM7 method implemented in MOPAC software), respectively [Table SI-6].
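The pre-reduction step described above can be illustrated with a minimal Python sketch. It assumes a greedy, order-dependent rule (a descriptor is dropped when its absolute Pearson correlation with an already retained descriptor exceeds 0.9); the text does not state which member of a correlated pair was discarded, and the descriptor values below are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def prereduce(descriptors, r_max=0.9):
    """Drop constant descriptors, then drop any descriptor whose |r| with an
    already kept descriptor exceeds r_max (greedy rule, an assumption).
    `descriptors` maps descriptor name -> list of values, one per chemical."""
    kept = {}
    for name, values in descriptors.items():
        if max(values) == min(values):          # constant for all chemicals
            continue
        if any(abs(pearson(values, v)) > r_max for v in kept.values()):
            continue                            # inter-correlated with a kept one
        kept[name] = values
    return kept

# Hypothetical toy values for five chemicals (not data from the study).
toy = {
    "nN":    [0, 1, 2, 0, 1],
    "nArX":  [1, 0, 0, 2, 3],
    "const": [5, 5, 5, 5, 5],   # constant, removed
    "nN_x2": [0, 2, 4, 0, 2],   # perfectly correlated with nN, removed
}
selected = prereduce(toy)
print(sorted(selected))
```

Because the rule is greedy, the retained set depends on the order in which descriptors are visited; a different tie-breaking rule would keep a different but equally uncorrelated subset.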

Endpoint for QSBR model
Endpoint refers to any physicochemical, biological or environmental effect that can be measured and therefore modelled. Most of the biodegradation data available in the literature are based on data-poor experimental information and are rarely reproducible (Rücker and Kümmerer, 2012). Half-lives, which are directly linked to first-order rates, are the endpoint commonly used in the regulatory assessment of persistence (Pavan and Worth, 2008). First order biodegradation rates derived from the BIOWIN3 ultimate biodegradation rating were used as the model endpoint, since experimental data were lacking for the chemicals examined. BIOWIN3 is one of several environmental fate estimation models incorporated in EPI (Estimation Programs Interface) Suite and uses internationally harmonized biodegradation data (US-EPA, 2012). It predicts relative ultimate biodegradation rates of chemicals using a fragment-based additive approach (US-EPA, 2012).
In this study, a new statistically significant regression relating the ultimate biodegradation rating and the ultimate biodegradation half-life of 13 chemicals (eChemPortal) was developed [Table SI-1(A and B) and Figure SI-1]. A similar approach has been used elsewhere to convert ultimate biodegradation rating into chemical half-life (Arnot et al., 2005), but not specifically for the chemicals examined in this study. These chemicals are among the few that were used in the BIOWIN3 model development; this regression was used to convert the semi-quantitative BIOWIN3 biodegradation ratings into half-lives for the chemicals selected in the study.
Finally, the corresponding first order biodegradation rates of all 140 chemicals were computed from the degradation half-lives of the chemicals [Equation 11, SI]. The natural logarithm of the first order rate was used as the response variable for subsequent QSBR modelling.
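As an illustration of this conversion, the sketch below assumes that Equation 11 (SI) is the standard first-order relation k = ln(2) / t½; the SI equation itself is not reproduced here, and the half-lives and their unit (days) are hypothetical.

```python
import math

def first_order_rate(half_life):
    """First-order rate constant from a half-life: k = ln(2) / t_half.
    The unit of k is the reciprocal of the half-life unit."""
    if half_life <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life

def ln_rate(half_life):
    """Natural logarithm of the first-order rate, the modelled response."""
    return math.log(first_order_rate(half_life))

# Hypothetical half-lives in days (not values from the study).
for t_half in (10.0, 30.0, 100.0):
    print(f"t_half = {t_half:6.1f} d  k = {first_order_rate(t_half):.4f} 1/d  "
          f"ln k = {ln_rate(t_half):.3f}")
```

Longer half-lives give smaller rates and therefore more negative values of ln k, which is the direction in which persistence increases.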

Screening of chemicals for QSBR model development
Clustering analysis enables pattern recognition and classification of chemicals into natural groups that are unknown beforehand, by using the common properties characterized by the values of a set of variables (Pirhadi et al., 2015). In this study, Hierarchical Clustering Analysis (HCA) [distance: Euclidean, method: Ward] was used to group chemicals into clusters based on calculated molecular descriptors (Pirhadi et al., 2015). This resulted in four main clusters [Figure SI-3], composed, respectively, of aliphatic chemicals and a few simple aromatic chemicals (cluster 1); mostly simple chemicals having a single aromatic ring and a few aliphatic, polyaromatic and acyclic chemicals (clusters 2 and 3); and mostly poly-aromatic chemicals and a few cyclic chemicals (cluster 4). Aliphatic chemicals from the first cluster were excluded from further analysis, as there were too few (20 chemicals) for reliable QSBR model development and validation, even though first order rates for those chemicals were available. It should be noted that a QSBR model developed with a small dataset would not reflect the complete property space, and as a consequence its results cannot be used to confidently predict the desired activity (Cherkasov et al., 2014). Furthermore, a small number of acyclic and aliphatic chemicals (17 chemicals in total) were present in clusters 2, 3 and 4; these were also excluded from further analyses, since they are unlikely to conform to the same applicability domain as aromatic chemicals. Therefore, only aromatic chemicals (103 chemicals) were considered for further model development.
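A minimal sketch of Ward-linkage hierarchical clustering with Euclidean distance is given below, using SciPy's `linkage` and `fcluster`. It is not the software or data used in the study: the descriptor matrix is a hypothetical random example with two well separated groups, autoscaling before clustering is an assumption, and the tree is cut into two clusters here, whereas four main clusters were identified in this work.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Hypothetical descriptor matrix: 10 chemicals x 3 descriptors,
# built as two well separated groups of five chemicals each.
X = np.vstack([
    rng.normal(0.0, 0.1, size=(5, 3)),
    rng.normal(5.0, 0.1, size=(5, 3)),
])

# Autoscale each descriptor (zero mean, unit variance); an assumption.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Ward linkage on Euclidean distances, then cut the tree into 2 clusters.
Z = linkage(Xs, method="ward", metric="euclidean")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The number of clusters passed to `fcluster` is a user choice; in practice it is usually read from the dendrogram rather than fixed in advance.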

Dataset splitting
Cluster analysis in general revealed [Figure SI-3] that, regardless of the number of observed clusters, the clusters containing simple aromatic chemicals (i.e. clusters 1, 2 and 3) were closer to each other than to the cluster of poly-aromatic chemicals (i.e. cluster 4). We decided to develop QSBR models for simple aromatic chemicals and poly-aromatic chemicals separately, and also a combined model for all aromatic chemicals. After discarding cluster 1 and other structurally dissimilar chemicals (mentioned in Section 2.3), the selected aromatic chemicals were classified into three groups (see Tables SI-2, SI-3 and SI-4): (i) Group 1 (simple aromatic chemicals with one aromatic ring, 69 chemicals; from clusters 2 and 3); (ii) Group 2 (polyaromatic chemicals, 34 chemicals; from cluster 4) and (iii) Group 3 (all aromatic chemicals; Group 1 + Group 2, 103 chemicals). Each group was analysed as a separate dataset for QSBR modelling. It should be noted that each cluster was not used as a separate dataset for QSBR model development, as it was unlikely to conform to a single applicability domain due to the presence of chemicals with structural dissimilarities. Furthermore, simple aromatic chemicals formed two clusters, and would have required two different models despite showing some similarity.
Firstly, Principal Component Analysis [PCA] was used for each group to identify any outliers that could affect the robustness and fitness of the model (Gramatica et al., 2013; Pirhadi et al., 2015). This analysis also aided in selecting training and validation sets. Subsequently, the random-by-response approach (sorting chemicals in increasing/decreasing order of endpoint value) was applied to split the chemicals into training and validation sets (Mikolajczyk et al., 2015). To perform the splitting, the third chemical in the sorted list was selected as the first validation chemical. Subsequently, every third chemical (if possible) was selected from the sorted list as a validation chemical. The remaining chemicals formed the training set. The chemicals with the highest and the lowest biodegradation rates were included in the training set, to guarantee that the training set spanned the entire range of the experimental measurements and was numerically representative of the dataset. The final QSBR models for Group 1, Group 2 and Group 3 were developed with 60 [Table SI-2], 28 [Table SI-3], and 84 [Table SI-4] chemicals, respectively.
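The splitting procedure can be sketched in a few lines of Python. The function name, chemical labels and endpoint values are hypothetical, and the handling of the "if possible" case is an assumption: here a chemical that would be picked for validation is returned to the training set if it has the lowest or highest endpoint value.

```python
def split_by_response(names, y, step=3):
    """Sort chemicals by endpoint value and send every `step`-th chemical
    (3rd, 6th, 9th, ...) to the validation set. The chemicals with the
    lowest and highest endpoint values always stay in the training set."""
    order = sorted(range(len(y)), key=lambda i: y[i])   # ascending endpoint
    validation = set(order[step - 1::step])
    validation.discard(order[0])     # keep the minimum in training
    validation.discard(order[-1])    # keep the maximum in training
    train = [names[i] for i in order if i not in validation]
    valid = [names[i] for i in order if i in validation]
    return train, valid

# Hypothetical ln(k) values for nine chemicals (not data from the study).
names = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9"]
ln_k = [-3.1, -5.0, -4.2, -2.0, -6.3, -3.8, -4.9, -2.7, -5.6]

train, valid = split_by_response(names, ln_k)
print("training:  ", train)
print("validation:", valid)
```

In this toy case the ninth chemical in sorted order has the highest value, so it is kept in the training set and the validation set contains two chemicals instead of three.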

QSBR model development and validation
QSARINS software (Version 2.0) (Gramatica et al., 2013) was used to develop a QSBR model for the split dataset using Multiple Linear Regression (MLR) techniques. In MLR, the endpoint ($y_i$) is described with the best combination of the most relevant autoscaled descriptors used as independent variables ($x_1, x_2, \ldots, x_n$), as follows:

$$y_i = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$

where $b_0$ is the intercept and $b_1, b_2, \ldots, b_n$ are the regression coefficients. It should be noted that clustering (i.e. HCA) and activity prediction (i.e. MLR) are two distinct and independent steps that serve different purposes, although both are based on molecular descriptors. This approach is recommended when QSAR models need to be developed from datasets having a large number of chemicals with diverse structures (He and Jurs, 2005). Initially, DRAGON descriptors (i.e. 2459 descriptors) were used to develop QSBR models for the three groups defined in Section 2.4. The best combination of the most relevant descriptors was selected using the Genetic Algorithm [GA] (Pavan and Worth, 2008; Gramatica et al., 2013) incorporated in QSARINS. This technique identifies the best solution (i.e. the best combination of descriptors) by maximizing (or minimizing) a selected fitness function. In this study, $Q^2_{LOO}$ (the cross-validated coefficient, used to evaluate the model's performance in predictions) was selected as the fitness function. Fig. 1 summarizes the detailed methodological steps performed during QSBR model development.
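For illustration, an MLR fit on autoscaled descriptors can be sketched with NumPy's least-squares solver. This is not QSARINS and does not reproduce the genetic-algorithm descriptor selection; the descriptor values and coefficients below are hypothetical, and the response is generated without noise so that the fit can be checked exactly.

```python
import numpy as np

def autoscale(X):
    """Centre each descriptor to zero mean and scale to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def fit_mlr(X, y):
    """Ordinary least squares; returns [b0, b1, ..., bn]."""
    A = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Evaluate y = b0 + b1*x1 + ... + bn*xn for each row of X."""
    return coef[0] + X @ coef[1:]

# Hypothetical values of two descriptors for six chemicals.
X = np.array([[0, 1], [1, 0], [2, 0], [0, 2], [1, 3], [3, 1]], dtype=float)
y = -4.0 - 0.5 * X[:, 0] + 0.3 * X[:, 1]        # noise-free toy response

Xs = autoscale(X)
coef = fit_mlr(Xs, y)
print("coefficients on autoscaled descriptors:", np.round(coef, 4))
```

With autoscaled descriptors the intercept equals the mean of the response, and the magnitudes of the remaining coefficients can be compared directly as a rough indication of each descriptor's weight in the model.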
The best models developed by QSARINS were sorted using fitting ($R^2$, SI, Eq. (1) and $RMSE_{tr}$, SI, Eq. (2)) and robustness ($Q^2_{LOO}$, SI, Eq. (3) and $RMSE_{CV}$, SI, Eq. (4)) criteria. The coefficient of determination ($R^2$) and the root mean square error of calibration ($RMSE_{tr}$) were used as measures of the goodness of fit of the developed model (Puzyn et al., 2009). The cross-validated coefficient $Q^2_{LOO}$ (leave-one-out method) and the root mean square error of cross validation ($RMSE_{CV}$) were used to verify its stability and robustness (Puzyn et al., 2009). For details please refer to the SI.
The internally optimized, stable and robust models were further evaluated for their external predictive power with chemicals not used in the model building process, using the external validation parameters $Q^2_{F1}$ (Eq. 5, SI), $Q^2_{F2}$ (Eq. 6, SI), $Q^2_{F3}$ (Eq. 7, SI) and the root mean square error of prediction ($RMSE_P$, Eq. 8, SI) (Chirico and Gramatica, 2011). The $Q^2_{F}$ parameters are predictive squared correlation coefficients; for details please refer to the SI.
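The internal fitting and leave-one-out statistics can be sketched as follows. The definitions used are the common ones ($R^2 = 1 - RSS/TSS$, $Q^2_{LOO} = 1 - PRESS/TSS$, RMSE values as the square root of the mean squared residual); they are assumed to correspond to Eqs. (1) to (4) of the SI, which are not reproduced here, and the data are randomly generated.

```python
import numpy as np

def _fit(X, y):
    """Ordinary least squares with intercept; returns [b0, b1, ..., bn]."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def _predict(coef, X):
    return coef[0] + X @ coef[1:]

def fit_stats(X, y):
    """R2 and RMSE of calibration for a model fitted on all training data."""
    resid = y - _predict(_fit(X, y), X)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / tss, np.sqrt(np.mean(resid ** 2))

def loo_stats(X, y):
    """Q2 (leave-one-out) and RMSE of cross validation."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i                  # leave chemical i out
        coef = _fit(X[mask], y[mask])
        press += (y[i] - _predict(coef, X[i:i + 1])[0]) ** 2
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / tss, np.sqrt(press / n)

# Hypothetical data: 20 chemicals, 3 descriptors, small random noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(0.0, 0.1, size=20)

r2, rmse_tr = fit_stats(X, y)
q2, rmse_cv = loo_stats(X, y)
print(f"R2 = {r2:.4f}, RMSE_tr = {rmse_tr:.4f}")
print(f"Q2_LOO = {q2:.4f}, RMSE_CV = {rmse_cv:.4f}")
```

For ordinary least squares, $Q^2_{LOO}$ can never exceed $R^2$; a small gap between the two is what is usually read as evidence of a stable model.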
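A minimal sketch of the external validation parameters is given below. It uses the definitions commonly given for $Q^2_{F1}$, $Q^2_{F2}$ and $Q^2_{F3}$ in the QSAR literature, which are assumed to match Eqs. 5 to 8 of the SI; the endpoint values are hypothetical.

```python
from math import sqrt

def external_metrics(y_train, y_ext, y_ext_pred):
    """External validation statistics for a prediction (validation) set.
    Q2F1 references the training-set mean, Q2F2 the external-set mean,
    and Q2F3 scales by the training-set variance."""
    n_tr, n_ext = len(y_train), len(y_ext)
    mean_tr = sum(y_train) / n_tr
    mean_ext = sum(y_ext) / n_ext
    press = sum((o - p) ** 2 for o, p in zip(y_ext, y_ext_pred))
    ss_ext_vs_tr = sum((o - mean_tr) ** 2 for o in y_ext)
    ss_ext_vs_ext = sum((o - mean_ext) ** 2 for o in y_ext)
    ss_tr = sum((o - mean_tr) ** 2 for o in y_train)
    return {
        "Q2F1": 1.0 - press / ss_ext_vs_tr,
        "Q2F2": 1.0 - press / ss_ext_vs_ext,
        "Q2F3": 1.0 - (press / n_ext) / (ss_tr / n_tr),
        "RMSEP": sqrt(press / n_ext),
    }

# Hypothetical ln(k) values (not data from the study).
y_train = [-6.0, -5.0, -4.0, -3.0, -2.0]
y_ext = [-5.5, -3.5]
y_ext_pred = [-5.3, -3.6]

metrics = external_metrics(y_train, y_ext, y_ext_pred)
for name, value in metrics.items():
    print(f"{name} = {value:.4f}")
```

The three coefficients differ only in the reference sum of squares, which is why they can diverge when the validation set covers a narrower or shifted range of the endpoint compared with the training set.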
In addition, the Applicability Domain of the finally selected model was assessed by the leverage approach using the Williams graph (Roy et al., 2011), a plot of leverage values ($h$) versus standardized residuals that identifies the structural outliers (X-outliers, those having a leverage value greater than the critical $h$ value) and the residual outliers (Y-outliers, those with a standardized residual above the user defined limit). The critical $h$ value ($h^*$) is calculated as:

$$h^* = \frac{3(p + 1)}{n}$$

where $p$ is the number of model predictors, and $n$ is the number of objects (training chemicals) used to calculate the model.
The leverage value ($h_i$) is calculated from the molecular descriptors included in the model, and estimated according to Equation 9 (SI).
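The leverage approach can be sketched as follows. The sketch assumes that Equation 9 (SI) is the usual hat-matrix expression $h_i = x_i^T (X^T X)^{-1} x_i$ with an intercept column included in $X$, and that the critical value is the commonly used $h^* = 3(p + 1)/n$; the descriptor values are randomly generated and hypothetical.

```python
import numpy as np

def leverages(X_train, X_query):
    """Leverage of each query chemical relative to the training set:
    h_i = x_i^T (X^T X)^-1 x_i, with an intercept column prepended."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.inv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)

def critical_leverage(n_train, p):
    """Warning leverage h* = 3(p + 1)/n (assumed form)."""
    return 3.0 * (p + 1) / n_train

# Hypothetical training set: 30 chemicals, 3 model descriptors.
rng = np.random.default_rng(2)
X_train = rng.normal(size=(30, 3))
p = X_train.shape[1]

h_train = leverages(X_train, X_train)
h_star = critical_leverage(len(X_train), p)

# A query chemical far outside the descriptor range of the training set.
h_far = leverages(X_train, np.array([[10.0, 10.0, 10.0]]))[0]

print(f"h* = {h_star:.3f}")
print(f"max training leverage = {h_train.max():.3f}")
print(f"far query leverage = {h_far:.3f} (outside domain: {h_far > h_star})")
```

A chemical with $h_i > h^*$ lies outside the descriptor space spanned by the training set, so its prediction is an extrapolation and should be treated with caution even when its residual is small.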
After identifying the best model with three DRAGON descriptors for Group 1 chemicals, an attempt was made to develop new models that incorporated an additional five quantum mechanical descriptors and seven other descriptors describing different physicochemical and structural properties of chemicals (15 descriptors in total; Table SI-6). These descriptors pertain to electronic properties and functional groups within the chemical, which have been shown to influence the biodegradation of a chemical (Pitter and Chudoba, 1990; Nolte and Ragas, 2017). The quantum mechanical descriptors were calculated at the semi-empirical level of theory using the PM7 method implemented in MOPAC software (Stewart, 1994, 2007), whereas the other descriptors were obtained from different online databases (Pitter and Chudoba, 1990; Ballabio et al., 2009; Lee and von Gunten, 2012). A similar approach to that described above was used for model development.

Results and discussion
QSBR models
Table 1 provides the overall summary of the best model for each set of chemical groups. The QSBR model for the simple aromatic chemicals (Equation (4)) was better than the other two models, for poly-aromatic chemicals (Equation (3)) and all aromatic chemicals (Equation (2)), as indicated by higher values for $R^2$, $Q^2_{LOO}$, $Q^2_{F1}$, $Q^2_{F2}$ and $Q^2_{F3}$. The applicability domain of all three models was evaluated by a Williams plot (Fig. 2; Group 1 and Figure SI-). The Williams plots verified the absence of outliers (residual values ($y_i - \hat{y}_i$) were within the limits of ±3 standard deviations), and showed good applicability of the models for the prediction of the biodegradation rates of all of the studied aromatic chemicals. In addition, none of the structures of the studied aromatic chemicals were substantially different from the training set chemicals: all leverage values were below the critical value, supporting the applicability of the model to untested aromatic chemicals whose calculated $h_i$ values are lower than the critical value ($h^*$). However, it should be noted that $h_i$ values were estimated using Equation 9 (SI), which does not take into account structural variability or differences other than those expressed by the selected descriptors. Therefore, the applicability of the model in its current form is restricted to chemicals that are structurally similar to those used in the training set, and the model should not be employed for the prediction of structurally different chemicals (e.g. structural differences in Group 1 chemicals are mostly due to different substituent patterns on the mono-aromatic ring, and the model for Group 1 should not be employed for predictions for poly-aromatic chemicals).
The molecular descriptors provide information on specific physicochemical or structural characteristics of the chemical, and the ability to interpret the encoded values of the descriptors provides information on the molecular features that are most likely to affect the biological activity of the studied chemical (Todeschini and Consonni, 2008). However, when using an extensive matrix of molecular descriptors, the mechanistic interpretation of the endpoint of interest may not always provide useful or easily interpretable information. This is evident in the current study, as the descriptors associated with the QSBR models for the chemical datasets belonging to Group 2 and Group 3 are relatively abstract molecular descriptors of molecular geometry, stereochemistry, conformational index, 2D fingerprints and fragment counts. For Group 1, in contrast, the descriptors relate to hydrophobic, electronic, steric, size and shape properties of the chemicals and can be more easily interpreted to offer rational explanations of their effect on biodegradation. Therefore, the following sections focus principally on the QSBR model for simple aromatic chemicals: evaluating the model in terms of different model parameters, defining its applicability domain and providing an underlying understanding of the relationship between the biodegradation rate and the molecular descriptors associated with the model. Furthermore, an attempt was made to improve the existing model by incorporating some other common descriptors specifically associated with the biodegradation of chemicals, namely quantum mechanical, hydrophobic, steric and electronic descriptors (Table SI-6).
QSBR model for simple aromatic chemicals (Group 1)
Table 3 provides the summary statistics for the three-descriptor based QSBR model for simple aromatic chemicals (Equation (4), Table 1). There was no significant inter-correlation (Pearson Correlation Coefficient, P-value > 0.05) between the three descriptors (Table SI-7). This model showed high goodness of fit ($R^2$ = 0.8924), robustness ($Q^2_{LOO}$ = 0.8718), and external predictive ability ($Q^2_{F1}$ = 0.8829, $Q^2_{F2}$ = 0.8835, and $Q^2_{F3}$ = 0.9178). The plots of the experimental versus predicted values (Fig. 3 A and B) showed very good agreement between BIOWIN3 derived first order biodegradation rates and the model predicted values of biodegradation rate for 60 aromatic chemicals for both training and validation sets (slopes of 0.89). Likewise, model predicted half-life and BIOWIN3 derived half-life also showed good agreement (slope of 0.96). In addition, the plots confirmed the predictive capability of the developed model.
Fig. 1. Detailed procedure for developing QSBR models (adopted from Mikolajczyk et al., 2015).
After identifying the best three DRAGON descriptors (nN, nArX and Mor08u), a further 12 descriptors were included in the dataset to develop new models using the same approach (Table SI-6). The summary of the best 10 models based on 4 descriptors is reported in Table 2. All the models were well fitted (89.8% < $R^2$ < 93.1%), robust (86.9% < $Q^2_{LOO}$ < 91.1%) and had good predictive ability (85.6% < $Q^2_{ext}$ < 95.8%). In addition, all the descriptors from the developed models provide interpretable and possible mechanistic insights into the model, as these descriptors are related to electronic, steric and lipophilic properties of chemicals. Biodegradation of a chemical has been shown to be influenced by electronic and lipophilic properties, along with functional groups within the chemical (Parsons and Govers, 1990; Cvetnic et al., 2017).

QSBR model descriptors and their interpretation in relation to biodegradation rates
In this study, a simple linear regression model developed (result not shown) for Group 1 chemicals using the substituent constant (σ), which quantitatively examines the inductive and resonance effects of substituents on the biodegradation rate, had poor fitting and stability ($R^2$ = 0.173). Recently, authors have suggested using a combination of different categories of descriptors (e.g. quantum chemical, topological, constitutional) for improved QSAR performance in predicting biological activity (e.g. biodegradation rates) (Mamy et al., 2015; Cvetnic et al., 2017). The type, number and position of substitutions on the aromatic ring are important parameters that determine the topological and electronic characteristics of aromatic chemicals and ultimately influence the degree and rate of biodegradation. The information provided by the descriptors (substituent constant [σ], nN, nArX and Mor08u) used in this study is in agreement with the effect of substitutions on the aromatic ring, as follows.
Mor08u is an unweighted MoRSE (Molecular Representations of Structure based on Electron diffraction) descriptor with scattering parameter $s$ = 7 Å$^{-1}$, calculated according to Equation 10 (SI). The calculated values of the different MoRSE descriptors for chemicals are based on their structural features, where the distances among the different atoms within a molecule are the principal means of separating the molecule from others (Devinyak et al., 2014). When QSAR models are developed for a structurally similar group of chemicals, the difference in the value of a 3D-MoRSE descriptor is due to several neighbouring atom pairs in the molecule (Devinyak et al., 2014). The differences in these descriptor values reflect different physicochemical properties of the monoaromatic chemicals, attributable to differences in the number, type and position of substituents on the aromatic ring, which in turn affect the biodegradation rate. This interpretation is in agreement with results previously published in the literature (Sikarwar and Dixit, 2012), where an individual QSAR model was used to predict the molar refractivity, partition coefficient and polarizability of 34 phenolic chemicals, using 3D-MoRSE descriptors and Eigen values respectively.
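To make the definition concrete, the sketch below evaluates an unweighted 3D-MoRSE-type sum over all atom pairs, $\sum_{i<j} \sin(s\,r_{ij}) / (s\,r_{ij})$, at $s$ = 7 Å$^{-1}$. It assumes that Equation 10 (SI) has this general form with all atomic weights set to 1; the exact DRAGON implementation (for example its treatment of hydrogen atoms) is not reproduced, and the coordinates are hypothetical.

```python
from itertools import combinations
from math import dist, sin

def morse_unweighted(coords, s):
    """Unweighted 3D-MoRSE-type signal at scattering parameter s (1/angstrom):
    the sum over atom pairs of sin(s * r_ij) / (s * r_ij), with r_ij the
    interatomic distance in angstrom. At s = 0 every pair contributes 1.
    Atoms are assumed not to coincide (r_ij > 0)."""
    total = 0.0
    for a, b in combinations(coords, 2):
        r = dist(a, b)
        total += 1.0 if s == 0 else sin(s * r) / (s * r)
    return total

# Hypothetical 3D coordinates (angstrom) of a three-atom fragment.
fragment = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0)]

print("s = 0:", morse_unweighted(fragment, 0))
print("s = 7:", round(morse_unweighted(fragment, 7), 4))
```

Because each term depends only on an interatomic distance, changing the type or position of a substituent alters the set of distances and hence the descriptor value, which is the basis of the interpretation given above.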
Number of Nitrogen (nN) and Halogen Atoms in Aromatic Chemical (nArX): The biodegradability of a molecule is influenced by electronic (inductive and mesomeric), steric and lipophilic properties of a molecular system (Pitter and Chudoba, 1990). In particular, the type, number and position of substituent groups present in the aromatic molecule strongly determine the electronic characteristics of that molecule. Both electronic inductive and mesomeric effects are responsible for attraction or repulsion of electrons, affecting the electron density in the reaction centre. The presence of halogen and nitrogen atoms in the aromatic system contributes to depleting the electron density of the reaction system. In biodegradation reactions, the initial attack on the aromatic ring is mostly assumed to be electrophilic in nature; therefore, the presence of electron-attracting elements (like halogens [-F, -Cl, -Br, -I] and nitrogen [-CN, -NO2, -NH3+, -CONH2]) deactivates the aromatic ring in certain positions for attack by oxygenases and results in a lower biodegradation rate (Pitter and Chudoba, 1990).

Mechanism of aerobic biodegradation
The most practical and useful approach to QSBRs is to relate biodegradation rate data to molecular descriptors specifically providing relevant information on potential mechanisms of the biodegradation rate-limiting process. Furthermore, biodegradation rates are believed to be a function of the rate of a series of processes that occurs in a stepwise manner (Parsons and Govers, 1990).
Extracellular enzymes initiate microbial mineralization of organic matter by hydrolyzing substrates to sizes sufficiently small to be transported across cell membranes. They then diffuse through the cytoplasm to reach the enzyme where the biodegradation reaction is initiated (Arnosti, 2011). Subsequently, enzyme induction occurs, which ultimately results in the binding of the metabolic enzyme and the chemical followed by transformation of the chemical by that enzyme (Parsons and Govers, 1990;Wammer and Peters, 2005).
The transport of chemicals across the microbial membrane can occur either via active transport mechanisms or passive diffusion. Active transport is mostly involved in efflux and has little influence on biodegradation rate (Parsons and Govers, 1990). It has been shown that the transport of polyaromatic hydrocarbons (PAHs) across the microbial membrane generally occurs via passive diffusion in some PAH-degrading bacteria (Bugg et al., 2000). Furthermore, uptake via passive transport tends to correlate with descriptors that describe the hydrophobicity of the chemicals, such as LogP (octanol-water partition coefficient), polarizability or molar refraction, and 3D-MoRSE (Banerjee et al., 1984; Parsons and Govers, 1990; Mamy et al., 2015), suggesting that diffusion of a chemical through the cytoplasm is an important step in biodegradation.
The structure of chemicals also likely controls the process of metabolic enzyme induction, subsequent binding of the chemical to enzymes, and transformation of the chemical (Parsons and Govers, 1990). The relationship between the biodegradability of aromatic chemicals and molecular descriptors associated with the structure of chemicals is discussed in several papers (Wolfe et al., 1980; Okey and Stensel, 1996). More specifically, in monoaromatics, the number, position and type of substituents are associated with the electronic properties of the chemicals and influence their biodegradability (Pitter and Chudoba, 1990). In such chemicals, aerobic biotransformation is generally initiated with the addition of molecular oxygen to the aromatic ring (i.e. hydroxylation) by oxygenase or dioxygenase enzymes (Pitter and Chudoba, 1990; Peijnenburg, 1994), followed by aromatic ring cleavage. Hydroxylation of the aromatic ring with subsequent ring cleavage is followed by an electrophilic substitution; these steps are considered to be rate determining (Pitter and Chudoba, 1990). The presence of certain substituent groups increases the electron density of the aromatic ring and accelerates the biodegradation process. Nitrogen and halogen substituents strongly deplete the electron density of the aromatic ring, decreasing the rate of biodegradation as compared to the degradation of aromatic chemicals with other substituents (e.g., OH, CHO, CH3). This effect was demonstrated by several authors (Alexander and Lustigman, 1966; Vuono et al., 2016; Cvetnic et al., 2017), who showed retarded microbial biodegradation rates of mono- and di-substituted benzenes with chloro-, nitro- and sulfonate-substituents, whereas an increased rate was recorded in the presence of hydroxyl and carboxyl groups. The shape and size of chemicals play an integral role during fitting of a chemical into the active site of the enzyme (Wammer and Peters, 2005).
Five descriptors in the current models (Mor08u, molecular weight, van der Waals volume and the valence connectivity indices vX2 and vX1) are general size and shape descriptors. The association of these descriptors with biodegradation rate indicates that the steric properties of aromatic chemicals play a significant role in the biodegradation process. Several QSAR models relating biological activity to molecular descriptors associated with steric properties have been developed elsewhere (Koch, 1982; Paris et al., 1982; Okey and Stensel, 1996; Ceriani et al., 2015). A 3D-MoRSE descriptor such as Mor08u, which is weighted by electrotopological states, combines both electronic and topological characteristics of atoms or molecules and plays an influential role in determining the biodegradability of aromatic chemicals (Ceriani et al., 2015).
In most chemical reactions an energy barrier exists that must be surmounted for the reaction to occur. Thus, kinetic and thermodynamic parameters are integral to explaining the observed differences in biodegradation. In addition, thermodynamic feasibility is considered an important metric for evaluating the potential of a biodegradation reaction (Finley et al., 2009). Two descriptors, the electronic energy and the total energy of a molecule, are quantum mechanical descriptors and generally provide information on the energy associated with a chemical. The relationship between the biodegradability of aromatic chemicals and descriptors associated with the energy of molecules is discussed in several papers (Wammer and Peters, 2005; Yang et al., 2006). The total energy (ToE) of a molecule is the sum of the total energy of all electrons (Eel) and the repulsion energy between atomic nuclei (Enuc-nuc) in the molecule (Stewart, 1994).
Electronic energy (Eel) is the sum of the repulsion energy between electrons and the attraction energy between electrons and atomic nuclei. According to molecular orbital theory, the total electronic energy is directly associated with the total energies of the individual occupied molecular orbitals, which also provide information on the total bond energy of a molecule (Karelson et al., 1996; Petrucci et al., 1997). This suggests that the higher the total energy, the higher the total bond energy of a molecule. A molecule with a higher total bond energy has a stronger attraction between its electrons and atomic nuclei, which implies that more energy is required to degrade it. Therefore, an aromatic chemical with a higher total energy is more resistant to degradation.
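The energy decomposition described in the two preceding paragraphs can be written compactly as follows. This is only a notational sketch of the definitions given in the text; the symbols E_ee (electron-electron repulsion) and E_e-nuc (electron-nucleus attraction) are labels assumed here for illustration and do not appear in the original.

```latex
\begin{align}
  \mathrm{ToE} &= E_{\mathrm{el}} + E_{\mathrm{nuc\text{-}nuc}} \\
  E_{\mathrm{el}} &= E_{\mathrm{ee}} + E_{\mathrm{e\text{-}nuc}}
\end{align}
```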
3.5. Implications for predicting biodegradation half-lives for fate and hazard assessment
'Real world' chemical half-lives in different environmental compartments are essential for the reliable risk assessment of chemicals, yet accurate datasets providing information on half-lives are lacking. Several researchers have attempted to convert BIOWIN output into half-lives and ultimately into rates (Gouin et al., 2004; Arnot et al., 2005; Aronson et al., 2006), which allows the prioritization of chemicals according to their relative biodegradabilities, even if the absolute rates may be inaccurate. We recognize that accurate prediction of half-lives or rates with a model can only be reliably achieved by training the model with measured half-lives or rates (Rücker and Kümmerer, 2012). Nevertheless, in the absence of such a dataset, we developed QSBR models to predict half-lives of 60 aromatic chemicals and compared the rates predicted in this study (Figure SI-2) with the rates predicted by the model of Arnot et al. (2005), who developed a simple linear regression model by correlating BIOWIN outputs with experimental aqueous aerobic half-lives. There was a significant correlation between the first order rates estimated with the aforementioned model and the first order rates used in this study (univariate regression analysis; r2 = 0.99, p-value < 0.05). However, the applicability domain of the Arnot et al. (2005) model is uncertain and that model provides no insight into the principles underlying biodegradation, whereas our models do. Nevertheless, the observed correlation not only suggests that the QSBR model predictions are stable, but also provides a framework for evaluating the persistence of chemicals that fall within the model applicability domain.
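As an illustration of the comparison described above, the following minimal sketch converts half-lives into first order rate constants using the standard relation k = ln(2) / t1/2 and computes r2 for a univariate comparison of two sets of rates. The function names and the half-life values are hypothetical and are not taken from this study or from Arnot et al. (2005).

```python
import math


def half_life_to_rate(half_life_days):
    """First order rate constant k (1/day) from a half-life: k = ln(2) / t_half."""
    if half_life_days <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_days


def r_squared(x, y):
    """Squared Pearson correlation, i.e. r2 of a univariate linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical half-lives (days) predicted by two models for the same chemicals.
hl_model_a = [2.0, 10.0, 40.0, 120.0]
hl_model_b = [2.5, 9.0, 45.0, 110.0]

rates_a = [half_life_to_rate(h) for h in hl_model_a]
rates_b = [half_life_to_rate(h) for h in hl_model_b]

print("r2 between the two sets of rates:", r_squared(rates_a, rates_b))
```

A high r2 in such a comparison indicates only that the two models rank chemicals similarly; it says nothing about whether the absolute rates are accurate.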
Regulatory frameworks have put much emphasis on the identification and prioritization of chemicals based on their environmentally hazardous properties (i.e. persistence, bioaccumulation and toxicity (PBT)) rather than on their environmental risk alone (Martin et al., 2017b). Furthermore, their guidance recommends that persistence assessment of chemicals should be performed prior to bioaccumulation and toxicity assessments in order to avoid unnecessary animal tests; the latter should be carried out only when the chemical is identified as potentially persistent (ECHA, 2017). Half-lives are the commonly used end-point in the regulatory assessment of persistence (Rücker and Kümmerer, 2012; ECHA, 2017; Martin et al., 2017b). The QSBR models for simple aromatic chemicals developed here have been shown to correctly predict half-lives and to allow interpretation of the effect of chemical parameters on biodegradation, based on the chemicals' previous biodegradability classification obtained from BIOWIN3. These models also enabled the generation of first order biodegradation rates of chemicals. On the other hand, the models developed in this study are unable to provide any information on metabolites and transformation products. In addition, they ignore other natural degradation or removal processes such as photolytic degradation and sorption.

Conclusions
-QSBR models for simple aromatic chemicals were robust, appear reliable and can be more easily interpreted with respect to potential mechanistic explanations of their effect on biodegradation than the models for the two other groups of chemicals developed in this study.
-This is a first proof-of-principle step showing that QSBR models can be an alternative approach to expensive laboratory tests, both in screening for and in making definitive classifications in persistence assessments of chemicals, in this instance for mono-aromatic chemicals.
-The derived QSBR model needs further validation and calibration using experimentally determined biodegradation rates. However, a similar approach to that used here could prove useful in deriving accurate QSBRs for other classes of chemicals.

Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.