Novel Models for Accurate Estimation of Air–Blood Partitioning: Applications to Individual Compounds and Complex Mixtures of Neutral Organic Compounds

The air-blood partition coefficient (Kab) is widely used in human health risk assessment for chemical exposure. However, current Kab estimation approaches either require many parameters or lack precision. In this study, we present two novel, parsimonious models to accurately estimate Kab values for individual neutral organic compounds and for their complex mixtures. The first model, termed the GC×GC model, was developed from the retention times of nonpolar chemical analytes in comprehensive two-dimensional gas chromatography (GC×GC). This model is unique in its ability to estimate Kab values for complex mixtures of nonpolar organic chemicals. The GC×GC model accounted for the Kab variance (R² = 0.97) and showed strong predictive power (root-mean-square error, RMSE = 0.31 log unit) for an independent set of nonpolar chemical analytes. The second model, termed the partition model (PM), is based on two partition coefficients: octanol to water (Kow) and air to water (Kaw). The PM accounted for the variability in Kab data (n = 344), with an R² of 0.93 and an RMSE of 0.34 log unit. Its predictive power and explanatory performance were comparable to those of the parameter-intensive Abraham solvation models (ASMs). In addition, the PM can be integrated into EPI Suite, software widely used for initial screening in chemical risk assessment. The PM provides quicker yet reliable Kab estimates compared with ASMs, while the GC×GC model is uniquely suited to estimating Kab values for complex mixtures of neutral organic compounds. In summary, our study introduces two novel, parsimonious models for the accurate estimation of Kab values for both individual compounds and complex mixtures.

Section S1: How to predict log Kab values of nonpolar analytes using the GC×GC model

Readers can follow the steps below to estimate the log Kab values of nonpolar analytes detected on a GC×GC chromatogram using the MATLAB code, which can be obtained from the authors.
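The MATLAB code itself is not reproduced in this section. As a rough, hypothetical illustration of the kind of calibration that steps 2 to 4 involve, the Python sketch below assumes that (i) the n-alkane series is used to convert first-dimension retention times into linear retention indices, (ii) the log Kab of each calibration analyte has already been computed from its ASDs with an ASM, and (iii) log Kab varies linearly with the retention index and the second-dimension retention time. All three are our assumptions, and the function names (`retention_index`, `fit_calibration`, `predict_log_kab`) and the synthetic numbers are illustrative only, not the authors' implementation.

```python
import numpy as np

def retention_index(t1, alkane_carbons, alkane_times):
    """Linear (temperature-programmed) retention index: 100 x carbon number,
    interpolated linearly in first-dimension retention time between the
    bracketing n-alkanes. Values outside the alkane range are clamped."""
    return 100.0 * np.interp(t1, alkane_times, alkane_carbons)

def fit_calibration(t1, t2, log_kab, alkane_carbons, alkane_times):
    """Least-squares fit of log Kab = c0 + c1*RI + c2*t2 (assumed linear form)."""
    ri = retention_index(t1, alkane_carbons, alkane_times)
    X = np.column_stack([np.ones_like(ri), ri, np.asarray(t2, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(log_kab, dtype=float), rcond=None)
    return coef

def predict_log_kab(coef, t1, t2, alkane_carbons, alkane_times):
    """Apply the fitted calibration to analytes or to a retention-time grid."""
    ri = retention_index(t1, alkane_carbons, alkane_times)
    return coef[0] + coef[1] * ri + coef[2] * np.asarray(t2, dtype=float)

# --- Synthetic demo: every number below is invented for illustration ---
rng = np.random.default_rng(1)
alkane_carbons = np.arange(10, 21)            # hypothetical C10 to C20 series
alkane_times = np.linspace(5.0, 55.0, 11)     # first-dimension times, min

def true_log_kab(t1, t2):                      # hypothetical "truth" for the demo
    ri = retention_index(t1, alkane_carbons, alkane_times)
    return -1.0 + 0.004 * ri + 0.8 * t2

t1_cal = rng.uniform(6.0, 54.0, 15)           # 15 calibration analytes
t2_cal = rng.uniform(0.5, 3.5, 15)            # second-dimension times, s
coef = fit_calibration(t1_cal, t2_cal, true_log_kab(t1_cal, t2_cal),
                       alkane_carbons, alkane_times)

# Evaluate on a grid, e.g. to draw log Kab contours over the chromatogram
T1, T2 = np.meshgrid(np.linspace(6.0, 54.0, 50), np.linspace(0.5, 3.5, 40))
grid_pred = predict_log_kab(coef, T1, T2, alkane_carbons, alkane_times)
```

The grid evaluation corresponds to the optional contour overlay mentioned in step 4; passing the values in `grid_pred` to any contour-plotting routine would draw iso-log Kab lines on the retention-time plane.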
1. Obtain an optimal GC×GC separation of your sample of interest using the same column combination as in this study, or an equivalent one.
2. Identify at least 15 nonpolar calibration analytes, in addition to an n-alkane series, within the elution window of interest on the GC×GC chromatogram. These analytes may already be present in the sample, may be added to it, or may be analyzed separately as standards using the same GC×GC instrumental program.
3. Provide the first- and second-dimension retention times and the Abraham solvation descriptors (ASDs) of the calibration analytes as input to the MATLAB code.

different species
We recalibrated the Abraham solvation models to enhance their predictive capabilities for log values.We observed discrepancies between certain values of Abraham Solvation   Descriptors (ASDs) and their corresponding values reported in the most recent online database "LSER database for comptox users (2017)" available on the UFZ LSER database website.
Consequently, we determined that the ASDs in question required upgrading to align with the latest reported values 1 .Also, we found that the values of some ASDs taken from the literature 2 , 3 were not even present in any of the previously published datasets available on the UFZ-LSER database (Supporting Information, Table S 7).We identified discrepancies in the recorded values of certain parameters, suggesting errors in their acquisition or reporting.To rectify this issue, we employed the most recent version of the database, "LSER database for comptox users (2017)," as a reference to correct these values.(Supporting Information, Table S  with subscripts representing the corresponding species names.For example, was utilized   to include the log values of rats in Eq. 1.These variables are arbitrary and allow users to   activate or deactivate specific species, enabling the estimation of their log values using a   single equation.The indicator variables were assigned values of 1 or 0. To calculate the log   values for horses using Eq. 1, we set =1 and all other indicator variables to 0. The  ℎ goodness-of-fit statistics for Eq. 1 showed an improvement of 0.004 in R 2 and a decrease of 0.007 log units in RMSE compared to the previous version of this model published in the literature 2 .The improvement in the correlation can be attributed to the correction of certain ASD values, which were replaced with the most up-to-date values obtained from the UFZ LSER database website.The standard errors of each coefficient are presented in parentheses, indicating that the coefficients corresponding to the indicator variables used for log datasets   of dog, rabbit, hamster, guinea pig, sheep, and cat were found to be statistically insignificant. 
Consequently, the indicator variables for these species were eliminated, and a new regression analysis was performed.As a result, a new ASM equation (Eq.2) was formulated, incorporating four indicator variables.this equation.It is important to note that although the offsets between human and rat (0.16 log units), human and pig (0.22 log units), human and mouse (0.26 log units), and human and horse (0.34 log units) in the log data are relatively small, they should not be ignored.These   differences, although minor, could have potential implications and should be taken into consideration during data analysis and risk assessment related to human health.

4.
Query the MATLAB code to predict the log values based on the first-and second-  dimension retention times of nonpolar analytes detected on the GC×GC chromatogram.Users may also directly overlay the contours of these log values onto GC×GC chromatogram.  Section S2: Analysis and upgradation of ASMs for prediction of log values of
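Eq. 1 and Eq. 2 are not reproduced in this section, so the Python sketch below only illustrates the general indicator-variable technique described above: one 0/1 column per species, a single least-squares fit, and removal of indicators whose coefficients are not significant, followed by a refit. The descriptor columns, the synthetic data, the choice of the species without an indicator as the reference level, and the |t| ≥ 2 significance cutoff are all assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def build_design(descriptors, species, indicator_species):
    """Intercept + descriptor columns + one 0/1 column per species in
    indicator_species. Species without a column (the reference level)
    are 0 in every indicator column."""
    species = np.asarray(species)
    cols = [np.ones(len(species))]
    cols += list(descriptors.T)
    cols += [(species == s).astype(float) for s in indicator_species]
    return np.column_stack(cols)

def ols(X, y):
    """Ordinary least squares with classical coefficient standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    rmse = np.sqrt(np.mean(resid ** 2))
    return coef, se, rmse

def prune_species_indicators(descriptors, species, y, candidates, t_crit=2.0):
    """Fit with all candidate indicators, drop those with |coef/SE| < t_crit,
    then refit with the remaining indicators."""
    coef, se, _ = ols(build_design(descriptors, species, candidates), y)
    k = 1 + descriptors.shape[1]              # index of first indicator coefficient
    t = coef[k:] / se[k:]
    kept = [s for s, ti in zip(candidates, t) if abs(ti) >= t_crit]
    coef2, se2, rmse2 = ols(build_design(descriptors, species, kept), y)
    return kept, coef2, se2, rmse2

# --- Synthetic demo: the "dog" offset is zero by construction ---
rng = np.random.default_rng(0)
n = 400
species = rng.choice(["human", "rat", "pig", "dog"], size=n)
D = rng.uniform(0.0, 2.0, size=(n, 3))        # three made-up descriptor columns
offset = {"human": 0.0, "rat": 0.3, "pig": 0.5, "dog": 0.0}
y = (0.5 + D @ np.array([1.0, -0.5, 2.0])
     + np.array([offset[s] for s in species])
     + rng.normal(0.0, 0.05, n))
kept, coef, se, rmse = prune_species_indicators(D, species, y, ["rat", "pig", "dog"])
```

In this parameterization, each retained indicator coefficient is an offset from the reference species, and predicting for one species means setting its indicator to 1 and all others to 0, as in the horse example above.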

Section S3: Reevaluation of the Partition Model with Experimental Octanol-Water and Air-Water Data

Initially, we combined experimental and predicted values of the octanol-water and air-water partition coefficients into an extended dataset for training the air-blood partition model, to broaden its applicability, especially where experimental data are sparse. To examine the influence of this combination on the model, we repeated the analysis using only experimental values. The statistics obtained from this analysis are detailed below.
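The exact PM equation is not given in this section. The Python sketch below assumes, purely for illustration, a linear form log Kab = a + b·log Kow + c·log Kaw, and shows how the same fit can be run on the full (mixed experimental and estimated) dataset and on the experimental-only subset, reporting R² and RMSE for each. The data, the `is_experimental` flag, and the function name are hypothetical.

```python
import numpy as np

def fit_partition_model(log_kow, log_kaw, log_kab):
    """Least-squares fit of log Kab = a + b*log Kow + c*log Kaw (assumed form).
    Returns coefficients (a, b, c), R^2 and RMSE on the fitting data."""
    log_kow, log_kaw, log_kab = (np.asarray(v, dtype=float)
                                 for v in (log_kow, log_kaw, log_kab))
    X = np.column_stack([np.ones_like(log_kow), log_kow, log_kaw])
    coef, *_ = np.linalg.lstsq(X, log_kab, rcond=None)
    resid = log_kab - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((log_kab - log_kab.mean()) ** 2)
    rmse = np.sqrt(np.mean(resid ** 2))
    return coef, r2, rmse

# --- Synthetic demo (invented data, not the study's dataset) ---
rng = np.random.default_rng(42)
n = 300
log_kow = rng.uniform(-1.0, 6.0, n)
log_kaw = rng.uniform(-6.0, 2.0, n)
log_kab = 0.2 + 0.3 * log_kow + 0.9 * log_kaw + rng.normal(0.0, 0.3, n)
is_experimental = rng.random(n) < 0.6   # rows whose Kow and Kaw are both measured

mix_fit = fit_partition_model(log_kow, log_kaw, log_kab)
exp_fit = fit_partition_model(log_kow[is_experimental],
                              log_kaw[is_experimental],
                              log_kab[is_experimental])
for name, (c, r2, rmse) in [("mix.data", mix_fit), ("exp.data", exp_fit)]:
    print(f"{name}: coef={np.round(c, 2)}, R2={r2:.2f}, RMSE={rmse:.2f}")
```

Because the synthetic "estimated" values here carry no extra error, the two fits agree closely; with real data, differences between the two sets of statistics would reflect the influence of the estimated Kow and Kaw values.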

Figure S1: Kernel density estimation (KDE) plots comparing the distributions of the Abraham solute descriptors (E, S, A, B, V, and L) and of the octanol-water, air-water, and air-blood partition coefficients across two datasets: mix.data (blue) and exp.data (orange). The mix.data set contains experimental values, with missing log Kow and log Kaw entries filled by values estimated using the Abraham solvation model. In contrast, exp.data contains only experimental log Kow and log Kaw values.
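Figure S1 itself is not reproduced here. The Python sketch below shows, with synthetic numbers, how KDE curves of one variable can be computed for two datasets and compared. It uses a Gaussian kernel with Scott's rule for the bandwidth (the default rule of `scipy.stats.gaussian_kde`) and an overlap coefficient (the area under the minimum of the two densities) as a simple similarity summary. The overlap metric and all data are our illustrative additions, not part of the original analysis.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Gaussian kernel density estimate evaluated on `grid`, bandwidth from
    Scott's rule (h = sample std * n**(-1/5) in one dimension)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    h = samples.std(ddof=1) * n ** (-1.0 / 5.0)
    z = (np.asarray(grid, dtype=float)[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))

# --- Synthetic demo: stand-ins for one variable in mix.data and exp.data ---
rng = np.random.default_rng(7)
mix_values = rng.normal(2.0, 1.5, 300)   # e.g. log Kow, mixed dataset
exp_values = rng.normal(2.3, 1.3, 180)   # e.g. log Kow, experimental subset

grid = np.linspace(-10.0, 15.0, 2001)
dx = grid[1] - grid[0]
density_mix = gaussian_kde_1d(mix_values, grid)
density_exp = gaussian_kde_1d(exp_values, grid)

# Overlap coefficient: 1 means identical densities, 0 means disjoint support
overlap = np.sum(np.minimum(density_mix, density_exp)) * dx
print(f"overlap = {overlap:.2f}")
```

Plotting `density_mix` and `density_exp` against `grid` gives curves of the kind shown in each panel of Figure S1.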