Spacial Score—A Comprehensive Topological Indicator for Small-Molecule Complexity

The fraction of sp3-hybridized carbons (Fsp3) and the fraction of stereogenic carbons (FCstereo) are two widely employed scores of molecular complexity with strong links to biologically relevant features. However, they do not comprehensively express molecular topology, and they often do not match the chemical intuition of complexity. We propose the spacial score (SPS) as an empirical scoring system that builds upon the principle underlying Fsp3 and FCstereo and expresses the spacial complexity of a compound in a uniform manner on a highly granular scale. The size-normalized SPS (nSPS) can differentiate distributions of natural products and synthetic compounds and is applicable in the analysis of biological activity data. Analysis of the ChEMBL database revealed general trends of increasing selectivity and potency with increasing nSPS. SPS can also be used advantageously in planning and analysis of synthesis programs for direct comparison of chemical transformations and intermediates in reaction sequences.


Figure S1. A) Examples of compounds with considerable topological complexity and FCstereo equal to zero. B) Examples of compounds with considerable topological complexity and both Fsp3 and FCstereo equal to zero. C) Examples of simple compounds with the maximum possible Fsp3 score of one.

Figure S2. Histogram of FCstereo scores for the deduplicated combined compound collection from the DrugBank, Enamine, Dark Chemical Matter and ChEMBL natural product data sets. 44% of all the considered molecules have an FCstereo score equal to 0.0.

Figure S3. Relationship between nSPS and the scores for the Allu and Oprea index (SMCM). The SMCM scores were calculated based on the SMCM interpretation by Voršilák and Svozil. The plot is based on data for 12000 representative compounds selected in equal proportions from the DrugBank, Enamine, Dark Chemical Matter and ChEMBL natural product data sets. The calculated Pearson correlation coefficient is 0.46.
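For readers who want to reproduce a correlation of this kind on their own score tables, a minimal sketch of the Pearson coefficient is given below. The function name `pearson_r` and the toy input values are illustrative assumptions; they are not the data or the code used to obtain the value reported in Figure S3.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy values only (assumed, not taken from the paper).
example_nsps = [10.0, 12.5, 15.0, 20.0]
example_smcm = [30.0, 28.0, 41.0, 45.0]
r = pearson_r(example_nsps, example_smcm)
```

In practice the two input lists would hold the nSPS and SMCM scores of the same compounds in the same order.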

Figure S4. Examples of compounds with different values of nSPS: 0-1st, 5th and 25th percentile from the DrugBank, Enamine, Dark Chemical Matter and ChEMBL natural product data sets.

Figure S5. Examples of compounds with different values of nSPS: 50th, 75th and 99th percentile from the DrugBank, Enamine, Dark Chemical Matter and ChEMBL natural product data sets.

Figure S6. A) Relationship between nSPS and Fsp3. B) Relationship between nSPS and FCstereo. Both plots are based on data for 12000 representative compounds selected in equal proportions from the DrugBank, Enamine, Dark Chemical Matter and ChEMBL natural product data sets.

Figure S7. A) Proportions of high, medium and low activity in ChEMBL assays for compounds at three ranges of Fsp3 scores. B) Proportions of high, medium and low activity in ChEMBL assays for compounds at three ranges of FCstereo scores.

Figure S8. Relationship between nSPS and pChEMBL values and between nSPS and molecular weight, where compounds are grouped into bins according to their nSPS values. Average pChEMBL and molecular weight values are calculated for each bin, where each bin contains at least ten compounds. The bins in the top panel correspond to the bins in the bottom panel, and they represent the same molecules.
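The binning described for Figure S8 can be sketched as follows. The caption does not state how the bin edges were chosen, so the fixed bin width, the dropping of bins with fewer than ten compounds, and the name `binned_means` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def binned_means(nsps_values, y_values, bin_width=1.0, min_count=10):
    """Average y per nSPS bin, keeping only bins with >= min_count compounds.

    Hypothetical helper: fixed-width bins are an assumption, since the
    caption only states that each bin holds at least ten compounds.
    Returns a dict mapping the lower edge of each bin to the mean of y.
    """
    bins = defaultdict(list)
    for score, y in zip(nsps_values, y_values):
        # Index of the fixed-width bin that contains this nSPS value.
        bins[math.floor(score / bin_width)].append(y)
    return {
        index * bin_width: sum(members) / len(members)
        for index, members in sorted(bins.items())
        if len(members) >= min_count
    }


# Toy data (assumed): ten compounds in one bin, three in a sparse bin.
toy_nsps = [10.2] * 10 + [11.5] * 3
toy_pchembl = [6.0] * 10 + [9.0] * 3
averages = binned_means(toy_nsps, toy_pchembl)
```

Calling the same function once with pChEMBL values and once with molecular weights, using identical nSPS input, yields two panels whose bins contain the same molecules.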

Figure S9. A) Proportions of number of targets for compounds at three ranges of Fsp3 scores, based on the ChEMBL data. B) Proportions of number of targets for compounds at three ranges of FCstereo scores, based on the ChEMBL data.

Figure S10. A) Average Fsp3 for FDA-approved drugs over time. B) Average FCstereo for FDA-approved drugs over time. C) Average nSPS for FDA-approved drugs over time. In each panel, the shaded area shows ±1 standard deviation.

Figure S12. A) Proportions of high, medium and low activity in ChEMBL assays for compounds at three ranges of size-normalised Böttcher complexity scores (Cm/HA). B) Proportions of number of targets for compounds at three ranges of Cm/HA scores, based on the ChEMBL data. The following classification criteria were applied: low compound complexity (25th percentile): Cm/HA ≤ 10.84; medium complexity: 10.84 < Cm/HA < 13.35; high complexity (75th percentile): Cm/HA ≥ 13.35.
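The three complexity classes in Figure S12 follow directly from the two thresholds given in the caption. A minimal sketch of that rule is shown below; the function and constant names are hypothetical, while the cut-off values are the ones stated above.

```python
# Thresholds for the size-normalised Boettcher score (Cm/HA), as stated
# in the caption of Figure S12. The constant names are illustrative.
LOW_MAX = 10.84   # 25th percentile: at or below this value is "low"
HIGH_MIN = 13.35  # 75th percentile: at or above this value is "high"


def complexity_class(cm_per_ha):
    """Assign a Cm/HA value to the low, medium or high complexity class."""
    if cm_per_ha <= LOW_MAX:
        return "low"
    if cm_per_ha >= HIGH_MIN:
        return "high"
    return "medium"


# Example values chosen to sit on and between the boundaries.
classes = [complexity_class(v) for v in (10.84, 12.0, 13.35)]
```

Both boundary values are inclusive for the outer classes, so only scores strictly between 10.84 and 13.35 count as medium.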

Figure S13. A) ROC plot for the ability of the size-normalised Böttcher complexity scores (Cm/HA) to discriminate between compounds with high and low-to-moderate potency in ChEMBL assays. An AUC of 0.5 indicates no discriminatory ability (dashed diagonal line). Based on the analysed data, the size-normalised Böttcher complexity score has little or no application as a classifier with respect to compound potency. B) ROC plot for the ability of Cm/HA to discriminate between promiscuous compounds (6 or more known binding targets) and more target-selective compounds (1-5 binding targets). An AUC of 0.5 indicates no discriminatory ability (dashed diagonal line). Based on the analysed data, the size-normalised Böttcher complexity score has no application as a classifier with respect to compound selectivity.
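The AUC values discussed for Figure S13 can be illustrated with a small, dependency-free sketch. It uses the pairwise (Mann-Whitney) formulation of the area under the ROC curve, which is one standard way to compute it and is not necessarily the procedure used for the figure. The names `roc_auc` and `is_promiscuous` and the toy values are assumptions; only the cut-off of 6 targets comes from the caption.

```python
def is_promiscuous(n_targets):
    """Label from the caption: 6 or more known binding targets."""
    return n_targets >= 6


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count as 0.5).
    """
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (assumed): complexity scores and target counts per compound.
toy_scores = [9.5, 11.0, 12.2, 14.1]
toy_targets = [8, 2, 7, 1]
toy_labels = [is_promiscuous(t) for t in toy_targets]
auc = roc_auc(toy_scores, toy_labels)
```

The pairwise loop is quadratic in the number of compounds, so for data sets of ChEMBL size a rank-based or library implementation would be the more practical choice.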