Network topology mapping of chemical compounds space

Tsekenis, Georgios; Cimini, Giulio; Kalafatis, Marinos; Giacometti, Achille; Gili, Tommaso; Caldarelli, Guido

doi:10.1038/s41598-024-54594-9

Article
Open access
Published: 04 March 2024

Scientific Reports volume 14, Article number: 5266 (2024)

Abstract

We define bipartite and monopartite relational networks of chemical elements and compounds using two different datasets of inorganic chemical and material compounds, and study their topology. We discover that the connectivity between elements and compounds is distributed exponentially for materials, and with a fat tail for chemicals. Compounds networks show similar distributions of degrees, and feature a highly-connected club due to oxygen. Chemical compounds networks appear more modular than material ones, while the communities detected reveal different dominant elements specific to the topology. We successfully reproduce the connectivity of the empirical chemicals and materials networks by using a family of fitness models, where the fitness values are derived from the abundances of the elements in the aggregate compound data. Our results pave the way towards a relational network-based understanding of the inherent complexity of the vast chemical knowledge atlas, and our methodology can be applied to other systems with the ingredient-composite structure.

Introduction

The space of chemical compounds comprises hundreds of thousands of different combinations of the over one hundred chemical elements. Such an ample volume was produced by employing several experimental and computational techniques developed for the study of Chemistry over the past centuries. Navigating the vast chemical space is a formidable task and has been the topic of previous research (e.g.^1,2). Motivated by the need to harness the burgeoning complexity of the ever-growing chemicals and materials fields, in this manuscript, we present a constitutive relational network study of inorganic chemistry and materials science, relying on the toolbox of complex networks theory^3,4.

In the past, chemical reaction networks have been presented for small numbers of reactants⁵, without addressing the overall complexity of the problem. Furthermore, in materials science, recent efforts have concentrated on faster and cheaper targeted engineering of materials, the so-called Materials Genome project^6,7. Such an approach customarily relies on aggregate statistics. However, incorporating meaningful relational networks can significantly improve the inferential power of statistical approaches, such as materials cartography⁸. One network approach has been based on the representation of materials phase diagrams^9,10. A different approach was to analyze a set of materials as a network, according to the cross-correlation of the electronic density of states¹¹. Unfortunately, these methods produce fully, or almost fully, connected graphs where all substances are related, which is not very different from an aggregate approach.

Here, we construct and study element-compound networks of extensive catalogues/libraries of chemicals and materials. Furthermore, we successfully model these networks with versatile fitness models derived from maximum entropy methodology. That way, we set large bodies of knowledge onto a new frame of reference, providing novel points of view and enabling further future utility.

Networks from data

We construct relational networks from two different datasets that we sample from two separate databases. The first, CRC, is based on inorganic chemical compounds¹², and the second, AFLOW, is based on inorganic material compounds¹³ (see “Methods”). Each of the datasets contains $n_C$ compounds and $n_E$ elements; the specific values are shown in Table 1.

Table 1 Information about the datasets used to build all the networks. $n_C$ and $n_E$ are the number of compounds and elements. L is the number of bipartite links, while $L_C$ and $L_E$ are the number of monopartite links for compounds and elements, respectively.

We build a bipartite network for each dataset by linking every compound c to the elements e it contains (Fig. 1a). For each dataset, the resulting bipartite network is composed of two layers: one consisting of the compounds c and the other of the elements e, and is characterized by an $n_C \times n_E$ binary bi-adjacency matrix B linking the two layers^{14,15,16,17,18}, where the matrix element $B_{ce}=1$ if c contains e and zero otherwise. For each bipartite network, the total number of links, L, is given by the sum of all B matrix elements: $L=\sum _{c,e}{B_{ce}}$. The degree of a node is the sum of its incident links: $d_c = \sum _e B_{ce}$ and $d_e = \sum _c B_{ce}$ for compounds and elements, respectively.
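The construction above can be sketched in a few lines of Python. This is only an illustration: the four compounds are a hypothetical toy sample (not drawn from CRC or AFLOW), and the helper name `build_biadjacency` is introduced here for convenience.

```python
# Toy compounds (hypothetical sample, not taken from CRC or AFLOW).
compounds = {
    "H2O": {"H", "O"},
    "CO2": {"C", "O"},
    "NaCl": {"Na", "Cl"},
    "NaOH": {"Na", "O", "H"},
}

def build_biadjacency(compounds):
    """Return (compound names, element symbols, B) with B[c][e] = 1 if c contains e."""
    names = sorted(compounds)
    elements = sorted(set().union(*compounds.values()))
    B = [[1 if e in compounds[c] else 0 for e in elements] for c in names]
    return names, elements, B

names, elements, B = build_biadjacency(compounds)

# Total number of bipartite links: sum of all entries of B.
L = sum(sum(row) for row in B)
# Compound degrees d_c: row sums of B.
d_c = {c: sum(row) for c, row in zip(names, B)}
# Element degrees d_e: column sums of B.
d_e = {e: sum(row[j] for row in B) for j, e in enumerate(elements)}
```

Both sets of degrees sum to L, since every link has exactly one end in each layer.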

The degree distributions for both layers and both datasets are shown in the four left panels of Fig. 2(a,b,e,f). The degrees ($d_c$) of the compounds layer are discrete since each compound is linked to as many distinct elements as it contains. Their overall distribution appears modulated by a Gaussian-like curve. The degree ($d_e$) distributions of the elements appear dominated for larger values by a fat tail for the CRC network, and by an exponential decay for the AFLOW network, indicating a different inherent complexity of inorganic chemicals vs materials. Oxygen is the most connected element, corresponding to the maximum degree, a feature confirmed also by other analyses as we shall see below.

We further consider the relationships between compounds or between elements, by projecting the bipartite network on either layer to get the corresponding monopartite network. In the compounds network (Fig. 1b), the nodes are the compounds, and a pair of compounds is linked if they share a common element. In the elements network (Fig. 1c), the nodes are the elements, with links between elements that co-participate in a compound. The adjacency matrices of the binary monopartite networks, $A_C$ and $A_E$, are obtained from the binary bi-adjacency matrix B (with no self-loops, i.e. for $c \ne c'$ and $e \ne e'$): $(A_C)_{cc'}=1$ if $\sum _e B_{ce}B_{c'e}>0$, $(A_E)_{ee'}=1$ if $\sum _c B_{ce}B_{ce'}>0$, and zero otherwise. Summing all entries of the adjacency matrices gives twice the number of links in the monopartite networks: $2L_C = \sum _{c,c'} (A_C)_{cc'}$ and $2L_E = \sum _{e,e'} (A_E)_{ee'}$. The degrees of compounds and elements in the monopartite networks are respectively given by $k_c = \sum _{c'} (A_C)_{cc'}$ and $k_e = \sum _{e'} (A_E)_{ee'}$.
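As a minimal sketch of the two projections, the snippet below derives $A_C$ and $A_E$ from a small bi-adjacency matrix. The matrix is a hypothetical toy example, the function name `project` is introduced here, and self-loops are excluded by assumption so that the sums of the adjacency matrices equal $2L_C$ and $2L_E$.

```python
def project(B):
    """Binary one-mode projections of a bi-adjacency matrix B (list of rows)."""
    n_c, n_e = len(B), len(B[0])
    # Compounds c, c2 are linked if they share at least one element.
    A_C = [[int(c != c2 and any(B[c][e] and B[c2][e] for e in range(n_e)))
            for c2 in range(n_c)] for c in range(n_c)]
    # Elements e, e2 are linked if they co-occur in at least one compound.
    A_E = [[int(e != e2 and any(B[c][e] and B[c][e2] for c in range(n_c)))
            for e2 in range(n_e)] for e in range(n_e)]
    return A_C, A_E

# Toy bi-adjacency matrix: rows are compounds, columns are elements (H, O, Na, Cl).
B = [
    [1, 1, 0, 0],  # H2O
    [0, 0, 1, 1],  # NaCl
    [1, 1, 1, 0],  # NaOH
]
A_C, A_E = project(B)

# Each undirected link appears twice in a symmetric adjacency matrix.
L_C = sum(map(sum, A_C)) // 2
L_E = sum(map(sum, A_E)) // 2
# Monopartite degrees k_c and k_e: row sums.
k_c = [sum(row) for row in A_C]
k_e = [sum(row) for row in A_E]
```

In this toy case NaOH bridges the other two compounds, and Na is the only element linked to all others.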

The degree distributions of the monopartite compounds ($k_c$) and elements ($k_e$) networks are shown in the four right panels of Fig. 2(c,d,g,h) for both CRC and AFLOW. A striking feature of the degree distributions for compounds is that they appear to be composed of two main modes. Further investigation reveals that all the compounds in the high-degree bump contain oxygen. We denote with a vertical black dotted line (upper panels) the smallest degree of a compound that contains oxygen. Correspondingly, in the elements networks oxygen has the maximum, or nearly maximum, degree, which we denote with a vertical black dotted line (lower panels). In both datasets we discover that compounds containing oxygen form an oxygen club. This is a maximally interconnected community composed of compounds with a large degree. The oxygen club is a result of oxygen’s prominence in the inorganic chemistry and materials science datasets, as well as the rules of the network. This particular feature is almost impossible to capture with a specific model and needs to be addressed by further analysis of the structure of the datasets. The shape of the degree distributions of the elements networks appears to have an exponential body for the CRC network and a linear decay for the AFLOW network.

Fitness models

We model these networks by assuming that there is a hidden underlying process where all the elements compete for prominence based on an unknown intrinsic fitness. We discover that the abundance, $a_e$, of each element e, which is simply the element occurrence in all compounds of a dataset shown in Fig. 1d, is an excellent quantity to consider as element fitness. Similarly, we find that the fitness of a compound c can be represented well by the number of element species, $\ell _c$, it contains. We model the bipartite networks by using normalized fitness values

$$\begin{aligned} x^b_{e} = \frac{ a_{e} }{ \sum _{e'} a_{e'} }, \quad y^b_{c} = \frac{ \ell _{c} }{ \sum _{c'} \ell _{c'} } \end{aligned}$$

(1)

as effective parameters of a maximum-entropy fitness model^{15,19,20,21,22,23,24}. Specifically, in our model, each pair of nodes from the two different layers (i.e., an element e and a compound c) is connected according to a linking probability with a Fermionic form

$$\begin{aligned} f(\delta ,x^b_e,y^b_c) = \frac{ \delta x^b_e y^b_c }{1 + \delta x^b_e y^b_c} \end{aligned}$$

(2)

where $\delta$ is a single tuning parameter for each network. The best fitting value $\delta ^{*}$ is extracted by matching the number of links, L, of the real network with that of the model:

$$\begin{aligned} L = \sum _{e,c} f(\delta ^{*},x^b_e,y^b_c) \end{aligned}$$

(3)

Using $\delta ^{*}$ from Eq. (3) and the normalized fitness $\{x^b_e\}$, $\{y^b_c\}$, we calculate the expected model degrees, ${\tilde{d}}_c = \sum _{e} f(\delta ^*,x^b_e,y^b_c)$ and $\tilde{d}_e = \sum _{c} f(\delta ^*,x^b_e,y^b_c)$.
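A minimal numerical sketch of Eqs. (1)–(3) follows. It is not the authors' code: the abundances and compound lengths are hypothetical toy values, the function names are introduced here, and plain bisection is used as one simple way to solve Eq. (3), which works because the expected number of links increases monotonically with $\delta$.

```python
def link_prob(delta, x, y):
    """Linking probability of Eq. (2)."""
    z = delta * x * y
    return z / (1.0 + z)

def expected_links(delta, xs, ys):
    """Right-hand side of Eq. (3): expected number of bipartite links."""
    return sum(link_prob(delta, x, y) for x in xs for y in ys)

def calibrate_delta(L, xs, ys, tol=1e-10):
    """Solve Eq. (3) for delta by bisection (the sum is increasing in delta)."""
    if not 0 < L < len(xs) * len(ys):
        raise ValueError("L must lie strictly between 0 and n_E * n_C")
    lo, hi = 0.0, 1.0
    while expected_links(hi, xs, ys) < L:   # grow the bracket until it holds the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_links(mid, xs, ys) < L:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical toy inputs: element abundances a_e and compound lengths ell_c.
abundance = [3, 2, 2, 1, 1]
length = [2, 2, 2, 3]
L = 9  # number of links in the toy bipartite network

# Normalized fitness values, Eq. (1).
x = [a / sum(abundance) for a in abundance]
y = [l / sum(length) for l in length]

delta_star = calibrate_delta(L, x, y)

# Expected model degrees for elements and compounds.
d_e_model = [sum(link_prob(delta_star, xe, yc) for yc in y) for xe in x]
d_c_model = [sum(link_prob(delta_star, xe, yc) for xe in x) for yc in y]
```

By construction the model degrees of each layer sum to L, and nodes with larger fitness receive larger expected degree.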

We remark that Eq. (2) derives from an entropy maximisation procedure with degree constraints, where the fitness values replace the unspecified Lagrange multipliers^23,25,26. Hence our modelling is an effective maximum-entropy procedure informed by a heuristic fitness ansatz, where the fitnesses of the nodes generate the model degrees. The alternative route, which we do not follow here, would be to find the values of the multipliers such that the expected degrees match the empirical values, through e.g. likelihood maximization.
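For readers less familiar with this step, the standard argument can be sketched as follows. The multipliers $\alpha _c$, $\beta _e$ and the symmetric split of $\delta$ between the two layers are notation introduced here for illustration and do not appear in the original text. Constraining the expected degrees of both layers gives a graph probability that factorizes over compound-element pairs,

$$\begin{aligned} P(B) \propto \exp \Big ( -\sum _c \alpha _c d_c - \sum _e \beta _e d_e \Big ) = \prod _{c,e} e^{-(\alpha _c + \beta _e) B_{ce}}, \end{aligned}$$

so that each link is an independent Bernoulli variable with probability

$$\begin{aligned} p_{ce} = \frac{ e^{-(\alpha _c + \beta _e)} }{ 1 + e^{-(\alpha _c + \beta _e)} }. \end{aligned}$$

Setting $e^{-\alpha _c} = \sqrt{\delta }\, y^b_c$ and $e^{-\beta _e} = \sqrt{\delta }\, x^b_e$ then recovers the form of Eq. (2), with the single parameter $\delta$ left to be fixed by Eq. (3).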

For the monopartite networks, we follow a similar approach. We use abundance for the fitnesses of the elements, while for the compounds we sum up the abundances of the elements they contain, $a_{c} = \sum _{e \in c} a_{e}$. Hence

$$\begin{aligned} x^m_e=\frac{ a_{e} }{ \sum _{e'} a_{e'} },\ \ y^m_{c} = \frac{ a_{c}}{ \sum _{c'} a_{c'} }. \end{aligned}$$

(4)

Links between nodes in the two monopartite networks are computed with a linking function similar to the previous one. We have, for elements and compounds networks respectively

$$\begin{aligned} f(\delta _E,x^m_e,x^m_{e'}) = \frac{ \delta _E x^m_e x^m_{e'} }{1 + \delta _E x^m_e x^m_{e'}}, \qquad f(\delta _C,y^m_c,y^m_{c'}) = \frac{ \delta _C y^m_c y^m_{c'} }{1 + \delta _C y^m_c y^m_{c'}}. \end{aligned}$$

(5)

$\delta _E$ and $\delta _C$ are still free parameters for each network, whose values $\delta _E^{*}$ and $\delta _C^{*}$ are determined using the number of links in the empirical networks:

$$\begin{aligned} 2L_E = \sum _{e,e'} f(\delta _E^{*},x^{m}_e,x^{m}_{e'}), \qquad 2L_C = \sum _{c,c'} f(\delta _C^{*},y^{m}_c,y^{m}_{c'}) \end{aligned}$$

(6)

Again, we calculate the expected degrees, ${\tilde{k}}_e$, $\tilde{k}_c$, using $\delta ^{*}_E$, $\delta ^{*}_C$ from Eq. (6), and the normalized fitness $\{x^m_e\}$, $\{y^m_c\}$, respectively, using ${\tilde{k}}_e = \sum _{e'} f(\delta _E^{*},x^{m}_e,x^{m}_{e'})$ and ${\tilde{k}}_c = \sum _{c'} f(\delta _C^{*},y^{m}_c,y^{m}_{c'})$.
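The monopartite calibration can be sketched along the same lines. The snippet is illustrative only: the three compounds are hypothetical toy data, the function names are introduced here, and the sums in Eq. (6) are assumed to run over pairs of distinct nodes, so the constraint is imposed on unordered pairs ($L$ rather than $2L$).

```python
from itertools import combinations

def link_prob(delta, u, v):
    """Linking probability of Eq. (5) for two nodes with fitnesses u and v."""
    z = delta * u * v
    return z / (1.0 + z)

def calibrate(n_links, fitness, tol=1e-10):
    """Bisection for delta such that the expected link count equals n_links.

    Assumption: self-pairs are excluded, so the expectation runs over
    unordered pairs of distinct nodes.
    """
    pairs = list(combinations(fitness, 2))
    if not 0 < n_links < len(pairs):
        raise ValueError("need 0 < n_links < number of node pairs")

    def expected(delta):
        return sum(link_prob(delta, u, v) for u, v in pairs)

    lo, hi = 0.0, 1.0
    while expected(hi) < n_links:   # grow the bracket until it holds the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if expected(mid) < n_links:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expected_degrees(delta, fitness):
    """Expected degree of each node: sum of link probabilities to all other nodes."""
    return [sum(link_prob(delta, u, v) for j, v in enumerate(fitness) if j != i)
            for i, u in enumerate(fitness)]

# Toy data (hypothetical): three compounds over four elements.
compounds = {"H2O": ["H", "O"], "NaCl": ["Na", "Cl"], "NaOH": ["Na", "O", "H"]}

# Element abundance a_e: number of compounds containing each element.
a_e = {}
for elems in compounds.values():
    for e in elems:
        a_e[e] = a_e.get(e, 0) + 1
# Compound fitness a_c: sum of the abundances of its elements.
a_c = {c: sum(a_e[e] for e in elems) for c, elems in compounds.items()}

x_m = [a / sum(a_e.values()) for a in a_e.values()]   # Eq. (4), elements
y_m = [a / sum(a_c.values()) for a in a_c.values()]   # Eq. (4), compounds

L_E, L_C = 4, 2   # link counts of the toy projections
delta_E = calibrate(L_E, x_m)
delta_C = calibrate(L_C, y_m)
k_e_model = expected_degrees(delta_E, x_m)
k_c_model = expected_degrees(delta_C, y_m)
```

With this convention the expected degrees of each projection sum to twice its number of links, mirroring the empirical identity.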

We find very good or exceptional agreement between the real networks and the fitness models regarding the degrees in all cases, as shown in Fig. 2. The higher-order network measure of the degree assortativity exhibits stronger fluctuations but is still captured on average as we report in the SI and Figs. S3 and S4.

Community analysis

We further analyze the community structure emerging from this way of exploring the chemical space. We use the Louvain greedy algorithm²⁷, a method based on the maximization of the modularity Q (a quantity related to how many links tend to connect nodes within communities rather than nodes belonging to different communities^{28,29,30,31,32}). We identify between 3 and 5 communities in AFLOW, with $Q \approx 0.11$, and between 5 and 7 communities in CRC, with a larger $Q \approx 0.25$, as shown in Fig. 3. The small variability of the results depends on the initialization of the algorithm; below we discuss only the findings that are robust across multiple runs of the algorithm.
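To make the objective concrete, the sketch below evaluates the Newman-Girvan modularity Q of a given partition on a small hypothetical graph (two triangles joined by one link). It does not implement the Louvain search itself, only the quantity that the search maximizes; the function name and the example graph are introduced here.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs without duplicates or self-loops.
    community: dict mapping each node to a community label.
    """
    m = 0
    internal = defaultdict(int)    # links with both ends inside the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        m += 1
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Q = sum over communities of (fraction of internal links)
    #     minus (expected fraction under degree-preserving rewiring).
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles (0,1,2) and (3,4,5) joined by the single link (2,3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "all" for n in range(6)}

Q_split = modularity(edges, two_groups)    # equals 5/14
Q_trivial = modularity(edges, one_group)   # equals 0
```

Splitting the graph into its two triangles gives a positive Q, while the trivial single-community partition gives zero, which is why Q can be used to rank candidate partitions.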

As expected, there is a community of compounds of large degree, which has the highest abundance of oxygen ($Z=8$) (Fig. 3a,d). The oxygen community has a high overlap with the oxygen club, but they are not identical (Fig. 3b,e). The rest of the communities are centered around other, not necessarily prominent, elements (Fig. 3c,f). More specifically, in the CRC network, the second largest community is dominated by hydrogen ($Z=1$), and the third by fluorine ($Z=9$). We notice that the three most prominent elements in the CRC dataset overall, O, H, C ($Z=6$) (Fig. 1d), and the most prominent elements of the three largest communities, are all light elements (first row of their groups in the periodic table) and are highly reactive. In the AFLOW network there are communities that contain most of the oxygen ($Z=8$), sulfur ($Z=16$) and silicon ($Z=14$). We notice that the two most prominent elements in the AFLOW dataset, O and S (Fig. 1d), have their own communities, and are the first two elements of the original group VI, or the newer group 16, of the periodic table, whose members are collectively called chalcogens.

Discussion

In summary, we developed a simple but fundamentally effective way to delve into the hidden complexity of large, aggregate, chemical datasets, and reveal their higher-order correlations. We discovered that the connectivity of elements to compounds follows a heterogeneous distribution with different kinds of tails: a fat one for the CRC network and an exponential one for the AFLOW network. We traced this significant difference to the corresponding distributions of elemental abundance in the CRC and AFLOW datasets, as shown in Fig. 1e,f. The connectivity analysis also revealed the special role of oxygen in the networks, as we found that it dominates all orders of correlation in both the inorganic AFLOW and CRC datasets (Fig. 1). Therefore, we revealed that oxygen holds a prominent position in the complexity of inorganic chemistry, beyond simply being the most common element³³ (oxygen has also been found to play a central role in biochemical networks and the complexity of life³⁴). A further community analysis we performed revealed chemical knowledge of purely topological origin. The largest communities in the CRC compounds network are dominated by light, highly reactive elements. The picture is starkly different for AFLOW, where the most prevalent elements are somewhat heavier and less reactive. The AFLOW compounds network is less modular, comprising fewer communities, as compared to the CRC. All of the results presented in this Report are obtained thanks to the network methodology we developed, and cannot be derived from aggregate analyses.

In addition, we were able to formalize our findings through a maximum entropy network approach. Our fitness models were tailored for the bipartite network and its monopartite projections, employing a single fitted parameter and novel fitness values that are external to the network. Our analysis is able to quantify self-consistently both networks of CRC and AFLOW, and successfully reproduces their statistically different connectivity. The parsimonious modelling methodology we developed can be applied to any bipartite network, or to a pair of complementary networks, such as article-author networks³⁵, recommendation networks³⁶, disease phenome-genome³⁷, countries-products^15,38,39, food ingredients-flavors⁴⁰, social networks^14,16, ecological networks¹⁷, biological and medical networks^18,41,42, and so on.

Network science can benefit chemistry and materials science by reorganizing its extensive body of knowledge through complex networks. Analyzing and modeling chemistry networks allows us to systematize intrinsic behaviors and emergent or occluded patterns into quantitative relations. Such informed chemical/material graph atlases can accelerate decisions on “synthesizability”, and minimize costs for intelligent design of novel composites with desired properties. This can be done by utilizing graph algorithms and network methods to complete tasks that are computationally overwhelming or demanding when approached from raw data, first principles, experiments, or traditional cheminformatics². Specifically, the network connectivity properties that we study here describe the relation among existing substances, and can inform searches for alternative or novel ones.

Our approach involves large networks of substances, different from approaches that perform learning with neural networks on individual, modestly-sized, molecular graphs^43,44. In its current form, it takes advantage solely of the chemical composition of substances, but it can be systematically expanded to include more material properties as node variables, such as crystal structure, thermodynamic quantities, or mechanical properties. It can utilize more sophisticated measures for weighted linking, e.g. the number of atoms in common, or quantify the similarity of nodes with cosine/dot-product, for further gains.
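As an illustration of the weighted linking mentioned above, the sketch below computes a cosine similarity between compounds. Treating each compound as a binary vector over elements is an assumption made here for simplicity (stoichiometric counts could be used instead), and the compounds and function name are hypothetical.

```python
from math import sqrt

def cosine_similarity(c1, c2):
    """Cosine similarity of two compounds viewed as binary element vectors."""
    c1, c2 = set(c1), set(c2)
    if not c1 or not c2:
        return 0.0
    # Dot product of binary vectors = number of shared elements;
    # norm of a binary vector = sqrt(number of elements).
    return len(c1 & c2) / sqrt(len(c1) * len(c2))

# Toy compounds (hypothetical sample).
compounds = {"H2O": {"H", "O"}, "NaCl": {"Na", "Cl"}, "NaOH": {"Na", "O", "H"}}
names = sorted(compounds)

# Weighted links between every unordered pair of compounds.
weights = {(a, b): cosine_similarity(compounds[a], compounds[b])
           for i, a in enumerate(names) for b in names[i + 1:]}
```

A weight of zero corresponds to an absent link in the binary projection, while positive weights grade the existing links by how much of their composition two compounds share.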

Methods

Datasets creation from databases

From the AFLOW database¹³ we downloaded all the compounds that also belong to the ICSD catalogue (similarly to¹¹), and have a value entry in the following eleven properties: composition, species, density, volume atom, pressure, valence cell iupac, spin atom, scintillation attenuation length, energy atom, enthalpy atom, eentropy atom (electronic entropy).

We utilized the entire database of Physical Constants of Inorganic Compounds of the CRC Handbook of Chemistry and Physics Online 102nd Edition (2021), which is part of CHEMnetBASE¹².

Both databases may reflect the biases of their creators, historical trends in chemistry, and/or the research interests, needs, and abilities of the scientific and engineering community. The AFLOW database comprises solids, while the CRC database contains compounds in all phases at standard conditions, complicating property annotation. The only implicit constraint we imposed on the AFLOW database was that the materials be sufficiently well analysed/studied. We presume that the CRC database was compiled with a similar criterion of most commonly used substances. Our results are robust for our datasets, since the global shapes of the distributions are preserved when we randomly sub-sample our datasets. As the databases expand and research becomes gradually more systematic, we foresee that the potential benefit from our network analysis would also expand past the space limited by current research.

Graph link density

The link density of the networks of elements is an increasing function of the size of the compounds datasets considered, while the density of the networks of compounds is independent of such size. This is due to the fact that for the elements network, the total number of nodes (i.e. elements) is constant, $n_E \sim O(100)$, while the number of links between elements increases as more compounds are analyzed. For the compounds network, the length of compounds, i.e. the number of element species, is nearly constant, $1 \le \ell _c \le 8$, and as more compounds are added, both the number of nodes (i.e. compounds) and links increase. This results in a constant link density, which is roughly $\sim 0.20$ for the AFLOW and $\sim 0.42$ for the CRC compounds networks (see SI, Fig. S1).
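The densities discussed here can be computed as the fraction of possible undirected links that are present, $2L/[n(n-1)]$. The sketch below uses a hypothetical three-compound sample with a fixed universe of four elements; it only illustrates how the elements-network density grows as compounds are added and is far too small to show the plateau of the compounds-network density.

```python
from itertools import combinations

def density(n_nodes, n_links):
    """Fraction of the n(n-1)/2 possible undirected links that are present."""
    return 0.0 if n_nodes < 2 else 2.0 * n_links / (n_nodes * (n_nodes - 1))

def element_links(compounds):
    """Distinct element pairs that co-occur in at least one compound."""
    return {frozenset(p) for c in compounds for p in combinations(sorted(c), 2)}

def compound_links(compounds):
    """Number of compound pairs sharing at least one element."""
    return sum(1 for a, b in combinations(compounds, 2) if a & b)

# Toy data (hypothetical): compounds added one at a time, fixed element universe.
compounds = [{"H", "O"}, {"Na", "Cl"}, {"Na", "O", "H"}]
n_E = 4

# Elements-network density after 1, 2 and 3 compounds have been included.
rho_E = [density(n_E, len(element_links(compounds[:n])))
         for n in range(1, len(compounds) + 1)]
# Compounds-network density for the full toy sample.
rho_C = density(len(compounds), compound_links(compounds))
```

Because the number of element nodes is fixed, each new compound can only add element links, so the elements-network density never decreases.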

Data availability

The CRC dataset can be obtained from the table of Physical Constants of Inorganic Compounds of the CRC Handbook of Chemistry and Physics Online 102nd Edition (2021), which is part of CHEMnetBASE¹², at https://hbcp.chemnetbase.com. The AFLOW dataset can be obtained from the AFLOW library of crystallographic prototypes¹³, at http://www.aflowlib.org.

References

Kirkpatrick, P. & Ellis, C. Chemical space. Nature 432, 823 (2004).
Leach, A. R. & Gillet, V. J. An Introduction to Chemoinformatics (Springer, 2007).
Caldarelli, G. Scale-Free Networks (Oxford University Press, 2007).
Barabási, A.-L. The network takeover. Nat. Phys. 8, 14–16. https://doi.org/10.1038/nphys2188 (2012).
Unsleber, J. P. & Reiher, M. The exploration of chemical reaction networks. Annu. Rev. Phys. Chem. 71, 121–142. https://doi.org/10.1146/annurev-physchem-071119-040123 (2020).
Jain, A. et al. Commentary: The materials project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Batra, R., Song, L. & Ramprasad, R. Emerging materials intelligence ecosystems propelled by machine learning. Nat. Rev. Mater. 6, 655–678. https://doi.org/10.1038/s41578-020-00255-y (2021).
Isayev, O. et al. Materials cartography: Representing and mining materials space using structural and electronic fingerprints. Chem. Mater. 27, 735–743. https://doi.org/10.1021/cm503507h (2015).
Aykol, M. et al. Network analysis of synthesizable materials discovery. Nat. Commun. 10, 1–7. https://doi.org/10.1038/s41467-019-10030-5 (2019).
Hegde, V. I., Aykol, M., Kirklin, S. & Wolverton, C. The phase stability network of all inorganic materials. Sci. Adv. 6, eaay5606 (2020).
Veremyev, A. et al. Networks of materials: Construction and structural analysis. AIChE Journal 67, e17051 (2021).
CRC Handbook of Chemistry and Physics, Physical Constants of Inorganic Compounds (CRC Press, Taylor & Francis Group, an Informa Group company, 2021). https://hbcp.chemnetbase.com/faces/contents/ContentsSearch.xhtml.
AFLOW Library of Crystallographic Prototypes. http://www.aflowlib.org/.
Holme, P., Liljeros, F., Edling, C. R. & Kim, B. J. Network bipartivity. Phys. Rev. E 68, 056107. https://doi.org/10.1103/PhysRevE.68.056107 (2003).
Saracco, F., Clemente, R. D., Gabrielli, A. & Squartini, T. Randomizing bipartite networks: The case of the world trade web. Sci. Rep. 5, 1–18 (2015).
Faust, K. Centrality in affiliation networks. Soc. Netw. 19, 157–191 (1997).
Ings, T. C. et al. Review: Ecological networks—beyond food webs. J. Anim. Ecol. 78, 253–269. https://doi.org/10.1111/j.1365-2656.2008.01460.x (2009).
Pavlopoulos, G. A. et al. Bipartite graphs in systems biology and medicine: A survey of methods and applications. GigaScience 7, giy014. https://doi.org/10.1093/gigascience/giy014 (2018).
Caldarelli, G., Capocci, A., De Los Rios, P. & Muñoz, M. A. Scale-free networks from varying vertex intrinsic fitness. Phys. Rev. Lett. 89, 258702. https://doi.org/10.1103/PhysRevLett.89.258702 (2002).
Garlaschelli, D. & Loffredo, M. I. Fitness-dependent topological properties of the world trade web. Phys. Rev. Lett. 93, 188701. https://doi.org/10.1103/PhysRevLett.93.188701 (2004).
Cimini, G., Squartini, T., Garlaschelli, D. & Gabrielli, A. Systemic risk analysis on reconstructed economic and financial networks. Sci. Rep. 5, 15758 (2015).
Squartini, T. et al. Enhanced capital-asset pricing model for the reconstruction of bipartite financial networks. Phys. Rev. E 96, 032315. https://doi.org/10.1103/PhysRevE.96.032315 (2017).
Cimini, G. et al. The statistical physics of real-world networks. Nat. Rev. Phys. 1, 58–71 (2019).
Cimini, G., Carra, A., Didomenicantonio, L. & Zaccaria, A. Meta-validation of bipartite network projections. Commun. Phys. 5, 1–12 (2022).
Park, J. & Newman, M. E. Origin of degree correlations in the internet and other networks. Phys. Rev. E 68, 026112. https://doi.org/10.1103/PhysRevE.68.026112 (2003).
Park, J. & Newman, M. E. Statistical mechanics of networks. Phys. Rev. E Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top. 70, 13. https://doi.org/10.1103/PhysRevE.70.066117 (2004).
Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
Newman, M. E. & Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113. https://doi.org/10.1103/PhysRevE.69.026113 (2004).
Newman, M. E. Modularity and community structure in networks. In Proceedings of the National Academy of Sciences, Vol. 103 8577–8582. https://doi.org/10.1073/pnas.0601602103 (2006).
Good, B. H., Montjoye, Y. A. D. & Clauset, A. Performance of modularity maximization in practical contexts. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 81, 046106. https://doi.org/10.1103/PhysRevE.81.046106 (2010).
Zhang, P. & Moore, C. Scalable detection of statistically significant communities and hierarchies, using message passing for modularity. In Proceedings of the National Academy of Sciences of the United States of America, Vol. 111 18144–18149. https://doi.org/10.1073/pnas.1409770111 (2014).
Bongiorno, C., London, A., Miccichè, S. & Mantegna, R. N. Core of communities in bipartite networks. Phys. Rev. E 96, 022321. https://doi.org/10.1103/PhysRevE.96.022321 (2017).
Wang, H. C., Botti, S. & Marques, M. A. Predicting stable crystalline compounds using chemical similarity. npj Comput. Mater. 7, 1–9 (2021).
Raymond, J. & Segrè, D. The effect of oxygen on biochemical networks and the evolution of complex life. Science 311, 1764–1767 (2006).
Newman, M. E. The structure of scientific collaboration networks. In Proceedings of the National Academy of Sciences, Vol. 98 404–409. https://doi.org/10.1073/pnas.98.2.404 (2001).
Zhou, T., Ren, J., Medo, M. & Zhang, Y. C. Bipartite network projection and personal recommendation. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 76, 046115. https://doi.org/10.1103/PhysRevE.76.046115 (2007).
Goh, K. I. et al. The human disease network. In Proceedings of the National Academy of Sciences of the United States of America, Vol. 104 8685–8690. https://doi.org/10.1073/pnas.0701361104 (2007).
Hidalgo, C. A. & Hausmann, R. The building blocks of economic complexity. In Proceedings of the National Academy of Sciences of the United States of America, Vol. 106 10570–10575. https://doi.org/10.1073/pnas.0900943106 (2009).
Tacchella, A., Cristelli, M., Caldarelli, G., Gabrielli, A. & Pietronero, L. A new metrics for countries’ fitness and products’ complexity. Sci. Rep. 2, 1–7 (2012).
Ahn, Y. Y., Ahnert, S. E., Bagrow, J. P. & Barabási, A. L. Flavor network and the principles of food pairing. Sci. Rep. 1, 1–7 (2011).
Lee, D.-S. et al. The implications of human metabolic network topology for disease comorbidity. In Proceedings of the National Academy of Sciences, Vol. 105 9880–9885 (2008).
Zhou, X., Menche, J., Barabási, A.-L. & Sharma, A. Human symptoms-disease network. Nat. Commun. 5, 4212 (2014).
Duvenaud, D. K. et al. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems Vol. 28 (eds Cortes, C. et al.) (Curran Associates Inc., 2015).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, of Proceedings of Machine Learning Research Vol. 70 (eds Precup, D. & Teh, Y. W.) 1263–1272 (PMLR, 2017). https://proceedings.mlr.press/v70/gilmer17a.html.

Acknowledgements

We would like to thank Stefano Bonetti and Roberta Sinatra for useful discussions. This work has been partially supported by the Grant EU ‘HumanE-AI-Net’, No. 952026, and by the Italian Ministry of Foreign Affairs and International Cooperation (“Mac2Mic”), and by MIUR PRIN-COFIN2022 Grant 2022JWAF7Y.

Author information

Authors and Affiliations

Institute for Complex Systems, National Research Council, Rome, Italy
Georgios Tsekenis & Guido Caldarelli
Department of Molecular Sciences and Nanosystems (DMSN), “Ca’ Foscari” University of Venice, Venice, Italy
Georgios Tsekenis, Achille Giacometti & Guido Caldarelli
Physics Department and INFN, University of Rome Tor Vergata, Rome, Italy
Giulio Cimini
Department of Microbiology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Marinos Kalafatis
European Centre of Living Technologies (ECLT), “Ca’ Foscari” University of Venice, Venice, Italy
Achille Giacometti & Guido Caldarelli
Networks Unit, IMT School for Advanced Studies Lucca, 55100, Lucca, Italy
Tommaso Gili
Rara Foundation - Sustainable Materials and Technologies ETS, 30171, Venice, Italy
Guido Caldarelli

Contributions

G.T. and G.Ca. conceived the paper, designed and performed the research, and wrote the manuscript. G.T. conducted the analysis of empirical data and the numerical calculations. G.Ci., M.K., A.G. and T.G. contributed to research, data, and writing of the paper.

Corresponding author

Correspondence to Georgios Tsekenis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Tsekenis, G., Cimini, G., Kalafatis, M. et al. Network topology mapping of chemical compounds space. Sci Rep 14, 5266 (2024). https://doi.org/10.1038/s41598-024-54594-9

Received: 04 July 2023
Accepted: 14 February 2024
Published: 04 March 2024
DOI: https://doi.org/10.1038/s41598-024-54594-9
