MGOS: A Library for Molecular Geometry and its Operating System

The geometry of atomic arrangement underpins the structural understanding of molecules in many fields. However, no general framework of mathematical/computational theory for the geometry of atomic arrangement exists. Here we present "Molecular Geometry (MG)" as a theoretical framework accompanied by the "MG Operating System (MGOS)", which consists of callable functions implementing the MG theory. MG allows researchers to model complicated molecular structure problems in terms of elementary yet standard notions of volume, area, etc., and MGOS frees them from the hard and tedious task of developing/implementing geometric algorithms so that they can focus more on their primary research issues. MG facilitates simpler modeling of molecular structure problems; MGOS functions can be conveniently embedded in application programs for the efficient and accurate solution of geometric queries involving atomic arrangements. The use of MGOS in problems involving spherical entities is akin to the use of math libraries in general purpose programming languages in science and engineering.


Introduction
In physics, chemistry, and materials science, the properties of inorganic molecules result from the arrangement of their atoms [1][2][3]. In biology, the structure of biomolecules determines their function [4][5][6][7][8][9]. A molecule's properties and interactions with its environment depend on the geometrical arrangement of its atoms, and geometry has long been one of the key issues in the study of atomic arrangements. In physics and materials science, examples include the diffusion of lithium ions through paths closely correlated with geometric channels [1]; the porosity and surface area of metal-organic frameworks (MOFs) for hydrogen storage [2,3]; and water content regulation in polymer membranes through nanocracks which work as nanoscale valves [10], to name a few. In biology, classic examples are the shape complementarity of the double-helix structure of DNA [11,12], and the lock-and-key [13] and induced-fit [14] theories of small-molecule binding to proteins. There are many other examples: the linear relationship between hydrophobic energy and the loss of solvent accessible surface area [4]; the effect of voids on the solvation and hydration of proteins [6]; the channel structure of ion channels and pumps across cell membranes [7] and in the ribosome for protein synthesis [8]; ferritin as a protein nanocage for iron storage [9]; and the Connolly surface of proteins [5]. These examples show that accurate and efficient geometric computation is critical for understanding and designing molecules.
However, many studies to date on molecular geometry problems have mostly been based on Monte Carlo simulation, counting grid points, or approximations. For instance, molecular volume is commonly estimated by counting the numbers of random points or grid points contained in the molecule [15]; conversely, molecular voids are recognized by removing these grid points [16]. Another example is the imprecise estimation of solvent accessible surfaces [17], which is critical for solvation models used in the calculation of electrostatic energy. Fig. 1 shows the comparison between an analytic [18,19] and a grid-counting [16] method for computing molecular voids using a test data set consisting of 300 biomolecular structures from the Protein Data Bank (PDB [20]). See Appendix A for the 300 PDB codes. In Fig. 1(a), the horizontal axis denotes the size (i.e. the number of atoms) of each molecule of the test set and the vertical axis denotes the number of computed voids in the molecular boundary in which at least one water molecule can be placed. Water molecules are modeled as spherical probes of radius 1.4 Å. The red filled circles correspond to the output from the BetaVoid program [18], which implements an analytic method using the Voronoi diagram of three-dimensional spherical atoms. The other three types of marks denote the results computed by the VOIDOO program (http://xray.bmc.uu.se/usf/voidoo.html) [16] corresponding to the grid resolutions of 0.1, 0.5, and 1.0 Å. Fig. 1(b) is a zoom-in of the red rectangular box of Fig. 1(a). Note that VOIDOO finds fewer voids than BetaVoid does. Fig. 1(c) and (d) show the total volume of all the computed voids and Fig. 1(e) and (f) show the computation time taken by the programs. The following observations were made.
Compared to the correct solutions computed by the BetaVoid program, VOIDOO finds fewer voids (i.e., it misses many small voids) but significantly overestimates void volumes (despite missing many voids) while it takes significantly more computation time than BetaVoid. VOIDOO, at 0.1 Å grid-resolution, crashes on many moderately sized molecules due to memory shortage. This experiment clearly shows how an analytic approach compares with an inaccurate and inefficient approach using grid points. The experiment was performed on a personal computer with Intel Core i5-4670 CPU (3.4 GHz), 8 GB RAM, and Windows 7 Enterprise K (64 bit).
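The resolution dependence discussed above can be reproduced on a toy case. The following is a minimal sketch, not the BetaVoid or VOIDOO algorithm: it compares the closed-form volume of the union of two overlapping spheres with a grid-counting estimate of the same volume. The names Sphere, analytic_union_volume, and grid_union_volume are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Sphere { double x, y, z, r; };  // hypothetical type for this sketch

const double kPi = 3.14159265358979323846;

// Exact volume of the union of two partially overlapping spheres:
// the two ball volumes minus the lens-shaped intersection.
double analytic_union_volume(const Sphere& a, const Sphere& b) {
    const double d = std::sqrt((a.x - b.x) * (a.x - b.x) +
                               (a.y - b.y) * (a.y - b.y) +
                               (a.z - b.z) * (a.z - b.z));
    // Precondition: the two surfaces intersect in a circle.
    assert(d > std::fabs(a.r - b.r) && d < a.r + b.r);
    const double s = a.r + b.r - d;
    const double lens = kPi * s * s *
        (d * d + 2.0 * d * (a.r + b.r) - 3.0 * (a.r - b.r) * (a.r - b.r)) /
        (12.0 * d);
    return 4.0 / 3.0 * kPi * (a.r * a.r * a.r + b.r * b.r * b.r) - lens;
}

// Grid-counting estimate: count the cubic cells of edge h whose centers
// lie inside at least one sphere, then multiply by the cell volume.
double grid_union_volume(const std::vector<Sphere>& spheres, double h) {
    assert(!spheres.empty() && h > 0.0);
    double lo[3] = { 1e300,  1e300,  1e300};
    double hi[3] = {-1e300, -1e300, -1e300};
    for (const Sphere& s : spheres) {            // axis-aligned bounding box
        const double c[3] = {s.x, s.y, s.z};
        for (int k = 0; k < 3; ++k) {
            lo[k] = std::min(lo[k], c[k] - s.r);
            hi[k] = std::max(hi[k], c[k] + s.r);
        }
    }
    long n[3];
    for (int k = 0; k < 3; ++k)
        n[k] = static_cast<long>(std::ceil((hi[k] - lo[k]) / h));
    long inside = 0;
    for (long i = 0; i < n[0]; ++i)
        for (long j = 0; j < n[1]; ++j)
            for (long k = 0; k < n[2]; ++k) {
                const double px = lo[0] + (i + 0.5) * h;
                const double py = lo[1] + (j + 0.5) * h;
                const double pz = lo[2] + (k + 0.5) * h;
                for (const Sphere& s : spheres) {
                    const double dx = px - s.x, dy = py - s.y, dz = pz - s.z;
                    if (dx * dx + dy * dy + dz * dz <= s.r * s.r) {
                        ++inside;
                        break;                   // count each cell once
                    }
                }
            }
    return inside * h * h * h;
}
```

Calling grid_union_volume with several values of h on the same pair of spheres shows the estimate drifting with the resolution, while analytic_union_volume has no such parameter. The cost of the grid method also grows with the cube of 1/h, which is consistent with the memory and time behavior reported for the 0.1 Å resolution.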
The use of such resolution-dependent approaches is common despite their unreliable, inconsistent, and sometimes conflicting results [21]. We observe that VOIDOO is still popular in diverse disciplines [22][23][24][25][26][27][28][29][30][31][32] and studies of grid-based algorithms continue [33]. An overarching analytical theory is absent because individual researchers have focused on problem-specific, local aspects of geometry problems, concentrating on isolated issues such as surfaces, voids, channels, volumes, areas, and so on. With so many independently developed methods, it has been hard to build a general computational framework for accurately and efficiently solving all these types of geometrical problems.
Here we introduce "Molecular Geometry (MG)" as a general framework of mathematical/computational methods for solving molecular structure problems in geometry-priority approaches, and describe the "MG Operating System (MGOS)", which is a library of callable C++ routines for implementing the MG approach in analytical methods. The proposed analytical methods are based on the Voronoi diagram of three-dimensional spheres [34], the quasi-triangulation [35,36], and the beta-complex [37]. The MG/MGOS method has three primary advantages: application independence, researcher productivity, and solution correctness/accuracy. In other words, equipped with MG/MGOS, researchers from diverse disciplines can conveniently and easily build computational models to solve molecular geometry problems and quickly obtain correct (or accurate) solutions. Section 2 briefly reviews the evolution of the geometry concepts applied to atomic arrangements for materials and biomolecules. Section 3 introduces Molecular Geometry as a new computational discipline for studying atomic arrangements. Section 4 introduces the Molecular Geometry Operating System as a tool for implementing MG. Section 5 presents two example molecular geometry problems solved by MGOS. Section 6 presents the application-neutral architecture of MGOS. Section 7 concludes.

How the geometry concept has evolved in the molecular world
Johannes Kepler's treatise The Six-cornered Snowflake in 1611 and Robert Hooke's book Micrographia in 1665 might be the earliest observations of crystallization as a sphere packing process. In Cristallographie in 1783, Rome de L'Isle gave geometry and chemical composition equal importance in characterizing mineral properties and found "the law of the constancy of interfacial angles", which became the foundation of crystallography. Before the advent of X-ray crystallography, crystals were primarily studied from a geometry perspective. In 1805, John Dalton introduced the concept of the spherical atom as the indivisible unit of matter and in 1874, Le Bel and Van't Hoff independently introduced the concept of tetrahedrally coordinated carbon atoms [38,39]. This became the foundation of modern stereochemistry, which is the basis of the study of molecular structures [40]. Understanding steric effects (i.e. each atom occupies a certain amount of space) is the basis of the stereochemistry of atoms and provides a geometric understanding of the molecular world. The coordination number of an atom, defined by Werner in 1893, is still a commonly used geometric measure of atomic arrangement.
In 1940, Sidgwick and Powell proposed that molecular structure is determined by the electron pairs in the valence shell [41,42]. This idea was developed in 1957 by Gillespie and Nyholm [43] into what is now known as the valence shell electron pair repulsion (VSEPR) model, the name proposed in 1963 [44], which has been used for predicting molecular structure using the Pauli Exclusion Principle, but without solving any explicit equation. VSEPR is one of the simplest and most successful models of molecular structure [44,45], and remains popular. VSEPR can be viewed as a geometric approach to understanding the molecular world.
Molecular biology is the molecular world where geometry has arguably received the most attention. In 1890, Emil Fischer proposed the well-known lock-and-key theory to explain the interactions between biomolecules. This is an excellent example of modeling biomolecular phenomena through geometry [13,46]. In 1953, the year that the double-helix structure of DNA was discovered, Francis Crick suggested the idea of a computational approach to the binding between two small molecules through their surfaces [47]. Crick posited that shape complementarity in the helical coiled coil could be modeled as knobs fitting into holes. This could be the first proposal of explicitly using geometry to understand molecular phenomena, and became the basis of molecular docking. In 1958, Koshland extended the lock-and-key theory to propose the induced-fit theory [14,48,49].
The first determination of the three-dimensional structure of a protein was performed by John Kendrew and Perutz in 1960 [50] when they solved the structure of myoglobin. Since then, protein structure determination has become almost routine work; and the PDB contains 152,500 biomolecular structures as of June 8, 2019 [20]. Given atomic arrangement databases, such as the PDB, geometry analysis becomes one of the most important research topics for researchers. Cavities in biomolecules are fundamental for function, stability, dynamics, ligand binding, etc. The first computational study of cavities in proteins was reported by Lee and Richards in 1971 [51]. Chothia in 1974 found that the hydrophobic energies in proteins are directly related to the solvent accessible surface area of both polar and non-polar groups, and reported the linear relationship between the hydrophobic energy of proteins and the loss of solvent accessible surface area during folding [4,52]. This demonstrates that the atoms in folded globular proteins tend to be tightly packed. Thus a large residue volume, and consequently a low overall density, suggests the model of the protein is a poor one and, conversely, a small volume, and high density, suggests it is more likely to be a good one [52]. A protein's interior is closely packed, with few cavities, so that no water molecules are trapped in non-polar cavities [52,53]. The dense packing is critical in stable folding, and residue volumes are directly related to packing energies and conformational entropies. The stable aggregation of secondary structures increases their interaction area to achieve a high hydrophobicity and results in an increased molecular density.
In the case of enzymes, which are globular proteins, the optimal way of minimizing the volume and the solvent accessible surface area while keeping a constant potential energy is to make the shape as spherical as possible with as few cavities as possible. Due to the potential energy constraint, the overlap between atoms is limited at a certain level. Therefore, this is a geometric optimization problem of packing spherical atoms in a spherical container of an appropriate size. However, certain geometric features need to be conserved for the molecule to maintain its function. For example, proteasomes require their channel structures for disassembling proteins, ribosomes need to conserve their channels for synthesizing proteins, while membrane proteins require channels for the passage of ions. Therefore, to minimize both volume and accessible surface area under the potential energy constraint, while preserving their crucial geometric features, the interior voids of these proteins must be somehow minimized. Hence, the accurate computation of voids in a molecular structure is important for assessing the structure. In this regard, the recognition of molecular cavities, such as channels and voids, the computation of their global properties, and understanding their topological structures are fundamental. As PDB data has been more frequently used, the importance of its quality has also increased. There are now a number of tools for assessing structural quality [54][55][56]. Fig. 2 shows the computational process of solving molecular problems. In Fig. 2(A), Mapping I depicts the traditional approach of going directly from a particular molecular problem M to its solution Sol(M). There are uncountably many molecular problems and each problem can have alternative mappings because its modeling is dependent on the nature of the study. This leads to uncountably many instances of Mapping I. 
Each mapping instance usually consists of nontrivial computational steps and almost always contains a geometry subproblem involving spherical objects, which in many cases are van der Waals atoms. Earlier studies [1][2][3][4][5][6][7][8][9]57] show this issue is real and highly common. Surprisingly, many seemingly easy geometry problems among spheres remain challenging, if not computationally hard to solve, because of a lack of a suitable mathematical/computational framework. Therefore, researchers often spend a significant amount of time and effort, in the course of solving their geometry problems, developing and implementing their own algorithms. Furthermore, due to the complexity of the geometry problems, researchers usually employ Monte Carlo simulation, grid counting, or other approximate methods.

Molecular geometry: A new approach to study atomic arrangement
MG provides an alternative, orthogonal method to this traditional approach. It bypasses the time-consuming and error-prone Mapping I by taking the walk-around path consisting of Mappings II, III, and IV. First, the problem M is modeled as a geometry problem G involving spherical atoms (Mapping II). Then, G is solved via geometric theorems to give the solution Sol(G) (Mapping III), which is back-transformed to Ŝol(M) in the original molecular space (Mapping IV). The thesis is that Ŝol(M) ≈ Sol(M), possibly with some preconditions. The forward and backward transformations of Mappings II and IV are together called the geometrization, while the computational methods for Mapping III form the geometry kernel. The geometrization and geometry kernel together form the basis of the discipline MG (which is different from the earlier notion [42]).
Ŝol(M) is either close enough to, or a good approximation of, Sol(M) to allow a more intensive computational process such as a molecular dynamics (MD) simulation to be launched. As the walk-around path of Mappings II, III, and IV is computationally much cheaper than Mapping I, the path may be iterated as many times as necessary by refining the geometrization. If the criteria for the convergence of Ŝol(M) can be defined, the solution process can iterate, possibly without human intervention. Physicochemical and biological properties should be carefully reflected during the geometrization. Given a proper geometrization and a geometry kernel, the path might be automated to iterate if necessary. Fig. 2(B) depicts the significant reduction of both human effort and computational requirement by the MG approach. Fig. 2(C) through (H) illustrate how a docking simulation program can adopt MG/MGOS in its algorithm. Given a receptor (C) and a ligand (D) for docking, it is desirable to identify a pocket (E) on the receptor surface where the ligand might bind (Mapping II). Then, the conformation of the ligand within the pocket can be found by minimizing the distance between the atom sets of both the ligand and the pocket, where the distance is defined by a geometric measure that can be easily evaluated (F and G) (Mapping III) [58]. Multiple conformations can be found quickly. The ligand conformations can then be used as initial solutions for a global optimization procedure such as the genetic algorithm using a fitness function reflecting the physicochemical and biological measures (H) (Mapping IV). It turns out that the geometrical best-fit solutions using van der Waals radii for atoms are often sufficiently close to the global solution. Side-chain prediction [59] is another example.
The MG approach has two preconditions: a mathematically and computationally well-established geometry kernel and a physicochemically and biologically well-defined geometrization. The MGOS engine's geometry kernel is written in standard C++ and is based on the Voronoi diagram of three-dimensional spheres [34] and its two derivative constructs [37]. The geometrization is inevitably domain-dependent and is somewhat empirical. For example, different sets of atomic radii may be used for different problems [60,61]. The effective Born radius [62,63] may be most appropriate when using the generalized Born approximation of the Poisson-Boltzmann equation to account for the electrostatic contribution to solvation energy. In studying a potassium channel's recognition selectivity, its dependence is likely to be on ion radius rather than charge density [64]. The analysis of protein packing, protein recognition and ligand design [65], etc will be governed by the radii of different atomic groups. Previous studies [1][2][3][4][5][6][7][8][9] can be interpreted as efforts at applying different types of geometrization. A set of geometrization primitives and parameters for each and every application domain should be defined through theoretical studies, experiments, and collaborative thoughts.
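As a small illustration of why the geometrization is domain-dependent, the sketch below maps the same atoms to different sets of spherical balls depending on the chosen radius table and probe size. It is a hypothetical stand-in for Mapping II, not MGOS code: the types Atom and Ball and the function geometrize are assumptions made for this example, and the radius table uses the commonly tabulated Bondi van der Waals radii, which may differ from the radius sets of [60,61].

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical types for this sketch; they are not MGOS classes.
struct Atom { std::string element; double x, y, z; };
struct Ball { double x, y, z, r; };
using RadiusTable = std::map<std::string, double>;

// Bondi van der Waals radii (angstroms) for a few common elements.
// Other radius sets can be substituted without touching geometrize().
RadiusTable bondi_radii() {
    return {{"H", 1.20}, {"C", 1.70}, {"N", 1.55}, {"O", 1.52}, {"S", 1.80}};
}

// Turn atoms into balls. probe_radius = 0 gives the van der Waals model;
// probe_radius = 1.4 enlarges every atom by a water-sized probe, which
// corresponds to the solvent accessible (Lee-Richards) model.
std::vector<Ball> geometrize(const std::vector<Atom>& atoms,
                             const RadiusTable& radii,
                             double probe_radius) {
    assert(probe_radius >= 0.0);
    std::vector<Ball> balls;
    balls.reserve(atoms.size());
    for (const Atom& a : atoms) {
        const auto it = radii.find(a.element);
        assert(it != radii.end());  // every element needs a radius
        balls.push_back({a.x, a.y, a.z, it->second + probe_radius});
    }
    return balls;
}
```

Keeping the radius table as an argument means that a different application domain only changes the table and the probe size, while the geometric computation applied to the resulting balls stays the same.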

MGOS: The engine to implement MG
MGOS implements MG. The usefulness of MGOS is akin to that of a math library for general-purpose programming languages in science and engineering. Imagine the time and effort it would take a researcher, even with good programming skills, to code from scratch an algorithm for evaluating, say, sin(1.23) or √2, without a math library. Would the code be accurate and efficient enough? Any complicated scientific problem is likely to require calls to many such functions, so could one develop an effective program without such a math library?
MGOS consists of a set of natural-language-like application programming interface (API) functions, easily callable from application programs (see Appendix B for the list of current MGOS APIs), and efficiently provides a correct/accurate solution of geometric queries involving the arrangements of spherical objects, where the objects are frequently van der Waals atoms. For example, the compute_volume_and_area_of_van_der_Waals_model() command computes the volume of the space taken by the atoms (with the van der Waals radii) of a given molecule.
The name of the command is clear about its function. The compute_voids_of_Lee_Richards_model() command finds all interior voids where an a priori defined spherical probe can be placed (e.g. a sphere with 1.4 Å radius for water) and computes void properties. Computed voids can be further processed. For instance, the voids can be sorted according to volume or boundary area; the atoms whose boundaries contribute to each void can be reported; the segment of the atom boundary contributing to the void can be identified and its area computed, etc.
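The post-processing of computed voids mentioned above can be sketched as follows. The record type Void and both functions are hypothetical and only illustrate the kind of sorting and selection an application might do; the actual MGOS return types are listed in Appendix B and are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record for one void; not the MGOS data type.
struct Void {
    double volume;                        // cubic angstroms
    double boundary_area;                 // square angstroms
    std::vector<int> contributing_atoms;  // serial numbers of boundary atoms
};

// Sort voids from largest to smallest volume.
void sort_by_volume_descending(std::vector<Void>& voids) {
    std::sort(voids.begin(), voids.end(),
              [](const Void& a, const Void& b) { return a.volume > b.volume; });
}

// Return the void with the largest boundary area without reordering.
const Void& void_with_largest_area(const std::vector<Void>& voids) {
    assert(!voids.empty());
    return *std::max_element(voids.begin(), voids.end(),
                             [](const Void& a, const Void& b) {
                                 return a.boundary_area < b.boundary_area;
                             });
}
```

Swapping the comparison lambda is enough to rank the voids by any other computed property.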
An early attempt at a formal theory to investigate the geometry of atomic arrangement was based on the ordinary Voronoi diagram of points, originally used by Bernal and Finney in 1967 for analyzing liquid structure composed of monosized atoms [66]. Being the most compact representation of proximity among points, the ordinary Voronoi diagram, and its dual called the Delaunay triangulation, has proved the best method for solving spatial problems for points [67]. To extend the theory from points to polysized spheres, we use the Voronoi diagram of spheres [34], also called the additively-weighted Voronoi diagram, which correctly recognizes the Euclidean proximity between any pair of nearby spherical objects. Our Voronoi diagram of spheres, along with its derivative structures, the quasi-triangulation [35,36] and the beta-complex [37], provides a powerful computational platform that is mathematically rigorous, algorithmically correct, computationally efficient, physicochemically and biologically significant, and practically convenient for any geometry problem involving spherical atoms.
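The difference between the ordinary Voronoi diagram of points and the Voronoi diagram of spheres comes down to the distance used. A point belongs to the cell of the nearest center in the former, and to the cell of the sphere with the nearest surface (center distance minus radius, the additively weighted distance) in the latter. The brute-force sketch below only illustrates this definition on a handful of spheres; it is not how the diagram is computed in [34], and the names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Sphere { double x, y, z, r; };  // hypothetical type for this sketch

double center_distance(const Sphere& s, double px, double py, double pz) {
    return std::sqrt((px - s.x) * (px - s.x) + (py - s.y) * (py - s.y) +
                     (pz - s.z) * (pz - s.z));
}

// Owner of point p in the ordinary Voronoi diagram of the centers.
std::size_t nearest_by_center(const std::vector<Sphere>& spheres,
                              double px, double py, double pz) {
    assert(!spheres.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < spheres.size(); ++i)
        if (center_distance(spheres[i], px, py, pz) <
            center_distance(spheres[best], px, py, pz))
            best = i;
    return best;
}

// Owner of point p in the Voronoi diagram of spheres: the distance to a
// sphere is the distance to its center minus its radius.
std::size_t nearest_by_surface(const std::vector<Sphere>& spheres,
                               double px, double py, double pz) {
    assert(!spheres.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < spheres.size(); ++i)
        if (center_distance(spheres[i], px, py, pz) - spheres[i].r <
            center_distance(spheres[best], px, py, pz) - spheres[best].r)
            best = i;
    return best;
}
```

For a large sphere of radius 3 at the origin and a small sphere of radius 0.5 centered at (5, 0, 0), the point (3.5, 0, 0) is closer to the small sphere's center but closer to the large sphere's surface, so the two functions disagree. With equal radii the two always agree, which is why the point-based diagram was adequate for monosized atoms.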

Use cases
We show here how a few simple MGOS APIs can be used to easily compute otherwise difficult-to-compute geometric features such as voids, channels, water-exposed atoms, etc. of a protein consisting of many atoms. Fig. 3 shows a protein structure (PDB id: 1jd0) with more than 4000 atoms. We want to find the boundary atoms exposed to water molecules (modeled as spheres of 1.4 Å radius), and the buried atoms. Then, we want to find voids which can contain water molecules and any channel structures that allow the passage of water molecules. Fig. 3(A) shows the space-filling, or CPK, model of the protein structure. Observe that there is a tiny hole corresponding to a channel penetrating the structure. Fig. 3(B) shows the quasi-triangulation computed by the MGOS API commands in block B1. The command MG.preprocess() computes the Voronoi diagram of the input atoms and transforms it to the quasi-triangulation. Fig. 3(C) shows the beta-complex corresponding to water molecules (i.e. spherical probes with 1.4 Å radius). Fig. 3(D) and (D') show the atoms exposed to and buried from bulk water, respectively (computed by block B2). Hence, the union of the structures in Fig. 3(D) and (D') is the input structure in Fig. 3(A). Note that the challenging task of the correct and efficient computation of these structures can be easily and conveniently done by calling a few MGOS APIs. Fig. 3(E) shows the voids (green) that may host one or more water molecules (from a geometric point of view), where the molecule is displayed by a ball-and-stick model. The voids were computed by the program segment in B3. Fig. 3(F) shows the largest (by volume) of the recognized voids, and the atoms whose boundaries contribute to the boundary of this void. We call these atoms the contributing atoms.
If it is necessary to investigate whether a water molecule can indeed be placed in the void, the biochemical or biophysical properties of the surface segments of the void boundary can be further analyzed by computing the precise geometric information of the patches of atomic boundaries using MGOS APIs. In fact, the command compute_voids_of_Lee_Richards_model( WATER_SIZE ) finds all voids that may contain water molecule(s), computes the volume of each void, computes the boundary area of each void, finds the contributing atoms, computes the area of the contributing patch(es) of each contributing atom, etc. The program segment in B4 simply returns the contributing atom information already computed by the command above. Fig. 3(G) shows the channels that may allow a water molecule to move. Like the voids, the surface properties of these channels can be further investigated if necessary. These channels were computed by the program segment in B5. Fig. 3(H) and (I) show two different visualizations of the biggest channel with its contributing atoms and spine, respectively. This biggest channel is located by the program segment in B6. Refer to Supplementary Video 1 for the three-dimensional animation of this computational process.
[Figure caption fragment] ... Lee-Richards solvent accessible model. It is worth noting that without the MGOS engine, it is very difficult to correctly and efficiently find these sets because it is necessary to distinguish the atoms exposed to solvent from those that are buried.

Case II: Analysis of 100 atomic arrangements
... ferritin, a potassium channel, and a metal-organic framework. Program-Use-Case-II requires four pieces of input data: (i) a file containing PDB codes (Fig. 7(a)), (ii) the size of the solvent probe, (iii) the name of the output file to store computed results (Fig. 7(b)), and (iv) the PDB model files.
The program begins by including MGUtilityFunctions.h in addition to MolecularGeometry.h because the program also uses some utility functions related to file I/O. The command in line 7 opens a file, say FILE_IN, which contains the 100 PDB codes to use. The first line of FILE_IN contains the number of PDB models, here 100. Each of the following lines contains a PDB code as shown in Fig. 7(a). The next command get_the_number_of_PDB_files() returns "100" by referring to the first line of FILE_IN. Line 9 sets the size of the solvent probe from the command line invoking program execution. Line 10 opens a blank output file, say FILE_OUT, for the computed results.
The command write_column_names_of_output_file() writes the column names to the first line of FILE_OUT as shown in Fig. 7(b).
The code chunk in lines 12-28 processes each PDB model by computing geometric features, measuring the elapsed times, and writing the results to FILE_OUT (Fig. 7(b)). The code for MGUtilityFunctions used in Program-Use-Case-II is shown in Fig. 8. Fig. 9 shows the graphs produced with Microsoft Excel from the output file FILE_OUT for some computed results for the 100 PDB files. Fig. 9(a), (b), and (c) are the volumes, areas, and the numbers of voids, respectively. Fig. 9(d) shows the time for computing the Voronoi diagram and quasi-triangulation. Fig. 9(e) is the time for computing the volume and area, and Fig. 9(f) for the voids. Note that these graphs can be produced by a few clicks of the mouse button and column choices.
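The input and output file handling described for Case II can be sketched with standard C++ streams. This is only an illustration of the stated file layout (a count on the first line followed by one PDB code per line, and a header row of column names in the output); it is not the MGUtilityFunctions code of Fig. 8. The function names read_pdb_codes and write_tab_separated_row, and the use of tabs as the column separator, are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read a list in the FILE_IN layout: the first line holds the number of
// models, each following line holds one PDB code.
std::vector<std::string> read_pdb_codes(std::istream& in) {
    std::size_t count = 0;
    if (!(in >> count))
        throw std::runtime_error("missing model count on the first line");
    std::vector<std::string> codes;
    codes.reserve(count);
    std::string code;
    while (codes.size() < count && in >> code)
        codes.push_back(code);
    if (codes.size() != count)
        throw std::runtime_error("fewer PDB codes than announced");
    return codes;
}

// Write one row of a spreadsheet-friendly text file (assumed tab-separated).
void write_tab_separated_row(std::ostream& out,
                             const std::vector<std::string>& fields) {
    assert(!fields.empty());
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) out << '\t';
        out << fields[i];
    }
    out << '\n';
}
```

Both functions take stream references, so the same code works with std::ifstream and std::ofstream for real files and with string streams for testing. A plain delimited text file of this kind can be opened directly in a spreadsheet program to produce graphs such as those in Fig. 9.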

MGOS architecture
The architecture of MGOS has been carefully designed so that any future modifications will not require rewriting the existing code of an application program. If a molecular problem can be properly geometrized in terms of appropriate-sized spherical balls, the MG/MGOS framework can quickly provide the best possible solution. It is expected that the MGOS engine will evolve, with new functions to be added in the future. One area of interest is developing methods for the optimal design of molecules within the "operating system" concept, e.g. in terms of side chain conformations, so as to develop a program to help engineer proteins.
Software architecture: MGOS is middleware, connecting application programs with a low-level Geometric Library that performs geometric computations (Fig. 10). It is composed of a set of API functions callable from application programs; each is implemented by calls to the Geometric Library's functions, which are application-independent. In addition to geometric properties, MGOS also makes use of molecular properties such as force-fields, electrostatics, etc.
The Geometric Library, the application-independent low level library, performs geometric computations among spherical objects and is based on three closely related constructs: the Voronoi diagram of three-dimensional spheres, the quasi-triangulation, and the beta-complex.
Topology data structure: Fig. 11 shows the design of the fundamental data structure for topology in the MGOS library. Three types of Voronoi diagrams (i.e., the ordinary Voronoi diagram of points, the power diagram, and the Voronoi diagram of spherical balls) are all stored in the radial-edge data structure (REDS), which is appropriate for representing cell-structured non-manifold objects [68]. "REDS" in the figure is a data member of the Voronoi diagram itself, which is denoted by VoronoiDiagram. On the other hand, the dual structure is denoted by Triangulation and has three instances (i.e., the Delaunay triangulation, the regular triangulation, and the quasi-triangulation, which are the respective dual structures of the three types of Voronoi diagrams above) and is stored in the inter-world data structure (IWDS) [35]. "IWDS" in the figure is a data member of the triangulation itself, denoted by Triangulation.
The dual transformation is implemented between the two classes of REDS and IWDS. Thus the three dual transformations (i.e., the dual transformation between the ordinary Voronoi diagram of points and the Delaunay triangulation, that between the power diagram and the regular triangulation, and that between the Voronoi diagram of spheres and the quasi-triangulation) are all implemented through the transformation between REDS and IWDS. All three transformation instances are handled by a single transformation as they are all stored in the same topology data structure. Fig. 12 shows the details of REDS and IWDS. REDS in Fig. 12(a) stores the topology of the Voronoi diagram and has the class definitions of the topological entities of the Voronoi diagram: cells, faces, edges, and vertices, which are denoted by VD_Cell, VD_Face, VD_Edge, and VD_Vertex. Each cell points to the |F| faces which define its boundary and each face points to its two incident cells. Each vertex points to its four incident edges and each edge points to its two vertices. In the ordinary Voronoi diagram of points, or the power diagram, a face has only one loop of edges which defines the boundary of the face (thus called the outer-loop). In the Voronoi diagram VD of 3D spheres, however, a face may have one or more inner-loops in addition to the outer-loop, where each inner-loop corresponds to an edge-graph disconnected from the rest of the Voronoi diagram. This observation is reflected in the pointer from VD_Face to Loop. In the Voronoi diagram, an edge has three, and only three, incident faces. In REDS, each edge therefore has three replicas, called partial edges (PartialEdge), each of which participates in the loop of one incident face. The three partial edges are connected in a circular manner in the counterclockwise orientation around the directed VD_Edge and, in our implementation, each VD_Edge points to one of the partial edges.
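The pointer relationships described for REDS can be written down as a set of plain structs. The sketch below follows the entity names in the text (VD_Cell, VD_Face, VD_Edge, VD_Vertex, Loop, PartialEdge), but every member name and the traversal function incident_faces are assumptions made for illustration; the actual class definitions are those of Fig. 12(a).

```cpp
#include <cassert>
#include <vector>

struct VD_Cell; struct VD_Face; struct VD_Edge; struct VD_Vertex;
struct Loop; struct PartialEdge;

struct VD_Vertex {
    VD_Edge* edges[4] = {};            // four incident edges
};
struct VD_Edge {
    VD_Vertex* start = nullptr;        // the edge is directed: start -> end
    VD_Vertex* end = nullptr;
    PartialEdge* first_partial_edge = nullptr;  // one of its three replicas
};
struct PartialEdge {
    VD_Edge* edge = nullptr;           // the edge this replica stands for
    Loop* loop = nullptr;              // loop of the incident face it lies on
    PartialEdge* next_around_edge = nullptr;    // radial cycle around the edge
};
struct Loop {
    VD_Face* face = nullptr;
    bool is_outer = true;              // inner-loops occur only for spheres
    std::vector<PartialEdge*> partial_edges;
};
struct VD_Face {
    VD_Cell* cells[2] = {};            // two incident cells
    std::vector<Loop*> loops;          // one outer-loop plus any inner-loops
};
struct VD_Cell {
    std::vector<VD_Face*> faces;       // the |F| boundary faces
};

// Collect the faces incident to an edge by walking the radial cycle of
// its partial edges, starting from the one the edge points to.
std::vector<VD_Face*> incident_faces(const VD_Edge& e) {
    assert(e.first_partial_edge != nullptr);
    std::vector<VD_Face*> faces;
    const PartialEdge* pe = e.first_partial_edge;
    do {
        faces.push_back(pe->loop->face);
        pe = pe->next_around_edge;
    } while (pe != e.first_partial_edge);
    return faces;
}
```

Because each edge stores only one partial edge, queries such as "which faces meet at this edge" are answered by following the radial cycle, which visits exactly three partial edges in a valid Voronoi diagram.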
IWDS in Fig. 12(b) stores the topology of the quasi-triangulation and has the class definitions of its topological entities: cells, faces, edges, and vertices, which are denoted by QT_Cell, QT_Face, QT_Edge, and QT_Vertex. Note that the data structure is designed for the quasi-triangulation because the other two triangulations are its special cases. In the quasi-triangulation, a cell has four faces and each face has two incident cells; a face has three edges and an edge has a set of 1 + N_small-world pointers, where each of the N_small-world pointers indicates the entrance to a small-world. An edge has two vertices and a cell has four vertices. A vertex has a pointer to an incident edge and one to an incident cell.
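The IWDS relationships can be sketched in the same way. The struct names follow the text, while the member names are assumptions. In particular, the text does not say what kind of entity the 1 + N_small-world pointers of an edge refer to; the sketch assumes they point to faces, with the first entry leading into the big world and the remaining entries into the small worlds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct QT_Cell; struct QT_Face; struct QT_Edge; struct QT_Vertex;

struct QT_Vertex {
    QT_Edge* first_edge = nullptr;     // one incident edge
    QT_Cell* first_cell = nullptr;     // one incident cell
};
struct QT_Edge {
    QT_Vertex* vertices[2] = {};
    // Assumed layout: entry 0 leads into the big world, entries 1..N are
    // the entrances to the N small worlds attached to this edge.
    std::vector<QT_Face*> world_entrances;
};
struct QT_Face {
    QT_Cell* cells[2] = {};            // two incident cells
    QT_Edge* edges[3] = {};            // three bounding edges
};
struct QT_Cell {
    QT_Face* faces[4] = {};
    QT_Vertex* vertices[4] = {};
};

// Number of small worlds entered through an edge (N in "1 + N").
std::size_t number_of_small_worlds(const QT_Edge& e) {
    assert(!e.world_entrances.empty());
    return e.world_entrances.size() - 1;
}

// The cell on the other side of face f, as seen from cell c.
QT_Cell* neighbor_across(const QT_Face& f, const QT_Cell* c) {
    assert(f.cells[0] == c || f.cells[1] == c);
    return f.cells[0] == c ? f.cells[1] : f.cells[0];
}
```

In the Delaunay and regular triangulations no edge has a small world, so number_of_small_worlds would return zero for every edge, which matches the remark that those two triangulations are special cases of the quasi-triangulation.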

Conclusions
Despite the importance of the geometry of atomic arrangements in many fields, no general framework of mathematical/computational theory for the geometry of atomic arrangement exists. In this paper, we introduce "Molecular Geometry (MG)" as a theoretical framework and the "MG Operating System (MGOS)" as middleware to implement the MG theory.
We assert that MG/MGOS will free researchers from the time-consuming and error-prone tasks of developing and implementing highly sophisticated and complex algorithms of a geometrical nature for molecular structure studies so that they can focus more on fundamental research issues of their own. We anticipate that MG/MGOS will facilitate the enhancement of many popular programs and the development of many new programs from diverse communities of computational science and engineering working on the arrangement of spherical objects, including molecules.
The challenge remaining is how to identify the set of primitive transformations for geometrization so as to cover as diverse a range of applications and as accurate a set of solutions as possible. The extensions of MGOS to dynamic situations for moving atoms and to big models such as geometric cell models are also a challenge. We envision that MG and MGOS together will eventually establish a new paradigm for the computational study of atomic arrangements for both organic and inorganic molecules. MGOS is freely available at http://voronoi.hanyang.ac.kr/software/mgos/.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.