Simplifying computational workflows with the Multiscale Atomic Zeolite Simulation Environment (MAZE)

Zeolites, an important class of 3-dimensional nanoporous materials, have been widely explored for a variety of applications including gas storage, separations, and catalysis. As the properties of these aluminosilicate materials depend on a number of factors (e.g., framework topology, Si/Al ratio, extra-framework cations), detailed experiments (e.g., on catalytic properties or adsorption capacities) are often limited to only a handful of materials. Computational methods have played an important role in (1) providing molecular-level insights to rationalize experimental observations, and (2) screening large libraries of zeolites to identify promising candidates for experimental synthesis and validation. Different levels of theory and computational chemistry codes are necessary to describe the range of relevant phenomena such as adsorption (e.g., grand canonical Monte Carlo), diffusion (e.g., molecular dynamics), and chemical reactions (e.g., density functional theory). As a result, manipulating atomic structures, handling input files, and developing robust workflows become quite cumbersome. To mitigate these challenges, we describe the development of the Multiscale Atomic Zeolite Simulation Environment (MAZE), a Python package that simplifies zeolite-specific calculation workflows by providing a user-friendly interface for systematically manipulating zeolite structures.


Introduction
Zeolites are a broad class of silica-based nanoporous materials which are widely used for various industrial applications including gas separation and catalysis [1][2][3]. Transition metal (TM) exchanged zeolites combine the desirable characteristics of heterogeneous catalysts (high thermal stability and simpler separations) with those of enzymes (high selectivity and reactivity under mild conditions) [3] and have received considerable attention as catalysts for numerous reactions, such as NOx abatement [4] and methane valorization [5]. Computational modeling is often used to provide insights (e.g., thermodynamic stabilities [6], reaction barriers [7]) into the reaction mechanisms and properties of zeolites [8,9]. Given the various length- and time-scales associated with molecular processes (e.g., adsorption, diffusion, reaction), multiscale approaches that combine wave function theory, periodic density functional theory and classical force fields are often necessary [8]. While a number of broadly applicable software packages are available for performing these calculations [10], a software toolkit for zeolite-specific tasks would be valuable. In this work, we describe the design and capabilities of a new Python-based open-source software package: the Multiscale Atomic Zeolite Simulation Environment (MAZE).
The increasing availability of open-source software packages [11] that offer user-friendly interfaces has greatly simplified the process of performing computational chemistry calculations. For example, the Atomic Simulation Environment (ASE) provides interfaces to various computational chemistry codes (e.g., VASP [12], LAMMPS [13], GPAW [14]). ASE provides Python-based wrappers to the underlying quantum chemical simulation code and offers an intuitive application programming interface (API) for setting up, starting, and analyzing calculations [15,16]. By automating the cumbersome computational setups and subsequent data analysis, ASE simplifies the process of performing and analyzing complex calculations [15,17]. Furthermore, by allowing manipulation through Python scripts, rather than a GUI, these calculations become self-documenting, reproducible and easy to streamline into complex workflows [11,15].

Current limitations with tracking atoms within the ASE interface
Despite an active user community and continued developments within the ASE codebase [15], a few specific structural manipulation tasks are challenging to implement within ASE. Often a variety of structural manipulations (e.g., extracting and reinserting clusters, adding terminal H atoms) are necessary to address a zeolite-specific scientific question, and the current ASE interface is not well suited for "tracking" the resulting changes in the atom indices. This is illustrated using a simple example below.
In ASE, groups of atoms are represented in memory by Atoms Python objects. In an Atoms object, the properties of all of the atoms are stored in NumPy arrays. When a specific atom in an Atoms object is accessed using the __getitem__ method (e.g., my_atoms[index]), an Atom object is created, which has (among others) the attributes 'tag', 'position', 'symbol' and 'index'. The underlying data structure for storing the atoms' properties is highly efficient, since it does not require storing an individual Python object for each atom represented by an Atoms object. Unfortunately, it also means that the index of each atom can change with each addition or deletion. Fig. 1 shows the structures and atom indices for various glyoxal derivatives, demonstrating how additions and deletions alter the indices of individual atoms. The most pronounced difference is in the fourth structure, where the deletion of two atoms shifts indices 4 and 5 down to 2 and 3, respectively. Added atoms simply extend the arrays and are thus appended to the end. There is no inherent ordering in the indices; the order in which atoms are added to a structure affects the final order of atoms. If a substitution is made by changing the identity of a given atom, then the indices remain unchanged. The mutability of Atoms objects, and the resulting index shifting, introduces significant complexity in tracking the relationship and identity of atoms. This is a major bottleneck for developing workflows that include both periodic and cluster calculations with zeolites.
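The index shifting described above can be reproduced with a minimal sketch that uses a plain Python list as a stand-in for array-backed atom storage. This is not ASE code: the atom ordering, the helper names delete_atoms and index_shift_map, and the added atom are illustrative assumptions chosen only to mirror the 4→2 and 5→3 shift discussed for Fig. 1.

```python
# Toy stand-in for array-backed storage (not ASE): an atom is identified
# only by its position in the list. The ordering below is illustrative.
symbols = ["C", "C", "O", "O", "H", "H"]


def delete_atoms(symbols, indices):
    """Return a new list without the atoms at the given positions."""
    doomed = set(indices)
    return [s for i, s in enumerate(symbols) if i not in doomed]


def index_shift_map(n_atoms, deleted):
    """Map every surviving old index to its new index after a deletion."""
    doomed = set(deleted)
    survivors = [i for i in range(n_atoms) if i not in doomed]
    return {old: new for new, old in enumerate(survivors)}


# Deleting the atoms at positions 2 and 3 moves every later atom down by two.
after_delete = delete_atoms(symbols, [2, 3])
shift = index_shift_map(len(symbols), [2, 3])
print(after_delete)  # the two H atoms now sit at positions 2 and 3
print(shift)         # old index -> new index for the surviving atoms

# An addition extends the storage, so the new atom takes the highest index.
after_add = after_delete + ["F"]
print(len(after_add) - 1)  # index of the newly added atom
```

Without a record such as shift, nothing in the modified structure reveals which original atom a given index refers to; this is the bookkeeping that has to be done by hand.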

The MAZE solution
Recognizing the challenges outlined above, the MAZE package puts atom relationship tracking at the center of its design, while maintaining compatibility with all of ASE's existing features. As demonstrated in the following sections, it greatly simplifies zeolite workflows and reduces the difficulty of performing a series of structural manipulations.

Architecture overview
The MAZE project aims to include all of the functionality of the base ASE package while adding functionality related to the tracking of atoms. This is achieved through inheritance. A zeolite is a group of atoms, so it is appropriate to create a Zeolite class that inherits from ASE's Atoms class. The Zeolite class represents a zeolite and includes additional methods and properties for identifying the unique crystallographic sites. Inheritance ensures that the Zeolite class has all of the attributes and methods of the parent Atoms class, and polymorphism allows a Zeolite object to be used wherever an Atoms object is expected. Thus, all of ASE's methods and classes also work well with it.
The additional functionality of the Zeolite class is divided between two classes, the parent PerfectZeolite class and its subclass Zeolite. The PerfectZeolite class includes the functionality for building a Zeolite from a labeled CIF file and preserving the site labels. The methods included in the PerfectZeolite class are those related to site identification and serialization. In a group of zeolites there can be only one perfect zeolite, from which all the derivatives (e.g., Brønsted H versions, structures with adsorbates) are made. A simplified unified modeling language (UML) class diagram for the Zeolite and PerfectZeolite classes is presented in Fig. 2.
Users of the MAZE package will interact primarily with Zeolite objects. The main additional features of the Zeolite class versus the PerfectZeolite class are related to atom manipulation, such as adding atoms, deleting atoms, extracting clusters and capping clusters. By dividing the functionality between two classes, the attributes that make a Zeolite and those involved in structural manipulation can be separated, greatly simplifying the underlying code. The underpinning of the Zeolite functionality is an internal IndexMapper object, which tracks the relationship between the indices of the atoms in the zeolites derived from the same parent structure.

The IndexMapper class
Instances of the IndexMapper class are responsible for tracking the relationship between atom indices. A reference to an IndexMapper object is an attribute of each Zeolite object, and related Zeolite objects share the same IndexMapper. The IndexMapper does not directly encounter Atoms objects, but only works with their indices. The core data structure of the IndexMapper is the main_index, which consists of a collection of nested dictionaries.
The key of the outer dictionary is the unique id of each row of atoms in the object (Fig. 3). The inner dictionary maps each related zeolite to the index of that atom within it. The main_index is automatically updated whenever an atom manipulation operation is performed and does not require additional intervention from the user. The IndexMapper can be used to map directly between two related zeolites with the get_index function, yet its core benefit is that it enables structural manipulation functions such as cap_atoms and integrate.
Fig. 4. Comparison between MAZE and ASE code for generating a BEA zeolite structure with the silicon T2 sites replaced by aluminum atoms. The MAZE code uses the built-in make function to read the unmodified CIF file and store the mapping in the site_to_atom_indices dictionary. The longer ASE code requires a modified CIF file as input, and the element mapping to be manually defined. Both codes generate and visualize the same BEA T2 Si→Al structure.
Complementing this index mapper are the add_atoms and delete_atoms methods in the Zeolite class, which return a copy of the original Zeolite object with the applied modifications and append a new column to the IndexMapper's main index. This new column contains the indices of the newly created Zeolite object, and each row relates an atom to its counterparts in the other zeolites. If a Zeolite object is deleted, then the destructor will remove its corresponding entry from the IndexMapper, preventing the main index from being cluttered with deleted Zeolite object indices.
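The row-and-column bookkeeping of the main_index can be sketched with nested dictionaries. The class below is a hypothetical toy, not MAZE's IndexMapper: the names ToyIndexMapper, register_derivative, add_atoms and delete_name, as well as the signature given to get_index, are assumptions made for illustration.

```python
import itertools


class ToyIndexMapper:
    """Hypothetical stand-in for the index-mapping idea (not MAZE code)."""

    def __init__(self, parent_name, n_atoms):
        self._ids = itertools.count()
        # main_index: {row id: {structure name: atom index, or None if absent}}
        self.main_index = {next(self._ids): {parent_name: i} for i in range(n_atoms)}
        self.names = [parent_name]

    def register_derivative(self, source, new_name, old_to_new):
        """Append a column for a structure derived from `source`.

        `old_to_new` maps indices in `source` to indices in the new
        structure; atoms missing from it are treated as deleted.
        """
        for row in self.main_index.values():
            src_idx = row.get(source)
            row[new_name] = None if src_idx is None else old_to_new.get(src_idx)
        self.names.append(new_name)

    def add_atoms(self, name, new_indices):
        """Append rows for atoms that exist only in structure `name`."""
        for idx in new_indices:
            row = {n: None for n in self.names}
            row[name] = idx
            self.main_index[next(self._ids)] = row

    def get_index(self, sender, receiver, sender_index):
        """Translate an index in `sender` into the matching index in `receiver`."""
        for row in self.main_index.values():
            if row.get(sender) == sender_index:
                return row.get(receiver)
        return None

    def delete_name(self, name):
        """Drop a structure's column, as a destructor hook might do."""
        for row in self.main_index.values():
            row.pop(name, None)
        self.names.remove(name)


mapper = ToyIndexMapper("parent", 6)
# A derivative in which parent atoms 2 and 3 were deleted.
mapper.register_derivative("parent", "cluster", {0: 0, 1: 1, 4: 2, 5: 3})
mapper.add_atoms("cluster", [4])  # e.g. a capping atom with no parent counterpart
print(mapper.get_index("cluster", "parent", 2))
```

Because every structure is only a column of integers, the mapper never needs to hold the structures themselves, which matches the statement that the IndexMapper works only with indices.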

Illustrative examples
To demonstrate the capability of the MAZE code and assess its API, three distinct tasks were performed. These tasks include building a Zeolite object from a labeled International Zeolite Association (IZA) CIF file, adding and removing atoms, and a complete workflow involving removing a cluster, changing some of its atoms and reinserting it back into the original zeolite.

Building a zeolite from a CIF file
Zeolites often contain multiple distinct T-sites, each of which has unique chemistry arising from differences in the local atomic environment. Comprehensive zeolite screening studies require all unique T-sites to be systematically explored. Computational studies of this type start by downloading a CIF file from the IZA database, placing the file in the project folder and reading the CIF file into an Atoms object with the ase.io.read function. One challenge with this approach is that CIF files downloaded from the IZA database contain extra information about the identity of unique atoms, which is not preserved when the CIF file is loaded with ASE's read function. Thus, various "hacks" are needed to align the atoms in the Atoms object built from the CIF file with their labels. One hack used in our group involves changing a unique site from silicon to an unused element such as xenon and, once the Atoms object is loaded, reverting it back to silicon while noting the indices (see Fig. 4, right). This manual tagging mechanism is slow, opaque because the code is no longer self-documenting, and error prone, since it involves manually editing a critical data file.
The MAZE package significantly improves this process by introducing the make method. The make method takes a zeolite IZA code as input, looks for the corresponding CIF file, and, if it is not found, attempts to download the zeolite CIF file from the IZA database. After locating or downloading the correct CIF file, the make method builds a Zeolite from the IZA CIF file and stores the mapping between the indices and their identities in two of the Zeolite object's internal dictionaries. The identities of the sites can then be determined by using the get_site_type method or by accessing the dictionaries directly (Fig. 4).
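The value of keeping the site labels can be seen in a small sketch of the two lookup directions. This is an illustration under stated assumptions rather than MAZE's implementation: the label list and symbols are made-up values, not taken from a real CIF file, the name atom_index_to_site is hypothetical, and the get_site_type function here is a toy stand-in that only borrows the name of the method mentioned above.

```python
from collections import defaultdict

# Made-up per-atom site labels and symbols in file order (illustrative only;
# real values would come from the labeled IZA CIF file).
labels = ["T1", "T2", "O1", "T1", "O2", "T2"]
symbols = ["Si", "Si", "O", "Si", "O", "Si"]

# Forward and reverse lookups between site labels and atom indices.
site_to_atom_indices = defaultdict(list)
atom_index_to_site = {}
for index, label in enumerate(labels):
    site_to_atom_indices[label].append(index)
    atom_index_to_site[index] = label


def get_site_type(index):
    """Toy stand-in: return the site label of the atom at `index`."""
    return atom_index_to_site[index]


# Replace every T2 silicon with aluminum without editing the input file.
substituted = list(symbols)
for index in site_to_atom_indices["T2"]:
    substituted[index] = "Al"

print(dict(site_to_atom_indices))
print(get_site_type(3))
print(substituted)
```

With the labels carried alongside the indices, the substitution is a dictionary lookup, and no placeholder element has to be written into the CIF file.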

Structural manipulations
The Atoms class's structural manipulation features allow atoms to be added to and removed from the collection and the properties of individual atoms to be altered. The API by which these manipulations are performed is inspired by Python's list manipulation methods. Although familiar to Python users, these manipulations are inconsistent: some have side effects (e.g., the pop method), while others, such as the __add__ method, are side-effect free. In zeolite workflows, it is common for many derivatives of a single parent zeolite to be generated, and this is complicated by methods with side effects, due to the need for explicit copying prior to each modification.
In alignment with the goal of the MAZE project, new methods for atomic manipulation were designed, which do not mutate the underlying object and instead return a copy with the applied modifications. These methods (add_atoms and delete_atoms) simplify the computational workflows and also allow for method chaining, improving code readability. A list of the available methods for the ASE Atoms object and the MAZE Zeolite object is shown in Table S1.
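The difference between mutating and copy-returning methods can be sketched with a small immutable class. ToyStructure is hypothetical and is not MAZE's Zeolite class; it reuses the method names add_atoms and delete_atoms from the text, but the signatures shown are assumptions.

```python
class ToyStructure:
    """Immutable toy structure (hypothetical, not MAZE's Zeolite class)."""

    def __init__(self, symbols):
        self._symbols = tuple(symbols)

    @property
    def symbols(self):
        return self._symbols

    def add_atoms(self, new_symbols):
        """Return a new structure with the given atoms appended."""
        return ToyStructure(self._symbols + tuple(new_symbols))

    def delete_atoms(self, indices):
        """Return a new structure without the atoms at the given positions."""
        doomed = set(indices)
        return ToyStructure(s for i, s in enumerate(self._symbols) if i not in doomed)


parent = ToyStructure(["Si", "O", "Si", "O"])

# Each call returns a new object, so calls can be chained and the parent
# can be reused for further derivatives without an explicit copy.
derivative = parent.delete_atoms([0]).add_atoms(["Al", "H"])
print(parent.symbols)
print(derivative.symbols)

# By contrast, a list method with a side effect changes the original in place.
plain = ["Si", "O", "Si", "O"]
plain.pop(0)
print(plain)
```

Returning copies costs some memory per derivative, but it removes the need to remember which calls require a defensive copy first.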

Cluster extraction, atom capping and integration
The power of these additional structural manipulation features can be demonstrated by performing a complex workflow. A typical zeolite unit cell contains over one hundred atoms, but the region of chemical interest is frequently confined to the atoms adjacent to a few T-sites. To reduce the computational expense of quantum chemical calculations, the calculations are typically performed on a smaller subset of atoms adjacent to the active sites of interest. This subset of atoms is referred to as a cluster [18]. Capping atoms (usually hydrogens) are added to the terminal cluster atoms to make chemically meaningful structures. The optimal position for the capping atoms is based on the parent zeotype's structure. After the capped cluster's structure has been optimized, the cluster can be integrated back into the initial zeolite for further downstream analysis.
This workflow is extraordinarily difficult to perform with the ASE base package due to the challenge of tracking the relationship between atom indices during the extraction, manipulation, and reinsertion steps. The Zeolite class's built-in index mapper ensures that the relationship between atoms can easily be determined and forms the basis for the simple functions that perform this workflow. Fig. 5 shows a pictorial representation of the stages in the workflow along with the methods needed to perform the transformation from one stage to the next.
The overall workflow has six distinct structures bridged by functions which take the previous structure as an input and output the new structure. The cluster structures (B, C, D, E) have different indices than the BEA frameworks (A, F), yet the indices can easily be mapped to each other using the built-in IndexMapper's get_index method. Since the functions do not alter the zeolite to which they are applied, and instead return a new zeolite object, they can be chained together. The chained methods required to transform structure A into structure F are shown in Fig. 6.
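The extract, cap, modify and reintegrate sequence can be sketched on plain lists to show why a stored index mapping makes the final step straightforward. This is a toy under stated assumptions, not the MAZE workflow of Fig. 6: extract_cluster is a hypothetical helper, the integrate function only borrows its name from the text, and the symbols and indices are illustrative.

```python
def extract_cluster(parent, indices):
    """Return (cluster, cluster_to_parent) for the chosen parent indices."""
    ordered = sorted(indices)
    cluster = [parent[i] for i in ordered]
    cluster_to_parent = dict(enumerate(ordered))
    return cluster, cluster_to_parent


def integrate(parent, cluster, cluster_to_parent):
    """Return a copy of the parent with the mapped cluster atoms written back."""
    merged = list(parent)
    for cluster_index, parent_index in cluster_to_parent.items():
        merged[parent_index] = cluster[cluster_index]
    return merged


parent = ["Si", "O", "Si", "O", "Si", "O"]

# Extract a cluster; its atoms are renumbered from zero.
cluster, mapping = extract_cluster(parent, [2, 3, 4])

# Cap the cluster; the caps have no counterpart in the parent, so they
# receive no entry in the mapping.
capped = cluster + ["H", "H"]

# Modify the cluster: cluster index 0 corresponds to parent index 2.
modified = list(capped)
modified[0] = "Al"

# Reintegrate: only mapped atoms are written back, so the caps are dropped.
final = integrate(parent, modified, mapping)
print(mapping)
print(final)
```

Without the mapping returned at the extraction step, the correspondence between cluster index 0 and parent index 2 would have to be re-derived by hand at every stage.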
The code presented in Fig. 6 demonstrates how a complex workflow can be achieved with the chaining of several functions together. This simplicity allows for knowability of the operations, precise and complete operability, and robustness due to high readability. The scrambling of the indices with the cluster extraction does not allow for consistent code using the base ASE package. Instead, the indices of each atom must be matched manually at each stage of the process. Thus, the MAZE package interface has increased the knowability, operability and robustness compared to the cumbersome manual workflow required when using the base ASE package.

Impact and conclusion
The improved API of the MAZE package was presented here by demonstrating how to perform representative tasks. Other features of the MAZE package include database integration and adsorbate additions. A complete description can be found in the documentation, which is referenced in the supplementary material. MAZE's improved API builds on top of the Atomic Simulation Environment. This new interface facilitates computational zeolite calculations by greatly simplifying the steps needed to perform common zeolite tasks [2]. Computational experiments are less labor intensive than wet lab experiments, but complex workflows and the lack of optimal APIs for scientific software can demand a significant time commitment from researchers to set up and run them. By creating custom software tailored to the specific task, research can be simplified and larger-scale experiments can be conducted.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.