Reliable Computational Prediction of the Supramolecular Ordering of Complex Molecules under Electrochemical Conditions

We propose a computationally lean, two-stage approach that reliably predicts self-assembly behavior of complex charged molecules on metallic surfaces under electrochemical conditions. Stage one uses ab initio simulations to provide reference data for the energies (evaluated for archetypical configurations) to fit the parameters of a conceptually much simpler and computationally less expensive force field of the molecules: classical, spherical particles, representing the respective atomic entities; a flat and perfectly conducting wall represents the metallic surface. Stage two feeds the energies that emerge from this force field into highly efficient and reliable optimization techniques to identify via energy minimization the ordered ground-state configurations of the molecules. We demonstrate the power of our approach by successfully reproducing, on a semiquantitative level, the intricate supramolecular ordering observed experimentally for PQP+ and ClO4– molecules at an Au(111)–electrolyte interface, including the formation of open-porous, self-host–guest, and stratified bilayer phases as a function of the electric field at the solid–liquid interface. We also discuss the role of the perchlorate ions in the self-assembly process, whose positions could not be identified in the related experimental investigations.


Introduction
Supramolecular chemistry deals with intermolecular interactions and structure formation beyond individual molecules, and as such lies at the basis of many nano-and mesoscopic structures found in biology.In recent decades, impressive progress in the experimental branches of this field have resulted in at least two Nobel Prizes in chemistry.By contrast, the theoretical understanding and especially the in silico prediction of supramolecular ordering has lagged behind somewhat.This is easily understood if one considers the sheer size of the systems under study, requiring in many cases consideration of a solid substrate, a sufficiently large number of molecular building blocks or tectons, and a condensed matter medium (i.e. a solvent or electrolyte solution).The interaction of these three components, each with their intrinsic properties, and with optional extrinsic steering (e.g. by light, heat, electric field), will determine the observed supramolecular structures and govern the transitions between them. 1,2n this paper, we propose a new theoretical framework to predict supramolecular ordering of complex molecules at an electrochemical solid-liquid interface.The calculations were inspired by recent experimental work 3 in which particularly clear-cut transitions between supramolecular structures were observed as a function of the applied electric field at a metalelectrolyte interface.The target molecules whose supramolecular ordering is considered constitute an organic salt that consists of a large, disc-shaped polyaromatic cation (PQP + ) and a much smaller, inorganic anion (perchlorate, ClO − 4 ). 4,5e concept of choice to investigate these scenarios would rely (i) on a faithful description of the properties of the system (notably a reliable evaluation of its energy) via ab initio simulations and (ii) in a subsequent step the identification of the optimized (ordered) arrangement of the molecules on the substrate by minimizing this energy via efficient and reliable numerical tools; this optimization has to be performed in a high dimensional search space, spanning all possible geometries of the unit cell and all possible coordinates and orientations of the molecules within that cell.Both these approaches, considered separately from each other, are conceptually highly complex and from the numerical point of view very expensive, which precludes the application of this combined concept even for a single set of external parameters (such as temperature, density, and external field); it is thus obvious that systematic investigations of the self-assembly scenarios of such systems are definitely out of reach.
In this contribution we propose an approach to overcome these limitations via the following strategy: in a first step we map the ab initio based energies onto the energy of a related classical model, where the atomistic units of the molecules are featured as spherical, charged units with Lennard-Jones type interactions and where the electrolyte is treated as a homogeneous, dielectric medium; the interaction between the atomic entities and the metallic surface is modelled by a classical, perfectly conductive, Lennard-Jones like wall potential.
The as yet open parameters (energy-and length scales, charges, etc.) are fixed by matching the ab initio energies of the system with the related energies of the classical model: this is achieved by considering archetypical configurations of the system's building blocks (molecules and surface) and by systematically varying characteristic distances between these units over a representative range.These ab initio energies were then fitted along these 'trajectories' by the parameters of the classical model: the energy-and length-scales of the involved interatomic Lennard-Jones or Mie potentials as well as the atom-wall interaction parameters.
It turns out that this classical model is indeed able to reproduce the ab initio based energies along these 'trajectories' faithfully and with high accuracy.Even though the emerging classical model is still quite complex (as it features both short-range as well as long-range Coulomb interactions and involves mirror charges) it is now amenable to the aforementioned optimization techniques which thus brings systematic investigations of the self-assembly scenarios of these molecules under the variations of external parameters within reach.
As a benchmark test for our approach we have considered the above mentioned system, studied in recent experimental investigations: the cation is PQP + (9-phenylbenzo [1,2]   quinolizino [3, 4, 5, 6-fed] phenanthridinylium, a disk-shaped polyaromatic molecule), while the anion is perchlorate, ClO − 4 ; the self-assembly of these ions on a Au(111) surface under the influence of an external electric field was studied.The high accuracy with which the ensuing classical energies reproduce the ab initio simulation data make us confident about the applicability of the classical model for the subsequent optimization step: using an optimization technique which is based on ideas of evolutionary algorithms we have then identified the self-assembly scenarios of the ions on the Au surface, for a given set of external parameters (temperature, density, and external field).
The computational cost of this optimization step is still substantial.Therefore we postpone a detailed, quantitative and, in particular, systematic investigation of the self-assembly scenarios of the PQP + and the ClO − 4 ions on the Au surface for a broad range of external parameters to a later contribution.Instead we demonstrate in this contribution for selected sets of parameters that our approach is indeed able to reproduce several of the experimentally identified self-assembly scenarios.
These first results provide evidence that our approach is quite promising.It is furthermore completely flexible as it can easily be extended to other organic molecules of similar (or even higher) complexity.On the other hand one might also consider the question if the complexity of the current model (which -as a classical model -is comprehensive in the sense that it contains all atomistic features) could be further reduced, as the numerical effort is still rather high and represents a formidable conceptual bottleneck: one might, for instance, replace larger sub-units of the molecule (such as aromatic rings) by disk shaped units carrying higher electrostatic moments.Both directions will be explored in forthcoming contributions.
The manuscript is organized as follows: In Section 2 we describe the essential features of the experimental setup, introduce an ab initio and a classical representation thereof and discuss the mapping procedure between those different instances.In Section 3 we put forward the memetic optimization procedure based on ideas of evolutionary algorithms in order to identify ordered ground state configurations of complex molecules under electrochemical conditions and in Section 4 we present selected numerical results which demonstrate a semi-quantitative agreement with the experimentally observed self-assembly scenarios of PQP + and ClO − 4 ions on an Au(111)-electrolyte-interface under the influence of an external electrostatic field.We conclude our findings in Section 5. 2 The system and its representations Neither in the ab initio simulations nor in the classical model the electrolyte molecules are considered explicitly.The electrolyte is rather assumed to be a homogeneous effective medium with a permittivity of water, i.e. r = 78.36,[11][12] Hence, in this contribution we use 'electrolyte' as synonym for 'solvent' unless explicit use is required.
We emphasize at this point that in the experiment, an exact specification of the electric field strength is not possible, which limits the degree of quantitative comparison between experiment and theory.

Ab initio simulations
The density functional theory calculations were performed with the software package GPAW 13,14 and the structures handled by the atomic simulation environment. 15The electronic density and the Kohn-Sham orbitals were represented within the projector augmented wave method, 16 where the smooth parts were represented on real space grids with grid spacing of 0.2 Å for the orbitals and 0.1 Å for the electron density.The exchange-correlation energy is approximated as proposed by Perdew, Burke and Ernzerhof (PBE) 17 and weak interactions missing in the PBE functional are described as proposed by Tkatchenko and Scheffler (TS09). 18The TS09 approximation assumes that long range dispersive contributions are ab- ) molecules (structure formulas given in top right insets) close to a Au(111)-surface: two Au-layers are explicitly shown, the golden, shiny area represents the conductive Au-bulk, the black dashed line marks the surface of the electronic density which we interpret as mirror plane.The ions are immersed into an electrolyte (gray, shaded region), which is considered as an effective, homogeneous medium.In the region close to the Au surface (red to blue shaded areas) a homogeneous, electrostatic field E z (bold, colored arrow), oriented in the z-direction, features the electrostatic potential drop between the Au-surface and the reference-electrode inside the electrolyte.The colors of the atoms in the electrolyte correspond to their type, while the color of the mirror-atoms (located in the Au-bulk) specify their partial charges, quantified by the colorbar (see bottom right) in units of the electron charge, e. sent in the PBE functional, such that these can be applied as a correction.The total energy which is written as where E PBE is the PBE energy and E vdW is the TS09 correction.We have introduced a weight factor w S that will allow to incorporate electrolyte effects into the dispersive contributions as discussed below.For interactions in vacuum w S = 1.The presence of the aqueous environment on the electronic and nuclear degrees of freedom included in E PBE is modeled by a continuum solvent model. 19lecular interactions are studied on simulation grids with Dirichlet (zero) boundary conditions.Neumann (periodic) boundary conditions were applied in x-and y-directions in the surface plane for simulations involving the gold surface, while zero boundary conditions were applied in the perpendicular z-direction.The simulation grid was chosen such that at least 4 Å of space around the position of each atom in the non-periodic directions was  attractive contribution to the potential is not to be expected for the interaction of two anions and needs further discussion.We suspect an overestimation of dispersion interactions if these are treated as in vacuum and no screening through the electrolyte is considered.
The aqueous environment influences the van der Waals (vdW) interactions as these are of Coulombic origin. 21In order to derive an approximate expression for the screened vdW interaction of two molecules A and B at distance R inside the electrolyte, we express the C 6 coefficient defining the vdW energy C 6 /R 6 by the Casimir-Polder integral 18,22 where α * A,B are the polarizabilities of the interacting molecules and φ is determined by propagation of the electric field through the embedding medium 22 with φ = 1 in vacuum.

Both α *
A,B and φ are modified relative to vacuum in solution.In the simplest model 22 we may write φ(iξ) = ε −2 ∞ (iξ) with the frequency dependent relative permittivity of the electrolyte ε ∞ .The effects of the electrolyte on the polarizabilities α * A,B should, at least partly, already be included in the TS09 description through the effective atomic polarizabilities derived from the self-consistent electron density calculated within the electrolyte.What is left is the effect of the permittivity entering through the function φ(iξ) in Eq. (2).We assume that the main contribution of φ(iξ) is at the resonance frequencies of α * AB , which are in the optical region for usual molecules.We further assume that ε ∞ (ω opt ) is approximately constant in this frequency region, such that we may pull φ = [ε ∞ (ω opt )] −2 out of the integral.This factor scales the C 6 coefficient and therefore the vdW contribution.In other words, we apply the weight w s = [ε ∞ (ω opt )] −2 in Eq. ( 2) with the experimental permittivity of water in the optical region of ∞ (ω opt ) = 1.7, see Ref. 23 This approach reduces considerably the depth of the suspiciously deep minimum as seen in Fig. 2 such that only a shallow local minimum remains similar to the PBE potential.In what follows we use the same scaling for all the vdW contributions of the DFT potentials in this work.

Classical, atomistic model
In this subsection we describe how we cast our setup into a classical model where the atoms in the molecular constituents are described as spherical particles, each of them carrying a charge.The mapping is guided by the energies obtained via the ab initio simulations detailed above.
The Au(111) surface is modeled as a flat and perfectly conducting surface involving mirror charges as detailed below.However, we note, that the position of the corresponding surface in the DFT calculations does not coincide with the position of the atoms.Before proceeding he following comment is in order: in this mapping procedure the distance of a point charge to a metallic surface is unambiguously defined through the electrons leaking out of the potential defined by the nuclei. 24,25This feature can explicitly be seen in jellium models, 26 but emerges also in implicit calculations 27 where electrons spill out of the surface of metal clusters. 28From the latter study we estimate an effective spill out of the surface of 0.5 Å, a value that agrees qualitatively with estimates from the jellium models, extrapolated to large structures. 25This value will be used in the following for our problem.

The atomistic model
In our atomistic model the molecules are represented as rigid entities composed of atomistic constituents.The molecules are immersed into a microscopic electrolyte, which is treated as a continuous medium of given permittivity.From below the system is confined by a conducting Au(111)-surface (which is assumed to extend in the x-and y-directions), an external field (with respect to the electrolyte) can be applied in z-direction, i.e., perpendicular to the surface (or wall).Fig. 1 schematically depicts all details of this atomistic model for the PQP + ClO − 4 system, confined by the Au-surface.
In order to specify the different entities of the system and their respective interactions we use the following notation: (i) Each of a total number of N molecules is uniquely labeled by capital Roman indices I : for each of these units this index is assigned to its center-of-mass (COM) position vector, R I , to a vector P I , specifying its orientation within the lab-frame in terms of the angle-axis framework 29,30 (see S.I.Section 2.2 for more details), and to the set of coordinates, r N I , of the respective N I atomistic constituents of the molecule in its COM-frame (to which we also refer as its blueprint).The set of COM-positions and orientation-vectors of all N molecules are denoted by R N and P N .The set of all N I atom positions in the lab-frame is given by r n , and the position of each atom in the lab-frame is uniquely defined by a vector r i , labeled with Latin indices (ii) Between all atoms we consider long-range Coulombic interactions (index 'C'), with the inter-atomic distance r ij = |r i − r j | and charges q i and q j of the units i and j; the dielectric constant 0 and the relative permittivity r specify the implicit electrolyte.Further, we introduce short-range interactions (index 'sr') for which we have considered two options: first, a Lennard-Jones potential (index 'LJ'), i.e., for the energy-and length-parameters, ij and σ ij , we have opted for the standard Lorentz-Berthelot mixing rules, 31 i.e., Alternatively, we have also considered for the short-range interactions the Mie potential 32 (index 'Mie'), which can be considered as a generalization of the LJ interaction; its functional form is given by and allows for a variation of the exponents of the repulsive and attractive contributions to the potential, γ (R) ij and γ (A) ij , respectively.ij and σ ij are again parameters for the energy-and the length-scales.The C ij are defined as functions of the exponents: 32 for the exponents we apply arithmetic mixing laws, i.e., γ (iii) We assume the Au-surface to be perfectly conductive, consequently we need to explicitly consider mirror-charges in our model; when further assuming z = 0 as the plane of reflection, the Coulombic interaction becomes with the mirror charges q i = −q i and their positions r i = (x i , y i , z i ) = (x i , y i , −z i ).
(iv) We describe the solid-liquid interface in terms of a slab-geometry with a lower confining wall, i.e., we assume periodicity in the x-and y-directions, but a finite extent, c z , of the geometry in the z-direction which is chosen such that no restriction in the orientation of any molecule occurs, thus c z ≈ 1.2 − 2 nm, given their size and the slab-width.We define the (orthorhombic) lattice vectors, a = (a x , 0, 0), b = (b x , b y , 0), and c = (0, 0, c z ), which, without loss of generality, define the volume of the unit-cell, V = a x b y c z , and which we collect within the matrix V = (a, b, c).Together with the molecular basis, given by R N , P N and all N (rigid) molecular blueprints, r N I , we now define the supramolecular lattice which gives rise to all atomic coordinates in the lab-frame, r n , i.e., the molecular crystal structure of the system (see S.I.Section 2.2).
(v) The interaction between the atomic entities and the Au-surface is described via a LJ-type wall potential, 33 in the above relation, z i is the height of atom i above the surface, σ wi and wi are the length-and energy-parameters of the interactions of each atom i with the wall, respectively.
(vi) Finally, we express the electrostatic interfacial potential between the electrode and the Au-surface by an external, homogeneous electrostatic field, E z (i.e., oriented perpendicular to the surface): we account for this potential via U (field) (z i ) = z i q i E z . 34us and eventually the total potential energy of our model is given by the expression with 'sr' standing for 'LJ' or 'Mie'; we recall that r n is the set of all n atomic positions r i in a lattice with slab-geometry (defined by the unit-cell V).If not present (and not explicitly addressed) the electric field will be dropped in the argument list of Eq. ( 10), that is The notation ' * ' indicates that summation is only carried out over atoms which belong to different molecules I and J (with I = J).The energy given in Eq. ( 10) and the corresponding force-fields are efficiently evaluated using the softwarepackage LAMMPS. 35 evaluate the long-range Coulomb term, [38] The other terms in Eq. ( 10) are evaluated via direct lattice summation techniques.

Parametrizing the classical model via ab-initio calculations
In this work, the blueprint of each molecule r N I is obtained from electronic structure calculations based on density functional theory (DFT), using dispersion corrected ab initio structure optimization, 17,18 as described in Subsection 2.2.The partial charges of the atoms, q i , are parametrized via a Bader analysis 39 and are collected in Tables 2 and 3 in the S.I.Section 2.3.
These charges are directly transferred to the classical model.We repeat that neither in the ab initio simulations nor in the classical model (cf.Subsection 2.3.1) the electrolyte molecules have been considered explicitly; in the latter case we treat the electrolyte as an effective, homogeneous medium, introducing the electric permittivity of water r .
In order to fix the remaining model parameters that specify the interactions in Eq. ( 10) we search for each atomistic entity (labeled i) the set of atomistic model parameters (specified below) which reproduces via Eq.( 10) the ab initio energies as good as possible.On one side we consider either the length-and the energy parameters of the LJ-potential (denoted by L = {σ i , i }) or the length-and the energy parameters together with the exponents of the Mie-potentials (denoted by i }), as well as the wall parameters, W = {σ wi , wi }.To fix these parameters we proceed as follows: (i) We first perform ab initio structure optimization for different, characteristic molecular configurations, specified below.Here, molecules are either positioned next to each other (without considering the wall) or above the Au-surface: in the former case we fix the positions of two selected atoms belonging to different molecules, the atoms being separated by r ij ; in the latter case we keep the height, z k , of one selected atom above the surface constant.Relaxation of all other degrees of freedom leads in the ab initio simulations to spatially and orientationally optimized molecular structures; they are denoted by d n (r ij ) and d n (z k ), respectively, with corresponding energies U DFT (d n (r ij ), V) and U (wall) DFT (d n (z k ), V); they are, themselves, functions of the inter-atomic distance, r ij , and the atom-wall separation, z k , of the selected atoms.
(ii) For every optimized ab initio structure, d n (r ij ) and d n (z k ), obtained in this manner we define a corresponding molecular configuration r n (r ij ) and r n (z k ), which is based on the above introduced rigid atomistic model r N I (with the index I running now over all N molecules present in the respective DFT structure).To this end we synchronize the COM-positions of each molecule I in the ab initio simulation with the corresponding COM-positions R I of its classical counterparts and align their orientation P I accordingly.
(iii) Finally we evaluate the corresponding classical energies via Eq.( 10) at zero electric field, i.e.U L/M (r n (r ij ), V) and U (wall) L/M,W (r n (z k ), V).We search for the best set of parameters L (or M) and W via simultaneously minimizing Of course, in the classical model the same unit-cell, V, and the same number of particles, n, as in the respective ab initio simulations have to be used.Note that in Eq. (11a) the wall-term included in Eq. ( 10) is obsolete since the surface atoms are not considered.
These fits are based on five particularly chosen, archetypical configurations, to be discussed in the following.In the panels of Fig. 3 we display schematic sketches of these configurations of the PQP + and ClO − 4 molecules; these panels show the corresponding energy curves obtained within our classical model, with parameters based on a fitting procedure to the ab initio energy profiles.In practice we first optimize F L/M , given in Eq. (11a), involving thereby all inter-atomic parameters; their values are listed in Table 1 for the LJ and the Mie models.These parameters are then kept fixed and are used in the subsequent calculations to optimize the wall parameters via optimizing F L/M,W , specified in Eq. (11b); the emerging parameters are listed in Table 2.In panel (f) of Fig. 3 we present a visualization of the molecules PQP + and ClO − 4 , using these optimized parameters and providing information about the charge of the atomic entities via the color code.

Identifications of self-assembly scenarios
With the classical model for the PQP + and ClO − 4 molecules introduced in Subsection 2.3.1 at hand we are now ready to identify the ordered ground-state configurations of these molecules as they self-assemble on the Au-surface -immersed into an electrolyte and exposed to an electric field.While we leave a more comprehensive and systematic investigation of these selfassembly scenarios to a future publication, 40 we focus in this contribution on the technical details of our approach and on a few selected sets of external parameters (i.e, the electric field strength and the particle density).
Our overall objective is to find for our system the global minimum of the total free energy, F , at T = 0 K as a function of the positions and orientations of all molecules per unit-cell for a given value of cell volume and E z ; at T = 0 this task reduces to the minimization of the internal energy U .The minimum has to be found in a huge-dimensional parameter space, spanning the positions and orientations of the molecules and by the parameters specifying the unit cell.
For this purpose we use a memetic search algorithm which combines evolutionary search strategies (EA) [41][42][43][44][45][46][47] and local, steepest gradient descent procedures (LG): 48,49 First a total number of N EA different lattice-configurations, G = G(R N , P N , V) as defined in Eq. ( 8), is generated; this population, G N EA , is exposed to concepts of natural (or, rather, artificial) selection.At every iteration step of the EA a new configuration, i.e. an offspring, is created from existing configurations of the most recent population, via crossover and mutation operation.1][52] For an optimal load-balance we additionally spawn a master-thread on one of the mpi-processes to asynchronously distribute optimization tasks of offspring configurations among all idle mpi-processes.The relaxed configurations are gathered by this master-thread, which then decides -via a criterion primarily based on the respective internal energy of the configurations -whether the new relaxed particle arrangements are accepted or rejected.
Since the experimental observations 3,5 provide evidence of a structural organization of the molecules into supramolecular lattices, the center-of-mass coordinates of the molecules, R N , and their orientations, P N , as well as the parameters defining the unit-cell, V, (see Subsection 2.3.1 and Fig. 1 for details), are the variables which have to be optimized for the search of ground-state configurations: We minimize U (r n , V; E z ), defined in Eq. ( 10), with respect to R N , P N , and V, keeping the number of molecules N , the unit-cell volume V (with fixed slab width c z ), and the electrostatic field strength E z constant.
In more detail, we proceed as follows: (i) It is common in evolutionary algorithms to define a genome representation of the entity which is subject to optimization. 53,54In our case we represent a supramolecular lattice configuration phenotypically (rather than genotypically 54 ) by the set, G = {R N , P N , V} as defined by Eq. ( 8), i.e. the set of all COM coordinates and orientations of all molecules as well as the lattice vectors.
(ii) In the first step, two configurations (labeled henceforward by Latin indices), G i and G j , are chosen at random or via the "roulette wheel" method (see item (v) below) from the evolutionary population; 41,42,[55][56][57] this strategy favors parents of high quality hence making them more likely to be used for reproduction than "weaker" configurations, i.e.
configurations with higher energy from the evolutionary population.Then these two configurations are combined via a crossover operation (i.e., a cut-and-splice process -for more details see below) creating thereby an offspring configuration, G i⊕j , with the subscript 'i ⊕ j' emphasizing the executed crossover operation between G i and G j .
supramolecular lattice.[55][56][57] (iii) In the second step, the newly generated offspring configuration, G i⊕j , is then exposed to random mutation moves: these are either translations or rotations of single molecules, swaps of center-of-mass positions or orientations of pairs of molecules or deformations of the unit-cell, each of them with a certain probability and within preset numerical boundaries.This step of the algorithm has the purpose of exploring disconnected areas in parameter space, a feature which is indispensable in global minimization techniques.
(iv) After these two steps, and assuming that the offspring configuration, G i⊕j , does not represent a local minimum with respect to the potential energy, a local energy minimization is performed.Here we mainly rely on the "scipy" implementation of the SLSQP gradient-descent algorithm 48,49 (allowing us to define numerical boundaries and constraints on the parameters during the optimization), which minimizes the forces and torques between the molecules as well as the stress of the unit-cell.These tools are very helpful to keep the unit-cell volume fixed and to prevent re-orientations of the molecules where some of their atomic constituents would be transferred into positions outside the slab geometry, ensuring thus that z i > 0 for all atoms.Subsequently we perform several "basin-dropping" (BD) steps, where we further try to lower the energy of the configuration by applying several small random "moves" in the parameter space of the LG-optimized offspring; from the emerging configurations only the ones with low energies are accepted.This specific operation turned out to considerably improve the convergence rate of the local optimization, in particular if multiple and alternating sequences of LG and BD runs are applied.
(v) After the local search procedure the optimized offspring configuration, G i⊕j , becomes a new candidate to enter the evolutionary population, G N EA .The objective of the EA is to retain the best configurations (i.e., the energetically most favorable ones) within the population and to include only candidates with energy values better or comparable to those of the current population.In an effort to quantify the quality of the candidates, their so-called fitness is evaluated, 41,42,[54][55][56][57] for which we have used in this contribution the function: F (U ) is a monotonic function of the energy U of the candidates, whose value ranges within the interval 0 ≤ F (U ) ≤ F (U min ) = 1; U min and U max are the minimal and maximal energies appearing in the population.The selection parameter s quantifies the reproduction-rate for configurations within the population in the sense that large values of s tend to exclude configurations with low fitness from reproduction; following 57 we commonly use s = 3.The aforementioned "roulette wheel" method for choosing suitable parent configurations also relies on the fitness function (and hence the selection parameter): Assuming that the configurations within the population G N EA are sorted by their respective fitness values in descending order, F (U i ) > F (U i+1 ), the probability, f (U i ), of a configuration, G i , to be selected for reproduction is given in terms of the relative fitness: 41,55-57 N EA being the total number of configurations within the population.With a certain probability (commonly in 20% of all crossover moves) we allow reproduction between randomly chosen configurations.
(vi) Once a new configuration is accepted to enter the population another configuration has to be eliminated.The probability p(U i ) for a configuration, G i , to be replaced is given by a value which is again related to the fitness of the configuration, F (U i ), and the selection parameter s.Thus, configurations with low fitness are more likely to be eliminated.
In any case, a few of the best configurations within the population are retained in an effort to keep the so far best solutions as appropriate parent candidates for the abovementioned crossover procedures (a strategy referred to in the literature as elitism 55 ).
It should be emphasized that this strategy does not follow biological selection mechanisms, 58 where populations are replaced entirely once that new generations have been formed; however, our strategy ensures to protect the best genetic material from extinction during the entire search procedure. 55,57ii) In an effort to maintain diversity within the population an additional operation (in literature referred to as nichening 59 ), is applied: locally optimized offspring configurations will be discarded if the values of their energy is too close to the energy of any configuration currently in the population; avoiding thereby that the population is overrun by structurally identical configurations.At the same time the maintenance of genetic diversity is guaranteed.
However, this procedure alone cannot cope with 'degenerate' configurations, i.e., if structurally distinct configurations have essentially the same energy values (within the specified nichening tolerance).In our approach we allow configurations to enter the population only if their structures differ significantly from those of the competing, degenerate configurations.In order to quantify the structural difference between configurations we associate a feature vector, f i (i.e. a set of order parameters), which collects a set of order parameters pertaining to configuration G i (see S.I.Section 3.2 for details).The degree of similarity between two configurations, G i and G j , is then evaluated by taking the Euclidean distance between the corresponding feature vectors, i.e., ∆ ij = |f i − f j |; similar configurations will have a small distance, while unlike configurations will have a large distance.If ∆ ij is above a certain threshold value, the offspring configuration, G i⊕j , will not be discarded by the energy-nichening operation.
Summarizing, the complexity of the problem at hand forces us to use all the above listed advanced optimization tools, including a basin hopping memetic approach combining the heuristic nature of evolutionary strategies with deterministic local gradient descent algorithms. 41The gradient descent method deterministically evaluates every local minimum of the basin with high accuracy (which is additionally sped up by the "basin dropping" procedure) while the evolutionary search gradually adapts its population to the energetically most favorable solution, exploring the search space for the global optimum.

General remarks and system parameters
In the following we present selected results for self-assembly scenarios of PQP + and ClO − 4 molecules on an Au(111)-electrolyte-interface under the influence of an external electrostatic field, as obtained via the algorithm presented in the preceding sections.Our choice of parameters is guided by the experimentally observed molecular configurations. 3We demonstrate that our proposed strategy is indeed able to reproduce on a semi-quantitative level the experimentally observed self-assembly scenarios. 3As a consequence of the still sizable costs of the numerical calculations we leave more detailed investigations (where we systematically vary the system parameters) and a quantitative comparison of our results with the related experimental findings 3 to a future contribution. 40 be more specific we have used the following values for the (external) system parameters: • an indication for the number of molecules per unit cell is provided by the experiment: 3 we have considered unit cells containing ten, twelve, and 14 pairs of PQP + and ClO − 4 molecules.These numbers in molecules include, of course, also the related mirror molecules and correspond to 630, 636, and 742 atomic entities per unit cell, respectively (which interact via short-range and long-range potentials, which are subject to particle wall interaction and which are sensitive to an external electrostatic field); • also the actual values of the surface area A is motivated by estimates taken from experiment: 3 we have varied A within the range of 6.5 nm 2 to 12.25 nm 2 , assuming a step size of typically 0.5 nm 2 ; systems will be characterized by the surface density of the PQP + molecules, defined as σ PQP = N PQP /A, N PQP being the number of PQP molecules per unit cell; • the range of the experimentally realized values for the electrostatic field strength E z is, however, difficult to estimate since the major drop in voltage occurs near the negatively charged Au-surface and the nearby layers of cations, 34 which is not directly accessible in experiment.Therefore we have covered -at least in this first contribution -several orders of magnitude in the value for E z within a range that extends (on a logarithmic grid) from E z = −1 V/nm to E z = −10 −3 V/nm; in addition, we have also performed calculations at zero electrostatic field.
It should be mentioned that we have used in all these calculations the Mie potential within the classical model, since the related LJ model is not able to fit the ab initio data with a comparable and sufficient accuracy (see also discussion in Subsection 2.3.1).
We have covered in total approximately 176 combinations of these parameters (that is the unit-cell volume V with a constant slab width c z , the number of molecules N , and the electrostatic field strength E z ); for each of these we performed independent evolutionary searches with a population size of typically G N EA = 40 configurations.Some details about the numerical costs of our calculations can be found in S.I.Section 4.1

Results
Selected results for our numerical investigations are presented in Fig. 4 and in -on a more quantitative level -in Table 3.The actual values have been chosen in an effort to reproduce -at least on a qualitative level -the results obtained in the experimental investigations.
Indeed, the sequence of the obtained ordered ground state configurations (shown in panels From the results of our investigations (which are shown only selectively) we learn that an electrostatic field strength of E z = −0.3V/nm always leads to bilayer configurations, similar to the one shown in panel (a) of Fig. 4.This stratified bilayer configuration represents the energetically most favorable one as we vary at fixed E z the volume of the unit cell and the number of molecules within the respective ranges, specified in the preceding section; the numerical data of the related internal energy are compiled in Table 3.
As we proceed to E z = −0.1 V/nm we observe self-assembly scenarios as the ones depicted in panels (b) and (d) of Fig. 4, which correspond to self-hosts-guest configurations observed in experiment; 3 for the data presented in these panels two different values for N PQP (and hence for σ PQP ) have been considered: the mono-layer configuration shown in panel (b) has a slightly better value for the internal energy (per molecule) than the rhomboedrical bilayer configuration shown in panel (d); however, as can be seen from Table 3 the energy differences are very tiny: differences of the order of 10  3 for the numerical details).There are, however, several serious competing structures with minute energy differences at this value of the electric field strength: another open-porous structure, depicted in panel (f), with an energy penalty of less than 8.1 meV per PQP + molecule compared to case (c) and also a considerably denser configuration, depicted in panel (e), with an internal energy value worse by only 11.3 meV compared to case (c) and by 1.13 meV compared to case (f).
1][62] However, as for the results for the energies obtained in the classical model (and which are based on the LAMMPS simulations) we estimate that our results are numerically reliable down to ∼ 10 −6 eV; such minute energy differences between competing structures have not been found so far.In general we observe that the energy differences for the energetically optimal ground state configurations become smaller as the electrostatic field tends towards zero.Even though the optimization algorithm (as outlined in Section 3 has turned out to be very efficient and reliable, we observe (in particular for smaller values of the external field) that updates in the pool of the best individuals can still occur after a large number of optimization steps.

Conclusion and outlook
The prediction of supramolecular ordering of complex molecules at a metal-electrolyte interface using DFT based ab initio calculations is in view of the expected gigantic computational costs, and despite the availability of peta-scale computers, still an elusive enterprise.In this contribution we have proposed a two-stage alternative approach: (i) DFT-based ab initio simulations provide reference data for the energies introduced in a classical model for the molecules involved, where each of their atomic entities are represented by a classical, spherical particle (with respective size, energy parameters, and charges).We modelled the interaction between the atomic entities and the metallic surface by a classical, perfectly conductive, Lennard-Jones like wall potential; the electrolyte is treated as a homogeneous, dielectric medium.
The inter-particle and particle-wall parameters were obtained via the following procedure: considering archetypical configurations (involving pairs of ions and/or ions located close to the surface) DFT energies were fitted by the related energy values of the classical model.
(ii) The second step identifies the ordered ground state configurations of the molecules by minimizing the total energy of the now classical system.This optimization is based on evolutionary algorithms, which are known to operate efficiently and reliably even in high dimensional search spaces and for rugged energy surfaces.; Our new two-stage strategy overcomes the hitherto prohibitive computational cost of modelling the full system, while reproducing the key observations of a well-documented experimental system consisting of disc-shaped PQP + cations and ClO − 4 anions: as a function of increasing electric field at the metal-electrolyte interface, the molecular building blocks are seen to self-organize into an open porous a self-host-guest pattern and a stratified bilayer.Future work will focus on verifying the extent of predictive power of our method towards molecular self-assembly under electrochemical conditions, and on strategies to further streamline and reduce the computational cost of our approach, without sacrificing the reliability of the predicted results.and Energy Lancaster.
Table 1: Numerical results for the optimized LJ and Mie parameters, M (LJ) = {σ i , i } and i }, for each element i, for the results depicted in the Fig.   4.

Introduction
In the Supplementary Information (S.I.) we have collected relevant information which might considerably deteriorate the readability of the main text if it were placed there; still, the details presented in this document might be of relevance for an interested reader of the main text.For simplicity we have used in this document exactly the same section headings as in the main text; this will hopefully help to establish the appropriate association between the respective text passages. 2 The system and its representations

Convergence of theoretical calculations
The DFT-based binding energies the of PQP + and ClO − 4 ions were calculated for different numbers of gold layers in an effort to study the convergence of the results with respect to the number of layers that build up the gold surface.The values for the binding energies, obtained for the different cases are specified below the respective panels of Fig. 1; the panels themselves provide schematic plots for the different types of gold layers.In the related DFT calculations we did not consider the van der Waals scaling, i.e., we chose ω S ≡ 1 in Eq. ( 1) of the main text.
The calculated values for the binding energy obtained for two, three, and four layers of gold provide evidence that our choice for a two layer gold surface is sufficient to proceed with our calculation of the self-assembly scenarios of PQP + and ClO − 4 ions on this Au(111) surface.

Angle-axis framework expressing rigid body orientations
In the optimization procedure put forward in Sec. 3 of the main text we rely on the angle-axis framework 1,2 to express the orientation of rigid molecules within the lab-frame: closely related to the descriptions of rotations based on unit-quaternions, 3-5 we introduce a three-component angle-axis vector, P = (P 1 , P 2 , P 3 ) = θ P, which defines an angle, θ = P 2 1 + P 2 2 + P 2 3 , and a unit-vector, P, which represents the axis of the molecule; both are sufficient to describe any rotation of a rigid body in three dimensions.As discussed in Ref. 2 and following Rodrigues' rotation formula, the 3 × 3 rotation matrix T(P) associated with the angle-axis vector P is given by here I is the 3 × 3 identity matrix and P is the skew-symmetric 3 × 3 matrix obtained from the components of the vector P via In order to transform the coordinates of an atom, r m , defined in the center-of-mass system of molecule I to its lab-frame position, r m , the following transformation needs to be realized: here R I is the center-of-mass coordinate of molecule I and T(P I ) is the rotation matrix associated with the angle-axis vector P I of molecule I , as defined above.

Short-range potentials and parametrization
The short-range Mie potential, defined in Eq. ( 5) of the main text (Subsection II B 1), can be considered as a generalization of the Lennard-Jones (LJ) interaction: 6 if the exponents of the repulsive and attractive parts of the potential are chosen as γ i .In such a case the defining equation for the C ij (see Eq. ( 5) of the main text) guarantees that both the repulsive and the attractive parts of the potential maintain their respective features.
In Fig. 2 we depict the PQP + and the ClO − 4 ions using the actual values for the fitted Mie length parameters, σ i (listed in Table 1 of the main text), as van der Waals radii.In Table 1 of the S.I. we list -for comparison -also the LJ length parameters for the atomic entities of the PQP + and the ClO − 4 molecules as they are commonly used in literature.-values; these entities are colored according to the following scheme: hydrogen (white), carbon (gray), nitrogen (blue), chlorine (green), and oxygen (red).In Tables 2 and 3 we list the coordinates of all atomic units and their associated partial charges (extracted via a Bader analysis 8 ) obtained from the relaxed DFT-structures of the PQP + and ClO − 4 ions which serv as rigid molecular blueprints in the the main text.Since the PQP + cation is built up by 48 atomic units we have supplemented Table 2 by Fig. 3, indicating the labeling of the different atomic entities.In contrast, as the ClO − 4 molecule (see Table 3) is only built up by five atomic units, we have refrained in this case from a schematic presentation of the molecule.
Table 2: Atomic units building up the PQP + ion, labeled by the index i according to the schematic view of the molecule presented in Fig. 3.The positions of these entities (x i , y i and z i , and all in Å), as they were obtained in a relaxed DFT-based configuration, are given with respect to the center-of-mass of the molecule, marked in this figure by a cross.Furthermore, the respective charges of the atomic units, q i (given in units of the elementary charge e), are obtained in a Bader analysis; 8 these charges are directly transferred to our classical model of the PQP + molecule.Table 3: Atomic units building up the ClO − 4 molecule.The positions of these entities (x i , y i and z i , and all in Å), as they were obtained in a relaxed DFT-based configuration, are given with respect to the center-of-mass of the molecule, which coincides with the position of the oxygen atom.Furthermore, the respective charges of the atomic units, q i (given in units of the elementary charge e), are obtained in a Bader analysis; 8 these charges are directly transferred to our classical model of the ClO − 4 molecule.3 Identifications of self-assembly scenarios 3.1 Angle-axis gradient as calculated from the torque The software package LAMMPS 9 allows to evaluate forces and torques of rigid molecules enclosed in a simulation box.Since we are interested in the gradient of the potential energy with respect to angle-axis vectors, P, i.e., ∇ P U = ( ∂U ∂P 1 , ∂U ∂P 2 , ∂U ∂P 3 ), we present here the transformation which is required to transform a three dimensional torque T = (T x , T y , T z ) to an angle-axis gradient, ∇ P U , or in a component-wise notation ∂U ∂P i = ∂ i U , using Latin indices i = 1, 2, 3 for three dimensional vectors.

Figure 1 :
Figure 1: (color online) Schematic visualization of the experimental setup to control the pattern formation of PQP + (and ClO −4 ) molecules (structure formulas given in top right insets) close to a Au(111)-surface: two Au-layers are explicitly shown, the golden, shiny area represents the conductive Au-bulk, the black dashed line marks the surface of the electronic density which we interpret as mirror plane.The ions are immersed into an electrolyte (gray, shaded region), which is considered as an effective, homogeneous medium.In the region close to the Au surface (red to blue shaded areas) a homogeneous, electrostatic field E z (bold, colored arrow), oriented in the z-direction, features the electrostatic potential drop between the Au-surface and the reference-electrode inside the electrolyte.The colors of the atoms in the electrolyte correspond to their type, while the color of the mirror-atoms (located in the Au-bulk) specify their partial charges, quantified by the colorbar (see bottom right) in units of the electron charge, e.
Fig. 2 for different approximations for the total energy in Eq. (1).As expected, the potentials follow the screened electrostatic repulsion [ε ∞ (0)R Cl−Cl ] −1 for large distances R Cl−Cl , where ε ∞ (0) = ε r is the static relative permittivity of water.There is a slight attractive part in the potential around R Cl−Cl 5.2 Å already in the PBE potential which leads to a very shallow local minimum.Including the full dispersion contribution [ε ∞ (ω opt ) = 1] this local minimum substantially deepens and becomes the total minimum of the potential.An

Figure 2 :
Figure 2: The relative energy of two ClO − 4 anions as a function of the distance between their chlorine atoms, R Cl−Cl , where the separated anions define the energy reference; ε(ω opt ) = 1 with full van der Waals (vdW) corrections and ε(ω opt ) = 1.7 with scaled vdW corrections (see text).
(a) Tail-to-tail configuration (see inset of in panel (a) in Fig. 3): We have considered a series of ab initio structure optimizations at constant, but successively increasing nitrogennitrogen distances, r NN , in the x-direction (while keeping y NN and z NN constant) of an anti-parallel oriented pair of PQP + molecules; both cations are vertically decorated with a ClO − 4 molecule.The aromatic parts of the PQP + molecules lie flat in the xand y-directions such that their tails face each other.(b) Face-to-face configuration (see inset in panel (b) in Fig. 3): In this case we consider anti-parallel oriented, but vertically stacked PQP + molecules (both being horizontally decorated by ClO − 4 molecules) under the variation of the nitrogen-nitrogen distance, r NN , in z-direction (while now keeping x NN and y NN constant).Again, the aromatic parts of the PQP + molecules lie flat in x-and y-directions; however, and in contrast to case (a) these units face each other.(c) ClO − 4 -ClO − 4 configuration (see inset in panel (c) in Fig. 3): Here two ClO − 4 molecules are considered, varying the chlorine-chlorine x-distance, r ClCl , while keeping y ClCl and z ClCl constant.(d) Face-to-wall topped configuration (see inset in panel (d) in Fig. 3): In this case a single PQP + molecule, lying flat and parallel to the (x, y)-plane, is located above two layers of Au and is vertically decorated by a ClO − 4 molecule.The cell geometry is assumed to be periodic in the x-and y-directions and finite along the z-axis; in an effort to scan along the z-direction, we have performed a series of ab initio based structure optimizations for selected fixed values of z N , i.e., the z-position of the nitrogen in PQP + above the Au-surface.(e) Face-to-wall beside configuration (see inset in panel (e) in Fig. 3): In contrast to case (d), the PQP + cation is now horizontally decorated by the ClO − 4 anion such that both molecules are adsorbed on the Au-surface.

Figure 3 :
Figure 3: (color online) Energies as obtained in ab initio simulations (black crosses; see Subsection 2.2) and fitted data, using the classical model (involving a LJ interactions -open blue circles -or Mie interactions -open orange diamonds -for the short-range potential between the atomic entities) introduced in Subsection 2.3.1; also shown are -with labels (a) to (f) -five schematic sketches of the five archetypical configurations of the molecules (along with their relative displacements, schematically indicated via the arrows as the distances vary along the abscissa); the related energy curves are used to fit the parameters of the classical models, as outlined in the text; the labels correspond to the itemization (a) to (e) used in Subsection 2.3.2.For the configurations (d) and (e) the LJ 10-4-3 potential 33 has been used between the Au(111) surface and the molecules.In all cases the optimized inter-atomic fitting parameters have been obtained by minimizing the expressions specified in Eqs.(11a) and (11b), i.e., by minimizing difference between the ab initio based energies and the energy emerging from the classical model.In panel (f) the PQP + and the ClO − 4 molecules are drawn to scale, using the classical Mie potential for the short-range interactions introduced in Subsection 2.3.1:atomic entities are shown as transparent spheres with their diameters fixed by their respective optimized σ i -values and their Bader charges (with colored charges according to the color-bar).
(a) to (c)) clearly indicates the transition from a stratified bilayer configuration (identified at a rather strong electrostatic field strength of E z = −0.3V/nm), over a self-hosts-guest monolayer structure (obtained by reducing the field down to E z = −0.1 V/nm), and eventually to an open-porous configuration (identified at E z = −0.01);similar observations have been reported in the related experimental study.3 −4  eV correspond to values where we hit the numerical accuracy of the ab initio based energy values).Eventually, we arrive at the so-called open-porous structures, observed in experiment:3    the ground state configurations depicted in panels (c), (e), and (f) of Fig.4are evaluated at the same electrostatic field strength of E z = −0.01V/nm, assuming different values for N PQP and σ PQP ; the open-porous pattern emerging in panels (c) is the most favorable one in terms of energy per molecule (see Table

Figure 4 :
Figure 4: (color online) Results for the ground state configurations of PQP + and ClO − 4 molecules, adsorbed on a Au(111) surface under the influence of an external electrostatic field E z , as they are obtained via the numerical procedure, as specified in Section 3; calculations are based on the classical model for the molecules, involving the Mie potential (for details see Subsection 2.3.1).In the main panels configurations are shown in a periodically extended view as projections onto the (x, y)-plane and in the respective insets as projections onto the (y, z)-plane; in the main panels the respective unit cells are highlighted by thick black lines.Results are shown for different values of the number of PQP molecules, the surface density σ PQP and the electrostatic field E z : see labels in the different panels and Tab. 3 for details.

6 Acknowledgment
BH acknowledges a DOC Fellowship of the Austrian Academy of Sciences; further he gratefully acknowledges helpful discussions with Moritz Antlanger (Wien), Benjamin Rotenberg (Paris), and Martial Mazars (Paris-Sud) on several issues of this project and thanks the Freiburg Centre for Interactive Materials and Bioinspired Technologies (Universität Freiburg) for the kind hospitality, where part of the work has been carried out.BH also acknowledges the Christiana Hörbiger prize for covering travel expenses to Paris.BH and GK acknowledge financial support by E-CAM, an e-infrastructure center of excellence for software, training and consultancy in simulation and modelling funded by the EU (Proj.No. 676531).The computational results presented have been achieved [in part] using the Vienna Scientific Cluster (VSC).SS and MW acknowledge funding from Deutsche Forschungsgemeinschaft (WA 1687/10-1) and the computing time granted by the John von Neumann Institute for Computing (NIC) within project HFR08 as well as the computational resource bwUni-Cluster funded by the Ministry of Science, Research and the Arts Baden-Württemberg and the Universities of the State of Baden-Württemberg, Germany, within the framework program bwHPC.MW is thankful for enlightening discussion with Johannes Fiedler and Stefan Buhmann.SFLM and GK gratefully acknowledge financial support within the TU Wien funded Doctoral College BIOINTERFACE.SFLM further acknowledges support from the Austrian Science Fund (FWF, "Boron Nitride Nanomesh for Actuated Self-Assembly" (I3256-N36))

Figure 1 :
Figure 1: Schematic views of PQP + and ClO − 4 ions, located above gold surfaces built up by different numbers of layers Below each panel the respective binding energies ('b.e.') are specified.
amplitude C ij , given by Eq. (6) of the main text, becomes C ij = 4 and the Mie-potential reduces to the well known LJ form.During the fitting procedure put forward in Sec.2.3.2 of the main text, it occurred at some instances that γ

Figure 2 :
Figure 2: (Color online) Schematic representation of a PQP + and a ClO − 4 ion using the actual values of the fitted Mie length parameters σ (Mie) i for the short-range interactions introduced in Subsection 2.3.1 of the main text: atomic entities are shown as spheres with their diameters fixed by their respective optimized σ (Mie) i

Figure 3 :
Figure 3: (Color online) Schematic view of the PQP + ion where its atomic constituents and the related bonds are depicted.The spheres are colored according to the respective chemical element: hydrogen (white), carbon (gray), and nitrogen (blue).The atomic constituents are labeled by indices i = 1 to N PQP = 48; the positions of each of these entities (with respect to the center-of-mass of the molecule) are listed in Table 2.The black cross marks the center of mass, R PQP , of the PQP + molecule.
in the experiment serves as the solid substrate for adsorption.An electric field, E z , can be applied between a reference electrode located within the electrolyte and the Au surface.The PQP + and the ClO − 4 ions are first treated via DFT based ab initio calculations (see Subsection 2.2).The calculated energies are then used to fix the potential parameters of classical particles (notably their sizes, energy parameters, and charges) which represent the atomic entities of the respective ions; the interaction between the atomic entities and the Au(111) substrate is described by means of a classical wall potential (see Subsection 2.3).
2.1 The systemBoth the DFT calculations and the related classical model are based on a framework that mimics the essential features of the experimental setup, put forward (and discussed) in; 3 this framework is schematically depicted in Fig.1: PQP + and ClO − 4 ions are immersed into an electrolyte (aqueous 0.1M perchloric acid).From below, the system is confined by a Au(111) surface, which 3 (σ i in Å and i in meV).Reference values from the literature are listed in Table 1 in the supplementary information.

Table 2 :
Top row: Numerical results for LJ-length and well-depth parameters, σ wi in Å and wi in meV, between the wall and each element i = [H, C, N] and j = [O, Cl], grouped by the molecules they belong to (PQP + and ClO − 4 ), for intermolecular short-range LJ parameters listed in Table.1.Bottom row: corresponding σ wi and wi parameters for intermolecular short-range Mie-parameters also listed in Table.1.

Table 3 :
Results of evolutionary ground-state search for different electric field strengths, E z , for different unit-cell areas, A and number of PQP + molecules, N PQP , each line represents a evolutionary search.The respective structures are presented in Fig.