LB3D: A Parallel Implementation of the Lattice-Boltzmann Method for Simulation of Interacting Amphiphilic Fluids

We introduce the lattice-Boltzmann code LB3D, version 7.1. Building on a parallel program and supporting tools which have enabled research utilising high performance computing resources for nearly two decades, LB3D version 7 provides a subset of the research code functionality as an open source project. Here, we describe the theoretical basis of the algorithm as well as computational aspects of the implementation. The software package is validated against simulations of meso-phases resulting from self-assembly in ternary fluid mixtures comprising immiscible and amphiphilic components such as water-oil-surfactant systems. The impact of the surfactant species on the dynamics of spinodal decomposition is tested, and a quantitative measurement of the permeability of a body centred cubic (BCC) model porous medium for a simple binary mixture is described. Single-core performance and scaling behaviour of the code are reported for simulations on current supercomputer architectures. LB3D supports the simulation of fluid mixtures comprising miscible and immiscible fluid components as well as amphiphilic species at the mesoscopic scale; observable phenomena include the self-organisation of mesoscopic complex fluid phases and fluid transport in porous media.


Introduction
Since its advent almost 30 years ago, the lattice-Boltzmann method (LBM) has gained increasing popularity as a means for the simulation of fluid dynamics. Driving factors for this development are the relative simplicity and locality of the lattice-Boltzmann algorithm. The latter allows for straightforward parallelisation of the method and its application in high performance computing. Today, a wide range of LBM implementations is available, ranging from specialised academic packages such as Ludwig for the simulation of complex fluids [1] and HemeLB for the simulation of flow in blood vessels [2] to very versatile open-source projects such as OpenLB [3] and Palabos [4], as well as commercial applications such as PowerFlow [5] and XFlow [6].
The lattice-Boltzmann (LB) code LB3D provides a number of algorithms and scripts designed for the simulation of binary and ternary amphiphilic complex fluid mixtures in bulk and in complex geometries using high performance computing environments. As in the case of Ludwig, LB3D focuses on the simulation of complex fluids. While Ludwig implements a top-down model where a free energy has to be provided by the user, LB3D is a bottom-up code in which interactions are specified between particle distributions. LB3D is the only package which handles amphiphilic fluids in such a manner. Originally implemented in 1999, LB3D has since been in constant use and development. This document starts out with an overview of the history of the development and scientific applications of the code. Following this, the paper considers LB3D in version 7.1, the latest release of a re-factored open-source instance of the program which has been in development since 2013 and is available in this version under the BSD 3-Clause license.
Following work on lattice-gas models [7,8], the development of LB3D started out in the group of Coveney then at Queen Mary University of London, as a summer project of then third year University of Oxford undergraduate student Chin who implemented the amphiphilic fluid model with assistance of Boghosian and Nekovee [9,10]. During the first decade of development and application, the amphiphilic fluid model was employed for extensive research on the properties of complex fluids. Publications on computational science developments in this period report aspects of distributed computing as an integral part of the design, including real time visualisation and computational steering [11][12][13][14][15]. Scientific contributions included the study of the spinodal decomposition and emulsion formation of binary fluid mixtures [16][17][18][19] as well as the self-organisation of mesoscopic phases in general and the cubic gyroid phase in particular [20][21][22]. Flow of complex fluids in porous media and under shear was investigated [23].
The utilisation of substantial resources such as the US TeraGrid and UK HPC facilities in a ground-breaking federation of national grid infrastructures [12][13][14], as well as the highest performance class of national supercomputers for capability computing [24,25], allowed scientists working with LB3D to perform large-scale numerical investigations of the rheological properties of the gyroid phase [26][27][28][29]. Later work explored the parameters of ternary amphiphilic fluids and their flow properties in detail [30][31][32][33][34].
Subsequent research with version 6 of LB3D, which remains unreleased, has focused on fluid surface interaction and coupling of particle models. Version 7.1 of LB3D, however, which is released in conjunction with this paper, includes the amphiphilic fluid model and has been restructured and optimised to ease future development. First steps have been taken to support hybrid parallel programming models in order to ensure compatibility with projected next generation supercomputer architectures on the path to the exascale.
A newly integrated Python interface for the configuration of initial conditions and interactive execution improves ease of use. Moreover, improved in-code documentation and an extended user manual are intended to encourage new users and developers to harness the potential of LB3D as well as to enhance it further.
The paper has the following structure. In Section 2 we describe the lattice-Boltzmann method (LBM) used. We do not provide a derivation of this model, but show the key points necessary for understanding the implementation; references to the relevant original papers are provided. In Section 3 we discuss implementation details such as memory organisation, communication, and scripting capabilities. In Section 4 we discuss the tests used to verify and validate the implementation. We also provide performance benchmarks for the code and compare performance and scaling with other LBM codes. The paper closes with a summary and an outlook on future development directions.

Method
In this section we give an overview of the single phase lattice-Boltzmann method as well as its multiphase extension and boundary conditions as implemented in LB3D.

Single phase lattice-Boltzmann
The lattice-Boltzmann equation is a popular method to simulate flows and hydrodynamic interactions in incompressible fluids [35][36][37]. It is a mesoscopic approach where the fluid is represented by populations that evolve according to a fully discrete analogue of the Boltzmann kinetic equation [38,39]. We write the lattice-Boltzmann equation in the following form,

f^*_{\sigma,i}(\mathbf{x},t) = f_{\sigma,i}(\mathbf{x},t) - \frac{h}{\tau_\sigma}\left[ f_{\sigma,i}(\mathbf{x},t) - f^{\mathrm{eq}}_i\big(n_\sigma(\mathbf{x},t), \mathbf{u}_\sigma(\mathbf{x},t)\big) \right], \quad (1)

where the pre-collisional populations f_{\sigma,i} and post-collisional populations f^*_{\sigma,i} are related through the streaming step

f_{\sigma,i}(\mathbf{x} + h\mathbf{c}_i, t + h) = f^*_{\sigma,i}(\mathbf{x},t). \quad (2)

Eq. (1) describes a collision step and neglects terms of order O(h^2/\tau) compared to the more common second-order accurate LBE [38][39][40][41]; however, this does not affect the density and flow field, which are of primary interest. The populations f_{\sigma,i}(\mathbf{x},t) and f^*_{\sigma,i}(\mathbf{x},t) represent number densities of a fluid species \sigma at discrete grid points \mathbf{x} and discrete time t, moving with discrete velocities \mathbf{c}_i. LB3D uses the so-called D3Q19 lattice, a three-dimensional Cartesian lattice with 19 velocities [42] obtained from a projected face centred hyper-cubic (FCHC) lattice. The lattice velocities \mathbf{c}_i can be derived systematically as the abscissae of a Gauss-Hermite quadrature in velocity space [40,43,44]. The D3Q19 lattice is illustrated in Fig. 1 along with the (arbitrary) numbering of the velocities \mathbf{c}_i implemented in LB3D. Eq. (1) describes the lattice-Bhatnagar-Gross-Krook (LBGK) collision model where the populations relax towards local equilibrium on a single time scale \tau_\sigma [42,45]. The equilibrium distribution f^{\mathrm{eq}}_i(n, \mathbf{u}) is a Hermite expansion of the Maxwell-Boltzmann distribution [38,46]. Here we choose an expansion including cubic terms in the velocity,

f^{\mathrm{eq}}_i(n, \mathbf{u}) = w_i n \left[ 1 + \frac{\mathbf{c}_i \cdot \mathbf{u}}{c_s^2} + \frac{Q_{i\alpha\beta}\, u_\alpha u_\beta}{2 c_s^4} + \frac{W_{i\alpha\beta\gamma}\, u_\alpha u_\beta u_\gamma}{6 c_s^6} \right], \quad (3)

where c_s denotes the speed of sound, w_i are lattice weights, and Q_{i\alpha\beta} and W_{i\alpha\beta\gamma} are the second and third rank isotropic tensors

Q_{i\alpha\beta} = c_{i\alpha} c_{i\beta} - c_s^2 \delta_{\alpha\beta}, \quad (4)

W_{i\alpha\beta\gamma} = c_{i\alpha} c_{i\beta} c_{i\gamma} - c_s^2 \left( c_{i\alpha}\delta_{\beta\gamma} + c_{i\beta}\delta_{\alpha\gamma} + c_{i\gamma}\delta_{\alpha\beta} \right). \quad (5)

Note that the Einstein convention for summation over Greek indices is implied. For the D3Q19 lattice, the speed of sound is c_s = 1/\sqrt{3} and the lattice weights w_i are [42]

w_i = 1/3 \text{ for } |\mathbf{c}_i| = 0, \quad w_i = 1/18 \text{ for } |\mathbf{c}_i| = 1, \quad w_i = 1/36 \text{ for } |\mathbf{c}_i| = \sqrt{2}. \quad (6)

Hydrodynamic fields are obtained from the populations by calculating velocity moments, e.g., density and momentum density are given by

\rho_\sigma(\mathbf{x},t) = m_\sigma n_\sigma(\mathbf{x},t) = m_\sigma \sum_i f^*_{\sigma,i}(\mathbf{x},t), \quad (7)

\tilde{\mathbf{p}}_\sigma(\mathbf{x},t) = m_\sigma \sum_i f^*_{\sigma,i}(\mathbf{x},t)\, \mathbf{c}_i, \quad (8)

where m_\sigma is the molecular mass. In the presence of a force density \mathbf{F}_\sigma(\mathbf{x},t), the lattice momentum density \tilde{\mathbf{p}}_\sigma(\mathbf{x},t) has to be corrected for a discretisation effect, such that the hydrodynamic momentum density of the fluid is [36,40,41,47]

\mathbf{p}_\sigma(\mathbf{x},t) = \tilde{\mathbf{p}}_\sigma(\mathbf{x},t) - \frac{h}{2}\, \mathbf{F}_\sigma(\mathbf{x},t). \quad (9)

Here the minus sign stems from the fact that we calculate the moments from the post-collisional distributions. Finally, the hydrodynamic flow field is

\mathbf{u}_\sigma(\mathbf{x},t) = \frac{\mathbf{p}_\sigma(\mathbf{x},t)}{\rho_\sigma(\mathbf{x},t)}, \quad (10)

which is used in the lattice-Boltzmann equation (1). Generally the force density \mathbf{F}_\sigma(\mathbf{x},t) may accommodate any kind of contribution. In particular, \mathbf{F}_\sigma(\mathbf{x},t) may include both intermolecular forces (Shan-Chen forces in our model, cf. Section 2.2) and external forces such as gravity or an artificial Kolmogorov scaled force [48].
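To make the structure of Eqs. (3)-(6) concrete, the following NumPy sketch evaluates the cubic equilibrium distribution for a single lattice site; the velocity ordering is illustrative and need not match the (arbitrary) numbering used in LB3D.

```python
import numpy as np

# D3Q19 velocities: all integer vectors with squared length 0, 1 or 2.
# The ordering here is illustrative; LB3D uses its own (arbitrary) numbering.
C = np.array([(x, y, z) for x in (-1, 0, 1) for y in (-1, 0, 1)
              for z in (-1, 0, 1) if x * x + y * y + z * z in (0, 1, 2)])
CS2 = 1.0 / 3.0                                  # squared speed of sound
W = np.array([{0: 1 / 3, 1: 1 / 18, 2: 1 / 36}[int(v @ v)] for v in C])

def f_eq(n, u):
    """Cubic equilibrium distribution, Eq. (3), for a single lattice site."""
    cu = C @ u                                   # c_i . u for every i
    u2 = u @ u
    quu = cu ** 2 - CS2 * u2                     # Q_{iab} u_a u_b, Eq. (4)
    wuuu = cu ** 3 - 3.0 * CS2 * u2 * cu         # W_{iabg} u_a u_b u_g, Eq. (5)
    return W * n * (1.0 + cu / CS2 + quu / (2 * CS2 ** 2) + wuuu / (6 * CS2 ** 3))

# The zeroth and first velocity moments recover density and momentum:
n, u = 1.0, np.array([0.02, -0.01, 0.005])
f = f_eq(n, u)
assert np.isclose(f.sum(), n)
assert np.allclose(f @ C, n * u)
```

Note that the cubic contribution does not alter the conserved moments; its purpose is to reduce non-Galilean-invariant error terms at larger flow velocities.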

Shan-Chen multiphase model
The Shan-Chen approach [49] provides a straightforward way to model multi-component (e.g. water and oil) and/or multi-phase fluids (e.g. water and water vapour). The single phase model is extended by the so-called pseudo-potential method to include multiphase interactions [50,51]. Each fluid component σ is governed by the lattice-Boltzmann equation (1).
The total fluid density is simply

\rho(\mathbf{x},t) = \sum_\sigma \rho_\sigma(\mathbf{x},t). \quad (11)

It is assumed that there is a common velocity \tilde{\mathbf{u}} for all components.
In the absence of forces, it can be shown that total momentum is conserved if the common velocity is chosen to be [49]

\tilde{\mathbf{u}}(\mathbf{x},t) = \left( \sum_\sigma \frac{\tilde{\mathbf{p}}_\sigma(\mathbf{x},t)}{\tau_\sigma} \right) \Big/ \left( \sum_\sigma \frac{\rho_\sigma(\mathbf{x},t)}{\tau_\sigma} \right). \quad (12)

Note that the common velocity \tilde{\mathbf{u}}(\mathbf{x},t) implies an effective momentum exchange between the species even when no explicit forces are prescribed.
In addition, each fluid component may be subject to an explicit force density \mathbf{F}_\sigma(\mathbf{x},t). Similar to the single phase case, we need to redefine the hydrodynamic momentum to account for discretisation effects. The hydrodynamic flow field for the multiphase fluid is then given by [49]

\mathbf{u}(\mathbf{x},t) = \frac{1}{\rho(\mathbf{x},t)} \sum_\sigma \left[ \tilde{\mathbf{p}}_\sigma(\mathbf{x},t) - \frac{h}{2}\, \mathbf{F}_\sigma(\mathbf{x},t) \right]. \quad (13)
If the fluid as a whole is subject to a force density \mathbf{F}(\mathbf{x},t), it has to be distributed to each fluid component proportionally to the mass fraction [52]:

\mathbf{F}_\sigma(\mathbf{x},t) = \frac{\rho_\sigma(\mathbf{x},t)}{\rho(\mathbf{x},t)}\, \mathbf{F}(\mathbf{x},t). \quad (14)

Eq. (14) ensures that the total force density obeys \mathbf{F}(\mathbf{x},t) = \sum_\sigma \mathbf{F}_\sigma(\mathbf{x},t) and that the acceleration of each component is identical. The basic idea of the Shan-Chen approach for multiphase fluids is to introduce coupling forces which are non-local and depend on density gradients,

\mathbf{F}^{\mathrm{int}}_\sigma(\mathbf{x},t) = -\psi_\sigma(\mathbf{x},t) \sum_{\sigma'} \sum_{\mathbf{x}'} G_{\sigma\sigma'}(\mathbf{x},\mathbf{x}')\, \psi_{\sigma'}(\mathbf{x}',t)\, (\mathbf{x}' - \mathbf{x}). \quad (15)

Here, \psi_\sigma(\mathbf{x},t) represents an effective mass of component \sigma which is realised as a function of the component density \rho_\sigma(\mathbf{x},t) [49],

\psi_\sigma(\mathbf{x},t) = \rho_0 \left[ 1 - \exp\big(-\rho_\sigma(\mathbf{x},t)/\rho_0\big) \right], \quad (16)

where \rho_0 is a reference density. The Green's function G_{\sigma\sigma'}(\mathbf{x},\mathbf{x}') must be symmetric in \sigma and \mathbf{x} to ensure global momentum conservation,

G_{\sigma\sigma'}(\mathbf{x},\mathbf{x}') = G_{\sigma'\sigma}(\mathbf{x}',\mathbf{x}). \quad (17)

We choose G_{\sigma\sigma'}(\mathbf{x},\mathbf{x}') to be short-ranged and allow interaction only between neighbouring lattice sites [52],

G_{\sigma\sigma'}(\mathbf{x},\mathbf{x}') = \begin{cases} g_{\sigma\sigma'} & \mathbf{x}' = \mathbf{x} + h\mathbf{c}_i \\ 0 & \text{otherwise.} \end{cases} \quad (18)

Inserting Eq. (18) into Eq. (15), we can now rewrite the interaction force as

\mathbf{F}^{\mathrm{int}}_\sigma(\mathbf{x},t) = -\psi_\sigma(\mathbf{x},t) \sum_{\sigma'} g_{\sigma\sigma'} \sum_i \psi_{\sigma'}(\mathbf{x} + h\mathbf{c}_i, t)\, \mathbf{c}_i. \quad (19)

Note that g_{\sigma\sigma'} is the coupling strength of components \sigma and \sigma'.
The number of components is not limited by the Shan-Chen model. Also note that non-zero values of g_{\sigma\sigma} allow for self-interaction of a component \sigma. The pseudo-potential forces introduce a density-gradient-dependent term into the equation of state which ensures phase separation above a critical coupling strength for a given overall density. Masses are determined only up to a constant, allowing the use of unit masses where convenient. A mass contrast between components implies a contrast in dynamic viscosity and warrants the introduction of the effective common equilibrium velocity (12). No differences in mass have been used in our published work so far; in the low Mach number limit, the common velocity then equals the single-component bulk velocity.
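As an illustration of Eqs. (16) and (19), the following sketch evaluates the nearest-neighbour pseudo-potential force for an arbitrary number of components on a fully periodic lattice; it is a minimal NumPy rendering of the scheme, not the optimised LB3D implementation.

```python
import numpy as np

# D3Q19 velocities (illustrative ordering).
C = np.array([(x, y, z) for x in (-1, 0, 1) for y in (-1, 0, 1)
              for z in (-1, 0, 1) if x * x + y * y + z * z in (0, 1, 2)])

def psi(rho, rho0=1.0):
    """Effective mass, Eq. (16): psi = rho0 * (1 - exp(-rho/rho0))."""
    return rho0 * (1.0 - np.exp(-rho / rho0))

def shan_chen_force(rho, g):
    """Discrete interaction force, Eq. (19), on a periodic lattice.

    rho : component densities, shape (S, nx, ny, nz)
    g   : coupling strengths g_{sigma sigma'}, shape (S, S)
    """
    p = psi(rho)
    F = np.zeros(rho.shape + (3,))
    for s in range(rho.shape[0]):
        for sp in range(rho.shape[0]):
            acc = np.zeros(rho.shape[1:] + (3,))
            for ci in C:
                # psi_{sigma'}(x + h c_i) via a periodic shift of the lattice
                shifted = np.roll(p[sp], shift=tuple(-ci), axis=(0, 1, 2))
                acc += shifted[..., None] * ci
            F[s] -= g[s, sp] * p[s][..., None] * acc
    return F
```

The symmetry g_{\sigma\sigma'} = g_{\sigma'\sigma} together with Eq. (17) guarantees that the forces on any pair of neighbouring sites cancel globally, which is exploited in the actual implementation to halve the computed force components (cf. Fig. 6).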

Amphiphilic fluid components
LB3D supports fluid mixtures of up to three components, one of which is amphiphilic (oil or red, water or blue, and surfactant or green). Amphiphilic properties of the surfactant component \alpha are modelled by the introduction of a dipole field \mathbf{d}(\mathbf{x},t) representing the average molecule orientation at each lattice site [10]. The orientation of the dipole vector \mathbf{d} is allowed to vary continuously. It is advected with the fluid and thus propagates according to

\mathbf{d}(\mathbf{x}, t+h) = \frac{\sum_i f^*_{\alpha,i}(\mathbf{x} - h\mathbf{c}_i, t)\, \mathbf{d}^*(\mathbf{x} - h\mathbf{c}_i, t)}{\sum_i f^*_{\alpha,i}(\mathbf{x} - h\mathbf{c}_i, t)}, \quad (20)

where \mathbf{d}^* denotes the post-collisional dipole vector which is relaxed through a BGK process on a single timescale \tau_d,

\mathbf{d}^*(\mathbf{x},t) = \mathbf{d}(\mathbf{x},t) - \frac{h}{\tau_d} \left[ \mathbf{d}(\mathbf{x},t) - \mathbf{d}^{\mathrm{eq}}(\mathbf{x},t) \right]. \quad (21)

The equilibrium orientation is derived from a mean field approach. The mean dipole field in the surrounding fluid can be written as

\mathbf{h}(\mathbf{x},t) = \mathbf{h}_c(\mathbf{x},t) + \mathbf{h}_\alpha(\mathbf{x},t), \quad (22)

where the contribution from the ordinary species is

\mathbf{h}_c(\mathbf{x},t) = \sum_\sigma q_\sigma \sum_{i \neq 0} n_\sigma(\mathbf{x} + h\mathbf{c}_i, t)\, \mathbf{c}_i. \quad (23)

Here q_\sigma is a colour charge for the ordinary species. The contribution from the amphiphilic species can be written as

\mathbf{h}_\alpha(\mathbf{x},t) = \sum_{i \neq 0} n_\alpha(\mathbf{x} + h\mathbf{c}_i, t)\, \mathsf{D}_i\, \mathbf{d}(\mathbf{x} + h\mathbf{c}_i, t), \quad (24)

where the traceless second rank tensor \mathsf{D}_i is given by

D_{i,\alpha\beta} = \delta_{\alpha\beta} - D\, \frac{c_{i\alpha} c_{i\beta}}{\mathbf{c}_i^2}, \quad (25)

with D = 3 the spatial dimension. The equilibrium dipole configuration can then be derived from a Boltzmann distribution, which gives

\mathbf{d}^{\mathrm{eq}}(\mathbf{x},t) = d_0 \left[ \coth\xi - \frac{1}{\xi} \right] \hat{\mathbf{h}}(\mathbf{x},t), \quad (26)

where

\xi = \beta\, d_0\, |\mathbf{h}(\mathbf{x},t)|, \quad (27)

\beta is an inverse temperature, and d_0 is a parameter representing the intrinsic dipole strength of the amphiphiles.
With an amphiphilic species present, additional force components are introduced between the species. In contrast to the purely density-dependent forces of the binary model, the resulting forces do not depend on the fluid densities alone, but also on the dipole moment (with its relative orientation to boundaries and neighbouring dipoles). The force on the oil and water components \sigma is now

\mathbf{F}_\sigma(\mathbf{x},t) = \mathbf{F}^{\mathrm{int}}_\sigma(\mathbf{x},t) + \mathbf{F}_{\sigma\alpha}(\mathbf{x},t), \quad (28)

where \mathbf{F}^{\mathrm{int}}_\sigma is given in Eq. (19), and the additional force due to the amphiphiles is

\mathbf{F}_{\sigma\alpha}(\mathbf{x},t) = -2\, g_{\sigma\alpha}\, \psi_\sigma(\mathbf{x},t) \sum_{i \neq 0} \psi_\alpha(\mathbf{x} + h\mathbf{c}_i, t)\, \mathsf{D}_i\, \mathbf{d}(\mathbf{x} + h\mathbf{c}_i, t). \quad (29)

Similarly, the force on the amphiphilic component is

\mathbf{F}_\alpha(\mathbf{x},t) = \sum_\sigma \mathbf{F}_{\alpha\sigma}(\mathbf{x},t) + \mathbf{F}_{\alpha\alpha}(\mathbf{x},t), \quad (30)

where \mathbf{F}_{\alpha\sigma} is the reaction force to \mathbf{F}_{\sigma\alpha} and has the form

\mathbf{F}_{\alpha\sigma}(\mathbf{x},t) = 2\, g_{\alpha\sigma}\, \psi_\alpha(\mathbf{x},t)\, \mathbf{d}(\mathbf{x},t) \cdot \sum_{i \neq 0} \mathsf{D}_i\, \psi_\sigma(\mathbf{x} + h\mathbf{c}_i, t). \quad (31)

Finally, the force acting between surfactant components on neighbouring sites is given by

\mathbf{F}_{\alpha\alpha}(\mathbf{x},t) = -\frac{4D}{c^2}\, g_{\alpha\alpha}\, \psi_\alpha(\mathbf{x},t) \sum_{i \neq 0} \psi_\alpha(\mathbf{x} + h\mathbf{c}_i, t) \left[ \mathbf{d}(\mathbf{x},t) \cdot \mathsf{D}_i\, \mathbf{d}(\mathbf{x} + h\mathbf{c}_i, t) \right] \mathbf{c}_i. \quad (32)

The parameters g_{\sigma\alpha}, g_{\alpha\sigma}, and g_{\alpha\alpha} in Eqs. (29), (31), and (32) are the coupling strengths between the water and oil components \sigma and the amphiphilic component \alpha, respectively. Details on the derivation of the force terms can be found in [10,51]. These forces are added in the algorithm in a manner analogous to the pseudo-potential force inducing the phase transition in the binary multi-component system.

Boundary conditions
In the present lattice-Boltzmann model, each lattice site contains either fluid components or an obstacle, e.g., a boundary wall. LB3D v7.1 supports a number of different boundary closures for the unknown pre-collisional populations on fluid nodes that are adjacent to one or more boundary nodes. In the following we describe the simple bounce-back rule and on-site rules for Dirichlet and Neumann boundary conditions that enable inlet and outlet boundary conditions. Furthermore, we describe wetting boundary conditions for surfaces that have specific affinities with respect to different fluid species.

Bounce-back boundary conditions
The bounce-back boundary condition was originally introduced for lattice-gas models and poses a simple way to implement a no-slip boundary condition located approximately halfway between a boundary node and an adjacent fluid site [53][54][55]. Simple mid-grid boundary conditions achieve zero velocity on a link connecting fluid and an obstacle by inverting the velocity of the impinging populations. This reflection can be written as a modified streaming step, cf. Eq. (2),

f_{\sigma,\bar{i}}(\mathbf{x}, t + h) = f^*_{\sigma,i}(\mathbf{x}, t), \quad (33)

where \mathbf{x} + h\mathbf{c}_i is a solid site, and the index \bar{i} indicates the inverse velocity of i, i.e., \mathbf{c}_{\bar{i}} = -\mathbf{c}_i. For the simple bounce-back rule, the effective location of the no-slip boundary condition is approximately halfway between fluid and solid. A detailed analysis shows that the exact location depends on the collision model employed, and for the LBGK model the location is viscosity dependent [56][57][58].
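A minimal NumPy sketch of the half-way bounce-back rule, Eq. (33), applied during streaming follows; the post-collisional populations are assumed to vanish on solid sites, and the velocity ordering is illustrative.

```python
import numpy as np

# D3Q19 velocities (illustrative ordering) and the inverse-direction
# index pairing c_ibar = -c_i used in Eq. (33).
C = np.array([(x, y, z) for x in (-1, 0, 1) for y in (-1, 0, 1)
              for z in (-1, 0, 1) if x * x + y * y + z * z in (0, 1, 2)])
IBAR = np.array([int(np.where((C == -c).all(axis=1))[0][0]) for c in C])

def stream_with_bounce_back(fstar, obstacle):
    """Streaming, Eq. (2), with half-way bounce-back, Eq. (33).

    fstar    : post-collisional populations, shape (19, nx, ny, nz),
               assumed zero on solid sites
    obstacle : boolean mask of solid sites, shape (nx, ny, nz)
    """
    fnew = np.zeros_like(fstar)
    for i, ci in enumerate(C):
        # Does the link x -> x + h c_i end on a solid site?
        blocked = np.roll(obstacle, shift=tuple(-ci), axis=(0, 1, 2))
        # Regular streaming along open links (periodic wrap at the edges).
        fnew[i] += np.roll(np.where(blocked, 0.0, fstar[i]),
                           shift=tuple(ci), axis=(0, 1, 2))
        # Blocked populations are reflected into the inverse direction.
        fnew[IBAR[i]] += np.where(blocked, fstar[i], 0.0)
    return fnew
```

The reflected population stays at its origin site with inverted velocity, which places the effective wall halfway along the blocked link, as discussed above.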

On-site velocity boundary conditions
It is often desirable to specify the exact position where Dirichlet or Neumann boundary conditions are to be satisfied independently of other simulation parameters. To this end, on-site boundary conditions allow us to specify the velocity or fluxes on the boundary and lead to a fully local closure relation. This approach was originally suggested by Zou and He [59] for D2Q9 and D3Q15 models, and later extended to D3Q19 lattices [60][61][62].
For on-site Neumann (or pressure) boundary conditions, one uses Eqs. (7) and (8) to calculate the unknown velocity component on the boundary from the two known ones and a specified density \rho_0. Similarly, for on-site Dirichlet (or flux) boundary conditions, one can calculate the density \rho from the two known velocity components and the third, specified component. In order to determine the unknown populations, one applies the bounce-back condition to the non-equilibrium part of the populations,

f_i - f^{\mathrm{eq}}_i = f_{\bar{i}} - f^{\mathrm{eq}}_{\bar{i}}, \quad (34)

which, e.g., for a boundary plane parallel to the xy-plane with inward normal along +z (populations labelled by their velocity vectors), leads to an expression for the unknown population along the boundary normal,

f_{(0,0,1)} = f_{(0,0,-1)} + \frac{1}{3}\, \rho u_z. \quad (35)

The equation system for the remaining unknown populations is over-determined, which can be remedied by introducing additional transverse momentum corrections that reflect the stresses introduced by the boundary conditions. The equation system for the remaining populations then reads

f_{(\pm 1, 0, 1)} = f_{(\mp 1, 0, -1)} + \frac{\rho}{6} (u_z \pm u_x) \mp N_x, \qquad f_{(0, \pm 1, 1)} = f_{(0, \mp 1, -1)} + \frac{\rho}{6} (u_z \pm u_y) \mp N_y, \quad (36)

where the transverse momentum corrections are given by

N_x = \frac{1}{2} \left[ f_{(1,0,0)} + f_{(1,1,0)} + f_{(1,-1,0)} - f_{(-1,0,0)} - f_{(-1,1,0)} - f_{(-1,-1,0)} \right] - \frac{1}{3}\, \rho u_x, \quad (37)

and analogously for N_y. These expressions close the boundary equations. More details and the generalisation to arbitrary flow directions can be found in [62].
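Before the closure (34)-(37) can be applied, the boundary density or velocity must be obtained from the known populations. The following sketch, under the same bottom-wall convention as above, computes u_z for a pressure boundary with prescribed density \rho_0; it illustrates the mass-balance step, not the actual LB3D routine.

```python
import numpy as np

# D3Q19 velocities (illustrative ordering, cf. the equilibrium sketch).
C = np.array([(x, y, z) for x in (-1, 0, 1) for y in (-1, 0, 1)
              for z in (-1, 0, 1) if x * x + y * y + z * z in (0, 1, 2)])

def normal_velocity_at_pressure_boundary(f, rho0):
    """Unknown normal velocity at a minimum-z boundary with given rho0.

    From Eqs. (7) and (8): rho = sum_i f_i and rho*u_z = sum_i f_i c_iz.
    Eliminating the unknown populations (those with c_iz > 0) yields
    u_z = 1 - (sum_{c_iz=0} f_i + 2 sum_{c_iz<0} f_i) / rho0.
    f : the 19 populations of a single boundary site
    """
    tangential = f[C[:, 2] == 0].sum()
    incoming = f[C[:, 2] < 0].sum()
    return 1.0 - (tangential + 2.0 * incoming) / rho0
```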

Wetting boundary conditions
Specific surface interactions can be realised by introducing a pseudo-density \psi_W at obstacle sites \mathbf{x}', which is used to calculate a pseudo-potential interaction between the fluid components and the surface,

\mathbf{F}^{W}_\sigma(\mathbf{x},t) = -\psi_\sigma(\mathbf{x},t)\, g_{\sigma W} \sum_{i:\, \mathbf{x} + h\mathbf{c}_i \in W} \psi_W\, \mathbf{c}_i, \quad (38)

in analogy to Eq. (19). This strategy was first introduced by Martys and Chen [52]. The wetting boundary conditions are usually augmented with bounce-back or on-site boundary conditions.

Implementation
The core of the lattice-Boltzmann code LB3D is written in FORTRAN 90/95. In addition, version 7.1 provides conduits written in C to facilitate a Python interface that supports scripting during the initialisation stage. The core code is parallelised using MPI-2 for distributed memory and OpenMP for shared memory. It can be compiled and run using arbitrary combinations of MPI and OpenMP threads. LB3D makes use of several external libraries including Python, MPI, OpenMP, HDF5, and XDR. The testing framework and supplementary tools are written in Python.
The execution flow of LB3D is structured in self-contained initialisation, simulation, and finalisation stages. The corresponding subroutine call structure is illustrated in Fig. 2. In the initialisation stage, the code reads an input file describing the simulation setup (number of components, initial conditions, simulation time, output options) as well as the physical parameters such as τ σ , g σ σ ′ , etc. The input file can also specify Python scripts to be executed during initialisation in order to introduce boundary conditions and to modify the initial fluid state (cf. Section 3.7). The simulation stage evaluates the lattice-Boltzmann equation until the specified simulation time is reached. Each time step involves several substeps for the lattice-Boltzmann algorithm, data caching, parallel communication, and file I/O. After completion of the simulation stage, the finalisation stage shuts down the MPI environment, deallocates memory, and polls and prints the execution timers before the program terminates successfully.
We now outline the essential elements of the implementation in more detail; this serves as a development guide for future extensions.

Data structures
The core data structure of the implementation is a five-dimensional array of fluid populations f_{\sigma,i}(x, y, z) for the current time step, with index order i, x, y, z, and \sigma. The layout is illustrated in Fig. 3. FORTRAN column-major order implies indices varying fastest from left to right. We evaluated different memory layouts for the populations array and found this order to be the most effective for the present algorithm implementation. A five-dimensional array is used to store forces F_\sigma(x, y, z, t) with index order \alpha, x, y, z, \sigma (\alpha denoting the vector component). Information on solid boundary conditions is stored in a three-dimensional array of obstacle flags with indices x, y, z. In addition, a four-dimensional array for cached fluid densities \rho_\sigma(x, y, z) with indices x, y, z, and \sigma is used.
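The following NumPy fragment mimics this layout and verifies the two properties that motivate it: site-local population access for the collision step, and per-component contiguity that simplifies MPI datatype definitions. It is an illustration of the index order, not the actual FORTRAN declaration; the array extents are arbitrary.

```python
import numpy as np

# Population array f_{sigma,i}(x, y, z): index order (i, x, y, z, sigma)
# in column-major (Fortran) storage, so the velocity index varies fastest.
nvel, nx, ny, nz, nspecies = 19, 16, 16, 16, 2
f = np.zeros((nvel, nx, ny, nz, nspecies), order='F')

strides = np.array(f.strides) // f.itemsize
assert strides[0] == 1                      # the 19 populations of one site are adjacent
assert strides[-1] == nvel * nx * ny * nz   # each component is one contiguous block
```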

System initialisation
System initialisation can be performed either from scratch using a set of input options or by re-initialisation from checkpoint files in HDF5 format. The initialisation subroutine first reads any command line arguments including the path to a set of input files. The parameter input file is expected in .ini file format and processed by the module fini_parser. It is structured in sections for the simulation environment, common simulation parameters, physics parameters and output properties. Settings for special boundary conditions and checkpoints are given in separate sections. A unique feature of LB3D is the ability to adjust the system initialisation by providing a Python script which can modify density values for the respective fluid components as well as boundary geometries and wetting interaction parameters. Example input files are given in Figs. 4 and 5. Detailed explanations of all available input parameters can be found in the LB3D user manual.
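As a flavour of such a script, the fragment below initialises a solid sphere in a homogeneous binary mixture; the helper names set_obstacle() and set_fluid() are hypothetical stand-ins for the actual bindings documented in the manual [64] and exemplified in Fig. 5.

```python
# Hypothetical initialisation script: set_obstacle() and set_fluid() are
# illustrative stand-ins, not the real LB3D binding names (see [64]).

def initialise(lb3d, nx, ny, nz, radius):
    """Place a solid sphere at the domain centre in a homogeneous mixture."""
    cx, cy, cz = nx // 2, ny // 2, nz // 2
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                r2 = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                if r2 <= radius ** 2:
                    lb3d.set_obstacle(x, y, z)            # solid site
                else:
                    lb3d.set_fluid(x, y, z, red=0.7, blue=0.7)
```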
During initialisation, the MPI parallel environment is set up, MPI and HDF5 data structures are instantiated, the random number generators are seeded, and general variable defaults are set. The simulation parameters are then set to the values specified in the input file, where a set of compatibility checks is executed to catch conflicting options and parametrisations. Subsequently, the lattice data structures are allocated and initialised with a velocity field and obstacle site distribution, where the LB3D Python interface is invoked to execute the specified scripts. Moreover, additional properties such as specific interaction forces as well as in- and outflow boundary conditions are initialised. If restoration from a checkpoint is requested, simulation parameters and lattice data are read from the specified HDF5 files. Once the lattice data has been initialised, any derived properties are computed and an initial MPI communication is performed. The initialisation concludes by writing data output for time step zero and performing a sanity check, i.e. validating the numerical stability of the physical properties of the system.

Algorithmic subroutines
In order to evaluate the lattice-Boltzmann equation (1), the following sequence of subroutine calls is implemented. The execution of these steps is performed in the main time loop subroutine of LB3D as illustrated in Fig. 6. The computation steps are interspersed with communication steps as required by the parallelisation, cf. Fig. 7. In addition, optional steps can be executed for input/output, checkpointing, and sanity checks. In contrast to other lattice-Boltzmann implementations, the algorithm starts by performing communication, followed by an advection step. Subsequently, the calculation and caching of data required by boundary conditions and interaction forces is performed before the collision step.
In the current implementation the algorithm performs five to six complete spatial loops: one each for the advection and collision steps, one for the calculation of pseudo-potential forces, and two more to pre-calculate and cache momentum and forces. A final, optional loop is required for the calculation of dipolar interactions in the ternary model. Dividing the algorithm in this fashion provides a clear structure for the integration of future features. Here the focus is not on optimal performance but rather on readability and extensibility.

Parallelisation
Simulations are run on a three-dimensional rectangular lattice of size tnx × tny × tnz with periodic boundary conditions as the default. In order to run parallel jobs, the lattice is divided at program start-up into blocks of equal size nx × ny × nz. The exact lattice subdivision may be either specified by the user or chosen automatically by the MPI implementation. The number of lattice sites along each axis must be divisible by the number of processes used along that axis.
The spatial dimensions nx, ny, nz of all arrays on each CPU are extended by a halo region which contains copies of lattice blocks from neighbouring CPUs. The algorithms implemented in the current code require a halo depth of one lattice site, but it can be set to any value for more complicated algorithms. MPI-2 array data types are instantiated for the halo regions such that a single pair of MPI send and receive calls can be used for communication between MPI processes.
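The pattern can be sketched with mpi4py for a one-dimensional slab decomposition along z with a depth-one halo; LB3D itself uses MPI derived datatypes over a full three-dimensional decomposition, so this is a simplified stand-in.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
up = (comm.rank + 1) % comm.size            # periodic neighbours along z
down = (comm.rank - 1) % comm.size

nx, ny, nz = 8, 8, 8                        # local block size without halo
field = np.zeros((nx, ny, nz + 2))          # one halo layer on each z-face

def exchange_halo(f):
    """Fill both z-halo layers from the periodic neighbours."""
    buf = np.empty((nx, ny))
    # top physical layer -> upper neighbour's bottom halo
    comm.Sendrecv(np.ascontiguousarray(f[:, :, -2]), dest=up,
                  recvbuf=buf, source=down)
    f[:, :, 0] = buf
    # bottom physical layer -> lower neighbour's top halo
    comm.Sendrecv(np.ascontiguousarray(f[:, :, 1]), dest=down,
                  recvbuf=buf, source=up)
    f[:, :, -1] = buf

exchange_halo(field)
```

Using paired send/receive calls per direction avoids deadlock and mirrors the single pair of MPI calls per halo face described above.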
Shared memory parallelisation using OpenMP is enabled by wrapping the various spatial loops responsible for the calculation of the respective algorithm steps. Advection is performed using a single lattice array, buffering adjacent lattice site information for one z-layer at a time. Creation of the buffer is multi-threaded separately from the subsequent advection calculation.

Data output
LB3D writes physical properties of the fluids to output files at specified time intervals. The writeable properties include the densities of the components and the flow velocity, and the frequency at which data is written can be specified in the input file. Generally, no outputs are written until time step n_sci_start, after which data is written every n_sci_OUTPUT time steps, where OUTPUT stands for a specific property. A value of n_sci_OUTPUT = 0 disables the output of the respective property. The output file format is the Hierarchical Data Format (HDF) [63], which facilitates portable, platform-independent data storage. HDF provides the possibility to add metadata to the raw data files, and LB3D makes use of this feature by adding specific information on parameters as detailed in the manual [64].
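Output files can therefore be inspected with any HDF5-aware tool; a minimal h5py sketch follows, in which the file name is a hypothetical example and the actual dataset names and metadata are described in the manual [64].

```python
import h5py

# Hypothetical output file name; LB3D output naming is documented in [64].
with h5py.File('output/density_red_t001000.h5', 'r') as h5:
    name = list(h5.keys())[0]        # first dataset in the file
    data = h5[name][...]             # load the full scalar field
    print(name, data.shape, data.mean())
    print(dict(h5[name].attrs))      # metadata attached to the dataset
```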
LB3D can be instructed to produce checkpoint files at specified time intervals. These files can be used to restart the simulation from a given configuration. When a simulation is restarted from a checkpoint, it is possible to override any of the simulation parameters and even to re-apply initialisation scripts. Checkpoints may therefore be used to create non-trivial initial conditions and for ad hoc steering of simulations. Both output and checkpoint files are written in HDF5 [65].

Compilation and installation

LB3D is built with a configure script and make:

$ ./configure --with-lb3d-py-path=/PATH/TO/lb3d/py
$ make

By default LB3D builds the binary lb3d in the /lb3d/src directory. Invoking make install by default installs to /usr/local/lb3d. If configure is provided with an install prefix, the binary is copied to /PREFIX/bin/lb3d and the Python bindings to /PREFIX/share/lb3d/py. More details on the installation process can be found in the user manual [64].

Workflow
In order to illustrate the typical workflow of an LB3D simulation, we use the porous media example, the results of which are presented in Section 4. The input file is listed in Fig. 4 and the Python script generating the geometry is listed in Fig. 5. The input file specifies simulation parameters like the number of fluid species, the system size, and the number of time steps to simulate. Further initial parameters can be included through the Python bindings. The Python script contains functions to set and unset obstacle and fluid parameters on individual lattice sites or in specific geometries. It is also possible to load information from an XDR file which defines the applicable boundary conditions for each lattice site (details can be found in the user manual [64]).

Fig. 6: Algorithm execution in the main time loop of LB3D. As the initialisation step includes a first collision calculation to enforce well-formed initial distributions, the time loop starts with a communication step and executes advection first. To avoid re-calculation of frequently used properties, locally conserved density and momentum as well as interaction forces are cached. Symmetry in the interaction forces permits halving of the computed force components. Every time step or interval can be configured to include checks of system stability and validity, termed sanity checks, as well as scientific and checkpoint output.

All file paths are specified relative to the current execution directory. The actual simulation is started by invoking

$ mpirun -n NUMBER_OF_CORES ./lb3d -f input.in

where the mpirun command is dependent on the specific system configuration and MPI library. By default LB3D writes output to the directory ./output/ relative to the execution directory. Data analysis can be performed using a range of available HDF5 tools for different languages including Python, C/C++, and FORTRAN. For example, the package h5utils provides a number of converters, inter alia to ASCII and VTK formats for visualisation. More details on the available configuration and evaluation options can be found in the user manual [64].

Case studies
The development of LB3D version 7.1 involved substantial refactoring of an earlier version of the code. Besides integrating new features and cleaning up the input/output routines, the parallelisation structure was extended to make use of shared memory strategies within nodes comprising ever more cores. In this section we report the results of physics validation as well as performance benchmarks. This is done to confirm the accuracy of the added and altered functionality and to probe the parametrisation of the hybrid parallel approach on the current UK National Supercomputer, ARCHER.

Validation
To ensure that the re-factoring has preserved all model properties, exemplary mesophase simulations originally performed with prior versions of LB3D have been reproduced. Due to the deterministic nature of the lattice-Boltzmann method and the absence of random initial and boundary conditions, the obtained results reproduce values calculated by earlier versions exactly. Here we describe the simulation parametrisation and results for three test cases. For the spinodal decomposition of an amphiphilic mixture we report the evolution of the average fluid domain size and evaluate the observed power law. In a second test case we make use of the Python bindings and measure the permeability of a model porous medium. Finally, we investigate the qualitative formation of amphiphilic mesophases.

Spinodal decomposition of an amphiphilic mixture
Spinodal decomposition is the process of rapid demixing of immiscible fluids, e.g. water and oil. The phase separation is governed by surface tension effects and characteristically exhibits power-law growth of the domains of the demixing components [16,66]. The addition of an amphiphilic component reduces the surface tension between the components and slows the spinodal decomposition down. The resulting dynamics still follow a power law, but the growth exponent of the observed domain sizes is reduced [20,[67][68][69].
In order to evaluate the dynamics of the ternary model implemented in LB3D, we validate the implementation against simulation results published by some of us previously [20,69].
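A common way to extract the average domain size L(t) is via the first moment of the structure factor of the order parameter field; the sketch below implements this standard estimator from the spinodal decomposition literature, not necessarily the exact LB3D analysis script.

```python
import numpy as np

def domain_size(phi):
    """L = 2*pi/<k> with <k> = sum(k S(k)) / sum(S(k)).

    phi : order parameter field (e.g. the density difference of the
          immiscible components) on a periodic lattice
    """
    s = np.abs(np.fft.fftn(phi - phi.mean())) ** 2      # structure factor
    freqs = [2 * np.pi * np.fft.fftfreq(n) for n in phi.shape]
    kx, ky, kz = np.meshgrid(*freqs, indexing='ij')
    k = np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)
    return 2 * np.pi * s.sum() / (k * s).sum()

# The growth exponent then follows from a fit of log L against log t, e.g.
# alpha, _ = np.polyfit(np.log(times), np.log(sizes), 1)
```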

Permeability in a model porous medium
Another area of application of LB3D is the modelling of the flow of fluid mixtures over solid surfaces and in porous media. A phenomenological property used to classify porous media is the permeability \kappa, which quantifies the relation between a pressure gradient applied across a medium and the resulting flow through it. For a regular lattice of spheres the permeability is accessible semi-analytically [70,71]: balancing the Darcy pressure gradient against the Stokes drag of the spheres gives

\kappa = \frac{1}{6\pi R\, n\, \zeta}, \quad (39)

where R is the sphere radius and n the number density of spheres. The ratio \chi between radius and separation is assumed to equal one in our case, as the spheres created by the initialisation algorithm touch (see Fig. 9). Finally, \zeta is a dimensionless drag coefficient which can be approximated by a series expansion in \chi [70,71]. Fig. 10 shows the error of the permeability measurements as a function of the simulated fluid viscosity. The deviation is a well-known artefact of LBGK models combined with bounce-back boundary conditions. The effect can be reduced by increasing the resolution, as it is directly proportional to the surface-to-volume ratio of the simulation. Another way to correct the error is the integration of a multi-relaxation-time collision scheme into LB3D, as is planned for the future.
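The measurement itself reduces to Darcy's law: for a body-force-driven flow the permeability follows from the superficial velocity and the applied force density. Below is a sketch under standard LBGK conventions (h = 1, c_s^2 = 1/3); the field names are illustrative.

```python
import numpy as np

def permeability(u_z, rho, tau, force_density):
    """Estimate kappa = mu <u_z> / F from Darcy's law.

    u_z, rho      : velocity and density fields with NaN at solid sites
    tau           : LBGK relaxation time (lattice units)
    force_density : body force density driving the flow
    """
    nu = (tau - 0.5) / 3.0                     # kinematic viscosity
    mu = nu * np.nanmean(rho)                  # dynamic viscosity
    u_superficial = np.nansum(u_z) / u_z.size  # average over the total volume
    return mu * u_superficial / force_density
```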
Self-assembly of amphiphilic mesophases

Simulations were performed on a lattice of 256 × 256 × 256 sites. The simulation domains are initialised with homogeneous densities \rho_{r,b} for the two immiscible components and a density \rho_s for the amphiphilic component, respectively. The subsequent self-assembly of the mesophases is driven by the coupling forces, cf. Eq. (15).
In pressure jump experiments the density of the fluid components is increased, which is equivalent to an increase in overall system pressure. We observe the formation of primitive and hexagonal mesophases, respectively, as the surfactant self-interaction strength is reduced; in our simulations the parameter was changed from g_ss = -0.0005 m/(ah)^2 to g_ss = -0.001 m/(ah)^2.
Systems at lower and higher pressure exhibit cubic diamond and gyroid mesophases, respectively. Fig. 11 shows a volume rendering of the observed morphologies. The images display the zero isosurface of the order parameter defining the boundary of immiscible components (cf. [31] for more details).

Performance
During the refactoring of LB3D, we have focused on creating more clearly structured and well documented code. Moreover, the new implementation exhibits a significantly improved single core performance. While OpenMP shared memory parallelisation has been implemented in addition to the prior MPI parallelisation, we focus here on the single core performance and scaling of the MPI based parallelisation. Tests of a hybrid parallelisation approach, naïvely introducing OpenMP wrappers around the main loops, resulted in a loss of performance in all cases. More sophisticated strategies introducing nested loops and explicit OpenMP parametrisation allow for performance gains of the order of 10 percent compared to pure MPI. We have, however, found the successful parametrisations to not only vary with the machine, but to depend on the chosen problem as well. While this may change for explicit shared memory machines and future heterogeneous exascale configurations, a systematic investigation of hybrid parallel performance is omitted from the current publication. Table 2 compares the single core performance of LB3D v7.1 against LB3D v7.0, as measured for simulations of domains of 256 × 256 × 256 lattice sites on 22 nodes of ARCHER, each comprising two 12-core CPUs. The speedup observed between the versions, measured in millions of lattice site updates per second (MSUPs) per core, is approximately three-fold. Acceleration of the computation has been achieved by optimising communication, memory structure, and algorithms throughout the code. Most prominently, component information has been separated in memory. Pre-calculation and caching steps have reduced redundant calculations. The change in memory layout has furthermore facilitated optimised MPI array definitions and communication patterns. Fig. 12 illustrates the strong scaling behaviour for geometries of size 1024 × 1024 × 1536 lattice sites and calculations considering single phase flow, binary mixtures, or amphiphilic fluid mixtures.
Simulating a system of pure fluid (mixture) for 10,000 time steps on 256, 512, 1024 and 2048 nodes of ARCHER, we observe excellent strong scaling behaviour close to ideal speedup. This is especially true for the more computationally demanding multi-component cases. CPU counts are multiples of 24, corresponding to the ARCHER node size. The scaling remains close to linear for many tens of thousands of cores and, for the selected system domain size, only just starts to deteriorate on 49,152 cores. The single core performance is comparable to the values given in Table 2.

Fig. 11: Volume rendering of simulation snapshots of zero order parameter isosurfaces for the respective mesophases obtained in simulation. Simulations were run in a volume of 32^3 lattice sites for 30,000 time steps. The order parameter is given by the difference of the local densities of the immiscible components. The density and interaction parameters control the pressure in the model (see [31]).

Summary
The LB3D simulation code has enabled a substantial amount of scientific research in diverse contexts. Starting from the investigation of ternary amphiphilic fluid mixtures the code has been extended to include a variety of boundary effects and coupled models. Cleaning and refactoring of the code have led to improved readability and extensibility. To ensure scientific accuracy and model applicability of the refactored version, we have reproduced earlier findings of ternary amphiphilic mixtures on the current UK National Supercomputer. While maintaining flexibility and extensibility, the single-core performance has more than doubled. Exploiting the excellent parallelisation behaviour of the lattice-Boltzmann algorithm, the strong scaling behaviour of the code remains close to linear. With LB3D version 7.1, we release an open source version of the code that provides functionality for simulation of amphiphilic mixtures in complex geometries. The documentation and new Python scripting options enhance the ease of use of existing features, while at the same time facilitating continued development and extension. By releasing this code under a BSD 3-clause license, we hope to inspire independent developers to contribute new features to LB3D.