Global Uncertainty-Sensitivity Analysis on Mechanistic Kinetic Models: From Model Assessment to Theory-Driven Design of Nanoparticles

The optimal design of nanoparticle synthesis protocols is often achieved via one-at-a-time experimental designs. Aside from covering a limited space for the possible input conditions, these methods neglect possible interaction between different combinations of input factors. This is where mechanistic models embracing various possibilities find importance. By performing global uncertainty/sensitivity analysis (UA/SA), one can map out the various outcomes of the process vs. different combinations of operating conditions. Moreover, UA/SA allows for the assessment of the model behavior, an inevitable step in the theoretical understanding of a process. Recently, we developed a coupled thermodynamic-kinetic framework in the form of population balance modelling in order to describe the precipitation of calcium-silicate-hydrate. Besides its relevance in the construction industry, this inorganic nanomaterial offers potential applications in biomedicine, environmental remediation, and catalysis most notably due to ample specific surface area that can be achieved by carefully tuning the synthesis conditions. Here, we apply a global UA/SA to an improved version of our computational model in order to understand the effect of variations in the model parameters and experimental conditions (induced by either uncertainty or tunability) on the properties of the product. With the specific surface area of particles as an example, we show that UA/SA identifies the factors whose control would allow a fine-tuning of the desired properties. This way, we can rationalize the proper synthesis protocol before any further attempt to optimize the experimental procedure. This approach is general and can be transferred to other nanoparticle synthesis schemes as well.


Introduction
Calcium-silicate-hydrate (CaO-SiO2-H2O or C-S-H for short) is the most important phase formed during the hydration of cementitious materials [1]. Aside from its key role in the construction industry [1], C-S-H has recently found diverse applications in environmental clean-up [2][3][4], biomedicine [5][6][7], and even catalysis [8,9]. In the biomedical field, for instance, it offers good bioactivity, biocompatibility, and biodegradability [6,7]. Besides these characteristics, the inherent nanostructured construct of C-S-H, provides high surface areas, and its relatively low-cost preparation warrants further research for applications where interfaces play a major role [4,7].
Recently, we developed a formalism to model the nucleation and growth of C-S-H using a population balance equation (PBE) framework [10]. The theoretical framework was fitted to the experimental data collected on the precipitation of a synthetic C-S-H with Ca:Si = 2, prepared under controlled conditions resembling the process of cement hydration (in terms of temporal supersaturation ratio) [10,11]. We estimated the optimal values for the unknown model parameters and explained procedures for the extraction of various output information from the simulation. Additionally, we assessed the merit of our computations by comparing the optimal physical parameters and various outputs against the literature data, wherever available [10].
Here, we build on our previous work and implement two pivotal refinements to improve the simulation speed, robustness, and generality. Specifically, we replace our ad hoc equilibrium solver in the previous work with PHREEQC, a popular freely-available tool widely used for thermodynamic speciation calculations [12]. This allows for a more straightforward adaptation to new precipitation scenarios and opens up the possibility of utilizing the large thermodynamic databases already included within the software [12].
Additionally, we employ the direct quadrature method of moments (DQMOM) for the solution of the PBE, which has several advantages over our previously used QMOM approach [13][14][15]. We give a detailed derivation of DQMOM and relevant subtleties critical to the robust and reliable performance of the method.
Having this improved simulation framework, we assess the behavior of the C-S-H precipitation model by applying global uncertainty/sensitivity analysis (UA/SA) with different model parameters as the source of uncertainty. The propagation of uncertainty into different model outputs such as crystallite dimensions, particle edge length, specific surface areas, and precipitation yield is examined thoroughly using three different methods, namely, PAWN (derived from the developers names, Pianosi and Wagener) [16,17], Elementary Effect Test [18,19], and variance-based sensitivity analysis (VBSA) [19,20]. The application of these complementary methods enables unambiguous appraisal of the model performance, which in turn facilitates complexity reduction, i.e., by fixing uninfluential parameters to reasonable values. This also allows for a more robust calibration during the regression to experimental data [19,21]. Additional implications of such global UA/SA concerns the robustness of model predictions in response to different sources of uncertainty/variability in the model parameters [19,21]. Besides these outcomes, our work provides the first example of UA/SA on a kinetic model of precipitation, and thus can serve as a benchmark for future studies in this direction.
Once we obtained a comprehensive understanding about the model structure, we implement another UA/SA on a model of reduced complexity and incorporate uncertainty from different experimental conditions. The goal is to aid the design of nanoparticulate products by gaining insight from computer experiments. Often, optimal operating conditions for a synthesis protocol are found using one-at-a-time (OAT) experimental designs [7,19]. Besides covering a limited space of the possible input conditions, this practice also overlooks the probable interaction effects between various combinations of the inputs. The latter could produce drastically different behaviour compared to when only one input parameter is changed at a time [19]. A global UA/SA circumvents these limitations and offers an inexpensive alternative to examine a wide range of operating conditions varied in an all-at-a-time fashion [19,21]. With this approach, we propose practical recommendations in order to improve the properties of the final product.
As an example, we demonstrate the key influence of reagent addition rate, in a well-mixed semi-batch reactor, on the accessible specific surface area of the final product. This can be further compounded with adjustments in the solution chemistry to obtain a product with distinctly higher specific surface area.

Computational Details
The overall computational workflow for PBE modelling of precipitation processes is explained in detail in our recent articles [10,22,23] and other literature [14,[24][25][26]. Therefore, in this section we will focus on the developments brought forward by the current work. First, we will review the essential characteristics of the precipitation system to be studied, and enumerate the limitations of our previous work. Then, we will briefly explain the application of DQMOM to solve the PBE model for a well-mixed system with crystallite size as the internal coordinate (detailed derivations can be found in the Supporting Information (SI) Section 1). Next, we will describe the coupling to PHREEQC. After that, we summarize the overall simulation workflow including the improvements introduced in this work. Finally, we will present the underlying idea behind different sensitivity measures employed to assess the input-output relationships in the overall coupled thermodynamic-kinetic framework. Additional implementation details are presented in the SI Section 1 and 2.

Description of Precipitation System and Limitations of Our Previous Work
The precipitation system studied here is the formation of synthetic C-S-H with Ca:Si ratio 2 [10,11]. The precipitate is composed of nanofoils that are a few nm thick and in the order of 100 nm wide (Figure 1(a) and (b)). These two-dimensional nanoparticles are made up of highly defective crystallites, of a thickness typically below 10 nm, which are arranged with liquid crystalline-type orientational order (Figure 1 (b) and (c)). Recently, we proposed a pathway (Figure 1 (d)) for the formation of this nanoparticulate material and tested that by regressing the experimental kinetic data using a computational model based on population balance equation (PBE) modelling. The framework included primary nucleation, true catalytic secondary nucleation, and molecular growth, and accounted for the time evolution of the precipitation driving force by applying thermodynamic equilibrium to the reactions among the aqueous species (local equilibrium assumption [10,14]). From a mathematical perspective, the framework consisted of a set of ordinary differential equations (ODEs) written for the dynamic evolution in the moments of crystallite size distribution (the PBE part) and elemental amounts in the system (the mass balances). Our computational model showed very good plausibility in terms of the goodness of fit, the consistency of the regressed model parameters with respect to the knowledge from the literature, and reasonable mechanistic and kinetic predictions. This includes, for instance, the predicted size of crystallites and particles which were compatible with previous experimental and theoretical observations, or the invariably undersaturated state of solution with respect to portlandite similar to experimental observations [10].
In the computational framework mentioned above, the PBE set was solved using the quadrature method of moments (QMOM) [10,27]. Although QMOM is a reliable and popular method for this task [15,24,28], a widely used variation called direct quadrature method of moments (DQMOM) offers several advantages.
For instance, QMOM requires a moment inversion algorithm at every time step to find the discrete approximation to the size distribution. This is often an ill-conditioned problem and reduces the computational efficiency of the method drastically [14,28,29]. On the contrary, DQMOM directly follows the discrete abscissas and weights approximating the size distribution, and employs commonly used numerical methods such as matrix inversion instead of moment inversion algorithms. This makes the method more convergent and extremely fast [13,14,28], to an extent that Haderlein et al. [14] have employed it for the development of "flow-sheeting" software tools. Other benefits of DQMOM over QMOM are its more efficient coupling to fluid dynamics and straightforward extension to multivariate distributions (namely, with more than one internal coordinate) [15,29]. The latter is particularly important in the case of C-S-H as oftentimes the precipitation leads to variable composition solid solutions [30] necessitating the application of at least two internal coordinates, namely, size and composition [10].
Another limitation of our previous work is related to the manner of applying the local equilibrium assumption. There, we calculated the supersaturation ratio at fixed time steps selected depending on how fast the precipitation consumes the precursors [10]. Therefore, supersaturation ratio was an externally calculated quantity provided to the ODE function (the function calculating the derivatives as a function of time and dependent variables [31]). This is very similar to the approach adopted by Myerson et al. who only recalculated the supersaturation ratio when there was significant (more than 0.1%) change in the amount of the precipitate [26]. This approach was adopted to minimize the computational burden of speciation calculations. Additionally, similar to the work by Haderlein et al. [14] and Galbraith and Schneider [32], a bespoke speciation solver was developed to expedite the overall simulation [10]. Even though our equilibrium solver and that by Haderlein et al. are developed in a general fashion, this practice limits the applicability of the developed tools, as it requires setting up a thermodynamic database for every single new scenario. Instead, there are a number of powerful freely-available thermodynamic solvers with huge databases already implemented, such as PHREEQC and GEMS [12,33]. Therefore, in this work we will present a general-purpose protocol for the coupling of PHREEQC to the PBE simulations of precipitation processes with the former imbedded within the ODE function. Figure 1. Summary of synthetic C-S-H precipitation system studied here. (a) Transmission electron micrograph of C-S-H particles with foil like morphology; (b) schematic representation of C-S-H nanofoils composed of defective crystallites nematically ordered in two dimensions; (c) internal structure of C-S-H crystallites from atomistic simulations [11,34]; (d) the proposed precipitation pathway for synthetic C-S-H of Ca:Si=2 [10]. Adapted with permission from Ref. [10] Copyright 2018 The Royal Society of Chemistry. where is a source term embracing all the solid formation/transformation processes such as nucleation, growth, and aggregation, inflows and outflows of crystallites, and possible changes in the volume of reaction liquor. In the kinetic modelling of precipitation processes, the PBE is solved along with differential equations written for mass balances (viz., the conservation of elements inside the reactor). In the direct quadrature method of moments, the NDF is approximated by a discretized distribution as , , 2 where denotes the discretization nodes with weights at sizes (or abscissas in nm) , , is the overall number of the nodes, and is the Dirac delta function [15,29] In these equations, denotes a diagonal matrix, and are the rates of primary homogeneous and true secondary nucleation processes (crystallites.m -3 .s -1 ), * and * are the respective critical nuclei sizes (in m), and is the volume of the reaction suspension.

Population Balance Equation and Its Solution Using DQMOM
In the current study, three measures have been exercised to make the DQMOM method more robust.
Firstly, the matrix is written with abscissas in nm to reduce the condition number and facilitate the solution of the linear system (that is, Eq. (4) which gives the temporal derivatives of the quadrature weights and weighted abscissas). Secondly, additional reduction in the condition number is attained by left preconditioning (SI Eqs. (26 28)) [13,35]. Thirdly, the temporal derivatives of the log10 of and , are integrated rather than the untransformed variables to bring them into an order of unity and improve the convergence and robustness when using the MATLAB's ODE solver [31].

Thermodynamic Speciation via Coupling to PHREEQC
PHREEQC is a freely-available geochemical reaction solver capable of simulating a variety of processes including solid-liquid-gas equilibria, surface complexation, ion exchange, and much more [12]. Aside from its carefully developed internal database, there are many comprehensive third-party databases including Cemdata18, specifically developed for cementitious systems [30]. With such a broad range of capabilities and extensive thermodynamic infrastructure, coupling PBE simulations with PHREEQC opens up new avenues in the practical and facile application of this powerful method to the understanding and design of particulate processes. Such coupling is facilitated by an already developed module called IPhreeqc, which enables interfacing with different scripting languages such as MATLAB and Python via Microsoft COM (component object model) [36,37]. Very recently, a number of articles have been published reporting the coupling of PHREEQC with PBE simulations [38][39][40]. Nonetheless, to the best of our knowledge none of these publications provided the corresponding computer code and procedures for the implementation.
Here, we developed a function (the file "eqbrmSolver.m" in the SI) that provides a general interface for PHREEQC speciation calculations in MATLAB. Briefly, information about the solution chemistry (e.g., different compounds and their concentrations) and experimental conditions (such as temperature or gases at constant partial pressure in equilibrium with solution) are provided to the interface, which in turn passes the data into PHREEQC solver via the COM object. These inputs are provided using keywords in a fashion similar to PHREEQC syntax ( Figure S 1). This allows the simulation of precipitation in practically unlimited number of systems and scenarios without the necessity to rewrite the speciation code and/or its database.
The information passed as the outputs of speciation calculation are the mass of water solvent, solution density, elemental concentrations, species concentrations and activities, pH, ionic strength, and saturation indices ( log with being the ionic activity product [12]) with respect to different solid phases.
In the current study, we employed the Cemdata18 database [30] for all the aqueous reactions and the density and solubility product of the precipitate (C-S-H with Ca:Si=2) were taken from our previous paper [10]. ). An example simulation is presented in the "demo.m" script provided in the SI. All the core PBE simulations are handled by "pbe.m" function while the output provided by this function suffices for the calculation of any other output of interest. For instance, knowing the moments allows one to calculate different crystallite and particle characteristics such as size and specific surface area. Similarly, knowing the amount of water solvent (input to "pbe.m") and temporal mole amounts of all the elements, one can back calculate the full speciation using the function "eqbrmSolver.m".

Overall Simulation Workflow
For a typical scenario that simulates 24 hours of precipitation with known model parameters (from our previous work [10]) on an ordinary HP laptop, with dual-core Intel® Core™ i5-4310M CPU @ 2.70 GHz 2.70 GHz processor and 8.00 GB of RAM, the run time was 80 seconds. This is roughly half the time it took using our previous PBE code (which was using QMOM and bespoke speciation solver). It is worth noting that in the current version, in contrast to our previous code, the speciation calculations are embedded within the ODE function. In other words, the calculation is performed at every time step selected by the ODE solver, which makes the total number of such calculations much greater in number than in our previous code.
Therefore, the real speed up due to the application of DQMOM (in the modified form introduced here) in place of QMOM is much larger, consistent with previous reports [14,28]. Figure 2. The overall algorithm for the PBE simulation of a precipitation process. Black boxes comprise the main workflow backbone (e.g., "demo.m" script in the SI), blue represents the content of the ODE function, and red refers to the integration of the ODE set using an appropriate solver (MATLAB's ode15s in this case [31]). All the symbols are defined in the main text.

Problem Setting for Uncertainty Assessment
To quantitatively apportion the variability in the output of the PBE simulation to various sources of uncertainty in the input (factors), we applied model-independent sensitivity analysis using three popular methods: PAWN [16,17], Elementary Effect Test or the method of Morris (EET) [18,19], and variance-based sensitivity analysis (VBSA) using quasi-Monte Carlo samples generated by the method of Sobol' [19,20]. The analysis was mostly implemented using the SAFE package, an open-source MATLAB toolbox that includes various functions for the generation of input samples, estimation and evaluation of sensitivity indices, and extensive visualization tools [41]. The target of uncertainty/sensitivity analysis, that is, the PBE model for the precipitation of synthetic C-S-H, has five unknown parameters: interfacial tension ( ), cohesion energy ( ), growth rate coefficient ( ), kinetic order of growth ( ), and crystallite aspect ratio ( ) [10]. Below we will discuss the feasible uncertainty domain for each of these parameters.
The nominal value of interfacial tension from our previous regression to experimental data was estimated to be in the range of 0.05-0.06 J.m -2 [10]. solutions. Therefore, a lower bound of 0.04 J.m -2 was assumed for . Again, numerical experiments showed that smaller would result in unrealistically early nucleation (at very low supersaturation ratios) or even spinodal decomposition [43].
Theoretical considerations dictate that the relative cohesion energy between an already precipitated solid substrate and secondary nuclei is in the range 0-2 [10,[44][45][46]. Preliminary tests, however, indicate that in our system of interest values larger than unity would give extremely small effective interfacial tensions for secondary nucleation, particularly when the interfacial tension is close to the lower bound set earlier.
Additionally, a value of / = 2, which corresponds to coherent interfaces or epitaxial growth, hardly happens in precipitation from liquid solutions because of ions and solvent molecules adsorbed onto the surface of the substrate [46]. This is compounded with the extremely defective nature of C-S-H crystallites hampering the formation of interfaces with matching lattices [10,11]. All things considered, a pragmatic upper bound of 1 was selected for / .
In our previous work, we fitted values in the order of 10 -9 to the parameter and 2 for parameter [10].
For the sake of UA/SA, values within one order of magnitude around 10 -9 were considered. As suggested by Marino et al. [47], to sample the variability space more uniformly, considering the variation of over two orders of magnitude, log was preferred for the sampling. As for kinetic order of growth we sampled in the range 1-3 which is the typical variation range covering rough growth, dislocation-controlled mechanisms and surface nucleation regimes [48,49].
Finally, the ratio of crystallite edge length to its thickness regressed in our previous work was 0.5 [10].
Therefore, an input range 1/3 to 1 was considered for the possible variability. For all the input factors uniform probability distribution functions were considered for the generation of the sample [50].

Sensitivity Measures
In this section, we will briefly explain the three sensitivity methods employed in the current study to facilitate the comprehension of the results. The interested reader is referred to the relevant literature for a more in depth discussion [19,21].
As Saltelli et al. [19] argued, UA/SA consists in the examination of uncertainty in parameters (input factors) propagating through a mathematical model all the way to the model outputs [19]. One way to do this task is through Monte Carlo analysis, wherein a set of row vectors are generated by sampling the input variability space of different model parameters. The accumulation of these sets gives an input matrix with each row corresponding to a set of model parameters whose introduction to the model allows a single simulation run. Therefore, any UA/SA requires an input sample matrix.
In this study, low-discrepancy Sobol' sequences were constructed by first generating an input sample of size 2 , where is the base sample size, and denotes the number of uncertain parameters (e.g., = 5 for five model parameters subject to SA). Sample was generated using Latin hypercube sampling (LHS) strategy [19,21,41]. This sample was then resampled to build three matrices , , and , where and are simply the first and last rows of , respectively, while is a block matrix of recombinations of and where , is an matrix whose columns are all taken from except for the i th column which is taken from [19]. Once we have , , and , the PBE model should be run with all the rows in sample matrices as input model parameters giving a total of 2 sample outputs. Here, the size of crystallites and particles, their specific surface areas (SSA), C-S-H precipitation yield (conversion with respect to equilibrium composition), solution pH, and saturation index with respect to portlandite, all quantities after 12 hours of precipitation, are qualitatively examined as model outputs (uncertainty analysis). Subsequently, quantitative assessment in the form of global sensitivity analysis was performed on three selected outputs: crystallite thickness ( ), particle edge length ( ), and specific surface area of particles ( ). These are of special practical relevance and they can be compared to experimentally measured values [2,5,7,10,51]. We started with a base sample size of N = 2000 (corresponding to 14000 input sample points) and extended the matrix using the SAFE toolbox function "AAT_sampling_extend" to assess the convergence behavior of different sensitivity measures. Throughout this work, bootstrapping over 1000 resamples has been used to estimate the 95% confidence bounds on all the SA indices [41,50].
The first SA method applied here is PAWN, a density-based (or moment-independent) method recently developed by Pianosi and Wagener [16,17]. The central idea behind this method is to compute sensitivity through variations in the cumulative density function (CDF) of the output, induced by fixing one input factor. In practice, this is achieved via estimating the divergence between unconditional output CDF, namely that generated by varying all the input factors, and the conditional CDF generated by fixing an individual factor to a prescribed value. Several values within the input variability space can be assigned to the prescribed value, a practice referred to as multiple conditioning, to generate a number of divergence values that can be aggregated in some kind of statistic [16,17,21]. In PAWN, the divergence is expressed in terms of Kolmogorov-Smirnov (KS) statistics which is the maximum vertical distance between the conditional and unconditional CDFs [16,17]. PAWN, and moment-independent methods in general, are particularly useful in case of highly skewed or multimodal output distributions. In such cases, variance is not an adequate proxy of uncertainty and variance-based methods (see below) can no longer be applied [17].
Another advantage of these methods is that they can be estimated from generic samples, that is, without requiring tailored sampling strategies [17,21]. Therefore, in this work we use the samples generated as described earlier to estimate the average and maximum of KS statistics calculated over 10 conditioning intervals [50]. To distinguish influential and uninfluential input factors, following Khorashadi Zadeh et al. [52] we artificially introduced a dummy input factor that does not appear in the model and, thus has no impact on the output. Therefore, the sensitivity index (maximum of KS statistic) corresponding to this factor defines the threshold for parameter screening [50,52].
The second SA method used here is the Elementary Effect Test [18,19]. In this case, the idea is to correlate model sensitivity with the effect of perturbing the input factors-one at a time-on the model output. An example of this approach is to estimate (e.g., by finite differences) the partial derivatives with respect to different model parameters at their nominal values. In this form, the method is computationally very cheap but only provides local sensitivity information [21]. A global extension of this technique is to compute perturbations from multiple points within the input variability space, followed by aggregating them in some type of statistic. The most popular method in this group uses the average of finite differences (also known as Elementary Effects or EEs) as the sensitivity measure ( ) [19,21]. Here, a refined measure taking the absolute values of EEs is used to avoid cancelation due to sign differences [53]. Beside the average of EEs, standard deviations ( ) provide information about the degree of interaction between the parameters and/or their level of nonlinearity [19,21]. To apply EET using the Sobol' sample we discussed earlier, the input and output were converted to the format required by EET functions. This was done using the function "fromVBSAtoEET" in the SAFE toolbox, which rearranges the matrices , , , and so that they can be used to calculate EET indices from a radial design [19,41]. With this approach, the number of sampling points ( ) would be equal to the Sobol' base sample size ( ).
The last method employed in the current study is the variance-based SA (VBSA) [19,20]. This method assumes that the output variance is an indicator of its uncertainty and the contribution of each input factor to this variance is a measure of sensitivity. This technique handles nonlinear and non-monotonic functions/computational models, as well as those exhibiting interactions between their factors. Besides, it is able to capture the influence of each factor's full-range of variation [20,54,55]. Perhaps, the biggest drawback of this method is the large number of simulation runs it requires for convergence [21,55]. Aside from computational aspects, another limitation of VBSA is that, variance is not a meaningful gauge for highly skewed or multimodal output distributions and hence, for such situations VBSA indices are not appropriate measures of sensitivity anymore [17,21].
In VBSA, typically two types of indices are defined, first-order and total-order. First-order indices (also known as main effects, ) measure the direct contribution of individual input factors to the variance of output distribution. Equivalently, this can be thought of as the reduction in the output variance achievable by fixing inputs one at a time [56]. In a model where output variability is only a result of main effects (lack of interactions between inputs), ∑ 1 and the model is said to be additive. Nevertheless, in complex computational models, this is rarely the case and main effects do not sufficiently describe the output variability. Considering the high computational expense, particularly for larger , that has to be incurred to estimate all the interaction effects, one may calculate total-order sensitivity indices, , which embrace the main effect as well as all the interactions (of any order) involving the input factor [56,57].
Considering the nature of total effect indices, they are particularly suited for parameter screening as having zero total effect is a necessary and sufficient condition for an input parameter to be uninfluential [21].

Uncertainty Analysis with Model Parameters as Input Factors
In this section, we employ visual tools (scatter plots and histograms) to appraise the propagation of uncertainty from model parameters to different model outputs. Figure 3(a) shows the histograms for average crystallite thickness and edge length ( and , respectively) with probability normalization while Figure 3(b) portrays the corresponding results for particle edge length ( ). From Figure 3(a) we can see that the crystallite thickness and edge length are typically a few nm, consistent with previous reports for different C-S-H products [7,10,11,51,58,59]. Particle edge length, instead, is typically 1-2 orders of magnitude larger and its distribution assumes a very long tail spanning up to a few µm (Figure 3(b)) [60].
Concerning the specific surface area (viz., surface area per unit mass of precipitate), the hierarchical structure of the solid C-S-H gives rise to two types of surfaces. One is the overall area of the external surfaces of crystallites (i.e., neglecting the internal crystallite structure; SSACrystallite), and the other only considers the external surface of particles neglecting the surfaces embedded within the bulk of the particles (SSAParticle). Therefore, by definition SSAParticle ≤ SSACrystallite with the equality happening in the absence of secondary nucleation and aggregation (in other words, when every crystallite is a particle by itself). SSACrystallite is calculated from the zeroth moment of crystallite size distribution (which is directly available from PBE simulations) and while we can estimate SSAParticle from and (see the Electronic Supplementary Information in Ref. [10] for the estimation of using geometrical considerations).  Another simulation output is the saturation index ( ) with respect to portlandite, a solid phase that competes with C-S-H for precursor ions during the precipitation [10,11,51]. In Figure 3(f) we have plotted this quantity vs. the solution pH at the end of the precipitation (after 12 h). From this plot, an unambiguous correlation is visible, where increases monotonically with pH. In our previous work, we showed that under the examined operating conditions the system was always undersaturated with respect to portlandite, consistent with experimental observations [10,11]. Nevertheless, according to Figure 3(f) at higher pH values portlandite may precipitate along with C-S-H. Indeed, our simulations with nominal model parameters [10] but at a higher inflow NaOH concentration (e.g., 4 times the value reported in [10] giving a final pH of 13.2; Figure S 14(a)) showed that the system does become supersaturated with respect to portlandite in line with certain experiments (refer to SI Section 3 for further discussion) [51,61]. Now let us examine the mapping of input uncertainty to three selected outputs , , and . From Figure 4, we see that variability in parameter has almost no effect on any of the outputs (as implied by the uniformity of the scattered points and the lack of pattern [21]). Concerning other input factors, however, the relative degree of uncertainty propagation depends on the output. From Figure 4 (a-c, e), parameters , / , log , and are all influential with respect to as the output, with / having less impact compared to others. Consulting Figure 4(f-h, j), uncertainty in / clearly has the highest effect on the variability of (strong pattern formed by the scattered data points) while , log , and have much less of an impact. A similar argument applies to (although to a lesser extent) with the output being much more sensitive to / (Figure 4(k-m, o)). In the next section, we will present the quantitative assessment of sensitivity with respect to different model parameters. Figure 4. Scatter plots of crystallite thickness (a-e), particle edge length (f-j), and particle surface area (k-o) vs. different input model parameters (base sample size 20,000). The latter can also be used to identify the influential and uninfluential input factors by comparing the sensitivity indices to that of a dummy variable (which has no effect on the model outcome) [50,52]. From both indices, we can clearly verify the minimal effect of on the studied model outputs consistent with the conclusions made from the scatter plots as discussed earlier (see previous section and Figure 4). Indeed, taking the maximum KS statistic as the sensitivity measure, it is barely higher than the value estimated for the dummy input ( Figure 5(d-f)). With as the model output, the rest of the parameters are all influential with / being slightly above the dummy variable, and , log , and exhibiting quite similar higher influences ( Figure 5(a,d)). With and to a lesser extent , variability in / has the highest impact on the output uncertainty. Except for the uninfluential factor , the rest of model parameters have similar impact over these two outputs ( Figure 5(b,c,e,f)). Figure 5. PAWN sensitivity indices in the form of mean (a-c) and maximum (d-f) of the KS statistic, with 95% confidence intervals obtained from bootstrapping, for crystallite thickness (a,d), particle edge length (b,e), and particle surface area (c,f) as the outputs (sample size of 140,000). for is invariably much smaller than that of the most influential factor). With as the model output,

Sensitivity Analysis with Model Parameters as Input Factors
/ is identified as the second least influential parameter while close values are predicted for the other inputs (similar to PAWN; Figure 6(a) and Figure 5 (a,d)). Along the same lines, with and to a smaller degree , / has the highest impact on the output with sensitivity indices being 48 and 12 times that of , respectively (Figure 6(b,c)). Figure 6. Sensitivity indices obtained using EET with crystallite thickness (a), particle edge length (b), and particle surface area (c) as the outputs (sample size of 120,000). The results are presented as the mean of Elementary Effects plotted against their coefficients of variation (all the 95% confidence intervals are estimated by bootstrapping).
Another observation from our EET analysis is related to the level of nonlinearity in model parameters and/or the degree of interaction between them. Following a method proposed by Garcia Sanchez et al.   at such a high sample size (140,000) the confidence intervals are wide and there is overlap between the total effect of / with log , and log with (Figure 7(a)). Therefore, we attempted a second SA fixing 2 (which we already know is practically uninfluential) and going up to a sample size of 288,000 (base sample = 48,000). Doing that, the confidence intervals of / fall well below the other three influential parameters ( , log , and ; Figure S   are the main and total effects, respectively, and the 95% confidence intervals are estimated by bootstrapping). Figure 7(b), we note the same problem with as with . This time, however, even at an input sample size of 288,000 the confidence bounds overlap significantly ( Figure S 7(b)). This observation can readily be explained by looking at the probability histograms of the output, extending over three orders of magnitude (Figure 3(b)). In other words, the complication arises from the highly skewed distribution of with a skewness of 30 (compare with 10.7 for ; Figure 3(a)) [17,21]. One remedy to this problem is the application of rank (Figure 7 Especially, with parameters , log , and affecting mainly via interactions (see their zero main effects in Figure 7(b) and Figure S 7(b)), upon transformation they apparently become much less influential (that is, they adopt smaller total effects; Figure 7(e,h) and Figure S 7(e,h)). The same complication can be traced in Figure 7(d,g) (and Figure S 7(d,g)) because parameters mainly affect by way of interactions.

Consulting
Another interesting feature can be seen in variance-based indices with log as the output (Figure 7(i) and Figure S 7(i)). Here, in contrast to the case with untransformed output variable (Figure 7(c) and Figure S 7(c)), the indices have very broad confidence intervals that significantly overlap and make any conclusive deduction impossible [21]. A closer examination of probability distributions of and its log10 transformation reveals that while the former is almost unimodal (Figure 3(d)) the latter is highly multimodal ( Figure S 8). Quantitatively, the Hartigan's dip test of unimodality [66,67] gives p-values of 0.11 (insignificant multimodality) and 0 (significant multimodality) for the untransformed and log10 transformed variable, respectively. Therefore, aside from the complication in converting SA results from transformed outputs back to the original ones [57,64,65], log10 transformation may render the output distribution multimodal limiting the applicability of VBSA to such scenarios [16,21]. It is worth noting that the latter problem should not happen with rank transformation, as the converted distribution is always uniform and thus unimodal [68].

UA/SA with Selected Model Parameters and Experimental Conditions as Input Factors
Now that we have examined the model behavior in detail, we can turn our attention to the holy grail of the current study, that is, the theory-driven design of nanoparticle synthesis processes. Ideally, a precipitation model should be able to explain the process as a function of experimental conditions alone. In other words, all the parameters in the theoretical framework have to be defined as a function of operating conditions such as temperature, concentrations of reagents, ionic strength, etc. This is not an easy task because the development of such models requires extensive and sometimes independent sets of experimental data to identify the mechanistic steps involved and calibrate the corresponding theoretical constructs. For instance, Schroeder et al. attempted to calibrate such a framework for the formation and polymorphic transformation of calcium carbonate [24]. Although they accounted for different physicochemical aspects and correlated different parameters with the environmental conditions inside the reactor, limited success was achieved in reproducing the experimental data given the extremely complicated nature of the precipitation process. In the specific case of C-S-H precipitation, additional complications arise due to the nature of the precipitate usually forming a solid solution whose composition depends on the environmental conditions and may evolve as a function of time [10,69,70]. Therefore, with the experimental kinetic data being scarce for synthetic C-S-H [10,11], it is only possible to semi-quantitatively design the product properties as we will present in this section.
In its novel environmental [2][3][4], biomedical [5][6][7], and catalysis applications [8,9], the accessible specific surface area of C-S-H product (i.e., ) is one of the most important properties of interest. Therefore, in this section we mainly focus on this characteristic, while information about crystallite and particle sizes are addressed for benchmarking against the literature data.
From the discussion in the previous sections, we found that among the model parameters is significantly less influential and its impact is barely above the dummy variable. Additionally, our previous studies showed that the aspect ratio of C-S-H crystallites is 0.5 irrespective of mixing flow rate [10]. Interestingly, the same aspect ratio was found for lower Ca:Si solids based on atomistic simulations [71]. Therefore, in this section we fix these parameters to nominal values  we observe here [73][74][75]. TEM images also give values within the range of a few tens to a few hundreds of nm for the width of C-S-H particle [10,11,72].    Figure 9(b)). Among the experimental conditions, the addition flow rate of silicate solution seems to dominate the output variability, albeit with a lower impact when compared to / (Figure 9(g)). Figure 9(h) shows the colored scatter plot for these two factors with marker colors proportional to the output value. The emergence of color patterns in such a plot is a simple and intuitive tool to assess the degree of interaction between pairs of input factors [21,41]. From Figure 9(h) a weak pattern can be discerned (upper left region) where simultaneous occurrence of high and low / gives rise to exceptionally higher surface areas (see also Figure S 11(c) presenting the corresponding EET results where C.V. for the three most influential parameters are all below unity indicating weak interactions among the parameters). For a quantitative assessment of variability propagation to model outputs, PAWN sensitivity indices were estimated for different model parameters and experimental conditions as input factors. Figure 10 summarizes the results with , , as the outputs (consult Figure S 10 for the convergence of PAWN indices; similar conclusions can also be obtained using EET as depicted in Figure S 11). For , log and are the most influential factors (Figure 10(a,d)). All of the experimental variables have low influences on barely above the dummy index (Figure 10(d)). The larger impact of flow rate can be understood from the fact that at higher addition rates, the supersaturation build up is larger which in turn induces more contribution from nucleation events to the overall precipitate. Put differently, higher nucleation rates give rise to larger number of crystallites among which the remaining precursor is divided giving rise to smaller crystallites (the same trend was also detected in our previous work; see Table 1 in Ref. [10]).
Looking at Figure 10(b,e), is most sensitive to / with the rest of input factors having minimal effects only marginally above the dummy index. Physically, this means that the relative rates of primary and secondary nucleation events determine the final particle size.
With as the SA target, again / is the most influential parameter (Figure 10(c,f)) compatible with our scatter plots (Figure 9(b)). Besides, among the experimental conditions, we can distinguish a comparable dependence on (Figure 10(c,f); similar inference as in scatter plot Figure 9(g)). Conversely, the of the product is much less sensitive (in a global sense) with respect to the other experimental variables. This is a favorable outcome as it allows for optimizing this key property by tuning the synthesis conditions. Therefore, higher surface areas can generally be obtained by increasing irrespective of the value other uncertain input factors assume. From a physical point of view, this can again be explained in the light of supersaturation buildup brought about by higher (providing the limiting reactants-Na2SiO3 and NaOH-faster), which favors primary nucleation over secondary nucleation (and nucleation, in general, over growth) [76]. Consequently, higher particle number concentrations are obtained making the overall surface area larger. Figure 10. Results of SA with selected model parameters and experimental conditions as the input factors. PAWN sensitivity indices in the form of mean (a-c) and maximum (d-f) of KS statistic, with 95% confidence intervals obtained from bootstrapping, for crystallite thickness (a,d), particle edge length (b,e), and particle surface area (c,f) as the outputs (sample size of 54,000).
Previously, Wu et al. synthesized C-S-H (of lower Ca:Si ratios) with specific surface areas ranging between 100 and 500 m 2 /g, obtained by varying the synthesis conditions [7]. Our results show that there is room for further improvement by increasing the addition flow rate of silicate solution although one has to optimize the design of the synthesis reactor for maximal mixing [77]. This can be reinforced by the synergistic effect of lowering the cohesion energy (that is, lowering / ; Figure 9(b)) which can be induced either by increasing the relative concentration of monovalent ions (e.g., adding a sodium salt to the mixture) or by working at lower pH values [42,78]. Of course, this conclusion only applies to the set of synthesis conditions investigated here and carefully calibrated computational models are needed to cover scenarios that are more diverse. This includes, for instance, different reactant ratios (which typically induces variations in the Ca:Si ratio of the precipitate [9]), alternative addition orders, different pH levels, and the inclusion of other reagents/surfactants (beside those used in the original experiments [10]).

Conclusions
In summary, we presented a faster, more user-friendly, and more robust version of our previous PBE modelling framework (put forward in Ref. [10]) describing the process of precipitation from liquid solutions.
This was achieved by replacing our speciation function with an interface to PHREEQC, providing access to the large databases already implemented in this popular software. Thanks to this modification, the adaptation to new precipitation scenarios is made much more straightforward and can be performed using keywords similar to the conventions used in PHREEQC, eliminating the need to prepare the database file for every new system. Another modification was the application of DQMOM, which offers several advantages in terms of speed, robustness, and adaptability over our previously implemented method (QMOM). Subtle technicalities in the implementation of DQMOM to obtain a reliable and quickly converging solution method were explained to allow replication/extension of the current work by other researchers. We also provide fully commented MATLAB codes implementing the PBE simulation workflow in the accompanying Supporting Information.
Upon developing an improved computational framework, three different global uncertainty/sensitivity analysis (UA/SA) methods were applied to understand the behavior of the model in response to uncertainty in various model parameters. For several simulation outputs, either we demonstrated the consistency of the results from different SA measures or explained the reason behind the inadequacy of the applied method. In the latter case, for instance, we presented particle edge length as an output whose highly  Now, defining the moment source term ̅ ̅ ≡ 12 Eq. (11) can be recast into a matrix form as (boldface symbols are vectors and matrices) In our experience, the scaling procedure explained earlier is an inevitable step in the application of DQMOM to the process of nanoparticle formation. Yet another reduction in the condition number of matrix can be readily achieved by preconditioning which makes the convergence of the iterative solution more robust and faster [5]. This is particularly important when dealing with particulate processes that give rise to sharp changes in the particle phase space (e.g., nucleation or aggregation) [4]. Here, we applied a left preconditioning using a diagonal matrix with main diagonal elements With being diagonal, its inverse can trivially be obtained via inverting the main diagonal elements [3].
After these considerations, solving Eq. (28) at each time step of integrating the set of ordinary differential equations (ODE set composed of PBE + mass balances) yields and . For the ODE solvers to work efficiently, the dependent variables have to be properly scaled [6]. Here, having the weights in the range of 10 20 crystallites.m -3 (or higher), both and , , can be extremely large, especially during the burst of nucleation (nucleation rates are in excess of 10 16 crystallites.m -3 .s -1 for typical model parameters [7]). To bring these values to the order of unity, we solved the ODE set for log10 of the weights and weighted abscissas Now, let us describe the constituents of the source term ̅ . For molecular growth (which could be sizedependent [7]) in a homogenous system we have [1] , 31 Therefore, which was obtained using integration by parts [3]. Now, using Eq. (1)

̅ 33
In fact, this is the N-point quadrature approximation of the growth source term for moment order k [1,7]. In matrix format and for abscissas in nanometers where is the volume of the reaction suspension. Again, in matrix form ln 1,10 , … ,10 , … , 10 ⋮ 39 Another complication arises from the fact that PBE only tracks the evolution of crystallites. Therefore, an additional differential equation is required to account for the time variation of particle number concentration ( ; particles.m -3 ) [7,8]. Again, with adopting very large values we solve the ODE for log10 of log 1 ln 10 40 ln 41 Besides the PBE set and the ODE for , we have to solve for mass balances over elemental abundances ( ) inside the solution. Since in the current precipitation system the molar amounts are in mmol range the derivatives are defined in terms of mmol.s -1 to bring them closer to the order of unity. The rate of precipitate formation therefore provides the coupling between kinetics and thermodynamic speciation calculation at each time step of integrating the ODE set [7,9,10].

Additional Details on the Implementation of PBE Simulation Framework
Throughout our PBE simulations, we generally used 10 -12 and 10 -5 , respectively, for the relative and absolute tolerances input to the MATLAB's ODE solver (ode15s). Moreover, to avoid convergence problems with PHREEQC we used a maximum step size of 3 (modified using the keyword data block KNOBS, keyword -step_size; default value is 100). This fine-tunes the variation in the activities of master species in a single iteration and make the solution more robust [11]. Despite these considerations, particularly in the case of very sharp fronts (as with, for instance, extremely fast nucleation at very high or very low accompanied by high / ), ODE solver may converge to less accurate results or may not converge at all. This can readily be checked by comparing the precipitation yield calculated from the third moment against that estimated from Ca elemental balance. The latter value is invariably more reliable as the calculation of moments in moment-based methods is always associated with some error which generally increases with the moment order and depends on the specific problem [2,12]. In our experience, with the default tolerances the discrepancy between the two yields is generally low and within a few percent. Nevertheless, an improvement can be achieved by reducing the absolute tolerance to 10 -6 . This, however, comes at the expense of additional computational time. Therefore, in all our simulations we first used a larger tolerance (10 -5 ) and redid the simulation with smaller tolerance only for cases with yields different by more than 1%.
Another important detail about the implementation of moment-based methods concerns the initial conditions for the solution of population balance equations. When there are no particles in the reactor at t = 0, the initial conditions for the weights and weighted abscissas would be zero [1]. Thus, in the first time step the abscissas are not defined, the matrix is not invertible, and hence, the terms and cannot be calculated. A viable way to work around this issue is to introduce a negligible amount of tiny fictitious seeds in the reactor to start the integration and maintain stability during the computations [1,7,10]. In the current work, N×10 4 m -3 crystallites/particles with sizes (1, 2, …, N)×10 -10 m were used to seed the PBE simulations.
One last implementation detail is related to the simulations performed at temperatures other than the room temperature. In the absence of thermodynamic data on the temperature dependence of C-S-H dissolution reaction (in particular for Ca:Si = 2), we adopted an approximation protocol recommended for the so-called isocoulombic reactions [13,14]. Such reactions have the same number of ions on each side of the reaction, or at least have the same total charge on both sides. This simplifies the temperature dependence of the reaction because of variations in ° for different species canceling out [13,14].
Additionally, Gu et al. [14] noticed that not only Δ ° and Δ ° terms are small for isocoulombic reactions, but also they are usually of opposite signs canceling each other out. This leads to the so-called one-term extrapolation approach which states that for well-balanced aqueous reactions Δ ° does not depend on temperature. This way, only the Gibbs energy change of reaction at one temperature (usually room temperature) suffices to estimate the value at other temperature [14]. The method also applies to isocoulombic reactions involving condensed phases with non-reference state solids (mostly minerals) giving less accurate estimates compared to when the solid is in its reference state [14]. For reactions that are not inherently isocoulombic, combining with other reactions (whose equilibrium constants are known as a function of temperature) is a necessary step before using the one-term extrapolation method [13]. For this purpose, oftentimes the ionization of water is used. For our C-S-H solid, the dissolution reaction reads .
.    index. As we can see, the convergence is achieved already at sample sizes above 50,000 and the 95% confidence intervals are narrow, allowing for a straightforward assessment of relative importance exhibited by various inputs. Overall, the relatively fast convergence of different indices, narrow confidence intervals, and the consistency of the results with scatter plots testify to the suitability of PAWN for the current UA/SA problem.

Supplementary UA/SA Results with Model Parameters as Input Factors
Figure S 4 summarizes the convergence behavior for the average of absolute EEs ( Figure S 4(a-c)) and their C.V. values ( Figure S 4(d-f)). For the mean values convergence is obtained at sample sizes above 60,000 in all case and the confidence intervals are quite narrow ( Figure S 4 (a-c)). Achieving convergence for the C.V.
values, on the contrary, poses a significant challenge and the confidence intervals are relatively wide even at a very large sample size of 120,000 ( Figure S 4 (d-f)). Fixing to a nominal value 2 and estimating the EET indices over a sample of size 195,000 resolves this issue (Figure S 5). With this larger sample, the obtained measures and their C.V. do not change appreciably but the confidence intervals generally become smaller (compare Figure 6 to Figure S 5). In particular, with as the output, the confidence interval of C.V. for log separates from that of other parameters assuming the lowest value among others ( Figure S 5(a)).
Similarly, with as the output, the C.V. for separates from that of and log      Figure S 6(b,c)). Still with the convergence is fast despite having parameters with similar main effects ( Figure S 6(a)). On the contrary, convergence of the total effects introduces a significant challenge in the case of and (Figure S 6(d,e)) given their highly skewed distributions as we discussed earlier ( Figure   3(a,b)). This mainly manifests in wide and significantly overlapping confidence intervals while the average values from bootstrapping already stabilize for larger sample sizes ( Figure S 6(d,e)). This is more evident when is fixed to a nominal value of 2 and the analysis is extended to even larger sample sizes (Figure S       ) which is an endothermic process and is promoted at higher activities [15]. A less strong influence can also be discerned with decreasing the Ca(NO3)2 concentration ( Figure S 13(e)) although this parameter has to be kept high enough to achieve acceptable precipitation yields ( Figure S 12(e)). PBE simulations at NaOH concentration 0.4 mol/kg water (4 times the value in our experiments in Ref. [7]) at room temperature and 50°C show these effects, giving rise to periods of supersaturation ratio in excess of unity for portlandite (Figure S 14). Fortunately, none of these parameters directly influence the C-S-H particle SSA and therefore, it should be possible to produce high surface area C-S-H product without coprecipitating portlandite.