Superposition of artificial experimental error onto calculated time series: Construction of in-silico data sets

The data and complementary information presented here relate to the research article “https://doi.org/10.1016/j.cej.2018.01.027; Chem. Eng. J., 342, 41–51 (2018)”, in which sets of in-silico data are constructed to demonstrate a novel method for parameter estimation in biodiesel production from triglycerides (Heynderickx et al., 2018) [1]. This paper explains the method used for the error superposition and, to allow ready reproduction by the reader, presents the basic steps for superposing a normally distributed error by means of a simple Excel® datasheet file.



Value of the data
The procedure to superpose normally distributed experimental error onto calculated time series is described. The required equations are given and a specific example is elaborated.
The datasheets and algorithms arising from this application are presented explicitly and the procedures are explained.
A reusable Excel® data sheet is provided with this paper to create so-called 'in-silico data'. The described procedure can be followed, with minimal effort, by other users who need artificial experimental time series, typically to test novel procedures for interpreting experimental time series.
The procedure is widely applicable and easy to put into practice for users in every research field.

Data
A set of time series was generated by numerical integration of a system of differential equations with given initial conditions, as explained in [1]; normally distributed error was then superposed on these time series.
This work gives a specific outline for the creation of this superposed experimental error in the generation of so-called 'in-silico data'.
A full data set, as used in Ref. [1], is given in this paper in Figs. 1-8. Final results of the parameter calculation procedure in Ref. [1] are listed in Tables 1-3 as supplementary data.

Theoretical background
Experimental time series consist of measurements $M_i$ of the same quantity at equispaced points in time $t_i = t_0 + (i - 1)\Delta t$. Like any measurement, each $M_i$ is subject to an experimental error $X_i$. The errors $X_i$ are here assumed to be of the Gauss-Markov type. This means that the errors have a normal distribution with zero mean and are correlated with binary correlation coefficients via Eq. (2), where $\tau$ is the correlation time:
\[ \rho(X_i, X_j) = \exp\left(-\frac{|t_i - t_j|}{\tau}\right) \tag{2} \]
The level of correlation between any two measurements decays exponentially as a function of the time elapsed between them, giving the error a 'memory'. If one measurement has a positive error, for example, there is a high chance that the next measurement also has a positive error. Gauss-Markov errors frequently occur in experimental time series; they have been identified, e.g., by Roelant et al. [3].
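As an illustration of this exponential decay, the following minimal Python sketch builds the matrix of binary correlation coefficients for a few equispaced measurements. It assumes the common Gauss-Markov form exp(-|t_i - t_j|/τ); the function name and the values chosen for Δt and τ are hypothetical and serve only as an example.

```python
import numpy as np


def gauss_markov_correlation(n, dt, tau):
    """Return the n x n matrix with entries exp(-|t_i - t_j| / tau).

    Assumes equispaced time points t_i = t_0 + (i - 1) * dt, so that the
    elapsed time between measurements i and j is |i - j| * dt.
    """
    idx = np.arange(n)
    elapsed = np.abs(idx[:, None] - idx[None, :]) * dt
    return np.exp(-elapsed / tau)


# Hypothetical example values: 5 measurements, dt = 1, tau = 2.
R = gauss_markov_correlation(n=5, dt=1.0, tau=2.0)
print(np.round(R, 3))
```

With these example values, neighbouring measurements have a correlation coefficient of exp(-0.5), and the coefficient for measurements two steps apart is the square of that value, which is the 'memory' decaying with elapsed time.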
Long correlation times $\tau$ have a negative impact on the quality of the time series: correlation times on the time scale of the actual trends to be observed can cause random excursions that are mistaken for actual trends in the measured quantity. As part of the development of novel procedures to interpret experimental time series, such procedures are sometimes tested on artificial data, i.e., model-calculated time series with an artificial error superposed. Producing artificial errors of the Gauss-Markov type makes it possible to account for a realistic error memory. In this data article the authors show how artificial Gauss-Markov errors can be generated.
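Table 2 Parameter values obtained for different error values at the given temperatures and $C_{\mathrm{MeOH},0}$ = 0.068 M [1]. Parameters for 0% error can be found in Table 1.
Table 3 Parameter values obtained for different error values at the given temperatures and $C_{\mathrm{MeOH},0}$ = 0.124 M [1]. Parameters for 0% error can be found in Table 1.
A Gauss distribution for a random variable $X$, with average $\mu_X$ and variance $\sigma_X^2$, is given by Eq. (3):
\[ P(X = x) = \frac{1}{\sqrt{2\pi\sigma_X^2}} \exp\left(-\frac{(x - \mu_X)^2}{2\sigma_X^2}\right) \tag{3} \]
Consider a measurement error $X_0$ with normal distribution with mean zero and variance $\sigma_0^2$. The probability that $X_0 = x_0$ is given by Eq. (4), which is the well-known Gauss distribution, see Eq. (3), with zero mean and variance $\sigma_0^2$ [4]:
\[ P(X_0 = x_0) = \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left(-\frac{x_0^2}{2\sigma_0^2}\right) \tag{4} \]
Now consider the error $X_1$ on the next measurement in time, with normal distribution with mean zero and variance $\sigma_1^2$. If $X_0$ and $X_1$ are correlated with binary correlation coefficient $\rho$, the probability that $X_0 = x_0$ and $X_1 = x_1$ is given by Eq. (5):
\[ P(X_0 = x_0, X_1 = x_1) = \frac{1}{2\pi\sigma_0\sigma_1\sqrt{1 - \rho^2}} \exp\left(-\frac{1}{2(1 - \rho^2)}\left[\frac{x_0^2}{\sigma_0^2} - \frac{2\rho x_0 x_1}{\sigma_0\sigma_1} + \frac{x_1^2}{\sigma_1^2}\right]\right) \tag{5} \]
Eq. (5) is an application of the so-called 'multivariate normal distribution' or 'multivariate Gaussian distribution', typically used in probability theory and statistics [5]. This is a generalization of the one-dimensional (univariate) normal distribution, see Eq. (3), to multiple dimensions. In the two-dimensional case, the probability density of the random pair $(X, Y)$, with averages $\mu_X$, $\mu_Y$ and variances $\sigma_X^2$, $\sigma_Y^2$, is given by Eq. (6), where $\rho$ is the correlation between $X$ and $Y$ [5]:
\[ P(X = x, Y = y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}} \exp\left(-\frac{1}{2(1 - \rho^2)}\left[\frac{(x - \mu_X)^2}{\sigma_X^2} - \frac{2\rho (x - \mu_X)(y - \mu_Y)}{\sigma_X\sigma_Y} + \frac{(y - \mu_Y)^2}{\sigma_Y^2}\right]\right) \tag{6} \]
In the given case, i.e., for errors with a normal distribution with zero mean, Eq. (6) simplifies to Eq. (5). The probability that $X_1 = x_1$ if it is already known that $X_0 = x_0$ can then be calculated as the so-called 'conditional probability' via Eq. (7):
\[ P(X_1 = x_1 \mid X_0 = x_0) = \frac{P(X_0 = x_0, X_1 = x_1)}{P(X_0 = x_0)} \tag{7} \]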
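The ratio of the joint and the single-variable Gauss distributions can be checked numerically. The sketch below, in Python, evaluates the zero-mean bivariate density divided by the density of X0 and compares it with a univariate normal density with mean ρ(σ1/σ0)x0 and variance σ1²(1 − ρ²), which is the standard closed form of the conditional distribution for a bivariate normal pair. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate Gauss density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def joint_pdf(x0, x1, s0, s1, rho):
    """Zero-mean bivariate Gauss density with standard deviations s0, s1."""
    q = (x0 / s0) ** 2 - 2.0 * rho * x0 * x1 / (s0 * s1) + (x1 / s1) ** 2
    norm = 2.0 * math.pi * s0 * s1 * math.sqrt(1.0 - rho ** 2)
    return math.exp(-q / (2.0 * (1.0 - rho ** 2))) / norm


def conditional_pdf(x1, x0, s0, s1, rho):
    """Density of X1 = x1 given X0 = x0, as joint density over marginal density."""
    return joint_pdf(x0, x1, s0, s1, rho) / normal_pdf(x0, 0.0, s0 ** 2)


# Hypothetical example values.
s0, s1, rho, x0, x1 = 0.5, 0.8, 0.6, 0.3, -0.2
ratio = conditional_pdf(x1, x0, s0, s1, rho)
closed_form = normal_pdf(x1, rho * s1 / s0 * x0, s1 ** 2 * (1.0 - rho ** 2))
print(ratio, closed_form)
```

The two printed values agree to within floating-point rounding, which shows that, once x0 is known, the next error is again normally distributed, with a shifted mean and a reduced variance.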
In probability theory, the conditional probability of an event, say B, is the probability that this event will occur given the knowledge that another event, say A, has already occurred by assumption, presumption, assertion or evidence. This probability is written as $P(B|A)$. If events A and B are not independent, or 'correlated', then the probability of both A and B occurring is given by $P(A \cap B) = P(A)\,P(B|A)$, which explains the origin and meaning of Eq. (7).
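To indicate how a conditional probability of this kind can be turned into a correlated error series, the following Python sketch draws each error from a normal distribution conditioned on the previous one. It is only a sketch under stated assumptions, not the Excel® procedure of this paper: it assumes a constant standard deviation sigma for all errors, a correlation coefficient exp(-Δt/τ) between consecutive errors, and the standard conditional mean and variance of a bivariate normal pair. The function name and all numerical values are hypothetical.

```python
import numpy as np


def gauss_markov_errors(n, sigma, dt, tau, rng):
    """Draw n zero-mean, normally distributed errors with exponential memory.

    Assumptions (not taken from the paper): constant standard deviation
    sigma and correlation exp(-dt / tau) between consecutive errors.
    """
    rho = np.exp(-dt / tau)
    cond_sd = sigma * np.sqrt(1.0 - rho ** 2)  # conditional standard deviation
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma)  # first error: unconditional draw
    for i in range(1, n):
        # Next error: normal with mean rho * x[i-1] given the previous error.
        x[i] = rng.normal(rho * x[i - 1], cond_sd)
    return x


rng = np.random.default_rng(seed=0)
errors = gauss_markov_errors(n=20000, sigma=0.1, dt=1.0, tau=3.0, rng=rng)

# Sample statistics to compare with the target values.
lag1 = np.corrcoef(errors[:-1], errors[1:])[0, 1]
print(errors.mean(), errors.std(), lag1, np.exp(-1.0 / 3.0))
```

Adding such an error series to a calculated time series gives artificial data with an error memory; for a long series, the sample standard deviation and the correlation between consecutive errors should lie close to sigma and exp(-Δt/τ).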