On the Numerical Accuracy of Spreadsheets

This paper discusses the numerical precision of five spreadsheets (Calc, Excel, Gnumeric, NeoOffice and Oleo) running on two hardware platforms (i386 and amd64) and on three operating systems (Windows Vista, Ubuntu Intrepid and Mac OS Leopard). The methodology consists of checking the number of correct significant digits returned by each spreadsheet when computing the sample mean, standard deviation, first-order autocorrelation, F statistic in ANOVA tests, linear and nonlinear regression and distribution functions. A discussion about the algorithms for pseudorandom number generation provided by these platforms is also conducted. We conclude that there is no safe choice among the spreadsheets here assessed: they all fail in nonlinear regression and they are not suited for Monte Carlo experiments.


Numerical accuracy and spreadsheets
Spreadsheets are widely used as statistical software platforms, and the results they provide frequently support strategic decisions. Their intuitive visual interfaces are preferred by many users despite their weak numeric behavior and programming support (Segaran and Hammerbacher 2009, p. 282). In this article we used the methodology developed by McCullough (1998, 1999) and applied by Altman (2002), Bustos and Frery (2006), Keeling and Pavur (2007), McCullough (2000), Vinod (2000), and Yalta and Yalta (2007) for the assessment of spreadsheets on a notebook with an i386 CPU (32 bits) and a desktop iMac with an amd64 CPU (64 bits). Regarding the operating systems, we used GNU/Linux Ubuntu Intrepid (kernel 2.6.27-14-generic) and Microsoft Windows Vista SP1 on the i386 CPU, and Mac OS X Leopard on the amd64 CPU.
We present the analysis of five spreadsheets running on up to three operating systems. Table 1 shows a summary of the platforms and versions here assessed. These are the latest stable precompiled versions available for each platform at the time the experiment was conducted, with the exception of the development release of Gnumeric for Windows, which was the only build of that spreadsheet available for that platform. This diversity matters because different hardware, operating systems, versions and implementations may produce results with different numerical precision. In fact, we observe such behavior in Section 4, and in Section 5 we conclude that certain spreadsheet versions running on specific operating systems are advisable for particular operations.
The analysis is performed by checking the number of correct significant digits returned by each spreadsheet when computing the sample mean, standard deviation, first-order autocorrelation, F statistic in one-way ANOVA tests, linear and nonlinear regression, and various distribution functions. The quality of the algorithms implemented for computing pseudorandom numbers is also discussed.
The rest of the paper unfolds as follows. Section 2 presents the historical background of these spreadsheets. Section 3 discusses the methodology for assessing the accuracy of the spreadsheets under consideration. Sections 4 and 5 present the results, discussions and lines for future research.

Historical background
Originating in the StarOffice suite, the OpenOffice.org Calc spreadsheet was released in 1999. Its source code was released under the LGPL (Lesser General Public License, Free Software Foundation 2007b) in 2000, gathering a community of developers around its improvement and the development of new functionality. Calc currently offers almost all of the functions available in Excel, and many other new functions are included in each release. The add-on mechanism included in the latest version aims at allowing users to easily incorporate new functionality without reinstalling the core system.
Microsoft Excel for Mac OS was introduced to the market in 1985 as an alternative to Lotus 1-2-3, the most-used spreadsheet at the time. The first Windows version was issued in 1987. Its greatest differentiator was the graphical interface which, allegedly, made it easy to use by providing intuitive access to its functionality. Version 5.0, introduced in 1993, supported Visual Basic for Applications for the creation of macros to automate procedures (Özkaya 1996). Since 2007 Excel uses a new file format by default: OOXML (Office Open XML), a Microsoft copyrighted format which is not fully compatible with other spreadsheets. The latest versions are 2008 for Mac OS and 2007 for Windows. Excel also incorporates a mechanism for add-ons, but it is incompatible with that of OpenOffice.org.
NeoOffice is a port of the OpenOffice.org suite to Mac OS, and it can be easily installed under the Aqua windowing system. Its developers claim that the current version provides all the features included in OpenOffice.org Calc, with improved speed and other functionality (almost all related to interface and media support) common to Mac OS applications. Such improved speed may have an impact on the numerical properties of the suite. It supports many of the add-ons provided by OpenOffice.org.
Oleo used to be the main GNU spreadsheet. It can be used either with a graphical interface or in character mode. The first operational interface for the X Window System was released in 1994. Many of its functions are loaded from the GNU Scientific Library (GSL, Galassi et al. 2003), a powerful numerical engine, but none of the functions herein assessed belongs to GSL. The Oleo project currently has no maintainer, and no references about its numerical properties were found in the literature.

Accuracy measurement
There are three kinds of errors that may appear in numerical computation: cancellation, accumulation, and truncation errors (Kennedy and Gentle 1980). These errors are due to the finite nature of binary computer representation of numbers (Knuth 1998, Chapter 4), and to negligent implementation of functions.
Since the particular type (or types) of errors an algorithm may incur depends on many factors (implementation, dataset, hardware and operating system, to name a few), extremely careful software development is always required.
In the following we review the methodology employed for assessing the accuracy of simple statistical functions when one does not have access to the algorithms they employ. In such a situation, the evaluation must rely on the results those functions provide when applied to special datasets.
Because of the lack of implementation details from some vendors (e.g., Microsoft) and the hard task of analyzing source code, as previously said, we adopted the user's viewpoint for assessing the reliability of spreadsheets. That is, we used each spreadsheet to compute functions on different datasets for which the exact (certified) answers are known. Comparing these answers with the values obtained from each spreadsheet gives a measure of accuracy: the number of correct significant digits.

Base-10 logarithm and correct significant digits
One of the most convenient measures for comparing two numbers consists in computing the absolute value of the relative error and taking its base-10 logarithm:

$$\mathrm{LRE}(x, c) = -\log_{10}\left(\frac{|x - c|}{|c|}\right), \quad (1)$$

where $x$ is the value computed by the software under assessment and $c$ is the certified value. LRE (Log-Relative Error) relates to the number of significant digits that were correctly computed. For instance, considering $x_1 = 0.01521$ and $c_1 = 0.01522$, then $\mathrm{LRE}(x_1, c_1) = 3.182415$. Moreover, if $x_2 = 0.0000001521$ and $c_2 = 0.0000001522$, then $\mathrm{LRE}(x_2, c_2) = 3.182415$. In both cases we say that "the number of correct significant digits is approximately 3".
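This computation is immediate in R, the language we later use to obtain all LRE values (Section 4); the helper name lre below is ours:

    # Log-Relative Error: approximate number of correct significant
    # digits of x with respect to the certified value c (assumes c != 0)
    lre <- function(x, c) -log10(abs(x - c) / abs(c))

    lre(0.01521, 0.01522)            # 3.182415
    lre(0.0000001521, 0.0000001522)  # 3.182415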
In the following we adopt the convention of reporting one decimal place whenever LRE(x, c) ≥ 1, zero if 0 ≤ LRE(x, c) < 1, and "-" otherwise; "-" thus denotes the worst possible outcome.

Datasets and certified values
The Statistical Reference Datasets from the (American) National Institute of Standards and Technology (1999), NIST for short, were used as benchmarks. We used four kinds of datasets to assess the reliability of univariate summary statistics, analysis of variance, linear regression, and nonlinear regression; they consist of nine, eleven, eleven, and twenty-seven datasets, respectively, classified into three levels of difficulty: low, average and high.
For univariate summary statistics, analysis of variance, and linear regression, each dataset has certified values calculated using multiple-precision arithmetic to obtain 500-digit answers, which were then rounded to fifteen significant digits. The same procedure was applied to nonlinear regression to achieve eleven significant digits.
NIST provides both observed (real world) and constructed data. For univariate summary statistics, low difficulty data are Lew, Lottery, Mavro, Michelso, NumAcc1 and PiDigits; the first four being observed, and the last two constructed. NumAcc2 and NumAcc3 are average difficulty data and NumAcc4 is the only high difficulty dataset for univariate summary statistics; they are all constructed.
NIST offers nine datasets for assessing the precision of ANOVA statistics, based on the proposal by Simon and Lesage (1989): SmLs01, SmLs02 and SmLs03 with low difficulty; SmLs04, SmLs05 and SmLs06 with average difficulty; and SmLs07, SmLs08 and SmLs09 with high difficulty. Two observed datasets are also provided: the low difficulty SiRstv, and the average difficulty AtmWtAg.
For linear regression, NIST gives the observed datasets Norris and Pontius with low difficulty, the generated datasets NoInt1 and NoInt2 with average difficulty, and the Filip, Longley, Wampler1, Wampler2, Wampler3, Wampler4 and Wampler5 datasets with high difficulty, the first two of these being observed and the last five generated. The models associated with these datasets are: three linear, one quadratic, one multilinear, and six polynomial.
There is a diversity of nonlinear regression datasets supplied by NIST. Eight have low difficulty: Misra1a, Chwirut2, Chwirut1, DanWood, Misra1b, Lanczos3, Gauss1 and Gauss2, the first five being observed and the remaining generated. The average difficulty data are Kirby2, Hahn1, Nelson, Misra1c, Misra1d, Roszman1, ENSO, MGH17, Lanczos1, Lanczos2 and Gauss3, where the first seven are observed and the others generated. Finally, the high difficulty data are MGH09, MGH10, Thurber, BoxBOD, Rat42, Eckerle4, Rat43 and Bennett5, the first two generated and the rest observed. There are four rational, sixteen exponential, and seven miscellaneous models for these data.
Regarding the distribution functions, the results obtained with Mathematica 5.2 by Yalta (2008) were used. These values are certified to be accurate to six significant digits for all the distribution functions here assessed.

Results
Both the spreadsheet version and the hardware/software platform on which it is used are factors that potentially may have some degree of influence on the results.
The LRE values (see Equation 1) were computed using R 2.8.0 (R Development Core Team 2009) on GNU/Linux Ubuntu Intrepid (kernel 2.6.27-14-generic), whose excellent numerical accuracy has been assessed by Almiron et al. (2009).
Another important issue is the model representation in linear and nonlinear regressions. In previous works, the interpretation of operator precedence led to wrong assessments (see Berger 2007): for instance, Excel computes -1^2 as 1, while R computes the same expression as -1. We used full parenthesization in all models to avoid this problem.
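The discrepancy is easy to reproduce; in R:

    -1^2    # -1: in R, ^ binds tighter than the unary minus
    (-1)^2  #  1: full parenthesization removes the ambiguity
    # In Excel, =-1^2 yields 1 because there the unary minus binds tighter.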
This section presents seventeen tables and a discussion, the latter devoted to pseudorandom number generators (Section 4.5). The tables report the results obtained when computing univariate statistics, namely the sample mean (Table 2), the sample standard deviation (Table 3) and the sample first-lag autocorrelation (Table 4); analysis of variance (ANOVA, Table 5); linear (Table 6) and nonlinear (Tables 7, 8, 9 and 10) regressions; and distribution functions (Tables 12, 13, 14, 15, 16, 17, 18 and 19). Each table presents LREs whenever the procedure is natively available in each spreadsheet/version/platform. In those cases where the results of at least two versions coincide, they are presented only once.

Univariate summary statistics
All spreadsheets offer specific functions to calculate the sample mean and standard deviation. In Calc, Excel, Gnumeric and NeoOffice, the sample mean can be computed with the AVERAGE function, and the standard deviation with the STDEV function. Oleo offers the functions AVG and STD, respectively. Table 2 shows the LREs when computing the sample mean; the expression $\bar{x} = n^{-1} \sum_{i=1}^{n} x_i$ was used to compute the certified values. Each column presents the results of all considered versions of each spreadsheet, since they presented no differences.
We observe that Gnumeric presents the best results among all platforms, achieving the highest possible precision in every situation. Calc, Excel and NeoOffice, interestingly, attained the same precision on all but two datasets, NumAcc2 and NumAcc4, which pose average and high difficulty, respectively. Oleo presented the worst results, with problems mainly in the datasets of average and high difficulty.

Table 3 presents the results of assessing the precision when computing the standard deviation, whose certified values were computed with the expression $s = \sqrt{(n-1)^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$. Calc, Gnumeric and NeoOffice exhibited similar behavior, achieving the same results for all datasets, with bad results for NumAcc3 and NumAcc4. Excel 2007 presented results similar to those of Calc, Gnumeric and NeoOffice, except for the dataset NumAcc2, for which it was worse. Excel 2008 has serious problems with four datasets, two of them of low difficulty. Again, the worst results were produced by Oleo.
Excel 2007 results for the sample mean and standard deviation match those calculated in McCullough and Wilson (2005) and confirmed by McCullough and Heiser (2008).
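Such failures are consistent with catastrophic cancellation in a naive one-pass variance formula. A minimal R illustration on constructed data of the same flavor as the NumAcc datasets (large level, tiny spread; not the NIST values):

    x <- 1e8 + c(1, 2, 3)                  # true standard deviation is 1

    # Two-pass (stable) formula: center the data first
    s_stable <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))

    # Naive one-pass formula: the two large sums cancel catastrophically
    n <- length(x)
    s_naive <- sqrt((sum(x^2) - n * mean(x)^2) / (n - 1))

    c(stable = s_stable, naive = s_naive)  # 1 versus 0: all digits lost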
The first-order autocorrelation certified values were calculated using the expression

$$r_1 = \frac{\sum_{i=1}^{n-1} (x_i - \bar{x})(x_{i+1} - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$

No spreadsheet provides a specific function for computing this quantity, and it is not possible to calculate it as a simple correlation using the CORREL function. Consider the sample $a_1, a_2, \ldots, a_{n-1}, a_n$; the first-lag autocorrelation can be computed by dividing the sample covariance between $a_1, \ldots, a_{n-1}$ and $a_2, \ldots, a_n$ by the sample variance of $a_1, a_2, \ldots, a_n$. All spreadsheets provide the functions COVAR and VAR, except Oleo, which offers no native function for computing either the first-order autocorrelation or the covariance and, therefore, was not assessed with respect to this quantity. Table 4 presents the results; in particular, some versions did not return any correct digit for NumAcc3 and NumAcc4.
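For reference, the certified definition above can be written as a small R helper (our own, not part of any spreadsheet):

    # First-order autocorrelation as defined for the NIST certified values
    r1 <- function(x) {
      n <- length(x)
      m <- mean(x)
      sum((x[-n] - m) * (x[-1] - m)) / sum((x - m)^2)
    }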
The computation of the first-lag autocorrelation reveals differences among versions. Excel loses more than ten correct digits when switching from version 2007 to 2008 on NumAcc3 and NumAcc4. NeoOffice also presents differences of the same order on these same datasets, but there the newer version improves on the older one.
Finally, Calc exhibits a small difference across operating systems when computing the first-lag autocorrelation of NumAcc2: its Mac OS and Windows versions produce results with almost one more correct digit than the Ubuntu version.

Analysis of variance
Only Excel 2007 and Gnumeric provide native functions for computing ANOVA, so the accuracy of the F statistic was assessed only for these two spreadsheets. We used the Analysis ToolPak in Excel and ANOVA in Gnumeric. Table 5 shows that both achieved good results when the low and average difficulty datasets are examined. We also observe that when they return different results, Gnumeric is better than Excel 2007. It is noteworthy that Excel 2007 produced an unacceptable result for the SmLs09 dataset.

Linear regression
Calc, Excel, Gnumeric and NeoOffice provide the LINEST function for linear regression in every version/platform here assessed. Table 6 presents the LRE values of the least accurate coefficient β and residual standard deviation (RSD) provided by each platform.
In the following tables, "NA" denotes situations for which no numeric answer is returned by the spreadsheet: a "#NUM!" is placed in the cell where the result is expected, probably because huge values are involved.

Nonlinear regression
Nonlinear regression was assessed with a suite of 27 datasets, transforming each regression into a nonlinear optimization problem whose objective is minimizing the residual sum of squares. To achieve this, we used the general-purpose tool Solver.
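The reformulation is straightforward. The R sketch below, on synthetic data (not the NIST values), mirrors what Solver does for a BoxBOD-type model $y = \beta_1(1 - \exp\{-\beta_2 x\}) + \varepsilon$:

    # Nonlinear regression as minimization of the residual sum of squares
    x <- c(1, 2, 3, 5, 7, 10)          # synthetic design points
    y <- 200 * (1 - exp(-0.5 * x))     # synthetic responses

    rss <- function(b) sum((y - b[1] * (1 - exp(-b[2] * x)))^2)

    fit <- optim(c(100, 1), rss)       # the starting point matters, as with Solver
    fit$par                            # approaches the true c(200, 0.5)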
The only spreadsheets that natively provide a Solver with such capability are Calc 2.4.1 for Ubuntu, Excel 2007 and NeoOffice 2.2.5. The Solver from Excel 2007 provides two methods for solving nonlinear regression problems, namely Newton and Conjugate; both were assessed.
By default, Calc 3.0.1 and NeoOffice 3.0 only provide a Solver for linear regression. An extension named Solver for Nonlinear Regression enabled the analysis on these spreadsheets. This extension can be found on the official site for OpenOffice.org extensions (http://extensions.services.openoffice.org/), in a package provided by Sun Microsystems Inc. (who also holds the OpenOffice.org Calc copyright) under the LGPL license (for more information about this extension see http://wiki.services.openoffice.org/wiki/NLPSolver). The extension provides two non-deterministic evolutionary algorithms: DEPS and SCO. DEPS (differential evolution particle swarm) is a hybrid multiagent approach based on the particle swarm optimization paradigm (Zhang and Xie 2003), while SCO (social cognitive optimization) is based on social intelligence and human learning mechanisms (Xie et al. 2002).
Most algorithms considered here accept adjustments, although they load with default parameters. As McCullough (1998) comments, adjusting these settings may yield better solutions. In the following we assess these techniques both with the default parameters and with settings chosen to improve the results.
Nonlinear optimization procedures usually require starting points, and Solver is no exception. NIST provides two starting points for each dataset: the first (denoted 'Start 1' in Tables 7 and 9) is far from the final solution and represents the usual situation in which the user has no prior idea of where the solution lies; the second (denoted 'Start 2' in Tables 8 and 10) is closer to the final solution.

Using default settings
This situation describes the user who has no prior idea about either the problem or its solution. Tables 7 and 8 present the LRE of the least precise estimated coefficient in each regression; their sparseness clearly depicts how difficult this class of problems is for spreadsheets. In these tables '(M)', '(U)' and '(W)' denote the operating system, i.e., Mac OS, Ubuntu and Windows.
Situations in which negative LRE values were obtained are denoted "-". In order to illustrate the meaning of this notation, Figure 1 presents two examples from Table 8, which use Start 2. Figure 1a shows the results of fitting the BoxBOD dataset with the model $y = \beta_1(1 - \exp\{-\beta_2 x\}) + \varepsilon$; the curves show the results of using the certified values and those obtained with Calc 2.4.1, NeoOffice 2.2.5, and the Conjugate and Newton methods in Excel 2007. Clearly, these last results are meaningless, and their entries are denoted "-" in the table. Figure 1b shows the Thurber dataset explained by its model with the certified values and with those obtained by the Newton and Conjugate algorithms; the estimates obtained by the Newton algorithm also yield negative LRE values, although the fitted curve lies visually close to the certified one.
For the rest of the datasets, the best results are obtained with DEPS and SCO, in that order.
As Table 8 shows, many improvements were obtained using Start 2. The most noticeable change occurred for the BoxBOD dataset: from negative LRE values with Start 1, reliable results were computed by the DEPS and SCO algorithms using Start 2. Regarding DEPS only, the same happens for Rat43. Newton presents improvements on the MGH09 and Eckerle4 datasets. At every level of dataset difficulty there are situations for which no algorithm is able to compute reliable results using the default algorithm parameters. DEPS nevertheless presents the best LRE values in almost all comparisons regarding nonlinear regression with default settings.

Customizing Solver settings
The blind use of default settings, though highly undesirable, is often found in practice. For instance, Knight (2002) and Kaiser (2002) report erroneous results obtained by pollution researchers in North America and Europe, when using default settings in a regression model. They report the problem as a software glitch, when, in fact, it is not. We emphasize that there is no safety in using the default settings when applying regressions, especially nonlinear regressions.
In the following, we present the assessment of nonlinear regressions with customized settings. Each Solver has its own parameters, according to the algorithm used, so it is not possible to discuss a single set of parameters; parameters identified as having similar effect were set equal. Speed was not considered in the performance of the algorithms, only the numerical accuracy.
First, we consider the parameters that are common to the two nonlinear algorithms available in Excel. In order to avoid, as much as possible, hitting time or iteration limits, we used 32,767 seconds as the maximum time (the upper bound allowed by the software) and 2,000 as the upper bound on iterations. The convergence criterion was set to 1E-8 and, as recommended by the Solver documentation, we chose forward derivatives and tangent estimates. The remaining parameters were kept at their defaults.
Second, we customized the DEPS and SCO algorithms. The size of the swarm and the stagnation limit were set to 150; larger values were tested, but no LRE improvements were observed. The stagnation tolerance is analogous to the tolerance parameter of Excel, so we used the same value in both cases, namely 1E-8. Again, all other parameters were kept at their defaults.
Calc 2.4.1 and NeoOffice 2.2.5 do not appear in this analysis because they are not customizable. Following the strategy adopted with the default settings, Tables 9 and 10 present the LRE of the least precise estimated coefficient in each regression. Once again, '(M)', '(U)' and '(W)' denote the operating system in these tables.
In general, from both Start 1 and Start 2, the algorithms with customized parametrization yielded better results than those obtained with default settings. Tables 9 and 10 show that, from Start 1 and Start 2, twenty and twenty-three of the twenty-seven datasets, respectively, met the four correct significant digits of accuracy required by the McCullough (1998) criterion.
As Table 9 shows, reasonable LRE values were obtained for the low difficulty datasets, with the exceptions of Misra1a and Gauss2. We highlight that Newton is the only algorithm that dealt with both of these datasets in a satisfactory way; the DEPS algorithm obtained good LRE values for Gauss2 as well.
Lanczos3 is the only low difficulty dataset for which no correct significant digits were attained from Start 1 by any algorithm herein assessed.
Observing the results achieved for the low difficulty datasets from Start 2 in Table 10, we note that better LRE values are obtained than with Start 1; problems were observed only with the Lanczos3 dataset.
A common issue with both Start 1 and Start 2 is the poor precision obtained by the SCO algorithm. The largest LRE values were obtained with the Newton algorithm for both starting points, while DEPS is more robust than the other algorithms, as can be seen in Tables 9 and 10.

Regarding the average difficulty datasets, we note from Table 9 that three out of eleven do not present satisfactory LRE values in the Start 1 case: MGH17, Lanczos1 and Lanczos2 (the last two did not produce a single correct digit), though the overall results are improved with respect to those obtained without parameter tuning (cf. Table 7). Considering Start 2 in Table 10, we still observe problems in the Lanczos1 and Lanczos2 datasets. Concerning all datasets but these two, we obtained satisfactory results for six out of eleven datasets using the DEPS algorithm, and six out of eleven using Newton. The SCO algorithm reached acceptable results in three of these cases, and Conjugate in only two.

Results achieved with customized settings from Start 2 are roughly the upper bound of reliability that we can get using spreadsheets as nonlinear solvers.
A third starting point was also used with the customized parameters: the certified value itself. It is useful for assessing an algorithm's ability to identify that it has already reached the solution. Conjugate and Newton (available in Excel) recognized the current solution as optimal for all but the Bennett5 dataset. DEPS (from Calc and NeoOffice) fails more often: 12, 13 and 9 times out of 27 datasets in Windows, Mac OS and Ubuntu, respectively. It is noteworthy, though, that even when failing to stop at the certified value, DEPS provides at least LRE = 7 in all datasets but Rat43, for which negative LRE values were obtained, and Bennett5, where only one correct significant digit was computed.

Pseudorandom number generation
According to Ripley (1987, 1990), good pseudorandom number generators ought to provide sequences of numbers with the following properties:
1. they resemble outcomes from uniformly-distributed random variables;
2. vectors of moderate dimension built from those variables are collectively independent;
3. repeatability from a few easily specified parameters (the seed) in a wide variety of computational environments (hardware, operating system, programming language);
4. speed;
5. long periods.
Verifying the first two properties for a given sequence is hard, and a number of tests have been proposed in the literature. Marsaglia's Diehard tests (Marsaglia 1995) and the NIST Random Number Generation standard (Rukhin et al. 2008) are some of the tools available for such an assessment.
From the user's viewpoint, good documentation that supports informed decisions would suffice. Table 11 presents a summary of the documentation and of the seed-setting capabilities of all the spreadsheets under assessment.
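For contrast, repeatability is trivial in environments that expose the seed; a minimal R sketch:

    set.seed(42); a <- runif(3)   # fix the seed, draw three uniforms
    set.seed(42); b <- runif(3)   # same seed, same generator
    identical(a, b)               # TRUE: the sequence is fully repeatable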
Calc's documentation provides no information about the algorithm implemented, and no user function for setting the seed is provided. The source code reveals that its RNG uses the C rand function, for which the ISO C standard (ISO/IEC 9899:TC2) specifies no particular implementation; the only guarantee is that it generates numbers between 0 and at least 32767. Therefore, depending on the implementation of the C library (stdlib.h) used to compile it, a different RNG may be produced, yielding a non-portable function.
NeoOffice, which is based on OpenOffice.org, suffers from the same issue. Gnumeric implements the long-period Mersenne-Twister generator (Matsumoto and Nishimura 1998) but, from the user's perspective, provides no high-level means for setting the seed.
Microsoft claims that Excel 2003 and 2007 use an implementation of the Wichmann and Hill (1982) algorithm but, as shown by McCullough (2008a), this is not true. The same methodology was applied to Excel 2008, and we also concluded that, whatever algorithm is implemented, it is neither the original Wichmann-Hill nor its new version (Wichmann and Hill 2006). The Mac OS version does not provide any information about this implementation.
Oleo provides an undocumented function for the generation of integer values in the range 0, ..., x, with x provided by the user. The only information about this function is in Oleo's source code, which states that a "linear feedback shift register" is used. Once the command is entered, the value is updated regularly at intervals from 1 to 10 seconds, at the user's choice. There is no high-level access to the seed.

Distribution functions
According to Knüsel (1995), the answer given by a program should be correct and reliable as it is printed out. Furthermore, according to Yalta (2008), good statistical software should be able to provide six or seven accurate digits for probabilities as small as $10^{-100}$.
In this section we present the numerical accuracy of the main statistical distributions assessed by Yalta (2008), restricted to those situations where difficulties were observed. This approach highlights the differences between the spreadsheets herein assessed.
The specific names and parameters of the functions are the same on all platforms and versions. Gnumeric provides another set of functions based on the R platform (Almiron et al. 2009); we tested both sets, and no differences were observed in their numerical precision. Oleo was not assessed, since it does not provide native resources for computing distribution functions.
The function BINOMDIST(k, n, p, cum = 1) computes the probability P{X ≤ k}, where X follows a binomial distribution with n trials and probability of success p on each trial. In Gnumeric, the R-based function R.PBINOM(k, n, p) can be used for the same purpose. Table 12 shows that, for n = 1030 and p = 0.5, Excel 2007 cannot compute any digit correctly for k ∈ {1, 2, 100, 300}, whereas Calc, Excel 2008, NeoOffice and Gnumeric provide good accuracy in all tested cases, including those in which the probabilities are much smaller than $10^{-100}$.
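These binomial cases can be cross-checked in R, whose accuracy was assessed by Almiron et al. (2009); for probabilities near the double-precision limit, the logarithmic scale is the safer option:

    # P{X <= k} for X ~ Binomial(n = 1030, p = 0.5): the Table 12 cases
    pbinom(c(1, 2, 100, 300), size = 1030, prob = 0.5)
    pbinom(1, size = 1030, prob = 0.5, log.p = TRUE)   # log-probability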
POISSON(k, lambda, cum) computes P{X ≤ k} if cum = 1, and P{X = k} if cum = 0, where X follows a Poisson distribution with mean λ > 0. Gnumeric's R-based functions R.PPOIS(k, lambda) and R.DPOIS(k, lambda), for cum = 1 and cum = 0, respectively, can be used as well. The results for the Poisson probabilities are reported in Table 13.

The function GAMMADIST(x, alpha, beta, cum) computes P{X ≤ x} if cum = 1, and the density at x if cum = 0, with X following a gamma distribution with shape α and scale β. Alternatively, Gnumeric's R-based functions are R.PGAMMA(x, alpha, beta) and R.DGAMMA(x, alpha, beta), for cum = 1 and cum = 0, respectively. As Table 14 shows, Calc, NeoOffice and Gnumeric present exact answers for all cases of the cumulative distribution function. Excel also presents good results, except for x = 0.1, where no numerical answer is provided.
If X follows a standard Gaussian distribution, then NORMSINV(p) computes the value of x such that P{X ≤ x} = p, i.e., the p-quantile. Gnumeric offers the alternative R-based function R.QNORM(p, 0, 1). Table 15 shows that Excel has serious problems for values of $p < 10^{-198}$, whereas Calc, NeoOffice and Gnumeric return exact answers whenever $p \geq 10^{-300}$.
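R again provides a convenient cross-check for these extreme tails:

    qnorm(1e-198)   # the region of Table 15 where Excel fails
    qnorm(1e-300)   # about -37.05, still computed accurately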
If X obeys a χ² distribution with n degrees of freedom, then the function CHIINV(p, n) computes the value of x such that P{X > x} = p; in Gnumeric, the R-based quantile function for the χ² distribution can be used alternatively. The corresponding results are reported in Table 16.
If X has a beta distribution with parameters α and β, then BETAINV(p, alpha, beta) computes the value of x such that P{X ≤ x} = p, i.e., the p-quantile. The R-based function in Gnumeric is R.QBETA(p, alpha, beta). As Table 17 shows, Excel is unreliable for values of $p < 10^{-3}$, and for $p < 10^{-8}$ it returns values very far from the correct answers ("-" denotes these cases in the table).

If X follows a Student's t distribution with n degrees of freedom, then TINV(p, n) computes x such that P{|X| > x} = p. Since the certified values for this law are right-tailed probabilities, we computed TINV(2p, n). The same quantity can be computed in Gnumeric with the R-based function R.QT(p, n, 0).
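The right-tail convention can be mirrored in R, where the tail is selected explicitly instead of doubling the probability (the values of p and n below are arbitrary):

    # x such that P{X > x} = p, for X ~ Student's t with n degrees of freedom
    p <- 1e-6; n <- 10
    qt(p, df = n, lower.tail = FALSE)   # equals TINV(2p, n) in the spreadsheets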

Concluding remarks and warnings
This work presented the assessment of well known spreadsheets widely used for statistical computing, among other applications. They were assessed in three areas: estimation, random number generation and statistical distributions.
Regarding univariate summary statistics, Excel 2008 and NeoOffice 2.2.5 are not able to compute the first-lag autocorrelation consistently with acceptable precision, Gnumeric being the best choice for the sample mean, standard deviation and first-order autocorrelation. Oleo should only be used when no other spreadsheet is available, e.g., when no windowing system is at hand; in that case, the user should bear in mind that its precision is quite limited.
The only spreadsheets that offer a function for computing the F statistic in one-way ANOVA are Excel 2007 and Gnumeric. They gave accurate results on the low and average difficulty datasets, whereas they presented problems on the high difficulty datasets, with Excel 2007 providing the worse results.
There is no single spreadsheet able to provide consistently good results when the measure of quality is the number of correct significant digits of the least precise coefficient in linear regression problems. Excel performs well except for Wampler5, for which it provides unacceptable results when compared with the ones computed by Gnumeric. Calc and NeoOffice provide acceptable results in most situations, but with serious problems for the Filip dataset.
When the accuracy of RSD is the measure of quality, one may stick to Excel (mainly in its 2007 version) as the sole spreadsheet for linear regression.
Nonlinear regression was assessed in four situations: using the default settings and fine-tuning the parameters, each with two starting points, namely close to and far from the certified solution. Using the default settings and both starting points, some algorithms presented good results on more than half of the datasets, but serious problems were observed with the rest. The DEPS algorithm presented the best success/failure rate, but it did not produce reliable results in more than 30% of the cases; therefore, there is not enough evidence to advise its use for nonlinear regression. Moreover, Calc 2.4.1 and NeoOffice 2.2.5 should not be used for this purpose: they do not provide means for adjusting the algorithms' parameters, a major drawback for serious users. An overall better behavior is observed with tuned parameters, and differences are noticed between starting points. Regarding the remaining spreadsheets, at least one (resp. two) digit(s) of accuracy for the first (resp. second) starting point is not achieved in only four datasets. DEPS again achieves the best results, with more than 70% reliable results, followed by Newton; Conjugate fails in more than 70% of the assessments, while SCO fails in more than 65% (resp. 50%) with the first (resp. second) starting point. When starting at the optimal solution, Conjugate and Newton fail only once in identifying that they are already at the solution. Therefore, there is still not enough evidence to advise their use for nonlinear regression, since even in the best situations they fail in almost 30% of the cases.
Whichever the spreadsheet and platform, issues regarding documentation and/or repeatability prevent their use in serious statistical procedures that employ simulation.
Concerning distribution functions, we restricted attention to the cases that pose difficulties, as identified by Yalta (2008). There are some variations of the LRE values for the same platform when evaluated on different operating systems and architectures. However, those variations are not significant, with the exceptions of Excel 2008 and Calc 3.0.1 for Mac OS. The former presents improved LRE values for the binomial distribution, but worse results when calculating the sample standard deviation. In Calc 3.0.1, the Mac OS version presents significantly worse results for the t and F distributions than its Ubuntu and Windows versions.
On the other hand, improvements were observed for some distribution functions. Calc 3.0.1 and NeoOffice 3.0 enhanced their precision, when compared to previous versions, regarding the χ² and beta distributions.
Summarizing the main points herein discussed, it is not recommended to use any spreadsheet for nonlinear regression and/or Monte Carlo experiments. Excel incurs the very same errors of older versions detected, among others, by McCullough (2004, 2008b), McCullough and Heiser (2008) and Nash (2008). Nash (2008) claims that no spreadsheet should be used in the classroom, especially for teaching statistics.
Concerning graphics production, McCullough and Heiser (2008) and Su (2008) argue that the default charts produced by Excel do not promote data comprehension and may hinder understanding. No assessment of the quality of the charts produced by spreadsheets other than Excel was found in the literature, and this is an avenue for future research.
Regarding costs, spreadsheets distributed under the terms of the GPL/LGPL, namely, Calc, Gnumeric, NeoOffice and Oleo, can be freely used and distributed. Excel's license has to be purchased.
Finally, as a rule of thumb, every user should be aware that spreadsheets have serious limitations. Other platforms are advisable, R currently being the most dependable FLOSS option (Free/Libre Open Source Software; see Almiron et al. 2009).