Spreadsheets in the Cloud – Not Ready Yet

Cloud computing is a relatively new technology that facilitates collaborative creation and modification of documents over the internet in real time. Here we provide an introductory assessment of the available statistical functions in three leading cloud spreadsheets namely Google Spreadsheet, Microsoft Excel Web App, and Zoho Sheet. Our results show that the developers of cloud-based spreadsheets are not performing basic quality control, resulting in statistical computations that are misleading and erroneous. Moreover, the developers do not provide sufficient information regarding the software and the hardware, which can change at any time without notice. Indeed, rerunning the tests after several months we obtained different and sometimes worsened results.


Introduction
Spreadsheets are multipurpose tools commonly used for performing all sorts of computations, including those that involve statistics and data analysis.These programs are based on presenting and manipulating data via an intuitive two-dimensional matrix interface, which is arguably the main reason for their tremendous popularity.A new trend that can potentially make spreadsheet software even more indispensable is the emergence of cloud based alternatives.These are online applications that offer innovative features such as simultaneous multi-user collaboration, real-time data updates from remote sources, as well as improved security and accessibility.
As promising as they may be, an implicit assumption behind the new cloud spreadsheets is that they perform accurately; that is, that users can trust them.In our extensive experience with assessing the accuracy of statistical software in general, and of PC-based spreadsheet software specifically, we have not found such assumptions to be true.Despite their popularity, PC-based spreadsheets cannot be trusted to perform statistical calculations correctly (Almiron, Lopes, Oliveira, Medeiros, and Frery 2010;Knüsel 2002;McCullough 2008b,a;McCullough and Heiser 2008;McCullough and Wilson 1999, 2002, 2005;Yalta 2007Yalta , 2008;;Yalta and Jenal 2009).A notable exception discussed by McCullough (2004a) is the open-source spreadsheet program Gnumeric (The Gnumeric Team 2012).
Spreadsheet use is moving to the cloud; the cloud office suites offered by Google and Microsoft claim 25 and 20 million users, respectively (Press Release Point 2010).As spreadsheets move to the cloud, it is not unreasonable to expect that cloud users will perform myriad statistical operations using cloud spreadsheets.Indeed, Yalta (2008) performed a cursory examination of Google Spreadsheet (Google Inc. 2012), reported "gross errors," and recommended a detailed evaluation to help researchers and practitioners make the decision whether to move from the PC to the cloud.This raises the obvious question, "Can these cloud spreadsheets be trusted to perform calculations correctly?"The answer may very well be "no." Here we present results from thorough testing of three leading cloud spreadsheets namely Google Spreadsheet, Microsoft Excel Web App (Microsoft Corporation 2012), and Zoho Sheet (Zoho Corporation 2012).Our methodology is based on applying Wilkinson's Statistics Quiz (1985) for assessing statistical functionality in general, and using the Knüsel (1989) approach for auditing statistical distributions in particular.The results show that these cloud spreadsheets cannot currently be trusted to perform statistical operations with accuracy.
It is important to understand that these are not "gotcha" tests designed simply to trip up software developers.These are carefully designed tests designed to inform the user about the quality of the software.As McCullough (2004b) wrote about Wilkinson's Tests: "The flaws they are designed to expose have well-known solutions.That is, these are tests which any package could pass.If a software package fails a particular test, there exists a known method of obtaining the correct answer."The same is true of the tests of the accuracy of the statistical distributions.

"Statistics Quiz"
Statistics Quiz (Wilkinson 1985) is a well-known collection of test problems that are designed to expose flaws in statistical software.It comprises 20 tests (some with sub-parts) in six areas: (I) reading an ASCII file; (II) handling real numbers; (III) handling missing data; (IV) regression; (V) analysis of variance; and (VI) operating on a database.The tests are based on a small and effective data set called "Nasty."As can be seen in Table 1, all of the numbers in nasty.datare within the range of representable numbers for 32-bit double precision.The resulting tests aim to provide a standardized approach to expose common flaws for which there are well-known and easily implemented solutions.Thanks to its efficiency, Statistics Quiz has been employed by various authors such as Sawitzki (1994); Bankhofer and Hilbert (1997);McCullough (2004b); Choi and Kiefer (2005); Yalta and Yalta (2009).It has been important in the detection and correction of dozens of errors in statistical packages as well as spreadsheet software.
We applied Statistics Quiz to the following cloud spreadsheets: Google Spreadsheet, Excel Web App, and Zoho Sheet.We skipped Test I (A, B), Test II-E, Test V (A, B, C), and Test VI (A, B), which are not relevant and/or not applicable with spreadsheet software.The tests were performed between May 2-6 2011, and using an Intel Core i7 2.80GHz desktop computer The results for the 12 applied tests are as follows.

Test II-A
In this test, ROUND is printed with only one digit and the expected correct answer is the numbers from 1 to 9. Depending on the language compiler used, this operation is sometimes performed automatically using the "round-to-even" method, resulting in unacceptable output such as, for example, R(1.5) = R(3.5)= 2.In addition, the following expressions are checked: where the correct answers are 18, 0, and 1 respectively.

Conclusion:
All three packages pass this test.

Test II-B
This test involves plotting HUGE against TINY, BIG against LITTLE, and X against ZERO.From looking at the data, it is obvious that the first two plots should be a straight line with a 45 degree angle, and the third plot should be a vertical line.Zoho Sheet passes this test with all the graphs looking good except the disappearing tick-marks in the horizontal axis in BIG against LITTLE.The test is not applicable to Excel Web App, which does not support creating and editing graphs online.Finally, Google Spreadsheet fails this task due to plotting BIG against LITTLE as a horizontal line.The other 2 graphs are passable.Figure 1 presents the BIG against LITTLE plots produced by Zoho Sheet and Google Spreadsheet respectively.

Test II-C
Here, the mean and the standard deviation are computed for all of the series.The expected result is the fifth value as the mean for all variables.Also, the standard deviations should be 2.738612788 times 10 to a power for all variables except ZERO and MISS.The correct answer is 0 for ZERO, and either "undefined" (#DIV/0!) or "missing" (#N/A) for MISS.
Excel Web App and Zoho Sheet return the correct values.Google Spreadsheet fails with standard deviations accurate to only 1 significant digit.These results are presented in Table 2.
Traditional 32-bit double precision is more than sufficient to produce the correct answer, and accurate methods for computing this quantity have long been known (Chan, Golub, and Leveque 1983).We have no idea how Google is performing this calculation.

Conclusion:
Zoho Sheet and Excel Web App pass, Google Spreadsheet fails.

Test II-D
Calculate the correlation matrix of the variables.This should be unity for all correlations except those involving ZERO and MISS.Mathematically (and computationally), correlations should be between −1 and 1.Yet, Google Spreadsheet manages to compute the correlation between X and BIG as 1.19 and the correlation between BIG and itself as 1. obvious that Google Spreadsheet is using a bad algorithm or has not properly programmed a good algorithm.

Conclusion:
Zoho Sheet and Excel Web App pass, Google Spreadsheet fails.

Test II-F
Regress BIG on X.The intercept should be 99999990 and the slope should be unity.All of the packages reproduce this result.Curiously, though, Zoho Sheet manages to compute sum of squared residuals = −13, R 2 = 1.28, and F ratio = −32.31.These should be 0, 1, and either "undefined" (#DIV/0!) or "missing" (#N/A) respectively.We would be interested to know what formula Zoho Sheet is using to compute these statistics.
Conclusion: Zoho Sheet fails, Excel Web App and Google Spreadsheet pass.

Test III-A
This test involves the transformation where the acceptable answers are either 2 or "missing" (#N/A).We performed this with the command =IF((VALUE=3);1;2) using MISS as the reference for VALUE.

Conclusion:
All three packages pass this test.

Test III-B
This test involves the computation which has the correct answer <missing> since one cannot add 1 to something that is missing.We performed this computation using the command =IF((ISNA(VALUE)=TRUE);(D3+1);) using MISS as the reference for VALUE.

Conclusion:
All three packages pass this test.

Test IV-A
In this test, X is regressed on a constant plus the variables X 2 through X 9 in view of the fact that an unstable regression function often cannot handle all of these polynomials.Because the objective of this pragmatic test is the overall regression, the primary focus is on checking whether all of the standard errors are 0 and R 2 is unity. It

Test IV-B
Here, X is regressed on X to see whether the program runs without complaining about this perfectly valid operation, and returns the obvious solution X = 0 + 1X with R 2 = 1 and F 1,7 = undefined (#VALUE!). In

Test IV-C
Regress X on BIG and LITTLE.Since all the variables are perfectly collinear, the design matrix is singular and an infinite number of solutions exist.Accordingly, the program should produce an error message, preferably one that diagnoses and warns of the singularity.Mathematically, the formula is X = c + b 1 BIG +b 2 LITTLE.The "solutions" returned by the three packages are presented in Table 3: three packages, three answers, all of them wrong.On a historical note, we mention that not only did Excel 4.0 released in 1992 fail this test (Sawitzki 1994), but that Excel Web App released in 2009 still fails it, and so does Excel 2010.

Conclusion:
All three packages fail this test.

Test IV-D
In this test, ZERO is regressed on a constant and X.The acceptable behavior is either the program notifies that ZERO has no variance, or reports regression output showing ZERO = 0+0X with total sum of squares equaling zero.
Again, for Web App and Zoho Sheet, we run this test by first setting it up offline on a PCbased spreadsheet, and then uploading it to the cloud.None of the packages warns about the dependent variable, however, the regression output of the three programs shows the expected values.

Conclusion:
All three packages pass this test.

Statistical distributions
Depending on the field or industry, spreadsheet users may make extensive use of statistical distributions.That is why spreadsheets offer so many distributions, not just a few.On the other hand, these are important decision making tools and it is important that they be accurate or not offered at all.In today's computing standards, accuracy means that any program offering statistical distributions be able to reliably compute tail probabilities as small as 10 −100 with at least 6 significant digits.This is perfectly possible on commodity hardware, and there are known and open source algorithms that can provide such precision (Yalta 2008;Bangalore, Wang, and Allison 2009).
The de-facto benchmarking tool for assessing the accuracy of statistical distributions is the program ELV (Knüsel 1989).Like Statistics Quiz, ELV has been employed in the past by numerous studies such as McCullough and Wilson (1999, 2002, 2005); Knüsel (1995Knüsel ( , 1998Knüsel ( , 2002Knüsel ( , 2005)); Bustos and Frery (2006); Yalta (2008).We use the ELV program to assess the reliability of a number of statistical distributions for which a dedicated function is available in Zoho Sheet, Excel Web App, and Google Spreadsheet.The function names are the same in the three spreadsheets where available, however, the algorithms are apparently different.Also, for the binomial and χ 2 distributions, Zoho Sheet offers alternative, more flexible functions named B and CHISQDIST respectively.We do not assess these two functions and consider only the Excel compatible versions that are automatically used in all imported documents.
As can be seen in Table 4, for lower tails of the standard normal distribution, all three packages can return seriously misleading results with 0 digits of accuracy for not so small probabilities.
Google Spreadsheet can even give negative p values.
Google Spreadsheet does not offer the Student's t, χ 2 , or F distributions.For t and F , Zoho Sheet can return 0 for not so small probabilities (on the order of 1E−5).It can also produce misleading results with only 2 digits of accuracy.Excel Web App appears to be accurate for the three distributions.Google Spreadsheet does not offer the computation of any inverse distribution function except for the normal distribution, which seems to be accurate.Zoho Sheet offers an inverse function corresponding to its statistical distributions, however, these are all unstable and can erroneously return results such as 9999.99995,99999999, and 0.00000001 for different distributions.For Excel Web App, the quantiles of the various distributions seem to be accurate, however, it is also noticeable that the computations can be different from those reported offline by Excel 2010.
The above results can best be illustrated with the inverse F distribution.As Table 5 shows, Zoho Sheet and Excel 2007 both have difficulties in the same place, starting at 1E−5.Excel 2010 does only a bit better than Excel 2007.What is surprising is that the answers one gets from a Microsoft spreadsheet depend on whether the spreadsheet is opened in Excel 2010 or Excel Web App.This does not bode well for the idea of portable spreadsheets, even within the Microsoft family.How distressing it would be for a user to find that his results depend on whether his software is PC-based or in the cloud.

The issue of accountability
An important issue with cloud based software is in the department of accountability.In a setting where the web browser acts merely as a terminal window to display results computed somewhere in the cloud, the user often has no information on computational details or whether the results can be replicated in the future.Moreover, the user also has no control over the software and the hardware, which is subject to change by the provider at any time and without warning.For obtaining help and support, there are various online resources and services such as official blogs and help forums, however, these services are informal in nature and the information they provide is often dispersed and lacking in terms of technical details.1  After an extensive scrutinization of the various available online resources, we came to understand that the three cloud spreadsheets analyzed in this study come neither with a version number nor any information that we could find regarding the machine specifications and the operating system used.We also realized that, over the weeks after our initial testing, major revisions have taken place in all of the three programs and there is no method of replicating any earlier result.In order to assess the practical consequences of the current state of this way of supplying tools for data analysis, we reapplied the tests on March 1-2, 2012.
Here is what we found: Zoho Sheet now lists LINEST() among the available functions.From the change log, we understand that this feature has been implemented in August 2011.On the other hand, as can be seen in Figure 2, Zoho Sheet can no longer correctly produce the BIG against LITTLE plot in Test II-B.We do not know when this has happened.The change log announces chart "enhancements" in both August 2011 and January 2012, however the explanations supplied are regarding cosmetic changes such as the colors and the fonts used.
Excel Web App can now create/edit graphs online, and it passes Test II-B by correctly producing all of the three plots including the relatively more difficult BIG against LITTLE plot shown in Figure 2. It also now gets the correct probability for z = −8.5 for the standard normal distribution.On the other hand, Excel Web App now fails Test II-F by computing 4.48E−16 and 7.96E−17 for the standard error of β0 and β1 , both of which should be 0. Also the sum of squared residuals is reported as 2.66E−30, not 0. According to the official blog, the revisions leading to these discrepancies have likely took place in a general update in September 2011, which announces "a lot of behind-the-scenes improvements" in Office Web Apps.
Apparently, since our initial tests in May 2011, Google Docs has also received various updates and there is now a "new version" of Google Spreadsheet.We were unable to find out what version this is, however, according to the support website "An easy way to tell if you're using the new version is if there's a ruler above the editing space.(Just be sure the ruler is enabled from View > Show ruler)" From the official Google Docs blog we infer that the switch to the "new version" has taken place in October 2011.

Conclusion
The relatively new cloud spreadsheets bring important innovations such as real-time data updates and the ability to simultaneously view and make changes to documents by multiple users.However, these programs are first and foremost designed for ease of use, and the fact that scientists constitute only a small fraction of their target audience creates important concerns in the department of accuracy and replicability.Offering statistical functions in a computer program is a serious matter and it is necessary that such functions are extremely accurate and reliable, or not offered at all.After all, these are important tools used for inference and decision making.Indeed, Panko and Ordway (2005) provide ample evidence that the vast majority of financial and business decisions, including some that have caused much havoc for world markets, are in some way made using spreadsheets.Poor statistical tools offered for the sake of compatibility or feature parity with other spreadsheets will do nothing but increase computational garbage and information pollution.Wilkinson's tests (1985) and Knüsel's approach (1989) are entry level test procedures commonly used to assess statistical functionality in general and statistical distributions in particular.We have employed these two methods to perform an all-around assessment of three leading cloud-based spreadsheets namely Google Spreadsheet, Microsoft Excel Web App, and Zoho Sheet.The summary of the results are presented in Table 6.As can be seen, there are errors and/or deficiencies in all of the three programs and in such areas as univariate and multivariate summary statistics, visualization, regression, and statistical distributions.It is particularly striking that Zoho Sheet fails about half of the tests while Google Spreadsheet has more failures than passes.Also, despite being under development for about 25 years, and the numerious earlier reports pointing out the accuracy issues, Excel still has failures in these simple tests, both offline and in the cloud.Moreover, the output can differ depending on whether one uses the PC-based Excel 2010 or the cloud based Excel Web App.Our findings are also consistent with Keeling and Pavur (2011), who perform a detailed testing (including a subset of Wilkinson tests) of Google Spreadsheet along with five other PC-based spreadsheets and report that "it is not recommended for statistical analysis beyond simple averages." An important concern regarding the cloud spreadsheets is that the user has limited or no information on the software and the hardware, which are subject to change by the provider at any time and without warning.Indeed, rerunning the tests after several months, we obtained the different results shown in parentheses in Table 6.These changes indicate that, in the absence of version numbers as well as sufficient information on the computing environment, these programs can not be used for anything that requires verification, such as most business functions.
Based on the findings, it is our understanding that the developers of cloud-based spreadsheets have not been performing basic quality control and providing sufficient documentation, and that the results of their statistical calculations cannot be trusted.Whether the user can expect the developers to correct these problems is a legitimate concern.Zoho and Google have no track record when it comes to fixing errors in spreadsheets, but the errors they have made demonstrate a complete lack of knowledge concerning the literature on the accuracy of statistical software, in particular, spreadsheets.Microsoft has a long track record of failing to fix errors in the PC version of Excel, including Excel 2010 (Knüsel 2011;Mélard 2011).
Whether Microsoft can fix errors in its cloud spreadsheet without having fixed errors in its PC spreadsheet we are doubtful.We leave it to the reader to decide whether other calculations performed by cloud-based spreadsheets can be trusted.

Figure 1 :
Figure 1: Wilkinson's tests II-B results for Zoho Sheet and Google Spreadsheet.

Figure 2 :
Figure 2: March 2012 Test II-B results for Zoho Sheet and Excel Web App.

Table 2 :
Standard deviation calculations for Google Spreadsheet; inaccurate digits in bold.

Table 3 :
is important to note that, in spreadsheets, this type of problem usually is handled by creating an array formula such as LINEST() in Excel.Google Spreadsheet permits this operation, but Excel Web App does not allow the creation of array formulae.However, if an Excel spreadsheet with this operation is uploaded, Excel Web App computes the answer.Zoho Sheet does not even list LINEST among the available functions, yet computes results if a spreadsheet with LINEST is uploaded.Results for Test IV-C.X 9 .For Excel Web App and Zoho Sheet, setting up the test first offline and then importing into the programs reveals that Excel Web App passes this test.Zoho Sheet, however, fails to compute the standard errors, returning the error #VALUE!instead.Conclusion:Excel Web App passes, Zoho Sheet and Google Spreadsheet fail.
Google Spreadsheet fails to run this regression and returns #REF! as a misleading error message.Adding the variables one at a time, we understand that the program cannot handle this case, Google Spreadsheet and Zoho Sheet returns the correct values for each statistic while Excel Web App fails by returning F 1,7 =1.58E+032.Unlike other programs, Excel Web App also does not round to zero and returns the values 8.88E−016, 7.96E−017, 4.48E−016, 2.66E−030 for the constant, standard errors, and the sum of squared residuals respectively.
Conclusion: Zoho Sheet passes, Excel Web App fails, Google Spreadsheet passes.

Table 4 :
Results for the standard normal distribution.
The discrete probability distributions binomial, hypergeometric, and Poisson are available in Google Spreadsheet, however, none of these is reliable.For Poisson, Google Spreadsheet can

Table 5 :
Results for inverse F (1, 1) with probability = p.return misleading results for not so small probabilities.It can even give ∞ as p values.It also cannot calculate when the trial number n > 136 and the finite population size N > 175 for the binomial and the hypergeometric distributions respectively.Zoho Sheet and Excel Web App appear to be accurate for these three distributions.

Table 6 :
and a change log; which are located at https://forums.zoho.com/,http://blogs.zoho.com,and https: //sheet.zoho.com/featuresrespectively.Google Spreadsheet offers a help forum, an official Google Docs Forum, and a support website; which can be found at http://groups.google.com/a/googleproductforums.com/forum/#!forum/docs,http://googledocs.blogspot.com/,and http://support.google.com/docs .Results summary for Wilkinson tests and statistical distributions.The changes after March 2012 tests are shown in parentheses.