Datasets on the statistical properties of the first 3000 squared positive integers

The data in this article result from a search for alternative research routes to deepen researchers' understanding of integers beyond the traditional number theory approach. The article therefore presents the statistical properties of the digits sum of the first 3000 squared positive integers. The data describe the statistical tools applied to reveal the statistical and random nature of these digits sums. Here, the digits sum of an integer means the sum of all the digits that make up that integer.


© 2017 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
This technique of analysis can be used in data reduction. The data analysis can be applied to other known numbers. The data, when completely analyzed, can help deepen the understanding of the random nature of integers.

Data
The data describe the statistical properties of the digits sum of the first 3000 squared positive integers and of their subsets. The subsets are the even and odd positive integers. The subsets are equivalence and their descriptive statistics are summarized in Figs. 1-3.
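The data set itself can be regenerated directly from the definition of the digits sum. The following is a minimal sketch, not the authors' procedure: the function name `digit_sum` and the variable names are illustrative, and the choice of the sample standard deviation is an assumption.

```python
from statistics import mean, median, stdev


def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(d) for d in str(n))


N = 3000  # number of squared positive integers, as in the article

# Digits sums of 1^2, 2^2, ..., N^2, and of the even and odd subsets.
total = [digit_sum(k * k) for k in range(1, N + 1)]
even = [digit_sum(k * k) for k in range(2, N + 1, 2)]
odd = [digit_sum(k * k) for k in range(1, N + 1, 2)]

# Basic descriptive statistics for the three cases.
for name, values in (("total", total), ("even", even), ("odd", odd)):
    print(name, len(values), min(values), max(values),
          round(mean(values), 3), median(values), round(stdev(values), 3))
```

For example, 12 squared is 144, whose digits sum is 1 + 4 + 4 = 9.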

Experimental design, materials and methods
The digits sum, or digital sum, of integers has been a subject of interest because of its applications in cryptography, primality testing, random number generation and data reduction. Details on the origin, theories and applications of the digits sum of squared positive integers, integers and other important number sequences can be found in the literature. Recently, the digits sum and the digital root have been applied in the analysis of lotto results [29].

Exploratory data analysis
The percentiles are estimated using the Harrell-Davis quantile, which is regarded as a better estimator and measure of variability because it uses all of the data rather than the few order statistics on which conventional percentiles are based. The Harrell-Davis quantiles of the digits sum of the squared positive integers are shown in Fig. 4.
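The Harrell-Davis estimator can be sketched as a weighted average of all order statistics, where the weight of the i-th order statistic is the Beta((n+1)p, (n+1)(1-p)) probability of the interval ((i-1)/n, i/n]. The code below is an illustrative approximation, not the software used for Fig. 4: the weights are obtained by numerical integration (Simpson's rule) of the beta density, and the function names and the `steps` parameter are assumptions.

```python
import math


def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))


def hd_quantile(data, p, steps=4):
    """Approximate Harrell-Davis estimate of the p-th quantile (0 < p < 1).

    Assumes (n+1)p > 1 and (n+1)(1-p) > 1 so the beta density is zero
    at 0 and 1; `steps` must be even (composite Simpson's rule).
    """
    xs = sorted(data)
    n = len(xs)
    a, b = (n + 1) * p, (n + 1) * (1 - p)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def beta_pdf(x):
        if x <= 0.0 or x >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1) * math.log(x)
                        + (b - 1) * math.log1p(-x))

    weights = []
    for i in range(1, n + 1):
        lo, hi = (i - 1) / n, i / n
        h = (hi - lo) / steps
        s = beta_pdf(lo) + beta_pdf(hi)
        for j in range(1, steps):
            s += (4 if j % 2 else 2) * beta_pdf(lo + j * h)
        weights.append(s * h / 3)
    # Renormalise so the weights sum to one despite integration error.
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


data = [digit_sum(k * k) for k in range(1, 3001)]
for p in (0.25, 0.50, 0.75):
    print(p, round(hd_quantile(data, p), 4))
```

Because every observation receives a non-zero weight, the estimate changes smoothly with the data, unlike a percentile read off one or two order statistics.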
Bootstrap methods are useful in the construction of highly accurate and reliable confidence intervals (C.I.s) for unknown and complicated probability distributions. The data were resampled many times and C.I.s were generated for the mean and the standard deviation. The bootstrap results differed slightly from the observed mean and standard deviation, and convergence occurs as the confidence level increases. These are shown in Tables 1 and 2. The bootstrap estimate of the mean is close to the observed one, whereas the median remained unchanged. This is evidence of the robustness and resistance of the median to the undue influence of outliers, and it agrees with the bootstrap confidence limits. The summary is shown in Tables 3 and 4.
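The resampling step can be illustrated with a percentile bootstrap. This is a sketch under stated assumptions: the number of resamples (300), the fixed seeds, the percentile method and the confidence levels are illustrative choices, since the article does not state the settings used for Tables 1 and 2.

```python
import math
import random


def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def bootstrap(values, statistic, n_resamples=300, seed=0):
    """Statistic evaluated on resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    n = len(values)
    return [statistic(rng.choices(values, k=n)) for _ in range(n_resamples)]


def percentile_ci(estimates, level):
    """Percentile confidence interval from bootstrap estimates."""
    s = sorted(estimates)
    tail = (1 - level) / 2
    lo = s[math.floor(tail * (len(s) - 1))]
    hi = s[math.ceil((1 - tail) * (len(s) - 1))]
    return lo, hi


data = [digit_sum(k * k) for k in range(1, 3001)]
boot_means = bootstrap(data, mean)
boot_sds = bootstrap(data, sample_sd, seed=1)
for level in (0.90, 0.95, 0.99):
    print(level, percentile_ci(boot_means, level),
          percentile_ci(boot_sds, level))
```

The observed mean and standard deviation can then be compared with the centre of each interval.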
The boxplot is an exploratory data analysis tool that displays the quantiles of numerical data graphically, so that outliers or extreme values are easily identified. The boxplots of the digits sums of squared positive integers and their subsets are shown in Fig. 5. The data are slightly skewed to the left in all three cases, with some outliers appearing in the case of the total. As the sample size increases, the frequency of numbers below the mean reduces and more outliers can be obtained. On the other hand, more numbers are expected to appear as the sample size increases.
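The quantities that a boxplot displays can be computed without plotting. The sketch below assumes the common convention of flagging points more than 1.5 interquartile ranges beyond the quartiles as outliers; the article does not state which convention or software produced Fig. 5, and the function name is illustrative.

```python
from statistics import quantiles


def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))


def boxplot_summary(values, whisker=1.5):
    """Quartiles, fences and outliers under the 1.5 * IQR convention."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {"q1": q1, "median": q2, "q3": q3,
            "low_fence": low_fence, "high_fence": high_fence,
            "outliers": outliers}


total = [digit_sum(k * k) for k in range(1, 3001)]
even = [digit_sum(k * k) for k in range(2, 3001, 2)]
odd = [digit_sum(k * k) for k in range(1, 3001, 2)]

for name, values in (("total", total), ("even", even), ("odd", odd)):
    summary = boxplot_summary(values)
    print(name, summary["q1"], summary["median"], summary["q3"],
          "outliers:", len(summary["outliers"]))
```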
Particular patterns can be revealed by individual value plots of the observations. Some unique patterns were obtained for the even, odd and total squared positive integers, as shown in Figs. 6-8. The mean plot and median plot are shown in Fig. 9a and b. The mean plot shows the behavior of the mean, which is almost the same as the result given by the bootstrap and the bootstrap confidence intervals. As expected, the median plot indicates the robustness of the median.
Winsorizing and trimming are two ways of achieving robustness. The robustness of the central tendency (mean) of the digits sum of the first 3000 squared positive integers was considered under both. The results are shown in Figs. 10 and 11.
The data are robust because the possibility of obtaining outliers or extreme values decreases as more values are expected to cluster around the mean. As the sample size increases, the extreme values become fewer. In the case of trimming, the same result is obtained since there are few extreme values to exclude from the analysis.
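The difference between the two approaches can be made concrete: trimming discards a proportion of the smallest and largest values, whereas Winsorizing replaces them with the nearest retained value. The following is a minimal sketch in which the function names and the proportions tried are illustrative assumptions; the article does not state the proportions used for Figs. 10 and 11.

```python
def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each tail."""
    xs = sorted(values)
    k = int(proportion * len(xs))  # values removed per tail
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, proportion):
    """Mean after replacing each tail by its nearest retained value."""
    xs = sorted(values)
    n = len(xs)
    k = int(proportion * n)  # values replaced per tail
    if k:
        xs[:k] = [xs[k]] * k
        xs[n - k:] = [xs[n - k - 1]] * k
    return sum(xs) / n


data = [digit_sum(k * k) for k in range(1, 3001)]
for proportion in (0.0, 0.05, 0.10, 0.20):  # must stay below 0.5
    print(proportion,
          round(trimmed_mean(data, proportion), 4),
          round(winsorized_mean(data, proportion), 4))
```

If the two means stay close to the ordinary mean as the proportion grows, the extreme values have little influence on the central tendency.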

Curve estimation
A few curve estimation models are available for fitting a given data set. The results of fitting these models to the digits sum of the first 3000 squared positive integers are shown in Table 5.
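Curve estimation models of this kind are commonly fitted by ordinary least squares after a transformation of the variables. The sketch below fits a linear, an inverse and a power model and reports R square and adjusted R square; it is an illustration only. The choice of the integer n as predictor, the set of models, and the convention of computing R square for the power model on the log scale are assumptions, not details taken from Table 5.

```python
import math


def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))


def ols(x, y):
    """Simple linear regression: (intercept, slope, R^2, adjusted R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # one predictor
    return intercept, slope, r2, adj_r2


x = list(range(1, 3001))            # the integers n
y = [digit_sum(n * n) for n in x]   # digits sums of n^2 (all >= 1)

fits = {
    # y = b0 + b1 * x
    "linear": ols(x, y),
    # y = b0 + b1 / x
    "inverse": ols([1 / v for v in x], y),
    # ln y = ln b0 + b1 * ln x, i.e. y = b0 * x ** b1
    "power": ols([math.log(v) for v in x], [math.log(v) for v in y]),
}
for name, (b0, b1, r2, adj_r2) in fits.items():
    print(name, round(r2, 4), round(adj_r2, 4))
```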

Probability distribution fit
The digits sum of the first 3000 squared positive integers is best fitted by the Cauchy distribution; the details are shown in Table 6. The fit was obtained using the EasyFit software.
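A rough check of a Cauchy fit can be sketched without specialised software. The code below is not EasyFit's procedure: it assumes a simple quartile-based estimate (location equal to the median, scale equal to half the interquartile range, which are the exact relations for a Cauchy distribution) and measures the discrepancy with a Kolmogorov-Smirnov statistic. Because the digits sums are discrete and contain ties, the statistic is only indicative.

```python
import math
from statistics import median, quantiles


def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))


def cauchy_cdf(x, loc, scale):
    """Cumulative distribution function of the Cauchy distribution."""
    return 0.5 + math.atan((x - loc) / scale) / math.pi


def fit_cauchy_quartiles(values):
    """Quartile-based estimates (location, scale); an assumed method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q3 - q1) / 2


def ks_statistic(values, cdf):
    """Largest gap between the empirical CDF and a fitted CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


data = [digit_sum(k * k) for k in range(1, 3001)]
loc, scale = fit_cauchy_quartiles(data)
d = ks_statistic(data, lambda v: cauchy_cdf(v, loc, scale))
print("location:", loc, "scale:", scale, "KS statistic:", round(d, 4))
```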

Mathematical computational results
The raw data of the digits sums of the first 3000 squared integers can be used to generate another set of numbers by taking the absolute value of the difference between each pair of consecutive numbers; each generated set contains one value fewer than the set from which it was derived. The process was repeated until the mode and the median were both equal to one, because any further step adds little or nothing to the analysis and stopping there saves computational time. The process reduces normality, as evidenced by the increase in kurtosis and skewness. This is shown in Table 7.
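The iteration described above can be sketched as follows. The function names, the cap on the number of steps and the moment-based formulas for skewness and kurtosis are assumptions made for illustration; the article does not state which formulas were used for Table 7. The code assumes Python 3.8 or later, where `statistics.mode` returns the first mode when several values tie.

```python
from statistics import median, mode


def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))


def abs_differences(values):
    """Absolute differences of consecutive values (one value fewer)."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def iterate_differences(values, max_steps=25):
    """Repeat the differencing until mode and median both equal one."""
    levels = [list(values)]
    while len(levels) <= max_steps and len(levels[-1]) > 1:
        current = levels[-1]
        if mode(current) == 1 and median(current) == 1:
            break
        levels.append(abs_differences(current))
    return levels


def skewness_kurtosis(values):
    """Moment-based skewness and kurtosis (assumed definitions)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


data = [digit_sum(k * k) for k in range(1, 3001)]
levels = iterate_differences(data)
for step, level in enumerate(levels):
    skew, kurt = skewness_kurtosis(level)
    print(step, len(level), mode(level), median(level),
          round(skew, 3), round(kurt, 3))
```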

Remark:
The low values of the R and adjusted R square indicate that the curve estimation models barely fit the data and can give misleading results when used for prediction. Among the models considered, the power model provides the best fit and the inverse model the worst.