Properties of Gompertz data revealed with non-Gompertz integrable difference equation

Abstract: Behaviour of the upper limit estimated by an unsuitable model (the logistic curve model) was mathematically analysed for data on an exact solution of the Gompertz curve model with integrable difference equations. The analysis contributes to identifying a suitable model because the behaviour is independent of noise included in actual data. A suitable model is indispensable for correct forecasts. The following results were proved. The estimated upper limit monotonically increases as the data size increases and converges to the upper limit estimated with the suitable model (the Gompertz curve model) as the data size approaches infinity. Therefore, the upper limit estimated with the logistic curve model is smaller than that estimated with the Gompertz curve model.


PUBLIC INTEREST STATEMENT
The Gompertz curve model has been used for forecasting in various areas. Accurate forecasting requires not only accurate parameter estimation but also appropriate model selection. This paper contributes to selecting an appropriate model because it provides a way to identify Gompertz data. It analyses how the saturation level estimated by the logistic curve model, used as an inappropriate model, behaves as the data size increases. It proves the following results. The estimated saturation level monotonically increases as the data size increases and converges to the saturation level estimated with the appropriate model as the data size approaches infinity. Therefore, the saturation level estimated with the logistic curve model is smaller than that estimated with the Gompertz curve model. This discrete analysis is made possible not by an ordinary forward or central difference equation but by difference equations that have exact solutions; these exact solutions lie on the exact solutions of the corresponding differential equations.

Introduction
The Gompertz curve model has been used as a growth curve model for forecasting in various areas of applied research such as marketing (Bemmaor, 1992; Franses, 1994a, 1994b; Kaldasch, 2011; Meade, 1984), economics (Figueira, Moura, & Ribeiro, 2011), management (Rządkowski, Głażewska, & Sawińska, 2015), the construction industry (Zhang, Wang, Yang, Zhang, & Lv, 2014), and software reliability engineering (Ohishi, Okamura, & Dohi, 2009; Pavlov, Spasov, Rahnev, & Kyurkchiev, 2018; Satoh, 2000; Yamada, 1992) since it was proposed for specifying a mortality law in actuarial science (Gompertz, 1825). Forecasting with the Gompertz curve model is accomplished by estimating parameters in the model from an obtained dataset. The Gompertz curve model has to be suitable for the obtained dataset when it is used for forecasting because forecasting using an unsuitable model can result in seriously incorrect forecasts (Chu et al., 2009; Martino, 2003; Yamakawa, Rees, Salas, & Alva, 2013). The Gompertz curve model is difficult to identify as suitable for a given dataset from goodness-of-fit measures because regression analysis with a forward or central difference equation cannot recover parameters completely even when data are picked from the exact solution of the suitable differential model. Solutions of the difference equations are generally different from those of differential equations. Parameter estimates by a non-linear least squares estimation may not converge or may be sensitive to starting values, and algorithms may not provide a global optimum. Although the identification from goodness-of-fit measures is relatively correct in the final phase, forecasting in the final phase has little meaning. Forecasting is more variable in the earlier phase, where it has much more meaning.
Forecasting using an integrable difference equation that has an exact solution enables us to forecast accurately in the early phase (Satoh, 2000, 2001; Yamada, Inoue, & Satoh, 2002). Integrable difference equations (Hirota, 1979, 2000; Hirota & Takahashi, 2003; Morisita, 1965) and their application for forecasting (Satoh, 2000, 2001; Satoh & Uchida, 2010; Satoh & Yamada, 2001; Yamada et al., 2002) have been studied. A solution of an integrable difference equation plots discrete values on the exact solution of the original differential equation. General difference equations such as forward and central difference equations do not have this property. The forecasting using an integrable difference equation is simple because it uses regression analysis. Regression analysis with an integrable difference equation recovers parameters completely when data are picked from an exact solution of a suitable differential model (Satoh, 2000, 2001; Satoh & Yamada, 2001). Estimated parameters change as the data size increases due to noise included in actual data even when a suitable model with an integrable difference equation is applied to the data. These changes depend on the noise. Parameters estimated by an unsuitable model change as the data size increases, too. However, these changes depend on not only noise but also the unsuitable model itself. If behaviour of parameters estimated by an unsuitable model can be analysed for data on an exact solution of a suitable model (that is, data without noise), the analysis will enable us to identify the suitable model because the analysis will be independent of noise.
A mathematical analysis using an integrable difference equation ("applied discrete systems") is possible thanks to an exact solution of a difference equation. Regression analysis is usually conducted on numerical data and yields numerical regression coefficients; an integrable difference equation instead lets us conduct regression analysis on a mathematical formula and obtain the regression coefficients as mathematical formulas. Applied discrete systems therefore enable us to analyse the behaviour of parameters estimated by an unsuitable model algebraically, whereas a non-linear least squares estimation cannot do so because it is a numerical analysis.
This paper reveals properties of the Gompertz data with the logistic curve model (Hirota, 1979; Morisita, 1965) as an unsuitable model. The logistic curve model is also used for forecasting in various areas. If a given dataset does not have the properties of the Gompertz data, forecasting using the Gompertz curve model can result in seriously incorrect forecasts. If the dataset has the properties, the Gompertz curve model is one of the best models that describe the data.
The rest of this paper is structured as follows: Section 2 presents an exact solution of the Gompertz curve model. Section 3 presents discrete logistic curve models that have exact solutions as unsuitable models to reveal properties of data on the exact solution of the Gompertz curve model. Section 4 derives a regression equation from the discrete logistic curve models with data on the exact solution of the Gompertz curve model. Section 5 states three lemmas and two theorems. The first theorem is that an estimated upper limit increases monotonically as the data size increases, which occurs as time elapses. The second theorem is that the estimated upper limit converges to the real upper limit, i.e. that estimated by the suitable model when the data size approaches infinity. Section 6 describes the practical use of the theorems for data that include noise. Finally, Section 7 concludes the paper with a summary.

Gompertz data
The Gompertz curve model is described as the following differential equation: where $x(t)$ is a cumulative number up to time $t$. By integrating Eq. (1) and assuming that $x(0) = ka$, $x(t)$ is written as where $a$, $b_c$, and $k$ are parameters estimated through regression analysis. Parameter $k$ is an upper limit, parameter $a$ is $x(0)/k$, and parameter $b_c$ is $\log(x(1)/k) / \log a$. Eq. (1) is discretised as The exact solution of Eq. (3) is The exact solution of the difference equation (4) is on that of the differential equation (2) when $b_c$ is regarded as $1 + \delta \log b$. We call the data "Gompertz data" if the data satisfy Eq. (4). A regression equation for estimating the parameter values is obtained from Eq. (1) as Parameters $k$, $a$, and $b$ are estimated from regression analysis as where $\hat{a}$, $\hat{b}$, $\hat{k}$, $\hat{A}_n$, and $\hat{B}_n$ are estimates of $a$, $b$, $k$, $A_n$, and $B_n$. Note that $Y_i$ in Eq. (6) is independent of time-interval $\delta$ in Eq.
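The claim that regression recovers the parameters from Gompertz data can be illustrated with a minimal sketch. It assumes the closed form $x_i = k a^{b^i}$ at unit time steps, which is consistent with $x(0) = ka$ and the definition of $b_c$ above, and an assumed regression of $Y_i = \log x_{i+1} - \log x_i$ on $\log x_i$. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def gompertz_data(k, a, b, n_points):
    """Noise-free data x_i = k * a**(b**i) (assumed closed form, unit time step)."""
    return [k * a ** (b ** i) for i in range(n_points)]


def ols(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_gompertz(x):
    """Estimate (k, a, b) from the assumed regression
    log x_{i+1} - log x_i = A + B * log x_i,
    where B = b - 1 and A = (1 - b) * log k for exact Gompertz data."""
    logs = [math.log(v) for v in x]
    xs = logs[:-1]
    ys = [logs[i + 1] - logs[i] for i in range(len(logs) - 1)]
    intercept, slope = ols(xs, ys)
    k_hat = math.exp(-intercept / slope)
    b_hat = 1.0 + slope
    a_hat = x[0] / k_hat  # from x(0) = k * a
    return k_hat, a_hat, b_hat


data = gompertz_data(k=100.0, a=0.1, b=0.5, n_points=12)
k_hat, a_hat, b_hat = fit_gompertz(data)
print(k_hat, a_hat, b_hat)
```

Because the left-hand side is a difference of logarithms of consecutive observations, it does not involve the time interval explicitly, which matches the remark about $Y_i$ above.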

Discrete logistic curve model
The logistic curve model is described as the following differential equation: where $x(t)$ is a cumulative number up to $t$. By integrating Eq. (11), $x(t)$ is written as Morisita (1965) obtained the difference equation of Eq. (11) as The exact solution of Eq. (13) is where $m$ is obtained from $x_0 = k/(1 + m)$ and $t_i = i\delta$. Hirota (1979) obtained another difference equation: The exact solution of Eq. (15) is where $m$ is obtained from $x_0 = k/(1 + m)$ and $t_i = i\delta$.
The same regression equation is obtained from both difference equations by as where Parameters $k$, $r_{dm}$, $r_{dh}$, and $m$ are estimated from regression analysis as $\delta r_{dh} = \hat{A}_n - 1$, where $n$ represents the data size, and $\hat{A}_n$ and $\hat{B}_n$ are estimates of $A_n$ and $B_n$. The same estimates $\hat{k}$, $\delta r_{dm}$, and $\hat{m}$ are obtained for any $\delta$ because $\delta$ is not included in Eq. (17). The relationship among $r_{dm}$, $r_{dh}$, and $r$: holds (see ). The same estimate of $m$ is obtained because of Eq. (23).
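As a minimal sketch of a discrete logistic model with an exact solution, the code below assumes the difference equation $x_{n+1} - x_n = \delta r\, x_n (1 - x_{n+1}/k)$, whose closed-form solution is $x_n = k / (1 + m (1 + \delta r)^{-n})$ with $x_0 = k/(1 + m)$. This form is an assumption made for illustration; it is consistent with the relation $\delta r_{dh} = \hat{A}_n - 1$ stated above, but the section's own displayed equations may differ in detail. The variable `dr` stands for the product $\delta r$, and all names and values are hypothetical.

```python
def logistic_step(x, k, dr):
    """One step of the assumed discrete logistic model (dr = delta * r).

    Solving x_next - x = dr * x * (1 - x_next / k) for x_next gives this form.
    """
    return (1.0 + dr) * x / (1.0 + dr * x / k)


def logistic_closed_form(i, k, m, dr):
    """Closed-form solution of the assumed difference equation."""
    return k / (1.0 + m * (1.0 + dr) ** (-i))


def ols(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


k, m, dr = 100.0, 9.0, 0.3
x = [k / (1.0 + m)]
for _ in range(20):
    x.append(logistic_step(x[-1], k, dr))

# The iterates lie on the closed-form curve up to rounding error.
max_err = max(abs(x[i] - logistic_closed_form(i, k, m, dr)) for i in range(len(x)))

# Assumed regression: y_i = x_{i+1} / x_i = A - B * x_{i+1}.
xs = x[1:]
ys = [x[i + 1] / x[i] for i in range(len(x) - 1)]
intercept, slope = ols(xs, ys)
A, B = intercept, -slope
k_hat = (A - 1.0) / B       # x-coordinate where the fitted line meets y = 1
dr_hat = A - 1.0            # estimate of delta * r
m_hat = k_hat / x[0] - 1.0  # from x_0 = k / (1 + m)
print(max_err, k_hat, dr_hat, m_hat)
```

Only the product $\delta r$ enters the regression in this sketch, so the estimates of $k$ and $m$ do not depend on the choice of $\delta$.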

Analysis of Gompertz data with logistic model
The Gompertz data: Regression analysis with the logistic curve model obtains where $A_n$ and $B_n$ satisfy Regression analysis determines $A_n$ and $B_n$ through the regression equation: The regression coefficients $A_n$ and $B_n$ are obtained as follows: where $x_i y_i - x_n y_n$; The estimated upper limit is obtained from Eq. (19), where $\hat{A}_n$ and $\hat{B}_n$ in Eq. (19) are obtained from Eqs. (29) and (30).

Behaviour of upper limit estimated with unsuitable model
The upper limit estimated with the logistic curve model is proved to increase monotonically as the data size increases through the following three lemmas.

Behaviour of slope of regression equation
Lemma 5.1. The sign-reversed slope of the regression equation is positive and monotonically decreases, i.e.
Proof. We define which is the data curve of Eq. (25). Thus, and are obtained because of Eq. (5). Hence, function $f_g(x)$ decreases as $x$ increases and is a downward convex function.

Magnitude relationship between y-coordinates of two consecutive regression lines at $x_{n+1}$
Lemma 5.2. The following magnitude relationship between y-coordinates of two consecutive regression lines at $x_{n+1}$: $A_n - B_n x_{n+1} < A_{n+1} - B_{n+1} x_{n+1}$ (56) holds for $n \ge 2$.

Value of y-coordinate of intersection of two consecutive regression lines
Lemma 5.3. The value of the y-coordinate ($y_p$) of the intersection of two consecutive regression lines is larger than 1, i.e.

Monotonic increase of estimated upper limit
Theorem 5.4. Let data $X_i$ ($i = 0, 1, 2, \ldots, n$), $n \ge 2$, be described by the Gompertz curve model. The function of estimates of the upper limit defined as Eq. (19), monotonically increases on $n \ge 2$, where parameters $A_n$ and $B_n$ are obtained through the regression equation (Eq. (28)), i.e.
Proof. The x-coordinate of the point on the intersection of two consecutive regression lines $x_p$ is smaller than $x_{n+1}$ because of Eqs. (37) and (56). Thus, we have for $x > x_p$. Because of Eq. (61), is obtained for $n \ge 2$. From Eq. (67), the estimated upper limit $\hat{k}_n$ is the x-coordinate of a point on the intersection of $y = 1$: By subtracting 1 from both sides of Eq. (69), is obtained where $x > x_p$. By substituting $\hat{k}_n$ ($> x_p$) for $x$, is obtained. Hence, is obtained for $n \ge 2$. □
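The monotonic increase stated in Theorem 5.4 can be checked numerically with a small sketch. It assumes noise-free Gompertz data of the form $x_i = k a^{b^i}$ and an assumed logistic regression $y_i = x_{i+1}/x_i = A_n - B_n x_{i+1}$ fitted to $x_0, \ldots, x_n$; the estimate $\hat{k}_n$ is taken as the x-coordinate where the fitted line meets $y = 1$, as in the proof. The regression form, function names, and parameter values are assumptions for illustration, and a numerical check is of course not a substitute for the proof.

```python
def gompertz_data(k, a, b, n_points):
    """Noise-free data x_i = k * a**(b**i) (assumed closed form)."""
    return [k * a ** (b ** i) for i in range(n_points)]


def ols(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def logistic_upper_limit(x, n):
    """Upper limit estimated by the logistic regression from x_0..x_n.

    Assumed regression: y_i = x_{i+1} / x_i = A_n - B_n * x_{i+1}, i = 0..n-1.
    """
    xs = [x[i + 1] for i in range(n)]
    ys = [x[i + 1] / x[i] for i in range(n)]
    intercept, slope = ols(xs, ys)
    a_n, b_n = intercept, -slope
    return (a_n - 1.0) / b_n  # fitted line meets y = 1 here


K = 100.0
data = gompertz_data(K, a=0.1, b=0.5, n_points=27)
k_hats = [logistic_upper_limit(data, n) for n in range(2, len(data))]

for n, k_hat in zip(range(2, 8), k_hats):
    print(n, round(k_hat, 3))
```

For these illustrative parameters, the printed estimates grow with $n$ while staying below the true upper limit `K`.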

Limit of upper limit
Theorem 5.5. The limit of the upper limit converges to the upper limit estimated with the suitable model as
Proof. We have $A_n - B_n x_n < y_n$, because the regression equation (Eq. (26)) intercepts the data curve of Eq. (25) at two points between $(x_1, y_1)$ and $(x_n, y_n)$ and Eq. (25) is a downward convex function. We also have $y_n < \bar{y}_n$ because function $f_g(x)$ decreases as $x$ increases from Eq. (39). Thus, we have from Eq. (77) and from Eq. (78). Hence, is obtained from Eqs. (79) and (80). We have $\lim_{n \to \infty} x_n = k$. We also have because of the Cesàro means theorem (Hardy, 1991). From Eq. (82), we have $\lim_{n \to \infty} y_n = \lim_{n \to \infty} x_{n+1}/x_n = 1$. The regression equation (Eq. (26)) intercepts the curve of Eq. (24) at two points, $(x_L, y_L)$ and $(x_R, y_R)$, between $(x_1, y_1)$ and $(x_n, y_n)$. Hence, there exists such that from the mean-value theorem, where $x_L < x_R$. From Eq. (53), $x_n$ monotonically increases as $n$ increases. Also, $x_n$ monotonically increases as $n$ increases because $y_n = x_{n+1}/x_n$ monotonically decreases as $n$ increases. Furthermore, $B_n$ monotonically decreases as $n$ increases from Eq. (37). Hence, both $x_L$ and $x_R$ monotonically increase as $n$ increases. As a result, also monotonically increases as $n$ increases. Since $B_n$ monotonically decreases as $n$ increases, holds. Therefore, the limit of $B_n$ is positive and finite because From Eqs. (81), (82), (83), (84), and (90), is obtained. Therefore, Eq. (76) holds from the squeeze theorem. □
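The convergence in Theorem 5.5 can likewise be illustrated numerically. The sketch below assumes noise-free data $x_i = k a^{b^i}$ and an assumed logistic regression of $y_i = x_{i+1}/x_i$ on $x_{i+1}$, and reports, for a few data sizes, the gap $k - \hat{k}_n$, the sign-reversed slope $B_n$, and the mean of the $y_i$, which should approach 1 by the Cesàro argument. Names, the regression form, and parameter values are hypothetical.

```python
def gompertz_data(k, a, b, n_points):
    """Noise-free data x_i = k * a**(b**i) (assumed closed form)."""
    return [k * a ** (b ** i) for i in range(n_points)]


def logistic_fit(x, n):
    """Fit y_i = x_{i+1} / x_i = A_n - B_n * x_{i+1} (assumed form), i = 0..n-1.

    Returns (k_hat_n, B_n, mean of y_i).
    """
    xs = [x[i + 1] for i in range(n)]
    ys = [x[i + 1] / x[i] for i in range(n)]
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((u - mx) ** 2 for u in xs)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    b_n = -sxy / sxx      # sign-reversed slope
    a_n = my + b_n * mx   # intercept of y = A_n - B_n * x
    return (a_n - 1.0) / b_n, b_n, my


K = 100.0
sizes = [10, 100, 1000]
data = gompertz_data(K, a=0.1, b=0.5, n_points=max(sizes) + 1)

gaps, slopes, mean_ys = [], [], []
for n in sizes:
    k_hat, b_n, mean_y = logistic_fit(data, n)
    gaps.append(K - k_hat)
    slopes.append(b_n)
    mean_ys.append(mean_y)
    print(n, K - k_hat, b_n, mean_y)
```

In this example the gap shrinks slowly, roughly in proportion to $1/n$, because the regression line is anchored at the sample means, which approach $(k, 1)$ only at the rate of a Cesàro mean.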

Practical use
The properties of the upper limit estimated by the logistic curve model and the slope of the regression equation are revealed in Sections 5.1 to 5.5. Furthermore, the upper limit is found to be smaller than that estimated by the Gompertz curve model from Theorems 5.4 and 5.5.
For practical use, every time a new data point is obtained, we monitor whether the observed data conform to the three lemmas (Lemmas 5.1 to 5.3), Theorem 5.4, and the magnitude relationship between the upper limits estimated by both models. If they do, the Gompertz curve model is one of the most suitable models. If they do not, the data are not regarded as Gompertz data because actual data with small noise retain these properties. If noise is so large that it drowns out the properties, the data are no longer described by the Gompertz curve model.
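The monitoring procedure can be sketched as a function that recomputes the estimates for every data size and reports which properties hold. This is a minimal sketch under stated assumptions: the logistic regression is taken as $y_i = x_{i+1}/x_i = A_n - B_n x_{i+1}$, the Gompertz regression as $\log x_{i+1} - \log x_i = A + B \log x_i$, and only Lemma 5.1, Theorem 5.4, and the magnitude relationship are checked (Lemmas 5.2 and 5.3 are omitted). The function name `check_gompertz_properties` and the result keys are hypothetical.

```python
import math


def ols(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def logistic_fit(x, n):
    """Return (k_hat_n, B_n) from the assumed logistic regression on x_0..x_n."""
    xs = [x[i + 1] for i in range(n)]
    ys = [x[i + 1] / x[i] for i in range(n)]
    intercept, slope = ols(xs, ys)
    return (intercept - 1.0) / (-slope), -slope


def gompertz_upper_limit(x, n):
    """Return k_hat_n from the assumed Gompertz regression on x_0..x_n."""
    logs = [math.log(v) for v in x[: n + 1]]
    ys = [logs[i + 1] - logs[i] for i in range(n)]
    intercept, slope = ols(logs[:-1], ys)
    return math.exp(-intercept / slope)


def check_gompertz_properties(x):
    """Check the monitored properties for every data size n >= 2.

    x must hold at least four positive observations x_0, x_1, ...
    """
    ns = range(2, len(x))
    fits = [logistic_fit(x, n) for n in ns]
    k_log = [f[0] for f in fits]
    slopes = [f[1] for f in fits]
    k_gom = [gompertz_upper_limit(x, n) for n in ns]
    return {
        "slope_positive": all(b > 0 for b in slopes),
        "slope_decreasing": all(s1 > s2 for s1, s2 in zip(slopes, slopes[1:])),
        "upper_limit_increasing": all(k1 < k2 for k1, k2 in zip(k_log, k_log[1:])),
        "logistic_below_gompertz": all(kl < kg for kl, kg in zip(k_log, k_gom)),
    }


# Usage on noise-free data of the assumed form x_i = k * a**(b**i).
sample = [100.0 * 0.1 ** (0.5 ** i) for i in range(16)]
report = check_gompertz_properties(sample)
print(report)
```

With noisy observations the strict comparisons may fail for the first few data sizes, so in practice one might apply the checks only from some starting size onwards; that threshold would be a judgment call rather than something this sketch determines.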
To confirm that data with small noise retain these properties, a sample dataset was generated with the non-homogeneous Poisson process (NHPP) model whose mean value function was a modified exact solution of the Gompertz equation (Yamada, 1992). The NHPP is a Poisson-type model that treats the numbers of events per unit of time as independent Poisson random variables and is often used in software reliability engineering (Lyu, 1996; Yamada, 2014; Yamada & Tamura, 2016). The mean value function is to satisfy $x'_0 = 0$ for the NHPP model (Yamada, 1992). An algorithm generating one realization for the NHPP in Cinlar (1975) was used with parameters: in Eq. (4). Generated sample data with the NHPP model are shown in Figure 1. The sign-reversed slopes estimated by the logistic curve model maintained positive values and monotonically decreased for $n \ge 8$ in Figure 3 even when the data included noise, whereas those estimated by the Gompertz curve model fluctuated in the first 17 data points and slowly increased afterwards.

Conclusion
Behaviour of the upper limit estimated with the logistic curve model as an unsuitable model was analysed for data on an exact solution of the Gompertz curve model. It was found and proved that the estimated upper limit monotonically increases as the data size increases, i.e. as time elapses. It was also proved that the upper limit converges to the upper limit estimated with the suitable model (the Gompertz curve model) as the data size approaches infinity. Therefore, the upper limit estimated with the logistic curve model is smaller than that estimated with the Gompertz curve model as the suitable model for data on the exact solution of the Gompertz curve model. These results were obtained by "applied discrete systems." These results also contribute to identifying a suitable model even though actual data include noise because the estimated upper limit with low noise must behave similarly to that without noise. It was confirmed with sample data with noise that the proved properties are conserved except in the very first phase. If noise is so large that it drowns out the properties, the data should no longer be analysed by the Gompertz curve model. If the above results are not observed from a given dataset, the data should not be analysed with the Gompertz curve model, because forecasting using the Gompertz curve model can then result in seriously incorrect forecasts. However, if the above results are observed, the Gompertz curve model is one of the most suitable models. Further study will prove that a series of the estimated upper limits is upward or downward convex. The robustness of the behaviour of the upper limit against noise included in actual data is also to be further studied.

Figure 3. Sign changed slopes estimated by both models.

Funding
The author received no direct funding for this research.
