Gompertz: A Scilab Program for Estimating Gompertz Curve Using Gauss-Newton Method of Least Squares
Surajit

A computer program for estimating the Gompertz curve using the Gauss-Newton method of least squares is described in detail. It is based on the estimation technique proposed in Reddy (1985). The program is developed using Scilab (version 3.1.1), a freely available scientific software package that can be downloaded from http://www.scilab.org/. Data are fed into the program from an external disk file, which should be in Microsoft Excel format. The output contains the sample size, the tolerance limit, the initial and final estimates of the parameters, their standard errors, the values of the Gauss-Normal equations (GN1, GN2 and GN3), the number of iterations, the variance (σ²), the Durbin-Watson statistic, goodness-of-fit measures such as R² and the D value, the covariance matrix and the residuals. It also displays a graphical output of the estimated curve vis-à-vis the observed curve. It is an improved version of the program proposed in Dastidar (2005).


Introduction
Despite the diversity of available statistical software packages such as SAS (SAS Institute Inc 2005), SPSS (SPSS Inc 2003) and PASS (NCSS Statistical Software 2005), there is a shortage of stand-alone programs that allow the user to estimate the parameters of the Gompertz curve using the Gauss-Newton method of least squares. SPSS 12.0 (SPSS Inc 2003) provides SPSS Regression Models as an add-on enhancement to the full SPSS Base System, which includes general non-linear regression. SAS 9.1 (SAS Institute Inc 2005), however, does provide a programming interface for developing a program for Gompertz growth models using a modified Gauss-Newton method. PASS (NCSS Statistical Software 2005) provides users with a modified Gompertz growth model, but the method for estimating the parameters of the Gompertz curve cannot be specified by the user.
This paper documents a computer program for estimating the parameters of the Gompertz curve using the Gauss-Newton method of least squares. It is an improved version of the program proposed in Dastidar (2005) and is based on the estimation technique proposed in Reddy (1985). In this program:
• The sample size need not be a multiple of three.
• The user can specify the range of the data used for calculating the initial estimates $A_0$, $B_0$ and $C_0$, but the number of observations in that range should be a multiple of three.
The critical input data needed to apply the computer code are the sample size and the tolerance limit. This model is mentioned in Seber and Wild (1989).

Program description
The program proceeds as follows. Before executing the program, the following points need to be checked:
1. Scilab (version 3.1.1) should be installed on the computer.
2. A new directory named Scilab should be created in C:\.
3. The necessary input file should be created in Microsoft Excel format, and the data should be entered in the format specified (Figure 3). Note that separate Excel files need to be created for separate variables.
4. All required input files should be saved in the C:\Scilab directory.
5. The executable file 'Gompertz.sce' also needs to be stored in the C:\Scilab directory.
6. The program is executed using the following command at the Scilab command prompt: exec('C:\Scilab\Gompertz.sce');
The critical user inputs needed to apply the computer program are the sample size and the tolerance limit. After these two inputs are entered from the keyboard, the program asks for the input file (stored in the C:\Scilab directory). After the data file has been chosen, the program displays the input data and then the program output. There is only one output format. It lists the sample size, the tolerance limit, the names of the parameters, the initial estimates, the final estimates, the standard errors, the values of the Gauss-Normal equations ($GN_1$, $GN_2$ and $GN_3$), the number of iterations, the variance ($\sigma^2$), the Durbin-Watson statistic (DW), goodness-of-fit measures such as $R^2$ and the D value, where $D = \sum_{t=1}^{n} (y_t - \hat{y}_t)^2 / \sum_{t=1}^{n} (y_t - \bar{y})^2$, the covariance matrix and the residuals. It also displays a graphical output of the estimated curve vis-à-vis the observed curve (Figure 1, Figure 2).
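The two residual-based diagnostics named above can be sketched in a few lines. This is a minimal illustration in Python rather than the Scilab code of the program itself. The function name `fit_diagnostics` is hypothetical, D is computed exactly as the ratio written in the text, and the Durbin-Watson statistic uses its standard textbook definition. Whether the program computes DW from residuals on the original or the logarithmic scale is not stated, so the original scale is an assumption here.

```python
def fit_diagnostics(y, y_hat):
    """Return (D, DW) for observed values y and fitted values y_hat.

    D  = sum (y_t - yhat_t)^2 / sum (y_t - ybar)^2, as written in the text.
    DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2 (standard definition).
    Assumption: residuals are taken on the original scale of y.
    """
    n = len(y)
    resid = [obs - fit for obs, fit in zip(y, y_hat)]
    y_bar = sum(y) / n
    rss = sum(e * e for e in resid)                # residual sum of squares
    tss = sum((obs - y_bar) ** 2 for obs in y)     # total sum of squares about the mean
    d_value = rss / tss
    dw = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / rss
    return d_value, dw
```

A DW value near 2 suggests little first-order autocorrelation in the residuals, while a small D indicates that the fitted curve leaves little of the variation in the series unexplained.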

Notation and theory
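Suppose we have observed time series data $y_t$ for $t = 1, 2, \ldots, n$ that follow the Gompertz curve
$$y_t = a\, b^{c^t} u_t,$$
where $a$, $b$ and $c$ are positive constants and the $\log u_t$ are independent and identically distributed normal random variates with mean 0 and variance $\sigma^2$. Using the logarithmic transformation we have
$$Y_t = A + B C^t + U_t,$$
where $Y_t = \log y_t$, $A = \log a$, $B = \log b$, $C = c$ and $U_t = \log u_t$. From the least squares technique, that is, by taking partial derivatives of $\sum_{t=1}^{n} U_t^2$ with respect to $A$, $B$ and $C$ and equating each of them to zero, we obtain
$$\sum_{t=1}^{n} (Y_t - A - B C^t) = 0, \qquad \sum_{t=1}^{n} (Y_t - A - B C^t)\, C^t = 0, \qquad \sum_{t=1}^{n} (Y_t - A - B C^t)\, B\, t\, C^{t-1} = 0. \tag{4}$$
The estimates of $A$, $B$ and $C$ are obtained by solving the above system of Gauss-Normal Equations 4. Also, it is easy to note that the least squares residual sum of squares is given by
$$\sum_{t=1}^{n} (Y_t - \bar{Y})^2 - \frac{\left[\sum_{t=1}^{n} (Y_t - \bar{Y})(C^t - \bar{C})\right]^2}{\sum_{t=1}^{n} (C^t - \bar{C})^2}, \tag{5}$$
where $\bar{Y} = (1/n) \sum_{t=1}^{n} Y_t$ and $\bar{C} = (1/n) \sum_{t=1}^{n} C^t$. Unlike the least squares estimators in linear regression, the estimates of $A$, $B$ and $C$ may not be unique.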
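As an illustration of how the three Gauss-Normal equations can be checked numerically, the sketch below evaluates their left-hand sides at a candidate $(A, B, C)$ for the log-transformed model $Y_t = A + B C^t$. It is written in Python rather than Scilab. The function name `gauss_normal_values` is hypothetical, and the form of the third equation (the derivative with respect to $C$, up to a constant factor) is an assumption about how the program reports $GN_3$.

```python
import math

def gauss_normal_values(y, A, B, C):
    """Evaluate the three least squares normal equations at (A, B, C).

    y holds the observed positive series y_1..y_n; the residual on the
    log scale is U_t = log(y_t) - A - B*C**t. All three returned sums
    should be close to zero at a least squares solution.
    """
    gn1 = gn2 = gn3 = 0.0
    for t, value in enumerate(y, start=1):
        u = math.log(value) - A - B * C ** t
        gn1 += u                          # derivative with respect to A
        gn2 += u * C ** t                 # derivative with respect to B
        gn3 += u * B * t * C ** (t - 1)   # derivative with respect to C
    return gn1, gn2, gn3
```

Values of all three sums close to zero indicate that the reported estimates do satisfy the normal equations, which is presumably why the program prints them alongside the estimates.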
Further, it may be noted that it is not possible to obtain an analytical expression for $C$ that satisfies Equations 4 and also minimizes Equation 5 (Reddy 1977). However, given such a value of $C$, the estimates of $A$ and $B$ are given by
$$\hat{B} = \frac{\sum_{t=1}^{n} (Y_t - \bar{Y})(C^t - \bar{C})}{\sum_{t=1}^{n} (C^t - \bar{C})^2}, \qquad \hat{A} = \bar{Y} - \hat{B}\, \bar{C}.$$
The Gauss-Newton method of least squares is applied in the algorithm to obtain the estimators of $A$, $B$ and $C$ and their standard errors, apart from the usual goodness-of-fit measures such as $R^2$ and $D$, where the fitted values satisfy $\log \hat{y}_t = \log \hat{a} + \hat{c}^{\,t} \log \hat{b}$. The Gauss-Newton method consists of taking a linear expansion of $Y_t = f_t(A, B, C) = A + B C^t$ around $A_0$, $B_0$ and $C_0$, retaining the first-degree terms, and then using ordinary least squares to obtain $A$, $B$ and $C$.
The initial values $A_0$, $B_0$ and $C_0$ are given by analytical expressions obtained by assuming the deterministic relation $Y_t = A + B C^t$ for $t = 1, 2, \ldots, n$ and solving for $A$, $B$ and $C$ using essentially three sets of equations. Without loss of generality, it is assumed that the sample size used for calculating the initial estimates $A_0$, $B_0$ and $C_0$ is a multiple of three ($n = 3r$). This estimation procedure for $A_0$, $B_0$ and $C_0$ is termed the 'three point' method, and it is a modified version of the method given in Chakravarti and Laha (1967).
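To make the idea of three-point starting values concrete, the sketch below uses the textbook method of partial sums for the curve $Y_t = A + B C^t$. The $n = 3r$ observations are split into three consecutive groups with sums $S_1$, $S_2$, $S_3$, and the deterministic relation is solved for the three parameters. This is an assumption made for illustration: the paper describes its procedure as a modified version of the classical method, so the exact expressions used in the program may differ. The function name `three_point_start` is hypothetical.

```python
def three_point_start(Y):
    """Partial-sums starting values (A0, B0, C0) for Y_t = A + B*C**t, t = 1..n.

    Y is the log-transformed series; its length must be a multiple of three.
    Derivation: with group sums S1, S2, S3 over r points each,
      (S3 - S2) / (S2 - S1) = C**r,
      S2 - S1 = B*C*(C**r - 1)**2 / (C - 1),
      S1 = r*A + (S2 - S1) / (C**r - 1).
    """
    n = len(Y)
    if n == 0 or n % 3 != 0:
        raise ValueError("number of observations must be a positive multiple of three")
    r = n // 3
    s1, s2, s3 = sum(Y[:r]), sum(Y[r:2 * r]), sum(Y[2 * r:])
    d1, d2 = s2 - s1, s3 - s2
    if d1 == 0 or d2 / d1 <= 0 or d2 == d1:
        raise ValueError("partial sums do not admit a curve of the form A + B*C**t")
    c_r = d2 / d1                        # estimate of C**r
    C0 = c_r ** (1.0 / r)
    B0 = d1 * (C0 - 1.0) / (C0 * (c_r - 1.0) ** 2)
    A0 = (s1 - d1 / (c_r - 1.0)) / r
    return A0, B0, C0
```

On noise-free data generated from the curve these expressions recover the parameters exactly, which is what makes them reasonable starting values for an iterative refinement.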
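Using vector notation, let $\theta = (\theta_1, \theta_2, \theta_3)' = (A, B, C)'$, $Y = (Y_1, \ldots, Y_n)'$ and $f(\theta) = (f_1(\theta), \ldots, f_n(\theta))'$. Let $F$ be the $n \times 3$ matrix of partial derivatives with elements $F_{ij} = \partial f_i(\theta) / \partial \theta_j$, $i = 1, 2, \ldots, n$ and $j = 1, 2, 3$, and let $F_0$ be the value of $F$ evaluated at the initial values $\theta_0 = (\theta_{10}, \theta_{20}, \theta_{30})'$. From the Gauss-Newton method we get the estimate of $(\theta - \theta_0)$ as
$$\hat{\theta} - \theta_0 = (F_0' F_0)^{-1} F_0' \left(Y - f(\theta_0)\right),$$
or the estimate of $\theta$ is given by
$$\hat{\theta} = \theta_0 + (F_0' F_0)^{-1} F_0' \left(Y - f(\theta_0)\right).$$
This procedure is repeated with the new values as the starting values, and it is terminated when the successive values of $(\hat{\theta} - \theta_0)$ or $(\hat{\theta} - \theta_0)/\theta_0$ are smaller than a pre-specified limit.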
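The iteration described above can be sketched as follows for $f_t(\theta) = A + B C^t$, whose partial derivatives with respect to $A$, $B$ and $C$ are $1$, $C^t$ and $B\,t\,C^{t-1}$. This is a minimal Python illustration and not the Scilab code of the program. The names `solve3` and `gauss_newton` are hypothetical, the step is undamped, and the stopping rule uses the relative change $(\hat{\theta} - \theta_0)/\theta_0$, which assumes that none of the current parameter values is zero.

```python
def solve3(M, v):
    """Solve the 3x3 linear system M x = v by Gaussian elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(M, v)]
    for i in range(3):
        p = max(range(i, 3), key=lambda k: abs(a[k][i]))   # partial pivoting
        a[i], a[p] = a[p], a[i]
        for k in range(i + 1, 3):
            m = a[k][i] / a[i][i]
            for j in range(i, 4):
                a[k][j] -= m * a[i][j]
    x = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        x[i] = (a[i][3] - sum(a[i][j] * x[j] for j in range(i + 1, 3))) / a[i][i]
    return x

def gauss_newton(Y, theta0, tol=0.005, max_iter=100):
    """Iterate theta <- theta + (F'F)^{-1} F'(Y - f(theta)) for f_t = A + B*C**t.

    Y is the log-transformed series and theta0 = (A0, B0, C0).
    Returns ((A, B, C), number_of_iterations).
    """
    A, B, C = theta0
    for iteration in range(1, max_iter + 1):
        FtF = [[0.0] * 3 for _ in range(3)]
        Ftr = [0.0] * 3
        for t, Yt in enumerate(Y, start=1):
            row = (1.0, C ** t, B * t * C ** (t - 1))   # row t of F at current theta
            resid = Yt - (A + B * C ** t)
            for i in range(3):
                Ftr[i] += row[i] * resid
                for j in range(3):
                    FtF[i][j] += row[i] * row[j]
        dA, dB, dC = solve3(FtF, Ftr)
        rel_change = max(abs(dA / A), abs(dB / B), abs(dC / C))
        A, B, C = A + dA, B + dB, C + dC
        if rel_change < tol:
            return (A, B, C), iteration
    raise RuntimeError("Gauss-Newton did not converge within max_iter iterations")
```

The default `tol=0.005` mirrors the tolerance limit used in the paper's sample run. Accumulating $F'F$ and $F'(Y - f)$ directly keeps the sketch free of matrix libraries, at the cost of the numerical robustness that a QR-based least squares solve would give.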

Sample input/output cases
The output of the computer program is in tabular form. It contains the sample size, the tolerance limit, the initial and final estimates of the parameters, the standard errors, the values of the Gauss-Normal equations ($GN_1$, $GN_2$ and $GN_3$), the number of iterations, the variance, the Durbin-Watson statistic (DW), goodness-of-fit measures such as $R^2$ and $D$, the covariance matrix and the residuals. It also displays a graphical output of the estimated curve (Figure 1, Figure 2). The program has been tested using data on Tiwari's estimates of Gross Domestic Product (TGDP) and Industrial Production (TIP); see Table 1. The sample size was 15 and the tolerance limit was 0.005. All 15 observations were used for calculating the initial estimates $A_0$, $B_0$ and $C_0$.
The input file is in Excel format, as shown in Figure 3. The output file is in text format. The executable file is named Gompertz.sce and contains the code for estimating the Gompertz curve. The TGDP series is depicted in Figure 1, and the output from Scilab is given below.

Figure 1: Plot of TGDP against year

Figure 2: Plot of TIP against year

PASS (NCSS Statistical Software 2005) uses a modified Gompertz growth model in its package, but the embedded procedure for calculating the estimated parameters cannot be specified by the user.
It should be noted that the number of observations in the selected range, from First Observation No. to Last Observation No., should be a multiple of three.