Applying regression models with skew-normal errors to the height of bedding plants of Stevia rebaudiana (Bert) Bertoni

The objective of this experiment was to fit regression models to height data of bedding plants cultivated in three multicellular Styrofoam trays with different cell volumes. Two types of models were proposed: a regression model with normal errors and a regression model with skew-normal errors. The skew-normal regression was the more suitable model in both cases considered, first when the model included only the time covariate and then when the cell size covariate was also part of the model. However, the value of the parameter λ for the multivariate model was very high, which indicates that the skew-normal model may not be the best choice either. We therefore suggest further fitting with the skew Student-t regression model.

Keywords: skew-normal distribution, parameter estimation, regression with skew-normal errors.

Fitting normal regression models with asymmetric errors applied to the height of Stevia rebaudiana (Bert) Bertoni plants

RESUMO. This work aims to fit regression models to plant height data observed over time for plants grown in containers of different sizes. Two types of models were proposed: the regression model assuming normal errors and the regression model assuming errors with a skew-normal distribution. The skew-normal regression model proved more suitable in two situations: for the model with only the time covariate and also when the container size covariate was included in the model. However, the value of the parameter λ for the multivariate model was very large, which indicates that the skew-normal model may not be the most suitable either. In this situation, the skew Student-t regression model is suggested.

Palavras-chave: skew-normal distribution, parameter estimation, regression with skew-normal errors.


Introduction
Methods for fitting skew-normal distributions became widespread after Azzalini (1985) introduced a new class of skew-normal distributions that includes the normal distribution as a special case. The first studies on the skew-normal distribution, however, had been reported decades earlier by Roberts (1966), O'Hagan and Leonard (1976), Aigner et al. (1977), and Hill and Dixon (1982). Because the assumption of normal errors in linear regression is not satisfied in many data sets, the usual alternative has been to transform the variables. However, transformation can make the experimental responses difficult to interpret, as argued by Azzalini and Capitanio (1999). This class of skew-normal distributions has therefore been studied for its robustness in estimating the parameters of regression models whose errors are not normal.
Agronomically, the production of bedding plants of Stevia rebaudiana faces constraints in defining the best stage of development for transplanting. Mechanical transplanting requires seedlings with a physical structure suited to the transplanting machinery, and the physiological development must be such that early flowering is prevented before and after transplanting (CARNEIRO, 2007). The photoperiod that induces the flowering stages is about 13 hours, when the seedlings usually have four to five pairs of leaves (ZAIDAN et al., 1980).
Stevia growers and practitioners have difficulty predicting crop development under field conditions. Early flowering of overgrown bedding plants reduces crop yield during the first growing period of the first growing season, thereby reducing the net returns to Stevia growers in the crucial period of crop establishment. Furthermore, the cell size of the Styrofoam trays also influences the time to flowering under field conditions. Thus, determining the best stage of development (CARNEIRO, 2007) is crucial for the success of crop establishment under field conditions, where bedding plant height is an important and complementary parameter during the early development of Stevia crops. The stages V2.i of plant development indicate that the bedding plants are growing under nursery conditions, and the letter i represents the number of nodes or pairs of opposite leaves with a length greater than 5 mm (CARNEIRO, 2007).
The objective of the current experiment is to fit regression models to the height of bedding plants of Stevia rebaudiana (Bert) Bertoni growing in three types of Styrofoam trays with different cell volumes (CARNEIRO, 1990; CARNEIRO et al., 1997) during the stages V2.i. Normal and skew-normal error assumptions are used to estimate the model parameters by maximum likelihood and Bayesian methods.

Material
The data set was collected from a glasshouse experiment carried out at the Iguatemi Research Station (latitude 23°25'S, longitude 51°57'W, altitude 550 m) of the Universidade Estadual de Maringá, Northwestern Paraná State, Brazil. Pure germinating seeds of Stevia rebaudiana were sown in Styrofoam trays with inverted-pyramid cells with volumes of 14, 35 and 112 cm³. The bedding plants were raised in an on-farm mixture of Dystrophic Red Latosol with 7% laying hen manure (LHM). Beforehand, the fresh LHM was watered daily for 15 days on transparent plastic film to leach the excess salts and allow aerobic decomposition. The manure was then dried under sunlight and ground before being used to amend the soil. Fluorescent daylight lamps (40 W) were turned on to keep the photoperiod above 13 hours (THOMAS, 2006) and avoid early flowering before transplanting. The seedlings from every cell volume were pruned daily for two weeks to standardize the stage of development at the 15th day, because Stevia seeds require about 10 days to complete germination. The first harvest was carried out on the 15th day, when the seedlings had two pairs of leaves, that is, the V2.2 stage of development (CARNEIRO, 2007). Thereafter, the seedlings were harvested at 7-day intervals over a period of 57 days. The height of six bedding plants was measured with a professional steel ruler accurate to 1 mm.

Regression models with skew-normal errors
The fitting of statistical models using skew-normal regression is based on the definition of Azzalini (1985).

Definition
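A random variable Z has a skew-normal distribution with skew parameter $\lambda \in \mathbb{R}$ if its density function is given by:

$$f(z; \lambda) = 2\,\phi(z)\,\Phi(\lambda z), \quad z \in \mathbb{R}, \tag{1}$$

where $\phi(\cdot)$ is the density function and $\Phi(\cdot)$ is the distribution function of the standard normal distribution, with zero mean and variance 1. The distribution is denoted by $Z \sim SN(\lambda)$, where $\lambda$ controls the asymmetry. The expected value and the variance of Z are given by:

$$E(Z) = \sqrt{\frac{2}{\pi}}\,\frac{\lambda}{\sqrt{1+\lambda^{2}}}, \qquad Var(Z) = 1 - \frac{2}{\pi}\,\frac{\lambda^{2}}{1+\lambda^{2}}.$$

The density function given by equation (1) has mathematical properties that guarantee efficiency and quality of the statistical modelling, as detailed by Azzalini (1985). If $Z \sim SN(\lambda)$, the random variable $Y = \mu + \sigma Z$, with $\sigma > 0$, is denoted by $Y \sim SN(\mu, \sigma^{2}, \lambda)$ and has the density function:

$$f(y; \mu, \sigma^{2}, \lambda) = \frac{2}{\sigma}\,\phi\!\left(\frac{y-\mu}{\sigma}\right)\Phi\!\left(\lambda\,\frac{y-\mu}{\sigma}\right), \quad y \in \mathbb{R}. \tag{2}$$

Consider a data set $(y_1, y_2, \ldots, y_n)$ with n independent observations of the variable $Y_i \sim SN(\mu_i, \sigma^{2}, \lambda)$, $i = 1, 2, \ldots, n$, where Y depends on p covariates $X_k$, $k = 1, 2, \ldots, p$. The relationship between Y and the $X_k$ can be written as:

$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad \varepsilon_i \sim SN(0, \sigma^{2}, \lambda). \tag{3}$$

Therefore, $\varepsilon_i$ is a random variable with location parameter zero and scale parameter $\sigma^{2}$. Since $\mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$, the parameters $\beta$, $\sigma$ and $\lambda$, that is, the vector $\theta = (\beta^{T}, \sigma, \lambda)$ with $\beta^{T} = (\beta_0, \beta_1, \ldots, \beta_p)$, have to be estimated.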
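The definition can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names (`sn_pdf`, `sn_mean_var`, `integrate`) and the parameter values are hypothetical and not from the paper. It evaluates the SN(μ, σ², λ) density, confirms by simple quadrature that it integrates to one, and compares the closed-form mean and variance (rescaled by μ and σ) with their numerical counterparts.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sn_pdf(y, mu=0.0, sigma=1.0, lam=0.0):
    """Density of SN(mu, sigma^2, lam): (2 / sigma) * phi(z) * Phi(lam * z)."""
    z = (y - mu) / sigma
    return 2.0 / sigma * phi(z) * Phi(lam * z)

def sn_mean_var(mu=0.0, sigma=1.0, lam=0.0):
    """Closed-form mean and variance of Y = mu + sigma * Z, Z ~ SN(lam)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    mean = mu + sigma * delta * math.sqrt(2.0 / math.pi)
    var = sigma ** 2 * (1.0 - 2.0 * delta ** 2 / math.pi)
    return mean, var

def integrate(f, a, b, n=20000):
    """Midpoint-rule quadrature; adequate for this smooth density."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Illustrative (hypothetical) parameter values.
mu, sigma, lam = 2.0, 1.5, 3.0
a, b = mu - 12.0 * sigma, mu + 12.0 * sigma

total = integrate(lambda y: sn_pdf(y, mu, sigma, lam), a, b)
num_mean = integrate(lambda y: y * sn_pdf(y, mu, sigma, lam), a, b)
num_var = integrate(lambda y: (y - num_mean) ** 2 * sn_pdf(y, mu, sigma, lam), a, b)
mean, var = sn_mean_var(mu, sigma, lam)

print("integral of density:", total)
print("mean (numerical, closed form):", num_mean, mean)
print("variance (numerical, closed form):", num_var, var)
```

With λ = 0 the factor Φ(λz) equals 1/2 and the density reduces to the normal density, which is why the normal model is a special case of this class.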

Parameter estimation
Maximum likelihood method

The parameter vector $\theta = (\beta^{T}, \sigma, \lambda)$ is estimated by maximum likelihood, that is, by maximizing through numerical procedures the logarithm of the likelihood function of model (3), given by:

$$\ell(\theta) = n\log 2 - n\log\sigma + \sum_{i=1}^{n}\log\phi(z_i) + \sum_{i=1}^{n}\log\Phi(\lambda z_i), \quad z_i = \frac{y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip}}{\sigma}. \tag{4}$$

The maximum likelihood estimate of the vector $\theta$ is found by numerical methods that maximize the log-likelihood function given in (4).
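As a minimal sketch of the objective that the numerical procedure maximizes, the code below evaluates the log-likelihood of a linear model with skew-normal errors. Everything specific in it is an assumption made for illustration: the function names, the trial parameter values, and above all the heights, which are made-up numbers and not the data of this experiment (only the harvest days follow the 7-day schedule described in the Material section).

```python
import math

def log_Phi(z):
    """Log of the standard normal distribution function (floored to avoid log(0))."""
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))

def sn_loglik(beta, sigma, lam, X, y):
    """Log-likelihood of y_i = x_i' beta + eps_i, eps_i ~ SN(0, sigma^2, lam)."""
    n = len(y)
    total = n * math.log(2.0) - n * math.log(sigma) - 0.5 * n * math.log(2.0 * math.pi)
    for x_i, y_i in zip(X, y):
        mu_i = sum(b * x for b, x in zip(beta, x_i))
        z = (y_i - mu_i) / sigma
        total += -0.5 * z * z + log_Phi(lam * z)
    return total

def normal_loglik(beta, sigma, X, y):
    """Log-likelihood of the same linear model with normal errors."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        mu_i = sum(b * x for b, x in zip(beta, x_i))
        z = (y_i - mu_i) / sigma
        total += -math.log(sigma) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z
    return total

# Hypothetical data: harvest days and made-up heights (cm), NOT the paper's data.
days = [15, 22, 29, 36, 43, 50, 57]
heights = [2.1, 3.0, 4.4, 5.1, 7.9, 8.2, 12.5]
X = [[1.0, float(t)] for t in days]   # intercept and time covariate
beta, sigma = [-1.5, 0.22], 1.2       # arbitrary trial values, not estimates

for lam in (0.0, 1.0, 3.0):
    print("lambda =", lam, "log-likelihood =", sn_loglik(beta, sigma, lam, X, heights))
```

At λ = 0 the value coincides with the normal-error log-likelihood, so the two models can be compared on the same scale. In practice the negative of `sn_loglik` would be handed to a general-purpose numerical optimizer over (β, σ, λ); which optimizer the authors used is not stated in the text.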
In this model, the 1's method was applied in WinBUGS because the estimation involves the parameters of a probability distribution with skewed errors, as detailed by Henze (1986).
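The text does not say which result of Henze (1986) was used. One well-known result from that work is the stochastic representation Z = δ|U0| + sqrt(1 − δ²) U1, with δ = λ / sqrt(1 + λ²) and U0, U1 independent standard normal variables, which yields Z ~ SN(λ). The sketch below assumes this representation only for illustration; the function name `sample_sn`, the seed and the value of λ are hypothetical, and it is not a reproduction of the WinBUGS code.

```python
import math
import random

def sample_sn(rng, mu=0.0, sigma=1.0, lam=0.0):
    """Draw one value from SN(mu, sigma^2, lam) via the representation
    Z = delta * |U0| + sqrt(1 - delta^2) * U1, with U0, U1 ~ N(0, 1)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    u0 = abs(rng.gauss(0.0, 1.0))
    u1 = rng.gauss(0.0, 1.0)
    z = delta * u0 + math.sqrt(1.0 - delta * delta) * u1
    return mu + sigma * z

rng = random.Random(2024)   # fixed seed so the run is repeatable
lam = 4.0                   # illustrative value, not an estimate from the paper
draws = [sample_sn(rng, lam=lam) for _ in range(200000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

delta = lam / math.sqrt(1.0 + lam * lam)
theory_mean = delta * math.sqrt(2.0 / math.pi)
theory_var = 1.0 - 2.0 * delta ** 2 / math.pi

print("mean (sample, theory):", sample_mean, theory_mean)
print("variance (sample, theory):", sample_var, theory_var)
```

The representation is convenient in simulation-based software because it needs only normal draws and an absolute value, both of which are standard building blocks.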
In the Bayesian estimates, an effect was considered significant at the 5% level when the credible interval for the posterior mean of the corresponding regression coefficient did not include zero. The posterior marginal distributions of all parameters were obtained with the BRugs routine of the R software. The software generated 100,000 values by Markov chain Monte Carlo (MCMC), with a burn-in period of 10,000 initial values. Thereafter, a sample of 10,000 values was selected by keeping every 10th value. Chain convergence was verified with the CODA routine of the R software using the criterion of Heidelberger and Welch.
The models presented above were applied to the bedding plant height data (Y i ), initially considering the covariate harvesting time (t i ) and then also including the covariate cell size (V i ) of the Styrofoam trays.
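The post-processing of a chain described above can be sketched as follows. This is an illustration under explicit assumptions, not the BRugs/CODA workflow itself: the helper names are hypothetical, the "chain" is a synthetic stand-in made of independent normal draws rather than real MCMC output, and the burn-in of 10,000 values is assumed to precede the 100,000 retained values (110,000 draws in total), so that keeping every 10th value leaves 10,000.

```python
import math
import random

def burn_and_thin(chain, burn_in, thin):
    """Drop the first `burn_in` draws, then keep every `thin`-th draw."""
    return chain[burn_in::thin]

def credible_interval(sample, level=0.95):
    """Equal-tailed credible interval from the order statistics of the draws."""
    s = sorted(sample)
    tail = (1.0 - level) / 2.0
    lower = s[int(math.floor(tail * (len(s) - 1)))]
    upper = s[int(math.ceil((1.0 - tail) * (len(s) - 1)))]
    return lower, upper

# Synthetic stand-in for posterior draws of one regression coefficient
# (hypothetical values; real MCMC output would be autocorrelated).
rng = random.Random(7)
chain = [rng.gauss(0.8, 0.2) for _ in range(110000)]

kept = burn_and_thin(chain, burn_in=10000, thin=10)
lower, upper = credible_interval(kept, level=0.95)
significant = not (lower <= 0.0 <= upper)   # interval excludes zero

print("retained draws:", len(kept))
print("95% credible interval:", lower, upper)
print("significant at 5%:", significant)
```

Thinning reduces the autocorrelation between retained values at the cost of discarding draws; the significance rule is simply whether zero falls outside the interval.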

Results and Discussion
Hereafter, the model with harvesting time as the only covariate and the model that also includes cell size are referred to as the univariate and multivariate models, respectively. Figure 1 illustrates the scatter of the height of the bedding plants over the harvesting time for all cell volumes. The last three harvests showed a large scatter in height, as corroborated by the data in Figures 3 and 4. These responses can be explained by the internode length of the overgrown seedlings. In Figure 2, the growth responses over the harvesting time for the different cell volumes indicate that taller plants grew in the larger cells. This figure also shows that in larger cells some plants grow much more than others, a result that may be explained by the physiological responses of the bedding plants to the container volume. Figure 5 shows that the largest discrepancy in the development of the bedding plants starts after the 36th day of growth in the Styrofoam trays with cell volumes of 35 and 112 cm³. The explanation is that the data are better described by a distribution with a more skewed tail than that of the normal distribution.

These characteristics guided the choice of models, as indicated by the fits of the univariate and multivariate models (Table 1). For both the univariate and the multivariate linear regression, the best fit according to the DIC criterion was obtained with the skew-normal error distribution. The assumption of normal residuals was not satisfied when the regression models were fitted with normal errors, as shown in Figures 6 and 7. The univariate regression model with skew-normal errors had a high positive value of the asymmetry parameter (λ), meaning positive asymmetry (Table 1). Based on the prediction intervals, the estimates of the λ parameter for both models are large and significant, indicating asymmetric data. The most probable explanation is that height alone, as observed in nursery stock, is not as good a descriptor as the stages of development (CARNEIRO, 2007), because competition for light and the container volume have a strong influence on the individual development of the bedding plants. However, some authors have suggested that, for high values of λ, the skew Student-t model should be preferred to the skew-normal model.
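The text does not spell out how DIC is computed. Under its usual definition, DIC = D̄ + pD, where D(θ) = −2 log L(θ) is the deviance, D̄ is its posterior mean, and pD = D̄ − D(θ̄) is the effective number of parameters evaluated at the posterior mean θ̄; smaller values indicate the preferred model. The sketch below applies that definition to a deliberately simple toy model (normal observations with unknown mean and unit variance), not to the skew-normal regression of this paper. The data, the function names and the direct sampling of the posterior are all assumptions made for illustration.

```python
import math
import random

def deviance(theta, y):
    """D(theta) = -2 * log-likelihood for the toy model y_i ~ N(theta, 1)."""
    return sum(math.log(2.0 * math.pi) + (y_i - theta) ** 2 for y_i in y)

def dic(theta_draws, y):
    """Return (DIC, pD) from posterior draws of theta."""
    mean_dev = sum(deviance(t, y) for t in theta_draws) / len(theta_draws)
    theta_bar = sum(theta_draws) / len(theta_draws)
    p_d = mean_dev - deviance(theta_bar, y)
    return mean_dev + p_d, p_d

# Hypothetical observations, NOT the paper's data.
y = [4.1, 5.3, 4.8, 6.0, 5.5, 4.4, 5.9, 5.1]
y_bar = sum(y) / len(y)

# With a flat prior the posterior of theta is N(y_bar, 1/n); sample it directly
# instead of running an MCMC chain.
rng = random.Random(11)
draws = [rng.gauss(y_bar, 1.0 / math.sqrt(len(y))) for _ in range(20000)]

dic_value, p_d = dic(draws, y)
print("DIC:", dic_value)
print("effective number of parameters pD:", p_d)
```

In this toy case pD comes out close to 1, matching the single free parameter, which is a quick sanity check on the formula before it is applied to competing error distributions.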

Conclusion
Linear regression with skew-normal errors may be an alternative method for modelling the height of Stevia bedding plants growing in different cell volumes of Styrofoam trays during the stages V2.i. The multivariate model with skew-normal errors is more reliable than the model with normal errors because it improves the goodness of fit. The parameter estimates are similar for both methods of estimation, except for the asymmetry parameter, for which the Bayesian method indicated the greatest departure from normality.

Figure 1. Height of bedding plants harvested at seven-day intervals from all the cell volumes of Styrofoam trays.

Figure 3. Profile of Stevia bedding plants growing in different cell volumes of Styrofoam trays and harvested at different times to investigate the seedling length.

Figure 4. Bedding plant height over harvesting time.

Figure 5. Profile of Stevia bedding plants growing in three different cell volumes of Styrofoam trays at every harvesting time.

Figure 6. Residual distribution (left) and QQ-plot (right) from fitting the univariate model considering a normal error distribution.

Figure 7. Residual distribution (left) and QQ-plot (right) from fitting the multivariate model considering a normal error distribution.

Table 1. Fitting of univariate and multivariate regression models.