Statistical inference for ordinal predictors in generalized additive models with application to Bronchopulmonary Dysplasia

Discrete but ordered covariates are quite common in applied statistics, and some regularized fitting procedures have been proposed for proper handling of ordinal predictors in statistical models. Motivated by a study from neonatal medicine on Bronchopulmonary Dysplasia (BPD), we show how quadratic penalties on adjacent dummy coefficients of ordinal factors proposed in the literature can be incorporated in the framework of generalized additive models, making tools for statistical inference developed there available for ordinal predictors as well. The approach presented allows to exploit the scale level of ordinally scaled factors in a sound statistical framework. Furthermore, several ordinal factors can be considered jointly without the need to collapse levels even if the number of observations per level is small. By doing so, results obtained earlier on the BPD data analyzed could be confirmed.


Introduction
Bronchopulmonary Dysplasia (BPD) is a chronic lung disease often found in preterm infants with lungs not fully developed. Disturbance of lung development and severity of BPD is caused by various peri-and postnatal factors including prematurity itself, as well as pre-and postnatal infections [1]. BPD is measured on ordinal scale with grades 0, 1, 2, 3, but often dichotomized as 0: 'no/ mild BPD' and 1: 'moderate/severe BPD' . One goal of the study reported here is to investigate whether the time after birth some specific bacteria were found for the first time in the children's upper airways has an effect on BPD. Initially, n = 102 preterm infants with a birth weight < 1000 g and gestational age ≤ 32 + 0 weeks were analyzed within a retrospective cohort study at the tertiary perinatal center of Justus-Liebig-University Giessen (Germany) between January 2014 and June 2017. Two infants, however, had to be excluded from further analyses at some point due to missing information on some bacterial colonization. Earlier analyses already showed that the later bacteria were found, the lower the risk of developing BPD [2]. However, it is not fully understood yet which specific bacteria have an effect, and in which way. Therefore we will draw special attention to the time period/week (after birth) three types of bacteria-gram negative/positive and pathogenic-were found for the first time in the upper airway of the respective child. Although 'time' is supposed to be a continuous variable, information is only available in a discretized way here, because samples were only obtained once a week. Furthermore, the last category 'week > 6' is open/censored. If an observation is falling in the last category, we only know that until week six the respective germ had not been found yet. So, the corresponding covariate may only be considered as categorical but ordinal.

BMC Research Notes
*Correspondence: jan.gertheiss@hsu-hh.de 1 School of Economics and Social Sciences, Helmut Schmidt University/ University of the Federal Armed Forces, Hamburg, Germany Full list of author information is available at the end of the article Besides the information on bacteria, some additional risk factors need to be taken into account, such as the weight and sex of the child, the number of days antibiotics and steroids were given, or information on multiples. For doing so, a logit model with categorical/ordinal predictor 'bacteria/week' and additional, potentially confounding covariates may be fit. Typically, a categorical predictor is included as dummy-coded factor, ignoring, however, the information on the categories' ordering (if so). In the case presented, additional problems are caused by the fact that some categories/levels only have a few observations, and sometimes all those observations are falling in the same response category. Consequently, some coefficients in a logit model fit via usual maximum likelihood tend towards ±∞.
For preventing numerical problems like inflating regression coefficients, penalization can be a viable solution [3]. Furthermore, a penalty term can be used for exploiting/respecting a covariate's ordinal scale level. Following [4][5][6][7], for instance, a difference/smoothing penalty might be put on adjacent dummy coefficients of the ordinal factor when fitting the model. An approach that has already been applied successfully in medical research; see, e.g., [8,9]. In the BPD application, however, the question naturally arises how to test for significance of the ordinal predictor in the penalized setting. In a linear model with normal errors, this can be done using a (restricted) likelihood ratio test [10][11][12][13], after rewriting the ordinal penalty as a mixed model [14][15][16]. However, the corresponding test is not available for generalized linear models, such as the logit model considered here. In this note, we will illustrate how technology developed for generalized additive models [17,18] can be used to fit generalized linear and additive models with ordinal smoothing penalty, and conduct further statistical inference.

Methods
Given a response y with distribution from a simple exponential family, and a set of covariates x 1 , . . . , x p , a generalized additive model [17] has the form: where µ is the (conditional) mean of y given the covariates, h is a (known) response function, and η is comparable to the linear predictor in generalized linear models [19,20]. The difference to a generalized linear model is that non-linear functions f j , j = 1, . . . , p , are allowed in η , but still the structure of η is additive. Of course, if f j are restricted to be linear, a generalized linear model is obtained as a special case. In a (generalized) additive model, however, it is usually only assumed that functions f j are reasonably smooth; and one way to fit such models, as for instance implemented in the popular R package mgcv [18,21], is to specify a set of basis functions for each predictor and to employ an appropriate, quadratic smoothing penalty on the corresponding basis coefficients. That means, we assume that with B j1 (x), . . . , B jq j (x) being a reasonably rich set of basis functions chosen for function f j , and β j1 , . . . , β jq j are the corresponding basis coefficients. When fitting those basis coefficients to the data, a penalty term J j (β j ) is typically added for each covariate x j , penalizing wiggly basis coefficients and thus wiggly functions f j . The strength of the penalty and hence the amount of smoothing is controlled through a tuning parameter, often denoted by j . Now suppose you have a categorical predictor x j with levels 1, . . . , k j . Then, there is a somewhat natural basis: the basis of (dummy) functions ( l = 1, . . . , k j ) Since we know that x j can only take values 1, . . . , k j , we do not need to think about the type and number of basis functions, placing of knots, etc., as we usually do with continuous covariates. If now a (quadratic) first-order difference penalty is put on the basis/dummy coefficients β j = (β j1 , . . . , β jk j ) ⊤ , this gives exactly the smoothing penalty as mentioned above [4][5][6][7]. Alternatively, the second-order penalty can be used [14]. One of the benefits of considering ordinal predictors along with quadratic difference penalties in the framework of generalized additive models is that after implementing basis (3) in the appropriate way, gam() from mgcv can be used directly to fit a generalized linear/additive model with ordinal predictor(s) as needed for the BPD data. Besides pure model fitting, however, this provides us with additional tools; in particular, built-in estimation of the penalty/smoothing parameter(s) via (restricted) maximum likelihood ((RE) ML), further statistical inference, such as confidence intervals, and checking significance of smooth terms. Those tools utilize the mixed model and Bayesian interpretation of quadratic smoothing penalties on basis coefficients such as (4) and (5); compare [18,22,23] for details.
Add-on functions implementing the ordinal basis for use within mgcv have been made publicly available through R package ordPens [24]. After installing and loading ordPens, the gam() function from mgcv can be used with smooth terms s(..., bs = "ordinal", m = 1) or s(..., bs = "ordinal", m = 2) for the first-and second-order penalty, respectively. See the ordPens manual (R function ordSmooth()) for details and examples. To investigate whether the p-values of Wald-type tests with respect to smooth terms as provided by summary.gam() are reliable if using the ordinal basis/smoothing penalty, we used the confounder model, i.e., the model with information on bacteria removed, to estimate BPD probabilities. That means, the null hypothesis that the effect of (ordinal, bacteria-specific predictor) x is zero, is true by construction in this hypothetical model, because fitted BPD probabilities do not depend on x, given the other covariates. Using those probabilities, we simulated 'new' BPD response data, fit the model with smooth ordinal x added (and smoothing parameter estimated by REML), and stored the p-value of x. For x, we used information on gram negative/positive and pathogenic bacteria, respectively. As noted above, the corresponding ordinal factor gives the week colonization by the respective type of (oral) bacteria was detected for the first time. For each type of x, this was repeated 1000 times. Figure 1 shows QQ-plots of the p-values observed on the simulated data employing the first-and second-order penalty, respectively. Since the distribution of smaller p-values is particularly relevant when testing with usual α ≤ 0.1 , we restrict plotting to that area. It is seen that p-values obtained when employing the first-order penalty (red) are typically too small. Problems with the first-order penalty can be explained by the fact that the null space of the corresponding smooth term has dimension zero (compare the mgcv manual). In other words, the null hypothesis in the framework of mixed models (which is used for estimation here), a zero variance component, is on the boundary of the parameter space, which means that standard theory does not apply [10,11]. Results for the second-order penalty (blue), by contrast, look very encouraging. Consequently, we will only report results on the actually observed BPD data for the second-order penalty below. In earlier analyses [2], separate models were fit for each type of bacteria, and the first two weeks were collapsed to make (unpenalized) model fitting with dummy-coded, ordinal factors feasible. Thanks to the penalties presented here, we are now able to include all three predictors jointly while using all information with the resolution available.  Table 1 (top) shows the results for the parametric terms if using the second-order penalty (5) for the smooth terms. In particular, it is seen/confirmed that low birth weight is a risk factor for BPD, and also male infants and multiples have an increased risk of developing BPD. Antenatal steroids, by contrast, may decrease the risk. Results for the different types of bacterial colonization, which are included as ordinal predictors with smooth effects, are also given in Table 1 and Fig. 2. We see that the only significant effect is detected for pathogenic bacteria. The fitted function (Fig. 2, right) gives the impression that early detection is associated with increased risk of BPD. Statistical uncertainty, however, is very large (due to the small number of samples with week/level 1) as indicated by the confidence interval. When excluding information on gram negative and positive bacteria from the model, results for the remaining terms (Table 1, bottom) as well as fitted functions/coefficients for pathogenic bacteria (not shown) look very similar as before. In summary, our results using the ordinal smoothing approach are in line with earlier analyses [2], but allow for considering all three ordinal predictors (gram negative/positive, pathogenic bacteria) jointly, without the need to collapse levels.

Limitations
With respect to the application/BPD data, the main limitation is the small number of samples in week 1 for pathogenic bacteria. Since the shape of the function in Fig. 2 (right) depends on the coefficient for week/level  (5), a problem can occur with confidence intervals in terms of under-coverage if the fitted coefficient function is close to being linear (compare Fig. 2, left). This problem is also found for (generalized) additive models with continuous covariates, and the suggested fix is to change the target of inference to the smooth term plus the overall model intercept [18,25]. Furthermore, our implementation does not include extensions like ordinal smoothing spline isotonic regression [7]. Finally, it should be noted that all statements made and conclusions drawn in this article refer to statistical inference if smoothing parameters are estimated by REML (or ML). When using GCV (which is the default in mgcv!), results should be treated with caution.