Quantile Regression for Longitudinal Functional Data with Application to Feed Intake of Lactating Sows

Battagliola, Maria Laura; Sørensen, Helle; Tolver, Anders; Staicu, Ana-Maria

doi:10.1007/s13253-024-00601-5

Quantile Regression for Longitudinal Functional Data with Application to Feed Intake of Lactating Sows

Open access
Published: 06 February 2024

(2024)
Cite this article

Download PDF

You have full access to this open access article

Journal of Agricultural, Biological and Environmental Statistics Aims and scope Submit manuscript

Quantile Regression for Longitudinal Functional Data with Application to Feed Intake of Lactating Sows

Download PDF

Maria Laura Battagliola¹,
Helle Sørensen²,
Anders Tolver² &
…
Ana-Maria Staicu³

342 Accesses
Explore all metrics

Abstract

This article focuses on the study of lactating sows, where the main interest is the influence of temperature, measured throughout the day, on the lower quantiles of the daily feed intake. We outline a model framework and estimation methodology for quantile regression in scenarios with longitudinal data and functional covariates. The quantile regression model uses a time-varying regression coefficient function to quantify the association between covariates and the quantile level of interest, and it includes subject-specific intercepts to incorporate within-subject dependence. Estimation relies on spline representations of the unknown coefficient functions and can be carried out with existing software. We introduce bootstrap procedures for bias adjustment and computation of standard errors. Analysis of the lactation data indicates, among others, that the influence of temperature increases during the lactation period.Supplementary materials accompanying this paper appear on-line.

A new standard model for milk yield in dairy cows based on udder physiology at the milking-session level

Article Open access 21 August 2017

Segmented regression to describe cumulative milk production of grazing dual-purpose Holstein-Zebu cows

Article 11 February 2019

New residual feed intake criterion for longitudinal data

Article Open access 25 June 2021

1 Introduction

This paper considers quantile regression for longitudinal data in the presence of functional covariates. It is motivated by data on the daily feed intake of lactating sows, where the aim is to study how temperature in the stable, or cell, during the day affects the feed intake, in particular for sows that eat scarcely. This is of interest because poor nutrition in the lactation period may lead to health downsides, both for the sows and the piglets, and production inefficiency. Daily temperature is measured every fifth minute and is therefore naturally treated as a functional covariate, and the study is longitudinal with both the feed intake and daily temperature profiles recorded for up to 21 days for each sow.

Quantile regression, first introduced by Koenker and Bassett Jr (1978), is a well-established framework from statistics and econometrics. It is suitable when the analysis aims at describing and quantifying the association between covariates and quantiles of the distribution of the response variable. In particular, it allows to robustly target not only the central parts of the response distribution, but also the more extreme regions. For overviews, see the seminal monograph by Koenker (2005) and for more recent developments, see Koenker et al. (2017).

Analyses of longitudinal data, including quantile regression, must account for the dependence between observations from the same subject in order to provide valid inference. A common approach is to include subject-specific effects in the model for the quantiles and use penalization, see for example Koenker (2004), Lamarche (2010), Harding and Lamarche (2017), Gu and Volgushev (2019), and Fasiolo et al. (2021a). We adopt the same approach for this paper. Alternatives include Kato et al. (2012) and Galvao and Kato (2016), who treated subject-specific parameters as fixed effects without penalization, and Canay (2011), who used a two-step procedure where subject-specific parameters are first estimated as fixed effects and then plugged in as offsets in a standard quantile regression (see also Besstremyannaya and Golovan 2019).

Quantile regression with functional covariates, similar to scalar-on-function mean regression, describes the association between a quantile of the response and a functional covariate using an inner product between the functional covariate and an unknown smooth coefficient function. As it is common in nonparametric regression, we approximate the coefficient function using a finite basis representation, and thus, the infinite-dimensional estimation problem is converted to a finite-dimensional one. Pre-specified spline functions and eigenfunctions obtained from the spectral decomposition of the functional covariates’ covariance operator are the most popular choices for selecting the basis functions, and they have both been used for quantile regression. For example, Cardot et al. (2005) and Park et al. (2019) used splines, whereas Kato (2012), Chen and Müller (2012a) and Li et al. (2022) used eigenfunctions. A related research area is additive quantile regression where the effect of a scalar covariate is modeled via a smooth function (Fenske et al. 2013; Greven and Scheipl 2017; Geraci 2019; Fasiolo et al. 2021a).

In this paper, we consider functional quantile regression for scalar response and functional covariates, which are both observed repeatedly for many clusters or subjects. To the best of our knowledge, no papers in the literature are devoted to this situation. We consider a set-up with longitudinal data and allow for the effect of the functional covariate on the quantile to evolve over observation time. We use penalized splines to handle the functional covariates and penalized cluster- or subject-specific intercepts to account for the dependence within clusters or subjects. The resulting model can be represented in a framework easily implementable using existing software from Fasiolo et al. (2021a). Moreover, we point out bias and variance issues of the estimators and propose adjustments obtained with bootstrap, using resampling techniques from Battagliola et al. (2022) for bias adjustment and from Galvao and Montes-Rojas (2015) for computation of standard errors. Altogether, our analysis gives new insight to the eating behavior of lactating sows, our ultimate goal. In particular, the analysis indicates that the association between temperature in the stable and the feed intake gets increasingly stronger after delivery.

The paper is structured as follows: The model framework and estimation methodology are described in Sects. 2 and 3. We analyze the lactation data in Sect. 4 and summarize and discuss findings in Sect. 5. Appendix provides details about the bootstrap procedures and the practical implementation. Finally, additional results from the application result from simulation studies, and example code can be found in the supplementary materials.

2 Framework

We consider data $\{(Y_{ij}, X_{ij}(\cdot ), t_{ij})\}_{ij}$, with scalar responses $Y_{ij}$ and functional covariates $X_{ij}(\cdot )$ at time points $t_{ij} \in \mathcal {T} \subset [0, \infty )$, where $i = 1, \ldots , N$ denotes clusters, and $j=1, \ldots , n_i$ denotes repeated measurements within cluster i. Observations from different clusters are assumed to be independent, but there may be within-cluster correlation. Covariates $X_{ij}(\cdot )$ are square-integrable functions on a closed interval $\mathcal {S} \subset \mathbb {R}$, i.e., $X_{ij}(\cdot )\in L^2(\mathcal {S})$. In practice, they are often observed on a dense grid $\{s_1, \ldots , s_H\} \subset \mathcal {S}$ and possibly with measurement errors.

We are concerned with quantile regression. Let $\tau \in (0,1)$ be a fixed quantile level and assume that the $\tau $-quantile for the conditional distribution of the j-th observation $Y_{ij}$ from cluster i given covariates $X_{ij}(\cdot )$ and $t_{ij}$ takes the form:

$$\begin{aligned} Q_{Y_{ij}|X_{ij}, u_i}^\tau (t_{ij}) = \alpha ^\tau (t_{ij}) + \int _\mathcal {S} \beta ^\tau (s,t_{ij}) X_{ij}(s) ds + u_i, \end{aligned}$$

(2.1)

where $u_i$ (dependence of $\tau $ suppressed in notation) specifies a cluster-specific level. The target parameters of the analysis are the intercept functions $\alpha ^\tau (\cdot )$ and the regression coefficient function $\beta ^\tau (\cdot , \cdot )$, which are both assumed to be common for all clusters. In particular, the functional covariate affects the $\tau $-quantile in the same way for all clusters. Notice that $\beta ^\tau $ depends on t, allowing for the covariate effect to change over longitudinal time. Without further restrictions, $\alpha ^\tau (\cdot )$ is identifiable up to an additive constant, and $\beta ^\tau (\cdot ,t)$ is identifiable up to an additive component in the orthogonal complement of the vector space spanned by the functional covariates $X_{ij}(\cdot )$.

The notation in (2.1) reflects that we think of data as emerging from a two-step process: $u_i$ is a sample of i.i.d. random variables with mean zero, and $Y_{ij}$’s are then generated independently from a model with $\tau $-quantile (2.1). The restriction $\mathbb {E}[u_i]=0$ ensures full (asymptotic) identifiability of $\alpha ^\tau (\cdot )$. We emphasize that (2.1) does not specify the full conditional distribution of $Y_{ij}$ given $\{X_{ij}(\cdot ), t_{ij}\}$ in cluster i, only its $\tau $-quantile, and we suggest to use it for one or a few quantile levels of particular interest.

With the two-step data-generating process, we may also consider the $\tau $-quantile of the conditional distribution of $Y_{ij}$ given $\{X_{ij}(\cdot ),t_{ij}\}$ marginally over all clusters. The association between this implied marginal $\tau $-quantile and $X_{ij}$ may not take a form similar to that of (2.1). Ignoring the cluster-specific parameters, i.e., fitting the quantile regression model (2.1) with all $u_i=0$, would therefore not target $\alpha ^\tau (\cdot )$ and $\beta ^\tau (\cdot , t)$. This is an important difference compared to the associated mean regression mixed-effects model where the conditional and marginal means would be described by the same coefficient function, such that an analysis based on the marginal model would lead to reliable estimates (but possibly wrong inference). See the supplementary materials and Battagliola et al. (2022) for further considerations on marginal versus conditional models and analyses in quantile mixed-effects models.

3 Estimation Methodology

Two main challenges arise for the estimation of the model (2.1) compared to classical quantile regression for independent data with scalar covariates: how to represent the longitudinal and longitudinal functional coefficients $\alpha ^\tau (\cdot )$ and $\beta ^\tau (\cdot ,\cdot )$, and how to handle the cluster-specific intercepts $u_i$.

3.1 Representation of the functional coefficient and smooth intercept

Firstly, we assume that $t\mapsto \alpha ^\tau (t)$ and $(s,t)\mapsto \beta ^\tau (s,t)$ depend smoothly on time. This allows us to use tools from additive models (Wood 2017) and hence approximate the functions along the t-coordinate with some basis functions $\{\psi _l(\cdot )\}_{l=1}^L$. For simplicity, we choose to use the same basis functions for both coefficient functions. Secondly, in the functional regression literature it is also common to have a finite-dimensional representation of functional coefficients in the s-coordinate. Let $\{\varphi _d(\cdot )\}_{d=1}^D$ be basis functions for that purpose and represent $\beta ^\tau (\cdot ,\cdot )$ as a tensor product smooth with different bases for the two coordinates. To be specific, we write

$$\begin{aligned} \alpha ^\tau (t)\approx & {} \sum _{l=1}^L a^\tau _l\psi _l(t),\nonumber \\ \beta ^\tau (s,t)\approx & {} \sum _{l=1}^L \sum _{d=1}^D \delta ^\tau _{dl}\psi _l(t) \varphi _d(s), \end{aligned}$$

(3.1)

where $a^\tau _l$’s and $\delta ^\tau _{dl}$’s are unknown coefficients. In the application, we use cubic spline bases; in the following, we describe the methodology for general penalized splines.

With the representations in (3.1), the integral in (2.1) is approximated with

$$\begin{aligned} \int _\mathcal {S} \beta ^\tau (s, t) X_{ij}(s) ds \approx \sum _{l=1}^L \sum _{d=1}^D \delta ^\tau _{dl}\psi _l(t) \int _\mathcal {S} \varphi _d(s)X_{ij}(s) ds, \end{aligned}$$

and we can work with finite-dimensional version of model (2.1), namely

$$\begin{aligned} Q_{Y_{ij}|X_{ij}, u_i}^\tau (t_{ij}) = \sum _{l=1}^L a^\tau _l \psi _l(t_{ij}) +\sum _{l=1}^L \sum _{d=1}^D \delta ^\tau _{dl} \psi _l(t_{ij}) \xi _{d,ij} + u_i. \end{aligned}$$

(3.2)

Here, $\xi _{d,ij} = \int _\mathcal {S} \varphi _d(s)X_{ij}(s) ds$, and the coefficients $\{a^\tau _l\}_{l}$ and $\{\delta ^\tau _{dl}\}_{dl}$ and the subject-specific intercepts $\{u_i\}_i$ are the unknown parameters. In practice, the integrals $\xi _{d,ij}$ are approximated with Riemann sums.

3.2 Estimation of Coefficients and Random Intercepts

In standard quantile regression, parameters are estimated by minimizing an empirical loss, $\sum _{i=1}^N \sum _{j=1}^{n_i} l_\tau (Y_{ij}-Q^\tau _{ij})$, where $Q_{ij}^\tau $ is short for the level $\tau $ quantile for observation j of cluster i and depends on the model parameters, and $l_\tau $ is an appropriate loss function, typically the check loss function $v\mapsto v (\tau - \mathbb {1}_{(v <0)})$ (Koenker and Bassett Jr 1978). The method proposed by Fasiolo et al. (2021a), named “QGAM”, offers a flexible framework to model longitudinal quantile regression with scalar covariates and is implemented in an accompanying R package qgam. QGAM uses a smooth approximation of the check loss function in order to make it differentiable so common computational optimizers like the Newton method can be used for the minimization problem. Several smooth approximations of the check loss function available in the literature, see for instance Horowitz (1998), Chernozhukov and Hong (2003), Otsu (2008), Goh and Knight (2009), Fernandes et al. (2021) and He et al. (2021). The one adopted by Fasiolo et al. (2021a), called the extended log-F loss function, allows embedding the minimization problem into a Bayesian estimation framework via belief update (Bissiri et al. 2016).

We adapt QGAM to functional covariates and penalize the coefficients $\{a^\tau _l\}_{l=1}^L$ and $\{\delta ^\tau _{dl}\}_{d=1,l=1}^{D,L}$ and the subject-specific intercepts $\{u_i\}_i$ as it is common in the additive mixed-effects models literature. More specifically, penalty terms for $a^\tau _l$s and $\delta ^\tau _{dl}$s are added to the loss function to impose smoothness in the s- and t-directions (Wood 2017, Chapter 5), while an $\ell _2$-penalty is used for the $u_i$s to balance the degree of subject-specific variation. The latter resembles a normal distribution for $u_i$ although this is not a formal assumption in the model. The tuning parameters defining the degree of penalization are selected as part of the procedure as implemented in the qgam package, see Fasiolo et al. (2021a) for details. In the following, we refer to the extension of QGAM to functional covariates as “fQGAM”.

3.3 Covariates Observed with Noise

In the previous sections, the covariate functions were assumed to be observed densely and without measurement noise. Now, consider the more realistic situation with measurement noise and possibly sparse sampling, and denote by $W_{ij,h}$ the observations corresponding to points $s_h$, i.e.,

$$\begin{aligned} W_{ij,h} = X_{ij}(s_{h}) + \epsilon _{ij,h} \quad h=1,\ldots ,H, \end{aligned}$$

(3.3)

where $\{\epsilon _{ij,h}\}_{ijh}$ are iid. random variables with mean zero and mutually independent of the underlying functions. We propose to carry out a preliminary smoothing step and proceed with the analysis from Sect. 3.2 with the unobserved values $X_{ij}(s)$ replaced by their fitted/predicted values $\hat{X}_{ij}(s)$.

There are many smoothing techniques available for functional data, e.g., kernel-based methods, smoothing splines, and smoothing with data-driven bases, see for example Ramsay and Silverman (2005). We choose to represent $\{\hat{X}_{ij}(\cdot )\}_{ij}$ with eigenfunctions arising from functional principal component analysis (FPCA), by assuming independence over both i and j. The number of eigenfunctions should be large enough to capture the primary modes of variation of $\{W_{ij,h}\}_{ijh}$, but small enough to get smooth reconstructed functions. The choice is usually based on a preset percentage of variance explained (PVE). Several implementations of FPCA are available depending on the sampling pattern of the functional data (dense or sparse, same or different sampling locations, missing values), see e.g. Yao et al. (2003), Xiao et al. (2018), and references therein. We use the fast covariance estimation (FACE) method from Xiao et al. (2016) in this work, ignoring potential dependence among functions. This is not inappropriate functions (see, e.g., Goldsmith et al. 2012). Approaches that account for dependence within subject or cluster have been discussed by Greven et al. (2010), Chen and Müller (2012b), Park and Staicu (2015), Koner and Staicu (2023), to name a few.

3.4 Bootstrap Procedures for Variance Assessment and Bias Adjustment

In the sow data application, we are mainly interested in estimation and inference for quantiles and the differences between quantiles at specified directions of the functional covariate. It is known from the literature on quantile regression for longitudinal data with scalar covariates, that estimators may be biased and that it is difficult to properly assess the sampling variability of the estimators without resampling methods (Kato et al. 2012; Galvao and Montes-Rojas 2015; Battagliola et al. 2022). We have seen in simulation studies (available in the supplementary materials) that the problems persist when covariates are functional, and we propose to use bootstrap strategies for variance estimation and bias adjustment. The complete procedure is visualized in Fig. 1, and the details are given below and in appendix.

Recall the quantile model (2.1) with repeated measurements of functional covariates and responses for each subject. Our target parameters are described as follows. Consider a fixed time point t, a function $X(\cdot )\in L^2(\mathcal {S})$ and response Y. The corresponding linear predictor at level $\tau $ is:

$$\begin{aligned} Q_{Y|X, 0}^\tau (t) =\alpha ^\tau (t)+\int _\mathcal {S} \beta ^\tau (s,t)X(s)\, ds \end{aligned}$$

and is interpreted as the $\tau $-quantile for a typical subject (with $u=0$). The function $X(\cdot )$ may or may not be one of the functions in the dataset. Furthermore, consider two functional covariates $X_A(\cdot ),X_B(\cdot )\in L^2(\mathcal {S})$ with pointwise difference, $\Delta X(s) = X_A(s) - X_B(s)$. For a fixed cluster, i.e., a fixed u and a fixed measurement time t, the corresponding difference in the $\tau $-quantile is

$$\begin{aligned} D^\tau (t) = Q_{Y|X_A, u}^\tau (t) - Q_{Y|X_B, u}^\tau (t)= \int _{\mathcal {S}} \beta ^\tau (s,t) \Delta X(s)\, ds, \end{aligned}$$

(3.4)

so $D^\tau (t)$ is the difference in quantile for a fixed subject when $X(\cdot )$ is changed in direction $\Delta X(\cdot )$. In the following, we refer to $Q_{Y|X, 0}^\tau (t)$ or $D^\tau (t)$ as targets $\theta $ of interest and let ${\hat{\theta }}$ denote the corresponding estimate calculated from estimates of the coefficients $\beta ^\tau (\cdot ,\cdot )$ and $\alpha ^\tau (\cdot )$ in the representation (3.1).

First, consider estimation of $\text {var}(\hat{\theta })$. The estimated model coefficients, $\{\hat{a}^\tau _l\}_l$ and $\{\hat{\delta }^{\tau }_{dl}\}_{dl}$, from qgam are accompanied with a variance–covariance matrix, which can be used for computation of a standard error for the estimator $\hat{\theta }$. We refer to these standard errors as model-based standard errors. However, penalization of random effects is likely to cause underestimation of the true sampling variation of $\hat{\theta }$. As Galvao and Montes-Rojas (2015), we resample complete subject data with replacement to compute standard errors. Specifically, let $\tilde{\theta }_1,\ldots ,\tilde{\theta }_B$ be estimates when the estimation procedure is applied to B bootstrap datasets and use $\text {sd}_{\text {boot}}(\hat{\theta }) = \text {sd}(\tilde{\theta }_1,\ldots ,\tilde{\theta }_B)$ as an estimate for $\sqrt{\text {var}(\hat{\theta })}$. Details of the sampling procedure can be found in appendix.

Second, consider estimation of $\text {bias}({\hat{\theta }})$. As documented by Battagliola et al. (2022), bias can occur even for large samples, caused by a combination of the incidental parameter problem (the number of parameters increase with sample size, Neyman and Scott 1948; Lancaster 2000), nonlinearity of quantiles, and penalization of the subject-specific intercepts. Block resampling cannot be used for bias adjustment because the target parameter of interest is not computable under the bootstrap distribution; see also Karlsson (2009) who obtained little or no effect in an attempt to adjust for bias in a nonlinear quantile regression for longitudinal data. Instead, we propose to combine standard resampling of estimated random effects with wild bootstrap of residual terms, using the technique developed by Battagliola et al. (2022). The purpose is to generate bootstrap datasets under a distribution where the true value of the target parameter coincides with ${\hat{\theta }}$ (the estimate obtained from the observed data), such that the bias can be estimated from the bootstrap estimates. Specifically, let ${\check{\theta }}_1,\ldots ,{\check{\theta }}_B$ be estimated values of $\theta $ for B bootstrap datasets, then bias is estimated as $\text {bias}_{\text {boot}}({\hat{\theta }}) = \frac{1}{B}\sum _{b=1}^B({\check{\theta }}_b - {\hat{\theta }})$. We explain the sampling procedure in more detail in appendix.

The block resampling method and the sampling method suggested by Battagliola et al. (2022) differ in several ways. While the former is completely nonparametric, the latter relies on the model. Another important difference is that the covariate functions are resampled (together with the responses) by the block resampling method, but kept exactly as in the dataset in the approach of Battagliola et al. (2022). As a consequence, the procedure based on wild bootstrap would underestimate the variance of the estimator ${\hat{\theta }}$. Our suggested solution is to combine the estimated bias and estimated standard deviation from the two bootstrap sampling methods, respectively, to construct confidence intervals for the target $\theta $. If the distribution of ${\hat{\theta }}$ is well approximated by a normal distribution, and standard errors and bias are estimated as described above, it is natural to define an approximate $1-\alpha $ confidence interval as

$$\begin{aligned} {\hat{\theta }}- \text {bias}_\text {boot}({\hat{\theta }}) \pm q_{1-\alpha /2} \; \text {sd}_\text {boot} ({\hat{\theta }}). \end{aligned}$$

(3.5)

Battagliola et al. (2022) demonstrated in a wide variety of simulation settings with clustered data and scalar covariates that bias was greatly reduced or removed, with the above bootstrap sampling process combining resampled cluster-specific intercepts and wild bootstrap for the residuals, and Galvao and Montes-Rojas (2015) demonstrated that sampling variation of estimators is measured appropriately with block resampling. We acknowledge that investigations of the coverage properties of the confidence intervals in the current setting would be of interest, but leave it for future research and focus on the application below.

4 Lactating Sows’ Feed Intake

It is well known that high ambient temperature has a negative impact on pigs’ reproduction performance and wellness before and after pregnancy (Dourmad et al. 2022; Bjerg et al. 2020; Bloemhof et al. 2013). In particular, high temperature reduces food intake for lactating sows, possibly leading to increased body weight loss and reduced milk production, implying slower and poorer weight gain of the piglets (Renaudeau and Noblet 2001). We refer to Bjerg et al. (2020) for a review, Dourmad et al. (2022) for a meta-analysis and Johnston et al. (1999) and Renaudeau et al. (2001) for two specific studies. A common set-up is one with two temperature regimes, and where the expected daily food intake is modeled as a function of maximum or average temperature over the day. As opposed to this, our data provide a continuum of temperature curves; we focus on low quantile levels of food intake and incorporate temperature variation over the whole day. Furthermore, we model the progression of food intake over lactation day.

4.1 Description and Preprocessing of Data

The data come from a commercial research unit in Oklahoma, where 480 sows were monitored from July to October 2013. The animals were divided into 21 groups and then assigned to cells, where they were kept under observations during the lactation period for up to 21 days. For each sow at each lactation day, the food intake (in kg) is available, as well as the cell temperature (in °C), measured every five minutes for 24 h from 2.00 pm to 1.59 pm the following day. Moreover, the parity of each sow is registered, i.e., the number of pregnancies the animal had before the current one. We will consider parity as a measure of age: a sow is “young” if it is at its first pregnancy and it is “old” otherwise. Previous studies have shown that younger and older sows behave differently (Johnston et al. 1999; Staicu et al. 2020), so we analyze data from young and older sows separately. There are 475 sows in total with 237 young sows and 238 old ones, respectively.

The data are illustrated in Fig. 2. Feed intake profiles are plotted in the left panel with profiles from three randomly selected sows from each age group highlighted. Although there is large within-sow variation over lactation days, it is also clear that some sows tend be have low (or high) feed intake throughout, calling for a subject-specific component in the model. In a preprocessing step, we smoothed the temperature curves with FACE (see Sect. 3.3), using a PVE of 99.99% so that most of the features of the curves are maintained. The reconstructed daily temperature trajectories are used in the analysis.

Staicu et al. (2020) used a longitudinal dynamic functional regression framework for mean regression for the same data with emphasis on prediction of response trajectories. Park et al. (2019) carried out separate quantile regression analyses for a derived variable at three selected lactation days. For each day separately, the cumulative distribution function (CDF) was first estimated and then inverted to estimate quantiles of interest. In contrast, we carry out quantile regression for all lactation days simultaneously using the model framework and estimation method introduced in Sects. 2 and 3, and our analysis provides estimates and confidence bands for the temperature effect on quantiles of feed intake. In particular, we consider the estimated quantiles of the feed intake when the daily temperature corresponds to the pointwise 20 and 80% quantiles of the smoothed temperature curves. These two temperature profiles, denoted $\text {Temp}_{20}(\cdot )$ and $\text {Temp}_{80}(\cdot )$, respectively, and shown in Fig. 2 in blue and red, are the most extreme temperature quantiles considered by Park et al. (2019). The pointwise median temperature curves are also plotted in Fig. 2(green).

4.2 Estimated Quantiles of Feed Intake

Denote the observed data by $\{(\text {FI}_{ij}, \text {Temp}_{ij}(\cdot ), t_{ij})\}_{ij}$. For each sow $i=1,\ldots ,N$ ($N=237$ or $N=238$) and repeated measurement $j=1,\ldots ,n_i$ ($n_i$ ranging from 7 to 21), $\text {FI}_{ij}$ refers to the daily feed intake expressed in kg, $\text {Temp}_{ij}(\cdot )$ to the smoothed temperature function in °C recorded over a day and $t_{ij}$ to the lactation day. We allow for a subject-specific intercept $u_i$ to account for the correlation of observations from the same sow. For each age group, we consider the model

$$\begin{aligned} Q_{\text {FI}_{ij}|\text {Temp}_{ij}, u_i}^\tau (t_{ij})&= \alpha ^\tau (t_{ij}) + \int _\mathcal {S} \beta ^\tau (s,t_{ij})\text {Temp}_{ij}(s)ds \nonumber \\&\quad + u_i,\quad i=1,\ldots ,N, j=1,\ldots ,n_i \end{aligned}$$

(4.1)

where $\mathcal {S}$ represents a whole day from 2.00 pm to 1.59 pm. We approximate the smooth intercept $\alpha ^\tau (\cdot )$ using ten cubic splines and the coefficient function $\beta ^\tau (\cdot ,\cdot )$ using a tensor product of ten cubic splines in both directions, with cyclic splines for the s-direction.

Figure 3 shows estimated quantile profiles for young/old sows (left/right), at quantile levels 0.1/0.5 (top/bottom), and for the pointwise 20% and 80% temperature curves $\text {Temp}_{20}(\cdot )$ and $\text {Temp}_{80}(\cdot )$ (colors as above). More specifically, the graphs show

$$\begin{aligned}{} & {} {\hat{Q}}^\tau _{\text {Temp}_{20},0}(t) = {\hat{\alpha ^\tau }}(t) + \int _\mathcal {S} \text {Temp}_{20}(s){\hat{\beta ^\tau }}(s,t) ds,\\{} & {} {\hat{Q}}^\tau _{\text {Temp}_{80},0}(t) = {\hat{\alpha ^\tau }}(t) + \int _\mathcal {S} \text {Temp}_{80}(s){\hat{\beta ^\tau }}(s,t) ds, \end{aligned}$$

plotted over t, for each age group and for $\tau =0.1,0.5$. Notice that no random effects are included in the predictions such that their interpretation is for a “typical sow”.

The solid curves are estimated profiles, and we see a clear distinction between low (blue) and high (red) temperatures, at least from around lactation day five. High temperatures negatively influence the appetite of the sows—they tend to eat more in cooler conditions—and this difference increases over time, particularly at the 0.1 level. The group of young and old sows have similar quantiles of feed intake at the very beginning of their lactation period, followed by a steep increase up to around lactation day 5, a short period with constant feed intake, and a final increase up to a stable plateau. However, estimated increments appear to be smaller for young sows. The dashed curves show the bias-adjusted estimates. Although the bias adjustment is hardly visible, it is actually significantly different from zero at many instances at a 5% significance level (based on pointwise one-sample t-tests on the estimates from the bootstrap data).

In order to illustrate the estimated temperature effects more clearly, Fig. 4 shows the difference between the estimated feed intake quantile profiles for low and high temperatures, i.e., ${{\hat{D}}}^\tau (t) = {\hat{Q}}^\tau _{\text {Temp}_{20},0}(t)- {\hat{Q}}^\tau _{\text {Temp}_{80},0}(t)$. The black curves and confidence bands show the estimates without adjustment and the corresponding model-based 95% pointwise confidence interval based on the variance–covariance matrix extracted from the model fit, and the orange curves and confidence bands show the bias-adjusted estimates and confidence bands obtained by bootstrap, as in Eq. (3.5). We used 100 bootstrap samples for bias adjustment as well as for computation of confidence intervals, cf. Sect. 3.4. Bias adjustment is most noticeable for young sows at the 0.1 quantile, and the bootstrap-generated confidence bands are always wider than the model-based ones. Simulation results have indicated that the model-based standard errors underestimate the actual variation (see the supplementary materials), so we prefer the bootstrap-generated confidence intervals. For completeness, the profiles obtained from bootstrap datasets are shown in Figure S6 in the supplementary materials.

Irrespective of the method used for construction of confidence bands and of the bias adjustment, the overall conclusion is the same: No temperature effect is found early in the lactation period (up to around day five), but at later days quantiles of feed intake are negatively affected by high temperature, both at 0.1 and 0.5 quantile levels. In general, the influence of temperature on the quantiles becomes more prominent along the lactation period, but there are certain differences between age groups and between quantile levels. At quantile level 0.1, the difference in estimates has an increasing trend along lactation days for both groups of sows, while at the median the difference in estimates reaches a maximum of approximately 0.5 around lactation day 10–13 and then flattens out. This might indicate that the sows that eat less are those particularly sensible to the environmental temperature.

4.3 Comparison with Simpler Models

Now, let us turn to a comparison of the model (4.1), with three simpler alternatives. The first modification has $\beta (s,t) \equiv \beta _A(s)$ such that the temperature curves still have functional effects, but with same effect across lactation days; this would correspond to profiles of differences in Fig. 4 being constant. The second modification has $\beta (s,t) \equiv \beta _B(t)$. Then, the model (4.1) becomes

$$\begin{aligned} Q_{\text {FI}_{ij}|\text {Temp}_{ij}, u_i}^\tau (t_{ij}) = \alpha ^\tau (t_{ij}) + \beta _B^\tau (t_{ij}) \int _{\mathcal {S}} \text {Temp}_{ij}(s)\, ds + u_i \end{aligned}$$

(4.2)

such that the quantile depends on the temperature curve only through its integral or, equivalently, the average temperature over the day, and it is no longer a functional quantile regression model. The third modification combines the two previous sub-models; it has $\beta (s,t) \equiv \beta _C$, such that

$$\begin{aligned} Q_{\text {FI}_{ij}|\text {Temp}_{ij}, u_i}^\tau (t_{ij}) = \alpha ^\tau (t_{ij}) + \beta _C^\tau \int _{\mathcal {S}} \text {Temp}_{ij}(s)\, ds + u_i \end{aligned}$$

(4.3)

and the temperature effect is the same across days and depends on the average temperature over the day only.

We measure goodness of fit with the AIC values based on the log-likelihood corresponding to the Extended log-F (ELF) distribution (Fasiolo et al. 2021a) and the effective degrees of freedom (EDF) known from additive models (Wood 2017). The EDF is partitioned into two parts: the effective degrees of freedom for the smooth coefficients, denoted EDF$_{\alpha ,\beta }$, and the degrees of freedom corresponding to the subject-specific intercepts, denoted EDF$_u$.

The results are displayed in Table 1. For both groups, the AIC values are notably smaller for the most complex model than for its competitors for quantile level $\tau =0.1$, while the values are closer among the models at the median. This indicates that it is particularly important to allow for time-varying coefficients and functional effects at lower quantiles. At level $\tau =0.5$, the most complex model is still selected for old sows, but for younger animals the smallest AIC is the one from model (4.2). Furthermore, in all cases, the AIC values from the model with $\beta _B(t)$ are smaller than the AIC value from the model with $\beta _A(s)$, indicating that it is more important to account for the temperature variation in the development along the lactation period than over the day. As a supplement, we also computed BIC values (not shown). The most complex model was still selected for quantile level $\tau =0.1$ (both age groups), whereas the simplest model with neither temperature nor functional effects was selected for the median.

For both groups and at both quantile levels, EDF$_{\alpha ,\beta }$ is the highest for the model (4.1), as expected, since it describes variation in both the s and t directions. Both EDF$_{\alpha ,\beta }$ and EDF$_u$ are larger when estimation is carried out at the median rather than at the 10% level; most likely because there is more information in the data to estimate the median, which in turn allows for higher flexibility. Finally, EDF$_u$ is always between 188 and 207 and thus smaller than 237 and 238, the number of young and old sows, respectively, so random effects are penalized to some degree.

Table 1 AIC and sum of effective degrees of freedom for young and old animals when adopting model (4.1) (first column), model (4.1) with $\beta (s,t) = \beta _A(s)$ (second column), model (4.2) (third column) and model (4.3) (fourth column)

Full size table

4.4 Estimated Effect of Temperature

Finally, we turn the attention to the estimated coefficient function ${\hat{\beta ^\tau }}(\cdot ,\cdot )$ in model (4.1). It is important to keep in mind that the estimated functional coefficient is only identifiable up to elements belonging to the orthogonal complement of the space spanned by the basis functions used in the finite-dimensional representation of the functional covariates. Thus, the interpretation of ${\hat{\beta ^\tau }}(\cdot ,\cdot )$ must be taken with caution. Figure 5 shows ${\hat{\beta ^\tau }}(\cdot ,\cdot )$ for each age group and at quantile levels 0.1 and 0.5, respectively. In each panel, $s\mapsto {\hat{\beta ^\tau }}(s,t)$ is plotted for t fixed at each lactation day, on a color scale that ranges from orange to green as the longitudinal time t goes by. Recall that, by construction, the development in both s and t direction is smooth, and the functions are cyclic over day.

The estimated coefficient functions are predominantly negative, corresponding to an overall negative effect of temperature, cf. Fig. 2. With the risk of overinterpretation, we see that the impact of temperature on feed intake is most prominent in the night hours (from about 8 pm to about 12 pm, and in the early morning at late lactation days for young sows at the median). This is also the time of the day with the largest differences in temperature effects between lactation days. Sensitivity against temperature appears to increase over lactation days, but stabilizes earlier for young compared to older sows.

5 Discussion

This work was motivated by the study of heat stress effects on lactating sows. We used a model framework for scalar-on-function quantile regression for clustered or longitudinal data where dependence within cluster/subject is taken into account by including cluster- or subject-specific intercept parameters. Estimation relies on basis expansions, more specifically penalized splines. We adapted existing software, making the methodology more easily applicable for practitioners, see appendix for implementation details and the supplementary materials fore example code. Other basis functions can be considered for the expansions. For instance, an eigenfunction basis could be used, with the number of basis functions selected using the AIC criterion (inspired by Kato 2012). This approach was studied in Battagliola (2021) and yielded more wiggly estimates of $\beta ^\tau (\cdot ,\cdot )$ compared to those in Fig. 5. We prefer the spline expansions over the eigenfunction expansions, particularly because they avoid the additional step of selecting the size of the basis.

Another alternative is the wavelet basis expansion with a LASSO penalty on the coefficients. This approach, applied to mean regression by Zhao et al. (2012) and Mousavi and Sørensen (2017), was recently proposed for quantile regression by Wang et al. (2019) and Yu et al. (2019). An intriguing avenue for future research would be to compare the results from our work with those arising from this wavelet basis expansion method.

Our analysis helped uncover some interesting insights for the sow data application. First, the feed intake quantiles are similar for younger and older sows close to giving birth, but increase faster (and to a higher level) for older than younger sows, suggesting that sows at their second or later pregnancy acclimatize faster to the situation. Second, a high temperature in the stable affects feed intake negatively except for the early days in the lactation period; this is the case for both younger and older sows, and both at the median and at the 0.1 quantile levels. At early lactation days, the temperature effect is not significant. Similar effects were reported by Renaudeau et al. (2001, Figure 1), except that their analyses did not distinguish different parities. It appears that older sows are slightly more sensitive to high temperatures than younger sows when it comes to feed intake, which is surprising because the opposite has been reported for farrowing rate (Bloemhof et al. 2013). Third, the estimated temperature effect is generally larger at the 0.1 level compared to the 0.5 level, suggesting that the lower tail of daily feed intake for a sow is more sensible to variation in temperature; however, this should be investigated further. Fourth, there is an increasing trend of the temperature effect throughout the lactation period at the 0.1 quantile level. This is confirmed by model comparisons where models with time-varying temperature effect are preferred over models with constant temperature effects, and is also in line with findings in Johnston et al. (1999, Table 5). Fifth, for both groups and at the 0.1 quantile level, the model with functional effect of temperature over the day is preferred over a model which includes the average temperature only. This suggests that the shape, not only the level, of the temperature profile affects the fraction of sows with a low feed intake. This is in line with the results for mean regression for the same dataset (Staicu et al. 2020), but the topic has apparently never been examined for other datasets in the animal science literature.

We have focused on models with a single functional covariate, but they could be extended to include more than one functional covariate or a mixture of functional and scalar covariates in a straightforward way. Moreover, several grouping levels could be included as random effects; this could be relevant in the application because sows were kept together in the stables. It remains to study the robustness of estimates in such more complex models. The proposed models have similar flavor as models from Brockhaus et al. (2020), but we were not able to get reliable estimates with the accompanying software.

We adjusted estimates and standard errors with bootstrap methods. The sampling schemes have been used and studied in simpler models (Galvao and Montes-Rojas 2015; Battagliola et al. 2022), but further examination would be interesting in the current set-up. Another future research topic is the development of hypothesis testing procedures for the regression coefficient functions; see Li et al. (2022). The recent approaches of Abramowicz et al. (2018) and Pini et al. (2023) might be helpful for this purpose. Preliminary ideas involve test statistics computed as integrals over the domain $\mathcal {S}$ of pointwise test statistics and bootstrap computations for evaluation of their null distributions. The main challenge lies in designing appropriate permutation schemes that comply with the dependence structures in the data.

References

Abramowicz K, Häger C, Pini A, Schelin L, Sjöstedt de Luna S, Vantini S (2018) Nonparametric inference for functional-on-scalar linear models applied to knee kinematic hop data after injury of the anterior cruciate ligament. Scand J Stat 45:1036–1061
Article MathSciNet Google Scholar
Battagliola ML (2021) Quantile regression for scalar and functional clustered data and data analysis with phase-amplitude separation. PhD thesis, University of Copenhagen. Available at http://web.math.ku.dk/noter/filer/phd21mlb.pdf
Battagliola ML, Sørensen H, Tolver A, Staicu A-M (2022) A bias-adjusted estimator in quantile regression for clustered data. Econom Stat 23:165–186
MathSciNet Google Scholar
Besstremyannaya G, Golovan S (2019) Reconsideration of a simple approach to quantile regression for panel data. Economet J 22(3):292–308
Article MathSciNet Google Scholar
Bissiri PG, Holmes CC, Walker SG (2016) A general framework for updating belief distributions. J R Stat Soc Ser B (Stat Methodol) 78(5):1103–1130
Article MathSciNet CAS Google Scholar
Bjerg B, Brandt P, Pedersen P, Zhang G (2020) Sows’ responses to increased heat load—a review. J Therm Biol 94:102758
Article PubMed Google Scholar
Bloemhof S, Mathur P, Knol E, Van der Waaij E (2013) Effect of daily environmental temperature on farrowing rate and total born in dam line sows. J Anim Sci 91(6):2667–2679
Article CAS PubMed Google Scholar
Brockhaus S, Rügamer D, Greven S (2020) Boosting functional regression models with FDboost. J Stat Softw 94(10):1–50
Canay IA (2011) A simple approach to quantile regression for panel data. Economet J 14(3):368–386
Article MathSciNet Google Scholar
Cardot H, Crambes C, Sarda P (2005) Quantile regression when the covariates are functions. J Nonparametric Stat 17(7):841–856
Article MathSciNet Google Scholar
Chen K, Müller H-G (2012) Conditional quantile analysis when covariates are functions, with application to growth data. J R Stat Soc Ser B (Stat Methodol) 74(1):67–89
Article MathSciNet Google Scholar
Chen K, Müller H-G (2012) Modeling repeated functional observations. J Am Stat Assoc 107(500):1599–1609
Article MathSciNet CAS Google Scholar
Chernozhukov V, Hong H (2003) An MCMC approach to classical estimation. J Econom 115(2):293–346
Article MathSciNet Google Scholar
Dourmad J-Y, Le Velly V, Gourdine J-L, Renaudeau D (2022) Effect of ambient temperature in lactating sows, a meta-analysis and simulation approach in the context of climate change. Anim Open Space 1:100025
Article Google Scholar
Fasiolo M, Wood SN, Zaffran M, Nedellec R, Goude Y (2021a) Fast calibrated additive quantile regression. J Am Stat Assoc 116(535):1410–1412
Fasiolo M, Wood SN, Zaffran M, Nedellec R, Goude Y (2021b) qgam: Bayesian nonparametric quantile regression modeling in R. J Stat Softw 100(9):1–31
Feng X, He X, Hu J (2011) Wild bootstrap for quantile regression. Biometrika 98(4):995–999
Article MathSciNet PubMed PubMed Central Google Scholar
Fenske N, Fahrmeir L, Hothorn T, Rzehak P, Höhle M (2013) Boosting structured additive quantile regression for longitudinal childhood obesity data. Int J Biostat 9(1):1–18
Article MathSciNet Google Scholar
Fernandes M, Guerre E, Horta E (2021) Smoothing quantile regressions. J Bus Econ Stat 39(1):338–357
Article MathSciNet Google Scholar
Galvao AF, Kato K (2016) Smoothed quantile regression for panel data. J Econom 193(1):92–112
Article MathSciNet Google Scholar
Galvao A, Montes-Rojas G (2015) On bootstrap inference for quantile regression panel data: a Monte Carlo study. Econometrics 3(3):654–666
Article Google Scholar
Geraci M (2019) Additive quantile regression for clustered data with an application to children’s physical activity. J Roy Stat Soc Ser C (Appl Stat) 68(4):1071–1089
Article MathSciNet Google Scholar
Geraci M, Bottai M (2014) Linear quantile mixed models. Stat Comput 24(3):461–479
Article MathSciNet Google Scholar
Goh SC, Knight K (2009) Nonstandard quantile-regression inference. Econom Theor 25(5):1415–1432
Article MathSciNet Google Scholar
Goldsmith J, Crainiceanu C, Caffo B, Reich D (2012) Longitudinal penalized functional regression for cognitive outcomes on neuronal tract measurements. J Roy Stat Soc Ser C (Appl Stat) 61(3):453–469
Article MathSciNet Google Scholar
Goldsmith J, Scheipl F, Huang L, Wrobel J, Di C, Gellar J, Harezlak J, McLean MW, Swihart B, Xiao L, Crainiceanu C, Reiss PT (2023) refund: Regression with Functional Data. R package version 0.1-30
Greven S, Scheipl F (2017) A general framework for functional regression modelling. Stat Model 17(1–2):1–35
Article MathSciNet Google Scholar
Greven S, Crainiceanu C, Caffo B, Reich D (2010) Longitudinal functional principal component analysis. Electron J Stat 4:1022–1054
Article MathSciNet PubMed PubMed Central Google Scholar
Gu J, Volgushev S (2019) Panel data quantile regression with grouped fixed effects. J Econom 213(1):68–91
Article MathSciNet Google Scholar
Harding M, Lamarche C (2017) Penalized quantile regression with semiparametric correlated effects: an application with heterogeneous preferences. J Appl Econom 32(2):342–358
Article MathSciNet Google Scholar
He X, Pan X, Tan KM, Zhou W-X (2021) Smoothed quantile regression with large-scale inference. J Econom 232:367–388
Article MathSciNet PubMed PubMed Central Google Scholar
Horowitz JL (1998) Bootstrap methods for median regression models. Econometrica 66(6):1327–1351
Article MathSciNet Google Scholar
Johnston L, Ellis M, Libal G, Mayrose V, Weldon W (1999) Effect of room temperature and dietary amino acid concentration on performance of lactating sows. NCR-89 committee on swine management. J Anim Sci 77:1638–44
Article CAS PubMed Google Scholar
Karlsson A (2009) Bootstrap methods for bias correction and confidence interval estimation for nonlinear quantile regression of longitudinal data. J Stat Comput Simul 79(10):1205–1218
Article MathSciNet Google Scholar
Kato K (2012) Estimation in functional linear quantile regression. Ann Stat 40(6):3108–3136
Article MathSciNet Google Scholar
Kato K, Galvao AF, Montes-Rojas G (2012) Asymptotics for panel quantile regression models with individual effects. J Econom 170(1):76–91
Article MathSciNet Google Scholar
Koenker R (2004) Quantile regression for longitudinal data. J Multivar Anal 91(1):74–89
Article MathSciNet Google Scholar
Koenker R (2005) Quantile regression. Econometric society monographs. Cambridge University Press, New York
Book Google Scholar
Koenker R, Bassett G Jr (1978) Regression quantiles. Econometrica 46:33–50
Article MathSciNet Google Scholar
Koenker R, Chernozhukov V, He X, Peng L (2017) Handbook of quantile regression. CRC Press, Boca Raton
Book Google Scholar
Koner S, Staicu A-M (2023) Second-generation functional data. Ann Rev Stat Appl 10:547–572
Article MathSciNet Google Scholar
Lamarche C (2010) Robust penalized quantile regression estimation for panel data. J Econom 157(2):396–408
Article MathSciNet Google Scholar
Lancaster T (2000) The incidental parameter problem since 1948. J Econom 95(2):391–413
Article MathSciNet Google Scholar
Li M, Wang K, Maity A, Staicu A-M (2022) Inference in functional linear quantile regression. J Multivar Anal 190:104985
Article MathSciNet PubMed PubMed Central Google Scholar
Liu RY (1988) Bootstrap procedures under some non-i.i.d. models. Ann Stat 16(4):1696–1708
Article MathSciNet Google Scholar
Mousavi SN, Sørensen H (2017) Multinomial functional regression with wavelets and lasso penalization. Econom Stat 1:150–166
MathSciNet Google Scholar
Neyman J, Scott E (1948) Consistent estimates based on partially consistent observations. Econometrica 16(1):1–32
Article MathSciNet Google Scholar
Otsu T (2008) Conditional empirical likelihood estimation and inference for quantile regression models. J Econom 142(1):508–538
Article MathSciNet Google Scholar
Park SY, Staicu A-M (2015) Longitudinal functional data analysis. Stat 4(1):212–226
Article MathSciNet PubMed Google Scholar
Park SY, Li C, Mendoza Benavides SM, van Heugten E, Staicu AM (2019) Conditional analysis for mixed covariates, with application to feed intake of lactating sows. J Probab Stat 2019:3743762
Article Google Scholar
Pini A, Sørensen H, Tolver A, Vantini S (2023) Local inference for functional linear mixed models. Comput Stat Data Anal 181:107688
Article MathSciNet Google Scholar
R Core Team (2023) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
Ramsay J, Silverman B (2005) Functional data analysis, second edition. Springer, New York
Book Google Scholar
Renaudeau D, Noblet J (2001) Effects of exposure to high ambient temperature and dietary protein level on sow milk production and performance of piglets. J Anim Sci 79(6):1540–1548
Article CAS PubMed Google Scholar
Renaudeau D, Quiniou N, Noblet J (2001) Effects of exposure to high ambient temperature and dietary protein level on performance of multiparous lactating sows. J Anim Sci 79(5):1240–1249
Article CAS PubMed Google Scholar
Staicu A-M, Islam MN, Dumitru R, van Heugten E (2020) Longitudinal dynamic functional regression. J Roy Stat Soc Ser C (Appl Stat) 69(1):25–46
Article MathSciNet Google Scholar
Wang L, Van Keilegom I, Maidman A (2018) Wild residual bootstrap inference for penalized quantile regression with heteroscedastic errors. Biometrika 105(4):859–872
Article MathSciNet Google Scholar
Wang Y, Kong L, Jiang B, Zhou X, Yu S, Zhang L, Heo G (2019) Wavelet-based lasso in functional linear quantile regression. J Stat Comput Simul 89(6):1111–1130
Article MathSciNet Google Scholar
Wood S (2017) Generalized additive models: an introduction with R, 2nd edn. Chapman & Hall/CRC, Boca Raton
Book Google Scholar
Wood S, Scheipl F (2020) gamm4: Generalized additive mixed models using ‘mgcv’ and ‘lme4’. R package version 0.2-6
Wu CFJ (1986) Jackknife, bootstrap and other resampling methods in regression analysis. Ann Stat 14(4):1261–1295
MathSciNet Google Scholar
Xiao L, Zipunnikov V, Ruppert D, Crainiceanu C (2016) Fast covariance estimation for high-dimensional functional data. Stat Comput 26(1–2):409–421
Article MathSciNet PubMed Google Scholar
Xiao L, Li C, Checkley W, Crainiceanu C (2018) Fast covariance estimation for sparse functional data. Stat Comput 28:511–522
Article MathSciNet PubMed Google Scholar
Yao F, Müller H-G, Clifford AJ, Dueker SR, Follett J, Lin Y, Buchholz BA, Vogel JS (2003) Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate. Biometrics 59(3):676–685
Article MathSciNet PubMed Google Scholar
Yu D, Zhang L, Mizera I, Jiang B, Kong L (2019) Sparse wavelet estimation in quantile regression with multiple functional predictors. Comput Stat Data Anal 136:12–29
Article MathSciNet Google Scholar
Zhao Y, Ogden RT, Reiss PT (2012) Wavelet-based LASSO in functional linear regression. J Comput Graph Stat 21(3):600–617
Article MathSciNet PubMed PubMed Central Google Scholar

Download references

Acknowledgements

The authors also wish to acknowledge that revisions were carried out, while the first author was supported by the European Research Council under Grant CoG 2015-682172NETS, within the Seventh European Union Framework Program.

Funding

Open access funding provided by EPFL Lausanne. The project was partly funded by the Danish Research Council (DFF Grant 7014-00221).

Author information

Authors and Affiliations

School of Basic Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Maria Laura Battagliola
Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark
Helle Sørensen & Anders Tolver
Department of Statistics, North Carolina State University, Raleigh, NC, USA
Ana-Maria Staicu

Authors

Maria Laura Battagliola
View author publications
You can also search for this author in PubMed Google Scholar
Helle Sørensen
View author publications
You can also search for this author in PubMed Google Scholar
Anders Tolver
View author publications
You can also search for this author in PubMed Google Scholar
Ana-Maria Staicu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Maria Laura Battagliola.

Ethics declarations

Conflict of interest

Moreover, the authors declare they do not have any conflict of interest.

Data Availability

The datasets generated during and/or analyzed during the current study are not publicly available due to write proprietary reasons but are available from the corresponding author on reasonable request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (html 2448 KB)

Supplementary file 2 (zip 1175 KB)

Appendix

1.1 Bootstrap Schemes

We detail the resampling schemes mentioned in Sect. 3.4 and used to compute part of the results in Sect. 4.2.

1.1.1 Block Resampling (for Assessment of Sampling Variation of Estimator)

Let $c_1^*, \ldots , c_N^*$ be sampled with replacement from the index set of clusters, $\{1,\ldots ,N\}$, and define the bootstrap dataset as $\{(Y_{c^*_ij},X_{c^*_ij}(s_h), t_{c^*_ij})\}_{ijh}$. In this way, the within-subject dependence is maintained. For a target, $\theta $, defined from model parameters, we proceed as follows: Draw a bootstrap sample as just described, carry out estimation, and compute the estimated target. Repeat this B times, and denote the estimates ${\tilde{\theta }}_1,\ldots ,\tilde{\theta }_B$. Finally, compute the standard deviation over the bootstrap estimates as stated in the main text. The same method was used by Canay (2011) and Geraci and Bottai (2014).

1.1.2 Combination of Standard Resampling and Wild Bootstrap (for Assessment of Bias)

A bootstrap dataset consists of $\{(Y_{ij}^*,X_{ij}(s_h), t_{ij})\}_{ijh}$ where

$$\begin{aligned} Y_{ij}^* = {\hat{\alpha ^\tau }}(t_{ij}) + \int _\mathcal {S} {\hat{\beta ^\tau }}(s,t_{ij}) X_{ij}(s)\, ds + \varepsilon _{ij}^* + u_i^*. \end{aligned}$$

(5.1)

The estimates ${\hat{\alpha ^\tau }}(\cdot )$ and ${\hat{\beta ^\tau }}(\cdot ,\cdot )$ are based on the observed data. Notice that the values of $X_{ij}(s_h)$ and $t_{ij}$ from the observed data are used unchanged. The subject-specific intercepts $u_1^*,\ldots ,u_N^*$ are drawn with replacement from the estimates ${\hat{u}}_1,\ldots ,{\hat{u}}_N$ obtained from the observed data, and the error terms $\{\varepsilon _{ij}^*\}_{ij}$ are generated via wild bootstrap. This means that $\varepsilon ^*_{ij}=w_{ij}|\varepsilon _{ij}|$, where $\varepsilon _{ij}= Y_{ij} - {\hat{\alpha ^\tau }}(t_{ij}) -\int _\mathcal {S} {\hat{\beta ^\tau }}(s,t_{ij}) X_{ij}(s)\, ds -{\hat{u}}_i$ are residuals from the model, and $w_{ij}$s are drawn independently as

$$\begin{aligned} w_{ij}=\left\{ \begin{array}{ll} 2(1-\tau ), &{}\quad \text {with probability}\; 1-\tau \\ -2\tau , &{}\quad \text {with probability}\; \tau \end{array}\right. \end{aligned}$$

For a target of interest, $\theta $, and estimates ${\check{\theta }}_1,\ldots ,{\check{\theta }}_B$ computed from B bootstrap samples, the bias of ${\hat{\theta }}$ is estimated as explained in the main text. Wild bootstrap was introduced by Wu (1986) and Liu (1988) for mean regression, and adapted to quantile regression by Feng et al. (2011). Results in Feng et al. (2011), Wang et al. (2018) and Battagliola et al. (2022) indicate that wild bootstrap captures asymmetry and heteroskedasticity better than ordinary resampling of residuals.

1.2 Implementation

We used the software environment R (R Core Team 2023) for the computations. The FACE method used for smoothing is implemented in the function fpca.face, which is part of package refund (Goldsmith et al. 2023). It can handle functional data observed on a dense or a sparse grid and also allows for missing values. One specifies either the selected PVE (pve) or the number of principal components (npc) of choice. The resulting eigenfunctions, the functional mean, and the predicted/smoothed functions are evaluated and returned at a dense grid.

In order to fit fQGAM, we rely on the package qgam (Fasiolo et al. 2021b). It includes the qgam function, which is a wrapper of the function gam (Wood 2017; Wood and Scheipl 2020). The call to qgam has the following structure:

where y is the response and formula specifies any ordinary covariates, smooth effects, and random effects to include in the model. The quantile level of interest $\tau $ is passed to qu, and the entry data specifies the data frame of interest. The formula for qgam works with the same syntax as for gam. Hence, smooth terms are included with s() in the univariate case and with te() in the multivariate case. In both cases the user can choose the type and the number of basis functions by passing the arguments bs and k, respectively. Importantly for this paper, the functional covariate is included in the smooth term with the option by = Xhat, where Xhat is the matrix of (possibly) pre-smoothed functional covariate values, along with the grids of times points of observation, in form of a matrix. The subject-specific intercept is included as a smooth term with bs=’re’.

The output from qgam is a gamObject, which stores several quantities related to the model and the estimation process, such as twice the log-likelihood (logLik) and the estimated effective degrees of freedom (edf2), which are used for computation of AIC values in our application. Moreover, Vp, the variance-covariance matrix of all estimated coefficients, is available, and can be used to compute model-based standard errors and confidence bands for functions of the parameters, such as the targets mentioned in Sect. 3.4. Estimated quantiles for new covariate functions can be computed with the function predict.gam, and the function allows to exclude one or more terms from the model, such as the subject-specific intercepts, in the prediction.

Finally, we have implemented the bootstrap schemes described in Sect. 3.4, namely block bootstrap and wild bootstrap, and the R functions are available as part of Supplementary Material in Section S3. Moreover, data preparation, implementation of fQGAM, and bootstrap-based inference results are provided in an example using a dataset which is available in the refund package (Goldsmith et al. 2023).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Battagliola, M.L., Sørensen, H., Tolver, A. et al. Quantile Regression for Longitudinal Functional Data with Application to Feed Intake of Lactating Sows. JABES (2024). https://doi.org/10.1007/s13253-024-00601-5

Download citation

Received: 05 May 2023
Revised: 08 December 2023
Accepted: 03 January 2024
Published: 06 February 2024
DOI: https://doi.org/10.1007/s13253-024-00601-5

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Quantile Regression for Longitudinal Functional Data with Application to Feed Intake of Lactating Sows

Abstract

Similar content being viewed by others

A new standard model for milk yield in dairy cows based on udder physiology at the milking-session level

Segmented regression to describe cumulative milk production of grazing dual-purpose Holstein-Zebu cows

New residual feed intake criterion for longitudinal data

1 Introduction

2 Framework

3 Estimation Methodology

3.1 Representation of the functional coefficient and smooth intercept

3.2 Estimation of Coefficients and Random Intercepts

3.3 Covariates Observed with Noise

3.4 Bootstrap Procedures for Variance Assessment and Bias Adjustment

4 Lactating Sows’ Feed Intake

4.1 Description and Preprocessing of Data

4.2 Estimated Quantiles of Feed Intake

4.3 Comparison with Simpler Models

4.4 Estimated Effect of Temperature

5 Discussion

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Data Availability

Additional information

Publisher's Note

Supplementary Information

Supplementary file 1 (html 2448 KB)

Supplementary file 2 (zip 1175 KB)

Appendix

Appendix

1.1 Bootstrap Schemes

1.1.1 Block Resampling (for Assessment of Sampling Variation of Estimator)

1.1.2 Combination of Standard Resampling and Wild Bootstrap (for Assessment of Bias)

1.2 Implementation

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation