Variable selection in nonparametric functional concurrent regression

We develop a new method for variable selection in nonparametric functional concurrent regression. The commonly used functional linear concurrent model (FLCM) is far too restrictive in assuming linearity of the covariate effects, which is not necessarily true in many real‐world applications. The nonparametric functional concurrent model (NPFCM), on the other hand, is much more flexible and can capture complex dynamic relationships present between the response and the covariates. We extend the classically used variable selection methods, e.g., group LASSO, group SCAD and group MCP, to perform variable selection in NPFCM. We show via numerical simulations that the proposed variable selection method with the non‐convex penalties can identify the true functional predictors with minimal false‐positive rate and negligible false‐negative rate. The proposed method also provides better out‐of‐sample prediction accuracy compared to the FLCM in the presence of nonlinear effects of the functional predictors. The proposed method's application is demonstrated by identifying the influential predictor variables in two real data studies: a dietary calcium absorption study, and some bike‐sharing data.


INTRODUCTION
Function-on-function regression refers to the class of regression models with both the response variable and the covariates being functions over a continuous domain, such as time. The functional concurrent regression model is a particular class of function-on-function regression, where the value of the response at the current time point depends only on the values of the predictors at that specific time point. The most commonly used concurrent regression model is the functional linear concurrent model (FLCM) (Ramsay & Silverman, 2005), which assumes that the relationship between the response and the covariates is linear and modelled using smooth univariate regression functions. These regression functions can capture the time-varying effect of the covariates on the response. Estimation and inference in the functional linear concurrent regression and the closely related varying coefficient model (Hastie & Tibshirani, 1993) have been widely studied in the literature.
Although FLCM is useful and finds applications in longitudinal data analysis, it is restrictive in assuming a linear relationship between the response and predictors. The nonparametric functional concurrent model (NPFCM) overcomes this limitation by specifying a nonparametric relationship that is more general and flexible for modelling purposes. Multiple methods exist in the literature for estimation and inference in NPFCM using smoothing splines (Kim, Maity & Staicu, 2018a), Gaussian process regression (Shi, Murray-Smith & Titterington, 2005) and local kernel smoothing techniques (Jiang & Wang, 2011); see Maity (2017) and the references therein for a review of various methods regarding NPFCM. With the advent of modern high-throughput technologies, functional data structures have recently become complex and high-dimensional in nature, and this has motivated researchers to develop variable selection methods to identify the true underlying set of influential predictors. Several variable selection methods exist in the functional data analysis literature for scalar-on-function (Gertheiss, Maity & Staicu, 2013;Fan, James & Radchenko, 2015) and function-on-scalar (Chen, Goldsmith & Ogden, 2016;Parodi & Reimherr, 2018) regression models. However, the literature on variable selection in the functional concurrent regression model is relatively sparse. Goldsmith & Schwartz (2017) proposed a variable selection method for the FLCM using the variational Bayes approach. Recently, Ghosal et al. (2020) developed a variable selection method for FLCM regression by extending the classically used penalized variable selection methods. However, these methods are for the FLCM, which might not capture the actual relationship between the response and the predictors.
In this article, we consider an additive nonparametric functional concurrent regression model (Kim et al., 2018b), which is an extension of the functional generalized additive model (FGAM) (McLean et al., 2014) for functional responses. Scheipl et al. (2016) extended this class of models to non-Gaussian functional responses, allowing for nested or crossed functional random effects. We develop a variable selection method for the NPFCM, extending the classically used penalized variable selection methods like LASSO (Tibshirani, 1996), SCAD (Fan & Li, 2001) and MCP (Zhang, 2010). To the best of our knowledge, ours is the first work to develop a variable selection method for NPFCM. We follow the approach of Ghosal et al. (2020) and show that for nonparametric functional concurrent regression, the variable selection problem reduces to a group LASSO (Yuan & Lin, 2006) and its natural extension, namely a group SCAD or a group MCP problem. Through simulations, we illustrate that the proposed variable selection method can identify the true underlying predictors with minimal false-positive rate (FPR) and negligible false-negative rate (FNR). The proposed method is flexible and shown to produce satisfactory performance even when the functional covariates are sparsely observed and are contaminated with measurement error. Comparisons with the variable selection method in FLCM are provided to illustrate potential gain in terms of selection accuracy and predictive performance using the variable selection method for NPFCM. We demonstrate the application of DOI: 10.1002/cjs.11654 The Canadian Journal of Statistics / La revue canadienne de statistique the proposed method on a dietary calcium absorption study data (Davis, 2002) and bikeshare data (Fanaee-T & Gama, 2014) for modelling the number of daily bike rentals in Washington, D.C., while taking into account the diurnal effect of continuous meteorological covariates, e.g., temperature, humidity and wind speed. The proposed variable selection method is employed to identify the relevant meteorological factors while simultaneously estimating their nonlinear dynamic effects. The rest of this article is organized in the following way. We present our modelling framework, set up the nonparametric functional concurrent regression model and illustrate the proposed variable selection method in Section 2. In Section 3, we present simulation studies to evaluate the empirical performance of the proposed method. In Section 4, the real data applications are presented. In Section 5, we deliberate over the contribution and some technical aspects of our method and also propose some future extensions based on the current research.

Modelling Framework
We assume that the observed data for the ith subject is represents the functional response and X 1 (⋅), X 2 (⋅), … , X p (⋅) are the corresponding functional covariates. In practice, we observe both the response and the predictors only on finitely many time points over some closed and bounded interval. In developing our method, we assume that the covariates and the response are observed on a dense and regular grid of points for some a, b ∈ ℝ and that the covariates are observed without any measurement error. We later illustrate how the proposed method can be easily extended to accommodate more general scenarios where the covariates are contaminated with measurement error and observed on an irregularly spaced grid. We consider the following nonparametric functional concurrent regression model: (1) Here, Y (t) is the functional intercept, and F {X (t), t} are smooth bivariate functions on ℝ × S capturing the concurrent effect of covariate X (⋅). Estimation and prediction in the NPFCM has been studied by Kim, Maity & Staicu (2018a). If E [F {X (t), t}] = 0, the functions F (⋅, ⋅) are identifiable and Y (⋅) represents the marginal mean of the response curves. The covariates X i (⋅) are assumed to be independent and identically distributed (i.i.d.) copies of X (⋅) ( = 1, 2, … , p), where X (⋅) is a smooth stochastic process. We further assume that i (⋅) are i.i.d. copies of (⋅), which is a mean-zero stochastic process with unknown and nontrivial covariance structure. We model the intercept function = ( ,1 , ,2 , … , ,p 0 ) T is a vector of unknown coefficients. We use cubic B-spline basis functions in this article; however, other basis functions can also be used. For modelling the bivariate functions F (⋅, ⋅), we express them in terms of bivariate basis expansion using the tensor product of B-spline basis functions (McLean et al., 2014;Kim, Maity & Staicu, 2018a) are sets of known B-spline basis functions over x (the image of the process X (⋅)) and t, respectively. Let us define B T ,0 (t) = 1 for = 1, … , p and include them in the basis set over t . We model F (⋅, ⋅) using a tensor product of these two sets of basis functions as vector {B X ,k (X i (t))B T , (t)} K ,L k=1, =0 of length K (L + 1) as Z i (t) T and is the vector of unknown basis coefficients ,k, , i.e., = ( ,1,0 , ,1,1 , … , ,1,L ,…, ,K ,L ) T . For the purpose of variable selection, we do not include 1 as a basis in {B X ,k (x)} K k=1 . Hence, we are able to estimate the joint effect of x, t on response Y(t) adjusted for the main effect of t, which gets included in the functional intercept Y (t). Using the basis expansions of Y (⋅) and F (⋅, ⋅), NPFCM (1) can be reformulated as This minimization problem is identical to performing a group LASSO (Yuan & Lin, 2006), where the grouping is introduced by the covariates Z , and depending on whether = 0, X (⋅) will be either absent or present in the model. The tuning parameters for this minimization are the penalty parameter and the number of B-spline basis functions K and L , which control the smoothness of the function F (⋅, ⋅) in x and t directions, respectively. Instead of directly using a roughness penalty on the functions, we follow a truncated basis approach (Ramsay & Silverman, 2005;Fan, James & Radchenko, 2015) and restrict the number of basis functions to be small, which ensures that the resulting functions are smooth in both the variables. Next, we extend this group LASSO formulation to two non-convex penalties. It was illustrated in Ghosal et al. (2020) that group LASSO can have a very high FPR for variable selection in FLCM and produce biased estimates, which in turn can affect the prediction accuracy of the model. The non-convex penalties SCAD (Fan & Li, 2001) and MCP (Zhang, 2010), on the other hand, are known (Mazumder, Friedman & Hastie, 2011;Breheny & Huang, 2015) to produce sparser solutions in classical regression literature and also to overcome the bias problem of LASSO due to their relaxed rate of penalization. Unlike adaptive LASSO (Zou, 2006), these DOI: 10.1002/cjs.11654 The Canadian Journal of Statistics / La revue canadienne de statistique methods do not require initial estimates of weights. As pointed out by Fan, Samworth & Wu (2009), one drawback of adaptive LASSO is that the final solution is always sparser than the initial estimate owing to zero being an absorbing state. On the contrary, SCAD and MCP do not face this problem. Subsequently, we propose a group SCAD-and a group MCP-based criterion, extending the group LASSO formulation (4) for performing variable selection in the NPFCM. For the group SCAD method, we perform variable selection and obtain an estimate of in the following way: where the SCAD penalty P SCAD, , (|| || 2 ) is defined as Analogously, for the group MCP method, we use the group minimax concave penalty (MCP) on the parameters and estimate aŝ where the penalty P MCP, , (|| || 2 ) is defined as Remark 1. For the construction of B X ,k (x) (B-spline basis), we use equally spaced knots in the range of observed data {X i (t)} n i=1 . The tuning parameters K and L (the number of basis functions in each direction) and the penalty parameter are selected using the extended BIC (EBIC) (Chen & Chen, 2008) of a corresponding Gaussian likelihood. Although we make no distributional assumption, this has shown satisfactory performance, as illustrated in our empirical studies. Some other information criterion or data-driven method can also be used, depending on the problem. We use the same number of basis functions K = K, L = L for = 1, … , p, which is a standard assumption for computational tractability and assumes that all the nonparametric functional effects have more or less the same level of smoothness in both the variables. For the tuning parameter , we use the values 4 for SCAD and 3 for MCP, as recommended by the original authors of these methods. We use the grpreg package (Breheny & Huang, 2015) in R (R Core Team, 2018) for implementation of the group SCAD and group MCP method.

Extension to Sparse and Noisy Data
As a general extension, we consider the case where data are observed on an irregular and sparse grid and covariates are possibly contaminated with measurement error. This situation arises frequently for longitudinal studies where there are few subject-specific visits. Here we have response {(Y i (t i ), t i ), = 1, 2, … , m i }, and the observed covariates are We denote U k (t ki )s, (k = 1, 2, 3, … , p) by U i k , which represent the observed covariate with measurement error, i.e., we have U i k = X k (t ki ) + e i k for i = 1, 2, … , n, = 1, 2, … , m ki and k = 1, 2, … , p, where the measurement errors e i k are assumed to be white noises with zero mean and variance 2 k . We assume that, although the individual number of observations m i is small, (Kim, Maity & Staicu, 2018a;Ghosal et al., 2020). Under such a set-up, the eigenvalues and eigenfunctions corresponding to the original curves can be estimated using functional principal component analysis (FPCA) (Yao, Müller & Wang, 2005). The scores can be estimated through conditional expectation, using the principal analysis by conditional estimation method (Yao, Müller & Wang, 2005), and finally the estimates can be put together using Karhunen-Loeve expansion to get the estimated where the number of eigenfunctions S is chosen using the percent of variance explained (PVE) criterion. Hence, for sparse data observed on an irregular grid and contaminated with measurement error, we treat as our original data and use these data for performing variable selection.

Centring and Scaling of Covariates
One of the problems that is likely to arise for irregularly observed sparse functional data is that some of the B-splines B X ,k (x) might not have any observed data in its support, particularly if the image of the process X (t) is not dense over ℝ. To overcome this limitation, we employ point-wise centring and scaling of functional covariates X (t), as recommended in Kim, Maity & Staicu (2018a). In particular, we transform the functional covariate X (t) as X * (t) = where (t) and (t) are the mean and standard deviation of X (t). The transformed covariate can be interpreted as how much the amount of standard deviation X (t) is away from the mean (t) at time t. This normalization, first of all, brings all the covariates to a comparable scale. Second, and more importantly, since X * (t) is bounded (at least stochastically), it facilitates construction of the B-splines B X ,k (x). In practice, we estimatê(t) and̂(t) from the FPCA step and use the for performing variable selection.
Our empirical analysis shows good performance in terms of selection accuracy and out-of-sample prediction performance using the above strategy; henceforth, throughout this article, we pre-process (denoise and standardize) the observed functional covariates before applying the proposed variable selection method. The overall procedure for performing variable selection in the NPFCM is presented as an algorithm in Algorithm 1.
Algorithm 1 Variable selection method for NPFCM 1. Pre-process functional covariate X i (t) using FPCA and obtain the denoised and standardized (using "fpca.sc" function in Refund package in R).
2. Form basis matrix and ℤ i as in Equation (3) with transformed covariatesX * (t). 3. Use Y i , and ℤ i in the penalized selection criterion (4), (5), (6) and implement group LASSO, group SCAD, or group MCP using the "grpreg" function within grpreg package in R. 4. Use EBIC for choosing the tuning parameters K , L , and the penalty parameter .
The Canadian Journal of Statistics / La revue canadienne de statistique 3. SIMULATION STUDY

Simulation Setup
In this section, we investigate the performance of the proposed variable selection method using numerical simulations. We evaluate the proposed method in terms of selection accuracy and out-of-sample prediction performance. We also compare the proposed variable selection method for NPFCM with the variable selection method in FLCM (Ghosal et al., 2020). We consider the following simulation designs: Scenario A: We generate data from the model where we have Y (t) = 1 + 2(t∕100) + (t∕100) 2 , and the bivariate functions are given by (xt∕100) and F {x, t} = 0 for = 4, 5, 6, … , 20, i.e., the last 17 covariates do not have any effect on the response. The . Moreover, following the discussion in Section 2.3, we assume that the covariates X i (t) are observed with measurement error, i.e., we observe ∼ Uni orm{30, 31, … , 41}. Three sets of sample size n ∈ {200, 400, 600} are considered. For each set, 80% of data is used for model fitting and the rest is kept aside for validation.
Scenario B: We follow a similar data generation set-up as in (7) with Y (t) = 1 + 2(t∕100) + (t∕100) 2 , F 1 {x, t} = 2x 3 exp(t∕100), F 2 {x, t} = 5x 2 sin( t∕100), F 3 {x, t} = 16 sin(xt∕100) and F {x, t} = 0 for = 4, 5, 6, … , 20. In this scenario, the concurrent effects of the functional predictors are highly nonlinear unlike in scenario A, and this scenario is specifically used to compare the performance of the proposed variable selection method for NPFCM with that of the variable selection method in FLCM (Ghosal et al., 2020). We consider sample size n = 200 for this simulation scenario.
Scenario C: We consider an additional high-dimensional scenario with p = 50 functional predictors and sample size n = 40. We set Y (⋅) and F 1 (⋅, ⋅), F 2 (⋅, ⋅) and F 3 (⋅, ⋅) as in scenario A, and F {x, t} = 0 for = 4, 5, 6, … , 50. The rest of the simulation design is kept exactly the same as in scenario A.
For all the simulation scenarios, we generate 500 replicated datasets from the specified models to assess the proposed variable selection method's performance.

Simulation Results
Scenario A: We first assess the selection accuracy of the proposed variable selection method. As the covariates are observed sparsely and with measurement error, we apply the FPCA method (PVE = 99%) and pre-process the curves as discussed in Section 2.3 to obtain denoised and standardized predictorsX * i (t) before applying our variable selection method. We then use the proposed variable selection method for NPFCM and note which predictors are selected in the model for each of the three penalized selection criteria proposed in Section 2.2. We repeat this across 500 Monte Carlo replications and obtain the selection percentage of each predictor, reflecting the true-positive rate (TPR) and FPR of the proposed variable selection method. The selection percentages are reported in Table 1. We observe that all three penalization methods (group LASSO, group SCAD and group MCP) pick out the nonparametric concurrent effect of X 1 (⋅), X 2 (⋅), X 3 (⋅) in 100% of the replications, illustrating satisfactory TPR of the proposed method. The group LASSO method exhibits a small false selection percentage around 0.2% for some of the variables. The group SCAD and the group MCP method, on the other hand, show zero FPRs across all the sample sizes in our simulation for all the predictors. In other words, they are able to identify the true model in 100% of the replications. The average model sizes for each scenario are also provided in Table 1. The group LASSO method has an average model size in the range of 3.002-3.006 for the three different sample sizes. The group SCAD and the group MCP method produce an average model size 3, successfully identifying the true model each time. Next, we compare the proposed selection methods (with different penalties) in terms of their out-of-sample prediction performance. For this purpose, we define a criterion similar to R-squared (R 2 ), based on the sum-of-squared deviations of the predicted curves from the actual curves. For normalization purpose, we use the sum-of-squared deviations of the actual curve from their scalar meanȲ. The measure is defined as We report the quartiles and 10th and 90th percentiles of the out-of-sample R 2 criterion based on all the Monte Carlo replications in Table 2. We notice that for all three penalization methods, the median of the out-of-sample R 2 is around 96%-97% and increasing with the training set sample size. This indicates a satisfactory out-of-sample prediction performance of the proposed variable selection method. The empirical results, in particular, illustrate that the proposed group SCADand group MCP-based selection method for the NPFCM can not only identify the underlying model successfully but also provide high out-of-sample prediction accuracy. Scenario B: In this scenario, the primary goal is to illustrate the benefit of the proposed variable selection method for NPFCM by comparing it with the variable selection method for FLCM (Ghosal et al., 2020). In particular, we want to demonstrate what can be gained using the proposed method in terms of selection accuracy and prediction performance in the presence of nonlinear effects of the predictors. As in scenario A, we first obtain the denoised curveŝ X i (t) before applying our variable selection method. All the predictors in this scenario have a nonlinear effect on the response. We obtain the selection percentage of each variable for each of the three penalized selection criteria discussed in this article and compare them with the same for the variable selection method in FLCM. The results are displayed in Table 3. We observe that for the variable selection method in FLCM, the effect of X 2 (⋅) (X 3 (⋅)) is captured in 69% (73.2%) of the replications by the group LASSO penalty, in 58.4% (64.6%) of the replications by the group SCAD penalty, and in 50.8% (60%) of the replications by the group MCP. The results illustrate quite a large (30%-50%) FNR for the variable selection method in FLCM. The average FPR is also considerably high (29%, 21%, 16%) using the variable selection method for FLCM.
In comparison, the variable selection method for NPFCM captures the nonlinear effect of X 2 (⋅) in 100% of the replications for all the penalties. The effect of X 3 (⋅) is also captured with a higher TPR (87%-99%). The proposed group SCAD-and group MCP-based selection methods for the NPFCM also exhibit a negligible FPR (<1%). The results illustrate that the proposed variable selection method for NPFCM is more suitable in the presence of nonlinear effects of the functional predictors, as the FNR remains minimal along with the FPR.
We provide a similar comparison as in scenario A, in terms of out-of-sample prediction performance of the selection methods. FLCM intuitively should be able to capture only the linear DOI: 10.1002/cjs.11654 The Canadian Journal of Statistics / La revue canadienne de statistique     method in NPFCM produces a superior performance on account of successfully identifying the nonlinear effects of the functional predictors. Scenario C: This scenario illustrates the performance of the proposed variable selection method in the high-dimensional case of p > n. The TPR over the 500 replicated datasets for all three penalization methods is found to be 100%. The average FPR for the group LASSO, the group SCAD, and the group MCP method are found to be 0.39%, 0% and 0.008%, respectively. The results illustrate the proposed variable selection method's satisfactory performance even in the high-dimensional scenario of p > n.
The simulation results in this section illustrate that the proposed variable selection method for NPFCM can identify the underlying functional predictors having nonlinear concurrent effects with high accuracy and also gives satisfactory out-of-sample prediction performance. The group SCAD-and group MCP-based selection methods, in particular, are shown to have minuscule FPR and FNR, and therefore are the recommended methods for variable selection in NPFCM in this article.

REAL DATA APPLICATIONS
In this section, we demonstrate the application of the proposed variable selection method in NPFCM for the selection of the influential functional covariates in two real data studies. We first consider a dietary calcium absorption study, with added pseudo-variables as an illustration of the proposed method. Pseudo-variables have been used before (Miller, 2002;Wu, Boos & Stefanski, 2007) to assess the false selection rate in real datasets and therefore to tune variable selection procedures. Next, we apply the proposed method for identifying the influential meteorological variables in the capital bikeshare data and estimate their concurrent effects on the number of casual bike rentals.

Study of Dietary Calcium Absorption
We consider the study of dietary calcium absorption given in Davis (2002). In this article, we have data on calcium absorption Y(t), dietary calcium intake X 1 (t), body mass index (BMI) X 2 (t) and body surface area (BSA) X 3 (t) of 188 patients at irregular time points between 35 and 64 years of age (t). For each patient, the number of repeated measurements varies between 1 and 4. Figure 1 displays the individual curves of the patients' calcium absorption, calcium intake, BSA and BMI as a function of their ages. We are primarily interested in finding out which of the three predictors influence calcium absorption Y(t). We intuitively expect calcium intake to be associated with calcium absorption based on prior literature (Kim, Maity & Staicu, 2018a;Ghosal et al., 2020). As the data are sparse and the original covariates might be observed with measurement error, we apply FPCA-based pre-processing (PVE = 95%) as discussed in Section 2.3 and get the denoised and standardized trajectoriesX * (t) for = 1, 2, 3, before applying our variable selection method. To further illustrate the selection performance and FPR of the proposed variable selection method, we add 15 functional pseudo-variables by simulating X i (⋅) . Finally, we apply the variable selection method developed in this article for NPFCM to the functional response Y(t) and the covariateŝ X * 1 (t),X * 2 (t),X * 3 (t), X * 4 (t), X * 5 (t), … , X * 18 (t) (the pseudo-variables are standardized). We repeat the whole procedure a large number of times (B = 100). Table 5 displays the selection percentage of each of the variables. We notice that both the group SCAD-and group MCP-based methods identify calcium intake X 1 (t) as a significant predictor in 100% of the iterations. All other variables including the randomly generated predictors are completely ignored in all the iterations. So the proposed variable selection method for NPFCM does a good job in identifying the true predictor set and discarding the pseudo-variables. As both the   penalization methods select calcium intake, we want to estimate its concurrent effect. For this purpose, we use the proposed method for performing variable selection in NPFCM on the original data without adding any pseudo-variables. The estimated nonparametric effect of calcium intake (adjusted for the main effect of t) is displayed in Figure 2. The estimated linear effect of calcium intake, obtained using the variable selection method for FLCM (Ghosal et al., 2020), is also provided for comparison. We notice that the estimated nonparametric effect of calcium intake (standardized) is more or less linear, which matches with prior findings of Kim, Maity & Staicu (2018a). For fixed and moderate values of t (age), as x (calcium intake) increases (or is higher than the mean at time t), the functional effect  F(x, t) is decreasing in most of the regions, indicating that, as calcium intake increases, dietary calcium absorption should decrease at most ages. A similar observation was made in Ghosal et al. (2020), although one needs to be careful in interpreting the results near boundaries due to lack of observations. Our analysis using the variable selection method for NPFCM provides a more comprehensive understanding of the underlying association between calcium absorption and dietary calcium intake.

Remark 2.
To investigate what happens in the presence of a significant correlation between the truly relevant predictors and the pseudo-variables, we considered an alternative scenario where the pseudo-variables Z (t) ( = 4, 5, … , 18) were generated conditionally on Z 1 (t) (calcium intake) from a normal distribution with mean and variance (1 − 2 ). We set = 0.5 in our analysis to introduce a moderate correlation among the variables. The constants a, b were taken to be the scalar mean and standard deviation of the samples {Z i1 (t i ), = 1, 2, … , m i } n i=1 . We applied the proposed variable selection method for NPFCM and obtained identical results as reported in Table 5 in terms of variable selection. This result demonstrates that the proposed variable selection method can distinguish between the truly influential predictors and irrelevant ones with high accuracy even when there is a significant correlation among them.

Study of Bike-Sharing Data
We consider application of the proposed variable selection method on the Capital bikeshare study in Fanaee-T & Gama (2014). The data were initially obtained from the Capital bikeshare system in Washington, D.C., and contain information on casual bike rentals along with other meteorological variables such as temperature (temp), feels-like temperature (atemp), humidity (hum) and wind speed (windspeed) on an hourly basis. Casual bike rentals are defined as rentals to cyclists without membership in the Capital bikeshare program. Since bike rentals can have different dynamics on weekends compared to weekdays, following the analysis of Kim et al. (2018b), we restrict our attention to casual bike rentals on Saturdays. The counts of casual bike rentals (hourly) are available during the period from 1 January 2011 to 31 December 2012, on a total of 105 Saturdays barring some exceptions (8 missing). Figure 3  and we are primarily interested in selecting the influential ones as well as estimating their concurrent effects. The hourly bike rentals Y(t) are discrete and their distribution is skewed, therefore we apply a log transformation prior to our analysis (Y(⋅) = log{Y(⋅) + 1}). We add 16 functional pseudo-variables to the existing 4 meteorological variables by generating from the covariate model X (t) = a √ 2 sin(2 ( − 4)t) + b √ 2 cos(2 ( − 4)t), where a ∼  (0, (2) 2 ), b ∼  (0, (2) 2 ), for = 5, 6, … , 20. This is again done in order to assess the false selection performance of the proposed method. We pre-process the meteorological variables X 1 (t), X 2 (t), X 3 (t), X 4 (t) using FPCA methods as discussed in Section 2.3, and then apply the proposed variable selection method to Y(t) andX * 1 (t),X * 2 (t),X * 3 (t),X * 4 (t), X * 5 (t), … , X * 20 (t) (the pseudo-variables are standardized). The whole procedure is repeated several times (B = 100). Table 6 displays the selection percentage of each of the predictors for the group SCAD-and group MCP-based methods for NPFCM. Also reported, for comparison, are the selection percentages of the FMCP method (Ghosal et al., 2020) with and without pre-whitening (PW and NPW, respectively).
We observe that the group SCAD method selects feels-like temperature X 2 (t), humidity X 3 (t) and wind speed X 4 (t) 100% of the time, whereas the group MCP method selects temperature X 1 (t), humidity X 3 (t) and wind speed X 4 (t) 100% of the time. The covariates temperature X 1 (t) and feels-like temperature X 2 (t) should be highly correlated, and it is ideal that the proposed variable selection method is picking only one of them consistently. In terms of the FPR, we see that both group SCAD and group MCP methods discard the pseudo-variables in all the iterations.  For the FMCP method, which uses an FLCM, we observe that feels-like temperature (atemp) is selected using pre-whitening, and all four predictors are selected without pre-whitening. In this case, the average number of false positives is calculated to be 4.85. In the presence of nonlinear effects, pre-whitening may not be a good idea, as the FLCM used to estimate the covariance matrix becomes mis-specified. Hence, both results (with and without pre-whitening) are presented to facilitate comparison. Nevertheless, the proposed variable selection method for NPFCM performs better than FMCP irrespective of pre-whitening in terms of selection accuracy. Next, we estimate the concurrent effects of the meteorological factors on hourly bike rentals by applying the proposed group MCP method for NPFCM on the actual covariates (no pseudo-variables added). The estimated concurrent effects of the selected meteorological variables (in standardized scale) are displayed in Figure 4. The estimated effects from the group SCAD method are similar and therefore not reported. We observe that the effect of temperature on bike rentals is maximum when the temperature is in the mid-range (near the mean) and during midday, as this should be an ideal condition for biking. For too cold or too warm temperatures, the effect on the number of bike rentals decreases, which is natural. For mid-range to higher temperatures, the impact of temperature on casual bike rentals is more or less uniform throughout the day, which is expected on weekends and suitable weather conditions in Washington, D.C.
There is a negative association for each fixed time point for humidity, as its effect on the number of bike rentals decreases with increasing humidity. A similar observation was made in Jim et al. (2018b). For lower humidity (lower than the mean at a particular time), the effect increases during midday and is highly positive. For higher humidity (higher than the mean), we see a drop in the effect, indicating that on humid days there should be a fall in bike rentals, which is again what one would naturally expect in these weather conditions. For wind speed, the effect is more or less uniform across the domain, except in high wind speed regions where bikers might face additional difficulties. In such high wind speed regions, during midday (plausibly the hottest time of the day), there is a noticeable dip, which can be attributed to the difficulty one might face while cycling amidst high-velocity winds. Our findings match with those of Gebhart & Noland (2014), who made similar observations using a multiple linear regression model on the Capital bikeshare data. Our analysis using NPFCM provides a more comprehensive understanding of the effect of meteorological factors on daily bike rentals.
It is evident from Figure 4 that the concurrent effects of the meteorological predictors are nonlinear. To illustrate the efficiency gain in predictive performance using the variable selection method for NPFCM, we compare it with the FMCP method (Ghosal et al., 2020) for FLCM in terms of their out-of-sample prediction performance. For the FMCP method, we do not use pre-whitening. In the presence of nonlinear effects, the estimate of the covariance matrix obtained using the FLCM can be highly biased and lead to inferior performance. As a comparison metric, we use the out-of-sample mean-square prediction error of both methods We perform repeated training (80%) and testing (20%) using both methods for a large number of iterations (B = 100). Boxplots illustrating the distribution of out-of-sample prediction error corresponding to both methods are displayed in Figure 5. We notice that the variable selection method for NPFCM with MCP produces much smaller out-of-sample MSPE than the FMCP method, demonstrating the advantage of the proposed method, particularly in the presence of nonlinear effects of functional predictors.
Our data analysis in this section illustrates that, with the variable selection method for NPFCM, it is possible to identify the underlying functional predictors driving the functional response more accurately and capture their concurrent nonlinear effects, providing a better understanding of the dynamics between them.
Remark 3. The pre-whitened FMCP, as reported in Table 6, was unable to capture the concurrent effect of humidity and wind speed and produced inferior prediction performance compared to the non-pre-whitened version of the variable selection method in FLCM. Hence, for comparison of out-of-sample prediction performance, the non-pre-whitened version of FMCP was used.
The Canadian Journal of Statistics / La revue canadienne de statistique FIGURE 5: Comparison of out-of-sample prediction performance of the proposed variable selection method for NPFCM with the FMCP method in the bike-sharing data.

DISCUSSION
In this article, we proposed a variable selection method in the nonparametric functional concurrent regression, extending the classically used penalized variable selection methods like LASSO, SCAD and MCP. The proposed method provides a unified framework for variable selection in additive function-on-function concurrent regression models. We have shown through numerical simulations that the proposed variable selection method for NPFCM, particularly with the group SCAD penalty or the group MCP, identifies the true underlying variables with minimal FPR and minuscule FNR while also providing satisfactory out-of-sample prediction accuracy. In the presence of nonlinear concurrent effects of the functional predictors, the proposed variable selection method for NPFCM with non-convex (SCAD or MCP) penalties was illustrated to be more accurate compared to the variable selection method for FLCM and therefore is the recommended method for variable selection in this article. In classical penalized regression, the non-convex penalties (SCAD and MCP) are known to have the oracle property (Fan & Li, 2001;Zhang, 2010) under standard regularity conditions. The linear model representation of NPFCM in Equation (3) could be used to establish the oracle properties of the regularized estimators in NPFCM, under the assumption that the actual nonparametric functions F (⋅, ⋅) are in the space spanned by the tensor product splines. This could be done in the same way as in Chen, Goldsmith & Ogden (2016), where the oracle properties of the group MCP regularized estimate were established in the context of function-on-scalar regression.
The proposed method can also be readily applied for variable selection in function-on-scalar regression, where the covariates are not time-varying. So far, almost the entire existing literature for variable selection in function-on-scalar regression (Chen, Goldsmith & Ogden, 2016;Fan & Reimherr, 2017;Parodi & Reimherr, 2018) has focused on the linear effect of the covariates. The proposed method can be used to identify the nonlinear effects of the predictors. We have demonstrated our method on a dietary calcium absorption study and bike-sharing data to select the influential functional covariates and estimate their underlying concurrent effects. One of the proposed method's limitations is forecasting when a new functional predictor is observed outside the range of the training data. The B-splines (De Boor, 1978) used for modelling in the training data will be not be defined on this new observation. Note that this aspect of prediction is not unique to our proposed variable selection method or NPFCM. In general, for any spline-based method for nonparametric regression, extrapolation beyond the range of observed data is troublesome and has to be done with additional care (e.g., Eilers & Marx, 2010). We have followed a centring and scaling approach as suggested in Kim, Maity & Staicu (2018a) for NPFCM in this article. Alternatively, based on the suggestions for the FGAM (McLean et al., 2014), the transformationsĜ t ( where Φ is the cdf of the standard normal and h t is a user-chosen bandwidth) can be used on X(t) so that the transformed predictors cover [0, 1] uniformly. Another working strategy, often employed by practitioners, can be to include additional "ghost" knots beyond the range of the observed data with the aim of extrapolating (Eilers & Marx, 2010;Kosinka, Sabin & Dodgson, 2014;Corlay, 2016) or supplementing the basis with functions of infinite support following the proposal in Schumaker (2007). We have used an independent working correlation structure in our proposed variable selection method for NPFCM, which has shown satisfactory performance in our empirical analysis in terms of selection accuracy, which is the primary objective of this article. It is plausible to consider a pre-whitening method to take into account temporal dependence within functions. This can be done similarly to Chen, Goldsmith & Ogden (2016) and Ghosal et al. (2020). However, if the model is mis-specified, the use of pre-whitening can be counterproductive, as observed in our analysis. An additional challenge would be to handle the computational complexity arising from the initial pre-whitening step.
There are plenty of research problems that remain to be explored based on this work. It is of particular interest to study small-sample properties of the estimated nonparametric functions and get a measure of their uncertainty. The perturbation-based bootstrap (Das, Gregory & Lahiri, 2019) procedure for penalized regression methods would be interesting to explore in such situations. For distribution-free predictive inference, split conformal prediction bands (Lei et al., 2018) can also be explored by extending the conformal prediction approach to functional response (Lei, Rinaldo & Wasserman, 2015). The proposed method can be generalized to other function-on-function regression scenarios, e.g., the historical functional regression model or the additive function on function regression model (Kim et al., 2018b), where one might consider nonlinear lagged effects of functional predictors on the response at the current time point. Extending the proposed method to these general classes of functional regression models can have diverse applications and remain areas for future research.