Cluster analysis of longitudinal profiles with subgroups

Abstract: In this paper, we cluster profiles of longitudinal data using a penalized regression method. Specifically, we allow heterogeneous variation of longitudinal patterns for each subject, and apply a pairwise-grouping penalty to the coefficients of nonparametric B-spline models to form subgroups. Consequently, we identify clusters based on different patterns of the predicted longitudinal curves. One advantage of the proposed method is that the number of clusters need not be pre-specified; instead, it is selected automatically through a model selection criterion. Our method is also applicable to unbalanced data in which different subjects have measurements at different time points. To implement the proposed method, we develop an alternating direction method of multipliers (ADMM) algorithm with a desirable convergence property. In theory, we establish consistency for the approximated nonparametric function estimates and the subgroup memberships. In addition, our simulation studies and real data example show that the method outperforms existing competitive approaches.


Introduction
In longitudinal data studies, distinguishing patterns of longitudinal trajectories is useful in many practical applications. For example, in personalized medicine, correctly identifying subgroups is essential for individualized treatment assignment, since distinguishing the dynamics of disease progression status among patients is critical in evaluating the effectiveness of a certain treatment. In time-course gene expression studies, grouping genes with similar expression profiles over time is also useful in association studies to identify crucial genes linked with certain diseases ([4], [14], [24]).
One way to distinguish longitudinal patterns is to apply cluster analysis through classical multivariate clustering methods and algorithms by treating repeated measurements as multivariate vectors. For example, the K-means method [12] is one of the popular dissimilarity-measure-based approaches, which partitions subjects into a pre-specified number of clusters based on the Euclidean distance from each cluster mean. Alternatively, the Gaussian mixture model [11] assumes a finite mixture of multivariate normal distributions, and estimates the parameters of the distribution and the conditional membership probabilities. In addition, [25] provides a comprehensive review of several other applicable multivariate clustering algorithms.
However, there are several drawbacks to treating longitudinal data as multivariate vectors, since this assumes that the multivariate vector is balanced with the same number of time points. That is, the multivariate clustering approach cannot be applied directly unless the missing entries are imputed first. Most critically, the multivariate-vector method does not take the information of time ordering into account, and consequently, the clustering result is invariant to different permutations of measurements within subjects, which makes little sense when observations are over time and the trajectory patterns are of our main interest. Alternatively, to capture the growth curves of each subject, we can cluster the trajectories of subjects with nonparametric smoothing approaches, such as B-spline techniques. For example, [1] partitions subjects through the K-means method using B-spline coefficients. [17,18] and [7] apply spline approximations under the linear mixed-effects model framework. However, the imposed parametric assumption of the mixed effects model, typically a normal distribution, makes their methods less flexible in practice.
In addition, the aforementioned clustering approaches require one to specify the number of clusters in advance, which could be problematic in cluster analysis. Recent developments in penalized regression methods allow one to estimate the cluster centers and select the number of clusters simultaneously. [20] models the multivariate vectors assuming an individual center for each subject and penalizes the pairwise distance between two subjects' centers. [5] utilizes a convex penalty and considers the clustering as a convex optimization problem. [19] incorporates covariates of interest to model univariate response data, which assumes different intercepts for different subjects. However, none of these approaches models trajectories over time for longitudinal data.
In this paper, we propose a regression-based approach which partitions observations into subgroups through penalization of pairwise distances between the B-spline coefficient vectors. One advantage of the proposed approach is that a pre-specification of the number of clusters is not required; instead, we select the number of clusters automatically through a model selection criterion. This allows us to achieve model estimation and subject subgrouping simultaneously. Another advantage is that the proposed method is applicable to characterizing longitudinal trajectories for unbalanced longitudinal data. In addition, we implement an alternating direction method of multipliers (ADMM) algorithm [2] to achieve fast convergence of the method. In theory, we establish a consistency property showing that the proposed method can identify the true underlying subgroup membership asymptotically. Furthermore, our simulation studies and real data analysis also confirm that, compared to other existing approaches, the proposed method performs well in identifying subgroups.
The rest of the article is organized as follows. Section 2 introduces the model formulation. In Section 3, we propose a nonparametric pairwise-grouping approach along with the ADMM algorithm, and establish theoretical properties and implementation strategies. Simulation studies and real data analysis are presented in Sections 4 and 5. We conclude the article with a brief discussion in Section 6.

A subject-wise model for longitudinal data
The subject-wise model for subject i (i = 1, · · · , n) is formulated as follows:

y_ij = f_i(x_ij) + ε_ij, (2.1)

where y_ij is the response at the jth (j = 1, · · · , n_i) repeated measurement, x_ij is a one-dimensional covariate which defines the pattern of interest, and the random errors ε_ij are uncorrelated with mean 0 and variance σ². In this paper, we only deal with the setting where x_ij is a covariate of time, and investigate the patterns of longitudinal trajectories over time. In principle, we can extend our method to a more general setting with more than one covariate. Without loss of generality, we assume that the covariates x_ij can be scaled to a compact interval X = [0, 1]. For this subject-wise model, each subject has its own unknown smoothing function f_i(·) ∈ C^q(X), assumed to be qth-order continuously differentiable. Estimating the subject-wise smoothing functions f_i(·), which characterize the longitudinal profile over the covariate, is one of our main interests. Here, we estimate f_i by the nonparametric B-spline approach, which flexibly approximates smoothing functions. Given a set of m internal knots κ = {0 = κ_0 < κ_1 < · · · < κ_m < κ_{m+1} = 1}, the qth-order B-spline basis functions are defined recursively [8] via the Cox–de Boor formula. Then B(x) = (B_1(x), · · · , B_p(x))^T is a B-spline basis vector, f_i(x) is approximated by B(x)^T β_i, and β_i is a p-dimensional coefficient vector with p = m + q, determined by the number of knots m and the B-spline order q.
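The recursive B-spline definition above can be evaluated directly with the Cox–de Boor recursion. The following sketch (Python with NumPy, as an illustration rather than the paper's R implementation) builds the n_i × p basis matrix for one subject, with p = m + q:

```python
import numpy as np

def bspline_basis(x, knots, q):
    """Evaluate the q-th order (degree q-1) B-spline basis at points x.

    `knots` are the m internal knots in (0, 1); boundary knots 0 and 1 are
    repeated q times so the basis spans [0, 1]. Returns an array of shape
    (len(x), m + q), matching p = m + q in the text.
    """
    t = np.concatenate([np.zeros(q), knots, np.ones(q)])  # extended knot sequence
    x = np.asarray(x, float)
    # order-1 (piecewise-constant) splines on half-open intervals
    B = np.array([(t[j] <= x) & (x < t[j + 1]) for j in range(len(t) - 1)], float).T
    B[x == 1.0, np.searchsorted(t, 1.0, side="left") - 1] = 1.0  # include right endpoint
    for k in range(2, q + 1):  # Cox-de Boor recursion up to order q
        Bn = np.zeros((len(x), B.shape[1] - 1))
        for j in range(B.shape[1] - 1):
            d1 = t[j + k - 1] - t[j]
            d2 = t[j + k] - t[j + 1]
            left = (x - t[j]) / d1 * B[:, j] if d1 > 0 else 0.0
            right = (t[j + k] - x) / d2 * B[:, j + 1] if d2 > 0 else 0.0
            Bn[:, j] = left + right
        B = Bn
    return B
```

In practice a library routine such as `scipy.interpolate.BSpline` can be used instead; the hand-rolled version simply makes the recursion explicit.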
Model (2.1) can be written in matrix form as y_i = f_i(x_i) + ε_i, where y_i = (y_i1, · · · , y_in_i)^T. The corresponding B-spline approximation is y_i ≈ B_i β_i + ε_i, where B_i = (B(x_i1), · · · , B(x_in_i))^T is the n_i × p design matrix. We assume that subjects share the same smoothing function if they are from the same group; that is, f_i = f_j if subjects i and j are from the same cluster. Consequently, let G = {G_1, · · · , G_K} be a partition of {1, · · · , n}, where K (K ≤ n) is the number of distinct groups. We define the nonparametric function subspace corresponding to the group partition as M^f_G = {(f_1, · · · , f_n) : f_i = f_j for any i, j ∈ G_k, k = 1, · · · , K}, and the subspace of the B-spline coefficients corresponding to the group partition as M^β_G = {(β_1, · · · , β_n) : β_i = β_j for any i, j ∈ G_k, k = 1, · · · , K}. Our goal is to identify the distinct group patterns of the smoothing functions for any given subjects, which is equivalent to distinguishing the B-spline coefficients for each group.

A nonparametric pairwise-grouping approach
In this subsection, we propose a pairwise-grouping approach through penalization to achieve B-spline coefficient estimation and grouping of subjects simultaneously. We adopt a penalized B-spline approach [9], which uses a relatively large number of knots but imposes a penalty on the B-spline coefficients. More specifically, the objective function of the penalized regression spline for subject i, given the dth-order difference penalty, is

(1/2) ||y_i − B_i β_i||² + (λ1/2) ||D_d β_i||²,

where λ1 is a tuning parameter and D_d is the (p − d) × p matrix representation of the dth-order difference operator. For example, the second-order difference operator Δ² maps β_j to β_j − 2β_{j+1} + β_{j+2}, so each row of D_2 contains (1, −2, 1) at consecutive positions and zeros elsewhere.
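The difference operator D_d can be generated mechanically; a small Python/NumPy illustration (not the paper's code):

```python
import numpy as np

def diff_operator(p, d):
    """d-th order difference operator as a (p - d) x p matrix."""
    return np.diff(np.eye(p), n=d, axis=0)

# second-order example: each row applies beta_j - 2*beta_{j+1} + beta_{j+2}
D2 = diff_operator(5, 2)
```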

The penalized B-spline coefficient estimator is obtained by minimizing the following objective function:

(1/2) Σ_{i=1}^n ||y_i − B_i β_i||² + (λ1/2) Σ_{i=1}^n ||D_d β_i||², (3.1)

and the estimated smoothing function approximation is f̂ = Bβ̂. Notice that this is equivalent to applying the penalized B-spline approach to each subject separately under the subject-wise model framework. In the penalized spline approach, [22] provides a rule of thumb of selecting about min{n_i/4, 35} knots for subject i, where the knot locations can be chosen based on sample quantiles of {x_i}.
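For a single subject, the minimizer of (3.1) has the familiar ridge-type closed form β̂_i = (B_iᵀB_i + λ1 D_dᵀD_d)⁻¹ B_iᵀ y_i. A minimal sketch, using a generic design matrix in place of an actual B-spline basis:

```python
import numpy as np

def penalized_spline_fit(B, y, lam1, d=2):
    """Closed-form penalized fit for one subject:
    beta = argmin 0.5*||y - B beta||^2 + 0.5*lam1*||D_d beta||^2."""
    p = B.shape[1]
    D = np.diff(np.eye(p), n=d, axis=0)  # d-th order difference operator
    return np.linalg.solve(B.T @ B + lam1 * D.T @ D, B.T @ y)

# toy check: with lam1 = 0 and a well-conditioned design, this reduces to
# ordinary least squares and recovers the generating coefficients exactly
rng = np.random.default_rng(0)
B = rng.standard_normal((10, 4))
beta_true = np.array([1.0, -0.5, 0.3, 2.0])
beta_hat = penalized_spline_fit(B, B @ beta_true, lam1=0.0)
```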
To identify subgroups corresponding to distinct smoothing functions, we group subjects together if they possess similar functional forms of nonparametric approximations. Specifically, we penalize pairwise distances of B-spline coefficients to encourage subjects to fall into the same group. We propose the corresponding objective function as

(1/2) Σ_{i=1}^n ||y_i − B_i β_i||² + (λ1/2) Σ_{i=1}^n ||D_d β_i||² + Σ_{l∈L} ρ(||β_i − β_j||, λ2), (3.2)

where ρ(·, λ2) is a penalty function with a tuning parameter λ2, and L = {l = (i, j) : 1 ≤ i < j ≤ n} is the index set over all |L| = n(n − 1)/2 subject pairs. The essence of the proposed approach is to take advantage of the flexibility of nonparametric approximation while controlling the complexity of the model, which is also associated with the number of subgroups. Here, the tuning parameter λ2 plays such a role in determining the number of subgroups. By minimizing the objective function (3.2), we simultaneously obtain nonparametric coefficient estimates and group subjects whose estimated coefficient vectors are sufficiently close.
In general, the choice of penalty function ρ(·, λ2) is critical since it affects both parameter estimation and subgroup selection. For example, a Lasso-type penalty leads to a sparse solution, which could be appealing for merging subjects into groups. However, it is also well known that the Lasso estimator is biased. Here, we apply the minimax concave penalty (MCP) [29], which is nearly unbiased and also has the sparsity property. Specifically, the penalty function is

ρ(t, λ2) = λ2 ∫_0^t (1 − x/(τλ2))_+ dx,

where (a)_+ = max(a, 0), and the regularization parameter τ controls the unbiasedness and concavity of the penalty function. We obtain β̂ by minimizing (3.2) with the MCP penalty, and the corresponding smoothing function estimate is f̂ = Bβ̂. With this near-unbiasedness property, we can achieve accurate nonparametric coefficient estimation and group membership recovery.
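Evaluating the integral gives ρ(t, λ2) = λ2 t − t²/(2τ) for t ≤ τλ2, and τλ2²/2 otherwise; a small sketch:

```python
import numpy as np

def mcp(t, lam, tau):
    """Minimax concave penalty rho(t; lam) = lam * int_0^t (1 - x/(tau*lam))_+ dx.

    Equals lam*t - t^2/(2*tau) for |t| <= tau*lam, and the constant
    tau*lam^2/2 beyond that (where the penalty flattens out)."""
    t = np.abs(np.asarray(t, float))
    return np.where(t <= tau * lam,
                    lam * t - t**2 / (2 * tau),
                    0.5 * tau * lam**2)
```

The flat region beyond τλ is what makes the MCP nearly unbiased: large differences are not shrunk at all.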
In fact, it is challenging to optimize the objective function (3.2) directly, as the proposed grouping penalty is not separable in the β_i's. We therefore introduce a new set of parameters v_l = β_i − β_j, l ∈ L, equal to the pairwise differences of the B-spline coefficient vectors. The optimization problem can then be written in the constrained form

minimize (1/2) Σ_i ||y_i − B_i β_i||² + (λ1/2) Σ_i ||D_d β_i||² + Σ_{l∈L} ρ(||v_l||, λ2), subject to β_i − β_j = v_l for all l = (i, j) ∈ L.

We solve this problem using the alternating direction method of multipliers (ADMM), a variant of the augmented Lagrange multipliers (ALM) method. The constrained problem can be further converted by augmenting a quadratic penalty with a fixed parameter θ, namely (θ/2) Σ_{l∈L} ||β_i − β_j − v_l||². The two problems are equivalent because the quadratic penalty is zero for any feasible β and v = (v_1^T, · · · , v_{|L|}^T)^T satisfying the constraints. Therefore, we estimate the parameters by minimizing the corresponding Lagrangian

L_θ(β, v, λ) = (1/2) Σ_i ||y_i − B_i β_i||² + (λ1/2) Σ_i ||D_d β_i||² + Σ_{l∈L} ρ(||v_l||, λ2) + Σ_{l∈L} λ_l^T (β_i − β_j − v_l) + (θ/2) Σ_{l∈L} ||β_i − β_j − v_l||²,

where λ = (λ_1^T, · · · , λ_{|L|}^T)^T are the Lagrange multipliers. Following the ADMM algorithm, we update the estimates of β, v, and λ sequentially at the (s + 1)th iteration step:

β^{s+1} = arg min_β L_θ(β, v^s, λ^s), (3.3)
v^{s+1} = arg min_v L_θ(β^{s+1}, v, λ^s), (3.4)
λ_l^{s+1} = λ_l^s + θ(β_i^{s+1} − β_j^{s+1} − v_l^{s+1}), l ∈ L. (3.5)

Note that the first minimization in Eq. (3.3) is equivalent to minimizing a quadratic function involving terms ||ṽ_l − A_l β||², where ṽ_l = v_l + (1/θ)λ_l and A_l = (e_i − e_j)^T ⊗ I_p, in which ⊗ is the Kronecker product and e_i is an n-dimensional vector with one at the ith component and zeros elsewhere. As for the second minimization in Eq. (3.4), it is a convex function of each v_l if τ > 1/θ, even though L_θ(β^{s+1}, v, λ^s) involves a non-convex MCP penalty term; consequently, v^{s+1} can be updated component-wise in closed form. In non-convex optimization, it is important to assign appropriate initial values to obtain a good solution. We initialize the ADMM algorithm with a warm start, which also reduces the number of iterations.
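For the v-update in Eq. (3.4), a closed form commonly used with the MCP in pairwise-fusion subgroup analysis is group soft-thresholding with a rescaling, valid under the convexity condition τθ > 1. The following is an illustrative sketch under that assumption, not the paper's code:

```python
import numpy as np

def group_soft_threshold(z, c):
    """Shrink the vector z toward zero by c in Euclidean norm."""
    nz = np.linalg.norm(z)
    return np.zeros_like(z) if nz <= c else (1 - c / nz) * z

def mcp_v_update(v_tilde, lam2, theta, tau):
    """Closed-form minimizer of rho(||v||, lam2) + (theta/2)*||v - v_tilde||^2
    for the MCP penalty, assuming tau*theta > 1 (convex subproblem).

    Inside the MCP region (||v_tilde|| <= tau*lam2) the solution is a
    rescaled group soft-threshold; outside, v_tilde is left untouched."""
    if np.linalg.norm(v_tilde) <= tau * lam2:
        return group_soft_threshold(v_tilde, lam2 / theta) / (1 - 1 / (tau * theta))
    return v_tilde.copy()
```

Note that the two branches agree at the boundary ||ṽ_l|| = τλ2, so the update is continuous.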
Specifically, we first apply the penalized B-spline approach to each subject and set β^0 = β̂ from (3.1), where λ1 can be selected using the criterion BIC_{λ1} from the two-step tuning procedure in Section 3.3. The ADMM algorithm then iterates the updates (3.3)–(3.5) over all l ∈ L, and breaks out of the loop once the stopping criteria are met. The stopping criteria are evaluated based on the primal residual r^{s+1}, with components β_i^{s+1} − β_j^{s+1} − v_l^{s+1}, and the dual residual d^{s+1}. The algorithm terminates at step s* if ||r^{s*}||_2 ≤ ε^r and ||d^{s*}||_2 ≤ ε^d, where ε^r and ε^d are small tolerances constructed as in [2] from predetermined small values ε^abs and ε^rel.
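The residual-based stopping test of [2] can be sketched as follows, where A stacks the operators A_l so that the constraint reads Aβ = v, and ε^abs and ε^rel are the predetermined small values:

```python
import numpy as np

def admm_converged(A, beta, v_new, v_old, lam, theta, eps_abs=1e-4, eps_rel=1e-3):
    """Standard ADMM stopping test (Boyd et al. [2]) for the constraint A beta = v."""
    r = A @ beta - v_new                 # primal residual
    d = theta * (A.T @ (v_new - v_old))  # dual residual
    eps_pri = np.sqrt(r.size) * eps_abs + eps_rel * max(
        np.linalg.norm(A @ beta), np.linalg.norm(v_new))
    eps_dual = np.sqrt(d.size) * eps_abs + eps_rel * np.linalg.norm(A.T @ lam)
    return np.linalg.norm(r) <= eps_pri and np.linalg.norm(d) <= eps_dual
```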
Theorem 3.1 establishes the convergence property of the ADMM algorithm, indicating that the stopping criteria of the algorithm can be reached as the number of iterations increases. The proof of Theorem 3.1 is provided in appendix A.1.

Asymptotic properties
In this subsection, we establish the asymptotic properties of the estimators obtained by the proposed approach. To study the convergence rate off , we first provide some regularity conditions.
(C1). Suppose that the design points {x_ij} follow a density function f_X, which is absolutely continuous, and there exist constants c_1 and c_2 such that c_1 ≤ f_X(x) ≤ c_2 for all x ∈ X.
(C2). The error terms in model (2.1) are uncorrelated with mean zero and variance σ² > 0.
where N_k = Σ_{i∈G_k} n_i for k = 1, · · · , K, and N = Σ_{i=1}^n n_i. Conditions (C1)–(C5) are standard assumptions for nonparametric B-spline smoothing; similar conditions are given by [6], [30], and [26]. In (C5), the condition on the number of knots applies to all subjects. In addition, (C6) imposes a constraint on cluster sizes, implying that each cluster grows as the sample size increases.
We first investigate the convergence property of the penalized B-spline approximation f̂. Let f^o be the true function corresponding to the true group partition G. We establish the estimation consistency in the following Lemma 3.1.
Lemma 3.1, under conditions (C1)–(C5), establishes the convergence rate of f̂, which is determined by three factors. The first term is the average asymptotic variance, which decreases as the number of repeated measurements grows and increases as the nonparametric model becomes more complex with a larger number of knots. The second term is introduced by the shrinkage bias, but vanishes if λ1 → 0. The last term reflects the nonparametric approximation bias, which is also related to the model complexity. Lemma 3.1 further shows that the convergence rate depends on n_0, the minimum number of repeated measurements among subjects. The proof of Lemma 3.1 is given in appendix A.2.
When the true group membership is known, we obtain the oracle approximation f̂^or = Bβ̂^or, where the oracle penalized spline estimator β̂^or minimizes the objective (3.1) subject to the group constraint β ∈ M^β_G. Let N_0 = min(N_1, · · · , N_K); we provide the convergence rate of the oracle approximation in the following Lemma 3.2.

Lemma 3.2. Under conditions (C1)–(C4) and (C6), and given a sufficiently large N_0, the oracle estimator f̂^or achieves the convergence rate stated in appendix A.2.
In contrast to the convergence rate in Lemma 3.1 for the penalized B-spline estimators, Lemma 3.2 establishes a faster convergence rate for the oracle penalized spline estimators when the true subgroup information is known, since N > n_0. The above convergence property is guaranteed as long as the number of repeated measurements within each cluster is sufficiently large, since the oracle problem is equivalent to obtaining β̂^or within each subgroup. The proof of Lemma 3.2 is provided in appendix A.2. In the following, let b be the minimum distance between the smoothing functions f^o_{(k)} and f^o_{(k')} of any two clusters, that is, b = min_{k ≠ k'} ||f^o_{(k)} − f^o_{(k')}||. We denote the proposed approximation as f̂ = Bβ̂.

Theorem 3.2. Under conditions (C1)–(C6), if cb ≥ τλ2 holds for a constant c > 0, then for any fixed n and a sufficiently large n_0, the proposed estimator f̂ achieves the same convergence rate as in Lemma 3.1.

Theorem 3.2 indicates that the convergence rate of the proposed approximation matches that of the penalized spline estimators as long as each subject has a sufficiently large number of repeated measurements. In addition, the distance between the smoothing functions of any two clusters must be sufficiently large to achieve this rate. The details of the proof are given in appendix A.3.
Corollary 3.1 indicates that the recovery of subgroup membership depends on both the minimum distance b and the nonparametric approximation bias O_p(m^{-2q}). If the true functions from two clusters are very close to each other, then the nonparametric approximation needs to be sufficiently accurate with little bias. Otherwise, a smoother function with fewer knots is adequate to ensure subgroup membership recovery as long as the distance between the two clusters is sufficiently large. The proof is given in appendix A.3.

Implementation
In this section, we provide the strategy for forming subjects into subgroups based on the parameter estimates. In addition, we provide the selection criteria for the tuning parameters.
In fact, we form clusters based on the estimated parameters v̂ instead of the B-spline coefficients β̂. In the proposed approach, we encourage closeness between β_i and β_j through pairwise-grouping penalization. That is, subjects i and j are expected to be clustered in the same group if β̂_i = β̂_j. However, this is not achievable since we impose a quadratic penalty on β_i − β_j − v_l in the ADMM algorithm when augmenting the optimization constraint β_i − β_j = v_l, and this makes the implementation problematic for clustering. Therefore, we apply the MCP penalty to v_l to enjoy the sparsity property in the algorithm, and merge subjects i and j into the same cluster if v̂_l = 0.
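Operationally, forming clusters from the fitted v̂_l's amounts to computing connected components over the pairs with v̂_l = 0 (up to a numerical tolerance); a minimal union-find sketch:

```python
def clusters_from_v(n, pairs, v_hat, tol=1e-8):
    """Merge subjects i and j whenever ||v_hat_l|| is (numerically) zero.

    `pairs` is the index set L of pairs (i, j); `v_hat` the fitted
    pairwise differences. Returns the resulting partition of {0,...,n-1}."""
    parent = list(range(n))

    def find(a):  # union-find with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (i, j), v in zip(pairs, v_hat):
        if sum(x * x for x in v) ** 0.5 <= tol:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```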
To select the tuning parameters λ1 and λ2, we propose a two-step procedure that searches over a sequence of grid points. Note that λ1 controls the smoothness of the B-spline approximation, and λ2 controls the number of clusters selected, denoted K̂. Traditionally, one could grid-search both tuning parameters simultaneously; however, this increases the computational cost tremendously. To solve this problem, we propose a two-step procedure which first searches for an optimal value of λ1 given λ2 = 0, then selects λ2 given the optimal λ1 from the first step. Although this procedure may not lead to the optimal selection for both tuning parameters, our numerical studies show that this strategy works effectively. More specifically, in the first step we select λ1 by minimizing the criterion BIC_{λ1}, and in the second step we select λ2 by minimizing the analogous criterion given the chosen λ1.
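The two-step search can be sketched generically; here `fit` and `criterion` are placeholders standing in for the model fit and the BIC-type score described in the text:

```python
def two_step_tuning(fit, criterion, lam1_grid, lam2_grid):
    """Two-step tuning: pick lam1 with lam2 = 0 fixed, then lam2 given lam1.

    `fit(lam1, lam2)` returns a fitted model; `criterion(model)` is the
    score to minimize (a stand-in for the paper's BIC criterion)."""
    lam1 = min(lam1_grid, key=lambda l1: criterion(fit(l1, 0.0)))
    lam2 = min(lam2_grid, key=lambda l2: criterion(fit(lam1, l2)))
    return lam1, lam2
```

Compared with a full grid search over |grid1| × |grid2| fits, this requires only |grid1| + |grid2| fits.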

Simulation study
In this section, we conduct simulation studies to investigate the performance of the proposed nonparametric pairwise-grouping approach (NPG) when subjects have unbalanced numbers of repeated measurements, which often arises in practice. We compare the proposed method with the smoothing spline regression clustering approach [18], using the original unbalanced data. Traditional multivariate-vector approaches, however, cannot handle unbalanced data unless the missing entries are imputed. Instead, we compare our method to the K-means (bKmeans) and Gaussian mixtures (bGM) methods by treating the subject-wise penalized B-spline estimators β̂_i as multivariate vectors. We implement the K-means method with the R function kmeans and select the number of clusters based on the Gap statistic [13] using the R package cluster.
To ensure the robustness of the K-means method, we average the results over 10 random picks of initial centers. The Gaussian mixtures approach is implemented with the R package mclust, and the number of clusters is selected based on the embedded Bayesian Information Criterion (BIC), chosen from K = 1, 2, · · · , 15 in each simulation. We also implement the smoothing spline regression clustering approach with the R package MFDA. In addition, we compare the results with a mixture of mixed-effects models (Mixed) using the P-spline smoothing technique [7]. In our simulation, we fix θ = 1 and τ = 2 to ensure the convexity of our objective function. For the competing methods, the tuning parameters are chosen following each method's recommendations. The final results are based on 100 simulations. We implement the proposed approach in R, and the code is available on GitHub (https://github.com/Xiaolu-Zhu/LongitudinalClustering.git).
To evaluate the performance of these clustering algorithms, we calculate the estimated number of groups K̂ and several frequently used external validity measures: the Rand index [21], the adjusted Rand index (aRand) [15] and the Jaccard index [16]. Let the true positive (TP) be the number of pairs of subjects from the same cluster and assigned to the same cluster, the true negative (TN) be the number of pairs of subjects from different clusters and assigned to different clusters, the false positive (FP) be the number of pairs of subjects from different clusters but assigned to the same cluster, and the false negative (FN) be the number of pairs of subjects from the same cluster but assigned to different clusters. The Rand index is calculated as Rand = (TP + TN)/(TP + TN + FP + FN), which measures the proportion of pairwise agreements between the true and selected clusters. However, Rand tends to be large even under random partitions. The adjusted Rand index corrects this problem, and is calculated as aRand = (Rand − E(Rand))/(max(Rand) − E(Rand)). The Jaccard index is calculated as Jaccard = TP/(TP + FN + FP). For these external criteria, a higher value indicates better agreement between the selected and true group memberships.
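The Rand and Jaccard indices follow directly from the pairwise counts (the adjusted Rand index requires the full contingency table and is omitted from this sketch):

```python
def clustering_indices(tp, tn, fp, fn):
    """Rand and Jaccard indices from pairwise agreement counts."""
    rand = (tp + tn) / (tp + tn + fp + fn)
    jaccard = tp / (tp + fn + fp)
    return rand, jaccard
```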

Case 1: Independent measurement
In this simulation setting, we generate 15 subjects from each of four distinct functional patterns f_{(1)}(x), · · · , f_{(4)}(x), where f_{(4)}(x) = −1.5x + 1.5. The continuous response y_ij for subject i from the kth subgroup is generated by y_ij = f_{(k)}(x_ij) + ε_ij, k = 1, · · · , 4, j = 1, · · · , 10, where the within-subject random errors are independent, ε_ij ~ iid N(0, 0.4²), and {x_ij} are 10 equally spaced points on [0, 1]. To mimic real data situations, we allow 30% of the subjects from each subgroup to have 40% of their repeated measurements missing. The number of knots recommended by [22] is min{n_i/4, 40} for subject i; we choose B-splines of order q = 3 with m = 3 knots for all subjects. Table 1 shows that the proposed approach performs best in terms of the three external criteria. Since the K-means and Gaussian mixture methods approximate each subject's pattern individually, their clustering results are not as good as the proposed method's. In addition, the proposed NPG method identifies the true function subgroups more effectively, as it estimates the B-spline coefficients for all subjects simultaneously and borrows cross-subject information within each subgroup. Our simulations show that the mixed-effects model performs better than the K-means and Gaussian mixture methods, but slightly worse than the proposed approach. We also compare the computational cost of these methods. In this simulation setting, the average computational time is 23.13 seconds for the "Mixed" method of [7], 12.53 seconds for the K-means method, 1.06 seconds for the Gaussian mixture method and 12.37 seconds for the MFDA method, while the proposed NPG approach with fixed tuning parameters takes about 27.18 seconds. As shown in Corollary 3.1, subgroup membership can be recovered successfully regardless of the number of knots used in the B-spline approximation, as long as the underlying longitudinal function patterns are far apart.
We illustrate this using various numbers of knots from 1 to 3. Figures 1 and 2 indicate that the underlying true mean function curves are recovered well even with one knot, although the nonlinear curves are approximated more accurately when we increase the number of knots to 3, especially for the nonlinear patterns in clusters 3 and 4. Nevertheless, using either one or three knots leads to the same accurate subgroup identification in this simulation setting.
We also conduct simulation experiments to investigate the effect of tuning parameter selection. Specifically, we fix the tuning parameters θ = 1 and τ = 2 to ensure that τ > 1/θ is satisfied. We compare the NPG performance over a range of τ ∈ {2, 4, 8, 16}, which shows that the clustering results are quite stable, with the Rand index in (0.993, 0.995), the adjusted Rand index in (0.982, 0.986) and the Jaccard index in (0.975, 0.980).
The clustering performance is also compared under an unbalanced data setting, where we follow the same data generation process, except that the sample sizes for the four subgroups are 5, 10, 20 and 40, respectively. Table 2 shows similar clustering accuracy as in the balanced data setting, where both the proposed NPG and Mixed models show robust performance as reflected in the three different indices.
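The Case 1 data-generating process above (including the unbalanced missingness) can be sketched as follows; f_{(4)} is taken from the text, while the other three mean functions are illustrative placeholders since their exact forms are not reproduced here:

```python
import numpy as np

def simulate_case1(rng, n_per_group=15, n_obs=10, sigma=0.4):
    """Sketch of the Case 1 design: 4 groups x 15 subjects, 10 equally spaced
    time points on [0, 1]; 30% of subjects lose 40% of their measurements.
    Only f4 is from the text; f1-f3 are illustrative placeholders."""
    funcs = [lambda x: np.sin(2 * np.pi * x),   # placeholder
             lambda x: np.cos(2 * np.pi * x),   # placeholder
             lambda x: 2 * x,                   # placeholder
             lambda x: -1.5 * x + 1.5]          # f4 from the text
    x = np.linspace(0, 1, n_obs)
    data = []
    for k, f in enumerate(funcs):
        for _ in range(n_per_group):
            y = f(x) + rng.normal(0, sigma, n_obs)
            keep = np.ones(n_obs, bool)
            if rng.random() < 0.30:  # 30% of subjects are unbalanced
                keep[rng.choice(n_obs, int(0.4 * n_obs), replace=False)] = False
            data.append((k, x[keep], y[keep]))
    return data
```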

Case 2: Correlated measurement
In this simulation study, we investigate the performance of the proposed approach when the repeated measurements are correlated. In particular, we generate data from the same process as in the Case 1 balanced setting, but allow the random errors to have a certain correlation structure. Specifically, we generate ε_i ~ N(0, 0.4² R_0), where R_0 is either AR(1) or exchangeable (EX) with a 0.7 correlation coefficient.
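The two candidate structures for R_0 can be constructed as follows (a generic sketch):

```python
import numpy as np

def ar1_corr(m, rho):
    """AR(1) correlation matrix: R[j, k] = rho^|j - k|."""
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def exchangeable_corr(m, rho):
    """Exchangeable correlation matrix: rho off-diagonal, 1 on the diagonal."""
    return np.full((m, m), rho) + (1 - rho) * np.eye(m)
```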
Since the original NPG approach is based on the independence assumption for errors within the same subject, we modify the proposed approach by utilizing a working correlation structure. The corresponding objective function becomes

(1/2) (y − Bβ)^T R^{−1} (y − Bβ) + (λ1/2) Σ_i ||D_d β_i||² + Σ_{l∈L} ρ(||β_i − β_j||, λ2),

where R is a block-diagonal matrix whose diagonal blocks are a given working correlation matrix for each subject, such as an AR(1) or exchangeable correlation structure. Notice that the modified NPG method with an independent working correlation is equivalent to the original NPG, where R is the identity matrix. In practice, we can estimate the working correlation coefficient using the residuals obtained from the NPG approach. We denote the modified NPG methods as NPGex and NPGar, corresponding to the exchangeable and AR(1) working correlation structures, respectively. Tables 3 and 4 indicate that the proposed method performs similarly using either the AR(1) or exchangeable working correlation structure, and outperforms the other methods in distinguishing different functional patterns. In addition, we also calculate the average mean square error (AMSE) of the outcome predictions in Tables 3 and 4, showing that misspecifying the error correlation structure may lead to a slight loss of estimation efficiency. In fact, [28] also notes that improvement can be gained by utilizing the correct correlation structure when the correlation coefficient is large. However, this does not much affect the accuracy of identifying the true group membership in this simulation setting.

An application to IRI marketing data
In this section, we apply the longitudinal clustering methods to an IRI marketing dataset. This dataset was developed by the SymphonyIRI Group [3] and consists of 11 years (2001–2011) of weekly sales data for packaged goods from chain grocery and drug stores in 47 markets across the United States. To better capture the overall sales pattern of each product category over the 11 years at the market level, we aggregate the data from the weekly to the yearly level, from specific stores to the geographic market level, and from individual items to the product category level. In this analysis, we focus on the Los Angeles market, where we aim to cluster 26 packaged goods categories into subgroups based on their longitudinal unit-sales trajectories; the trajectories are standardized to remove the mean and control the variance. We compare the clustering results of the proposed method (NPG), the mixed-effects (Mixed) method, and the K-means and Gaussian mixture methods applied to B-spline coefficients fitted at the subject level. The MFDA method is excluded from the comparison because of severe computational instability. The NPG approach uses 3 knots with B-splines of order 3, and the Mixed approach specifies 5 knots based on its implementation guidelines.
Figures 3 and 4 provide the clustering results and the fitted functional curves for the NPG and Mixed approaches. Specifically, the NPG method identifies two subgroups of trajectories, where cluster 1 contains food-related packaged products: beer/ale/alcoholic, cider, coffee, cold cereal, frozen dinners/entrees, frozen pizza, hotdogs, mayonnaise, milk, mustard/ketchup, peanut butter, salty snacks, soup, spaghetti/Italian sauce, sugar substitutes, and yogurt. Cluster 2 contains non-food-related products: blades, cigarettes, deodorant, diapers, facial tissue, paper towels, photography supplies, razors, shampoo, toilet tissue, and toothpaste. In contrast, the Mixed method separates the product categories into 3 clusters, where cluster 3 contains the food products mayonnaise, milk, and mustard/ketchup, along with the non-food product toothpaste; the remaining two clusters are subsets of the two NPG clusters. In addition, the fitted curves for the two clusters in Figures 3 and 4 show similar patterns across the two methods.
The Gaussian mixture model fails to identify any informative subgroups, returning a single cluster, while the K-means method requires random initialization. We implement 10 random picks of initial centers for the subject-wise fitted B-spline coefficient vectors; the number of clusters selected by K-means ranges from 2 to 5, and when the Gap statistic selects 2 clusters under a given random initialization, K-means produces the same clustering result as the NPG method.

Discussion
In this article, we propose a nonparametric pairwise-grouping approach to cluster longitudinal trajectories over time. The new approach captures underlying functional patterns through utilizing the nonparametric B-spline method. In addition, we subgroup subjects through penalizing pairwise distances of B-spline coefficient vectors, which borrows between-subject information to better recover the true functions. The proposed NPG approach has the advantage of avoiding overfitting, compared to existing methods which approximate the underlying functions separately. This strategy works effectively when some of the repeated measurements are missing.
The proposed approach takes advantage of the MCP penalty, which is nearly unbiased and also leads to a sparse solution. This is especially important as we select the optimal tuning parameters through a model selection criterion BIC, which relies on model estimation accuracy. Note that other non-convex penalty functions, such as SCAD [10] or TLP [23], can also be applied here to utilize the unbiasedness property. However, the implementation and convergence property of the ADMM algorithm based on other viable penalty functions may require further investigation.
In this paper, although we assume an independence structure for the random errors within subjects, a modified approach utilizing a working correlation structure is also proposed to account for the correlation information. We show in simulation studies that the modified approach has clustering performance similar to the NPG approach assuming independence, but leads to improved estimation efficiency. The theoretical properties of the modified NPG method need further investigation when correlation information is of interest.
The proposed approach can also be extended to subgroup identification incorporating multiple covariates under generalized linear models. One potential research topic is to extend the proposed framework for binary longitudinal outcomes, and to identify the subgroups of treatment effects. Identifying subgroups for binary data could be quite challenging, as specifying the proper loss function while taking the correlation within subjects into account is nontrivial.