lcc: an R package to estimate the concordance correlation, Pearson correlation and accuracy over time

Background and Objective Observational studies and experiments in medicine, pharmacology and agronomy are often concerned with assessing whether different methods/raters produce similar values over the time when measuring a quantitative variable. This article aims to describe the statistical package lcc, for are, that can be used to estimate the extent of agreement between two (or more) methods over the time, and illustrate the developed methodology using three real examples. Methods The longitudinal concordance correlation, longitudinal Pearson correlation, and longitudinal accuracy functions can be estimated based on fixed effects and variance components of the mixed-effects regression model. Inference is made through bootstrap confidence intervals and diagnostic can be done via plots, and statistical tests. Results The main features of the package are estimation and inference about the extent of agreement using numerical and graphical summaries. Moreover, our approach accommodates both balanced and unbalanced experimental designs or observational studies, and allows for different within-group error structures, while allowing for the inclusion of covariates in the linear predictor to control systematic variations in the response. All examples show that our methodology is flexible and can be applied to many different data types. Conclusions The lcc package, available on the CRAN repository, proved to be a useful tool to describe the agreement between two or more methods over time, allowing the detection of changes in the extent of agreement. The inclusion of different structures for the variance-covariance matrices of random effects and residuals makes the package flexible for working with different types of databases.


INTRODUCTION
Agreement indices are generally used when the same experimental unit is measured by at least two methods or observers (King et al., 2007). Measurements of agreement between raters or methods can be used in any field to explore their interchangeability considering a certain degree of agreement between the measurements they provide (Barnhart & Williamson, 2001;Chen & Barnhart, 2013). In biomedical sciences it is often necessary to study the reproducibility of continuous measurements made using specific diagnostic tools or methods, and that measurements can be taken over the time on the subjects of interest, such as in the studies of Pandit, Chair & Schuller (2019); Shinar et al. (2019) and Loecher et al. (2019).
The concordance correlation coefficient (CCC) introduced by Lin (1989) is a statistic commonly used to measure the agreement between methods when the response is continuous. Let Y 1 and Y 2 be two random variables with a joint normal distribution s 12 s 12 s 2 2 ! : Here the expected value of the squared difference between Y 1 and Y 2 can be used as an agreement value. However, it ranges from 0 (perfect agreement) to infinity, which makes its interpretation difficult. Lin (1989) proposed standardizing this agreement index so that its values lie between -1 and +1: Þ, s 2 2 ¼ Var Y 2 ð Þ and s 12 ¼ Cov Y 1 ; Y 2 ð Þ. This coefficient takes the value -1 when there is perfect disagreement, zero when there is no agreement, and +1 when there is perfect agreement. Moreover, ρ, the Pearson correlation coefficient (|ρ| ≤ 1), measures how far each observation deviated from the best-fit line (a precision measure) and C b , the accuracy (0 < C b ≤ 1), measures how far the best-fit line deviates from the 45 line through the origin, defined as where v = σ 2 1 /σ 2 2 is a scale shift and u ¼ ðm 1 À m 2 Þ= ffiffiffiffiffiffiffiffiffiffi s 1 s 2 p is a location shift relative to the scale (Lin, 1989). Note that C b = 1 indicates no deviation from the 45 line. In an attempt to improve the inferential ability, Liao (2003) extended the concordance correlation coefficient by using two random paired measurements to the identity line.
When pairs of samples Y i1k ; Y i2k ð Þ , for i = 1, 2, …, N subjects and k = 1, 2, … K repeated measures, corresponding to observations on the same subject or experimental unit over time, the use of generalized multivariate analysis of variance to compute a weighted version of the CCC for repeated measurements is recommended (Chinchilli et al., 1996). Moreover, this coefficient has also been expanded to assess the agreement between more than two methods (King & Chinchilli, 2001).
When it is necessary to add extra variability sources due to within-subject measurements and/or other covariates in the model, the CCC can be estimated through the variance components (VC) of a mixed-effects model (Carrasco, King & Chinchilli, 2009). The advantages of the mixed-effects models are that they give a general approach to analyse repeated measures and unbalanced data; they allow for the inclusion of different variance-covariance structures for both random effects and sampling errors. The restricted maximum likelihood (REML) approach can be used to obtain unbiased estimates of the VC.
Nevertheless, sometimes the researcher is not interested in reducing the CCC for repeated measurements to a single value, as proposed by Carrasco, King & Chinchilli (2009) and Carrasco et al. (2013), but in describing the extent of agreement between methods over time, as discussed by Liao (2005) in a non-parametric case. However, in the parametric case, we can consider a linear or non-linear function of the time and/or covariates in the model to describe the response variable, as proposed by Rathnayake & Choudhary (2017) and Oliveira, Hinde & Zocchi (2018). Here, we present the implementation of this methodology as an R (R Core Team, 2019) package lcc (Oliveira et al., 2019), which provides functions for estimating the longitudinal concordance correlation (LCC) between methods based on variance components and fixed effects using polynomial mixed-effects models. It also computes estimates for the longitudinal Pearson correlation (LPC), which measures the precision, and the longitudinal bias correction factor (LA), which provides an accuracy measure.
The lcc() function gives fitted values and non-parametric bootstrap confidence intervals for the LCC, LPC and LA statistics. Moreover, they can be estimated using different structures for the variance-covariance matrices of the random effects and different variance functions to model heteroskedasticity of within-group errors, with the option of using time as a variance covariate.
The remainder of the article is organized as follows: Section 1 introduces the theoretical definition of the LCC. Section 2 introduces the lcc() function input and output, describing in detail the various options as well the summary() and other generic methods. Section 3 briefly discusses model specification, which is illustrated more extensively in Section 4 using three real data examples. The first and third shows an application in biomedical science, while the second from food science was the motivation for the development of the methodology and software and nicely shows the utility of the approach. Section 5 provides a discussion about the lcc package, and the importance of LPC and LA. Finally, Section 6 presents some final remarks about the lcc package.

MODELS AND COMPUTATIONAL METHODS
Suppose a researcher is interested in investigating the extent of agreement between two or more methods, indexed as j = 1, 2, …, J. Let N be the number of subjects in the experiment or observational study, indexed as i = 1, 2, …, N, and suppose that each subject is observed n i times (visits) with associated nuisance factors and/or covariates, these could include, for example, the effect of block or group. Let y ijk be a realization of a random variable Y ijk measured on the i-th subject by the j-th method at time t k , k = 1, 2, …, n i , with additional subject level (nuisance) covariates x i . Here t k assumes values of the time covariate t 2 T , where T denotes the set of measurement times. Hence, the linear mixed-effects model including a polynomial function of time per method, random effects of subject, as well random effects for as subject/time interactions, is given by where h = 1, 2, …, q, q + 1, …, p is an index identifying the degree of the polynomial, with q ≤ p; Y ijk is the response measured on the i-th subject by the j-th method at time t ik ; t ik represents the time (seconds, minutes, days, etc) at which the i-th individual was observed; γ is a vector of fixed effect parameters for the subject level covariates; T is a (q + 1)-dimensional vector of random effects with mean vector 0 and covariance matrix G; ε i is a (J × n i )-dimensional error vector assumed to be independent for different i and independent of the random effects, with independent entries over j and k, with mean vector 0 and diagonal variance matrix R i . Under model (1), the longitudinal concordance correlation (LCC) function between methods j and j′, j ≠ j′, is given by where S jj 0 t k ð Þ ¼ t k b j À b j 0 is the systematic difference between methods j and j′; is a variance function assumed continuous in δ; δ j is a vector of variance parameters for observations measured by j-th method or observer. We have that r ðpÞ jj 0 t k ð Þ is the longitudinal Pearson correlation (LPC) that measures how far each observation deviated from the best-fit line at a fixed time t k = t, given by C jj 0 t k ð Þ, the longitudinal accuracy (LA), measures how far the best-fit line deviates from the 45 line at a fixed time t k = t, given by denotes the scale shift at time t k = t, and denotes the location shift at time t k relative to the scale (Lin, 1989;Oliveira, Hinde & Zocchi, 2018).

Estimation and inference
Point estimation and statistical inference for the LCC r jj 0 t k ð Þ has been proposed by Oliveira, Hinde & Zocchi (2018). It is estimated by replacing β and the variance components by their respective REML estimates: Since the variance components are estimated using the REML approach, their estimates are asymptotically normally distributed and the bias is smaller when compared to the maximum likelihood (ML) approach. Moreover, Oliveira, Hinde & Zocchi (2018) showed a satisfactory performance of the LCC even in settings with severe imbalance and only a small number of subjects (N = 20).
A confidence interval (CI) for r jj 0 t k ð Þ can be constructed using a nonparametric bootstrap based on M (e.g., 5,000) bootstrap samples with either the percentile method (recommended for N ≤ 30) or, otherwise, a normal approximation confidence interval, as described by Oliveira, Hinde & Zocchi (2018).
When we use a normal approximation for the CI, the Fisher Z-transformation given by should be used with the normal approximation made to the empirical distribution of r Ã j;j 0 t k ð Þ (Lin, 1989). Consequently, the confidence limits can be estimated using the bootstrap estimator of r Ã j;j 0 t k ð Þ for a fixed time t k = t given by Thus, an approximate bootstrap confidence interval of level (1a) for ρ j,j′ is LB; UB ½ , where where za 2 and z ð1À a 2 Þ denote the a 2 and ð1 À a 2 Þ percentiles of the standard normal distribution.
On the other hand, the CI based on the percentile method uses the percentiles of the bootstrap distribution of b r j;j 0 t k ¼ t ð Þdirectly and is given by Þare the ð100 Â a 2 ÞÀth and ð100 Â 1 À a 2 ÞÀth empirical percentiles of the b r ðmÞ j;j 0 t k ¼ t ð Þvalues, m = 1, 2, …, M. If the bootstrap distribution of r Ã j;j 0 t k ¼ t ð Þis approximately normal, then both proposed methods will give very similar confidence intervals as N increases. Inference for C jj 0 t k ð Þ can be performed in a similar way as to that presented for the LCC. Since C ðj;j 0 Þ ð1Àa=2Þ t k ¼ t ð Þbelongs to the interval [0, 1], we suggest the use the arc-sine transformation instead of the Fisher Z-transformation, nor logistic transformation (used by Oliveira, Hinde & Zocchi (2018)) to approximate the distribution of C ðj;j 0 Þ ð1Àa=2Þ t k ¼ t ð Þby a normal distribution. Thus, the confidence limits can be estimated using the bootstrap estimator of C Ã j;j 0 t k ð Þ for a fixed time t k = t given by and standard deviation of the bootstrap distribution of b C Ã j;j 0 t k ð Þ for a fixed time t k = t is given by where za 2 and z ð1À a 2 Þ denote the ð a 2 Þ and ð1 À a 2 Þ quantiles of the standard normal distribution. Bootstrap percentile intervals are calculated in the obvious way from the bootstrap values b C ðmÞ j;j 0 ðtÞ, m = 1, 2, …, M.

OVERVIEW OF THE PACKAGE lcc AND R SYNTAX
This section provides some details on the implementation of the function lcc and explains its technical arguments, whose default settings were carefully chosen. The package is freely available for download from the CRAN website https://CRAN.R-project.org/package=lcc, and installation can be performed using R> install.packages("lcc") R> library(lcc) The lcc package has 21 arguments that are briefly summarised in Table 1. We present a more detailed description of some arguments below: 1. data: must be a data frame containing the following variables: response, subject identification, method, and time; 2. method: name of the method variable in the dataset. The lcc package recognizes the first level of the variable associated with this argument as the gold-standard method, and then compares it with all other levels; 3. qr: when we specify qr = 0 a random intercept is included in the polynomial model while qr = 1 specifies random intercepts and slopes. If qr = qf = q, with q ≥ 1, all polynomial terms are specified to have random effects at the individual level.
4. time_lcc: a named list with values for arguments time, from, to, and n used in the time_lcc() function to generate a regular sequence merged with specific or experimental time values of the time variable used for LCC, LPC and LA predictions. Argument time is a vector of specific or experimental time values of a given length, where the experimental time values are used as default; from and to are used to define, respectively, the starting and end values of the time variable, and n is used to define the desired length of the sequence. We recommend a grid t ¼ t 1 ; t 2 ; . . . ; t n Ã ð Þ T of n Ã points in T to construct the agreement curve and confidence intervals. In practice, n Ã between 30 and 50 is generally adequate. Example: R> Time <-seq(0,20,1) R> str(tk <-time_lcc(time=Time, from=min(Time), to=max(Time), + n=30)) num [1:49] 0 0.69 1 1.38 2 … 5. pdmat: the lcc package provides six standard classes of positive-definite matrix structures that can be included in the model to estimate the LCC, LPC and LA statistics. Available standard classes are pdSymm, pdLogChol, pdDiag, pdIdent, pdCompSymm and pdNatural. More information about these classes are available in Pinheiro & Bates (2000).
6. var.class: a class of variance functions that are used to model the variance structure of within-group errors using covariates (Pinheiro & Bates, 2000). We generalize this class as where g(·) is the variance function assumed continuous in δ; t k is the time covariate and δ is a vector of variance parameters. The lcc package provides two different standard variance functions classes that are included in the nlme library (Pinheiro et al., 2017).
The first one is the varIdent class that represent a variance model with different variances for each level of a stratification variable s, s = 1, 2, …, S, given by Var e ijk À Á ¼ s 2 e d 2 s ijk : As we have S + 1 parameters to represent S variances, we need to add the restriction δ 1 = 1, and consequently δ′ s Ã = δ s Ã /δ 1 , s Ã = 2, 3, …, S and δ′ s Ã >0. Here each level of method/ observer or time represents a stratum of a homogeneous subgroup.
The second variance function is an exponential function of the variance covariate, the varExp class, represented as where δ sijk is unrestricted, so the variance model (4) allows Var e ijk À Á to increase or decrease over time.
7. weights.form: a varFunc class object, representing a constructor to the form argument in the nlme library. The weights.form argument is based on a one-sided formula specifying a variance covariate and, optionally, a grouping factor for the variance parameters. Moreover, this argument must be specified only when var.class is specified as well.
The first class varIdent represents a variance model with different variances for each level of the grouping factor and has two options of weights.form in the lcc package: (a) "method": specifies a variance model with different variances for each level of factor method/observer and is given by . . . ; J;

NULL No
weights.form Formula An one-sided formula specifying a variance covariate and, optionally, a grouping factor for the variance parameters in the var.class. If var.class = varIdent, the form "method", (or ∼ 1 | method), or "time.ident" (∼ 1 | time), must be used. If var.class = varExp, the form "time" ( ∼ time), or "both" (∼ time | method), must be used. where g method j ; d j À Á ¼ d 2 method j is the variance function, and δ methodj is the variance parameter for observations measured by the jth method. The form argument in the varFunc is form = ∼ 1|method;

NULL
(b) "time.ident": specifies a variance model with different variances for each level of stratification in the time variable and is given by where g t k ; d k ð Þ ¼d 2 t k is the variance function, and δ tk is the variance parameter for observations measured at time t k = t, with t 2 t 0 ; t K ½ and t 0 ≥ 0. The form argument in the varFunc class is form = ∼ 1|time.
The class varExp represents a variance model whose variance function g(.) is an exponential function of the variance covariate. This class has also two options of weights. form in the lcc package: (a) "time": specifies a variance model given by where the variance function g t k ; d ð Þ ¼ exp 2dt k ð Þ is an exponential function of the time t k = t; and δ is the variance parameter. The form argument in the varFunc class is form = ∼ time; (b) "both": specify a variance model for each level of the factor method given by where the variance function g t k ; method j ; d À Á ¼ exp 2d method j t k À Á is an exponential function of the time t k = t for each level of method; and δ methodj is the variance parameter for the jth level of method. The form argument in the varFunc class is form = ∼ time| method; The lcc package uses the REML method as default because it is less biased, less sensitive to outliers, and deals more effectively with high correlations when compared to standard ML estimation (Harville, 1977;Giesbrecht & Burns, 1985). However, we offer the user the possibility to change the estimation method to ML because this approach should be used when comparing models with nested fixed effects but with the same random effects structure. Furthermore, the package depends on the nlme (Pinheiro et al., 2017) and ggplot2 (Wickham, 2009) packages, and imports some functions from packages gdata (Warnes et al., 2017), gridExtra (Auguie & Antonov, 2017) and hnp (Moral, Hinde & Demétrio, 2017).

Generic functions and outputs
A typical call of the lcc function is similar to a call to lme as the LCC estimation is based on a mixed-effects regression model. Several variations in the specifications of linear mixed-effects models to estimate the LCC are possible, and we can query the fitted lcc object through different generic functions. Table 2 gives details of a set of S3 generic extractor functions for objects of class lcc.
The output of the summary() function includes the values of Akaike Information Criterion (AIC) (Akaike, 1974), the Bayesian Information Criterion (BIC) (Schwarz, 1978), log-likelihood value, and a goodness of fit measurement gof, which is calculated using the concordance correlation coefficient (Lin, 1989) between fitted values extracted from the mixed-effects model and observed values. This measure can be used, with care, to describe the overall agreement between observed and fitted values, where a value equal to -1 represents a perfect disagreement between them, zero represents no agreement, and +1 perfect agreement. Clearly, a high model performance is related with a high positive value of gof (generally between 0.8 and 1).
The fitted curves of LCC, LPC, or LA values versus the time covariate, as well as their bootstrap confidence intervals, can be visualised through the lccPlot() function, which is specified as lccPlot(obj, type, control), where obj is an object of class lcc; type specifies required output that could be type="lcc" for the LCC, the default, type="lpc" for the LPC, or type="la" for the LA statistics; and control is a list of control values or character strings returned by the plotControl() function used to modify the plot structure. This function uses the ggplot2 package internally to build the final plot, where predicted values are joined by lines, sampled observations are represented by circles, and confidence intervals by a ribbon (grey as default) defined by its lower and upper bounds. summary() Returns an object of class summary.lcc containing the relevant summary statistics (which has a print() method). If type = "lcc" it provides information about r jj 0 t k ð Þ, and if components = TRUE in the lcc() function, also provides information about r ðpÞ jj 0 t k ð Þ, and C jj 0 t k ð Þ. If type = "model" it provides additional information about the linear mixed-effects fit. The default is type = "model" .
anova() Summarise and compare likelihoods of fitted models from lcc objects coef() The fixed effects estimated and corresponding random effects estimates are obtained at subject levels less or equal to N. The resulting estimates are returned as a data frame, with rows corresponding to subject levels and columns as coefficients.
The output depends on the argument type, where type = "lcc" (the default), type = "lpc", or type = "la" gives output for b r jj 0 t k ð Þ, b r

SPECIFYING MODELS IN THE lcc() FUNCTION
In the lcc package, to describe the LCC we need to specify the subject, response, method and time variables, a polynomial mixed-effect model, and the data. These arguments are specified through an easy-to-use syntax. Consider a first degree polynomial model with random intercepts for a continuous dependent variable y observed on N subjects (i = 1, 2, …, N) using J methods at times t k (k = 1, 2, …, n i ). Such model can be written as Thus, the LCC based on fixed effects and variance components at time t k is given by and the syntax to specify this model in the lcc() function is R> library(lcc) R> data(simulated_hue_block) R> m1 <-lcc(data = simulated_hue_block, subject = "Fruit", + resp = "Hue", method = "Method", time = "Time", + qf = 1, qr = 0) where qf = 1 represents the polynomial degree for the fixed effects, and qr = 0 specifies a random intercepts model. Here, the names of the columns in the dataframe data are supplied as strings to the arguments of the lcc() function.
Suppose now that the experimental design in the previous example was a randomized complete block design. Then, the fixed effect of blocks can be included in that model by specifying the covar argument, that is, R> m2 <-update(m1, covar ="Block") If we suppose different variances for each level of the method factor, the corresponding model would include a variance function such as g d j À Á ¼ s 2 e d 2 j , and the syntax would then be R> m3 <-update(m2, var.class = varIdent, weights.form = "method", + lme.control = list(opt="optim")) To visualize the summary and graphical output of model m3 we call summary(m3) and lccPlot(m3), respectively.
Many other possible models can be built to estimate the LCC through the function lcc() options, see Section 1. Model selection can be performed using likelihood-ratio tests for nested models; or using the AIC or BIC criteria, for example, R> AIC(m2, m3); BIC(m2, m3); anova(m2, m3)

EXAMPLES
We will now use three example datasets, drawn from Lloyd et al. (1998), Martin et al. (2002) and Oliveira, Hinde & Zocchi (2018), to illustrate the implemented functions in the following sections of this article. The first dataset is an observational study of a cohort of 82 adolescent females to assess the percentage body fat and the aim is to determine the agreement profile between measurements made over time using a skinfold caliper and dual-energy X-ray absorptiometry. The second is a canonical example from agriculture and was the motivation for the original development of these methods; here the goal is to investigate if a colorimeter can compete with a digital scanner in measuring the peel hue of papayas over time. The final example is again related to medicine and the goal here is to verify the agreement between cortisol concentration measured on patients every hour and every 2 h.

Percentage body fat dataset
These data came from a longitudinal observational study conducted as part of the Penn State Young Women's Health Study (Lloyd et al., 1998). Percentage body fat was measured using skinfold calipers and dual-energy X-ray absorptiometry (DEXA) on a cohort of 82 adolescent white females attending public schools in Pennsylvania. The initial visit occurred at age 12 (baseline) and subsequent visits occurred every 6 months, in which one skinfold caliper and one DEXA measurement were taken to assess the percentage of body fat. As the skinfold measurement is the most frequently used method for laboratory and field studies, the objective was to determine the agreement profile between the skinfold caliper and DEXA measurements. Figure 1 shows that the agreement between skinfold and DEXA apparently decreases over the visits. King et al. (2007) explained that this phenomenon may occur because the skinfold method is only capable of detecting subcutaneous fat, while DEXA detects subcutaneous, breast, lower body and visceral fat. Moreover, female adolescents may have a considerable fat increase in breast, lower body and/or visceral fat over this age range (King et al., 2007). Consequently, this reinforces the interest in estimating the agreement profile between these methods for the body fat measurements over ages ranging from 12.5 to 13.5 years old, rather than summarizing it in a single coefficient as proposed by King et al. (2007). Hence, we created a new variable called TIME given by 12 × (age -12), which represents the time in months after the first visit (baseline). Now let y ijk be the measurement taken on the i-th individual, by the j-th method at the k-th visit. We then fit a random intercepts and slopes linear regression model, given by is the half-vectorization of a symmetric matrix G formed from only the lower triangular part). Using model (4), we estimate the LCC, LPC and LA statistics as well as their 95% bootstrap confidence intervals based on 10,000 pseudo-samples using the lcc() function: R> data(bfat, package = "cccrm") R> library(dplyr) R> bfat < -bfat %>% + mutate(VISITNO = replace(VISITNO, VISITNO == 2, 12.5)) %>% + mutate(VISITNO = replace(VISITNO, VISITNO == 3, 13)) %>% + mutate(VISITNO = replace(VISITNO, VISITNO == 4, 13.5)) %>% + mutate(SUBJECT = factor(SUBJECT)) %>% + mutate(MET = factor(MET, labels = c("1 hour", "2 hours"))) R> bfat$TIME < -12 Ã (bfat$VISITNO -12) R> set.seed(134) R> m.bfat.1 <-lcc(data = bfat, subject = "SUBJECT", resp = "BF", + method = "MET", time = "TIME", qf = 1, qr = 1, + components = TRUE, ci = TRUE, nboot = 10000) Convergence error in 902 out of 10,000 bootstrap samples. The output of model m.bfat.1 indicates that in 902 (9.02%) of the pseudo-samples, the likelihood maximization algorithm failed to converge, where most of these failures were a consequence of specific bootstrap sample patterns. An alternative procedure to decrease the percentage of convergence failures is by increasing the iteration limit and/or changing the optimization method from nlminb to optim. In the lcc() function, the user can include a list of optimisation control additional arguments in the lme.control() function: R> set.seed(134) R> m.bfat.2 <-update(m.bfat.1, lme.control = list(opt = "optim")) Convergence error in 76 out of 10,000 bootstrap samples. We may then plot the LCC, LPC and LA with their respective confidence intervals by executing R> lccPlot(m.bfat.2) R> lccPlot(m.bfat.2, type = "lpc") R> lccPlot(m.bfat.2, type = "la") The estimates of LCC, LPC and LA, their confidence intervals, and figures indicate that the agreement and accuracy profiles between the skinfold caliper and DEXA measurements decrease over time, while the precision profile, represented by LPC, remains constant (Fig. 2). Therefore, a first conclusion is that the agreement profile decreases over time because the accuracy is decreasing.
Moreover, there is a moderate to weak agreement profile, where the greatest LCC estimate was 0.6654 at age 12.5 (95% CI [0.5688-0.7395]) and the smallest LCC estimate was 0.4588 at age 13.5 (95% CI [0.3354-0.5599]). This result reinforces the discussion presented by King et al. (2007), who provided physiological explanations for this phenomenon due to fact that the skinfold method is not capable to detect breast, lower body and visceral fat, which increases over this age range. Clearly, as the skinfold method detects less fat than the other, the accuracy between them tends to decrease since the expected value difference is greater (Fig. 2C). The concordance correlation coefficient between fitted values of the mixed-effects model and observed values is presented as goodness of fit (gof) and was approximately 0.92. This result shows that the model can reproduce the observed values quite well.

The papaya peel hue dataset
In commercial fruit classification, one of the most important variables is the peel hue because it is used to determine fruit ripeness (Mendoza & Aguilera, 2004;Oliveira, Zocchi & Jacomino, 2017). This is very important to plan harvesting procedures. In an experiment described in Oliveira, Hinde & Zocchi (2018), the hue component was measured for a sample of 20 papaya fruits using a flat-bed scanner (HP Scanjet G2410) and a colorimeter (Minolta CR-300) (Konica Minolta, 2003). The hue of each fruit was measured daily using both devices for a period of 15 days, where four equidistant points on the equatorial region were observed using a colorimeter, and 1,000 points over the same region were observed using a scanner. The circular mean hue was calculated for the ith fruit, i = 1, 2, . . . , N, measured by the jth method, j = 1, 2 at time t ik , 390 k = 1, 2, . . . , n i . As the multivariate von Mises distribution of the hue is highly concentrated around its overall mean, we assume that its distribution can be treated as a normal distribution with mean m h = 391 and covariance matrix R = I σ 2 ε . The aim of the agreement study here was to determine whether the scanner can reproduce the mean hue measurements taken by the colorimeter on the same fruit over time. The colorimeter is faster and easier to use than a flatbed scanner. Additionally, each image obtained with the scanner needs to processed by an image manipulation program to select the object and extract its pixel-by-pixel information. Our major interest here is in the longitudinal accuracy profile, because high values over time would suggests that the fruit's topography does not influence the measurements taken by the scanner.
We start by making a plot of individual profiles grouped by measurement device, as well as a scatterplot of the hue data (Fig. 3). We fit a second-degree polynomial model over time for each fruit considering all observations taken by both devices, and obtain the 95% confidence intervals for the coefficients (Fig. 3C). Apparently, there is a moderate agreement between the scanner and the colorimeter, which increases as the mean hue decreases. However, this could be due to the smaller number of fruits at the end of the experiment (fruits that presented disease had to be dropped out of the study). Let y ijk be the peel hue measured on fruit i, using method j at time point k. We start by fitting a second degree polynomial mixed-effects model with random intercepts, linear and quadratic coefficients, written as (5), the LCC is given by We can fit this model to estimate the LCC, LPC and LA statistics as well as to compute their 95% bootstrap confidence intervals based on 10,000 pseudo-samples using the lcc() function directly: R> data(hue) R> set.seed(6836) R> m.hue.2 <-lcc(data = hue, subject = "Fruit", resp = "H_mean", The model used to estimate r jj 0 t k ð Þ as well as its sampled and fitted values can be extracted by using summary(m.hue.2, type = "model") and summary(m.hue.2, type = "lcc"), respectively. Moreover, a graphical representation of fitted values and confidence intervals for LCC, LPC and LA can be obtained by executing R> lccPlot(m.hue.2) R> lccPlot(m.hue.2, type = "lpc") R> lccPlot(m.hue.2, type = "la") Apparently, the estimated LCC increases over time (Fig. 4A). However, note that it is necessary to check whether the model assumptions were fulfilled because the estimates for the LCC and its bootstrap confidence intervals may be biased under a misspecified model. We therefore checked (i) the normality assumption for the errors, by producing a normal plot of the within-group standardized residuals (Fig. S1A), which indicates that this assumption for the within-group errors is almost plausible, and is not far from a normal distribution; (ii) the homoscedasticity over time was evaluated via a plot of the standardized residuals versus time (Fig. S1B), which indicates an apparent residual correlation for observations taken by the colorimeter and greater between-subject variance for observations taken by the scanner (Fig. S2); (iii) the normality assumption for the random effects (Fig. S1C), which are verified by producing a normal plot for b 0i ; b 1i and b 2i . Additionally, the goodness of fit (gof) was 0.992, indicating a high concordance among the model fitted values and observed values. Thus, we update the model m.hue.2 to include different variances for each level of the factor "method", where the variance function is given by: To ensure identifiability we assume that d 1 ¼ 1. We also created a regular sequence from the time variable that can be used to make predictions R> lcc_time <-with(hue, list(time = Time, from =min(Time), + to=max (Time), n=50)) This model can be specified in the lcc() as R> set.seed(6836) R> m.hue.3 <-update(m.hue.2, var.class = varIdent, weights.form = "method", + time_lcc = lcc_time, + lme.control = lmeControl(opt = "optim")) Convergence error in 1187 out of 10000 bootstrap samples. The result shows that we reject H 0 in favour of H a at a significance level of a = 0.05, that is, the inclusion of the function gðd j Þ ¼ d 2 j was significantly important in explaining the extra variability between observations taken at different times.
Moreover, the gof between fitted and observed values for m.hue.3 model has, practically, the same value as presented for the m.hue.2 model. R> summary(m.hue.3, type = "lcc")$gof [1] 0.9915905 Although the parameter δ 2 2 was important to explain the variability by method, we can see in Fig. S3 that the model assumptions were still not completely fulfilled because there is a possible correlation among residuals for the colorimeter methodology. However, this model is more plausible than the first one. The sample semivariogram estimate is presented in Fig. S3B and it appears to vary non-randomly around 0.9. Further studies involving the inclusion of correlation structures for the within-group residuals to compute the longitudinal concordance correlation function are still in development.
The agreement profile changes over time, being smaller at the beginning of the experiment and increasing to values close to 1 (Fig. 5). If we consider values above 0.80 for the lower bound of the CI as an indication for interchangeability between the use of the two methods, the colorimeter could be used from the 12th day onwards.

The blood draw dataset
The blood draw dataset was used as an example in the cccrm package developed by Carrasco et al. (2013). This dataset comes from a study conducted by the Asthma Clinical Research Network (ACRN) (Martin et al., 2002). In this double-blinded clinical trial, 144 subjects were randomized to one of six inhaled corticosteroid combinations, and the primary aim of the study was to estimate dose-response curves with respect to adrenal suppression. After two weeks, the subjects were admitted for overnight testing once a week, for the next five weeks (visits). Blood samples were collected hourly between 8pm and 8am. Then, the plasma cortisol area under the curve (AUC) was calculated using the trapezoidal rule. A secondary objective here was to assess the agreement of the results from blood sampling performed hourly or every two hours, when calculating the plasma cortisol AUC. As an example, we used all individual profiles whose expected value can be described using a second or lower degree polynomial mixed-effects model: R> data(bdaw, package = "cccrm") R> bdaw$SUBJ <as.factor(bdaw$SUBJ) R> bdawMET <as.factor(bdaw$MET) R> levels(bdaw$MET) <-c("1 hour", "2 hours") R> length(unique(bdaw$SUBJ)) R> library(nlme) R> fit_list <-lmList(AUC~poly(VNUM, 4) | SUBJ, data = bdaw) R> int <-intervals(fit_list) R> zero_included <-function(x) { + flag <-min(x) < 0 & max(x) > 0 + return(flag) + } R> selected_subj<-names ( + which(apply(int[,,4], 1, zero_included) & + apply(int [,,5], 1, zero_included))) R> bdaw_subset <-subset(bdaw, SUBJ %in% selected_subj) The scatterplot of the AUC taken every two hours as a function of the AUC taken each hour and plots of the 19 selected individual profiles are presented in Fig. 6.
There seems to be a moderate to strong agreement between the plasma cortisol AUC measurements from blood draw samples taken hourly and every two hours (Fig. 6A). Furthermore, we can also see high variability between subjects and that the AUC decreases over time for some subjects (Fig. 6B). We begin by fitting a first degree polynomial model with a subject random intercept and slope model. R> m.bw.1 <-lcc(data = bdaw_subset, subject = "SUBJ", + resp = "AUC", method = "MET", time = "VNUM", + qf = 1, qr = 1) R> summary(m.bw.1, type = "lcc")$gof [1] 0.8850628 This model gives only a moderate fit to the data and this is confirmed by the estimated CCC between fitted and sampled values of 0.885 (Fig. 7C). Two possible reasons are (i) we need a higher degree polynomial mixed model to correctly describe some subject profiles, and/or (ii) a possible heteroscedasticity across time, potentially caused by three somewhat different subject profiles, that should be included in the model (Fig. 7B). In addition, the normality assumptions for the within group error and random effects were easily checked by producing the normal plot with simulation envelope (Figs. 7E and 7F) and seem to be broadly plausible.
R> plot(m.bw.1, which = c(1, 2, 4, 5, 6)) We now fit a second degree polynomial model with random subject effects for all coefficients and compute the 95% bootstrap confidence intervals based on 10.000 bootstrap samples for LCC, LPC and LA components.    Figure 5 shows the fitted LCC, LPC, and LA for concentration of plasma cortisol AUC between measurements taken every hour and taken every 2 h and their respective 95% confidence intervals.
R> lccPlot(m.bw.2, control = list(scale_y_continuous = c(0.85, 1))) R> lccPlot(m.bw.2, type = "lpc", + control = list(scale_y_continuous = c(0.85, 1))) R> lccPlot(m.bw.2, type = "la" + control = list(scale_y_continuous = c(0.85, 1))) These results show that even though the trend across time is essentially linear at the population level, there is a non-linear trend at the individual level to be more investigated. We can observe that the fitted values and confidence intervals for the LA component were very close to 1 over time, indicating a very high accuracy between methods (Fig. 8C). Consequently, the LCC values depend almost exclusively on the LPC, which indicates a possible problem related to the precision between methods over time, suggesting the use of blood sampled every hour, rather than every two hours, is desirable for this group of patients. It is worthy to note that, as the diagnostic seems broadly plausible for the second degree mixed effects polynomial model (m.bw.2), under this model the LCC, LPC, and LA are fourth degree polynomials functions of the time variable.

DISCUSSION
The package lcc provides a convenient and versatile tool for estimation and inference about the LCC, LPC and LA. The estimation of these three statistics provides a complete evaluation of the agreement between methods over time (Oliveira, Hinde & Zocchi, 2018). These statistics are also very appealing for graphical illustration. The package supports balanced or unbalanced (dropouts) experimental designs or observational studies, multiple methods, inclusion of covariates in the linear predictor to control systematic variation in the response, and the inclusion of different variancecovariance structures for random-effects and residuals. Residual diagnostic and goodness of fit can be evaluated easily via the generic function plot(), which provides up to six built-in diagnostic plots. Furthermore, the anova(), AIC() and/or BIC() functions can be used to aid in model selection.
Statistical inference for the estimators of r jj 0 t k ð Þ, r ðpÞ jj 0 t k ð Þ and C jj 0 t k ð Þ can be obtained using bootstrap confidence intervals based on approximations of their empirical distributions by the normal distribution, or from percentiles of their bootstrap sampling distribution. These methods are, however, computationally intensive.
To the best of our knowledge, there is no package available to estimate the extent of longitudinal agreement between methods. The lcc package can be viewed as an extension of the R and SAS cccrm package developed by Carrasco et al. (2013). This package handles the time as a factor in the model, and computes the concordance correlation coefficient, which can be viewed as a measure that summarises the interchangeability between methods in relation to all their measurements.
The importance in estimating the LPC, as a measure of precision, and the LA, as a measure of accuracy, was demonstrated in Section 4 (Fig. 5). In particular, both of these statistics can be used jointly to determine if a moderate or small agreement between methods at time t k = t is related to a precision or an accuracy problem, as suggested by Lin (1989); Barnhart & Williamson (2001); Lin (1992); Ma et al. (2010). In the papaya hue example, the moderate LCC is highly influenced by a moderate LPC, suggesting that if we increase the number of points observed with the colorimeter on the equatorial region up to day 10, the colorimeter will probably be able to reproduce the measurements taken by the scanner. Future studies involve the determination of the sample size over time based on the least acceptable LCC, assuming we can accept up to a certain amount of loss in the LPC and in the LA, as discussed by Lin (1992).
It would be useful to mention that, as a naive analysis, the Bland-Altman method (Bland & Altman, 1986) is commonly used to calculate the mean difference between two methods (as a measurement of "bias") with the addition of 95% limits of agreement (LoA) in the analysis of repeated-measures studies (including longitudinal data). If these methods are being compared without a 'golden standard' reference (Lin, 1989), an improved Bland-Altman interval approach is preferred (Liao & Capen, 2011).
Although these approaches are not suitable to analyse repeated-measures designs, researchers still use it to explore the data because the method is simple to use. However, even if the Bland-Altman method has observations outside of LoA range, two methods can have a very high concordance correlation when the correct variance-covariance structure is accounted for in the model, as discussed by Zhao et al. (2009). This demonstrates the value of the availability of packages that enable the selection of matrix structures for random effects and error term when calculating the longitudinal concordance correlation.
Another interesting remark is that when the systematic difference between methods is zero, the CCC calculated based on a mixed-effects model is equivalent to the intraclass correlation coefficient (ICC) (Carrasco, King & Chinchilli, 2009). In the same direction, the ICC as a function of the time variable is a particular case of the longitudinal concordance correlation function when S 2 jj 0 t k ð Þ ¼ 0. If we consider the repeated measures, the ICC gives us the percentage of total variability explained by subject over time, and, consequently, it is not comparable with the LCC in terms of a longitudinal agreement index between methods.
Finally, all examples discussed in Section 4 show that our methodology is flexible, and can be applied to many different data types. One limitation of the lcc package is that, for the time being, the covar argument only allows for including fixed-effect covariates in the linear predictor. We plan to update our package in the near future to handle with the inclusion of fixed-effects and random-effects covariates, as well as interaction effects.

CONCLUSION
The lcc package implements methods to estimate the LCC, LPC and LA functions as well as their bootstrap confidence intervals. In this package, we included different structures for the variance-covariance matrices of random-effects and residuals, allowing estimation of the extent of longitudinal agreement between methods under different assumptions. Functions plot(), for diagnostics, summary() and lccPlot(), for numerical and graphical summaries, respectively, and anova(), AIC(), BIC(), for model selection, make the package flexible and easy to use. Furthermore, the mixed-effects model based approach to compute the LCC allows us to work with both balanced and unbalanced experimental designs and observational studies.