The effectiveness of item parceling in improving model fit: A case study of PAPs

The impact of item parceling on model fit indices in confirmatory factor analysis has been debated among psychometricians. In this study, the effectiveness of item parceling was examined using Tes Potensi Akademik Pascasarjana (PAPs), the Postgraduate Academic Potential Test of Universitas Gadjah Mada. The item-parceling, second-order, and item-based approaches to confirmatory factor analysis (CFA) were used for the examination. Data were collected from a sample of 1,374 postgraduate candidates in 2017. The results showed that model fit indices such as the chi-squared test, comparative fit index, Tucker-Lewis index, and standardized root mean square residual improved in the item-parceling approach compared to the item-based approach. Interestingly, the root mean square error of approximation deteriorated in the item-parceling approach. The findings of this study suggest that model dimensionality and sample size should be carefully considered when using the item-parceling approach.


INTRODUCTION
Structural equation modelling (SEM) is widely used in the social sciences because of its capability to estimate relationships between unobserved constructs, known as latent variables, from observable variables (Hancock, 2003). SEM analytic techniques are increasingly used in social and behavioral science for the causal modeling of multivariate and complex data sets that provide multiple measures of proposed constructs (Hair, 2019). By employing SEM, researchers are able to model relationships among multiple predictor and criterion variables, construct underlying unobservable latent variables, represent errors in measurement for observed variables, and statistically test prior substantive, theoretical, and measurement assumptions against empirical data (Chin, 1998).
One of the most popular techniques in the SEM family is confirmatory factor analysis (CFA). The main goal of CFA is to find out whether the data fit a hypothesized measurement model. This hypothesized model can be developed based on theory or previous research. Suhr (2003) indicates that CFA allows researchers to test hypotheses regarding the existence of relationships between observed variables and their underlying latent construct(s). Burns et al. (2001) implemented CFA to evaluate five different organizational models of Attention-Deficit/Hyperactivity Disorder (ADHD) and oppositional defiant disorder (ODD) in DSM-IV. CFA was also used by Wongpakaran et al. (2017) to confirm the factor structure of the re-

Item Parcels
In the default application of SEM, individual items are used as indicators of the underlying construct that is purportedly measured by a given scale. This item-based approach is still widely used in social science research. An alternative is to aggregate items into several parcels that serve as indicators of the target construct, a strategy referred to as parceling. Parceling is a measurement strategy commonly used in multivariate approaches to psychometrics, particularly with latent-variable analysis techniques; a parcel is an aggregate-level indicator consisting of the sum or average of two or more items, responses, or behaviors (Little et al., 2002). However, parceling is not the same as creating subscales. Subscales are usually based on a theory that explains the relations among their items, whereas parcels carry no such relation and are formed atheoretically (Bandalos, 2008). The difference between a model that uses the item-based approach and one that uses the parceling approach in SEM analysis can be seen in Figure 1 (Matsunaga, 2008). Parcels can be utilized to reduce the number of indicators of a scale that consists of many items when performing CFA (Bandalos & Finney, 2001). Parceling is usually done in studies with dichotomously or coarsely categorized measurement indicators to meet the normality and continuity assumptions of statistical fit tests (Bandalos & Finney, 2001). Marsh and Hau (2004) proposed several advantages of item parceling, such as more definitive rotational results, fewer violations of normality assumptions, increased reliability, and fewer parameters to be estimated, resulting in more stable parameter estimates. Parceling was also found to increase scale communality and the common-to-unique variance ratio for each indicator, thereby reducing random error (Matsunaga, 2008).
Forming parcels by combining items reduces coarsely distributed data that are unrelated to the construct, increases the ratio of the true score to the total score, and thereby raises reliability (Matsunaga, 2008). Compared to the item-based approach, Bandalos (2002) also demonstrated that the chi-square, CFI, and RMSEA indices of parcel-based solutions are better than those of the item-based approach. From a modelling perspective, SEM solutions based on parcelled data provide more stable estimation and fit the data better than their item-based counterparts (Holbert & Stephenson, 2002).
The use of the item-parceling approach has evoked mixed opinions among researchers. Marsh et al. (1998) stated that, although parcel-based models generally perform well, item-based solutions would be preferable when all relevant factors, including the parameter estimates, are taken into consideration. Another important critique of parceling concerns multidimensionality and model misspecification. When the given scale is multidimensional, the use of parcels can obscure rather than clarify the factor structure of the data (Bandalos, 2002).
To test this argument, a side-by-side comparison of the item-based and item-parceling approaches, for the purpose of obtaining adequate absolute model fit indices, should be applied to the same test and data. This study aims to examine the efficiency of the parceling method compared to the item-based and second-order methods in determining absolute model fit indices with the same test and sample size.

Participants
Participants in this study were the candidates for postgraduate programs of the year 2017 (N = 1,374) at UGM. The PAPs test scores are secondary data from the Psychodiagnostic Test Development Unit (UPAP) UGM, the official administrator of the PAPs test.

Instruments
The test used for this analysis is the Postgraduate Academic Potential Test or Tes Potensi Akademik Pascasarjana (PAPs). The PAPs test was developed by UPAP to predict candidates' potential in pursuing a higher level of education (Widhiarso et al., 2015). The previous version of the PAPs test possessed high reliability, with a coefficient of 0.92 (Belinda, 2015). The component score of the PAPs test is obtained by summing the correct items on each subtest: each correct answer is scored 1 and each wrong answer 0, with no score reduction for wrong answers. The most recent version is PAPs series E. The PAPs test consists of three subtests, namely a verbal subtest, a quantitative subtest, and a figural subtest. Each subtest consists of four to five components and 50 items, for a total of 150 items. The specification of each subtest is elaborated in Table 1.

Data Analysis Procedure
Before conducting SEM analysis, the technique for handling missing data should be properly considered because the equations and formulas in SEM analysis require complete data. In the PAPs test, all wrong answers and missing data from each participant are replaced with zero. Three approaches were compared in this study: the item-based approach, the second-order approach, and the item-parceling approach. The second-order approach was used because the PAPs test measures the latent construct of broad general intelligence; second-order models were specified based on the latent constructs measured in the PAPs test: verbal, quantitative, and figural. Figure 2 shows the differences between the approaches when applied to the verbal subtest. The model fit indices from the item-parceling analysis and the individual-item analysis were then compared.
Item parcels were calculated by taking the mean of the items in each component. Each component consisted of five to 15 items, producing 13 parcels for each test, with the smallest parcel consisting of five items and the largest of 15. Data analysis in this study was conducted using Mplus version 7.
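The parcel construction described above can be sketched as follows. This is a minimal illustration, not the actual UPAP scoring pipeline; the component names and item groupings are hypothetical.

```python
def build_parcels(item_scores, components):
    """Aggregate dichotomous item scores (0 = wrong or missing, 1 = correct)
    into parcels by taking the mean of the items in each component.

    item_scores : list of per-person response lists
    components  : dict mapping parcel name -> list of item column indices
    Returns a dict mapping parcel name -> list of parcel scores per person.
    """
    return {
        name: [sum(person[i] for i in cols) / len(cols) for person in item_scores]
        for name, cols in components.items()
    }


# Toy example: 4 persons, 6 items grouped into two hypothetical components
responses = [
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0],
]
parcels = build_parcels(responses, {"synonym": [0, 1, 2], "antonym": [3, 4, 5]})
```

Each parcel score then serves as a single, roughly continuous indicator of its subtest factor in the CFA, in place of the dichotomous items.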
Several indices are recommended for assessing model fit: the chi-squared test, root mean square error of approximation (RMSEA), comparative fit index (CFI), Tucker-Lewis index (TLI), and standardized root mean square residual (SRMR) (McDonald & Ho, 2002; Hooper et al., 2008; Kline, 2016). A model can be considered to have good fit when the CFI and TLI scores are above 0.90 and the RMSEA and SRMR scores are below 0.08 (Hu & Bentler, 1999).
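These cut-offs can be expressed as a simple decision rule. The helper below is an illustrative sketch of the Hu and Bentler (1999) rule of thumb; the function name is ours, not part of any fit package.

```python
def good_fit(cfi, tli, rmsea, srmr):
    """Hu & Bentler (1999) rule of thumb as applied in this study:
    CFI and TLI above 0.90, and RMSEA and SRMR below 0.08."""
    return cfi > 0.90 and tli > 0.90 and rmsea < 0.08 and srmr < 0.08


# Illustrative values: the first set passes all four cut-offs,
# the second fails on CFI and TLI.
print(good_fit(cfi=0.96, tli=0.958, rmsea=0.023, srmr=0.026))  # True
print(good_fit(cfi=0.88, tli=0.86, rmsea=0.023, srmr=0.026))   # False
```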

Findings
Based on Table 2, the descriptive statistics for the 1,374-person sample of PAPs series E1 show that the mean of each component ranges from 1.67 to 6.06. Overall, this test has fairly symmetrical data distributions (Synonym, Antonym, Word Analogy, Number Series, Quantitative Comparison, Serial, Classification, and Picture Analogy) to moderately skewed distributions (Analytical, Arithmetic, Algebra Concept, Geometric Reasoning, and Diagram).

Table 3 shows the descriptive statistics for the 1,374-person sample of PAPs series E2. Similar to the E1 series, this test has symmetrical distributions (Synonym, Analogy, Word Analogy, Analytical, Number Battery, Quantitative Comparison, Serialization, Classification, and Picture Analogy) to moderately skewed distributions (Arithmetic, Algebra Concept, and Diagram) to a highly skewed distribution (Geometric Reasoning).

Table 4 shows the factor analysis results for the PAPs series E1 scores. As previously mentioned, the indices used to assess model fit were the chi-square index, CFI, TLI, RMSEA, and SRMR. According to Hu and Bentler (1999), the data can be said to fit the model if the TLI and CFI values are > 0.90 and the RMSEA and SRMR values are < 0.08. The factor analysis of PAPs E1 (N = 1,374) under the item-based approach showed that the CFI and TLI scores for the verbal subtest were 0.960 and 0.958, with RMSEA and SRMR scores of 0.023 and 0.026. From these indices, the item-based approach had satisfactory model fit for the verbal subtest. For the quantitative and figural subtests, the CFI and TLI scores under the item-based approach did not reach the threshold for good model fit (< 0.90). Under the second-order approach, the CFI and TLI scores of all subtests were higher than under the item-based approach. The RMSEA score on all subtests also decreased, whereas the SRMR score remained the same for the verbal and figural subtests and was higher for the quantitative subtest.
The item-parceling approach provided higher CFI and TLI scores for all subtests compared to the item-based and second-order approaches. However, it provided higher RMSEA scores for the verbal and quantitative subtests, while providing lower SRMR scores for all subtests and a lower RMSEA score for the figural subtest.

Table 5 shows the factor analysis results for PAPs E2 (N = 1,374). In the item-based approach, the CFI and TLI scores for the quantitative and figural subtests did not exceed the minimum requirement for adequate model fit, whereas the RMSEA and SRMR scores already fulfilled the requirement. The second-order approach for the figural subtest of this test was analyzed using AMOS version 22 because the iteration did not converge when the analysis was run in Mplus version 7. The second-order approach provided higher CFI and TLI scores for all subtests, although they remained inadequate for the figural subtest. It did not affect the RMSEA and SRMR scores of the verbal subtest but gave lower scores for the quantitative and figural subtests. The item-parceling approach provided CFI, TLI, and SRMR scores adequate for model fit, but it provided a good RMSEA score only for the figural subtest.

Based on the descriptions above, item parceling produced better CFI, TLI, and SRMR scores than the second-order and item-based approaches for both versions of the test. However, the RMSEA score deteriorated for both versions.

Discussion
Based on the analysis of the three model approaches, it can be seen that parceling overall improves several model fit indices, such as the chi-square value, CFI, TLI, and SRMR, in both PAPs series E1 and E2 compared to the item-based and second-order approaches. In addition, the item-parceling approach tends to show better model fit indices than the item-based model in both versions of the test for all subtests. This finding corresponds with a previous study stating that the usage of item parceling can provide a better model fit score (Nasser & Wisenbaker, 2003). De Bruin (2004) stated that parceling possesses numerous advantages, such as higher reliability, richer scale points, better approximation of an interval-scale level, and normality. These advantages allow the parceling technique to satisfy the assumptions of factor analysis better than the individual-items approach. Balanced item-to-construct parceling was found to improve model fit by reducing the p/f ratio, and it also shows a better model fit compared to the item-level model (Wilkinson, 2007).
Interestingly, this study found that the RMSEA index under parceling is larger for several components in both versions of the test. The RMSEA index for the verbal and quantitative subtests in PAPs series E1, and for the verbal, quantitative, and figural subtests in series E2, is larger under the parceling approach than under the item-based and second-order approaches. This finding is not in line with a previous one stating that SEM analysis employing the parcel-based approach results in a larger CFI and lower RMSEA and SRMR (Orcan, 2013). RMSEA is an indicator of the discrepancy, per degree of freedom, between the observed covariance matrix and the hypothesized covariance matrix representing the model (Chen, 2007). Because RMSEA is closely tied to the degrees of freedom, it tends to be more sensitive to the complexity of the model and the number of estimated parameters (Byrne, 2012). RMSEA balances model misfit, as measured by the noncentrality parameter, against the model's degrees of freedom, thereby rewarding parsimony. Adding more parameters decreases the degrees of freedom of the model. Because RMSEA divides the estimated noncentrality by the degrees of freedom, a model with few degrees of freedom yields a large RMSEA even for a modest amount of misfit (Bandalos, 2018). Even for a correctly specified model, RMSEA can exceed the cut-offs; this frequently happens when the model has few degrees of freedom (Kenny et al., 2014). In the parceling-based approach, the number of parameters estimated in the model is reduced, producing low degrees of freedom. Due to the low number of parameters, some variance cannot fit any parameter and remains unexplained. The number of indicators and parameters used in creating a parcel must therefore be properly considered to anticipate this issue.
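This behavior can be seen directly from the usual point estimate of RMSEA, computed from the model chi-square, its degrees of freedom, and the sample size. The sketch below uses one common form of the formula; exact variants differ slightly across software.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1))).
    The estimated noncentrality (chi2 - df) is divided by the degrees of
    freedom, so for the same amount of excess misfit, a model with fewer
    degrees of freedom yields a larger RMSEA."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Same excess misfit (chi2 - df = 10) at N = 1374, but different df:
print(round(rmsea(chi2=60, df=50, n=1374), 3))  # 0.012
print(round(rmsea(chi2=15, df=5, n=1374), 3))   # 0.038
```

This is consistent with the pattern reported above: parceling reduces the number of indicators and hence the degrees of freedom, which can inflate RMSEA even while the other fit indices improve.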
Thus, parcels can be formed in two ways: parcels in which one component is represented by a single indicator, and parcels containing only odd- or even-numbered items.
Another reason why item parceling can improve model fit indices in PAPs is the test's unidimensionality. Unidimensionality of measurement is defined by its internal and external consistency (Anderson, 1984). A unidimensional measurement shows good internal consistency between components, and each component within a unidimensional structure must be closely related in construct validity. The PAPs test consists of three components: quantitative, verbal, and figural. A construct validity study of the PAPs test has been conducted before, and its result shows that the PAPs unidimensional model is supported by fit indices that meet the established criteria (Widhiarso, 2019). Previous studies have found that the parceling method is more suitable for unidimensional than for multidimensional measurement. The majority of the research and literature on parceling supports the view that the dimensional nature of a measured construct affects the validity and accuracy of parceling techniques. Item parceling in a unidimensional structure can increase model-data fit and provides a less biased solution for coarsely categorized items with non-normal data distributions (Bandalos, 2002). Many researchers suggest that parceling should be done only for unidimensional constructs (Bandalos & Finney, 2001). It has also been stated that item parceling can effectively improve model fit indices when applied to a unidimensional scale (Little et al., 2002). Thus, it is safe to say that the unidimensional construct of intelligence measured in the PAPs test supports the usage of the item-parceling approach.
The parceling method in this study is homogeneous parceling, in which items are aggregated by component and share similar characteristics. When items are aggregated based on their construct, the variance of each parcel is also represented by its component (Cole et al., 2015). The factor loadings of homogeneous and heterogeneous parcels can affect the model fit indices. Homogeneous parcels tend to perform better on model fit indices than heterogeneous parcels because their factor loadings are smaller. When the factor loading is smaller, the chi-square, RMSEA, and SRMR scores become smaller while the CFI and TLI scores become larger (Cole et al., 2015). Lower chi-square, RMSEA, and SRMR scores and higher CFI and TLI scores are the desired goals when analysing data using CFA, helping to fulfil the criteria for good model fit. A smaller factor loading can improve model fit because it reduces the weight of the observed covariances (Cole & Preacher, 2014). The rule of thumb is that a model tends to fit better when the observed covariances are smaller or almost zero: smaller observed covariances reduce the discrepancies between the observed and implied covariances (Cole et al., 2015). Item parcels also tend to substantially increase the power to detect misspecification within the structural model without affecting parameter accuracy (Rhemtulla, 2016).
The results of this study indicate a different impact of item parceling across components. A construct validity study of PAPs by Widhiarso (2019) found that the quantitative subtest shows higher factor loadings than the verbal and figural subtests, and that among all components the verbal subtest has the lowest factor loadings. This might be caused by the distinctive features of the quantitative subtest, which involves attributes such as systematic thinking, precise strategies, and speed and accuracy in data processing. These attributes tend to show small variances between subjects, especially given the lack of cultural influence and verbal interference in the subtest, making the quantitative subtest generate an almost uniform way of thinking across subjects. This uniform way of reasoning gives the quantitative subtest low variation in its data. As previously mentioned, smaller variances can blur the discrepancies between observed and implied covariances (Cole et al., 2015); thus, the parceling approach worked better with homogeneous data. A limitation of this study is that it does not provide information about how sample size can influence the effectiveness of item parceling, as no variation of the PAPs sample size was included in the analysis. Therefore, the present study cannot provide comprehensive information about item-parceling efficacy across different sample sizes.

CONCLUSION
Overall, it is safe to say that the parceling approach is a useful technique for measurements with a unidimensional construct. Item parceling can alleviate obstacles such as unclear internal consistency, coarsely distributed data, and correlated errors between items. However, when a measurement possesses an unclear construct, parceling might not be suitable, depending on how the parcel is constructed. Therefore, item parceling is more suitable for CFA than for EFA. Parceling is primarily a way to obtain estimates of a latent construct, with the focus on accurate estimates of the direct and indirect effects between latent factors. Parceling is commonly used for long scales, usually consisting of more than 50 or 100 items. At this scale length, a medium-sized sample (e.g., N = 200) might not work for parceling, as it can create estimation issues. This might be one of the limitations of this study. Since this study does not analyze the impact of parceling methods when applied to different sample sizes, future studies should consider the effect of sample size on the usage of parceling techniques in the PAPs test.