Best-fit model of exploratory and confirmatory factor analysis of the 2010 Medical Council of Canada Qualifying Examination Part I clinical decision-making cases

Purpose: This study aims to assess the fit of a number of exploratory and confirmatory factor analysis models to the 2010 Medical Council of Canada Qualifying Examination Part I (MCCQE1) clinical decision-making (CDM) cases. The outcomes of this study have important implications for a range of domains, including scoring and test development. Methods: The examinees included all first-time Canadian medical graduates and international medical graduates who took the MCCQE1 in spring or fall 2010. The fit of one- to five-factor exploratory models was assessed for the item response matrix of the 2010 CDM cases. Five confirmatory factor analytic models were also examined with the same CDM response matrix. The structural equation modeling software program Mplus was used for all analyses. Results: Of the five exploratory factor analytic models that were evaluated, a three-factor model provided the best fit. Factor 1 loaded on three medicine cases, two obstetrics and gynecology cases, and two orthopedic surgery cases. Factor 2 corresponded to pediatrics, and the third factor loaded on psychiatry cases. Among the five confirmatory factor analysis models examined in this study, the three- and four-factor lifespan period models and the five-factor discipline model provided the best fit. Conclusion: The results suggest that knowledge of broad disciplinary domains best accounts for performance on CDM cases. In test development, particular effort should be placed on developing CDM cases according to broad discipline and patient age domains; CDM testlets should be assembled largely using the criteria of discipline and age.


INTRODUCTION
The Medical Council of Canada Qualifying Examination Part I (MCCQE1) is a two-part computer-based examination that assesses the knowledge, skills, and attitudes judged essential for entry into supervised post-graduate medical training according to the specific statement of objectives of the Medical Council of Canada [1]. The first part of the examination includes 196 five-option, single-best-answer (A-type) multiple-choice items. These 196 multiple-choice questions are distributed into seven sections that contain 28 items apiece. The second part of the MCCQE1 is composed of approximately 60 clinical decision-making (CDM) cases. Each CDM case includes one to five questions, for a total of approximately 80 questions. CDM cases included in the MCCQE1 provide a measure of the problem-solving and decision-making skills of candidates as they pertain to specific clinical scenarios. The MCCQE1 is administered in two multi-week windows at over a dozen test sites located throughout Canada. The examination is internet-delivered at dedicated secure sites located largely in Canadian medical schools. Candidates have up to 3.5 hours to complete the multiple-choice question portion of the MCCQE1, whereas up to 4 hours are allocated for completing the CDM cases. This study aims to compare the fit of a number of exploratory and confirmatory factor analysis models to the 2010 combined spring and fall MCCQE1 CDM item response matrix using the Mplus software package (Muthén & Muthén, Los Angeles, CA, USA) [2]. The results of this investigation will provide information relevant to a range of psychometric analyses related to CDM cases, including how to best estimate scores and calibrate this component of the MCCQE1, and will also inform tests of the unidimensionality assumption underlying item parameter estimation for CDM cases.

METHODS
MCCQE1 cohort
The present investigation focused on the combined spring and fall 2010 MCCQE1 examinee cohorts. The spring administration population is composed primarily of first-time Canadian medical graduates (CMGs), whereas international medical graduates (IMGs) comprise the bulk of the fall testing cohort. Analyses were centered on all first-time test takers for both the spring and fall 2010 MCCQE1 administrations. A breakdown of the cohort by training (i.e., CMG vs. IMG) and test administration is provided in Table 1. The majority of the 2010 combined cohort was composed of CMGs (2,429; 60.2%), which conforms to expected cyclical patterns: CMGs made up the majority of the spring 2010 MCCQE1 administration, whereas IMGs largely took the test in the fall administration window.

MCCQE1 bank
The bank of multiple-choice items available for the combined 2010 MCCQE1 administrations included several thousand items. Over 100 CDM cases were also available for use in the 2010 bank. CDM cases are developed to target problem-solving and clinical decision-making skills. Examinees were presented with case descriptions, followed by one or more test questions that assessed key clinical issues in the resolution of the case. Questions could relate to eliciting clinical information, ordering diagnostic procedures, making diagnoses, or prescribing therapy.

Analyzed cases
Examinee responses reflect decisions made in the management of actual patients. CDM cases include both short-menu and write-in item formats, and they are polytomously scored on a proportion-correct scale. For the purposes of this study, these proportion-correct case scores were integerized (i.e., transformed to whole numbers) to enable analyses using Mplus. The majority of CDM cases had either two or three response categories (63% of the bank). Given the very sparse nature of the CDM case matrix and the challenges that this poses from a covariance coverage perspective in Mplus, the final analyses were conducted on a set of 17 CDM cases that were culled from the original set of cases. The cases were representative of the bank with respect to a number of classification variables.
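As a minimal sketch of the integerization step, the binning below maps proportion-correct case scores onto ordinal integer categories of the kind Mplus expects for categorical indicators. The scores, bin edges, and category counts are illustrative assumptions, not the actual MCC scoring rules.

```python
import numpy as np

# Hypothetical proportion-correct CDM case scores for a few examinees.
# Real MCCQE1 case scores are confidential; these values are illustrative only.
prop_scores = np.array([0.0, 0.33, 0.5, 0.67, 1.0])

def integerize(scores, n_categories=3):
    """Map proportion-correct scores in [0, 1] onto ordinal integer
    categories 0..n_categories-1. Bin edges split the unit interval
    evenly; a real scoring table could use case-specific cut points."""
    edges = np.linspace(0, 1, n_categories + 1)[1:-1]  # interior cut points
    return np.digitize(scores, edges)

print(integerize(prop_scores))  # three categories -> values in {0, 1, 2}
```

A case with two response categories would simply use `n_categories=2`, collapsing the proportion scale around a single cut point.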

Analyses
All analyses were carried out using the structural equation modeling software program Mplus [2]. Initially, the fit of one- to five-factor exploratory models (exploratory factor analytic models [EFAs]) was assessed for the combined 2010 CDM item response matrix. Given the non-normal nature of CDM case score distributions, weighted least-squares parameter estimation, using a diagonal weight matrix with standard errors and mean- and variance-adjusted chi-square tests (using a full weight matrix), was implemented [3]. The latter estimation method is appropriate for data that violate the assumptions of more common methods (such as the normality assumption underlying maximum likelihood and generalized least-squares estimation).
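For readers unfamiliar with factor extraction, the sketch below illustrates the basic idea on a toy correlation matrix. It is a principal-component-style extraction only, and deliberately does not reproduce the WLSMV estimation performed in Mplus, which is the appropriate method for the ordinal, non-normal CDM scores; the correlation matrix is invented for illustration.

```python
import numpy as np

def extract_loadings(corr, n_factors):
    """Minimal principal-component-style factor extraction from a
    correlation matrix: loadings are eigenvectors scaled by the square
    roots of their eigenvalues. A didactic stand-in only; it does not
    implement the weighted least-squares (WLSMV) estimation used in
    the actual analyses."""
    eigvals, eigvecs = np.linalg.eigh(corr)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_factors]  # largest first
    return eigvecs[:, order] * np.sqrt(eigvals[order])

# Toy 4-variable correlation matrix with one dominant factor (illustrative).
corr = np.array([
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.4],
    [0.4, 0.4, 0.4, 1.0],
])
loadings = extract_loadings(corr, n_factors=1)
print(np.round(loadings, 2))  # one column of loadings, all the same sign
```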
The second set of analyses focused on fitting a number of confirmatory factor analytic models (CFA) to the same 2010 item response matrix based on substantive considerations identified through a review of the current CDM blueprint. Specifically, the following five CFA models were examined: first, a three-factor 'location/setting' model; second, a three-factor 'lifespan period' model; third, a four-factor 'lifespan period' model; fourth, a four-factor 'clinical situation' model; and fifth, a five-factor 'discipline' model. Table 2 provides a breakdown of the 17 CDM cases as a function of these classifying variables. The three-factor 'location/setting' model posited the following factor structure: factor 1 (family physician office) loaded on CDM cases 1, 2, 3, 6, 7, 8, 9, 10, 12, and 13; factor 2 (general hospital) loaded on CDM cases 4 and 11; and factor 3 (emergency department) loaded on CDM cases 5, 14, 15, 16, and 17. The four-factor 'lifespan period' model posited the following factor structure: factor 1 (adult) loaded on CDM cases 1, 2, 3, 7, 15, 16, and 17; factor 2 (pediatrics) loaded on CDM cases 9, 10, and 13; factor 3 (adolescent) loaded on CDM cases 6, 8, 12, and 14; and factor 4 (pregnancy/neonatal/infant) loaded on CDM cases 4, 5, and 11. A three-factor modified version of the latter CFA model was also examined, in which factor 2 (pediatrics/pregnancy/neonatal/infant) loaded on CDM cases 4, 5, 9, 10, 11, and 13, based on exploratory correlational analyses. The remaining factor structure was identical to the four-factor 'lifespan period' model. 
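The factor-to-case assignments above can be encoded as an indicator (pattern) matrix, which is the form in which most SEM packages accept a CFA specification. The sketch below does this for the three-factor 'location/setting' model only; the matrix layout is an illustrative encoding, not actual Mplus syntax.

```python
import numpy as np

# 17 CDM cases x 3 factors indicator matrix for the three-factor
# 'location/setting' CFA model (1 = case loads on that factor).
# Factor order: family physician office, general hospital, emergency dept.
office    = [1, 2, 3, 6, 7, 8, 9, 10, 12, 13]
hospital  = [4, 11]
emergency = [5, 14, 15, 16, 17]

pattern = np.zeros((17, 3), dtype=int)
for col, cases in enumerate([office, hospital, emergency]):
    for case in cases:
        pattern[case - 1, col] = 1  # cases are numbered 1..17

# Simple structure: each case loads on exactly one factor.
assert pattern.sum(axis=1).tolist() == [1] * 17
```

The other four models can be encoded the same way by swapping in their case lists; note that the 'clinical situation' model would allow cross-loadings (some cases appear under two factors).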
The four-factor 'clinical situation' model posited the following factor structure: factor 1 (undifferentiated complaint) loaded on CDM cases 1, 7, and 10; factor 2 (single typical problem) loaded on CDM cases 2, 4, 8, 13, 15, and 16; factor 3 (preventive care and health promotion) loaded on CDM cases 6, 8, 12, and 14; and factor 4 (multiple problem or multi-system life-threatening event) loaded on CDM cases 3, 6, and 9. Finally, the five-factor 'discipline' model posited the following factor structure: factor 1 (medicine) loaded on CDM cases 1, 2, and 3; factor 2 (obstetrics/gynecology) loaded on CDM cases 4, 5, 6, and 7; factor 3 (pediatrics) loaded on CDM cases 8, 9, 10, 11, and 12; factor 4 (psychiatry) loaded on CDM cases 13 and 14; and factor 5 (surgery) loaded on CDM cases 15, 16, and 17. As was the case with the EFAs, a diagonal weight matrix-based estimation procedure was used in all CFAs. The fit of all models was assessed via the following statistics and indices: the chi-square test of model fit, the comparative fit index (CFI), the Tucker-Lewis index (TLI), and the root mean square error of approximation (RMSEA). Both the CFI and TLI evaluate the fit of a user-specified solution in relation to a more restricted, nested baseline model in which the covariances among all input indicators are fixed to zero; that is, no relationships among the variables are posited. The TLI additionally imposes a correction for over-parameterization. CFI and TLI values range from 0 to 1 (though the TLI can exceed 1 with severe over-fitting), with values of 0.90 or above indicating acceptable fit [4]. It is important, however, to underscore that the relative fit of the five models will be compared, as opposed to the absolute fit of any given solution.
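The three indices can be computed directly from the target-model and baseline-model chi-square statistics. The sketch below uses the standard formulas; the chi-square values, degrees of freedom, and sample size shown are hypothetical, not the actual MCCQE1 results reported in Tables 3 and 5.

```python
import math

def cfi(chi_m, df_m, chi_b, df_b):
    """Comparative fit index: improvement of the target model over the
    baseline (independence) model, bounded to [0, 1]."""
    d_m = max(chi_m - df_m, 0.0)
    d_b = max(chi_b - df_b, d_m)
    return 1.0 - d_m / d_b if d_b > 0 else 1.0

def tli(chi_m, df_m, chi_b, df_b):
    """Tucker-Lewis index: compares chi-square/df ratios, penalizing
    over-parameterization; unlike the CFI, it can exceed 1."""
    return ((chi_b / df_b) - (chi_m / df_m)) / ((chi_b / df_b) - 1.0)

def rmsea(chi_m, df_m, n):
    """Root mean square error of approximation for sample size n."""
    return math.sqrt(max(chi_m - df_m, 0.0) / (df_m * (n - 1)))

# Hypothetical values for illustration only.
print(round(cfi(150.0, 100, 1200.0, 136), 3))   # 0.953
print(round(tli(150.0, 100, 1200.0, 136), 3))   # 0.936
print(round(rmsea(150.0, 100, 4000), 3))        # 0.011
```

On these invented numbers, both CFI and TLI clear the conventional 0.90 threshold, which is how the tabulated values in the study would be read.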
Practically speaking, it is of greater interest to compare the relative fit of the five alternative models previously outlined rather than to attempt to identify an 'optimal' configuration from a statistical point of view. Adopting this relative approach is also congruent with views espoused by several factor analysts who maintain that no restrictive model fits the population and that all restrictive models are merely approximations [5]. Consequently, our analyses were aimed at identifying the best-fitting model among those under study, all of which were posited based on substantive considerations, rather than attempting to accept or reject an a priori false hypothesis.

RESULTS
Table 3 provides fit values for the five EFAs that were examined in this study. Based on these results, it appears that a three-factor EFA solution provided the best fit to the item response matrix, without the over-fitting that was clearly observed in the four- and five-factor models based on CFI and TLI values. The three-factor obliquely rotated factor loadings are provided in Table 4. Using a rough cutoff of 0.25 to better define the nature of the factor structure, factor 1 could generally be described as reflecting 'biomedical/medicine' CDM cases: it loads on three 'medicine' cases, two biomedically oriented 'obstetrics and gynecology' cases, and two 'orthopedic surgery' cases. Factor 2, which loads more heavily on CDM cases 8, 9, 10, and 14, appears to reflect a 'pediatrics' factor. Factor 3 could be labeled a 'psychiatry' factor, with heavier loadings on CDM cases 13 and 14. Finally, the correlations between the three factors were quite low, ranging from -0.04 (F1-F3) to 0.09 (F2-F3), suggesting that distinct competencies are required to perform well on each type of CDM case. Table 5 provides fit statistic values for the five CFA models that were examined in this study.
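Applying a salience cutoff of this kind can be sketched as follows; the loading matrix below is hypothetical (the actual rotated loadings appear in Table 4), and the 0.25 threshold matches the rough cutoff used in the study.

```python
import numpy as np

# Hypothetical obliquely rotated loading matrix (cases x factors);
# invented values for illustration, not those reported in Table 4.
loadings = np.array([
    [0.61,  0.05, -0.02],   # e.g., a 'medicine' case
    [0.12,  0.48,  0.07],   # e.g., a 'pediatrics' case
    [0.03,  0.31,  0.44],   # a case loading saliently on two factors
])

def salient(loadings, cutoff=0.25):
    """Return, per factor, the (0-based) indices of cases whose absolute
    loading meets the cutoff used to interpret the factor structure."""
    return [[int(i) for i in np.flatnonzero(np.abs(loadings[:, j]) >= cutoff)]
            for j in range(loadings.shape[1])]

print(salient(loadings))  # [[0], [1, 2], [2]]
```

Raising the cutoff sharpens the interpretation at the cost of dropping weaker but potentially meaningful loadings.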
Based on these results, it appears that both the 'lifespan period' and 'discipline' models provided the best fit among the five CFA models examined in this study. Factor loadings and inter-factor correlations for the four-factor 'lifespan period' CFA model are provided in Tables 6 and 7. Most prescribed loadings were statistically significant. However, some of the factors did not load on their assigned CDM cases. With regard to factor 1 (adult), CDM case 7 (infertility) was poorly associated with the domain. Similarly, factor 3 (adolescent) loaded poorly on CDM case 12 (life-threatening asthma). Finally, factor 4 (pregnancy/neonatal/infant) loaded poorly on CDM case 5 (diabetic pregnancy). With regard to factor correlations, values ranged from -0.04 (between 'pediatrics' and 'pregnancy/neonatal/infant') to 0.92 (between 'adult' and 'pregnancy/neonatal/infant'). Factor loadings as well as inter-factor correlations for the five-factor 'discipline' CFA model are provided in Tables 8 and 9. Again, the vast majority of pre-specified loadings were statistically significant. However, as was the case with the previous model, some of the factors did not load on their prescribed CDM cases. With regard to factor 2 (obstetrics/gynecology), CDM cases 5 (diabetic pregnancy) and 7 (infertility) were poorly associated with the domain. Similarly, factor 3 (pediatrics) loaded poorly on CDM case 12 (life-threatening asthma). Finally, factor 4 (psychiatry) was heavily defined by CDM case 14 (threatened suicide). With regard to factor correlations, values ranged from 0.07 (between 'medicine' and 'psychiatry') to 0.91 (between 'medicine' and 'obstetrics/gynecology').

DISCUSSION
Assessing the underlying structure of any item response matrix is critical to both test development and psychometric efforts. From a test development standpoint, such analyses can provide substantiating evidence with respect to both blueprinting and test design activities. From a psychometric perspective, the use of advanced modeling techniques, such as item response theory, is predicated on a clear understanding of the data structure being analyzed. While common item response theory models assume unidimensionality of the underlying latent ability, research has shown that estimation is robust against departures from this assumption, as long as the composite of proficiencies is comparable across test forms [6]. From a scoring standpoint, factor analysis might also inform how to best weight CDM cases in order to yield a composite score that most closely reflects the structure of the MCCQE1. Finally, from a score reporting perspective, a better understanding of the underlying structure of the CDM component of the MCCQE1 might also better support current feedback provision mechanisms.
Both the exploratory and confirmatory factor analyses examined in this study suggest that broad discipline domains best account for performance on CDM cases. While a 'lifespan