Estimation of item parameters and examinees’ mastery probability in each domain of the Korean Medical Licensing Examination using a deterministic inputs, noisy “and” gate (DINA) model

Purpose: The deterministic inputs, noisy “and” gate (DINA) model is a promising statistical method for providing useful diagnostic information about students’ level of achievement, as educators often want to receive diagnostic information on how examinees did on each content strand, which is referred to as a diagnostic profile. The purpose of this paper was to classify examinees of the Korean Medical Licensing Examination (KMLE) in different content domains using the DINA model.
Methods: This paper analyzed data from the KMLE, with 360 items and 3,259 examinees. An application study was conducted to estimate examinees’ parameters and item characteristics. The guessing and slipping parameters of each item were estimated, and statistical analysis was conducted using the DINA model.
Results: The output table shows examples of some items that can be used to check item quality. The probabilities of mastery of each content domain were also estimated, indicating the mastery profile of each examinee. The classification accuracy and consistency for 8 content domains ranged from 0.849 to 0.972 and from 0.839 to 0.994, respectively. As a result, the classification reliability of the diagnostic classification model was very high for the 8 content domains of the KMLE.
Conclusion: This mastery profile can provide useful diagnostic information for each examinee in terms of each content domain of the KMLE. Individual mastery profiles allow educators and examinees to understand which domain(s) should be improved in order to master all domains in the KMLE. In addition, all items showed reasonable results in terms of item parameters.

trait psychology perspective have been reliable methods for evaluating the general state of students' knowledge, skills, and abilities when the purpose of measurement is to compare students' abilities and to select students who have developed mastery in the context of a licensing examination. However, overall scores of this type do not offer sufficient useful information for the purposes of (1) measuring multidimensional contextual knowledge, skills, and abilities; (2) measuring complicated tasks reflecting complex knowledge, skills, and abilities; (3) understanding distinguishable systematic patterns associated with different characteristics of groups; and (4) providing diagnostic information connected with the curriculum and instruction. For these purposes, it is necessary to obtain more information from assessment results through various measurement models.
More specifically, the main purpose of a large-scale assessment is to compare students' achievement levels and to make pass/fail decisions based on general proficiency. This is usually done using students' overall scores, which are used to assign students to specific performance levels. However, this information has very little instructional usefulness in terms of what should be done to improve individual students' levels of achievement. That is, overall test scores from large-scale assessments offer relatively little diagnostic information about a student's strengths and weaknesses [1]. Diagnostic information is more informative for students, instructors, and assessment/instruction developers from the perspective of learning and improving the quality of assessments and the curriculum [2]. In light of these issues, diagnostic classification models (DCMs) have been proposed as psychometric models.
DCMs are statistical models that were originally developed to classify students in terms of their mastery status for each attribute [3,4]. DCMs contain multiple attributes, which refer to latent aspects of knowledge, skills, and abilities that are supposed to be measured in an assessment. Students' mastery statuses for the attributes of interest are estimated based on their observed response patterns. A composite of a student's mastery statuses for the attributes is referred to as an attribute profile. Therefore, the attribute profile is a pattern used for providing diagnostic feedback. Several DCMs have been proposed, such as the deterministic inputs, noisy "and" gate (DINA) model, the deterministic inputs, noisy "or" gate (DINO) model, and the re-parameterized unified model. These models differ depending on the variables of interest and the condensation rules that are used for modeling attributes; however, a central concept of modeling is linking the diagnostic classification with cognitive psychological findings [4]. Since multiple attributes are involved and tasks can depend on more than one attribute, their relationships are represented by a complex loading structure, often called a Q matrix [4]. A Q matrix contains the targeted attributes and a specification of which attributes are measured by which task(s) based on substantive theoretical input (e.g., from a domain specialist for the relevant examination). To construct a Q matrix, many sources may be used, such as subject matter experts' opinions, cognitive developmental theory, learning science, and learning objectives in the curriculum.
Educators often want diagnostic information from assessments, and in particular, educators in the health professions often want to provide feedback to a given student based on how he or she does on each content strand. However, most assessments are developed to provide only a single total score [3]. Most score reports in the health professions provide a total score or pass-fail decisions based on classical test theory. DCMs are psychometric models that characterize examinees' responses to test items through the use of categorical latent variables that represent their knowledge [5]. Thus, DCMs have become a popular area of psychometric research. However, few application studies using DCMs with health professions data have been reported [6].

Objectives
The purpose of this study was to conduct a DINA analysis using Korean Medical Licensing Examination (KMLE) data in order to provide diagnostic information about each content domain in this licensing examination. Specifically, the guessing and slipping parameters of items and the mastery probabilities of examinees in each domain were estimated. The ultimate objective of this study was to evaluate the classification reliability of mastery in 8 content domains using the DINA model and to investigate the application of DCMs to the KMLE.

Ethics statement
The raw data file was obtained from the Korea Health Professional Licensing Examination Institute for research purposes. The open data source does not contain any identification or personal information about the examinees. Therefore, the requirements for informed consent and institutional review board approval were waived according to the Enforcement Rule of the Bioethics and Safety Act under Korean law.

Study design
This was a diagnostic analysis of high-stakes exam results based on a DCM for identifying item parameters and examinees' mastery status.

Data sources/measurement
Data from the KMLE in 2017, including 360 items and 3,259 examinees, were analyzed in this study. The data are available from https://doi.org/10.7910/DVN/PETWZF (Dataset 1) [7]. The 8 content domains of the KMLE are described in Table 1.

Q matrix
For a DCM analysis, it is necessary to create a Q matrix, which provides information about the relationship between items and content domains. Because this study analyzed 360 items, Table 2 shows only a few example rows of the Q matrix. The entire Q matrix is available in Dataset 2.
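As an illustration of what such a Q matrix looks like, the following minimal R sketch builds a simple-structure Q matrix (one content domain per item) from an item-to-domain vector. The six item assignments and the column labels D1 to D8 are hypothetical and are not taken from Dataset 2.

```r
# Hypothetical item-to-domain assignment for 6 items and 8 domains.
# These values are illustrative only; they are not taken from Dataset 2.
item_domain <- c(1, 1, 3, 5, 8, 2)
n_domains <- 8

# Simple structure: each row has a single 1 in the column of its domain.
qmatrix <- matrix(0L, nrow = length(item_domain), ncol = n_domains)
qmatrix[cbind(seq_along(item_domain), item_domain)] <- 1L
colnames(qmatrix) <- paste0("D", seq_len(n_domains))

# Every item should load on exactly one domain under a simple structure.
stopifnot(all(rowSums(qmatrix) == 1))
print(qmatrix)
```

Each row corresponds to an item and each column to a content domain, so a row sum of 1 confirms that the item is assigned to a single domain.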

Statistical methods
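The following DINA model was used in this study:

$$P(x_{ij} = 1 \mid \eta_{ij}) = g_i^{\,1 - \eta_{ij}} (1 - s_i)^{\eta_{ij}}$$

where $x_{ij}$ denotes the response of examinee $j$ to item $i$ (where $i = 1, \ldots, I$), with 1 and 0 representing a correct or incorrect response, respectively, and $g_i$ and $s_i$ represent the guessing and slipping parameters for item $i$, respectively. Additionally, $\eta_{ij}$ is a binary indicator related to the Q matrix, and the following equation indicates whether examinee $j$ has mastered all attributes required by item $i$:

$$\eta_{ij} = \prod_{k=1}^{K} \alpha_{jk}^{\,q_{ik}}$$

The parameter $\alpha_{jk}$ refers to the classification for attribute $k$ in the latent class of examinee $j$, which is either 1 (mastery) or 0 (non-mastery). The parameter $q_{ik}$ refers to the entry in row $i$, column $k$ of the Q matrix, an attributes-to-items mapping with dimensions $I \times K$ for which individual entries take values according to

$$q_{ik} = \begin{cases} 1 & \text{if item } i \text{ requires attribute } k \\ 0 & \text{otherwise} \end{cases}$$

The general model for DCMs is as follows [5]:

$$P(x_{ij} = 1 \mid \alpha_i) = \frac{\exp\left(\lambda_{0j} + \lambda_j^{T} h(\alpha_i, q_j)\right)}{1 + \exp\left(\lambda_{0j} + \lambda_j^{T} h(\alpha_i, q_j)\right)}$$

$$\lambda_j^{T} h(\alpha_i, q_j) = \sum_{u=1}^{K} \lambda_{ju}\, \alpha_{iu} q_{ju} + \sum_{u=1}^{K-1} \sum_{v=u+1}^{K} \lambda_{juv}\, \alpha_{iu} \alpha_{iv} q_{ju} q_{jv} + \cdots$$

where $i$ and $j$ denote the student and task, respectively (note that the roles of the two indices are reversed relative to the DINA equations above); $\lambda_{0j}$ is an intercept and $\lambda_j$ represents a vector of coefficients indicating the effects of attribute mastery on the response probability for item $j$; and $h(\alpha_i, q_j)$ is a set of linear combinations of $\alpha_i$ and $q_j$. The intercept can be interpreted as the guessing parameter (on the logit scale), the $\lambda_{ju}$ parameters represent the main effects of each attribute $u$ on the response probability for item $j$, and the $\lambda_{juv}$ parameters represent the 2-way interaction effects of the combination of the mastery status for attributes $u$ and $v$ on the response probability for item $j$.

We used the CDM R package (The R Foundation for Statistical Computing, Vienna, Austria; https://cran.r-project.org/) to implement the DINA model [9]. The main code to implement the DINA model is as follows:

    # Install (once) and load the CDM package
    install.packages("CDM")
    library(CDM)
    # Read the item responses and the Q matrix (CSV files without a header row)
    dinadat <- read.table("data2017.csv", header = FALSE, sep = ",")
    qmatrix <- read.table("qmatrix.csv", header = FALSE, sep = ",")
    # Fit the DINA model and print the estimated parameters
    d1 <- CDM::din(dinadat, q.matrix = qmatrix)
    summary(d1)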
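To make the DINA response function concrete, the following base R sketch evaluates the probability of a correct response for one item and two examinees. The helper names dina_eta and dina_prob, the item (measuring domain 2 only), and the values g = 0.2 and s = 0.1 are all hypothetical and chosen for illustration; this is not the CDM package's internal implementation.

```r
# eta: 1 if the examinee has mastered every attribute the item requires, else 0.
dina_eta <- function(alpha, q) {
  as.integer(all(alpha[q == 1] == 1))
}

# DINA response function: P(correct) = g^(1 - eta) * (1 - s)^eta
dina_prob <- function(alpha, q, g, s) {
  eta <- dina_eta(alpha, q)
  g^(1 - eta) * (1 - s)^eta
}

# Hypothetical item measuring domain 2 only, with g = 0.2 and s = 0.1.
q_item     <- c(0, 1, 0, 0, 0, 0, 0, 0)
master     <- c(1, 1, 1, 1, 1, 1, 1, 1)
non_master <- c(1, 0, 1, 1, 1, 1, 1, 1)

p_master     <- dina_prob(master, q_item, g = 0.2, s = 0.1)      # equals 1 - s
p_non_master <- dina_prob(non_master, q_item, g = 0.2, s = 0.1)  # equals g
```

An examinee who has mastered the required domain answers correctly unless he or she slips, whereas an examinee who has not mastered it answers correctly only by guessing.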

Model fit
For the DINA model in this dataset, the Akaike information criterion and Bayesian information criterion were 998,956 and 1,004,893, respectively. The mean root mean square error of approximation for item fit was 0.045, indicating that this model fit the data well [10].
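As a rough consistency check on the reported information criteria, the sketch below uses the usual definitions AIC = -2 log L + 2p and BIC = -2 log L + p log N. It assumes that the fitted model had one guessing and one slipping parameter per item plus 2^8 - 1 free latent class probabilities; this parameter count is an assumption and is not reported above.

```r
n_items     <- 360
n_domains   <- 8
n_examinees <- 3259

# Assumed parameter count: one guessing and one slipping parameter per item,
# plus 2^K - 1 free latent class probabilities.
n_par <- 2 * n_items + (2^n_domains - 1)

# BIC - AIC = p * (log(N) - 2), because both share the same -2 * log-likelihood.
implied_gap  <- n_par * (log(n_examinees) - 2)
reported_gap <- 1004893 - 998956

# Deviance (-2 * log-likelihood) implied by the reported AIC under the same assumption.
implied_deviance <- 998956 - 2 * n_par
```

Under these assumptions, the implied and reported gaps between the two criteria agree to within one unit, which suggests that the reported values are mutually consistent with this parameter count.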

Estimation of the guessing and slipping parameters of items
Some examples of the guessing and slipping parameters are shown in Table 3. (All guessing and slipping parameters are available in Dataset 3.) The guessing parameter indicates the probability of a correct response to an item that a respondent should answer incorrectly. In this context, the idea that the respondent should answer the item incorrectly means that the respondent has not mastered at least 1 required content domain. The slipping parameter represents the probability of an incorrect response to an item that a respondent should answer correctly because the respondent has mastery of all required content domains. For example, the probability of a correct response to item 1 using guessing was 0.008, while the probability of an incorrect response to item 1 for high-ability students was 0.984. Thus, item 1 showed high discrimination. In contrast, for item 2, the guessing parameter was 0.962 and the slipping parameter was 0.016. Thus, item 2 showed very low discrimination. The average values of the guessing and slipping parameters in this exam were 0.647 and 0.228, respectively.
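One common way to summarize the two parameters in a single number is an item discrimination index, defined as the difference between the probabilities of a correct response for masters and non-masters, (1 - s) - g. The sketch below applies this index to the values reported for item 2 and to a hypothetical item with g = 0.2 and s = 0.1; the function name item_discrimination and the hypothetical item are illustrative assumptions, and the index is not necessarily the criterion used in the analysis above.

```r
# Item discrimination index under the DINA model:
# P(correct | master) - P(correct | non-master) = (1 - slip) - guess
item_discrimination <- function(guess, slip) {
  (1 - slip) - guess
}

items <- data.frame(
  item  = c("item 2 (reported)", "hypothetical item"),
  guess = c(0.962, 0.200),
  slip  = c(0.016, 0.100)
)
items$idi <- item_discrimination(items$guess, items$slip)
print(items)
```

Values near 1 indicate that masters and non-masters respond very differently to the item, whereas values near 0 indicate that the item barely separates the two groups.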

Estimation of mastery probabilities of the examinees in each domain
This study estimated the mastery probability of each content domain for all examinees. Table 4 shows the probability of having mastery in each content domain for 3,259 examinees and the average probability of having mastery in each content domain. The full dataset on the estimation of mastery probabilities is available in Dataset 4. For example, the probabilities of having mastery in each content domain for examinee 2939 were 0.999, 0.999, 0.995, 0.899, 1, 0.965, 0.995, and 0.860, respectively, which means that this student had high probabilities of mastery in all content domains. In contrast, the corresponding probabilities for examinee 509 were 0.112, 0.135, 0.007, 0.053, 0.004, 0.002, 0.166, and 0.008, meaning that student 509 had low probabilities of mastery for all content domains. Thus, the mastery information from the DINA model provides precise predictions and diagnostic information for each content domain.
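The step from mastery probabilities to a mastery profile can be sketched in R using the two examinees quoted above. The 0.5 cut-off and the domain labels D1 to D8 are assumptions made for illustration; no specific decision threshold is stated in the text.

```r
domains <- paste0("D", 1:8)

# Mastery probabilities for the two examinees quoted in the text.
mastery_prob <- rbind(
  "2939" = c(0.999, 0.999, 0.995, 0.899, 1.000, 0.965, 0.995, 0.860),
  "509"  = c(0.112, 0.135, 0.007, 0.053, 0.004, 0.002, 0.166, 0.008)
)
colnames(mastery_prob) <- domains

# Assumed decision rule: classify as master when the probability is >= 0.5.
mastery_profile <- (mastery_prob >= 0.5) * 1L

# Number of domains classified as mastered for each examinee.
n_mastered <- rowSums(mastery_profile)
print(mastery_profile)
```

Under this rule, examinee 2939 is classified as a master of all 8 domains and examinee 509 of none, which matches the interpretation given above.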

Classification reliability of the diagnostic classification model
The classification accuracy and consistency of the DCM are described in Table 5. Classifications simulated from the estimated item parameters and the probability distribution of latent classes were compared with the classification of the real data. The classification accuracy and consistency of mastery were estimated using the maximum a posteriori classification estimator [11]. Classification accuracy and consistency were estimated for the separate classification of 8 content domains. Table 5 shows that the classification accuracy of mastery for the 8 content domains ranged from 0.849 to 0.972 and the classification consistency ranged from 0.839 to 0.994. As a result, the classification reliability of the DCM was very high for the 8 content domains of the KMLE.
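To give an intuition for what domain-level accuracy and consistency measure, the sketch below computes simplified plug-in versions of both indices directly from posterior mastery probabilities. It is not the simulation-based procedure described above, and the posterior values for four examinees and two domains are hypothetical.

```r
# Hypothetical posterior mastery probabilities (rows: examinees, columns: domains).
post <- matrix(
  c(0.95, 0.10,
    0.80, 0.60,
    0.05, 0.99,
    0.70, 0.02),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("D1", "D2"))
)

# Accuracy: expected probability that the maximum a posteriori decision
# (master if p >= 0.5) matches the true mastery status.
accuracy <- colMeans(pmax(post, 1 - post))

# Consistency: expected agreement between two independent classifications,
# approximated here by p^2 + (1 - p)^2 for each examinee.
consistency <- colMeans(post^2 + (1 - post)^2)
```

Both indices approach 1 when the posterior probabilities are close to 0 or 1, which is the pattern seen for the two examinees discussed in the previous section.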

Key results
DCMs have become a popular area of psychometric research in educational testing. DCMs are psychometric models that classify students based on categorical latent variables related to the purpose of the assessment. This psychometric model is a promising means for students, instructors, and test developers to obtain more detailed diagnostic information [5]. This study conducted an analysis using the DINA model, which is one of the main types of DCMs, using response data from the KMLE. The output showed the guessing and slipping parameters, enabling the quality of items to be checked. In addition, the examinees' mastery probabilities for each content domain were estimated. The mastery profile provides individual information on which domain(s) should be improved in order to master all domains in the KMLE. Therefore, educators can identify specific content domains for each examinee to improve their weaknesses in content domains. Finally, this study demonstrated that the subscore classification accuracy for mastery was very high, and subscores were reported consistently for all content domains, which means that the classification accuracy was very reliable based on the outcomes of the DCM.

Interpretation
By investigating the guessing and slipping parameters for each item, test developers can check whether items are appropriate to measure examinees' mastery of each content domain. Through the probabilities of mastery for all content domains, the DINA model provides precise predictions of diagnostic information in terms of each content domain for all examinees. In addition, the high classification accuracy and consistency estimated from the DINA model demonstrated that the KMLE has a high classification reliability for mastery of the 8 content domains.

Limitation
The DINA model was applied to a simple structural model, in which each item belongs to 1 content domain, whereas a complex structural model would allow each item to be assigned to multiple content domains and therefore to load on more than one content domain. The DINA model can also be applied to generate estimations for a complex structural model. Compensatory and non-compensatory rules that can take into account the complexity of relationships between items and content domains have been proposed for DCMs. Since limited content information is available about the KMLE, this study only applied a simple structural model. Therefore, a more complex structural model (i.e., with possible item loading on multiple domains) using DCMs would be a topic for future research. This study used only 1 type of DCM (the DINA model), and further research could examine several types of DCMs for medical education assessment.
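The difference between non-compensatory and compensatory rules under a complex structure can be sketched in R for a single item that loads on two domains. The item and the examinee profile are hypothetical; the conjunctive rule corresponds to a DINA-type model and the disjunctive rule to a DINO-type model.

```r
# Hypothetical item that loads on two content domains (complex structure).
q_item <- c(1, 1, 0, 0, 0, 0, 0, 0)

# Hypothetical examinee who has mastered domain 1 but not domain 2.
alpha <- c(1, 0, 1, 1, 1, 1, 1, 1)

# Non-compensatory (conjunctive, DINA-type): all required domains must be mastered.
eta_conjunctive <- as.integer(all(alpha[q_item == 1] == 1))

# Compensatory (disjunctive, DINO-type): mastering any required domain suffices.
eta_disjunctive <- as.integer(any(alpha[q_item == 1] == 1))
```

For this examinee the conjunctive indicator is 0 and the disjunctive indicator is 1, so the two rules would assign different response probabilities to the same item.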

Conclusion
Despite the limitations of the current study, the mastery information in terms of subscores for each domain can be used for remediation to improve a student's achievement. Such fine-grained information would be useful for competency-based education and formative purposes in professional activities. In addition, providing examinees with detailed feedback from a DCM analysis can contribute to health professions education by identifying areas of weakness for improvement and enhancing students' learning.