UTILISATION OF RASCH MODEL FOR THE ANALYSIS OF AN INSTRUMENT DEVELOPED BY MAPPING ITEMS TO COGNITIVE LEVELS OF MARZANO TAXONOMY

Abstract: The aim of this article was to develop and validate an instrument measuring Chemistry students’ ability regarding ‘physical bonding’. A total of 24 items were developed by mapping them to the cognitive levels described by the Marzano taxonomy, and N=73 students were evaluated. Four items exhibited an MNSQ > 1.3 and were eliminated from the final data analysis. In the final data analysis, the item difficulty measures, the item separation and the item reliability values were in normal ranges. The person separation and person reliability values indicated that the number of items should be increased, since the instrument may not be sensitive enough to differentiate between low and high performers. Nevertheless, the utilisation of Marzano’s taxonomy in the development of items was successful in the sense that the items had difficulty measures in normal ranges and covered different levels of difficulty.


Introduction
Item Response Theory (IRT) is used to validate instruments and to provide data regarding item difficulty and person ability. Many studies have focused on the comparison between Classical Test Theory and Item Response Theory (for example, Hambleton & Jones, 1993; Wiberg, 2004).
The two most widely used IRT models are the 1- and 2-parameter logistic models (1PL and 2PL). In the 1PL model, the probability that student i responds correctly to item j is given by Equation 1:

$$P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{e^{\theta_i - b_j}}{1 + e^{\theta_i - b_j}}$$

Equation 1. Probability for student i to respond correctly to item j - The 1PL Model
where θi is the student's ability and bj is the difficulty of item j (Rasch, 1960; Boone et al., 2014). High values of θ indicate a high ability level and high values of b indicate difficult items. The values of item difficulty and person ability are normally in the range [-3, 3] logit. Values outside this range show that there is a problem with the measured items. Information on how well a particular item discriminates between students with different abilities can also be obtained. The discrimination parameter, aj, can be added, forming the 2PL model (Equation 2):

$$P(X_{ij} = 1 \mid \theta_i, a_j, b_j) = \frac{e^{a_j(\theta_i - b_j)}}{1 + e^{a_j(\theta_i - b_j)}}$$

where θi and bj have the same meaning as in the 1PL model, and aj represents the discrimination of item j. The values of aj are normally in the range [0, 2], and higher values indicate items which differentiate better between students' abilities (Sudol & Struder, 2010). Although the 1PL model is sometimes associated with the Rasch model, there are differences between the Rasch model and the 1PL model (Linacre, 2005).
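The 1PL and 2PL models described above can be sketched in a few lines of Python. This is only an illustration that assumes the standard logistic form of both models; the function names and the example values are hypothetical and are not taken from the study.

```python
import math


def p_correct_1pl(theta: float, b: float) -> float:
    """Probability of a correct response under the 1PL model (assumed logistic form)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def p_correct_2pl(theta: float, b: float, a: float) -> float:
    """Probability of a correct response under the 2PL model, with discrimination a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical values: a student whose ability equals the item difficulty
# has an even chance of answering correctly.
print(p_correct_1pl(theta=0.0, b=0.0))

# A higher discrimination makes the probability change faster around b.
print(p_correct_2pl(theta=0.5, b=0.0, a=0.5))
print(p_correct_2pl(theta=0.5, b=0.0, a=2.0))
```

With a = 1 the 2PL sketch reduces to the 1PL sketch, which reflects the fact that the 1PL model treats all items as equally discriminating.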
The Rasch model was elaborated by the Danish mathematician Georg Rasch in 1960. This model was developed in order to overcome the problems which appear when classical test theory is used to analyse instruments (Boone, 2016; Jackson et al., 2002). More exactly, because the items have different difficulty levels, the raw scores obtained by summing up the correct answers cannot be used to compare students' ability. Furthermore, the Rasch technique can be used to transform non-linear raw data into linear scales, which can then be evaluated using parametric statistical tests.
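The non-linearity of raw scores can be illustrated with a simple log-odds transformation. The sketch below is a simplified illustration only, not the estimation procedure used by Rasch software: it ignores item difficulties, and the function name and the test length of 20 items are assumptions made for the example.

```python
import math


def raw_score_to_logit(raw_score: int, n_items: int) -> float:
    """Log-odds of the proportion correct (simplified, ignores item difficulty)."""
    if not 0 < raw_score < n_items:
        raise ValueError("extreme scores (0 or all correct) have no finite logit")
    p = raw_score / n_items
    return math.log(p / (1.0 - p))


# One extra raw-score point is worth more logits near the top of the scale
# than in the middle, so equal raw-score differences are not equal
# differences in ability.
for score in (10, 11, 18, 19):
    print(score, round(raw_score_to_logit(score, 20), 2))
```

In this sketch, moving from 18 to 19 correct answers corresponds to a larger step on the logit scale than moving from 10 to 11, which is why raw scores are not treated as a linear measure.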
Some of the parameters considered when analyzing an instrument using the Rasch model are: item fit, item separation and reliability, person separation and reliability, the Wright map, and discrimination (Linacre & Wright, 2000). When the number of participants in the study is large enough, a small value for person separation (<3) together with a small value for person reliability (<0.8) shows that the instrument is not sensitive enough to differentiate between students with different abilities. In this case, more items may be needed (Linacre & Wright, 2000). Item separation verifies the item hierarchy. A small value for item separation (<3) together with an item reliability <0.9 implies that the number of study participants is not large enough to confirm the item difficulty hierarchy (Linacre & Wright, 2000).

Aim of this study
The aim of this study was to develop and validate an instrument measuring Chemistry students' ability regarding 'physical bonding'.

Design of the study
A total of N=73 Chemistry and Chemistry Engineering students participated in this study: 29 students (40%) in the third year of study and 45 students (60%) in the second year of study. 83.6% of the participants were female and 16.4% were male.
A total of 24 items were developed by using the different cognitive levels described by the Marzano taxonomy. The ratio between items with low, medium and high difficulty was 1:1:1. Examples of items are presented in the Annex.
Data were analyzed with Winsteps version four. Information regarding the interpretation of Winsteps outputs can be found at http://www.winsteps.com/index.htm.

Results and Discussion
Data analysis started with an item misfit analysis. It is recommended that items with an MNSQ value > 1.3 be eliminated, as these items may introduce measurement error. Four items exhibited Outfit MNSQ values > 1.3 (Figure 1) and were eliminated. In addition, 13 participants in this study exhibited MNSQ > 1.3. This shows that the response patterns of these participants did not fit the model well; however, they could not be eliminated from the study. The final data analysis was undertaken with 20 items and 73 persons.
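To make the misfit criterion more concrete, the following sketch computes an Outfit MNSQ for a single item as the mean of the squared standardized residuals. It assumes the dichotomous Rasch model in its logistic form; the person abilities and response patterns are hypothetical and are not data from this study, and the function name is introduced only for illustration.

```python
import math


def outfit_mnsq(responses, abilities, difficulty):
    """Mean of squared standardized residuals for one dichotomous item."""
    z_squared = []
    for x, theta in zip(responses, abilities):
        p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))  # expected score
        z_squared.append((x - p) ** 2 / (p * (1.0 - p)))   # residual^2 / variance
    return sum(z_squared) / len(z_squared)


# Hypothetical abilities (logit) of five persons answering one item of difficulty 0.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Pattern in line with the model: able persons succeed, less able persons fail.
fitting = outfit_mnsq([0, 0, 1, 1, 1], abilities, difficulty=0.0)

# Surprising pattern: less able persons succeed, able persons fail.
misfitting = outfit_mnsq([1, 1, 0, 0, 0], abilities, difficulty=0.0)

print(fitting < 1.3, misfitting > 1.3)
```

Because the squared residuals are not weighted, unexpected responses from persons far from the item difficulty inflate this statistic strongly, which is why a surprising pattern pushes the value above the 1.3 cut-off.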

Item difficulty and person ability
The measures for item difficulty were in the range [-1.83, 1.98] logit, M=0.00, SD=1.25. These values are within the normal range of [-3, 3]. The measures for person ability were in the range [-1.47, 4.85] logit, M=0.83, SD=1.38. The ability measure of three persons was 4.85 logit. The rest of the values were < 3. Hence, it can be considered that the ability level of those three persons was higher than the difficulty level of the tested items. The Wright map, in which item difficulty and person ability are presented on the same logit scale, is presented in Figure 2. Table 2 compares the measured difficulty of the items with the difficulty levels envisaged by the Marzano taxonomy. As can be observed, there is not a perfect alignment between the envisaged levels of the items and the measured values (for example, item 12 was envisaged to have difficulty level 4 according to the Marzano taxonomy, but its measured value was -1.41 logit, near the lower end of the measured range). However, the utilisation of the Marzano taxonomy enabled us to develop items of different difficulty levels and with measures in normal ranges.
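As a worked illustration of these measures, and assuming the standard logistic form of the Rasch model (the symbols θ and b below are notation assumed here for person ability and item difficulty), a person with an ability of 4.85 logit facing the most difficult item (1.98 logit) would be expected to answer it correctly with a probability of about 0.95:

$$P = \frac{1}{1 + e^{-(\theta - b)}} = \frac{1}{1 + e^{-(4.85 - 1.98)}} = \frac{1}{1 + e^{-2.87}} \approx 0.95$$

A success probability this high on the hardest item is consistent with the interpretation that the items were too easy to locate the ability of these persons precisely.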

Separation and reliability
The values for item separation and reliability (item separation: 3.80, item reliability: 0.94, Figure 3) show that the number of persons who participated in this study was large enough to confirm the hierarchy of items with regard to their difficulty level.
The values for person separation and reliability (for the 70 non-extreme persons: separation 1.55, reliability 0.71; for all 73 extreme and non-extreme persons: separation 1.68, reliability 0.74, Figure 3) show that the instrument is not sensitive enough to differentiate between students with different ability levels.
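The reported separation and reliability values are two expressions of the same information. A minimal sketch, assuming the usual relation reliability = G² / (1 + G²) between the separation index G and reliability, and the strata formula (4G + 1) / 3 for the number of statistically distinct levels, reproduces the reliabilities reported above from the separation values. The function names are hypothetical.

```python
def reliability_from_separation(separation: float) -> float:
    """Reliability implied by a separation index G: G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


def strata(separation: float) -> float:
    """Number of statistically distinct levels: (4G + 1) / 3."""
    return (4.0 * separation + 1.0) / 3.0


# Separation values reported in this section.
reported = {
    "items": 3.80,
    "persons (non-extreme)": 1.55,
    "persons (all)": 1.68,
}

for label, g in reported.items():
    print(label, round(reliability_from_separation(g), 2), round(strata(g), 1))
```

Under these assumptions the item separation of 3.80 corresponds to a reliability of 0.94, and the person separations of 1.55 and 1.68 correspond to 0.71 and 0.74, in agreement with the reported values.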

Conclusion
The item difficulties, as well as the item separation and item reliability values, are in normal ranges. The results indicate that an instrument whose items were developed by incorporating the different cognitive levels of the Marzano taxonomy contains items with difficulty measures in normal ranges. Furthermore, the developed items have different levels of difficulty. However, the person separation and person reliability values show that the number of items must be increased, since the instrument is not sensitive enough to differentiate between low and high performers. The goal of a further study is to increase the number of items with medium difficulty in order to reach a ratio of 25%:50%:25% between items with low, medium and high difficulty, and then to test the final instrument and analyze the results.
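How many items might be needed can be estimated roughly with the Spearman-Brown prophecy formula. The sketch below is an assumption-laden illustration rather than part of the study: it takes the 20 items and the person reliabilities reported above, uses the 0.8 threshold mentioned in the Introduction as the target, and assumes that any added items would be of similar quality to the existing ones. The function name is hypothetical.

```python
import math


def items_needed(current_items: int, current_reliability: float,
                 target_reliability: float) -> int:
    """Test length needed to reach a target reliability (Spearman-Brown)."""
    factor = (target_reliability * (1.0 - current_reliability)) / (
        current_reliability * (1.0 - target_reliability)
    )
    return math.ceil(current_items * factor)


# 20 items, person reliability 0.71 (non-extreme persons), target 0.80.
print(items_needed(20, 0.71, 0.80))

# Same calculation with the reliability for all 73 persons (0.74).
print(items_needed(20, 0.74, 0.80))
```

Under these assumptions the estimate is in the region of 29 to 33 items, which is in line with the conclusion that the instrument needs more items.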