Determining the Qualifications of the Secondary School Students in the Field of Quadrilaterals: A Scale Development Study

Measurement and assessment are needed to determine whether, and to what degree, the education process is successful. For this reason, it is of great importance to determine the achievement level of students at the secondary level. The purpose of this study is to develop a multiple-choice achievement test on the subject of quadrilaterals for secondary school students by using alternative assessment techniques. The study group of this descriptive survey research consists of 372 seventh-grade secondary school students in Aydın and Muğla during the 2018-2019 academic year. The ITEMAN, SPSS and jMetrik programs were used for analyses based on Classical Test Theory and Item Response Theory. The item pool was reviewed for face validity by five educators who are experts in the field. In line with the experts' opinions, it was decided to include 28 items in the initial form of the test. As a result of the analysis of the data obtained from the students, one item with a very low item discrimination index, four items that did not show model-data fit in the Rasch analysis and two items identified as biased were excluded from the test. After these steps, the final test consisted of 21 items; its average difficulty index was .64 and its average discrimination index was .40. The Cronbach's alpha internal consistency coefficient of the final form was .81. According to these results, it can be said that the achievement test has sufficient validity and reliability.


INTRODUCTION
Geometry, a branch of mathematics, is a field of study that enables individuals to reason, solve problems, think critically, establish cause-effect relationships and develop higher-order thinking skills. It also covers knowledge of many shapes and objects, which facilitates the understanding of the earth, direction and the forms of the living world. Geometry has been included in curricula since primary education because it contributes to students' critical thinking and problem-solving skills, assists in teaching other subjects of mathematics, is an important part of the mathematics used in daily life, is used in science and art, and helps students to better understand the world in which they live (Baykul, 2002).
As many researchers (Baykul, 1999; Duatepe, 2000; Fujita and Jones, 2007) have emphasized, geometry teaching is important not only for enabling learners to comprehend the knowledge and relationships related to point, line, plane, planar shapes, space and spatial shapes, but also for the development of spatial thinking and visual skills. In addition, understanding the classification and properties of geometric shapes is stated to contribute to solving problems related to real life and to other fields of mathematics (measurement, algebra and rational numbers) (NCTM, 2000). In this context, the primary school mathematics curriculum in Turkey prioritizes recognizing, naming, constructing, drawing, comparing and grouping geometric objects and shapes according to certain characteristics (MEB, 2013a), whereas the middle school mathematics curriculum characterizes and classifies these figures through a small number of defining properties (for example, a rectangle is a right-angled parallelogram) (MEB, 2013b). Thus, while students are expected to recognize and understand geometric shapes and their properties in the primary school curriculum, in the middle school curriculum they are expected to form the relationships between these figures and to classify them according to certain characteristics. However, students experience problems in understanding and classifying geometric concepts.
Memorizing the properties of shapes and presenting inadequate examples (e.g., giving only the typical image) cause students to form limited structures for geometric concepts and therefore not to understand the concept. On the other hand, the hierarchical classification of quadrilaterals is seen as a field of study that promotes the development of geometric thinking (Fujita and Jones, 2007). For example, the parallelogram is defined as a quadrilateral whose opposite sides are parallel. Since the opposite sides of the rhombus, square and rectangle are parallel to each other, these quadrilaterals are also parallelograms. Therefore, if a property holds for the parallelogram, it also holds for the rhombus, square and rectangle. However, the results of many studies have shown that students experience problems with the hierarchical ordering of quadrilaterals (Monaghan, 2000; Toluk, Olkun and Durmus, 2002; Olkun and Aydogdu, 2003; Aktas, 2005; Erez and Yerushalmy, 2006; Pickreign, 2007; Fujita and Jones, 2007; Akuysal, 2007; Ergün, 2010; Aktaş and Aktaş, 2011; Türnüklü, Alaylı and Akkaş, 2013). In a study conducted by Olkun and Aydoğdu (2003), it was determined that students see geometric shapes only as separate and independent from each other. Similarly, Okazaki and Fujita (2007) found that many students had difficulty perceiving the square as a special case of the rectangle and the rhombus, but were more successful in perceiving the rhombus as a parallelogram. It is also stated that most students experience difficulties here because of thoughts such as "the square is not a parallelogram because a parallelogram appears oblique" (Erez and Yerushalmy, 2006; Okazaki and Fujita, 2007). Fujita and Jones (2007) state that, in deciding whether a rhombus is a special parallelogram, for example, it is not enough for students to rely on their images alone; the properties of the figures should also be related to one another.
In fact, to establish these relations, students should decide not on the basis of the appearance of the given quadrilateral but by using its properties (sides, angles, etc.). This process is considered a useful activity for the development of geometric thinking (Fujita, 2008; Fujita, 2012).
Defining concepts in geometry is important for teaching. Although researchers (De Villers, 1998; Tall and Vinner, 1981) do not approve of giving the definitions of concepts directly, according to Türnüklü, Alaylı and Akkaş (2013) it is also accepted that definitions play important roles in the formation of the concept image and in problem-solving situations.
According to Tall and Vinner (1981), students filter definitions in their minds whether the definitions of concepts are taught through direct instruction or the students are enabled to construct them. These personally constructed definitions may differ from formal definitions, and they may lead individuals to create their own concept images. According to Türnüklü, Alaylı and Akkaş (2013), the visual image attached to a geometric concept can be more prominent than the concept itself. In this context, typical (prototype) examples are the key factor. Each concept can have multiple prototypes. These prototypes are examples that carry some of the features in the concept's long list of features, and they always affect the concept image (Fischbein, 1993; Hershkowitz, 1990). Drawing on many studies, Fujita (2012) showed that the definition of a geometric shape and the properties of the family to which the shape belongs can often appear contradictory. This contradiction leads to false perceptions and generalizations when the concept is perceived through its prototype (Fujita, 2012; Fujita and Jones, 2006; Hershkowitz, 1990). For example, the prototype image of the parallelogram conflicts with the rectangle, which belongs to the same family under the definition of the parallelogram, and this leads to false generalizations: in this example, the perception develops that a parallelogram cannot have right angles (Türnüklü, Alaylı and Akkaş, 2013).
As the literature shows, the knowledge that students acquire in geometry is important for gaining the reasoning, problem-solving and critical thinking skills they need to solve many problems they will encounter in daily life. This study aimed to develop a test on the hierarchy of quadrilaterals.

METHOD
This research can be regarded as basic research because it develops a scale to reveal 7th grade students' knowledge of quadrilaterals and their ability to make inferences based on this knowledge. Before the decision to develop an achievement test for this study, existing tests were examined. Although the existing tests contain questions that address the hierarchy of quadrilaterals under the title "Quadrangles and Quadrilateral", these questions were found to cover only the relationships between certain special quadrilaterals, such as the square with the rectangle or the square with the rhombus. Since this study aims to determine the extent to which students can see the relationships among all quadrilaterals, the test includes items in which all quadrilaterals are related to one another.

Process
The test development process consists of preparation, implementation and reporting stages, and test development studies in educational research follow various steps at these stages (Caliskan & Kaptan, 2009). In this research, the steps of writing test items, piloting, and validity, reliability and item analysis, which were also used by Burns et al. (1985) and Karslı and Ayas (2013) in test development studies, were followed in order. The following process was followed while developing the test.

1. Determination of the purpose of the test: The literature search revealed no test development study on the hierarchy of quadrilaterals. Therefore, the aim was to develop an achievement test to determine students' knowledge of the definitions of quadrilaterals, their ability to identify the special cases of quadrilaterals and their ability to classify quadrilaterals hierarchically.

2. Determining the subject: Although there are studies on quadrilaterals conducted with teacher candidates or students, no test development study on the hierarchy of quadrilaterals was found; the subject of the test was therefore determined as quadrilaterals.

3. Determination of the properties that can be measured by the test: A table of specifications containing the related learning outcomes and sub-learning outcomes was prepared, and items appropriate to the levels of Bloom's Taxonomy were prepared.

4. Writing the items of the test: The opinions of mathematics teachers working in primary schools and of field experts were taken to determine how many items would be written for each learning outcome and sub-learning outcome.

5. As a result of the expert opinions, a 33-item test was formed, consisting of 9 items at the recall level, 17 at the comprehension level and 7 at the application level.

6. Proofreading and obtaining expert opinion: To assess the correctness and scientific accuracy of the prepared items, the test was submitted to five educators for their opinions. In line with the expert opinions, deficiencies, errors and features that weakened content validity were corrected; 5 items were removed and the total number of items was reduced to 28.

7. Implementation of the test: The test was administered to 372 students from one private school and four state schools in Aydın and Muğla, selected by random sampling.

8. Item analysis: The discrimination and difficulty levels of the items in the test were calculated from the scores of the upper group, formed from the students who gave the most correct answers, and the lower group, formed from the students who gave the fewest correct answers.
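The upper-group and lower-group calculation described in step 8 can be sketched as follows. This is a minimal illustration and not the procedure run in the study: the function name, the 0/1 scoring layout and the default group fraction of 27% are assumptions, since the text does not state how the groups were cut.

```python
from typing import List, Tuple


def item_indices(responses: List[List[int]],
                 group_fraction: float = 0.27) -> List[Tuple[float, float]]:
    """Return (difficulty, discrimination) for every item.

    responses[s][j] is 1 if student s answered item j correctly, else 0.
    group_fraction is the share of students placed in each extreme group
    (assumed value; not taken from the study).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    g = max(1, round(n_students * group_fraction))
    upper, lower = ranked[:g], ranked[-g:]

    results = []
    for j in range(n_items):
        upper_correct = sum(row[j] for row in upper)
        lower_correct = sum(row[j] for row in lower)
        # Difficulty: proportion correct across both extreme groups.
        difficulty = (upper_correct + lower_correct) / (2 * g)
        # Discrimination: difference in proportion correct between groups.
        discrimination = (upper_correct - lower_correct) / g
        results.append((difficulty, discrimination))
    return results
```

Under this convention a difficulty value near 1 marks an easy item, and a discrimination value below .20 would flag an item for review, which is the critical value applied to item 20 in the findings.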

Study Group
The aim was to reach a number of students at least 10 times the total number of items in the item pool, within the provinces where the researchers were located in the 2017-2018 academic year. For this reason, 372 students who participated voluntarily from randomly selected schools in these provinces formed the study group. The schools and the distribution of students in the study group are given in Table 1.

Data Collection Tools
In developing the data collection tool, a literature search on the research subject was first conducted and studies on students' knowledge of quadrilaterals were examined. Among the existing studies on quadrilaterals, no test development study on the hierarchy of quadrilaterals was found. Therefore, the aim was to develop an achievement test to determine 7th grade students' knowledge of the definitions of quadrilaterals, their ability to identify the special cases of quadrilaterals and their ability to classify quadrilaterals hierarchically.
While preparing the questions of the Quadrilateral Test, the related learning outcome in the Geometry and Measurement learning area of the 7th grade Mathematics Curriculum of the Ministry of National Education was taken into consideration. Because this learning outcome was thought to involve more than one behavior, sub-learning outcomes were determined so that individual questions measure a single behavior. For the main learning outcome and the sub-learning outcomes, an attempt was made to determine how many questions would be written at the recall, comprehension, application, analysis, evaluation and synthesis levels, which make up the cognitive domain of Bloom's Taxonomy.
Taking into account the learning outcomes and sub-learning outcomes, the number of questions to be written at the recall, comprehension and application levels appropriate to the 7th grade was decided; 33 questions were written and a table was prepared. In the first form of the test, there are 9 questions at the recall level, 17 at the comprehension level and 7 at the application level. Test development studies in the literature state that field experts should be consulted about the item stems, the distractors, the items' coverage of the learning outcomes, the behavior measured by each item and the item's suitability for the behavior to be measured (Webb, 1997). The test was submitted to five educators for their opinions in order to evaluate the correctness and scientific accuracy of the prepared items. In line with the expert opinions, deficiencies, errors and conditions that weaken content validity were corrected and the number of questions was reduced to 28. After the necessary arrangements were completed, the test was administered to 372 students who had completed the learning outcomes on quadrilaterals.

Data Analysis
After the test was administered, the data were analyzed for reliability according to two different measurement theories. Analyses based on Classical Test Theory were carried out with the Iteman and SPSS programs; reliability in terms of internal consistency was evaluated using Cronbach's alpha and point-biserial correlation analysis. Analyses based on Item Response Theory were carried out with the jMetrik program using Rasch analysis, which is a special case of the one-parameter logistic model. The items to be kept in the test and the items to be removed were determined with the help of the unweighted, weighted and standardized fit statistics, according to the criteria proposed by Linacre (2002). Within the scope of the validity analysis, experts were asked whether the scale, developed according to the table of specifications, has content validity. The scatter plot obtained from the Iteman program shows the outcome to which each item belongs. Acceptable values for the criteria used in the study are shown in Table 2.

Table 2
Fit criteria (WMS and UMS)
  [range missing]: The item is insufficient for the measurement process, but not degrading.
  0.5 to 1.5*: The item is very suitable for measurement.
  < 0.50: The item is not sufficient for the measurement process, but not degrading.
Fit criteria (Std. WMS and Std. UMS)
  ≥ 3: The data do not fit the model.
  2.0 to 2.9: The data cannot be predicted well.
  -1.9 to 1.9*: The data are reasonably predictable.
  ≤ -2.0: The data are too predictable.
* Shows the ideal ranges.
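The two Classical Test Theory statistics named above, Cronbach's alpha and the point-biserial correlation, can be sketched for 0/1 scored data as follows. This is an illustrative sketch under stated assumptions and not the Iteman or SPSS computation: the function names are hypothetical, and the point-biserial here correlates each item with the uncorrected total score (the item itself is included in the total).

```python
from statistics import mean, pstdev, pvariance
from typing import List


def cronbach_alpha(responses: List[List[int]]) -> float:
    """Internal consistency for a students-by-items matrix of 0/1 scores."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    return k / (k - 1) * (1 - sum(item_vars) / pvariance(totals))


def point_biserial(responses: List[List[int]], j: int) -> float:
    """Correlation between item j (0/1) and the total test score.

    Requires at least one correct and one incorrect answer on item j.
    """
    item = [row[j] for row in responses]
    totals = [sum(row) for row in responses]
    p = mean(item)  # proportion answering item j correctly
    mean_correct = mean(t for t, x in zip(totals, item) if x == 1)
    mean_wrong = mean(t for t, x in zip(totals, item) if x == 0)
    return (mean_correct - mean_wrong) / pstdev(totals) * (p * (1 - p)) ** 0.5
```

Population variances are used throughout; because alpha depends only on the ratio of variances, using sample variances consistently would give the same coefficient.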

FINDINGS
Data from 372 students were first analyzed with the Iteman program. The scatter plot of the item discrimination indices and item difficulty values for each of the 28 items in the test is shown in Figure 1.

Figure 1. Scatter Plot of the Items in the Test
In Figure 1, the items are plotted according to their difficulty values and their item discrimination indices, which reflect the relationship between the item score and the total score obtained from the whole test. Each point is labelled with its item number, and the red numbering indicates the learning outcome or sub-learning outcome to which the item belongs. The plot shows which items measuring the same outcome have the same or similar difficulty and discrimination, and it is used to decide which items to keep in the test. When the plot is examined, the most difficult questions are item 10, item 20 and item 27, respectively, and the easiest questions are item 2, item 12 and item 3. Since the test is intended to be composed of items with different difficulty and discrimination indices, no item was removed at this stage. The difficulty and discrimination values of the items in the 28-item test are shown in Table 3. When Table 3 is examined, the difficulty values of the items vary between .15 and .94; accordingly, there are questions at all difficulty levels in the test. The item discrimination index is below the critical value (.20) only for item 20. The item analysis gave an average difficulty index of 0.60 and an average discrimination index of 0.45. The KR-21 coefficient, determined as the reliability value of the test, was 0.82. In addition, the lowest score obtained on the whole test was 4.00 and the highest was 28.00; the coefficient of skewness for the whole test was -0.03 and the coefficient of kurtosis was -0.86. According to the skewness and kurtosis coefficients, the test scores were normally distributed (Baykul and Güzeller, 2014). The mean test score was 16.72 and its standard deviation was 5.68.
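As a consistency check on the figures reported above, the KR-21 coefficient can be recomputed from the number of items, the mean and the standard deviation alone. The sketch below uses the standard KR-21 formula; the function name is an assumption, and whether Iteman used a population or a sample standard deviation is not stated in the text.

```python
def kr21(n_items: int, mean_score: float, sd: float) -> float:
    """Kuder-Richardson formula 21 from summary statistics of total scores."""
    variance = sd ** 2
    return (n_items / (n_items - 1)
            * (1 - mean_score * (n_items - mean_score) / (n_items * variance)))


# Values reported for the 28-item form: mean 16.72, standard deviation 5.68.
print(round(kr21(28, 16.72, 5.68), 2))  # 0.82
```

The recomputed value agrees with the reported coefficient of 0.82 to two decimal places.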
Item statistics for the whole test are shown in Table 4. When Table 4 is examined, the reliability coefficients of the 28-item test obtained with three different methods vary between .80 and .85; the results obtained from the test are therefore reliable (Cronbach, 2004). To determine the degree to which the results obtained under Classical Test Theory are consistent with Item Response Theory, the item statistics obtained with the jMetrik program are shown in Table 5. When Table 5 is examined, item difficulty values vary between 0.15 and 0.94, while discrimination indices vary between 0.04 and 0.55. According to this result, it was decided to remove item 20 from the measurement tool because its discrimination index was too low and it lacked item validity. The reliability coefficients of the remaining 27 items, obtained with different reliability methods for each item, are shown in Table 6. When Table 6 is examined, the 27 items in the measurement instrument have reliability coefficients above .80, which is considered the critical value, under 5 different reliability estimation methods (Cronbach, 1951). According to this result, the items in the measurement instrument met the assumption of reliability in terms of internal consistency. The reliability coefficients calculated for the whole scale, together with their 95% confidence intervals and standard errors, are shown in Table 7.
When Table 7 is examined, the reliability coefficients determined by different methods for the whole 27-item scale vary between .856 and .859. According to this result, the results obtained from the measuring instrument are considered reliable. Because looking only at the item difficulty and discrimination indices was not considered sufficient for selecting the items, fit indices and transformed discrimination indices for the items were calculated by Rasch analysis, a one-parameter logistic model. The item statistics obtained in the Rasch analysis with a maximum of 150 iterations, a convergence criterion of 0.005 and an endpoint criterion of 0.3 are shown in Table 8. The Unweighted Mean Square (UMS) and Weighted Mean Square (WMS) values shown in Table 8 are fit statistics; WMS is accepted as the infit criterion and UMS as the outfit criterion. When Table 8 is examined, the 27-item scale is suitable for the measurement process according to the WMS and UMS values; in other words, the items were in the ideal range in terms of misfit. However, according to the standardized UMS and WMS values, item 10, item 11 and item 23 do not meet the fit criteria; that is, these items do not show model-data fit. According to this result, it was decided that 4 items in the test should be removed. Bias analysis, another method of determining the validity of the items in the scale, was then started for the Quadrilateral Test, which now consisted of a total of 23 items. Differential item functioning (DIF) analysis was performed to determine whether the items are biased. The type of school attended by the students was used as the grouping variable in the comparison.
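The mean-square fit statistics discussed above can be illustrated with a short sketch of the Rasch model. It assumes that person abilities and the item difficulty have already been estimated (for example by a program such as jMetrik); the function names are hypothetical, and the estimation step and the standardized forms of the statistics are not reproduced.

```python
import math
from typing import List, Tuple


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct answer for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses: List[int], thetas: List[float],
             b: float) -> Tuple[float, float]:
    """Return (infit WMS, outfit UMS) for one item.

    responses[s] is the 0/1 answer of person s; thetas[s] is that person's
    ability estimate (assumed to be given).
    """
    squared_residuals = 0.0
    information = 0.0
    standardized_squares = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        w = p * (1.0 - p)              # model variance of the response
        squared_residuals += (x - p) ** 2
        information += w
        standardized_squares += (x - p) ** 2 / w
    infit = squared_residuals / information             # weighted mean square
    outfit = standardized_squares / len(responses)      # unweighted mean square
    return infit, outfit
```

Both statistics have an expected value of 1 when the data fit the model. Outfit weights every person equally and is therefore more sensitive to unexpected answers from persons far from the item's difficulty, whereas infit is dominated by persons located near it.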
Students from private schools were designated as the focal group and public school students as the reference group; the results obtained with the Mantel-Haenszel method on the basis of the common odds ratio are shown in Table 9. When Table 9 is examined, all items other than item 21 and item 28 show negligible, non-significant differential item functioning. Item characteristic curves were then used to determine whether the two items identified as showing a high level of bias (C+ or C-) worked in favor of or against a particular group of students from public and private schools. The item characteristic curves of the items determined to show significant DIF are shown in Figure 2. When Figure 2 is examined, item 21 favors the public school students, the reference group (C-), when total scores are in the range 2.50-20.00. On the other hand, item 28 favors the private school students, the focal group (C+), when total scores are in the range 5.00-25.00. According to this result, it was decided that both items should be removed from the test (Koyuncu, Aksu and Kelecioğlu, 2018). After this process, the final version of the measurement tool included a total of 21 items. The test statistics obtained for the final version of the scale are shown in Table 10. When Table 10 is examined, the reliability coefficients of the 21-item test obtained with three different methods vary between .81 and .84; the results obtained from the test are therefore reliable (Cronbach, 2004). When the results are examined as a whole, it was decided that the results obtained from the 21 items are valid and reliable.
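The Mantel-Haenszel common odds ratio and the A/B/C labels used above can be sketched as follows. This is a simplified illustration and not the jMetrik output: the function names and the data layout are assumptions, students are assumed to be matched on total score, and the label is assigned from the size of the ETS delta value alone, without the accompanying significance test.

```python
import math
from typing import List, Tuple

# One stratum per total-score level:
# (reference correct, reference wrong, focal correct, focal wrong)
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_odds_ratio(strata: List[Stratum]) -> float:
    """Common odds ratio across score levels (values above 1 favor the reference group)."""
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        n = ref_right + ref_wrong + foc_right + foc_wrong
        if n == 0:
            continue  # skip empty score levels
        numerator += ref_right * foc_wrong / n
        denominator += ref_wrong * foc_right / n
    return numerator / denominator


def ets_delta(odds_ratio: float) -> float:
    """Transform the odds ratio to the ETS delta scale."""
    return -2.35 * math.log(odds_ratio)


def ets_category(delta: float) -> str:
    """Size-only label: A (negligible), B (moderate), C (large), with sign."""
    size = abs(delta)
    if size < 1.0:
        return "A"
    label = "B" if size < 1.5 else "C"
    # Negative delta favors the reference group, positive the focal group.
    return label + ("+" if delta > 0 else "-")
```

With this sign convention, a C- label corresponds to an item that favors the reference group and a C+ label to an item that favors the focal group, which matches the way items 21 and 28 are described.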

DISCUSSION, CONCLUSION AND RECOMMENDATIONS
The purpose of the study is to develop a valid and reliable multiple-choice achievement test, in accordance with the learning outcomes on quadrilaterals, to determine secondary school students' knowledge of the definitions of quadrilaterals, their ability to identify the special cases of quadrilaterals and their ability to classify quadrilaterals hierarchically. Multiple-choice tests, which are used to measure student achievement, make it possible to ask a large number of questions and are among the most frequently used measurement tools today for sampling all of the subjects learned in a course in a short time (Kempa, 1986; Ogan Bekiroğlu, 2004). Multiple-choice tests provide information about students' errors and allow a wide range of questions about a subject or unit. They are also the most preferred tool for measuring student achievement among the measurement and evaluation tools used in education. Achievement tests are passed through standardized development stages that ensure their reliability and validity (Narlı & Başer, 2008).
The analysis shows that the test used in the study contains a certain number of both very difficult and very easy items. The average difficulty value shows that the test is not too difficult and is of medium difficulty.
The item discrimination index, which indicates how well an item measures the property that the test aims to measure, takes values between -1 and +1. As the value approaches +1, the item is considered to measure the target property better; the closer it is to 0, the less adequate the item is for measuring that property. If the discrimination value is negative, the item is thought to measure a characteristic other than the intended one (Kan, 2011). The final test in this study consisted of 21 items, with an average difficulty index of .64 and an average discrimination index of .40. This result demonstrates that the discrimination of the test is sufficient. The Cronbach's alpha value, used to determine the reliability of the test prepared for quadrilaterals, was calculated as .82.
The validity and reliability analyses showed that the developed test is valid and reliable. The fact that the difficulty and discrimination indices are at the desired levels shows that the test can be used as an achievement test. Based on the findings of this study, the following recommendations were made:
• This achievement test can be used by researchers who carry out relevant studies in order to determine the target and behavioral learning outcomes of students.
• Although the achievement test was found to have sufficient validity and reliability, it needs to be supported by further studies.
• A test prepared in accordance with the learning outcomes of the curriculum is believed to enable teachers to obtain information about their students' readiness and misconceptions and to teach their lessons in a more planned and efficient way.