Understanding the effect of the response rate and class size interaction on students' evaluation of teaching in higher education

Objective: This study aims to investigate the interaction between response rate and class size and its effect on students' evaluation of instructors and the courses offered at a higher education institution in Saudi Arabia. Study Design: A retrospective study design was chosen. Methods: One thousand four hundred and forty-four different courses belonging to all the colleges (N = 21) located across seven different campuses of the University of Dammam (UOD) were considered in this study. All the course evaluation surveys (CES) (N = 168,574) conducted during the academic year 2013–2014 were analyzed to investigate the effect of the response rate and class size interaction on students' evaluation of courses and instructors. Results: When the class size is at the medium level, the ratings of instructors and courses increase as the response rate increases. In contrast, when the class size is small, a high response rate is required for the evaluation of instructors and at least a medium response rate is required for the evaluation of courses. The study suggests that the interaction between response rate and class size is an important factor that needs to be taken into account when interpreting students' evaluation of instructors and courses. Originality: This research study examined the effect of the interaction between response rate and class size on students' evaluation of instructors and courses.

*Corresponding author: Ahmed Al Kuwaiti, Deanship of Quality and Academic Accreditation, University of Dammam, P.O. Box 40065, Al-Khobar 31952, Saudi Arabia. E-mails: akuwaiti@uod.edu.sa, qaa@uod.edu.sa


PUBLIC INTEREST STATEMENT
The University of Dammam (UOD) currently conducts several student evaluations as a measure to improve the quality of the institution as well as its programs. Several factors influence the outcomes of these evaluations, and among the most critical are class size and the response rate of students. This research paper addresses the interaction of class size and response rate and concludes that this interaction has a significant effect on students' evaluation of instructors and courses in higher education. The study will help higher education institutions understand the optimal response rate required for different class sizes when planning students' evaluation surveys of courses and instructors.

Introduction
Students' evaluation of teaching effectiveness is one of the most widely accepted indicators for measuring the quality of higher education worldwide, and it has gained considerable attention in the fields of psychology, quality control, and quality assurance over the last few decades (Ginns, Prosser, & Barrie, 2007; Vijay, 2014). The process of improving the quality of higher education is a dynamic one, and universities ought to continuously improve their teaching based on students' perceptions (Konting, Kamaruddin, & Man, 2009). Utilizing students' perceptions to improve the quality of higher education is a common practice in almost every university across the globe (Zabaleta, 2007). Higher education institutions (HEIs) in the Kingdom of Saudi Arabia (KSA) are also becoming increasingly aware of the importance of quality in knowledge delivery owing to the increasing numbers of students entering educational institutions (Al-Kuwaiti & Subbarayalu, 2015). Since students are the individuals most exposed to and most affected by teaching, their perceptions and points of view about both instructors and courses are of paramount importance. Research also indicates that students are among the most qualified sources of data and information about teaching and learning settings (Archibong & Nja, 2011). Thus, the quality of university courses should certainly be evaluated by their recipients (Nikolaidis & Dimitriadis, 2014), and these evaluations need to include all the elements of the teaching and learning processes (Wachtel, 1998; Chen & Hoshower, 2003; Clayson, 2009; Berk, 2005). This practice of evaluating courses through students' feedback has been included as one of the key mechanisms in internal quality-assurance processes, as a way of demonstrating an institution's performance in accounting and auditing practices (Blackmore, 2009; Johnson, 2000).
To facilitate this, several instruments are available to measure students' satisfaction with courses (Coffey & Gibbs, 2001; Ramsden, 1991).
Traditionally, these surveys comprised a series of closed-ended questions about courses and teaching effectiveness, with at least one question pertaining to overall teaching effectiveness. The surveys were typically anonymous and mostly conducted near the end of the semester using either a paper-based or an electronic format (Kogan, Schoenfeld-Tacher, & Hellyer, 2010). Once the data were collected, reports were generated across instructors, departments, and colleges, and these were considered evidence of teaching effectiveness (Sproule, 2000). An important part of any such evaluation is the communication of results in a way that allows fair and meaningful interpretation and comparison, so that judgments can be made about the quality of teaching, career advancement, and the funding of teaching (Burden, 2008; Kuzmanovic, Savic, Andric Gusavac, Makajic-Nikolic, & Panic, 2013; Neumann, 2000). These evaluations have contributed to quality in the educational process, especially when proper reliability coefficients are used to assess the psychometric properties of the instruments (Al-Kuwaiti, 2014; Burden, 2008; Morley, 2014).
There are several reasons why HEIs use students' evaluations and assessments (Spooren, Brockx, & Mortelmans, 2013), namely: (i) quick feedback, assuming that instructors make changes based on students' evaluations; (ii) students' evaluations are used for critical decisions such as promotion; and (iii) accreditation and governmental agencies require such evaluations. Besides this, students' evaluations offer several benefits to institutions: (i) instructors value the input and make improvements in their teaching; (ii) instructors are rewarded for having excellent ratings; (iii) instructors with very low ratings are encouraged to seek help; (iv) students perceive and use evaluations as a way to suggest improvements in teaching; (v) students have more information on which to base their course selections; and (vi) evaluations motivate instructors to improve their teaching (Archibong & Nja, 2011; Neumann, 2000; Ory & Ryan, 2001). University administrators also utilize students' evaluations to make administrative decisions, despite the fact that a number of administrators have expressed concern about the validity and value of student evaluations of teaching (Beran, Violato, & Kline, 2007; Kogan & Shea, 2007).
A previous study by Addison, Best, and Warrington (2006) identified three key factors influencing students' ratings in the course evaluation survey (CES) (i.e. course, instructors, and students) and examined the relationship between students' perceptions of course difficulty and their expected grades. The results revealed that perceived difficulty is associated with grade expectations and the ratings that students give on formal evaluations. Students with high academic achievement evaluated instructors more highly than those with lower academic achievement. Further, regardless of academic achievement, higher evaluations were given by students who found the course easier than expected compared to those who found it harder than initially anticipated.
Several studies have examined the effect of class size on the outcome of students' evaluation of courses in higher education. Gleason (2012) demonstrated that medium classes (i.e. 30-55 students) had little to no benefit over large classes (i.e. 110-130 students) in student learning and achievement, whereas large classes had a small-to-medium positive effect over medium classes in the area of student satisfaction; the only area in which small classes had a small positive effect was student engagement. Pezzella, Paladino, Zoller, and Mandery (2014) compared the performance of students in large classes to that of students in small classes to assess the efficacy of student learning in large classes. Their results indicated that large classes are as efficacious as small classes and that student performance is quite nuanced. A review of the existing literature found no studies that have examined the effect of the interaction between response rate (RR) and class size (CS) on students' evaluation of instructors. Thus, the primary objective of this study is to determine the effect of the interaction between response rate and class size on students' evaluation of courses and instructors.

The instrument
In this study, the CES was used to collect the data. It contains 14 five-point Likert-scale items divided into two subscales (instructor-related and course-related items). The questionnaire was developed based on the guidelines of the National Commission for Academic Accreditation and Assessment (NCAAA) in Saudi Arabia for the purpose of accreditation. The CES was developed by a panel of experts in the related areas, and several studies have investigated its psychometric properties and usefulness (Al Rubaish, Wosornu, & Dwivedi, 2011, 2012; Al-Kuwaiti & Maruthamuthu, 2014). In addition, the corrected item-total correlation and Cronbach's alpha were calculated from a random sample (n = 50) selected from the current data. The results show that Cronbach's alpha equals 0.963 and the corrected item-total correlations range from 0.568 to 0.888, which adds evidence for the reliability and validity of the CES.
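The Cronbach's alpha reported above can be reproduced from a respondent-by-item score matrix using the standard formula, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores). The following is a minimal sketch in Python (not the authors' SPSS procedure); the function name and the simulated data are illustrative only:

```python
import numpy as np

def cronbach_alpha(items) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances (ddof=1) throughout.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of respondents' totals
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Illustrative use on simulated Likert data (n = 50 respondents, 14 items,
# mirroring the CES dimensions; the data themselves are made up):
rng = np.random.default_rng(0)
scores = rng.integers(1, 6, size=(50, 14)).astype(float)
alpha = cronbach_alpha(scores)
```

When all 14 items are perfectly correlated the formula yields alpha = 1, which is a quick sanity check on the implementation; independent random responses, as in the demo above, drive alpha toward 0.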

Data collection
The University of Dammam (UOD), through the Deanship of Quality and Academic Accreditation (DQAA), has developed a special application called "UDQUEST" to electronically collect data related to several attributes of the courses offered in the university's various programs. Dommeyer, Baum, Hanna, and Chapman (2004) found that online evaluations do not produce significantly different mean evaluation scores than traditional in-class evaluations, even when different incentives are offered to students to complete online evaluations. Contrary to this, Capa-Aydin (2014) indicated that in-class evaluations have a significantly higher response rate than online evaluations; in addition, Rasch analysis showed that mean ratings in in-class evaluations were significantly higher than those in online evaluations. Several evaluations have been conducted at UOD for the purpose of accreditation, as stipulated by the NCAAA. One of them is the CES, which is administered at the end of every semester to all students registered in every course offered in that semester, to evaluate the courses and the related instructors.
A total of 1,443 different undergraduate courses belonging to all the colleges (N = 21) located across seven different campuses of UOD were considered in this study. Accordingly, all the CES surveys (N = 168,574) conducted during the academic year 2013-2014 were included in the analysis.

Statistical analysis
A two-way analysis of variance (full model) was used to investigate whether there is any significant difference in students' evaluations with respect to different class sizes and response rates. Students' evaluations of both instructors and courses were the dependent variables, and the independent variables were class size and response rate. "CS" is defined as the number of students registered in the course, i.e. the number of students targeted to complete the survey, whereas "RR" is defined as the proportion of students who responded to the survey relative to the number of students who were expected to complete it. If a main effect ("RR" or "CS") or the interaction effect ("RR" × "CS") was found to be significant, the Tukey post hoc test was used for pairwise comparisons. In this study, response rate and class size were each categorized into three levels: 1, low [class size less than 60; response rate less than 73%]; 2, medium [class size 61-200; response rate 74-91%]; and 3, high [class size over 200; response rate over 92%] (Nulty, 2008). This classification was designed assuming a 3% sampling error and a 95% confidence level (Nulty, 2008). To explore the interaction effect between response rate and class size on students' evaluation of instructors and courses, descriptive statistics of the two independent variables (response rate and class size) with respect to the dependent variables, i.e. the mean scores of the two subscales (effectiveness of instructors and of courses), were calculated using the Statistical Package for the Social Sciences (SPSS), version 19. Table 1 shows the descriptive statistics of students' evaluation of instructors according to class size and response rate: the mean ratings range from 3.88 to 4.00. The lowest mean [3.88] occurs when the response rate is low, across all levels of class size, and the highest mean [4.00] occurs when the response rate is high and the class size is medium. A two-way ANOVA was used to test the differences between these means, and the results are presented in Table 2.
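The three-level categorization described above can be sketched as a pair of helper functions. Note that the reported bands leave small gaps (a class of exactly 60 students, and response rates between 73-74% and 91-92%); assigning gap values to the lower band is an assumption of this sketch, not something the study specifies:

```python
def class_size_level(n_students: int) -> str:
    """Classify class size into the study's three bands.

    Bands reported in the study: low < 60, medium 61-200, high > 200.
    Assumption: a class of exactly 60 (unassigned in the paper) is 'low'.
    """
    if n_students <= 60:
        return "low"
    if n_students <= 200:
        return "medium"
    return "high"

def response_rate_level(rate_pct: float) -> str:
    """Classify a response rate (as a percentage) into the study's bands.

    Bands reported in the study: low < 73%, medium 74-91%, high > 92%.
    Assumption: gap values (73-74%, 91-92%) fall into the lower band.
    """
    if rate_pct <= 73:
        return "low"
    if rate_pct <= 91:
        return "medium"
    return "high"
```

For example, a course with 120 registered students and an 80% response rate would fall into the medium/medium cell of Tables 1 and 5.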

Results
Table 2 shows that both the main effects (response rate and class size) and the interaction effect are statistically significant; hence, the data were further subjected to pairwise comparisons.
From Table 3, it is observed that the highest mean difference in students' ratings is found between the high and low response rates, noting that all the mean differences are statistically significant. From Table 4, it is observed that the mean student ratings differ significantly between the large class size and the small and medium class sizes. This result is consistent with the findings of previous studies by Koh and Tan (1997) and Badri, Abdulla, Kamali, and Dodeen (2006). The investigators explored further to study the interaction effect of response rate and class size on students' ratings of instructors.
From Figure 1, it is inferred that when the class size is medium or small, the ratings in students' evaluation of instructors increase as the response rate increases. On the other hand, when the class size is large, the ratings increase as the response rate moves from low to medium, and then stabilize as it moves to high. Across all levels of response rate, the mean rating of students' evaluation of instructors is lower for the small class size than for the large class size. This suggests that when the class size is small, the response rate needs to be at least at the medium level to obtain stable students' evaluations of instructors, whereas a medium response rate is sufficient when the class size is large. Table 5 shows the descriptive statistics of students' evaluation of courses according to response rate and class size; the mean ratings range from 3.66 to 3.88. The lowest mean rating (3.66) is observed when the response rate is low and the class size is small, while the highest (3.88) is found when the response rate is high and the class size is medium. Further, a two-way ANOVA was used to test whether the observed mean differences are statistically significant, and the results are shown in Table 6.
From Table 6, it is inferred that both the main effects (CS and RR) and the interaction effect (CS × RR) are statistically significant. The data were therefore subjected to pairwise comparisons, and the results are shown in Tables 7 and 8. Table 7 shows the mean differences in students' ratings between the three response-rate levels. All the mean differences are statistically significant at the 0.05 level; specifically, the highest mean difference is observed between the high and low response rates (0.118). From Table 8, it is inferred that all the observed mean differences in students' evaluation of courses between the three class sizes are statistically significant, and the highest mean difference is noted between the medium and small class sizes (0.068).
Irrespective of the response rate (RR), the mean rating of students' evaluation of courses is lower when the class size is small; however, it improves at a steady pace as class size increases. Also, the ratings in students' evaluation of courses increase as the response rate moves from "low" to "medium" and then become relatively stable when moving to "high". This suggests that when the class size is small, the response rate needs to be at least at the medium level to obtain stable students' evaluations of courses. Moreover, we observe that when the class size is medium, the ratings increase as the response rate increases (Figure 2).

Discussion
This study explored the effect of the interaction between response rate and class size on the ratings in students' evaluations of instructors and courses. The frequent use of students' evaluation of courses is largely due to the ease of collecting the data and of presenting and interpreting the results (Penny, 2003). The interpretation of these evaluations is more complicated than it looks, and it entails a risk of inappropriate use by both teachers and administrators for both formative and summative purposes (Franklin, 2001). This study indicated that if students' evaluations of courses and instructors are conducted with a small class size and a low response rate, the findings may be misinterpreted. Previous studies have also suggested that inadequate results of students' evaluation of teaching should not be used for formative purposes or for faculty decisions (Conle, 1999; Franklin, 2001; Galbraith, Merrill, & Kline, 2012).
Specifically, unstable students' evaluations of courses and instructors may occur when the response rate is low and the class size is small. It is therefore recommended that at least a medium response rate be obtained when the class size is small, in order to get stable ratings in students' evaluations of both instructors and courses. Further, we observe that when the class size is medium, the ratings of instructors and courses increase as the response rate increases. From a statistical point of view, a poor response rate among students in these evaluation surveys makes it difficult to generalize, and utmost caution is needed when interpreting the results (Al-Kuwaiti & Subbarayalu, 2015). Therefore, HEIs need to be aware of these considerations when using students' evaluation surveys for quality improvement and accountability purposes (Nulty, 2008).
Moreover, the ways in which administrators engage with students' evaluation of teaching effectiveness could be considered one of the greatest threats to the purposes of these evaluations (Penny, 2003). Many users are not sufficiently trained to handle these data, and they may even be unaware of their own ignorance about the collection and interpretation of these evaluations (Spooren et al., 2013). Misuse of these evaluations might therefore have consequences for both the improvement of teaching and career development (Boysen, Kelly, Raesly, & Casner, 2013). At the same time, the response rate should be large enough for the survey data to provide adequate evidence for accountability and improvement purposes, in order to maximize the benefits of these evaluations (Nulty, 2008). It is also very important to consider the types of students who participated in the evaluation, to ascertain whether the perceptions of participants differ from those of non-participants, particularly when the response rate is low and the class size is small.
Besides class size and response rate, there are other factors that might interfere with students' evaluation of teaching. These factors are grouped under three categories, viz. student, faculty, and course factors. The student factors include gender (Campbell & Bozeman, 2007); the cultural background of the students (Capa-Aydin, 2014); domain-specific vocational interests (Chen & Watkins, 2010); and psychosocial dynamics such as instructors' attractiveness (Freng & Webber, 2009). The faculty factors include gender (Smith, 2009) and teacher characteristics (Chen & Watkins, 2010; Clayson, 2009; Clayson & Sheffet, 2006). The course factors include grades or expected grades (Al-Kuwaiti & Maruthamuthu, 2014; Campbell & Bozeman, 2007; Dommeyer et al., 2004); course level (Galbraith et al., 2012); and course difficulty (Ginns et al., 2007). The present study contributes to this discussion by investigating the interaction between response rate and class size (the total number of students in the course) and its effects on students' evaluation of instructors and the courses offered at an HEI in Saudi Arabia. This study adds value to the literature by suggesting an appropriate response rate for different class sizes when conducting students' evaluations of courses and instructors (Table 9).

Conclusions
We conclude that the interaction between class size and response rate has a significant effect on students' evaluation of instructors and courses in higher education. Specifically, if the class size is small, a high response rate is required for the evaluation of instructors and at least a medium response rate is required for the evaluation of courses. The results of this study will help the policy planners of HEIs to understand the response rate required for different class sizes when planning students' evaluation surveys of courses and instructors.