Developing questionnaires for students' evaluation of individual faculty's teaching skills: A Saudi Arabian pilot study.

BACKGROUND
The National Commission for Academic Accreditation and Assessment is responsible for the academic accreditation of universities in the Kingdom of Saudi Arabia (KSA). Requirements for this include evaluation of teaching effectiveness, evidence-based conclusions, and external benchmarks.


AIMS
To develop a questionnaire for students' evaluation of the teaching skills of individual instructors and provide a tool for benchmarking.


SETTING
College of Nursing, University of Dammam [UoD], May-June 2009.


MATERIALS AND METHODS
The original questionnaire was the "Monash Questionnaire Series on Teaching (MonQueST) - Clinical Nursing". The UoD modification retained the four areas and seven response options, but reduced the items from 26 to 20. Outcome measures were factor analysis and Cronbach's alpha coefficient.


RESULTS
Seven Nursing courses were studied, viz.: Fundamentals, Medical, Surgical, Psychiatric and Mental Health, Obstetrics and Gynecology, Pediatrics, and Family and Community Health. The total number of students was 74; missing data ranged from 5% to 27%. The explained variance ranged from 66.9% to 78.7%. The observed Cronbach's α coefficients ranged from 0.78 to 0.93, indicating an exceptionally high reliability. The students in the study were found to be fair and frank in their evaluation.


INTRODUCTION
Academic accreditation of universities was recently introduced in the Kingdom of Saudi Arabia (KSA), and the body charged with it is the National Commission for Academic Accreditation and Assessment (NCAAA). University of Dammam (UoD) was one of the first to be involved in the process. [1] Of the 11 areas identified by NCAAA for evaluation according to internationally accepted standards of good practice, "Students' Learning and Teaching" is considered of primary importance. [2] Requirements include: "A comprehensive system for evaluation of teaching effectiveness, including but not limited to student surveys." [3] The NCAAA "Course Evaluation Survey" (CES) evaluates the effectiveness of teaching in each course as a unit. However, there are other NCAAA requirements. First, "Faculty maintain portfolio of evidence of evaluation, and, of strategies for improvement." [3] Second, "analyses and conclusions should be based on valid evidence rather than subjective impressions." [4] Third, benchmarks should include external comparison. [5]
Informative and important as they are, these directives are not sufficient for a comprehensive evaluation of an instructor's individual areas of professional strength and weakness in general, and of teaching skills in particular. Valid and reliable questionnaires, completed anonymously by students on each instructor separately, are an indispensable tool for reaching an authentic judgment on the teacher's individual potential and aptitudes. This input for the evaluation of instructors' teaching skills should preferably be focused each time on a single area of teaching skills.
Student Evaluation of Teaching Effectiveness (SETE) has been criticized on several grounds. [6] Traditionally, it is regarded as sensitive. The controversy begins with questioning the validity of students' evaluation of their professors' teaching skills. [7][8][9] Teaching in universities is a complex and multi-dimensional task. [10] Another potential bias attributed to SETE is that, among other factors, it might induce leniency in the grades assigned to students. [11,12]

Aim
The primary aim of this study was to develop a valid and reliable instrument for students' evaluation of the teaching skills of individual instructors. A secondary aim was to provide a potential tool with which to benchmark teaching skills among different institutional settings. This paper reports initial results on the teaching skills of clinical nursing instructors.

Study population
The study was carried out in the College of Nursing, UoD, in the 2008/09 academic year. The focus of the study was students' evaluation of each instructor's teaching skills in clinical nursing courses. Students were assembled in their respective classes and the questionnaires were distributed to them. They were given sufficient time to respond to the questionnaire without prompting. Each group was supervised by an independent faculty member (i.e. one who was not being evaluated in that session). Throughout the study, care was taken to protect the anonymity of the evaluators (i.e. the students), but not of the evaluated (i.e. the instructors).

The questionnaire
The original questionnaire was the "Monash Questionnaire Series on Teaching (MonQueST) - Clinical Nursing". [13] It consists of four areas, 26 items and seven response options. The response options were: (1) All or almost all, (2) Most, (3) About half, (4) Only some and (5) Very few, as well as (6) Entirely inappropriate and (7) Attended too few.
In the modification by UoD, the four areas and seven response options were retained, but the items were reduced from 26 to 20 [Table 1]. Response options 6 and 7 were put in a separate category because all students in the study were full-time, and their attendance at clinical instruction was mandatory. Accordingly, statistical analysis of the modified MonQueST was based on a 5-point scale relating to the first five response options. Outcome measures were factor analysis and Cronbach's alpha coefficient.
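The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run in SPSS: the names `recode_response` and `percent_missing` are hypothetical, and treating options 6 and 7 (and blank answers) as missing on the 5-point scale is an assumption about how the "separate category" was handled.

```python
from typing import List, Optional

# Options 1-5 form the rating scale used in the analysis.
SCALE_OPTIONS = {1, 2, 3, 4, 5}
# Options 6 ("Entirely inappropriate") and 7 ("Attended too few") are set
# aside; ASSUMPTION: they are counted as missing on the 5-point scale.
SEPARATE_OPTIONS = {6, 7}


def recode_response(code: Optional[int]) -> Optional[int]:
    """Return the 5-point score, or None if the response is not scored."""
    if code is None:  # blank answer
        return None
    if code in SCALE_OPTIONS:
        return code
    if code in SEPARATE_OPTIONS:
        return None
    raise ValueError(f"unexpected response code: {code}")


def percent_missing(codes: List[Optional[int]]) -> float:
    """Percentage of responses that carry no 5-point score."""
    if not codes:
        raise ValueError("no responses given")
    scored = [recode_response(c) for c in codes]
    return 100.0 * sum(s is None for s in scored) / len(scored)
```

Applied to one item's responses, `percent_missing` gives the kind of missing-data percentage reported in the Results.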

Statistical analysis
Data entry and analyses were performed using SPSS version 13. Factor analysis was performed to assess how well the items related to the constructs they were intended to measure. In this first step, the inter-item correlation was explored. This created a correlation matrix of all items. Eigenvalues and the amount of variance explained were calculated for each item and for the different modules in the study.
At this stage, the risk of "singularity" had to be borne in mind (i.e. items that are almost perfectly correlated, with r > 0.9). Therefore, two sub-types of items were identified: (a) those that failed to correlate with others, and (b) those which demonstrated singularity. This was a pre-requisite for the second step (i.e. the reliability test), since such items, if any, had to be excluded. A check for the normal distribution of the scores was also done.
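The screening step above can be illustrated with a short Python sketch. It is not the SPSS procedure used in the study. The function names are hypothetical, the 0.9 cut-off for singularity comes from the text, and the 0.3 cut-off for "failed to correlate" is an assumption, since the paper does not state one.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant item carries no correlation information
    return sxy / sqrt(sxx * syy)


def screen_items(items, low=0.3, high=0.9):
    """Flag problem items before the reliability test.

    items: dict mapping item name -> list of scores (complete cases only).
    low:   ASSUMED cut-off below which an item "fails to correlate".
    high:  singularity cut-off (r > 0.9, as in the text).

    Returns (uncorrelated, singular_pairs).
    """
    names = list(items)
    # Correlation for every unordered pair of items.
    r = {(a, b): pearson(items[a], items[b])
         for i, a in enumerate(names) for b in names[i + 1:]}
    singular_pairs = [pair for pair, v in r.items() if abs(v) > high]
    uncorrelated = [
        name for name in names
        if all(abs(v) < low for pair, v in r.items() if name in pair)
    ]
    return uncorrelated, singular_pairs
```

Items returned in either list would be candidates for exclusion before reliability is estimated.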
An internal consistency reliability test (test-retest measure of reliability) was then performed by administering the same instrument to the same group of students for different instructors for each course. The internal reliability estimates were calculated using Cronbach's alpha coefficient. [14] This coefficient provides a conservative estimate of reliability and generally represents the lower bound of the reliability of a scale. A Cronbach's alpha coefficient greater than or equal to 0.70 was taken as the criterion for acceptable reliability of the scale. [15]
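For readers unfamiliar with the coefficient, the following minimal Python sketch computes Cronbach's alpha from the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), where k is the number of items. It is an illustration only: the study used SPSS, the function names are hypothetical, and restricting the calculation to complete cases is an assumption.

```python
from statistics import variance

ACCEPTABLE_ALPHA = 0.70  # criterion stated in the text


def cronbach_alpha(rows):
    """Cronbach's alpha for a set of items.

    rows: one list of item scores per student. ASSUMPTION: complete
    cases only, i.e. every student has a score on every item.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across students.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each student's total score.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def is_acceptable(alpha, cutoff=ACCEPTABLE_ALPHA):
    """True if alpha meets the >= 0.70 criterion."""
    return alpha >= cutoff
```

In the study's design, `rows` would hold the item scores of one area of the questionnaire for one module, with the individual student as the unit of analysis.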

RESULTS
At present, all the students and staff of the Nursing College are female. Seven courses from the Nursing Program were studied, namely: Fundamentals of Nursing, Medical Nursing, Surgical Nursing, Psychiatric and Mental Health Nursing, Obstetrics and Gynecologic Nursing, Pediatric Nursing, and Family and Community Health Nursing. There was one course from Level II and three each from Levels III and IV.
The total number of students was 74; on the 5-point scale, missing data ranged from 5% to 27%.

Factor analysis
All 20 items of the questionnaire were entered in a factor analysis for each module, with a minimum eigenvalue of 1 for factor extraction and a minimum item-to-factor loading of 0.4. The procedure generated four areas in which all 20 items were included. The explained variance ranged from 66.9% to 78.7%, depending on the module, except for "Fundamentals of Nursing". In this module (sample size = 74), inter-item correlations failed to emerge in 23% of paired items, and the explained variance was less than 54%. As a result, this module had to be excluded from further analysis. [16]
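The extraction rule (retain factors with an eigenvalue of at least 1) and the explained variance figure can be illustrated with the sketch below. It rests on assumptions: the paper does not state the extraction method, so the sketch assumes a principal-components-style extraction from the item correlation matrix, and it uses a plain Jacobi rotation routine in place of SPSS. The function names are hypothetical.

```python
from math import sqrt


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:  # off-diagonal part is negligible: converged
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + sqrt(theta * theta + 1.0))
                c = 1.0 / sqrt(t * t + 1.0)
                s = t * c
                # Rotate columns p and q, then rows p and q.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def extraction_summary(corr, min_eigenvalue=1.0):
    """Number of retained factors and percentage of variance explained.

    corr: item correlation matrix (its trace equals the number of items,
    so each eigenvalue's share of variance is eigenvalue / n_items).
    """
    eigenvalues = symmetric_eigenvalues(corr)
    retained = [v for v in eigenvalues if v >= min_eigenvalue]
    explained = 100.0 * sum(retained) / len(corr)
    return len(retained), explained
```

With a 20-item correlation matrix for one module, `extraction_summary` would return the number of factors retained and a percentage comparable in kind to the explained variance reported here.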

Reliability
The internal consistency reliability was tested by Cronbach's α coefficient for each of the four areas in each of the six modules, with the individual student as the unit of analysis. The observed α coefficients ranged from 0.78 to 0.93, indicating an exceptionally high reliability. By convention, a lenient cut-off of 0.60 is common in exploratory research; alpha should be at least 0.70 to retain an item in an "adequate" scale. Many researchers require a cut-off of 0.80 for a "good scale." [15]

DISCUSSION
All student evaluations are based on the hypothesis that students are the best experts to assess their teachers. [17,18] Nevertheless, Student Evaluation of Teaching Effectiveness (SETE) is controversial. [7][8][9][10][11][12][19][20][21][22][23][24] With the advent of NCAAA, institutions seeking academic accreditation in KSA will be required to apply SETE in the medium term. Writing from King Faisal University of Petroleum and Minerals in Dhahran, KSA, Siddiqi (2002) observed: "Proper questionnaire design has been cited as one of the key factors in the qualitative outcome of the exercise." [18] Questionnaires seeking students' opinion should be reliable, valid and consistent, but also concise and adequate [Tables 2 and 3]. This is especially so if the area studied is traditionally regarded as sensitive, such as students' evaluation of their individual professors' teaching skills. The exclusion of six items was informed by a logical and pragmatic approach. This demanded that all the key components in the original questionnaire be retained. Furthermore, the remaining 20 items, which covered major aspects of teaching Clinical Nursing, were more simply and clearly phrased for the students.
Hence, it was gratifying to note that the reduction of the items from 26 in the original instrument to 20 in the present version did not result in a significant reduction in the reliability, validity or consistency of the instrument. It rendered the modified version more concise and more suitable for use in our local socio-cultural setting. It was therefore fit for the intended purpose: that of readily providing valid, objective data.
Another issue for discussion is the minimum number of students required for an assessment of teaching to be valid. [25] In a recent publication, Chenot, Kochen and Himmel used a cut-off point of five students. [26] Thus, the number of students in this study was considered adequate, especially for a pilot study.
The modified MonQueST demonstrated another useful attribute: the exclusion of one module, "Fundamentals of Nursing", as a result of statistical scrutiny. This outcome was subsequently validated by the Course Supervisor who pointed out that in actual delivery,