Abstract
We developed a criterion-referenced student rating of instruction (SRI) to facilitate formative assessment of teaching. It involves four dimensions of teaching quality that are grounded in current instructional design principles: Organization and structure, Assessment and feedback, Personal interactions, and Academic rigor. Using item response theory and Wright mapping methods, we describe teaching characteristics at various points along the latent continuum for each scale. These maps enable criterion-referenced score interpretation by making an explicit connection between test performance and the theoretical framework. We explain how our Wright maps can enhance an instructor’s ability to interpret scores and identify ways to refine teaching. Although our work is aimed at improving score interpretation, a criterion-referenced test is not immune to factors that may bias test scores; the SRI literature is filled with research on factors unrelated to teaching that may bias scores. Therefore, we also used multilevel models to evaluate the extent to which student and course characteristics may affect scores and compromise score interpretation. Results indicated that student anger and the interaction between student gender and instructor gender are significant effects that account for a small amount of variance in SRI scores. All things considered, our criterion-referenced approach to SRIs is a viable way to describe teaching quality, help instructors refine pedagogy, and facilitate course development.
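The multilevel analysis described above, with students nested in courses and fixed effects for student-level predictors, can be sketched as follows. This is a minimal illustration on simulated data, not the authors' actual model or dataset: the variable names (`anger`, `stu_female`, `inst_female`) and all effect sizes are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate SRI scores for students nested within courses.
rng = np.random.default_rng(0)
n_courses, n_students = 40, 25
rows = []
for c in range(n_courses):
    course_effect = rng.normal(0, 0.5)   # random course intercept
    inst_female = c % 2                  # instructor gender (course-level)
    for _ in range(n_students):
        anger = rng.normal(0, 1)         # student-level predictor
        stu_female = rng.integers(0, 2)  # student gender
        score = (3.5 + course_effect
                 - 0.15 * anger
                 + 0.10 * stu_female * inst_female
                 + rng.normal(0, 0.8))
        rows.append(dict(course=c, score=score, anger=anger,
                         stu_female=stu_female, inst_female=inst_female))
df = pd.DataFrame(rows)

# Multilevel model: random intercept for course, fixed effects for
# student anger and the student-by-instructor gender interaction.
model = smf.mixedlm("score ~ anger + stu_female * inst_female",
                    data=df, groups="course")
fit = model.fit()
print(fit.summary())
```

A small but statistically reliable coefficient in such a model (e.g., on `anger` or the gender interaction) is consistent with the paper's finding that these effects are significant yet explain only a small share of score variance.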
Notes
The complete measure is available upon request. For brevity, we did not include it in this paper.
An unpublished manuscript about the original study is available upon request.
A Wright map is also referred to as an item map.
Acknowledgments
We thank Emily Bowling, Fares Karam, Bo Odom, and Laura Tortorelli for their work on the original version of this measure. They developed the original teaching framework and wrote the initial pool of items as part of a course project.
Meyer, J.P., Doromal, J.B., Wei, X. et al. A Criterion-Referenced Approach to Student Ratings of Instruction. Res High Educ 58, 545–567 (2017). https://doi.org/10.1007/s11162-016-9437-8