      Is Open Access

      Small samples, unreasonable generalizations, and outliers: Gender bias in student evaluation of teaching or three unhappy students?

      research-article
        Uttl B 1,*, Violo VC 1
      ScienceOpen Research
      ScienceOpen
      student evaluation of teaching, SET, small samples, outliers, generalization

            Abstract

            In a widely cited and widely discussed study, MacNell et al. (2015) [1] examined SET ratings of one female and one male instructor, each teaching two sections of the same online course, one section under their true gender and the other section under a false/opposite gender. MacNell et al. concluded that students rated perceived female instructors more harshly than perceived male instructors, demonstrating gender bias against perceived female instructors. Boring, Ottoboni, and Stark (2016) [2] re-analyzed MacNell et al.’s data and confirmed their conclusions. However, the design of MacNell et al.’s study is fundamentally flawed. First, MacNell et al.’s section sample sizes were extremely small, ranging from 8 to 12 students. Second, MacNell et al. included only one female and one male instructor. Third, MacNell et al.’s findings depend on three outliers – three unhappy students (all in perceived female conditions) who gave their instructors the lowest possible ratings on all or nearly all SET items. We re-analyzed MacNell et al.’s data with and without the three outliers. Our analyses showed that, with the outliers removed, the gender bias against perceived female instructors disappeared. Instead, students rated the actual female instructor higher than the actual male instructor, regardless of perceived gender. MacNell et al.’s study is a real-life demonstration that conclusions based on studies with extremely small samples are unwarranted and uninterpretable.

            Main article text

            INTRODUCTION

            In an article entitled “What’s in a name: Exposing gender bias in student ratings of teaching”, MacNell, Driscoll, and Hunt [1] examined whether students are biased against female faculty when completing student evaluation of teaching (SET) questionnaires. MacNell et al. examined SET ratings of one female and one male instructor teaching an online course under two conditions: when students were either truthfully told the gender of each instructor (True Gender condition) or when students were misled about their instructors’ genders and told that each instructor’s gender was in fact the opposite of what it was (False Gender condition). Accordingly, students evaluated a single identical female instructor under either perceived female/actual female (pF/aF) or under perceived male/actual female (pM/aF) conditions, and evaluated a single identical male instructor under either perceived female/actual male (pF/aM) or under perceived male/actual male (pM/aM) conditions. In each condition, the male and female instructors were evaluated by 8 to 12 students only. MacNell et al. stated that both instructors interacted with their students exclusively online (allowing them to mislead students about their genders) through discussion boards and emails only; graded students’ work at the same time; used the same grading rubrics; and co-ordinated their grading to ensure that grading was equitable in all four sections.

            MacNell et al. [1] concluded that their study demonstrated gender bias in student ratings of teaching. They stated:

            “Our findings show that the bias we saw here is not [emphasis in original] a result of gendered behavior on the part of the instructor, but of actual bias on the part of the students. Regardless of actual gender or performance, students rated the perceived female instructor significantly more harshly than the perceived male instructor, which suggests that a female instructor would have to work harder than a male to receive comparable ratings....” (p. 301)

            A year later, MacNell et al.’s [1] data were re-analyzed by Boring, Ottoboni, and Stark [2] using non-parametric permutation tests rather than the parametric tests used by MacNell et al. Boring et al. similarly concluded that

            “The results suggests that students rate instructors more on the basis of the instructor’s perceived gender than on the basis of the instructor’s effectiveness. Students of the TA who is actually female did substantially better in the course, but students rated apparently male TAs higher.” (p. 9)

            Thus, two independent sets of three researchers analyzed MacNell et al.’s [1] data and both teams concluded that MacNell et al.’s data were strong evidence of gender bias. However, a detailed examination of MacNell et al.’s study suggests that MacNell et al.’s conclusions are unwarranted and uninterpretable. First, MacNell et al. found no statistically significant overall difference (using the Student Rating Index) between the perceived male and perceived female instructors (p = .128). Boring, Ottoboni, and Stark [2] confirmed the lack of a statistically significant gender difference in MacNell et al.’s study using a permutation test (p = .12; see their Table 8).

            Second, MacNell et al.’s [1] sample of students in each of the four conditions was extremely small, ranging from only 8 to 12 students. Results based on such small samples typically have low statistical power, inflated discovery rate, inflated effect size estimation, low replicability, low generalizability, and high sensitivity to outliers [3].

            Third, MacNell et al.’s [1] study included only one female and one male instructor. It is difficult to see how one could make valid generalizations about how students rate female vs. male instructors based on how students rate one particular male and one particular female instructor.

            Fourth, MacNell et al.’s [1] Table 2 as well as Figure 2 suggest that the variability of SET ratings is much larger in some conditions than in others, indicating the likely presence of outliers inflating variability in some but not other conditions. In fact, MacNell et al.’s data, shown in Table 1, include three obvious outliers – three unhappy students who gave their instructors the lowest possible ratings on all or nearly all SET items (a familiar scenario to anyone who has ever taught such small courses). The three outliers are printed in bold in Table 1. All three occurred in perceived female conditions.

            Table 1:

            MacNell et al. [1] data.

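            Columns: Group, SET items 1 to 15, sex, ag, pg. Outlier rows are marked with *.

            Group    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15  sex  ag  pg
            pM/aF    5  5  4  4  4  3  4  4  4  4  4  4  3  5  4    2   0   1
            pM/aF    4  4  4  4  5  5  5  5  3  4  5  5  5  5  4    1   0   1
            pM/aF    5  5  5  5  5  5  5  5  5  5  5  5  5  5  5    2   0   1
            pM/aF    5  5  5  5  5  3  5  5  5  5  3  5  5  5  5    2   0   1
            pM/aF    5  5  5  5  5  5  5  3  4  5  5  5  5  5  5    2   0   1
            pM/aF    4  4  4  4  4  4  3  4  3  3  5  5  3  3  4    1   0   1
            pM/aF    4  4  4  4  4  4  4  4  4  4  4  4  4  4  4    1   0   1
            pM/aF    5  5  5  4  5  4  5  5  5  5  5  5  5  5  5    1   0   1
            pM/aF    4  4  3  4  4  4  5  4  4  4  3  5  4  4  4    2   0   1
            pM/aF    4  4  3  3  3  3  3  4  2  4  3  3  3  3  3    1   0   1
            pM/aF    5  5  5  5  5  5  5  5  5  5  5  5  5  5  5    1   0   1
            pM/aF    5  5  4  4  3  3  4  4  3  4  3  4  4  4  4    1   0   1

            pM/aM    5  5  5  5  5  5  5  5  5  5  3  5  5  5  5    2   1   1
            pM/aM    5  5  5  5  5  5  5  5  5  5  5  5  5  4  5    2   1   1
            pM/aM    5  5  4  4  4  3  4  4  3  4  3  5  5  2  4    2   1   1
            pM/aM    5  5  5  4  4  5  4  4  4  5  3  4  4  2  4    2   1   1
            pM/aM    4  4  4  4  4  4  4  4  4  4  4  4  4  4  4    1   1   1
            pM/aM    5  4  3  4  5  2  2  5  5  4  3  5  5  2  3    1   1   1
            pM/aM    5  5  5  4  4  5  4  4  4  4  4  5  5  4  4    1   1   1
            pM/aM    4  5  4  4  3  4  4  4  4  4  4  4  4  4  4    1   1   1
            pM/aM    4  4  2  3  3  3  3  4  2  3  3  3  3  2  3    1   1   1
            pM/aM    5  5  4  4  4  3  4  5  4  4  3  5  4  4  4    2   1   1
            pM/aM    4  4  4  4  4  4  4  4  4  4  4  4  4  4  4    2   1   1

            pF/aF    1  1  1  1  1  1  1  1  1  1  1  1  1  1  1    1   0   0  *
            pF/aF    1  1  1  1  1  1  1  1  1  4  3  4  4  1  1    2   0   0  *
            pF/aF    5  5  5  5  5  5  5  5  5  5  5  5  5  5  5    1   0   0
            pF/aF    5  5  4  4  4  3  4  3  3  3  4  4  5  4  4    1   0   0
            pF/aF    5  5  3  4  3  2  4  5  4  3  4  4  4  4  4    2   0   0
            pF/aF    5  5  5  5  5  5  5  5  5  5  5  5  5  5  5    2   0   0
            pF/aF    4  4  5  5  4  3  4  5  5  3  3  4  4  3  4    1   0   0
            pF/aF    5  5  5  5  5  5  5  5  5  5  5  5  5  5  5    2   0   0

            pF/aM    4  4  4  4  4  4  4  4  4  4  3  4  4  4  4    2   1   0
            pF/aM    5  5  5  5  5  5  5  4  4  4  5  5  5  5  5    1   1   0
            pF/aM    5  5  3  4  5  4  5  4  5  5  3  4  5  5  5    2   1   0
            pF/aM    5  4  4  4  4  4  4  4  4  4  4  4  4  4  4    1   1   0
            pF/aM    4  4  4  4  4  4  5  4  4  4  4  4  4  4  4    2   1   0
            pF/aM    4  4  4  4  4  3  4  4  4  4  4  4  4  4  4    2   1   0
            pF/aM    3  3  3  3  3  3  3  3  3  3  3  3  3  3  3    1   1   0
            pF/aM    5  5  4  4  4  4  4  3  3  3  4  4  4  4  4    2   1   0
            pF/aM    1  1  2  1  1  3  1  1  1  1  1  1  1  1  1    2   1   0  *
            pF/aM    4  5  4  3  4  3  4  4  3  3  4  4  5  4  4    1   1   0
            pF/aM    5  5  4  3  4  4  2  2  2  2  4  4  4  1  4    2   1   0
            pF/aM    4  4  3  3  3  4  4  4  4  4  4  4  3  3  3    2   1   0

            Note. Group: pM/aF = perceived male/actual female, pM/aM = perceived male/actual male, pF/aF = perceived female/actual female, pF/aM = perceived female/actual male; sex: 1 = male student, 2 = female student; ag = actual gender: 0 = female, 1 = male; pg = perceived gender: 0 = female, 1 = male; SET Item: 1 = professional, 2 = respect, 3 = caring, 4 = enthusiastic, 5 = communicate, 6 = helpful, 7 = feedback, 8 = prompt, 9 = consistent, 10 = fair, 11 = responsive, 12 = praised, 13 = knowledgeable, 14 = clear, 15 = overall. Rows marked with * are the three outliers (printed in bold in the original table).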

            Table 2:

            Mean student ratings of teaching for each of the 12 items used by MacNell et al. [1]. The top third shows the data copied from MacNell et al.’s Table 2; the middle third shows our replication of MacNell et al.’s analyses; and the bottom third shows our replication of MacNell et al.’s analyses with the three outliers removed.

                             Actual Gender                                    Perceived Gender
            SET Item         aF M  aF SD   aM M  aM SD  diff.     r2      p   pF M  pF SD   pM M  pM SD  diff.     r2      p

            MacNell et al.’s analyses copied from their Table 2
            Caring           4.00  1.257   3.87  0.868   0.13   .004          3.65  1.226   4.17  0.834  -0.52   .071
            Consistent       3.80  1.322   3.70  1.020   0.10   .002          3.50  1.357   3.96  0.928  -0.47   .045
            Enthusiastic     4.05  1.191   3.78  0.850   0.27   .019          3.60  1.314   4.17  0.576  -0.57   .112
            Fair             4.05  1.050   3.78  0.951   0.27   .018          3.50  1.192   4.26  0.619  -0.76  .188*
            Feedback         4.10  1.252   3.83  1.029   0.27   .015          3.70  1.380   4.17  0.834  -0.47   .054
            Helpful          3.65  1.309   3.83  0.834  -0.18   .008          3.50  1.192   3.96  0.928  -0.46   .049
            Knowledgeable    4.20  1.056   4.09  0.949   0.11   .003          3.95  1.191   4.30  0.765  -0.35   .038
            Praise           4.35  0.988   4.09  0.900   0.26   .020          3.85  1.089   4.52  0.665  -0.67  .153*
            Professional     4.30  1.218   4.35  0.935  -0.05   .000          4.00  1.414   4.61  0.499  -0.61   .124
            Prompt           4.10  1.252   3.87  0.919   0.23   .013          3.55  1.356   4.35  0.573  -0.80  .191*
            Respectful       4.30  1.218   4.35  0.935  -0.05   .001          4.00  1.414   4.61  0.499  -0.61   .124
            Responsive       4.00  1.124   3.57  0.843   0.43   .052          3.65  1.137   3.87  0.869  -0.22   .013

            Replication of MacNell et al.’s analyses
            Caring           4.00  1.257   3.87  0.869   0.13   .004   .699   3.65  1.226   4.17  0.834  -0.52   .063   .116
            Consistent       3.80  1.322   3.70  1.020   0.10   .002   .776   3.50  1.357   3.96  0.928  -0.46   .040   .214
            Enthusiastic     4.05  1.191   3.78  0.850   0.27   .018   .409   3.60  1.314   4.17  0.576  -0.57   .081   .083
            Fair             4.05  1.050   3.78  0.951   0.27   .018   .390   3.50  1.192   4.26  0.619  -0.76   .149   .016
            Feedback         4.10  1.252   3.83  1.029   0.27   .015   .442   3.70  1.380   4.17  0.834  -0.47   .045   .191
            Helpful          3.65  1.309   3.83  0.834  -0.18   .007   .609   3.50  1.192   3.96  0.928  -0.46   .046   .174
            Knowledgeable    4.20  1.056   4.09  0.949   0.11   .003   .716   3.95  1.191   4.30  0.765  -0.35   .033   .262
            Praise           4.35  0.988   4.09  0.900   0.26   .020   .370   3.85  1.089   4.52  0.665  -0.67   .130   .023
            Professional     4.30  1.218   4.35  0.935  -0.05   .001   .887   4.00  1.414   4.61  0.499  -0.61   .084   .080
            Prompt           4.10  1.252   3.87  0.920   0.23   .012   .502   3.55  1.356   4.35  0.573  -0.80   .139   .022
            Respectful       4.30  1.218   4.35  0.935  -0.05   .001   .887   4.00  1.414   4.61  0.499  -0.61   .084   .080
            Responsive       4.00  1.124   3.57  0.843   0.43   .049   .165   3.65  1.137   3.87  0.869  -0.22   .012   .486

            Re-analysis of MacNell et al.’s analyses without outliers
            Caring           4.33  0.767   3.95  0.785   0.38   .058   .133   4.06  0.748   4.17  0.834  -0.11   .005   .650
            Consistent       4.11  0.963   3.82  0.853   0.29   .027   .321   3.94  0.899   3.96  0.928  -0.02   .000   .958
            Enthusiastic     4.39  0.608   3.91  0.610   0.48   .139   .018   4.06  0.748   4.17  0.576  -0.11   .008   .601
            Fair             4.22  0.808   3.91  0.750   0.31   .041   .216   3.76  0.903   4.26  0.619  -0.50   .101   .062
            Feedback         4.44  0.705   3.95  0.844   0.49   .092   .053   4.18  0.809   4.17  0.834   0.01   .000   .992
            Helpful          3.94  0.998   3.86  0.834   0.08   .002   .786   3.82  0.883   3.96  0.928  -0.14   .005   .648
            Knowledgeable    4.39  0.778   4.23  0.685   0.16   .013   .495   4.29  0.686   4.30  0.765  -0.01   .000   .965
            Praise           4.56  0.616   4.23  0.612   0.33   .069   .101   4.18  0.529   4.52  0.665  -0.34   .076   .075
            Professional     4.67  0.485   4.50  0.598   0.17   .023   .336   4.53  0.624   4.61  0.499  -0.08   .005   .669
            Prompt           4.44  0.705   4.00  0.690   0.44   .096   .053   4.00  0.866   4.35  0.573  -0.35   .058   .162
            Respectful       4.67  0.485   4.50  0.598   0.17   .023   .336   4.53  0.624   4.61  0.499  -0.08   .005   .669
            Responsive       4.22  0.878   3.68  0.646   0.54   .117   .038   4.00  0.707   3.87  0.869   0.13   .007   .604

            N                  23            20                                 20            23

            Note. † p < .10; * p < .05; pM/aF = perceived male/actual female, pM/aM = perceived male/actual male, pF/aF = perceived female/actual female, pF/aM = perceived female/actual male.

            Accordingly, we examined the effect of the three outliers – three unhappy students – on MacNell et al.’s [1] findings and conclusions. Specifically, we re-analyzed MacNell et al.’s data and attempted to replicate the summaries in MacNell et al.’s Table 2 and Figure 1 under two scenarios: (1) with the three outliers kept in the analyses and (2) with the three outliers removed from the data set.

            Figure 1.

            MacNell et al.’s [1] data. Panel A shows the boxplot of SET ratings – the mean of the 12 items used by MacNell et al. The boxplot highlights the presence of three outliers – three students giving their instructors the lowest possible rating on all or nearly all SET items. Panel B shows the same data but for the mean of all 15 items. The same three outliers are visible. Panel C shows the near identity relationship between the average of 12 items and the average of 15 items (r = .998). Panel D shows the strip chart of the 12-item means for each of the four experimental conditions and highlights the extremely small number of students in each condition. It also shows that the three outliers occurred in the two perceived female conditions.

            Figure 2.

            SET ratings for 12-item averages. Panel A shows the SET ratings for the actual male vs. female instructor and for the perceived male vs. female instructor for all data. Panel B shows the SET ratings by the four experimental conditions for all data. The instructor perceived as male received higher ratings than the instructor perceived as female. Panel C shows the SET ratings for the actual male vs. female instructor and for the perceived male vs. female instructor when the three outliers are removed. Panel D shows the SET ratings by the four experimental conditions when the three outliers are removed. The actual female instructor received higher ratings than the actual male instructor, regardless of their perceived gender.

            METHOD

            We downloaded MacNell et al.’s [1] data from http://n2t.net/ark:/b6078/d1mw2k, via the link provided in Boring, Ottoboni, and Stark [2]. We formally examined MacNell et al.’s data for outliers using Tukey’s rule, which identifies outliers as values more than 1.5 interquartile ranges below the first quartile or above the third quartile, and then re-analyzed MacNell et al.’s data with and without the three outliers plainly visible in Table 1.
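Tukey's rule can be illustrated with a minimal Python sketch applied to the 12-item means computed from the ratings in Table 1. The quartile definition used here (`method="inclusive"` in the standard library) is an assumption, since the article does not say which software or quartile convention was used, and the names `RATINGS`, `twelve_item_mean`, and `tukey_fences` are hypothetical helpers, not the authors' code.

```python
from statistics import quantiles

# Ratings on SET items 1-15, one 15-digit string per student (from Table 1).
RATINGS = {
    "pM/aF": [
        "554443444444354", "444455553455554", "555555555555555",
        "555553555535555", "555555534555555", "444444343355334",
        "444444444444444", "555454555555555", "443444544435444",
        "443333342433333", "555555555555555", "554433443434444",
    ],
    "pM/aM": [
        "555555555535555", "555555555555545", "554443443435524",
        "555445444534424", "444444444444444", "543452255435523",
        "555445444445544", "454434444444444", "442333342333323",
        "554443454435444", "444444444444444",
    ],
    "pF/aF": [
        "111111111111111", "111111111434411", "555555555555555",
        "554443433344544", "553432454344444", "555555555555555",
        "445543455334434", "555555555555555",
    ],
    "pF/aM": [
        "444444444434444", "555555544455555", "553454545534555",
        "544444444444444", "444444544444444", "444443444444444",
        "333333333333333", "554444433344444", "112113111111111",
        "454343443344544", "554344222244414", "443334444444333",
    ],
}

# Items dropped by MacNell et al.: communicate (5), clear (14), overall (15).
EXCLUDED_ITEMS = {5, 14, 15}


def twelve_item_mean(row):
    """Mean of the 12 retained items for one student."""
    kept = [int(ch) for i, ch in enumerate(row, start=1)
            if i not in EXCLUDED_ITEMS]
    return sum(kept) / len(kept)


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # assumed convention
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


scores = [(group, twelve_item_mean(row))
          for group, rows in RATINGS.items() for row in rows]
low, high = tukey_fences([mean for _, mean in scores])
outliers = [(group, round(mean, 2)) for group, mean in scores
            if mean < low or mean > high]
print(outliers)
```

Under these assumptions the sketch flags three students, all in the two perceived female conditions, which is consistent with the three outliers described in the text.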

            Based on preliminary principal component factor analysis of their data, MacNell et al. [1] used only 12 of 15 SET items in their analyses – they excluded communicate (item 5), clear (item 14), and overall (item 15) SET items. Given the hazardous nature of conducting a principal component factor analysis on 15 variables with only 43 participants and three outliers, we used the same 12 items identified by MacNell et al. but we also examined how the mean of these 12 items correlates with the mean of all 15 items.

            Specifically, we attempted to replicate MacNell et al.’s [1] summaries in Table 2 and Figure 1 and to see how these summaries would change when the three outliers were removed. Notably, neither MacNell et al. nor Boring et al. [2] mentioned outliers in their analyses of MacNell et al.’s data.

            RESULTS

            Figure 1, Panel A, shows the boxplot of SET ratings – the mean of the 12 items used by MacNell et al. [1]. The boxplot shows the three outliers – three students giving their instructors the lowest possible ratings on all or nearly all items. Similarly, Panel B shows the same data but for the mean of all 15 items. The same three outliers are identified in this boxplot. Panel C shows the near identity relationship between the average of 12 items and the average of 15 items, with the correlation r = .998. This suggests that MacNell et al. would have obtained nearly identical results if they had used all SET items rather than selecting only 12. Panel D shows the strip chart of the 12-item means for each of the four experimental conditions: pF/aF, pM/aM, pM/aF, and pF/aM. The strip chart shows that the three outliers occurred in the two perceived female conditions (i.e., pF/aF and pF/aM) and highlights the extremely small number of students in each of the four conditions, with ns ranging from 8 to 12 students.

            Table 2 shows the mean student ratings for each of the 12 SET items used by MacNell et al. [1]. The top third shows the means, standard deviations, and other statistics for the 12 SET items comparing the male instructor with the female instructor and comparing the perceived male and perceived female instructors as reported by MacNell et al. in their Table 2. MacNell et al. did not report actual p-values but only whether any given p-value was < .10 or < .05.

            Table 2, the middle third, shows our re-analysis of MacNell et al.’s [1] data with outliers not removed. Accordingly, the values in the middle third ought to be identical to those reported by MacNell et al. and shown in the top third of the table. The values are indeed identical to those in MacNell et al. (we treat differences in the last significant digit as identical), with two notable exceptions, both in the r2 columns. For the comparison of the actual male and female instructors, the r2 values match except for the last value in the column. For the comparison of the perceived male and perceived female instructors, the r2 values do not match except for the last value in the column. However, statistically significant differences between male and female, using the p < .05 standard, occurred only for the perceived instructor conditions and only for the fair, praise, and prompt SET items, replicating MacNell et al.’s inferential statistics conclusions.
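The article does not state how the r2 values were computed. As a hedged illustration only, one common effect size for a two-group comparison is r2 = t^2 / (t^2 + df) from a pooled-variance t test. The sketch below applies that assumed formula to the rounded summary statistics for the "Fair" item in the perceived gender comparison; the group sizes of 20 (perceived female) and 23 (perceived male) are counted from Table 1, and the function name is hypothetical.

```python
from math import sqrt


def pooled_t_and_r2(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance t statistic, its df, and r2 = t^2 / (t^2 + df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    r2 = t ** 2 / (t ** 2 + df)
    return t, df, r2


# "Fair" item, perceived female vs. perceived male (rounded values, Table 2).
t, df, r2 = pooled_t_and_r2(3.50, 1.192, 20, 4.26, 0.619, 23)
print(f"t({df}) = {t:.2f}, r2 = {r2:.3f}")
```

With these rounded inputs the assumed formula gives an r2 of roughly .149, close to the value in the replication panel rather than the .188 copied from MacNell et al.'s table. This agreement is suggestive only and does not establish which formula either set of authors used.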

            Table 2, the bottom third, shows the identical analyses with the three outliers removed. As expected, the values change considerably except in the perceived male conditions, as these did not include any outliers. First, in the actual gender conditions, the female instructor was rated higher than the male instructor on all 12 items, with the female instructor rated 0.08 to 0.54 points higher than the male instructor. For only two items were these differences statistically significant at p < .05. Second, in the perceived gender conditions, the female and male instructors were rated comparably, with no difference statistically significant at the p < .05 level. Accordingly, these item-level analyses showed that when the three outliers were removed, the SET effects favouring males vs. females reported by MacNell et al. [1] were wiped out and some SET effects favouring females vs. males emerged instead.
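The item-level significance pattern described above can be checked directly against the p columns of Table 2. The following sketch simply filters the tabulated p-values at .05; the variable and function names are illustrative.

```python
ALPHA = 0.05

ITEMS = ["Caring", "Consistent", "Enthusiastic", "Fair", "Feedback",
         "Helpful", "Knowledgeable", "Praise", "Professional", "Prompt",
         "Respectful", "Responsive"]

# p-values copied from Table 2 (replication and outlier-free panels).
P_PERCEIVED_ALL_DATA = [.116, .214, .083, .016, .191, .174,
                        .262, .023, .080, .022, .080, .486]
P_ACTUAL_NO_OUTLIERS = [.133, .321, .018, .216, .053, .786,
                        .495, .101, .336, .053, .336, .038]
P_PERCEIVED_NO_OUTLIERS = [.650, .958, .601, .062, .992, .648,
                           .965, .075, .669, .162, .669, .604]


def significant_items(p_values, alpha=ALPHA):
    """Names of the SET items whose p-value falls below alpha."""
    return [item for item, p in zip(ITEMS, p_values) if p < alpha]


print(significant_items(P_PERCEIVED_ALL_DATA))     # perceived gender, all data
print(significant_items(P_ACTUAL_NO_OUTLIERS))     # actual gender, outliers removed
print(significant_items(P_PERCEIVED_NO_OUTLIERS))  # perceived gender, outliers removed
```

With all data, three perceived gender items fall below .05 (Fair, Praise, Prompt). With the outliers removed, no perceived gender item does, while two actual gender items do (Enthusiastic, Responsive).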

            Figure 2 shows the mean SET ratings for the 12 items. Panel A shows the SET ratings for the actual male vs. female instructor and for the perceived male vs. female instructor for all data. The Actual Gender bars show the data for the actual male and actual female instructor with data collapsed across True and False Gender conditions. The Perceived Gender bars show the data for the perceived male and the perceived female instructor with the data collapsed across actual gender. This panel highlights that students rated the actual female instructor numerically higher than the actual male instructor. In contrast, when the data were collapsed across Actual Gender conditions, the students rated the perceived male instructor higher than the perceived female instructor. Panel A directly replicates MacNell et al.’s [1] analyses reported in their Figure 2.

            Figure 2, Panel B, shows SET ratings by the four experimental conditions (i.e., with no collapsing across conditions). This figure highlights that in the True Gender conditions, the male instructor was rated higher than the female instructor. In the False Gender conditions, the students rated the same female instructor who was presented as male higher than the same male instructor who was presented as female. Thus, this data pattern supports MacNell et al.’s [1] claim that it is the perception of the instructor as male vs. female that matters rather than what male vs. female instructors actually did.

            However, when the three outliers are removed, the findings change. Panel C shows the identical analyses to those in Panel A but with the three outliers removed. The Actual Gender condition shows that the female instructor is rated higher than the male instructor whereas the Perceived Gender condition shows that the differences between perceived female and male instructors all but disappeared. Panel D shows the identical analyses to those in Panel B but with the three outliers removed. The data show that the female instructor was rated higher than the male instructor in both the True Gender and False Gender conditions.

            CONCLUSIONS

            MacNell et al. [1] claimed that their findings demonstrated that students were actually biased against female vs. male instructors rather than merely being in favor of female gendered behavior. Boring, Ottoboni, and Stark [2] re-analyzed MacNell et al.’s data, confirmed MacNell et al.’s findings, and concluded that students (1) rated instructors on the basis of gender rather than teaching effectiveness, and (2) rated male teachers better than female teachers even though they learned more from female teachers. However, in reality, neither MacNell et al. nor Boring et al. found the gender difference in overall SET in MacNell et al.’s data statistically significant (p = .128 and p = .12, respectively).

            Our re-analyses of MacNell et al.’s [1] small study demonstrate that MacNell et al.’s data do not support either MacNell et al.’s or Boring et al.’s [2] conclusions. When three outliers – three unhappy students – are removed from the data set, the data change drastically and do not support MacNell et al.’s conclusions. If the results of such small-sample studies of one female and one male instructor were interpretable and generalizable to all female and male instructors – and we argue that they are not, with or without outliers, and regardless of what they show – MacNell et al.’s data actually suggest that students rate male instructors lower than female instructors regardless of what they are told about their genders.

            Importantly, MacNell et al.’s [1] published data highlight nothing short of the absurd practice of interpreting the mean SET ratings from a small number of students as having anything to do with the instructor. The same instructor (actual female) who received a mean SET rating of 4.31 in one section (pM/aF) received widely discrepant ratings of 3.73 or 4.49 in the other section (pF/aF), depending on whether two outliers – two unhappy students – were retained or excluded from the means, respectively. The data also highlight that, under such a practice, professors ought to focus principally on students’ satisfaction and ought not to do anything to lower it, for example, ought not to call students on academic dishonesty, adhere to academic standards, etc. Moreover, given the Dunning-Kruger effect [4] and the SET-destroying effect of one or two outliers in small classes, professors must focus principally on satisfying the least able students, who would perceive the greatest discrepancy between the grades reflecting their achievement and the grades they believe their work deserves, if their grades were not inflated [5].
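The sensitivity of a small section's mean to two students can be reproduced from the pF/aF rows of Table 1. The sketch below is a minimal illustration with hypothetical helper names; it averages the 12 retained items per student and then averages across students, with and without the two outlier rows.

```python
# pF/aF section: ratings on SET items 1-15, one string per student (Table 1).
# The first two rows are the two outliers in this section.
PF_AF_ROWS = [
    "111111111111111", "111111111434411", "555555555555555",
    "554443433344544", "553432454344444", "555555555555555",
    "445543455334434", "555555555555555",
]

# Items dropped by MacNell et al.: communicate (5), clear (14), overall (15).
EXCLUDED_ITEMS = {5, 14, 15}


def twelve_item_mean(row):
    """Mean of the 12 retained items for one student."""
    kept = [int(ch) for i, ch in enumerate(row, start=1)
            if i not in EXCLUDED_ITEMS]
    return sum(kept) / len(kept)


def section_mean(rows):
    """Mean of the students' 12-item means for one section."""
    return sum(twelve_item_mean(row) for row in rows) / len(rows)


with_outliers = section_mean(PF_AF_ROWS)         # all 8 students
without_outliers = section_mean(PF_AF_ROWS[2:])  # 6 students, outliers dropped
print(round(with_outliers, 2), round(without_outliers, 2))  # 3.73 4.49
```

Dropping two of eight students moves the section mean by about three quarters of a scale point, which is the discrepancy discussed above.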

            MacNell et al.’s [1] findings and conclusions received widespread news and social media coverage and hundreds of citations. As of March 3, 2020, MacNell et al.’s Altmetric score was 697, indicating that the article was in the 99th percentile – the top 1% of all research tracked by Altmetric. MacNell et al. has been cited 153 times within the Web of Science and 408 times within Google Scholar. We examined all of the 153 Web of Science citations to determine if the citing researchers noted MacNell et al.’s small sample sizes, unreasonable generalizations from one male and one female instructor, and/or outliers. No citing article noted outliers. No citing article noted unreasonable generalization. And only one article noted small sample sizes. All citations cited MacNell et al. for evidence of gender bias against female instructors. Similarly, Boring, Ottoboni, and Stark’s [2] re-analysis of MacNell et al. received widespread attention, with an Altmetric score of 525 and 243 citations on Google Scholar. We searched Google Scholar for “boring ottoboni stark outlier macnell” using full text search in an attempt to identify any article indexed by Google Scholar noting outlier effects in the MacNell et al. study. Google Scholar returned 18 results and none of them mentioned outliers in MacNell et al.’s study.

            MacNell et al.’s [1] findings of no statistically significant gender differences in overall SET ratings were recently replicated in a similarly fatally flawed study by Khazan, Borden, Johnson, and Greenhaw [6]. Khazan et al. examined SET ratings of a single female TA who taught two sections of the same online course, one section under her true gender (perceived female TA) and one section under a false/opposite gender (perceived male TA). Just as MacNell et al. did, Khazan et al. found no gender differences in overall SET ratings of the perceived female vs. male TA (p = .73) but claimed that they found gender bias against the perceived female TA nevertheless [7]. Moreover, Khazan et al. suffers from a nearly identical set of fatal flaws that render their study uninterpretable and conclusions unwarranted, including small samples, low statistical power, outliers, confounds, and use of a single female exemplar design [7].

            MacNell et al.’s [1] study is a real-life demonstration that conclusions based on small-sample studies are unwarranted and uninterpretable. MacNell et al.’s study design, including extremely small samples and the use of only a single woman and a single man to represent female and male professors, is simply insufficient to answer their research question. Combined with the small samples, the failure to examine the data and to recognize that the summaries of the data depend critically on three outliers, three unhappy students, was only the last fatal flaw rendering the study 100% uninterpretable and its conclusions unwarranted. In the meantime, however, the world, or at least hundreds of researchers citing MacNell et al. and Boring, Ottoboni, and Stark [2], falsely believes that MacNell et al.’s study demonstrated that students are biased against female professors. It is not true; MacNell et al. did not demonstrate students’ bias against female professors. If anything, their results suggest that students rate female professors higher than male professors, but it would be foolish to make that claim based on the fundamentally flawed small sample design.

            ACKNOWLEDGEMENTS

            We thank Amy Siegenthaler for careful reading and comments on the manuscript.

            Footnotes

            COMPETING INTERESTS: The authors declare no competing interests.

            REFERENCES

            1. MacNell L, Driscoll A, Hunt A. What’s in a name: exposing gender bias in student ratings of teaching. Innovative Higher Education. 2015. Vol. 40(4):291–303.

            2. Boring A, Ottoboni K, Stark P. Student evaluations of teaching (mostly) do not measure teaching effectiveness. ScienceOpen Research. 2016. 1–11.

            3. Ioannidis JPA. Why most published research findings are false. PLOS Medicine. 2005. Vol. 2(8):e124.

            4. Kruger J, Dunning D. Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology. 1999. Vol. 77(6):1121–1134.

            5. Uttl B, White CA, Gonzalez DW. Meta-analysis of faculty’s teaching effectiveness: student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation. 2017. Vol. 54:22–42.

            6. Khazan E, Borden J, Johnson S, Greenhaw L. Examining gender bias in student evaluation of teaching for graduate teaching assistants. NACTA Journal. 2020. Vol. 64:430–435.

            7. Uttl B, Violo V. Gender bias in student evaluation of teaching or a mirage. ScienceOpen Preprints. 2020. 1–20.

            Author and article information

            Journal: ScienceOpen Research
            Publisher: ScienceOpen
            ISSN: 2199-1006
            Published: 22 January 2021
            Article number: e20210001
            Affiliations
            [1] 1Psychology Department, Mount Royal University, 4825 Mount Royal Gate SW, Calgary, Alberta, Canada, T3E 6K6
            Author notes
            *Corresponding author’s e-mail address: uttlbob@gmail.com
            Article
            S2199-1006.1.SOR.2021.0001.v1
            10.14293/S2199-1006.1.SOR.2021.0001.v1
            b1353421-2e05-4a79-9c6f-4ac5a44dcc03
            Copyright © 2021 Uttl B and Violo VC.

            This work has been published open access under Creative Commons Attribution License CC BY 4.0 https://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com.

            History
            Page count
            Figures: 2, Tables: 2, References: 7, Pages: 7

