Content originally published on Learn more about Interfolio’s acquisition of Data180 here.

Although student evaluations have become a nearly universal component in measuring faculty performance, a new paper suggests that they are statistically unreliable and that many institutions rely too heavily on them in the faculty review process.

The paper, “An Evaluation of Course Evaluations,” was written by Philip Stark, professor of statistics at the University of California, Berkeley, and Richard Freishtat, senior consultant at Berkeley’s Center for Teaching and Learning. Stark and Freishtat use literature from the fields of statistics and education to argue that student evaluations are often unfairly biased and do not provide objective data that can be reliably averaged into a single numerical score.

Stark and Freishtat say that response rates to student evaluations vary significantly, as do the marks that a single professor receives from his or her various students, which suggests that students bring their own biases into the evaluation process. Because of this, they say, averaging the scores from these evaluations produces metrics that are statistically suspect and potentially misleading, particularly when comparing one faculty member’s score with another’s.

“Averages of numerical student ratings have an air of objectivity,” they argue in the paper, “simply because they are numerical.” They go on to say that many institutions rely too heavily on these numerical averages, which fail to take these limitations into account. “For teaching evaluations, there is no reason any of those things should be true,” they write of the assumptions that averaging depends on. “Such averages and comparisons make no sense, as a matter of statistics.”

The authors suggest that a better use of student evaluations would be to include their response rates, along with the range of students’ scores, in faculty performance metrics, thus creating a more complete picture of how different students react to each particular faculty member.
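As a rough illustration of that suggestion, the Python sketch below reports a course's response rate and the spread of its scores instead of a single average. It is not taken from the paper: the function name `summarize_ratings`, the 1-to-7 rating scale, and the two sets of ratings are all hypothetical, chosen only to show how two courses with the same average can look very different.

```python
from collections import Counter
from statistics import mean


def summarize_ratings(ratings, enrolled):
    """Summarize one course's ratings without collapsing them to a mean.

    ratings  -- list of integer scores actually submitted (hypothetical 1-7 scale)
    enrolled -- number of students who could have responded
    """
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    counts = Counter(ratings)
    return {
        # Share of enrolled students who submitted an evaluation
        "response_rate": len(ratings) / enrolled,
        # How many students gave each score, ordered by score
        "distribution": {score: counts[score] for score in sorted(counts)},
        # Lowest and highest scores received (None if nobody responded)
        "min": min(ratings) if ratings else None,
        "max": max(ratings) if ratings else None,
    }


# Hypothetical data: both courses average exactly 4 on the assumed 1-7 scale.
ratings_a = [4, 4, 4, 4, 4, 4]   # uniform, 6 of 8 students responded
ratings_b = [1, 1, 1, 7, 7, 7]   # polarized, 6 of 30 students responded

course_a = summarize_ratings(ratings_a, enrolled=8)
course_b = summarize_ratings(ratings_b, enrolled=30)

print(mean(ratings_a) == mean(ratings_b))  # the averages cannot tell them apart
print(course_a)
print(course_b)
```

In this made-up case the averages are identical, but the summaries show that one course drew consistent middling scores from most of its students, while the other drew sharply split scores from a small fraction of them.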

Finally, the authors argue, fair and reliable faculty performance measurement should involve much more than student scores, including elements such as teaching portfolios and peer evaluations based on classroom visits.

A PDF of the draft paper is available here:

A Chronicle of Higher Education article on the subject, including an interview with Stark, is available here:
