All Your Rankings Are Belong to Us

A. Waters, D. Tinapple, and R. G. Baraniuk, "BayesRank: A Bayesian Approach to Ranked Peer Grading," ACM Conference on Learning at Scale, Vancouver, March 2015.

Abstract: Advances in online and computer supported education afford exciting opportunities to revolutionize the classroom, while also presenting a number of new challenges not faced in traditional educational settings. Foremost among these challenges is the problem of accurately and efficiently evaluating learner work as the class size grows, which is directly related to the larger goal of providing quality, timely, and actionable formative feedback. Recently there has been a surge in interest in using peer grading methods coupled with machine learning to accurately and fairly evaluate learner work while alleviating the instructor bottleneck and grading overload. Prior work in peer grading almost exclusively focuses on numerically scored grades – either real-valued or ordinal. In this work, we consider the implications of peer ranking in which learners rank a small subset of peer work from strongest to weakest, and propose new types of computational analyses that can be applied to this ranking data. We adopt a Bayesian approach to the ranked peer grading problem and develop a novel model and method for utilizing ranked peer-grading data. We additionally develop a novel procedure for adaptively identifying which work should be ranked by particular peers in order to dynamically resolve ambiguity in the data and rapidly resolve a clearer picture of learner performance. We showcase our results on both synthetic and several real-world educational datasets.

The figures below compare BayesRank's estimated ranking to the known ground-truth item ordering in a synthetic experiment using Kendall’s tau metric, which measures the agreement between two orderings of the same items.  Kendall’s tau considers each pair of items in one ranking and checks whether the same pair appears in the same relative order in the second ranking.  D_tau = +1 corresponds to perfect agreement and D_tau = -1 to perfect disagreement (one ranking is the reverse of the other).  The two curves correspond to BayesRank with observations generated using adaptive assignment (red) and random item assignment (blue); we plot the tau metric as a function of the class size N (left) and as a function of the number of items K assigned to each grader (right).  In all cases, BayesRank with adaptive assignment achieves significantly better performance than BayesRank with random assignment.
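
For readers who want to see the pairwise check spelled out, here is a minimal Python sketch of Kendall’s tau for two complete rankings with no ties. The function name kendall_tau and the normalization (concordant pairs minus discordant pairs, divided by the total number of pairs) are our assumptions for illustration; the paper may use a different variant of the metric, and this is not the BayesRank code.

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau between two rankings of the same items (no ties).

    Each ranking is a sequence of item labels ordered from strongest to
    weakest. Returns a value in [-1, 1]. The name and normalization are
    illustrative assumptions, not taken from the BayesRank paper.
    """
    if len(ranking_a) != len(ranking_b) or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same items")
    if len(set(ranking_a)) != len(ranking_a):
        raise ValueError("rankings must not contain duplicate items")
    if len(ranking_a) < 2:
        raise ValueError("need at least two items to compare")

    # Position of each item in each ranking.
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}

    concordant = 0
    discordant = 0
    for x, y in combinations(ranking_a, 2):
        # A pair is concordant when both rankings order it the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1

    return (concordant - discordant) / (concordant + discordant)


truth = ["a", "b", "c", "d"]
same = kendall_tau(truth, ["a", "b", "c", "d"])      # identical orderings
reverse = kendall_tau(truth, ["d", "c", "b", "a"])   # fully reversed
one_swap = kendall_tau(truth, ["b", "a", "c", "d"])  # one adjacent swap
```

With four items there are six pairs, so identical orderings give +1, a fully reversed ordering gives -1, and a single adjacent swap leaves five concordant pairs and one discordant pair, for (5 - 1)/6, or about 0.67. The double loop over pairs is quadratic in the number of items, which is fine for a sketch but slow for very large classes.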