A. Waters, C. Studer, and R. G. Baraniuk, "Collaboration-Type Identification in Educational Datasets," *Journal of Educational Data Mining*, Vol. 6, No. 1, 2014.

Abstract: Identifying collaboration between learners in a course is an important challenge in education for two reasons. First, depending on the course's rules, collaboration can be considered a form of cheating. Second, identifying it helps one to more accurately evaluate each learner's competence. While such collaboration identification is already challenging in traditional classroom settings consisting of a small number of learners, the problem is greatly exacerbated in online courses and massive open online courses (MOOCs), where potentially thousands of learners have little or no contact with the course instructor. In this work, we propose a novel methodology for collaboration-type identification, which both identifies learners who are likely collaborating and classifies the type of collaboration employed. Under a fully Bayesian setting, we infer the probability of learners succeeding on a series of test items solely from graded response data. We then use this information to jointly compute the likelihood that two learners were collaborating and which collaboration model (or type) was used. We demonstrate the efficacy of the proposed methods on both synthetic and real-world educational data; for the latter, the proposed methods find strong evidence of collaboration among learners in two non-collaborative take-home exams.
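The model-selection step described above can be illustrated with a minimal Python sketch for a single pair of learners. It assumes per-item success probabilities have already been estimated and uses them as fixed point estimates, a simplification of the fully Bayesian treatment in the paper. The three model definitions ("independent"; "parasitic" as one-directional copying from learner i to learner j; "symbiotic" as one shared answer that is correct if either learner would have succeeded), the noise level `EPS`, and all function names are hypothetical choices for illustration, not the authors' exact formulation.

```python
import math
from typing import Dict, Optional, Sequence

# Hypothetical per-item noise level: probability that a collaborating pair
# answers independently on a given item (an assumption, not from the paper).
EPS = 0.05
MODELS = ("independent", "parasitic", "symbiotic")


def bern(y: int, p: float) -> float:
    """Probability of a binary outcome y (1 = correct) with success probability p."""
    return p if y == 1 else 1.0 - p


def item_likelihoods(yi: int, yj: int, pi: float, pj: float,
                     eps: float = EPS) -> Dict[str, float]:
    """Likelihood of one item's response pair under each assumed model."""
    # Clamp so no likelihood is exactly zero (keeps log() defined).
    pi = min(max(pi, 1e-9), 1.0 - 1e-9)
    pj = min(max(pj, 1e-9), 1.0 - 1e-9)
    agree = 1.0 if yi == yj else 0.0
    indep = bern(yi, pi) * bern(yj, pj)
    # Assumed parasitic model: learner j copies learner i's answer.
    parasitic = (1.0 - eps) * bern(yi, pi) * agree + eps * indep
    # Assumed symbiotic model: one shared answer, correct if either learner
    # alone would have solved the item.
    p_or = 1.0 - (1.0 - pi) * (1.0 - pj)
    symbiotic = (1.0 - eps) * bern(yi, p_or) * agree + eps * indep
    return {"independent": indep, "parasitic": parasitic, "symbiotic": symbiotic}


def model_posterior(y_i: Sequence[int], y_j: Sequence[int],
                    p_i: Sequence[float], p_j: Sequence[float],
                    prior: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Posterior probability of each assumed collaboration model for learners i and j."""
    if prior is None:
        prior = {m: 1.0 / len(MODELS) for m in MODELS}
    # Accumulate in the log domain to avoid underflow over many items.
    log_post = {m: math.log(prior[m]) for m in MODELS}
    for yi, yj, pi, pj in zip(y_i, y_j, p_i, p_j):
        for m, lik in item_likelihoods(yi, yj, pi, pj).items():
            log_post[m] += math.log(lik)
    # Normalize with the log-sum-exp trick.
    top = max(log_post.values())
    z = sum(math.exp(v - top) for v in log_post.values())
    return {m: math.exp(v - top) / z for m, v in log_post.items()}


if __name__ == "__main__":
    # Toy data (hypothetical): point estimates of success probabilities and
    # graded responses for two learners on six items.
    p_a = [0.9, 0.3, 0.2, 0.8, 0.25, 0.1]
    p_b = [0.6, 0.4, 0.3, 0.5, 0.3, 0.2]
    y_a = [1, 0, 1, 1, 0, 1]
    y_b = [1, 0, 1, 1, 0, 1]  # identical pattern, including unlikely successes
    for model, prob in model_posterior(y_a, y_b, p_a, p_b).items():
        print(f"{model}: {prob:.3f}")
```

With a uniform prior, the posterior simply reweights each model by its total likelihood over all items. Identical answer patterns that include unlikely successes, as in the toy data, push the posterior away from independence, while systematic disagreement favors it. Working in the log domain matters once the number of items grows, for example to the 64 questions in the dataset below.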

Below we show a collaboration-type identification result using Bayesian model selection for a collection of homework assignments in an undergraduate signal processing class. The data consist of 38 learners answering 50 homework questions plus 14 midterm exam questions. Grey ellipses designate the assigned homework groups. Dashed green lines denote parasitic collaborations, while solid blue lines denote symbiotic collaborations. Dotted red lines denote the connections found using Wesolowsky’s method, which generally finds fewer ground-truth connections than our method.