SPARFA – Sparse Factor Analysis for Learning and Content Analytics

A. S. Lan, A. E. Waters, C. Studer, R. G. Baraniuk, "Sparse Factor Analysis for Learning and Content Analytics," to appear in Journal of Machine Learning Research, 2014

Abstract: We develop a new model and algorithms for machine learning-based learning analytics, which estimate a learner’s knowledge of the concepts underlying a domain, and content analytics, which estimate the relationships among a collection of questions and those concepts. Our model represents the probability that a learner provides the correct response to a question in terms of three factors: their understanding of a set of underlying concepts, the concepts involved in each question, and each question’s intrinsic difficulty. We estimate these factors given the graded responses to a collection of questions. The underlying estimation problem is ill-posed in general, especially when only a subset of the questions is answered. The key observation that enables a well-posed solution is the fact that typical educational domains of interest involve only a small number of key concepts. Leveraging this observation, we develop both a bi-convex maximum-likelihood and a Bayesian solution to the resulting SPARse Factor Analysis (SPARFA) problem. We also incorporate user-defined tags on questions to facilitate the interpretability of the estimated factors. Experiments with synthetic and real-world data demonstrate the efficacy of our approach. Finally, we make a connection between SPARFA and noisy, binary-valued (1-bit) dictionary learning that is of independent interest.
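
In symbols (following the notation used in the paper), the three-factor model described in the abstract can be summarized as

    p(Y_{i,j} = 1 | w_i, c_j, mu_i) = \Phi( w_i^T c_j + mu_i ),   with w_i >= 0 and sparse,

where Y_{i,j} is learner j's graded (binary) response to question i, the vector w_i collects question i's associations with the latent concepts, c_j collects learner j's knowledge of those concepts, mu_i is the question's intrinsic difficulty, and \Phi is an inverse link function (logit or probit).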

The above example illustrates the result of applying SPARFA to data from a grade 8 science course in STEMscopes, an online science curriculum program. The data input to SPARFA consisted solely of whether a student answered a given homework or exam question correctly or incorrectly. From these limited and quantized data, SPARFA automatically estimates (a) a collection (in this case, five) of abstract “concepts” that underlie the course (“Concept 3” is illustrated here); (b) a graph that links each question (rectangular box) to one or more of the concepts (circles), with thicker links indicating a stronger association with the concept; (c) the intrinsic difficulty of each question, indicated by the number in each box; (d) descriptive word tags drawn from the text of the questions, their solutions, and instructor-provided metadata that make each concept interpretable (as shown for Concept 3); and (e) each student’s knowledge profile, which indicates both estimated knowledge of each concept and concepts ripe for remediation or enrichment.
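
To make the estimation step concrete, the sketch below shows a toy alternating (bi-convex) fit of the logistic variant of the model to a binary, partially observed grade matrix. It is only illustrative: the function name, step sizes, and plain projected-gradient updates are our own simplifications, not the authors' SPARFA-M implementation.

    import numpy as np

    def sigmoid(x):
        """Logistic inverse link used in this toy sketch."""
        return 1.0 / (1.0 + np.exp(-x))

    def toy_sparfa(Y, K, n_iters=500, lr=0.05, lam=0.05, seed=0):
        """Toy alternating estimation in the spirit of SPARFA's bi-convex approach.

        Y : Q x N array of graded responses (1 = correct, 0 = incorrect),
            with np.nan marking unanswered questions.
        K : assumed number of latent concepts.
        Returns W (Q x K question-concept links, nonnegative, encouraged to be
        sparse), C (K x N concept knowledge), and mu (Q intrinsic difficulties).
        """
        Q, N = Y.shape
        rng = np.random.default_rng(seed)
        W = np.abs(rng.normal(scale=0.5, size=(Q, K)))
        C = rng.normal(scale=0.5, size=(K, N))
        mu = np.zeros((Q, 1))
        observed = ~np.isnan(Y)
        Y0 = np.where(observed, Y, 0.0)

        for _ in range(n_iters):
            # Residual of the logistic observation model on observed entries only.
            R = np.where(observed, sigmoid(W @ C + mu) - Y0, 0.0)
            C -= lr * (W.T @ R)                      # update concept knowledge
            R = np.where(observed, sigmoid(W @ C + mu) - Y0, 0.0)
            W -= lr * (R @ C.T)                      # update question-concept links
            W = np.maximum(W - lr * lam, 0.0)        # l1 shrinkage + nonnegativity
            mu -= lr * R.sum(axis=1, keepdims=True)  # update intrinsic difficulty
        return W, C, mu.ravel()

The nonnegativity constraint and the l1 shrinkage on W are what make the recovered question-concept graph sparse and interpretable, which is the key structural assumption highlighted in the abstract.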

Several follow-on papers extend the SPARFA framework.
Get your SPARFA merchandise while it's hot!