A. S. Lan, A. E. Waters, C. Studer, R. G. Baraniuk, "Sparse Factor Analysis for Learning and Content Analytics," to appear in Journal of Machine Learning Research, 2014
Abstract: We develop a new model and algorithms for machine learning-based learning analytics, which estimate a learner’s knowledge of the concepts underlying a domain, and content analytics, which estimate the relationships among a collection of questions and those concepts. Our model represents the probability that a learner provides the correct response to a question in terms of three factors: their understanding of a set of underlying concepts, the concepts involved in each question, and each question’s intrinsic difficulty. We estimate these factors given the graded responses to a collection of questions. The underlying estimation problem is ill-posed in general, especially when only a subset of the questions are answered. The key observation that enables a well-posed solution is the fact that typical educational domains of interest involve only a small number of key concepts. Leveraging this observation, we develop both a bi-convex maximum-likelihood and a Bayesian solution to the resulting SPARse Factor Analysis (SPARFA) problem. We also incorporate user-defined tags on questions to facilitate the interpretability of the estimated factors. Experiments with synthetic and real-world data demonstrate the efficacy of our approach. Finally, we make a connection between SPARFA and noisy, binary-valued (1-bit) dictionary learning that is of independent interest.
The above example illustrates the result of applying SPARFA to data from a grade 8 science course in STEMscopes, an online science curriculum program. The data input to SPARFA consisted solely of whether a student answered a given potential homework or exam question correctly or incorrectly. From these limited and quantized data, SPARFA automatically estimates (a) a collection (in this case five) of abstract “concepts” that underlie the course (“Concept 3” is illustrated here); (b) a graph that links each question (rectangular box) to one or more of the concepts (circles), with thicker links indicating a stronger association with the concept; (c) the intrinsic difficulty of each question, indicated by the number in each box; (d) descriptive word tags drawn from the text of the questions, their solutions, and instructor-provided metadata that make each concept interpretable (as shown for Concept 3); and (e) each student’s knowledge profile, which indicates both estimated knowledge of each concept and concepts ripe for remediation or enrichment.
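The three factors named in the abstract can be illustrated with a small forward-model sketch. This is not the authors' code: the symbol names (W for question-concept associations, C for learner concept knowledge, mu for a per-question difficulty offset), the logistic link, the sign convention for mu, and the toy numbers are all assumptions made for illustration. The sketch only simulates graded responses from given factors; it does not perform the sparse estimation itself.

```python
import numpy as np

def correct_response_probability(W, C, mu):
    """Return P, where P[i, j] is the probability that learner j
    answers question i correctly.

    W  : (questions x concepts) question-concept associations (assumed name)
    C  : (concepts x learners) learner concept knowledge (assumed name)
    mu : (questions,) per-question offset; here a larger value means
         an easier question (assumed sign convention)
    """
    Z = W @ C + mu[:, None]            # combine the three factors
    return 1.0 / (1.0 + np.exp(-Z))    # logistic link (an assumption)

# Hypothetical toy data: 3 questions, 2 concepts, 2 learners.
W = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.5, 0.5]])
C = np.array([[1.0, -1.0],
              [0.0,  2.0]])
mu = np.array([0.0, -1.0, 0.5])

P = correct_response_probability(W, C, mu)

# Simulate binary graded responses (True = correct) from the probabilities.
rng = np.random.default_rng(0)
Y = rng.random(P.shape) < P
```

In this toy setup each row of W has few nonzero entries, mirroring the observation that a question typically involves only a small number of concepts.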
Some follow-on papers that extend the SPARFA framework.