A. Waters, C. Studer, and R. G. Baraniuk, "Collaboration-Type Identification in Educational Datasets," Journal of Educational Data Mining, Vol. 6, No. 1, 2014.

Abstract:  Identifying collaboration between learners in a course is an important challenge in education for two reasons: First, depending on the course's rules, collaboration can be considered a form of cheating. Second, it helps one to more accurately evaluate each learner's competence. While such collaboration identification is already challenging in traditional classroom settings consisting of a small number of learners, the problem is greatly exacerbated in the context of online courses and massive open online courses (MOOCs), where potentially thousands of learners have little or no contact with the course instructor. In this work, we propose a novel methodology for collaboration-type identification, which both identifies learners who are likely collaborating and classifies the type of collaboration employed. Under a fully Bayesian setting, we infer the probability of learners succeeding on a series of test items solely based on graded response data. We then use this information to jointly compute the likelihood that two learners were collaborating and what collaboration model (or type) was used. We demonstrate the efficacy of the proposed methods on both synthetic and real-world educational data; for the latter, the proposed methods find strong evidence of collaboration among learners in two non-collaborative take-home exams.

Below we show a collaboration-type identification result using Bayesian model selection for a collection of homework assignments in an undergraduate signal processing class. The data consists of 38 learners answering 50 homework questions plus 14 midterm exam questions.  Grey ellipses designate the assigned homework groups. Dashed green lines denote parasitic collaborations, while solid blue lines denote symbiotic collaborations. Dotted red lines denote the connections found using Wesolowsky’s method, which, in general, finds fewer ground-truth connections than our method.
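To make the idea of choosing between collaboration models concrete, here is a toy sketch in Python. It is not the paper's fully Bayesian formulation: the success probabilities p1 and p2 are assumed to be known rather than inferred, the three likelihoods (including the slip probability eps) are simplified definitions made up for illustration, and the model is picked by maximum likelihood with equal priors. The parasitic model is written in one direction only (learner 2 copies learner 1).

```python
import numpy as np

def log_lik_independent(y1, y2, p1, p2):
    # Assumed model: each learner answers on their own.
    return float(np.sum(np.log(np.where(y1 == 1, p1, 1 - p1))
                        + np.log(np.where(y2 == 1, p2, 1 - p2))))

def log_lik_parasitic(y1, y2, p1, eps=0.05):
    # Assumed model: learner 2 copies learner 1, except with slip probability eps.
    own = np.log(np.where(y1 == 1, p1, 1 - p1))
    copied = np.log(np.where(y1 == y2, 1 - eps, eps))
    return float(np.sum(own + copied))

def log_lik_symbiotic(y1, y2, p1, p2, eps=0.05):
    # Assumed model: the pair succeeds together if at least one of them would
    # succeed alone; disagreeing responses share total probability eps.
    q = 1 - (1 - p1) * (1 - p2)
    agree = np.where(y1 == 1, q, 1 - q) * (1 - eps)
    prob = np.where(y1 == y2, agree, eps / 2)
    return float(np.sum(np.log(prob)))

def select_model(y1, y2, p1, p2):
    # Pick the model with the largest log-likelihood (equal prior weights).
    scores = {
        "independent": log_lik_independent(y1, y2, p1, p2),
        "parasitic": log_lik_parasitic(y1, y2, p1),
        "symbiotic": log_lik_symbiotic(y1, y2, p1, p2),
    }
    return max(scores, key=scores.get), scores
```

With graded responses y1 and y2 given as 0/1 arrays over the same items, select_model(y1, y2, p1, p2) returns the name of the best-scoring model together with all three log-likelihoods.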

Rice University-based nonprofit OpenStax, which has already provided free textbooks to hundreds of thousands of college students, has been chosen by the Bill & Melinda Gates Foundation to develop personalized courseware for college students as part of the foundation’s $20 million Next Generation Courseware Challenge.  The initiative was announced last week at the 2014 Educause Conference in Orlando.

“The same technology that retailers and online search firms use to deliver personalized choices can be combined with cognitive science models to deliver personalized learning experiences for college students,” said OpenStax founder Richard Baraniuk, Rice’s Victor E. Cameron Professor of Engineering. “Our free textbooks are already making college more accessible for those who couldn’t otherwise afford it, and personalized learning technology will improve student learning outcomes as well.”

In typical applications of machine learning (ML), humans enter the process at an early stage, in determining an initial representation of the problem and in preparing the data, and at a late stage, in interpreting and making decisions based on the results. Consequently, the bulk of the ML literature deals with such situations. Much less research has been devoted to ML involving a “human-in-the-loop,” where humans play a more intrinsic role in the process, interacting with the ML system to iterate towards a solution to which both humans and machines have contributed. In these situations, the goal is to optimize some quantity that can be obtained only by evaluating human responses and judgments. Examples of this hybrid, “human-in-the-loop” ML approach include:

  • ML-based education, where a scheduling system acquires information about learners with the goal of selecting and recommending optimal lessons;
  • Adaptive testing in psychological surveys, educational assessments, and recommender systems, where the system acquires testees’ responses and selects the next item in an adaptive and automated manner;
  • Interactive topic modeling, where human interpretations of the topics are used to iteratively refine an estimated model;
  • Image classification, where human judgments can be leveraged to improve the quality and information content of image features or classifiers.

In this workshop, held in December 2014, we focused on emerging theories, algorithms, and applications of human-in-the-loop ML.

Workshop web page with speakers and their slides

More information about the NIPS conference

Rice University-based publisher OpenStax College today announced $9.5 million in philanthropic grants from the Laura and John Arnold Foundation (LJAF), Rice alumni John and Ann Doerr, and the William and Flora Hewlett Foundation to add 10 titles to its catalog of free, high-quality textbooks for the nation’s most-attended college courses by 2017.  OpenStax College is creating free books for 25 of the most-attended college courses in the country.

OpenStax College uses philanthropic gifts to produce high-quality, peer-reviewed textbooks that are free online and low-cost in print. Its first seven books have already saved students more than $13 million. The books have been downloaded more than 650,000 times and have been adopted for use in nearly 900 courses at community colleges, four-year colleges, universities and high schools.  OpenStax College has four titles in production for next year and plans to expand its library to 21 titles by 2017.  The additional funding will allow the nonprofit publisher to develop textbooks for additional high-enrollment courses, including several science and mathematics courses.

“Our books are opening access to higher education for students who couldn’t otherwise afford it,” said Rice Professor Richard Baraniuk, founder and director of OpenStax College. “We’ve already saved students millions of dollars, and thanks to the generosity of our philanthropic partners, we hope to save students more than $500 million by 2020.”

C. A. Metzler, A. Maleki, and R. G. Baraniuk, "From Denoising to Compressed Sensing," July 2014.  arXiv version

Abstract:  A denoising algorithm seeks to remove perturbations or errors from a signal. The last three decades have seen extensive research devoted to this arena, and as a result, today's denoisers are highly optimized algorithms that effectively remove large amounts of additive white Gaussian noise. A compressive sensing (CS) reconstruction algorithm seeks to recover a structured signal acquired using a small number of randomized measurements. Typical CS reconstruction algorithms can be cast as iteratively estimating a signal from a perturbed observation. This paper answers a natural question: How can one effectively employ a generic denoiser in a CS reconstruction algorithm? In response, we develop a denoising-based approximate message passing (D-AMP) algorithm that is capable of high-performance reconstruction. We demonstrate that, for an appropriate choice of denoiser, D-AMP offers state-of-the-art CS recovery performance for natural images. We explain the exceptional performance of D-AMP by analyzing some of its theoretical features. A critical insight in our approach is the use of an appropriate Onsager correction term in the D-AMP iterations, which coerces the signal perturbation at each iteration to be very close to the white Gaussian noise that denoisers are typically designed to remove.

The figure below illustrates reconstructions of the 256x256 Barbara test image (65536 pixels) from 6554 randomized measurements, i.e., a sampling rate of about 10%.  Exploiting the state-of-the-art BM3D denoising algorithm in D-AMP enables state-of-the-art CS recovery.
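The structure of a denoising-based AMP iteration with an Onsager correction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a soft-thresholding denoiser stands in for BM3D, the measurement matrix is assumed to have roughly unit-norm columns, the denoiser's divergence is estimated with a single random probe, and the probe step size and the names damp and soft_threshold are choices made here for illustration.

```python
import numpy as np

def soft_threshold(v, sigma):
    # Stand-in denoiser (assumption): shrink every entry toward zero by sigma.
    return np.sign(v) * np.maximum(np.abs(v) - sigma, 0.0)

def damp(y, A, denoiser=soft_threshold, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)   # current signal estimate
    z = y.copy()      # current corrected residual
    for _ in range(iters):
        r = x + A.T @ z                         # signal plus effective perturbation
        sigma = np.linalg.norm(z) / np.sqrt(m)  # estimated perturbation level
        x = denoiser(r, sigma)
        # Monte Carlo divergence estimate using one random probe direction.
        eps = 1e-2 * sigma + 1e-12              # probe step size (assumption)
        b = rng.standard_normal(n)
        div = b @ (denoiser(r + eps * b, sigma) - x) / eps
        # New residual; the last term is the Onsager correction.
        z = y - A @ x + z * div / m
    return x
```

Any function with the signature denoiser(noisy_vector, noise_level) can be passed in place of soft_threshold, which is the sense in which the denoiser is treated as a black box.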

Rice University-based nonprofit OpenStax, which has already provided free textbooks to hundreds of thousands of college students, today announced a $9 million effort supported by the Laura and John Arnold Foundation to develop free, digital textbooks capable of delivering personalized lessons to high school students.

“Using advanced machine learning algorithms and new models from cognitive science, we can improve educational outcomes in a number of ways,” said project founder Richard Baraniuk. “We can help teachers and administrators by tapping into metrics that they already collect, like which kind of homework and test questions a student tends to get correct or incorrect, as well as things that only the book would notice, like which examples a student clicks on, how long she stays on a particular illustration or which sections she goes back to reread.”

The technology will pinpoint areas where students need more assistance, and it will react by delivering specific content to reinforce concepts in those areas. The personalized books will deliver tailored lessons that allow individual students to learn at their own pace. For fast learners, lessons might be streamlined and compact; for a struggling student, lessons might include supplemental material and additional learning exercises.

D. Vats and R. G. Baraniuk, “Swapping Variables for High-Dimensional Sparse Regression with Correlated Measurements,” NIPS 2013, journal preprint 2014.

Abstract: We consider the high-dimensional sparse linear regression problem of accurately estimating a sparse vector using a small number of linear measurements that are contaminated by noise. It is well known that the standard cadre of computationally tractable sparse regression algorithms, such as the Lasso, Orthogonal Matching Pursuit (OMP), and their extensions, perform poorly when the measurement matrix contains highly correlated columns. To address this shortcoming, we develop a simple greedy algorithm, called SWAP, which iteratively swaps variables until convergence. SWAP is surprisingly effective in handling measurement matrices with high correlations. In fact, we prove that SWAP outputs the true support, the locations of the non-zero entries in the sparse vector, under a relatively mild condition on the measurement matrix. Furthermore, we show that SWAP can be used to boost the performance of any sparse regression algorithm. We empirically demonstrate the advantages of SWAP by comparing it with several state-of-the-art sparse regression algorithms.


The example above illustrates the advantages of using SWAP for regression with correlated measurements (see Figure 3 in http://dsp.rice.edu/publications/swap-journal).  The x-axis corresponds to the amount of correlation in the measurement matrix, and the y-axis corresponds to the mean true positive rate (TPR), i.e., the fraction of the true support that is recovered.  The dashed lines correspond to traditional algorithms, while the solid lines correspond to SWAP-based algorithms.  SWAP boosts the performance of the traditional algorithms; in particular, as the correlations become large, it recovers a larger fraction of the variables in the true support.
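The swapping idea can be sketched in Python as follows. This is an illustrative reading of "iteratively swaps variables until convergence", not the released software: it assumes a least-squares loss, starts from a support supplied by any other sparse regression method, and at each pass applies the single swap that lowers the loss the most. The names ls_loss and swap and the tolerance tol are hypothetical.

```python
import numpy as np

def ls_loss(A, y, support):
    # Residual energy after least-squares regression of y on the chosen columns.
    cols = A[:, sorted(support)]
    coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
    return float(np.sum((y - cols @ coef) ** 2))

def swap(A, y, init_support, tol=1e-10):
    support = set(init_support)
    best = ls_loss(A, y, support)
    improved = True
    while improved:
        improved = False
        best_swap = None
        outside = set(range(A.shape[1])) - support
        # Try replacing each variable in the support with each variable outside it.
        for i in support:
            for j in outside:
                loss = ls_loss(A, y, (support - {i}) | {j})
                if loss < best - tol:
                    best, best_swap = loss, (i, j)
        # Apply the best swap found in this pass, if any lowered the loss.
        if best_swap is not None:
            i, j = best_swap
            support = (support - {i}) | {j}
            improved = True
    return sorted(support)
```

Because the initial support can come from the Lasso, OMP, or any other method, a routine of this shape acts as a post-processing step on top of an existing estimate. The exhaustive search over swaps is only practical for small problems.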

Software: http://dsp.rice.edu/software/swap