Rice DSP group faculty **Richard Baraniuk** will be leading a team of engineers, computer scientists, mathematicians, and statisticians on a five-year ONR MURI project to develop a **principled theory of deep learning** based on rigorous mathematical principles. The team includes:

- **Richard Baraniuk**, Rice University (project director)
- **Moshe Vardi**, Rice University
- **Ronald DeVore**, Texas A&M University
- **Stanley Osher**, UCLA
- **Thomas Goldstein**, University of Maryland
- **Rama Chellappa**, University of Maryland
- **Ryan Tibshirani**, Carnegie Mellon University
- **Robert Nowak**, University of Wisconsin

International collaborators include the **Alan Turing** and **Isaac Newton** Institutes in the UK.

- D. LeJeune, H. Javadi, R. G. Baraniuk, "**The Implicit Regularization of Ordinary Least Squares Ensembles**," AISTATS, 2020
- D. LeJeune, G. Dasarathy, R. G. Baraniuk, "**Thresholding Graph Bandits with GrAPL**," AISTATS, 2020

**Ensemble methods** that average over a collection of independent predictors that are each limited to a subsampling of both the examples and features of the training data command a significant presence in machine learning, such as the ever-popular random forest, yet the nature of the **subsampling effect**, particularly of the features, is not well understood. We study the case of an ensemble of linear predictors, where each individual predictor is fit using ordinary least squares on a random submatrix of the data matrix. We show that, under standard Gaussianity assumptions, when the number of features selected for each predictor is optimally tuned, **the asymptotic risk of a large ensemble is equal to the asymptotic ridge regression risk**, which is known to be optimal among linear predictors in this setting. In addition to eliciting this **implicit regularization** that results from subsampling, we also connect this ensemble to the **dropout** technique used in training deep (neural) networks, another strategy that has been shown to have a ridge-like regularizing effect.

Above: Example (rows) and feature (columns) subsampling of the training data **X** used in the ordinary least squares fit for one member of the ensemble. The *i*-th member of the ensemble is only allowed to predict using its subset of the features (green). It must learn its parameters by performing ordinary least squares using the subsampled examples of **y** (red) and the subsampled examples (rows) and features (columns) of **X** (blue, crosshatched).
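The ensemble construction described above can be illustrated with a small NumPy sketch. This is not the authors' code: the function names `ols_ensemble` and `ridge`, the problem sizes, the subsample sizes, and the ridge penalty are all assumptions chosen for illustration, and nothing here is tuned, so the sketch shows how the two estimators are built rather than reproducing the asymptotic equivalence reported in the paper.

```python
import numpy as np

def ols_ensemble(X, y, n_members, n_examples, n_features, rng):
    """Average of OLS fits, each on a random row/column submatrix of X.

    Hypothetical helper for illustration only.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_members):
        # Subsample examples (rows) and features (columns) without replacement.
        rows = rng.choice(n, size=n_examples, replace=False)
        cols = rng.choice(p, size=n_features, replace=False)
        # Ordinary least squares restricted to the sampled submatrix.
        coef, *_ = np.linalg.lstsq(X[np.ix_(rows, cols)], y[rows], rcond=None)
        member = np.zeros(p)
        member[cols] = coef  # features not selected get zero weight
        beta += member
    return beta / n_members

def ridge(X, y, lam):
    """Ridge regression solution (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic Gaussian data (assumed setup: isotropic features, linear model).
rng = np.random.default_rng(0)
n, p = 100, 20
beta_true = rng.standard_normal(p) / np.sqrt(p)
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)

beta_ens = ols_ensemble(X, y, n_members=200, n_examples=60, n_features=10, rng=rng)
beta_ridge = ridge(X, y, lam=10.0)

# For isotropic features, the excess risk is the squared parameter error.
print("ensemble excess risk:", np.sum((beta_ens - beta_true) ** 2))
print("ridge excess risk:   ", np.sum((beta_ridge - beta_true) ** 2))
```

Sweeping `n_features` and `lam` over a grid and comparing the best risk of each estimator would be one way to probe the claimed correspondence empirically, under the same assumptions.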

OpenStax provides textbooks for 36 college and Advanced Placement courses. Students can access the materials for free digitally (via browser, downloadable PDF or recently introduced OpenStax + SE mobile app), or pay for a low-cost print version. Overall, students are saving more than $200 million on their textbooks in 2019, and have saved a total of $830 million since OpenStax launched in 2012.

Future plans for the publisher include the rollout of Rover by OpenStax, an online math homework tool designed to give students step-by-step feedback on their work. OpenStax also plans to continue its research initiatives on digital learning, using cognitive science-based approaches and the power of machine learning to improve how students learn.

Writes Chris Taylor from Reuters in Moneysaving 101: Four Ways to Cut College Textbook Costs, "While sky-high U.S. college tuition might be the headline number, here is a sneaky little figure that might surprise you: the cost of textbooks." See what OpenStax is doing about the crisis here.

An article in the 28 July 2019 *Wall Street Journal*, "**A Key Reason the Fed Struggles to Hit 2% Inflation: Uncooperative Prices**," discusses the disruptive impact on the college textbook market of the free and open-source textbooks provided by OpenStax. Read online at Morningstar.com.

Frontiers of Deep Learning Workshop, Simons Institute

16 July 2019

References:

- “A Spline Theory of Deep Networks,” ICML 2018
- “Mad Max: Affine Spline Insights into Deep Learning,” arxiv.org/abs/1805.06576, 2018
- “From Hard to Soft: Understanding Deep Network Nonlinearities…,” ICLR 2019
- “A Max-Affine Spline Perspective of RNNs,” ICLR 2019
- “A Hessian Based Complexity Measure for Deep Networks,” arxiv.org/abs/1905.11639, 2019

Co-authors: Randall Balestriero, Jack Wang, Hamid Javadi

An alternative presentation at the Alan Turing Institute, May 2019 (Get your SPARFA merchandise here!)

- R. Balestriero and R. G. Baraniuk, “**Hard to Soft: Understanding Deep Network Nonlinearities via Vector Quantization and Statistical Inference**”
- J. Wang, R. Balestriero, and R. G. Baraniuk, “**A Max-Affine Perspective of Recurrent Neural Networks**”
- A. Mousavi, G. Dasarathy, and R. G. Baraniuk, “**A Data-Driven and Distributed Approach to Sparse Signal Representation and Recovery**”
- J. J. Michalenko, A. Shah, A. Verma, R. G. Baraniuk, S. Chaudhuri, and A. B. Patel, “**Representing Formal Languages: A Comparison between Finite Automata and Recurrent Neural Networks**”