## Statistical Learning Theory

Prerequisite: Background in applied mathematics, probability, and statistics

**Instructor**:

Robert Nowak

E-mail: nowak@ece.wisc.edu

Web: http://nowakresearchgroup.discovery.wisc.edu

3539 Engineering Hall

Office Hours: Tuesday and Thursday, 12:15-1:00 pm

**Meeting times**:

Tuesday and Thursday 11am-12:15pm, room 2305 Engineering Hall

**Course Format**:

This course will serve as a primer on statistical learning theory as well as a platform for exploring emerging theory and methods in large-scale data analytics. The course is intended for PhD students in ECE, CS, Statistics, and Mathematics who already have a strong background in signal processing, machine learning, or mathematical statistics. The first few weeks of the course will give an introduction to statistical learning theory (roughly following the lecture notes below). The rest of the course will involve reading and discussing a selection of papers. Each student will be responsible for reading all of the papers and for presenting one paper in class.

**Grading**: Grades will be based on class participation, quizzes, homework assignments, presentations, and/or reports. Each student will be responsible for presenting one of the papers that the class will study. Every student is required to carefully read the lecture notes and/or papers prior to class lectures.

**Lectures**:

Topic 1: Regression

Notes: ece830_fall11_lecture24.pdf, ece830_fall11_lecture26.pdf, ece830_fall11_lecture21.pdf

Topic 2: Concentration of Measure and Empirical Risk Minimization

Notes: ece901_concentration.pdf, ece830_fall11_lecture25.pdf

Topic 3: Vapnik-Chervonenkis Theory and Binary Classification

Topic 4: Structural Risk Minimization

Notes: LuZe96-IT-Concept.pdf

Topic 5: Minimax Lower Bounds

Notes: sparse_recovery.pdf

Topic 6: Bernstein’s Inequality and the Johnson-Lindenstrauss Lemma

Sections 2.8 and 2.9 of Concentration Inequalities: A Nonasymptotic Theory of Independence (Boucheron, Lugosi, and Massart)
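As a quick numerical illustration of the Johnson-Lindenstrauss lemma covered in this topic (a sketch, not part of the course materials; the dimensions and distortion parameter below are illustrative choices):

```python
import numpy as np

# Johnson-Lindenstrauss: projecting n points from R^d down to
# k = O(log(n) / eps^2) dimensions with a scaled random Gaussian matrix
# approximately preserves all pairwise distances with high probability.
rng = np.random.default_rng(0)
n, d, eps = 50, 1000, 0.3
k = int(np.ceil(8 * np.log(n) / eps**2))  # embedding dimension

X = rng.standard_normal((n, d))               # n points in R^d
P = rng.standard_normal((k, d)) / np.sqrt(k)  # scaled Gaussian projection
Y = X @ P.T                                   # projected points in R^k

# Ratios of projected to original squared pairwise distances;
# these concentrate around 1 as k grows.
ratios = []
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((X[i] - X[j]) ** 2)
        proj = np.sum((Y[i] - Y[j]) ** 2)
        ratios.append(proj / orig)

print(min(ratios), max(ratios))  # typically within roughly 1 +/- eps
```

The scaling 1/sqrt(k) makes each projected squared distance an unbiased estimate of the original, so the distortion ratios concentrate around 1 by a chi-squared tail bound.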

Topic: Convex Losses and Rademacher Complexity

Notes: BartlettM02.pdf

Topic: Logistic Regression and Lasso

Nikhil Rao, March 6 http://arxiv.org/abs/1202.1212

Nikhil Rao, March 11 http://arxiv.org/pdf/1402.4512.pdf

Ravi Sastry, March 13 TBA

Topic: Compressed Sensing

Jing F and Shike M, March 25 http://www.eecs.berkeley.edu/~brecht/papers/11.candes.recht.pdf

Notes: ece901_compressed_sensing.pdf

Topic: Sparsity and Learning Graphs

Sumeet K and Atul D, March 27 http://stat.ethz.ch/~nicolai/consistent.pdf

Topic: Dictionary Learning

Qinyuan Sun and Shulei Wang, April 1 chapter 5 of http://people.csail.mit.edu/moitra/docs/bookex.pdf

Topic: The SVD and Matrix Completion

Urvashi O and Vamsi K, April 3 chapter 4 of http://www.cs.cornell.edu/jeh/NOSOLUTIONS90413.pdf

Han Li and Karthik A, April 8 http://pages.cs.wisc.edu/~brecht/papers/09.Recht.ImprovedMC.pdf

Topic: Clustering

Xin Zhang and Albert Oh, April 10 http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf

Daniel P and Rao Fu, April 15 http://www.cs.sunysb.edu/~jgao/paper/clustering_lines.pdf

Topic: Stochastic Gradient Methods and Large-scale Optimization

Xiaomao L and Ting-Ting, April 17 http://www.eecs.berkeley.edu/~brecht/cs294docs/week1/09.Nemirovski.pdf

Willem M and Urvashi O, April 22 http://nowak.ece.wisc.edu/ece830/ece830_spring13_adaptive_filtering.pdf

Topic: Sketching

Wentao Wu and Luwan Z, April 24 chapter 7 of http://www.cs.cornell.edu/jeh/NOSOLUTIONS90413.pdf

Song W and Won Kim, April 29 http://www.eecs.berkeley.edu/~brecht/cs294docs/week1/09.Strohmer.pdf

Topic: Multi-armed Bandits and Active Learning

Sumeet K and Atul D, May 1 http://arxiv.org/pdf/1312.7308.pdf

Wentao W and Karthik R, May 6 http://nowak.ece.wisc.edu/IT_minimax.pdf

Shike M and Ting-Ting, May 8 http://www.cc.gatech.edu/%7Eninamf/papers/optimal-lin-sep.pdf

**Lecture Notes from Previous Course Offerings**:

**Lecture 1 A Probabilistic Approach to Pattern Recognition**

**Lecture 2 Introduction to Classification and Regression**

**Lecture 3 Introduction to Complexity Regularization**

**Lecture 4 Denoising in Smooth Function Spaces**

**Lecture 5 Plug-in Rules and Histogram Classifiers**

**Lecture 6 Probably Approximately Correct (PAC) Learning**

**Lecture 7 Chernoff’s Bound and Hoeffding’s Inequality**

**Lecture 8 Classification Error Bounds**

**Lecture 9 Error Bounds in Countably Infinite Model Spaces**

**Lecture 10 Complexity Regularization**

**Lecture 11 Decision Trees**

**Lecture 12 Complexity Regularization for Squared Error Loss**

**Lecture 13 Maximum Likelihood Estimation**

**Lecture 14 Maximum Likelihood and Complexity Regularization**

**Lecture 15 Denoising II: Adapting to Unknown Smoothness**

**Lecture 16 Wavelet Approximation Theory**

**Lecture 17 Denoising III: Spatial Adaptivity**

**Lecture 18 Introduction to VC Theory**

**Lecture 19 The VC Inequality**

**Lecture 20 Applications of VC Theory**

**Lecture 21 Minimax Lower Bounds**

**Textbooks and References**:

No textbook will be followed in this course. A collection of notes, relevant papers, and other materials will be prepared and distributed. Textbooks recommended for further reading are listed below.

A Probabilistic Theory of Pattern Recognition, Devroye, Györfi, and Lugosi, Springer

Nonparametric Estimation Theory, Iain Johnstone, unpublished monograph

The Elements of Statistical Learning, Hastie, Tibshirani, and Friedman, Springer

An Introduction to Support Vector Machines, Cristianini and Shawe-Taylor, Cambridge University Press

Combinatorial Methods in Density Estimation, Devroye and Lugosi, Springer

Statistical Learning Theory, Vapnik, Wiley

An Introduction to Computational Learning Theory, Kearns and Vazirani, MIT Press

Empirical Processes in M-Estimation, van de Geer, Cambridge University Press
