EE 798C: Machine Learning Theory
- Instructor: Ketan Rajawat (ACES 307)
- Prerequisites: Probability, Introduction to Machine Learning, Introduction to Optimization.
- Objective: This course provides the theoretical basis for many modern machine learning algorithms and attempts to answer the question of what allows us to draw valid conclusions from empirical data. Mathematically, we will often be concerned with finite-sample guarantees on learning or, more precisely, on the generalization error of various algorithms. During the course, we will encounter mathematical versions of the following statements:
- Random variables with bounded variance do not deviate much from their mean, with high probability (see the example following this list)
- The no-free-lunch theorem says that learning is impossible unless we make assumptions on the underlying distribution
- Prior knowledge about a task can be incorporated by restricting our search to a suitable hypothesis class of predictor functions
- Some hypothesis classes are PAC learnable, i.e., it is possible to learn a predictor function from these classes that is probably approximately correct
- The Bayes predictor is optimal
- The complexity of a hypothesis class can be traded off against the bias it introduces
- Regularization plays an important role in avoiding overfitting in the empirical risk minimization (ERM) framework
- Local averaging methods such as k-nearest neighbors and kernel methods can learn all continuous non-linear functions efficiently
- The curse of dimensionality can make overfitting unavoidable when the goal is to learn arbitrarily complicated non-linear functions
- Kernel methods and neural networks can learn smooth functions much more efficiently
- Neural networks can be even more efficient when the target function depends on a low-dimensional linear projection
- If the optimal predictor is known to be sparse, learning can be much faster
- Gradient descent methods exhibit an implicit bias, resulting in the double descent curve
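- As an illustration of the kind of mathematical statement meant above, the first item can be made precise via Chebyshev's inequality (a minimal sketch; the course may rely on sharper concentration inequalities such as Hoeffding's): for a random variable $X$ with mean $\mu$ and finite variance $\sigma^2$,
  $$\Pr\left(|X - \mu| \ge t\right) \le \frac{\sigma^2}{t^2} \quad \text{for all } t > 0.$$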
- References:
- (Light Reading) Ulrike von Luxburg and Bernhard Schölkopf, "Statistical Learning Theory: Models, Concepts, and Results," Handbook of the History of Logic, Vol. 10, North-Holland, 2011, pp. 651-706.
- (Key Reference) Francis Bach, Learning Theory from First Principles, 2021.
- (Key Reference) Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, 2014.
- (Key Reference) Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar, Foundations of Machine Learning, MIT Press, 2018.
- (Heavy Reading) Bruce Hajek and Maxim Raginsky, Statistical Learning Theory, 2021.
- TAs: TBD.
- Format: the usual, including quizzes, midsem, endsem, assignments. Weightages TBD.
- Time and place: MTh 1400-1515, venue TBD
- Tentative Topics
- Formal definition of learning, Bayesian framework, no-free-lunch theorem
- Concentration inequalities
- PAC learning, sample complexity, hypothesis classes
- Rademacher complexities, ERM
- Local averaging (k-nearest neighbors, partitioning methods, kernel methods)
- Sample complexity of kernel methods
- Sample complexity of neural networks
- Learning sparse predictors
- Optimization in machine learning
- Others, if time permits
- What this course will NOT cover: (a) structured prediction problems, ensemble learning, online learning, probabilistic methods; (b) implementation aspects of various algorithms; (c) design of neural networks for computer vision; (d) practical use cases and data sets; (e) details of optimization algorithms; (f) fundamentals of probability theory, conditional expectation, etc. This course will not have any programming assignments.
- Attendance: 100% attendance is mandatory. If you must miss a class for any reason, apply for leave via Pingala or inform the instructor by email. In case of medical emergencies, submit supporting proof. Missing classes without a justifiable reason will result in an F grade.
- Plagiarism: -20% for each act of plagiarism (the student will not be informed until after grading is over).