EE 798C: Machine Learning Theory
- Instructor: Ketan Rajawat (ACES 307)
- Prerequisites: Probability, Introduction to Machine Learning, Introduction to Optimization.
- Objective: This course provides the theoretical basis for many modern machine learning algorithms and attempts to answer the question of what allows us to draw valid conclusions from empirical data. Mathematically, we will often be concerned with finite-sample guarantees on learning or, more precisely, on the generalization error of various algorithms. During the course, we will encounter mathematical versions of the following statements:
- Random variables with bounded variance do not deviate much from their mean, with high probability (see the example following this list)
- The no-free-lunch theorem says that learning is impossible unless we make assumptions on the underlying distribution
- Prior knowledge about a task can be incorporated by restricting our search to a suitable hypothesis class of predictor functions
- Some hypothesis classes are PAC learnable, i.e., it is possible to learn a predictor function from these classes that is probably approximately correct
- The Bayes predictor is optimal
- The complexity of a hypothesis class can be traded off against the bias it introduces
- Regularization plays an important role in avoiding overfitting in the empirical risk minimization (ERM) framework
- Local averaging methods such as k-nearest neighbors and kernel methods can learn all continuous non-linear functions efficiently
- The curse of dimensionality can make overfitting unavoidable when the goal is to learn arbitrarily complicated non-linear functions
- Kernel methods and neural networks can learn smooth functions much more efficiently
- Neural networks can be even more efficient when the target function depends on a low-dimensional linear projection
- If the optimal predictor is known to be sparse, learning can be much faster
- Gradient descent methods exhibit an implicit bias, resulting in the double descent curve
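- As an illustration of the kind of mathematical statement meant above, the first item can be made precise via Chebyshev's inequality (a minimal sketch; the course may rely on sharper concentration inequalities such as Hoeffding's): for a random variable $X$ with mean $\mu$ and finite variance $\sigma^2$,
  $$\Pr\left(|X - \mu| \ge t\right) \le \frac{\sigma^2}{t^2} \quad \text{for all } t > 0.$$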
- References:
- (Light Reading) Ulrike von Luxburg and Bernhard Schölkopf, "Statistical Learning Theory: Models, Concepts, and Results," Handbook of the History of Logic, Vol. 10, North-Holland, 2011, pp. 651-706.
- (Key Reference) Francis Bach, Learning Theory from First Principles, 2021.
- (Key Reference) Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, 2014.
- (Key Reference) Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar, Foundations of Machine Learning, MIT Press, 2018.
- (Heavy Reading) Bruce Hajek and Maxim Raginsky, Statistical Learning Theory, 2021.
- TAs: TBD.
- Format: the usual, including quizzes, midsem, endsem, assignments. Weightages TBD.
- Time and place: MTh 1400-1515, venue TBD
- Tentative Topics
- Formal definition of learning, Bayesian framework, no-free-lunch theorem
- Concentration inequalities
- PAC learning, sample complexity, hypothesis classes
- Rademacher complexities, ERM
- Local averaging (k-nearest neighbors, partitioning methods, kernel methods)
- Sample complexity of kernel methods
- Sample complexity of neural networks
- Learning sparse predictors
- Optimization in machine learning
- Others, if time permits
- What this course will NOT cover: (a) structured prediction problems, ensemble learning, online learning, probabilistic methods; (b) implementation aspects of various algorithms; (c) design of neural networks for computer vision; (d) practical use cases and data sets; (e) details of optimization algorithms; (f) fundamentals of probability theory, conditional expectation, etc. This course will not have any programming assignments.
- Attendance: 100% attendance is mandatory. If you must miss a class for any reason, apply for leave via Pingala or inform the instructor by email. In case of medical emergencies, submit supporting proof. Missing classes without a justifiable reason will result in an F grade.
- Plagiarism: -20% for each act of plagiarism (the student will not be informed until after grading is over).