EE 798C: Machine Learning Theory

  1. Instructor: Ketan Rajawat (ACES 307)
  2. Prerequisites: Probability, Introduction to Machine Learning, Introduction to Optimization. 
  3. Objective: This course provides the theoretical basis for many modern machine learning algorithms and attempts to answer the question of what allows us to draw valid conclusions from empirical data. Mathematically, we will often be concerned with finite-sample guarantees on learning or, more precisely, on the generalization error of various algorithms. During the course, we will encounter rigorous mathematical versions of such informal statements.
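For concreteness, a typical finite-sample guarantee of the kind studied in this course (a standard Hoeffding-plus-union-bound result for a finite hypothesis class, stated here only as an illustration and not as part of the official syllabus) reads:

```latex
% With probability at least 1 - \delta over an i.i.d. sample of size n,
% simultaneously for every hypothesis h in a finite class \mathcal{H}:
R(h) \;\le\; \widehat{R}_n(h)
  + \sqrt{\frac{\log|\mathcal{H}| + \log(1/\delta)}{2n}},
% where R(h) is the true risk and \widehat{R}_n(h) is the empirical risk
% of h, for a loss function bounded in [0, 1].
```

Bounds of this flavor make precise how the gap between training performance and test performance shrinks as the sample size n grows.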
  4. References:
    1. (Light Reading) Von Luxburg, Ulrike, and Bernhard Schölkopf, Statistical Learning Theory: Models, Concepts, and Results, Handbook of the History of Logic, Vol. 10, North-Holland, 2011, pp. 651-706.
    2. (Key Reference) Francis Bach, Learning Theory from First Principles, 2021
    3. (Key Reference) Shalev-Shwartz, Shai, and Shai Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge university press, 2014.
    4. (Key Reference) Mohri, Mehryar, Afshin Rostamizadeh, and Ameet Talwalkar, Foundations of Machine Learning, MIT press, 2018.
    5. (Heavy Reading) Bruce Hajek and Maxim Raginsky, Statistical Learning Theory, 2021
  5. TAs: TBD.
  6. Format: quizzes, midsem, endsem, and assignments; weightages TBD.
  7. Time and place: MTh 1400-1515, venue TBD
  8. Tentative Topics
    1. Formal definition of learning, Bayesian framework, no-free-lunch theorem
    2. Concentration inequalities
    3. PAC learning, sample complexity, hypothesis classes
    4. Rademacher complexities, ERM
    5. Local averaging (k-nearest neighbors, partitioning methods, kernel methods)
    6. Sample complexity of kernel methods
    7. Sample complexities of neural networks
    8. Learning sparse predictors
    9. Optimization in machine learning
    10. Others, if time permits
  9. What this course will NOT cover: (a) structured prediction problems, ensemble learning, online learning, probabilistic methods; (b) implementation aspects of various algorithms; (c) design of neural networks for computer vision; (d) practical use cases and data sets; (e) details of optimization algorithms; (f) fundamentals of probability theory, conditional expectation, etc. This course will not have any programming assignments.
  10. Attendance: 100% attendance is mandatory. If you must miss a class for any reason, apply for leave via Pingala or inform the instructor by email. In case of a medical emergency, submit supporting documentation. Missing classes without a justifiable reason will result in an F grade.
  11. Plagiarism: a 20% penalty for each act of plagiarism (the student will not be informed until after grading is over)