EE 698U: Optimization for Big Data
- Instructor: Ketan Rajawat (ACES 305C)
- Prerequisites: Linear Algebra, Probability, Convex Optimization Basics
- Objective: This course covers complexity results for first-order and related optimization algorithms. Such algorithms are widely applied to problems in machine learning, signal processing, communications, and related areas, and have witnessed a surge in popularity since the advent of Big Data around 2005. Wherever possible, we will attempt to follow a generic template for the proofs, thereby unifying a large number of results in the literature.
- References:
- Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Cambridge University Press. url: http://www.stanford.edu/~boyd/cvxbook/
- Amir Beck, First-Order Methods in Optimization, 2017
- Yuxin Chen, Notes on Large Scale Optimization for Data Science. url: http://www.princeton.edu/~yc5/ele522_optimization/lectures.html
- S. Bubeck, I’m a bandit (blog). url: https://blogs.princeton.edu/imabandit/
- S. Bubeck, "Convex optimization: Algorithms and complexity." Foundations and Trends® in Machine Learning 8.3-4 (2015): 231-357. url: http://sbubeck.com/Bubeck15.pdf
- TAs: Hrushikesha Pradhan (hpradhan@iitk.ac.in)
- Format:
- Weekly assignments (approx. 8): released weekly along with the videos, due within 7 days; 4% per assignment (~30% total)
- Project (50%): individually or in pairs; details will be announced later. Outstanding performance on the project is sufficient to earn an A grade, regardless of end-sem performance.
- No Midsem exam
- End-sem exam (20%)
- Make your own problems (bonus): up to 5% bonus per problem; starts in week 3.
- Time and place: WF 10:30am WL226
- This course will cover
- Introduction: convex functions, black box model
- Basics: dual norm, smoothness and strong convexity
- Gradient descent for smooth functions (a minimal sketch follows this list)
- Projected gradient descent
- Subgradient descent
- Frank-Wolfe or conditional gradient descent
- Mirror descent
- Proximal gradient descent
- Accelerated gradient descent for smooth functions
- Dual descent, saddle point method
- Augmented Lagrangian method of multipliers
- Stochastic gradient descent, mini-batch SGD
- Variance-reduced SGD
- Other topics as time permits
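For orientation, here is a minimal sketch of plain gradient descent on a smooth function, the simplest of the algorithms listed above. The quadratic objective, the NumPy setup, and the 1/L step size are illustrative assumptions, not material drawn from the course notes.

```python
# Minimal sketch of fixed-step gradient descent on a smooth, strongly convex
# quadratic (illustrative assumptions: the objective, step size, and iteration
# count below are not taken from the course material).
import numpy as np

def gradient_descent(grad, x0, step, num_iters):
    """Iterate x_{k+1} = x_k - step * grad(x_k) and return the final point."""
    x = x0.copy()
    for _ in range(num_iters):
        x = x - step * grad(x)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((20, 5))
    A = M.T @ M + np.eye(5)           # positive definite => f is smooth and strongly convex
    b = rng.standard_normal(5)
    # f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b and the minimizer solves A x = b.
    L = np.linalg.eigvalsh(A).max()   # smoothness constant (largest eigenvalue of A)
    x_hat = gradient_descent(lambda x: A @ x - b, np.zeros(5), step=1.0 / L, num_iters=500)
    x_star = np.linalg.solve(A, b)
    print("distance to minimizer:", np.linalg.norm(x_hat - x_star))
```

The 1/L step size is the standard safe choice for L-smooth objectives; adding a projection onto a constraint set after each update would turn this sketch into projected gradient descent.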
- Plagiarism: -20% for each act of plagiarism (the student will not be informed)