EE 698U: Optimization for Big Data

    1. Instructor: Ketan Rajawat (ACES 305C)
    2. Prerequisites: Linear Algebra, Probability, Convex Optimization Basics
    3. Objective: This course covers complexity results for first-order and related optimization algorithms. Such algorithms are widely applied to problems in machine learning, signal processing, communications, and related areas, and have witnessed a surge in popularity since the advent of Big Data around 2005. Wherever possible, we will attempt to follow a generic template for the proofs, thereby unifying a large number of results from the literature.
    4. References:
      1. Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Cambridge University Press, 2004. url: http://www.stanford.edu/~boyd/cvxbook/
      2. Amir Beck, First-Order Methods in Optimization, SIAM, 2017.
      3. Yuxin Chen, Notes on Large Scale Optimization for Data Science. url: http://www.princeton.edu/~yc5/ele522_optimization/lectures.html
      4. S. Bubeck, I'm a Bandit (blog). url: https://blogs.princeton.edu/imabandit/
      5. S. Bubeck, "Convex Optimization: Algorithms and Complexity," Foundations and Trends® in Machine Learning, vol. 8, no. 3-4, pp. 231-357, 2015. url: http://sbubeck.com/Bubeck15.pdf
    5. TAs: N Mohan Krishna (nmohank@iitk.ac.in) and Srujan Teja T. (srujant@iitk.ac.in)
    6. Format:
      1. Project (60%): groups of 2 only; details will be announced later; outstanding performance in the project is sufficient to earn an A grade, regardless of end-sem performance.
      2. End-sem exam (20%)
      3. Assignments (20%): valid attempts (correct or incorrect) will receive full credit!
    7. Time and place: WF 10:30 am, WL226
    8. This course will cover:
      1. Introduction: convex functions, black box model
      2. Basics: dual norm, smoothness and strong convexity
      3. Gradient descent for smooth functions (see the short sketch after this list)
      4. Projected gradient descent
      5. Subgradient descent
      6. Frank-Wolfe or conditional gradient descent
      7. Mirror descent
      8. Proximal gradient descent
      9. Accelerated gradient descent for smooth functions
      10. Dual descent, saddle point method
      11. Augmented Lagrangian method of multipliers
      12. Stochastic gradient descent, mini-batch SGD
      13. Variance-reduced SGD
      14. Other topics as time permits
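      As a taste of the algorithms listed above, here is a minimal sketch (in Python) of gradient descent with a fixed step size on a smooth convex quadratic, corresponding to the "Gradient descent for smooth functions" topic. The quadratic objective, the step size 1/L with L taken as the largest eigenvalue of A, and the iteration count are illustrative choices for this sketch, not course material.

        import numpy as np

        def gradient_descent(grad, x0, step_size, num_iters):
            # Fixed-step gradient descent: x_{k+1} = x_k - step_size * grad(x_k)
            x = np.array(x0, dtype=float)
            for _ in range(num_iters):
                x = x - step_size * grad(x)
            return x

        # Illustrative problem: minimize f(x) = 0.5 x'Ax - b'x, whose gradient is Ax - b.
        A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
        b = np.array([1.0, 1.0])
        grad = lambda x: A @ x - b
        L = np.linalg.eigvalsh(A).max()          # f is L-smooth with this L
        x_hat = gradient_descent(grad, x0=[0.0, 0.0], step_size=1.0 / L, num_iters=200)
        print(x_hat, np.linalg.solve(A, b))      # iterate vs. exact minimizer

      The fixed step size 1/L is the standard choice for L-smooth objectives; the course will make such choices precise through the corresponding convergence and complexity results.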
    9. Attendance: 100% attendance is mandatory. Apply for leave via Pingala or inform the instructor by email if you must miss a class for any reason. In case of a medical emergency, submit supporting documentation. Missing classes without a justifiable reason will result in an F grade.
    10. Plagiarism: -20% for each act of plagiarism (the student will not be informed).