EE 698U: Optimization for Big Data

    1. Instructor: Ketan Rajawat (ACES 305C)
    2. Prerequisites: Linear Algebra, Probability, Convex Optimization Basics
    3. Objective: This course covers complexity results for first order and related optimization algorithms. Such algorithms are widely applied to numerous problems in machine learning, signal processing, communications, etc. and have witnessed a surge in popularity since the advent of Big Data around 2005. Wherever possible, we will attempt to follow a generic template for the proofs, thereby unifying a large number of results in the literature.
    4. References:
      1. Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Cambridge University Press. url:
      2. Amir Beck, First-Order Methods in Optimization, 2017
      3. Yuxin Chen, Notes on Large Scale Optimization for Data Science,  url:
      4. S. Bubeck, S. Bubeck’s blog: I’m a bandit, url:
      5. S. Bubeck, "Convex optimization: Algorithms and complexity." Foundations and TrendsŪ in Machine Learning 8.3-4 (2015): 231-357. url:
    1. TAs: Hrushikesha Pradhan (
    2. Format:
      1. Weekly assignments (approx 8): released with videos weekly, to be submitted within 7 days, 4% per assignment (~30%)
      2. Project (50%): individually or in pairs; details will be announced later; outstanding performance in project is sufficient to earn A grade, regardless of end-sem performance.
      3. No Midsem exam
      4. End-sem exam (20%)
      5. Make your own problems (bonus): max 5% per problem bonus, will start in week 3.
    3. Time and place: WF 10:30am WL226
    4. This course will cover
      1. Introduction: convex functions, black box model
      2. Basics: dual norm, smoothness and strong convexity
      3. Gradient descent for smooth functions
      4. Projected gradient descent
      5. Subgradient descent
      6. Frank-Wolfe or conditional gradient descent
      7. Mirror descent
      8. Proximal gradient descent
      9. Accelerated gradient descent for smooth functions
      10. Dual descent, saddle point method
      11. Augmented Lagrangian method of multipliers
      12. Stochastic gradient descent, mini batch SGD
      13. Variance-reduced SGD
      14. Other topics as time permits
    5. Plagiarism: -20% for each act of plagiarism (student will not be informed)