Exploring an interesting optimization technique that skips projection steps to potentially improve convergence in constrained convex optimization.
During my time at IIT Madras, I had the incredible opportunity to attend a guest lecture by Prof. Sridhar Mahadevan, a distinguished researcher from the University of Massachusetts Amherst. The lecture focused on acceleration in optimization methods, and one particular observation caught my attention and sparked this exploration.
Professor Mahadevan mentioned something curious about Runge-Kutta methods: sometimes, when you look further ahead (higher-order methods), you can actually get better convergence in fewer steps. This seemed almost counterintuitive at first glance, and I couldn't help but think there must be some trade-off involved.
This observation inspired me to investigate whether a similar phenomenon might exist in constrained convex optimization, specifically with projected gradient descent methods.
Consider the standard constrained optimization problem:

$$\min_{x \in C} f(x),$$

where $f$ is convex and $C$ is a convex constraint set.
The traditional projected gradient descent algorithm proceeds as:

$$x_{t+1} = P_C\left(x_t - \eta \nabla f(x_t)\right),$$

where $P_C$ is the projection operator onto the constraint set $C$ and $\eta$ is the step size.
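To make this concrete, here is a minimal sketch of standard projected gradient descent on a small convex quadratic with a box constraint. The objective, the constraint set, the step size, and names such as `grad_f` and `project_box` are illustrative choices, not the exact setup from my experiments.

```python
import numpy as np

# Illustrative convex quadratic f(x) = 0.5 * x^T A x - b^T x with a box constraint.
# A, b, the box bounds, and the step size are all illustrative choices.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, 2.0])

def grad_f(x):
    # Gradient of the quadratic: A x - b.
    return A @ x - b

def project_box(x, lo=-1.0, hi=1.0):
    # Euclidean projection onto the box [lo, hi]^n.
    return np.clip(x, lo, hi)

def projected_gradient_descent(x0, eta=0.1, n_iters=100):
    # Standard PGD: a gradient step followed by a projection at every iteration.
    x = x0.copy()
    for _ in range(n_iters):
        x = project_box(x - eta * grad_f(x))
    return x

x_pgd = projected_gradient_descent(np.zeros(2))
```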
What if we could skip some projection steps and still maintain convergence? I explored this idea by developing k-skip projected gradient descent variants.
Instead of projecting at every step, we take two gradient steps and then project:

$$y_t = x_t - \eta \nabla f(x_t), \qquad x_{t+1} = P_C\left(y_t - \eta \nabla f(y_t)\right).$$
Extending this further, we can skip even more projection steps: for a general skip length $k$, we take $k$ unconstrained gradient steps and then project once,

$$y^{(0)} = x_t, \qquad y^{(i+1)} = y^{(i)} - \eta \nabla f\left(y^{(i)}\right) \;\; (i = 0, \dots, k-1), \qquad x_{t+1} = P_C\left(y^{(k)}\right).$$
The key insight is that we're taking multiple gradient steps in the unconstrained space before projecting back to the feasible region.
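Here is a minimal sketch of the k-skip update, written generically so that any gradient and projection can be plugged in; the function name `k_skip_pgd` and its defaults are my own illustrative choices.

```python
def k_skip_pgd(x0, grad, proj, k=2, eta=0.1, n_outer=50):
    # k-skip PGD: take k unconstrained gradient steps, then project once.
    x = x0.copy()
    for _ in range(n_outer):
        y = x.copy()
        for _ in range(k):
            y = y - eta * grad(y)   # gradient steps in the unconstrained space
        x = proj(y)                 # single projection after k steps
    return x
```

With $k = 1$ this reduces to ordinary projected gradient descent; larger $k$ trades projections for additional unconstrained gradient steps.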
I implemented these methods and tested them on constrained quadratic optimization problems, and the results were quite interesting. Zooming in on the final iterates reveals subtle but important differences in where each method ultimately converges.
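For illustration, a comparison along these lines could reuse the sketches above; the starting point and skip lengths below are arbitrary choices, not the original experimental settings.

```python
# Compare standard PGD (k = 1) with a few k-skip variants on the quadratic above.
x0 = np.array([2.0, -2.0])
for k in (1, 2, 5):
    x_final = k_skip_pgd(x0, grad_f, project_box, k=k, eta=0.1, n_outer=50)
    print(f"k = {k}: final iterate = {x_final}")
```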
The analysis revealed interesting behavior at the two extremes of this family: projecting at every step and never projecting at all.
The trade-off becomes clear: while skipping projections can improve initial convergence speed, it may lead to convergence to suboptimal points.
To address the convergence point issue, I experimented with decaying step sizes. The idea is to use larger step sizes initially (to benefit from the acceleration) and then decay them to ensure convergence to the correct optimal point.
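Here is one way this can be sketched, using a simple $\eta_t = \eta_0 / (1 + \gamma t)$ decay; the particular schedule is just an illustrative choice, and any diminishing step size plays the same role.

```python
def k_skip_pgd_decaying(x0, grad, proj, k=2, eta0=0.1, gamma=0.05, n_outer=200):
    # k-skip PGD with a decaying step size eta_t = eta0 / (1 + gamma * t).
    # The schedule is an illustrative choice; larger steps early, smaller steps later.
    x = x0.copy()
    for t in range(n_outer):
        eta = eta0 / (1.0 + gamma * t)
        y = x.copy()
        for _ in range(k):
            y = y - eta * grad(y)   # gradient steps with the current step size
        x = proj(y)                 # project once per outer iteration
    return x
```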
This approach showed promise in aligning the convergence points while maintaining some of the acceleration benefits.
The main practical benefit of k-skip projected gradient descent could be in scenarios where the projection onto the constraint set is expensive relative to a gradient step, so that reducing the number of projections yields a real computational saving.
This preliminary exploration raises several interesting questions, in particular about formal convergence guarantees for the k-skip updates and about how to choose the skip length and step-size schedule.
The investigation into k-skip projected gradient descent revealed an interesting trade-off between convergence speed and solution accuracy. While the initial results showed promise for faster convergence, the challenge of ensuring convergence to the correct optimal point remains.
This work demonstrates how ideas from one area of optimization (Runge-Kutta methods) can inspire novel approaches in another (constrained optimization). Sometimes the most interesting research comes from asking "what if we tried something slightly different?"
The complete analysis and code for these experiments can be found in my research notes, and I encourage others to explore this direction further.
This post is based on explorations conducted during my time at IIT Madras, inspired by Prof. Sridhar Mahadevan's insights on acceleration in optimization methods.