Gradient-descent method is perhaps the oldest generic iterative method for optimization. It finds application is both unconstratined and constrained optimization problems. However, the speed of convergence of gradient-descent is largely limited by the condition number of the optimization problem being solved. Specifically, the method requires a large number of iterations to converge to a solution if the problem is ill-conditioned. Moreover, implementation of the gradient-descent method on digital machines that process or register values with finite precision is unstable when the optimization problem is ill-conditioned.
We propose and study a novel pre-conditioning approach that provably attenuates the detrimental impact of the condition number of the optimziation problem on the convergence speed of the gradient-descent method. The proposed pre-conditioning approach is iterative, and can be easily implemented on distributed system architectures. Moreover, in certain cases the resultant iterative method (i.e. the gradient-descent method combined with our iterative pre-conditioning approach) attains superlinear rate of convergence, and therefore, requires exponentially less number of iterations for guaranteeing comparable proximity to the desired optimal solution.
Besides the improved convergence speed, our proposed pre-conditioning approach is also provably more robust to system noise. Detailed comparison of our method with other momentum-based accelerated variants of the gradient-descent method can be found in the publications, and technical reports listed below.