Last week, I explained why typical distributed gradient descent with a constant step size leads to an inexact solution, even though it converges linearly (a straight line on a log scale). There is a very important paper showing that we can in fact reach the exact solution by also incorporating the gradient from the previous iterate.
This algorithm is called EXTRA, and its update reads
\(x^{k+2} = (I+W) x^{k+1} - \tilde{W} x^k - \alpha (\nabla f(x^{k+1}) - \nabla f(x^k) )\)
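To make the recursion concrete, here is a minimal NumPy sketch of it. The function and variable names (extra, grad), the row-stacking of the local iterates, the choice \(\tilde{W} = (I+W)/2\), and the plain gradient step used for the very first iterate are my own illustrative assumptions, not code from the paper.

```python
import numpy as np

def extra(grad, x0, W, W_tilde, alpha, num_iters=1000):
    """Sketch of the recursion above for decentralized consensus optimization.

    grad     : maps the stacked iterate x (n_agents x d) to the stacked
               local gradients (n_agents x d)
    x0       : initial iterate, one row per agent
    W        : symmetric, doubly stochastic mixing matrix of the network
    W_tilde  : second mixing matrix, e.g. (I + W) / 2 (assumed here)
    alpha    : constant step size
    """
    n = x0.shape[0]
    I = np.eye(n)

    # Assumed initialization: one plain decentralized gradient step.
    x_prev = x0
    x_curr = W @ x0 - alpha * grad(x0)

    for _ in range(num_iters):
        # x^{k+2} = (I + W) x^{k+1} - W_tilde x^k
        #           - alpha * (grad(x^{k+1}) - grad(x^k))
        x_next = (I + W) @ x_curr - W_tilde @ x_prev \
                 - alpha * (grad(x_curr) - grad(x_prev))
        x_prev, x_curr = x_curr, x_next

    return x_curr
```

Note how the update only ever needs the current and previous iterates and gradients, so the per-iteration cost and communication are essentially the same as ordinary distributed gradient descent.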
There is a follow-up paper that deals with the stochastic gradient. At first sight, that paper focuses on uniformly picking a data point rather than on a general characterization of the stochastic gradient, which deserves further investigation.
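For concreteness, here is what "uniformly picking a data point" amounts to as a gradient oracle that could replace the full-gradient call in the sketch above. The names (stochastic_grad, samples) and the per-agent list layout are illustrative assumptions, not the follow-up paper's interface.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_grad(samples, x):
    """Uniform-sampling oracle: each agent i holds m_i data points and, at
    every iteration, returns the gradient of one point drawn uniformly at
    random from its own data.

    samples : list over agents; samples[i] is a list of per-datapoint
              gradient functions grad_ij(x_i)
    x       : stacked local iterates, one row per agent
    """
    g = np.zeros_like(x)
    for i, agent_samples in enumerate(samples):
        j = rng.integers(len(agent_samples))   # uniform index in {0, ..., m_i - 1}
        g[i] = agent_samples[j](x[i])          # unbiased estimate of agent i's gradient
    return g
```

If each local objective is the average of its data-point losses, this oracle is unbiased; a general characterization would instead only assume unbiasedness plus some bound on the variance, which is the broader setting the post suggests investigating.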