Colloquium at the School of Computer Science - Exponentially vanishing sub-optimal local minima in multilayer neural networks

Daniel Soudry

March 26, 2017, 11:00

Schreiber Building, Room 006

Background:

Multilayer neural networks, trained with simple variants of stochastic gradient descent (SGD), have achieved state-of-the-art performance in many areas of machine learning. It has long been a mystery why SGD works so well, rather than converging to sub-optimal local minima with high training error (and therefore high test error).

Results:

We examine a neural network with a single hidden layer, quadratic loss, and piecewise linear units, trained on a binary classification task with standard normal input. We prove that the volume of the differentiable regions of the empirical loss that contain sub-optimal differentiable local minima is exponentially vanishing in comparison with the corresponding volume for global minima, given "mild" (polylogarithmic) over-parameterization. This suggests why SGD tends to converge to global minima in such networks.
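
For readers who want a concrete picture of the setting, the following is a minimal sketch, not the speaker's code. It assumes a leaky ReLU as the piecewise linear unit, labels in {0, 1} drawn at random as placeholders, and hypothetical sizes, step count, and learning rate. All names (`leaky_relu`, `quadratic_loss`, `train_sketch`) are illustrative.

```python
import numpy as np

def leaky_relu(z, slope=0.1):
    # Piecewise linear unit; the slope value is an assumption.
    return np.where(z > 0, z, slope * z)

def quadratic_loss(W, v, X, y, slope=0.1):
    # Mean squared error between the network output and the labels.
    H = leaky_relu(X @ W.T, slope)
    return float(np.mean((H @ v - y) ** 2))

def train_sketch(n=50, d0=20, d1=10, steps=200, lr=0.01, batch=10,
                 slope=0.1, seed=0):
    # All sizes and hyperparameters here are hypothetical.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d0))              # standard normal input
    y = rng.integers(0, 2, size=n).astype(float)  # placeholder binary labels
    W = rng.standard_normal((d1, d0)) / np.sqrt(d0)  # hidden-layer weights
    v = rng.standard_normal(d1) / np.sqrt(d1)        # output weights
    initial = quadratic_loss(W, v, X, y, slope)
    for _ in range(steps):
        # One minibatch SGD step on the quadratic loss.
        idx = rng.choice(n, size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        Z = Xb @ W.T                       # pre-activations, (batch, d1)
        H = leaky_relu(Z, slope)
        err = H @ v - yb                   # residuals, (batch,)
        dH = np.where(Z > 0, 1.0, slope)   # derivative of the unit
        grad_v = 2.0 * H.T @ err / batch
        grad_W = 2.0 * ((err[:, None] * dH) * v).T @ Xb / batch
        W -= lr * grad_W
        v -= lr * grad_v
    final = quadratic_loss(W, v, X, y, slope)
    return initial, final
```

Calling `train_sketch()` returns the training loss before and after the SGD steps, which can be compared across hidden-layer widths `d1` to explore the effect of over-parameterization informally.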