Keyon Vafa
October 27, 2016
Two papers this week proved convergence results for optimizing non-convex loss functions using stochastic gradients.
In The Landscape of Empirical Risk for Non-convex Losses by Song Mei, Yu Bai, and Andrea Montanari (2016) [1], the authors
show that although the empirical risk under squared loss is non-convex for linear classifiers, the
landscape becomes well behaved once the sample size is large enough: the risk has a unique local
minimum, which is also the global minimum, and gradient descent converges to it exponentially fast.
In Deep Learning without Poor Local Minima by Kawaguchi (2016) [2], the author shows, by reducing
deep nonlinear networks to deep linear networks, that (among other things) every local minimum is a
global minimum and every critical point that is not a minimum is a saddle point.
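To build intuition for the first result, here is a small self-contained sketch (our own illustration, not the authors' code; the sample size, dimension, noise level, and step size are arbitrary choices). It runs plain stochastic gradient descent on the squared loss of a sigmoid-link linear model, whose empirical risk is non-convex in the weights; for a sample size this large relative to the dimension, the paper's results suggest the landscape has a single benign minimum for SGD to find.

import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5                                       # sample size and dimension (arbitrary choices)
w_true = rng.normal(size=d)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
X = rng.normal(size=(n, d))
y = sigmoid(X @ w_true) + 0.1 * rng.normal(size=n)   # noisy responses from a sigmoid-link model

def empirical_risk(w):
    # mean squared error of the sigmoid-link model; non-convex in w
    return np.mean((y - sigmoid(X @ w)) ** 2)

w = np.zeros(d)
step = 0.5
for t in range(50000):
    i = rng.integers(n)                              # one example chosen at random
    p = sigmoid(X[i] @ w)
    grad = -2.0 * (y[i] - p) * p * (1.0 - p) * X[i]  # stochastic gradient of the squared loss
    w -= step * grad

print("final empirical risk:", empirical_risk(w))
print("distance to true parameter:", np.linalg.norm(w - w_true))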
Ghazal Fazelnia
October 24, 2016
This week we read Scalable Exact Algorithm for Bayesian Inference for Big Data by
Pollock et al. (2016) [1].
They introduce a Monte Carlo algorithm based on a Markov process whose quasi-stationary
distribution coincides with the distribution of interest, and they establish theoretical guarantees
that the algorithm recovers the correct limiting target distribution.
Moreover, they argue that the
methodology is practical for big data because it relies on a subsampling technique whose
iterative cost is sub-linear in the size of the data set.
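The subsampling point is easy to see in isolation. The toy sketch below is our own illustration of the generic idea, not the authors' algorithm: a sum over the full data set (here the score of a Gaussian model, a hypothetical stand-in) is estimated unbiasedly from a small random subset, so each estimate costs O(m) for a subsample of size m rather than O(n) for the full data.

import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
data = rng.normal(loc=2.0, scale=1.0, size=n)    # synthetic data set

def full_score(theta):
    # gradient of the Gaussian log-likelihood in theta; touches every point, O(n)
    return np.sum(data - theta)

def subsampled_score(theta, m=100):
    # unbiased estimate built from m randomly chosen points; O(m) per call
    idx = rng.integers(n, size=m)
    return n * np.mean(data[idx] - theta)

theta = 0.0
print("full-data score:     ", full_score(theta))
print("subsampled estimates:", [round(subsampled_score(theta)) for _ in range(3)])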
We read two papers last Thursday: the “DRAW” paper by Gregor et al. (2015) and the “Show, Attend and Tell” paper by Xu et al. (2015). Both embed an attention model in a deep neural network (DNN). The first paper generates images that attempt to match the distribution of the input images, while the second generates captions for images. With the attention model, both produce their output over multiple time steps, focusing on one region of the image at each step. To implement this sequential generative process, both models use long short-term memory units (LSTMs) as one layer of the network.
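As a rough picture of how such an attention layer fits together, here is a sketch of a single soft-attention step followed by one LSTM update. It is our own simplified illustration with made-up dimensions, not the exact architecture of either paper (for instance, the LSTM input here is only the context vector and the previous hidden state, whereas the papers condition on more).

import numpy as np

rng = np.random.default_rng(2)
L, D, H, K = 49, 512, 256, 64          # regions, feature size, hidden size, attention size (assumed)
features = rng.normal(size=(L, D))     # annotation vectors, e.g. a 7x7 grid of conv features
h_prev = 0.1 * rng.normal(size=H)      # previous LSTM hidden state
c_prev = np.zeros(H)                   # previous LSTM cell state

# parameters (randomly initialized here; learned jointly in a real model)
W_f = rng.normal(scale=0.01, size=(K, D))        # projects a region feature
W_h = rng.normal(scale=0.01, size=(K, H))        # projects the hidden state
v = rng.normal(scale=0.01, size=K)               # turns the projection into a scalar score
W_lstm = rng.normal(scale=0.01, size=(4 * H, D + H))
b_lstm = np.zeros(4 * H)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# 1. attend: score each region given the previous hidden state, then normalize
scores = np.tanh(features @ W_f.T + W_h @ h_prev) @ v    # one score per region
alpha = softmax(scores)                                  # attention weights over the L regions
context = alpha @ features                               # attention-weighted region feature, shape (D,)

# 2. one LSTM step driven by the attended context
gates = W_lstm @ np.concatenate([context, h_prev]) + b_lstm
i_g, f_g, o_g, g_g = np.split(gates, 4)
c = sigmoid(f_g) * c_prev + sigmoid(i_g) * np.tanh(g_g)
h = sigmoid(o_g) * np.tanh(c)

print("attention weights sum to", alpha.sum())
print("new hidden state shape:", h.shape)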