This is a list of suggested papers to choose from, loosely organized by topic.

Inference

  • Mnih, Andriy, and Danilo J. Rezende. “Variational inference for Monte Carlo objectives.” arXiv preprint arXiv:1602.06725 (2016).
  • Kingma, Diederik P., and Max Welling. “Auto-encoding variational Bayes.” arXiv preprint arXiv:1312.6114 (2013).
  • Rhee, Chang-han, and Peter W. Glynn. “Unbiased estimation with square root convergence for SDE models.” Operations Research 63.5 (2015): 1026-1043.
  • Tripuraneni, Nilesh, et al. “Magnetic Hamiltonian Monte Carlo.” arXiv preprint arXiv:1607.02738 (2016).
  • Grosse, Roger B., Siddharth Ancha, and Daniel M. Roy. “Measuring the reliability of MCMC inference with bidirectional Monte Carlo.” arXiv preprint arXiv:1606.02275 (2016).
  • Kucukelbir, Alp, et al. “Automatic Differentiation Variational Inference.” arXiv preprint arXiv:1603.00788 (2016).
  • Duvenaud, David, Dougal Maclaurin, and Ryan P. Adams. “Early Stopping as Nonparametric Variational Inference.” AISTATS (2016).
  • Rudolph, Maja R., et al. “Exponential Family Embeddings.” arXiv preprint arXiv:1608.00778 (2016).
  • Bouchard-Côté, Alexandre, Sebastian J. Vollmer, and Arnaud Doucet. “The Bouncy Particle Sampler: A Non-Reversible Rejection-Free Markov Chain Monte Carlo Method.” arXiv preprint arXiv:1510.02451 (2015).
  • Pakman, Ari, et al. “Stochastic Bouncy Particle Sampler.” arXiv preprint arXiv:1609.00770 (2016).
  • Giles, Mike, et al. “Multilevel Monte Carlo for Scalable Bayesian Computations.” arXiv preprint arXiv:1609.06144 (2016).
  • Lopez-Paz, David, and Maxime Oquab. “Revisiting Classifier Two-Sample Tests.” arXiv preprint arXiv:1610.06545 (2016).
  • He, Niao, et al. “Fast and Simple Optimization for Poisson Likelihood Models.” arXiv preprint arXiv:1608.01264 (2016).

Theory

  • Arora, Sanjeev, et al. “Generalization and Equilibrium in Generative Adversarial Nets (GANs).” arXiv preprint arXiv:1703.00573 (2017).
  • Hardt, Moritz, Tengyu Ma, and Benjamin Recht. “Gradient Descent Learns Linear Dynamical Systems.” arXiv preprint arXiv:1609.05191 (2016).
  • Yang, Fanny, Sivaraman Balakrishnan, and Martin J. Wainwright. “Statistical and computational guarantees for the Baum-Welch algorithm.” 53rd Annual Allerton Conference on Communication, Control, and Computing. IEEE (2015).
  • Chen, Yudong, and Martin J. Wainwright. “Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees.” arXiv preprint arXiv:1509.03025 (2015).
  • Wibisono, Andre, Ashia C. Wilson, and Michael I. Jordan. “A Variational Perspective on Accelerated Methods in Optimization.” arXiv preprint arXiv:1603.04245 (2016).
  • Jin, Chi, et al. “Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences.” arXiv preprint arXiv:1609.00978 (2016).
  • Bloem-Reddy, Benjamin, and Peter Orbanz. “Random Walk Models of Network Formation and Sequential Monte Carlo Methods for Graphs.” arXiv preprint arXiv:1612.06404 (2016).
  • Huszár, Ferenc. “How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?” arXiv preprint arXiv:1511.05101 (2015).
  • Mohamed, Shakir, and Balaji Lakshminarayanan. “Learning in Implicit Generative Models.” arXiv preprint arXiv:1610.03483 (2016).

Deep Learning

  • Gulrajani, Ishaan, et al. “Improved Training of Wasserstein GANs.” arXiv preprint arXiv:1704.00028 (2017).
  • Metz, Luke, et al. “Unrolled Generative Adversarial Networks.” arXiv preprint arXiv:1611.02163 (2016).
  • Radford, Alec, Luke Metz, and Soumith Chintala. “Unsupervised representation learning with deep convolutional generative adversarial networks.” arXiv preprint arXiv:1511.06434 (2015).
  • Graves, Alex, et al. “Hybrid computing using a neural network with dynamic external memory.” Nature 538.7626 (2016): 471-476.
  • Chung, Junyoung, et al. “A recurrent latent variable model for sequential data.” Advances in Neural Information Processing Systems (2015).
  • Burda, Yuri, Roger Grosse, and Ruslan Salakhutdinov. “Importance weighted autoencoders.” arXiv preprint arXiv:1509.00519 (2015).
  • He, Kaiming, et al. “Deep residual learning for image recognition.” arXiv preprint arXiv:1512.03385 (2015).
  • Rezende, Danilo Jimenez, et al. “Unsupervised Learning of 3D Structure from Images.” arXiv preprint arXiv:1607.00662 (2016).
  • Bowman, Samuel R., et al. “Generating sentences from a continuous space.” arXiv preprint arXiv:1511.06349 (2015).
  • Dosovitskiy, Alexey, et al. “Learning to Generate Chairs, Tables and Cars with Convolutional Networks.” (2016).
  • Maaløe, Lars, et al. “Auxiliary Deep Generative Models.” arXiv preprint arXiv:1602.05473 (2016).
  • Mnih, Volodymyr, Nicolas Heess, and Alex Graves. “Recurrent models of visual attention.” Advances in Neural Information Processing Systems (2014).
  • Makhzani, Alireza, et al. “Adversarial autoencoders.” arXiv preprint arXiv:1511.05644 (2015).
  • Gregor, Karol, et al. “Deep autoregressive networks.” arXiv preprint arXiv:1310.8499 (2013).
  • Nguyen, Anh, et al. “Synthesizing the preferred inputs for neurons in neural networks via deep generator networks.” arXiv preprint arXiv:1605.09304 (2016).
  • Kiros, Ryan, et al. “Skip-thought vectors.” Neural Information Processing Systems (NIPS) (2015).
  • Mansimov, Elman, et al. “Generating images from captions with attention.” arXiv preprint arXiv:1511.02793 (2015).
  • Salimans, Tim, et al. “Improved Techniques for Training GANs.” arXiv preprint arXiv:1606.03498 (2016).
  • Nalisnick, Eric, and Padhraic Smyth. “Deep Generative Models with Stick-Breaking Priors.” arXiv preprint arXiv:1605.06197 (2016).
  • Kulkarni, Tejas D., et al. “Deep convolutional inverse graphics network.” Neural Information Processing Systems (NIPS) (2015).
  • Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. “A neural algorithm of artistic style.” arXiv preprint arXiv:1508.06576 (2015).
  • Alain, Guillaume, and Yoshua Bengio. “Understanding intermediate layers using linear classifier probes.” arXiv preprint arXiv:1610.01644 (2016).
  • Chen, Yutian, et al. “Learning to Learn for Global Optimization of Black Box Functions.” arXiv preprint arXiv:1611.03824 (2016).
  • Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra. “Stochastic Backpropagation and Approximate Inference in Deep Generative Models.” Proceedings of The 31st International Conference on Machine Learning (2014).

State Space Models

  • Foerster, Jakob N., et al. “Intelligible language modeling with input switched affine networks.” arXiv preprint arXiv:1611.09434 (2016).
  • Fraccaro, Marco, et al. “Sequential Neural Models with Stochastic Layers.” Advances in Neural Information Processing Systems (2016).
  • Chung, Junyoung, et al. “A recurrent latent variable model for sequential data.” Advances in Neural Information Processing Systems (2015).
  • Schein, Aaron, Hanna Wallach, and Mingyuan Zhou. “Poisson-Gamma dynamical systems.” Advances in Neural Information Processing Systems (2016).
  • Winner, Kevin, and Daniel R. Sheldon. “Probabilistic Inference with Generating Functions for Poisson Latent Variable Models.” Advances in Neural Information Processing Systems (2016).
  • Ross, Stéphane, Geoffrey J. Gordon, and Drew Bagnell. “A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning.” AISTATS (2011).
  • Sun, Wen, et al. “Learning to filter with predictive state inference machines.” arXiv preprint arXiv:1512.08836 (2015).
  • Sussillo, David, et al. “LFADS - Latent Factor Analysis via Dynamical Systems.” arXiv preprint arXiv:1608.06315 (2016).
  • Bastani, Vahid, et al. “Incremental Nonlinear System Identification and Adaptive Particle Filtering Using Gaussian Process.” arXiv preprint arXiv:1608.08362 (2016).
  • Shestopaloff, Alexander Y., and Radford M. Neal. “MCMC for non-linear state space models using ensembles of latent sequences.” arXiv preprint arXiv:1305.0320 (2013).
  • Kandasamy, Kirthevasan, Maruan Al-Shedivat, and Eric Xing. “Learning HMMs with Nonparametric Emissions via Spectral Decompositions of Continuous Matrices.” arXiv preprint arXiv:1609.06390 (2016).

Density Estimation

  • Tabak, E. G., and Cristina V. Turner. “A family of nonparametric density estimation algorithms.” Communications on Pure and Applied Mathematics (2013): 145-164.
  • van den Oord, Aaron, Nal Kalchbrenner, and Koray Kavukcuoglu. “Pixel Recurrent Neural Networks.” Proceedings of The 33rd International Conference on Machine Learning (2016).
  • Dinh, Laurent, Jascha Sohl-Dickstein, and Samy Bengio. “Density estimation using Real NVP.” arXiv preprint arXiv:1605.08803 (2016).

Reinforcement Learning

  • Levine, Sergey, et al. “End-to-end training of deep visuomotor policies.” Journal of Machine Learning Research 17.39 (2016): 1-40.
  • Tamar, Aviv, Sergey Levine, and Pieter Abbeel. “Value Iteration Networks.” NIPS (2016).
  • Finn, Chelsea, Ian Goodfellow, and Sergey Levine. “Unsupervised Learning for Physical Interaction through Video Prediction.” arXiv preprint arXiv:1605.07157 (2016).
  • Agarwal, Alekh, et al. “Corralling a Band of Bandit Algorithms.” arXiv preprint arXiv:1612.06246 (2016).

Statistical Neuroscience

  • Marblestone, Adam, Greg Wayne, and Konrad Kording. “Towards an integration of deep learning and neuroscience.” arXiv preprint arXiv:1606.03813 (2016).
  • Gershman, Samuel J., Eric J. Horvitz, and Joshua B. Tenenbaum. “Computational rationality: A converging paradigm for intelligence in brains, minds, and machines.” Science 349.6245 (2015): 273-278.

Previously Discussed in CAMLS

  • Lu, Xiaoyu, et al. “Relativistic Monte Carlo.” arXiv preprint arXiv:1609.04388 (2016).
  • Ma, Yi-An, Tianqi Chen, and Emily B. Fox. “A Complete Recipe for Stochastic Gradient MCMC.” Neural Information Processing Systems (NIPS) (2015).
  • Rezende, Danilo Jimenez, and Shakir Mohamed. “Variational inference with normalizing flows.” arXiv preprint arXiv:1505.05770 (2015).
  • Kingma, Diederik P., Tim Salimans, and Max Welling. “Improving Variational Inference with Inverse Autoregressive Flow.” arXiv preprint arXiv:1606.04934 (2016).
  • Moreno, Alexander, et al. “Automatic Variational ABC.” arXiv preprint arXiv:1606.08549 (2016).
  • Meeds, Edward, Robert Leenders, and Max Welling. “Hamiltonian ABC.” arXiv preprint arXiv:1503.01916 (2015).
  • Meeds, Edward, and Max Welling. “GPS-ABC: Gaussian process surrogate approximate Bayesian computation.” arXiv preprint arXiv:1401.2838 (2014).
  • Johnson, Matthew J., et al. “Composing graphical models with neural networks for structured representations and fast inference.” arXiv preprint arXiv:1603.06277 (2016).
  • Pollock, Murray, et al. “The Scalable Langevin Exact Algorithm: Bayesian Inference for Big Data.” arXiv preprint arXiv:1609.03436 (2016).
  • Advani, Madhu, and Surya Ganguli. “Statistical Mechanics of Optimal Convex Inference in High Dimensions.” Physical Review X 6.3 (2016): 031034.
  • Kawaguchi, Kenji. “Deep Learning without Poor Local Minima.” arXiv preprint arXiv:1605.07110 (2016).
  • Mei, Song, Yu Bai, and Andrea Montanari. “The Landscape of Empirical Risk for Non-convex Losses.” arXiv preprint arXiv:1607.06534 (2016).
  • Goodfellow, Ian, et al. “Generative adversarial nets.” Neural Information Processing Systems (NIPS) (2014).
  • Nowozin, Sebastian, Botond Cseke, and Ryota Tomioka. “f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization.” arXiv preprint arXiv:1606.00709 (2016).
  • Huang, Gao, et al. “Deep networks with stochastic depth.” arXiv preprint arXiv:1603.09382 (2016).
  • Gal, Yarin, and Zoubin Ghahramani. “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning.” arXiv preprint arXiv:1506.02142 (2015).
  • Gregor, Karol, et al. “DRAW: A recurrent neural network for image generation.” arXiv preprint arXiv:1502.04623 (2015).
  • Xu, Kelvin, et al. “Show, attend and tell: Neural image caption generation with visual attention.” arXiv preprint arXiv:1502.03044 (2015).
  • Eslami, S. M., et al. “Attend, Infer, Repeat: Fast Scene Understanding with Generative Models.” arXiv preprint arXiv:1603.08575 (2016).
  • Gao, Yuanjun, et al. “Linear dynamical neural population models through nonlinear embeddings.” arXiv preprint arXiv:1605.08454 (2016).
  • Krishnan, Rahul G., Uri Shalit, and David Sontag. “Deep Kalman Filters.” arXiv preprint arXiv:1511.05121 (2015).
  • Uria, Benigno, et al. “Neural Autoregressive Distribution Estimation.” arXiv preprint arXiv:1605.02226 (2016).
  • Germain, Mathieu, et al. “MADE: masked autoencoder for distribution estimation.” International Conference on Machine Learning (2015).
  • Jang, Eric, et al. “Categorical Reparameterization with Gumbel-Softmax.” (2016).
  • Maddison, Chris J., et al. “The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables.” (2016).
  • Kusner, Matt J., and José Miguel Hernández-Lobato. “GANs for Sequences of Discrete Elements with the Gumbel-softmax Distribution.” (2016).
  • Khan, Mohammad E., et al. “Kullback-Leibler proximal variational inference.” Advances in Neural Information Processing Systems (2015).
  • Khan, Mohammad E., et al. “Faster stochastic variational inference using Proximal-Gradient methods with general divergence functions.” arXiv preprint arXiv:1511.00146 (2015).
  • Rezende, Danilo Jimenez, et al. “One-Shot Generalization in Deep Generative Models.” arXiv preprint arXiv:1603.05106 (2016).
  • Lake, Brenden M., Ruslan Salakhutdinov, and Joshua B. Tenenbaum. “Human-level concept learning through probabilistic program induction.” Science 350.6266 (2015): 1332-1338.
  • Kakade, Sham, et al. “Prediction with a Short Memory.” arXiv preprint arXiv:1612.02526 (2016).