A Tale of Sparsity in Deep Learning: Lottery Tickets, Subset Selection, and Efficiency in Distributed Learning
Neural network pruning is useful for discovering efficient, high-performing subnetworks within pre-trained, dense network architectures. Yet, more often than not, it involves a computationally expensive procedure, as the dense model must be pre-trained, at least for some number of iterations/epochs, to achieve top performance. Most existing works in this area remain empirical or impractical, based on heuristic rules about when, how, and how much one needs to pre-train in order to recover meaningful sparse subnetworks. In this talk, we will scratch the surface of open questions in the general area of "pruning techniques for neural network training", with a special focus on what can be theoretically characterized, in order to move from heuristics to provable protocols.
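To make the notion of pruning concrete, here is a minimal sketch of unstructured magnitude pruning, the simplest of the heuristic rules the talk questions. This is an illustrative example assuming NumPy; the function name and the sparsity level are hypothetical, not taken from the talk.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the `sparsity` fraction of weights with smallest magnitude.

    Returns the pruned weight array and the boolean mask of kept entries.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

# Example: prune a random dense layer to ~90% sparsity
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
W_pruned, mask = magnitude_prune(W, sparsity=0.9)
print(f"fraction of weights kept: {mask.mean():.3f}")
```

In the "train, prune, retrain" pipelines the abstract refers to, a step like this is applied after (partial) pre-training, and the surviving subnetwork defined by `mask` is then trained further; the open questions above concern when and how much pre-training suffices for such a mask to be meaningful.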
Time permitting, the talk will cover the following three themes/questions:
- Can we theoretically characterize how much SGD-based pre-training is sufficient to identify meaningful sparse subnetworks?
- Can we draw connections between pruning methods and classical sparse recovery, in order to leverage decades of theory on subset selection?
- From a practical standpoint, can we combine the above ideas with existing efficient distributed protocols in order to achieve end-to-end sparse neural network training, even avoiding heavy full-model pre-training phases?
Anastasios Kyrillidis is a Noah Harding Assistant Professor in the Computer Science Department at Rice University. Prior to that, he was a Goldstine postdoctoral fellow at the IBM T. J. Watson Research Center (NY), and a Simons Foundation postdoctoral member at the University of Texas at Austin. He finished his PhD at the CS Department of EPFL (Switzerland) under the supervision of Volkan Cevher. Tasos received his M.Sc. and Diploma from the Electronic and Computer Engineering Department at the Technical University of Crete (Chania). His research interests include (but are not limited to): optimization for machine learning, convex and non-convex algorithms and analysis, large-scale optimization, and any problem that involves a math-driven criterion and requires an efficient method for its solution.