Bartoldson, B. R. (2020). Generalization Performance and Properties of Pruned Neural Networks. Retrieved from https://purl.lib.fsu.edu/diginole/2020_Summer_Fall_Bartoldson_fsu_0071E_16109
Abstract
By removing as much as 90% or more of a deep neural network's (DNN's) parameters, a wide variety of pruning approaches not only allow for DNN compression but also increase generalization (model performance on test/unseen data). This observation, however, conflicts with emerging DNN generalization theory and empirical findings, which suggest that DNNs generalize better as their parameter counts rise, despite overparameterization (the use of more parameters than data points). Seeking to reconcile such modern findings with pruning-based generalization improvements, this thesis empirically studies the cause of improved generalization in pruned DNNs. We begin by providing support for our hypothesis that pruning regularizes similarly to noise injection with a perhaps surprising result: pruning parameters that are more immediately important to the network leads to better generalization later, after the network has adapted to the pruning. We show that this behavior is a manifestation of a more general phenomenon. Across a wide variety of experimental configurations and pruning algorithms, pruning's benefit to generalization increases with pruning's instability (defined as the drop in test accuracy immediately after pruning). We study the limits of this generalization-stability tradeoff and use it to inform the derivation of a novel pruning algorithm that produces particularly unstable pruning and higher generalization. Such results suggest that accounting for this tradeoff would improve pruning algorithm design. Finally, we empirically examine the consistency of several generalization theories with the generalization-stability tradeoff and pruning-based generalization improvements. Notably, we find that pruning less stably heightens measures of DNN flatness (robustness to data-sample and parameter changes) that are positively correlated with generalization, and pruning-based generalization improvements are maintained when pruning is modified to only remove parameters temporarily. Thus, by demonstrating a regularization mechanism in pruning that depends on changes to sharpness-related complexity rather than parameter-count complexity, this thesis elucidates the compatibility of pruning-based generalization improvements and high generalization in overparameterized DNNs, while also corroborating the relevance of flatness to DNN generalization.
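The instability measure referenced in the abstract admits a simple computation: evaluate test accuracy immediately before and after a pruning step and take the difference. The sketch below is illustrative only and is not drawn from the dissertation; the model, synthetic data, and layer-wise magnitude-pruning rule are stand-in assumptions used to show the bookkeeping.

```python
# Minimal sketch of measuring pruning "instability": the immediate drop in
# test accuracy after pruning. Model, data, and pruning rule are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def test_accuracy(model, loader):
    """Fraction of examples in `loader` classified correctly by `model`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            preds = model(x).argmax(dim=1)
            correct += (preds == y).sum().item()
            total += y.numel()
    return correct / total


def magnitude_prune_(model, fraction):
    """Zero out the smallest-magnitude weights of each Linear layer in place."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            k = int(fraction * w.numel())
            if k == 0:
                continue
            threshold = w.abs().flatten().kthvalue(k).values
            w[w.abs() <= threshold] = 0.0


# Illustrative synthetic data and model (placeholders for the real experiments).
x = torch.randn(512, 32)
y = torch.randint(0, 10, (512,))
loader = DataLoader(TensorDataset(x, y), batch_size=64)
model = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))

acc_before = test_accuracy(model, loader)
magnitude_prune_(model, fraction=0.9)
acc_after = test_accuracy(model, loader)

# Instability as defined in the abstract: accuracy before minus accuracy after.
instability = acc_before - acc_after
print(f"instability = {instability:.4f}")
```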
Keywords
Compression, Deep neural networks, Flatness, Generalization, Noise injection, Pruning
Date of Defense
July 16, 2020.
Submitted Note
A Dissertation submitted to the Department of Scientific Computing in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Bibliography Note
Includes bibliographical references.
Advisory Committee
Gordon Erlebacher, Professor Directing Dissertation; Paul Beaumont, University Representative; Adrian Barbu, Committee Member; Anke Meyer-Baese, Committee Member; Sachin Shanbhag, Committee Member.
Publisher
Florida State University
Identifier
2020_Summer_Fall_Bartoldson_fsu_0071E_16109