Friday, Mar 25 2022 - 10:00 am (GMT + 7)
Implicit Regularization for Algorithm Design: Neural Collapse and Worst Group Generalization
Yiping Lu is a doctoral student in Computational and Mathematical Engineering at Stanford University, working with Lexing Ying and Jose Blanchet. Previously, he received his bachelor's degree in Information and Computing Sciences from the School of Mathematical Sciences, Peking University. His work spans statistical learning, stochastic control, numerical analysis, and computational economics. His recent research interests focus on integrating structural/physics-based modeling with machine learning, and on robust decision making (experiment design/machine learning/control).
Although overparameterized models have succeeded on many machine learning tasks, their accuracy can drop when the test distribution differs from the training distribution. At the same time, importance weighting, a traditional technique for handling distribution shift, has been shown both empirically and theoretically to have little or even no effect on overparameterized models. In this talk, we aim to understand and fix this problem through Neural Collapse, a recently discovered implicit bias: a highly symmetric geometric pattern that emerges in neural networks during the terminal phase of training. In the first part of the talk, we show how implicit regularization can lead the last-layer features and classifiers to the Neural Collapse geometry in a surrogate model called the unconstrained layer-peeled model (ULPM). In the second part of the talk, we propose importance tempering to improve the decision boundary and achieve consistently better results for overparameterized models. Theoretically, we show that the appropriate choice of group temperatures can differ between the label shift and spurious correlation settings. We also prove that properly selected temperatures can avoid minority collapse in imbalanced classification. Empirically, we achieve state-of-the-art results on worst-group classification tasks using importance tempering. This talk is based mainly on our recent work with Wending Ji, Zhun Dang, Weijie Su, Zach Izoo and Lexing Ying.