Zhou, Y. (2023). Scalable Clustering: Large Scale Unsupervised Learning of Gaussian Mixture Models with Outliers. Retrieved from https://purl.lib.fsu.edu/diginole/Zhou_fsu_0071E_18330
Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms either do not scale well to large datasets or lack theoretical guarantees of convergence. In this dissertation, we develop a clustering algorithm with provable guarantees, namely Scalable Clustering by Robust Loss Minimization (SCRLM), that performs well on Gaussian Mixture Models with outliers. We derive theoretical guarantees that SCRLM obtains high accuracy with high probability under certain assumptions. Moreover, it can also be used as an initialization strategy for k-means clustering. Experiments on real-world large-scale datasets demonstrate the effectiveness of SCRLM when the number of clusters is large. A k-means algorithm initialized by SCRLM outperforms most classic clustering methods in both speed and accuracy, while scaling well to large datasets such as ImageNet. We further extend SCRLM to Hierarchical SCRLM (HSCRLM) to handle hierarchical structures while maintaining robustness and theoretical guarantees. These advancements contribute to addressing modern clustering challenges.
A Dissertation submitted to the Department of Mathematics in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Bibliography Note
Includes bibliographical references.
Advisory Committee
Adrian Barbu, Professor Co-Directing Dissertation; Kyle A. Gallivan, Professor Co-Directing Dissertation; Gordon Erlebacher, University Representative; Giray Okten, Committee Member; Mark Sussman, Committee Member.
Publisher
Florida State University
Identifier
Zhou_fsu_0071E_18330