Spectral methods for graph clustering

Abstract

Categorization, i.e. the ability to assign the same labels to objects sharing similar properties, is one of the main task in machine learning. In recent times, the ever increasing amount of data at our disposal gives us unprecedented possibilities to devise sophisticated and statistically significant categorization methods but it also requires a considerable effort in designing scalable and efficient algorithms, capable to properly deal with these datasets. Spectral clustering (SC) is one of the most popular techniques to categorize the items of a dataset that can be represented as a graph. This is a class of unsupervised algorithms in which the “best” partition does not require the help of additional information to be determined and is instead obtained by exploiting the dependencies between the dataset’s items. In SC algorithms, the information concerning the class structure of the input dataset is determined by the eigenvectors of a suited matrix. The intuitions and results justifying SC are at the crossroads between several fields such as statistics, random matrix theory, computer science, network science, signal processing, statistical physics and have so far mostly been treated independently. In this manuscript, we study the challenging (but relevant) sparse setting, in which only few entries of the matrix representation of the input dataset are non-zero. We focus in particular on the applications of SC for community detection (in both static and dynamical graphs) and for the sparsification of kernel matrices for time-efficient clustering of high dimensional vectors. We build on the recent advances in statistical physics for SC to propose improved algorithms that provably outperform the existing methods for both synthetic as well as real clustering tasks. Moreover, we propose a simple framework that gives a unified view of some of the most inflential state-of-the-art methods for SC. The existing algorithms from the literature can often be considered as corner cases of our proposed methods that constitute instead an “optimum” capable of self-adapting to the hardness of the clustering problem. We further detail extensively how to efficiently implement our proposed algorithms for practical SC tasks.

Lorenzo Dall'Amico
Lorenzo Dall'Amico
Postdoctoral fellow

I am currently a postdoctoral fellow at the ISI foundation in Turin, Italy.