Understanding the Success of Graph-based Semi-Supervised Learning using Partially Labelled Stochastic Block Model

Avirup Saha; Shreyas Sheshadri; Samik Datta; Niloy Ganguly; Disha Makhija; Priyank Patel

doi:10.24963/ijcai.2020/187

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence

Main track. Pages 1345-1351. https://doi.org/10.24963/ijcai.2020/187

Learning scenarios with abundant instances but few high-quality labels have proliferated, bringing semi-supervised learning algorithms to prominence. Graph-based semi-supervised learning (G-SSL) algorithms, of which Label Propagation (LP) is a prominent example, are particularly well suited to these problems. LP presupposes homophily in the graph, but beyond that premise nothing is known about its efficacy. In particular, no characterisation connects the structural constraints, and the volume and quality of the labels, to the accuracy of LP. In this work, we draw upon the notion of recovery from the community detection literature and provide accuracy guarantees for partially-labelled graphs generated from the Partially-Labelled Stochastic Block Model (PLSBM). Extensive experiments on synthetic data verify the theoretical findings.
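To make the setting concrete, the following is a minimal sketch of label propagation on a homophilous two-block graph. It is an illustration under stated assumptions, not the paper's method: the generator is a plain two-block stochastic block model with noise-free seed labels standing in for the PLSBM, the propagation rule is the common clamped neighbour-averaging variant, and the names `sample_sbm` and `label_propagation` as well as all parameter values are hypothetical.

```python
import numpy as np


def sample_sbm(n_per_block, p_in, p_out, rng):
    """Sample a symmetric two-block SBM adjacency matrix and its true labels.

    Assumption: a simplified stand-in for the PLSBM, with equal block sizes.
    """
    n = 2 * n_per_block
    labels = np.repeat([0, 1], n_per_block)
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_in, p_out)
    # Draw each undirected edge once (strict upper triangle), then mirror it.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(float)
    return adj, labels


def label_propagation(adj, seed_idx, seed_labels, n_classes=2, n_iter=50):
    """Clamped neighbour-averaging label propagation (one common LP variant)."""
    n = adj.shape[0]
    scores = np.full((n, n_classes), 1.0 / n_classes)
    # One-hot scores for the revealed (seed) labels.
    clamp = np.zeros((len(seed_idx), n_classes))
    clamp[np.arange(len(seed_idx)), seed_labels] = 1.0
    scores[seed_idx] = clamp
    degree = adj.sum(axis=1, keepdims=True)
    degree[degree == 0] = 1.0  # avoid division by zero for isolated nodes
    for _ in range(n_iter):
        scores = adj @ scores / degree  # average the neighbours' scores
        scores[seed_idx] = clamp        # re-impose the known labels
    return scores.argmax(axis=1)


# Hypothetical parameters: strong homophily (p_in much larger than p_out)
# and 10% of the nodes revealed as seeds.
rng = np.random.default_rng(0)
adj, truth = sample_sbm(n_per_block=100, p_in=0.3, p_out=0.02, rng=rng)
seed_idx = rng.choice(len(truth), size=20, replace=False)
pred = label_propagation(adj, seed_idx, truth[seed_idx])
accuracy = (pred == truth).mean()
print(f"accuracy: {accuracy:.3f}")
```

Shrinking the gap between `p_in` and `p_out`, reducing the number of seeds, or flipping some seed labels gives an informal feel for how structure, label volume and label quality each affect accuracy, which is the relationship the paper characterises formally.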

Keywords:

Data Mining: Classification, Semi-Supervised Learning

Data Mining: Mining Graphs, Semi Structured Data, Complex Data