THE USE OF K-MEANS++ FOR APPROXIMATE SPECTRAL CLUSTERING OF LARGE DATASETS


Yalcin B., TAŞDEMİR K.

22nd IEEE Signal Processing and Communications Applications Conference (SIU), Trabzon, Turkey, 23 - 25 April 2014, pp.220-223

  • Publication Type: Conference Paper / Full Text
  • City: Trabzon
  • Country: Turkey
  • Page Numbers: pp.220-223

Abstract

Spectral clustering (SC) has been widely used in recent years, thanks to its nonparametric model, its ability to extract clusters lying on different manifolds, and its ease of application. However, SC is infeasible for large datasets because of its high computational cost and memory requirements. To address this challenge, approximate spectral clustering (ASC) has been proposed for large datasets. ASC involves two steps: first, a limited number of data representatives (also known as prototypes) is selected by sampling or quantization methods; then, SC is applied to these representatives using various similarity criteria. In this study, several quantization and sampling methods are compared for ASC. Among them, k-means++, a clustering algorithm that has recently become popular, is used to select prototypes in ASC for the first time. Experiments on different datasets indicate that k-means++ is a suitable alternative to neural gas and selective sampling in terms of accuracy and computational cost.
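
The prototype-selection step described in the abstract can be illustrated with a minimal sketch of k-means++ seeding, which picks each new prototype with probability proportional to its squared distance from the nearest prototype already chosen. This is not the authors' implementation: the function names `kmeanspp_prototypes` and `assign_to_prototypes`, the `seed` parameter, and the toy data are assumptions made for illustration, and the sketch covers only the seeding stage. The subsequent SC step on the prototypes and any refinement of the prototypes are omitted.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_prototypes(points, k, seed=0):
    """Select up to k prototypes using k-means++ (D^2-weighted) seeding.

    Hypothetical helper for illustration; returns fewer than k prototypes
    only if the data contain fewer than k distinct points.
    """
    rng = random.Random(seed)
    # First prototype: chosen uniformly at random.
    prototypes = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest prototype.
    d2 = [squared_distance(p, prototypes[0]) for p in points]
    while len(prototypes) < k:
        if sum(d2) == 0:
            break  # every point coincides with an existing prototype
        # Next prototype: sampled with probability proportional to d2.
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        prototypes.append(points[idx])
        d2 = [min(old, squared_distance(p, points[idx]))
              for p, old in zip(points, d2)]
    return prototypes


def assign_to_prototypes(points, prototypes):
    """Map each point to the index of its nearest prototype."""
    return [
        min(range(len(prototypes)),
            key=lambda j: squared_distance(p, prototypes[j]))
        for p in points
    ]


if __name__ == "__main__":
    # Toy data (assumed): two small, well-separated groups in the plane.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
    protos = kmeanspp_prototypes(data, k=3, seed=1)
    print(protos)
    print(assign_to_prototypes(data, protos))
```

In an ASC pipeline of the kind the abstract outlines, SC would then be run on the prototypes rather than on the full dataset, and each data point would inherit the cluster label of its nearest prototype.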