THE USE OF K-MEANS++ FOR APPROXIMATE SPECTRAL CLUSTERING OF LARGE DATASETS


Yalcin B., TAŞDEMİR K.

22nd IEEE Signal Processing and Communications Applications Conference (SIU), Trabzon, Türkiye, 23-25 April 2014, pp. 220-223

  • Publication Type: Conference Paper / Full-Text Paper
  • City of Publication: Trabzon
  • Country of Publication: Türkiye
  • Pages: pp. 220-223
  • Affiliated with Istanbul Technical University: Yes

Abstract

Spectral clustering (SC) has been widely used in recent years, thanks to its nonparametric model, its ability to extract clusters lying on different manifolds, and its ease of application. However, SC is infeasible for large datasets because of its high computational cost and memory requirements. To address this challenge, approximate spectral clustering (ASC) has been proposed for large datasets. ASC involves two steps: first, a limited number of data representatives (also known as prototypes) are selected by sampling or quantization methods; then, SC is applied to these representatives using various similarity criteria. In this study, several quantization and sampling methods are compared for ASC. Among them, k-means++, a clustering algorithm that has recently gained popularity, is used to select prototypes in ASC for the first time. Experiments on different datasets indicate that k-means++ is a suitable alternative to neural gas and selective sampling in terms of accuracy and computational cost.
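The first ASC step described in the abstract, prototype selection, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sq_dist`, `kmeanspp_seeds`, `select_prototypes`), the number of refinement iterations, and the toy data are all assumptions made for illustration. The sketch uses the standard k-means++ seeding rule (each new center is drawn with probability proportional to its squared distance from the nearest center already chosen), followed by a few ordinary k-means updates. The second ASC step, applying SC to the prototypes, is not shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng):
    """Choose k initial centers by D^2-weighted sampling (k-means++ seeding)."""
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            r = rng.random() * total
            acc = 0.0
            # Default guards against floating-point round-off at the upper end.
            idx = max(range(len(points)), key=d2.__getitem__)
            for i, w in enumerate(d2):
                acc += w
                if acc > r:  # strict test so zero-weight points are never picked
                    idx = i
                    break
            centers.append(points[idx])
        d2 = [min(old, sq_dist(p, centers[-1])) for old, p in zip(d2, points)]
    return centers


def select_prototypes(points, k, iters=10, seed=0):
    """Return k prototypes: k-means++ seeds refined by a few k-means updates."""
    rng = random.Random(seed)
    centers = kmeanspp_seeds(points, k, rng)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


if __name__ == "__main__":
    # Hypothetical toy data: two tight groups of identical points.
    data = [(0.0, 0.0)] * 5 + [(10.0, 10.0)] * 5
    print(sorted(select_prototypes(data, 2)))
```

In a full ASC pipeline, the returned prototypes would replace the original data as the input to spectral clustering, and each original point would then inherit the cluster label of its nearest prototype.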