Approximate spectral clustering with utilized similarity information using geodesic based hybrid distance measures


Tasdemir K., YALCIN B., Yıldırım İ.

PATTERN RECOGNITION, cilt.48, sa.4, ss.1465-1477, 2015 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 48 Sayı: 4
  • Basım Tarihi: 2015
  • Doi Numarası: 10.1016/j.patcog.2014.10.023
  • Dergi Adı: PATTERN RECOGNITION
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Sayfa Sayıları: ss.1465-1477
  • İstanbul Teknik Üniversitesi Adresli: Evet

Özet

Spectral clustering has been popular thanks to its ability to extract clusters of varying characteristics without using a parametric model in expense of high computational cost required for eigen-decomposition of pairwise similarities. In order to utilize its advantages in large datasets where it is infeasible due to its computational burden, approximate spectral clustering (ASC) methods apply spectral clustering on a reduced set of points (data representatives) selected by sampling or quantization. This two-step approach (i.e. finding the representatives and then clustering them) brings new opportunities for precise similarity definition such as manifold based topological relations, data distribution within the Voronoi polyhedra of the representatives, and their geodesic distance information, which are often ignored in similarity definition for ASC. In this study, we propose geodesic based hybrid similarity criteria which enable the use of different types of information for accurate similarity representation in ASC. Despite the fact that geodesic concept has been widely used in clustering, our contribution is the unique way of representing data topology to form geodesic relations and jointly harnessing various information types including topology, distance and density. The proposed criteria are tested using both sampling (selective sampling) and quantization (neural gas and k-means++) approaches. Experiments on artificial datasets, well-known small/medium-size real datasets, and four large datasets (four remote-sensing images), with different types of clusters, show that the proposed geodesic based hybrid similarity criteria outperform traditional similarity criteria in terms of clustering accuracies and several cluster validity indices. (C) 2014 Elsevier Ltd. All rights reserved.