Sampling based approximate spectral clustering ensemble for partitioning datasets


Moazzen Y., TAŞDEMİR K.

23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4-8 December 2016, pp.1630-1635

  • Publication Type: Conference Paper / Full Text
  • City: Cancun
  • Country: Mexico
  • Page Numbers: pp.1630-1635

Abstract

Spectral clustering can extract clusters with various characteristics without a parametric model; however, it is infeasible for large datasets because of its high computational cost and memory requirement. Approximate spectral clustering (ASC) addresses this challenge with a representative-based partitioning approach: it first finds a set of data representatives, either by sampling or by quantization, and then applies spectral clustering to them. To achieve an optimal partitioning with ASC, several sampling and quantization methods, together with advanced similarity criteria, have recently been proposed. Although quantization is more accurate than sampling at the expense of heavy computation, and geodesic-based hybrid similarity criteria are often more informative than others, no single solution is optimal for all datasets. Instead, we propose to use ensemble learning to produce a consensus partitioning constructed from different sets of representatives and similarity criteria. The proposed ensemble (SASCE) not only produces a relatively more accurate partitioning but also eliminates the need to determine the best pair (the optimal set of representatives and the optimal similarity criterion). Thanks to the efficient similarity definition at the representative level, SASCE can be powerful for clustering small and medium datasets, outperforming traditional clustering approaches and their ensembles.
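The abstract describes the ASC pipeline (pick representatives, spectrally cluster them, combine several runs into a consensus) only at a high level. The following is a minimal NumPy sketch of that general idea under loudly stated assumptions, not the authors' SASCE implementation. Representatives are drawn by uniform random sampling. A Gaussian similarity at several bandwidths `sigma` stands in for the paper's similarity criteria; the geodesic-based hybrid criteria are not reproduced. Each data point inherits the label of its nearest representative. The consensus is formed through a co-association matrix, a common ensemble technique that may differ from the paper's consensus function. All function names (`asc_member`, `asc_ensemble`, etc.) are hypothetical.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's k-means with farthest-first initialization."""
    # Farthest-first seeding keeps one center per well-separated group even
    # when many rows of X coincide (common for spectral embeddings).
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_from_affinity(A, k, rng):
    """Ng-Jordan-Weiss style spectral clustering on a precomputed affinity."""
    A = A.astype(float)                      # copy, so the caller's matrix is untouched
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    U = vecs[:, -k:]                         # eigenvectors of the k largest
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k, rng)


def gaussian_affinity(R, sigma):
    """Hypothetical stand-in for the paper's similarity criteria."""
    sq = ((R[:, None, :] - R[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))


def asc_member(X, R, k, sigma, rng):
    """One ASC run: cluster the representatives R, then give every point
    of X the cluster label of its nearest representative."""
    rep_labels = spectral_from_affinity(gaussian_affinity(R, sigma), k, rng)
    nearest = ((X[:, None, :] - R[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return rep_labels[nearest]


def asc_ensemble(X, k, n_reps, sigmas, n_samplings, seed=0):
    """Co-association consensus over (representative set, similarity) pairs."""
    rng = np.random.default_rng(seed)
    n = len(X)
    coassoc = np.zeros((n, n))
    n_members = 0
    for _ in range(n_samplings):
        # Uniform random sampling picks a new set of representatives.
        R = X[rng.choice(n, size=n_reps, replace=False)]
        for sigma in sigmas:
            labels = asc_member(X, R, k, sigma, rng)
            coassoc += labels[:, None] == labels[None, :]
            n_members += 1
    coassoc /= n_members
    # Final partition: spectral clustering with the co-association matrix
    # used as the affinity between data points.
    return spectral_from_affinity(coassoc, k, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X = np.vstack([c + 0.5 * rng.standard_normal((40, 2)) for c in centers])
    labels = asc_ensemble(X, k=3, n_reps=45, sigmas=[1.0, 2.0], n_samplings=5)
    print(np.bincount(labels))               # cluster sizes of the consensus
```

Each ensemble member here is one (representative set, similarity) pair, which mirrors the abstract's point that the ensemble removes the need to pick the single best pair. Replacing random sampling with a quantization step (e.g. k-means centroids as representatives) would trade extra computation for representatives that follow the data more closely.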