22nd International Conference on Pattern Recognition (ICPR), Stockholm, Sweden, 24-28 August 2014, pp. 1360-1364
Spectral clustering has been used successfully in various applications because it requires no parametric model, can extract clusters with differing characteristics, and is easy to implement. However, it is often infeasible for large datasets due to its heavy computational load and memory requirements. To retain its advantages on large datasets, it can be applied to representatives of the data (obtained by either quantization or sampling) rather than to the individual samples, an approach called approximate spectral clustering. This calls for new ways of defining similarities between representatives that exploit the data characteristics, beyond the traditional Euclidean-distance-based similarities. To address this challenge, we propose similarity measures based on geodesic distances and local density distribution. Our experiments on datasets with varying cluster statistics show that the proposed geodesic-based similarities achieve high accuracy in approximate spectral clustering.
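To illustrate what a geodesic-distance-based similarity between representatives can look like, here is a minimal Python sketch. It is not the authors' formulation, which the abstract does not specify. It assumes that geodesic distance is approximated by the shortest-path length on a symmetric k-nearest-neighbor graph over the representatives, and that distances are turned into similarities with a Gaussian kernel. The function names and the parameters `k` and `sigma` are hypothetical, chosen only for this illustration, and the local density information mentioned above is not modeled.

```python
import heapq
import math


def knn_graph(reps, k):
    """Symmetric k-nearest-neighbor graph; edge weights are Euclidean distances."""
    n = len(reps)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (math.dist(reps[i], reps[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph


def shortest_paths(graph, source):
    """Dijkstra's algorithm; unreachable nodes keep an infinite distance."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def geodesic_similarity(reps, k=2, sigma=1.0):
    """Gaussian similarity of graph shortest-path (approximate geodesic) distances.

    Assumption for this sketch: representatives in different connected
    components of the k-NN graph get similarity 0.
    """
    graph = knn_graph(reps, k)
    n = len(reps)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dist = shortest_paths(graph, i)
        for j in range(n):
            sim[i][j] = math.exp(-dist[j] ** 2 / (2.0 * sigma ** 2))
    return sim
```

The resulting matrix could then serve as the affinity matrix for a standard spectral clustering step on the representatives. For representatives lying along a curved cluster, the path length along the graph exceeds the straight-line distance between the two ends, so this similarity follows the cluster shape more closely than a purely Euclidean one.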