Feature Selection for Collective Classification

24th International Symposium on Computer and Information Sciences, Güzelyurt, Cyprus (TRNC), 14-16 September 2009, pp. 285-290

Publication Type: Conference Paper / Full-Text Paper
City of Publication: Güzelyurt
Country of Publication: Cyprus (TRNC)
Page Numbers: pp. 285-290
Affiliated with Istanbul Technical University: Yes

Abstract

When, in addition to node contents and labels, relations (links) between nodes and some unlabeled nodes are available, collective classification algorithms can be used. Collective classification algorithms, such as ICA (Iterative Classification Algorithm), determine labels for the unlabeled nodes based on the contents and/or labels of the neighboring nodes. Feature selection algorithms have been shown to improve classification accuracy for traditional machine learning algorithms. In this paper, we use a recent and successful feature selection algorithm, mRMR (Minimum Redundancy Maximum Relevance, Ding and Peng, 2003), on content features. On two scientific paper citation data sets, Cora and Citeseer, we show that when only content information is used, the selected features can perform almost as well as the full feature set. When feature selection is performed on both content and link information, even better classification accuracies are obtained. Feature selection also considerably reduces the training time for both the content-only and ICA algorithms.
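
To make the mRMR idea in the abstract concrete, the following is a minimal sketch of greedy minimum-redundancy, maximum-relevance selection over discrete features. It is not the paper's implementation: the function names mutual_information and mrmr_select, the toy data, and the choice of the "relevance minus mean redundancy" scoring rule are all assumptions made for illustration, and the abstract does not say which mRMR variant the authors used.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, k):
    """Greedily pick k feature indices (hypothetical helper, difference criterion).

    columns: list of feature columns, each a sequence of discrete values.
    labels:  sequence of class labels, same length as each column.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    # Start with the single most relevant feature.
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(columns)):
        best, best_score = None, float("-inf")
        for j in range(len(columns)):
            if j in selected:
                continue
            # Mean mutual information with the features chosen so far.
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Toy data (assumed, not from Cora or Citeseer): feature 1 duplicates feature 0.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 1, 1],
]
print(mrmr_select(features, labels, 2))
```

On this toy input the duplicate column is skipped in favor of the less relevant but less redundant third column, which is the behavior the "minimum redundancy" part of the name refers to. In a collective classification setting, the same routine could in principle be applied to link-derived features (for example, neighbor label counts) as well as content features, though how the paper does this is not stated in the abstract.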