Feature Selection for Collective Classification

Senliol B., Aral A., Çataltepe Z.

24th International Symposium on Computer and Information Sciences, Güzelyurt, Cyprus (Kktc), 14 - 16 September 2009, pp.285-290

  • Publication Type: Conference Paper / Full Text
  • City: Güzelyurt
  • Country: Cyprus (Kktc)
  • Page Numbers: pp.285-290
  • Istanbul Technical University Affiliated: Yes


When relations (links) between nodes and some unlabeled nodes are available in addition to node contents and labels, collective classification algorithms can be used. Collective classification algorithms, such as ICA (Iterative Classification Algorithm), determine labels for the unlabeled nodes based on the contents and/or labels of the neighboring nodes. Feature selection algorithms have been shown to improve classification accuracy for traditional machine learning algorithms. In this paper, we use a recent and successful feature selection algorithm, mRMR (Minimum Redundancy Maximum Relevance; Ding and Peng, 2003), on content features. On two scientific paper citation data sets, Cora and Citeseer, we show that when only content information is used, the selected features may perform almost as well as the full feature set. When feature selection is performed on both content and link information, even better classification accuracies are obtained. Feature selection considerably reduces the training time for both the content-only and ICA algorithms.
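
As an illustration of the kind of selection mRMR performs, the following is a minimal sketch, not the implementation used in the paper. It assumes discrete content features (for example, binary word indicators), estimates mutual information from raw counts, and uses the common "difference" scoring form (relevance minus mean redundancy); the abstract does not state which form was used. The names `mutual_information` and `mrmr_select` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(features, labels, k):
    """Greedy mRMR sketch (assumed difference form).

    features: list of columns, each a list of discrete values, one per node.
    labels:   list of class labels, one per node.
    Returns the indices of up to k selected columns, in selection order.
    """
    relevance = [mutual_information(col, labels) for col in features]
    selected = []
    remaining = set(range(len(features)))

    def score(j):
        # Relevance to the label minus mean redundancy with chosen features.
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(features[j], features[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: f1 duplicates f0, while f2 carries complementary information.
    labels = [0, 0, 1, 1, 2, 2, 3, 3]
    f0 = [0, 0, 0, 0, 1, 1, 1, 1]
    f1 = list(f0)
    f2 = [0, 0, 1, 1, 0, 0, 1, 1]
    print(mrmr_select([f0, f1, f2], labels, 2))
```

In the toy data, all three features are equally relevant to the label, but the second is an exact copy of the first, so the redundancy term steers the second pick toward the third feature instead. The selected columns would then be passed to a content-only classifier or to ICA in place of the full feature set.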