Turkish Document Classification with Coarse-Grained Semantic Matrix


Donmez I., Adali E.

17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Konya, Turkey, 3-9 April 2016, vol.9624, pp.472-484

  • Publication Type: Conference Paper / Full Text
  • Volume: 9624
  • DOI Number: 10.1007/978-3-319-75487-1_37
  • City: Konya
  • Country: Turkey
  • Page Numbers: pp.472-484
  • Istanbul Technical University Affiliated: Yes

Abstract

In this paper, we present a novel method for Document Classification that uses a semantic matrix representation of Turkish sentences, concentrating on the phrases of each sentence and their concepts in the text. Our model is designed to find the phrases in a sentence, identify their relations with specific concepts, and represent each sentence as a coarse-grained semantic matrix. Predicate features and semantic class type are also added to the coarse-grained semantic matrix representation. The highest success rate in Turkish Document Classification, 97.12, is obtained by adding the coarse-grained semantic matrix representation to the data that yielded the highest result in previous studies on Turkish Document Classification.
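
The abstract does not specify the layout of the matrix, so the following is only a minimal sketch of one way a phrase-by-concept representation could look. It assumes that rows correspond to phrase roles and columns to concept classes, and that a cell counts the words of a phrase that map to a concept. The role list, the concept list, the toy lexicon, and the function names `sentence_matrix` and `document_features` are all hypothetical and are not taken from the paper.

```python
# Hypothetical phrase roles (rows) and concept classes (columns).
# Neither list comes from the paper; both are assumptions for illustration.
PHRASE_ROLES = ["subject", "object", "predicate"]
CONCEPTS = ["person", "place", "sport"]

# Toy word-to-concept lexicon, invented for this sketch.
LEXICON = {
    "futbolcu": "person",  # footballer
    "stadyum": "place",    # stadium
    "maç": "sport",        # match
}


def sentence_matrix(phrases):
    """Build a roles x concepts count matrix for one sentence.

    `phrases` maps a phrase role to the list of words in that phrase.
    Phrase detection itself is assumed to have been done elsewhere.
    """
    matrix = [[0] * len(CONCEPTS) for _ in PHRASE_ROLES]
    for row, role in enumerate(PHRASE_ROLES):
        for word in phrases.get(role, []):
            concept = LEXICON.get(word)
            if concept is not None:
                matrix[row][CONCEPTS.index(concept)] += 1
    return matrix


def document_features(sentences):
    """Sum the sentence matrices of a document and flatten the result."""
    total = [[0] * len(CONCEPTS) for _ in PHRASE_ROLES]
    for phrases in sentences:
        matrix = sentence_matrix(phrases)
        for row in range(len(PHRASE_ROLES)):
            for col in range(len(CONCEPTS)):
                total[row][col] += matrix[row][col]
    # Row-major flattening gives one fixed-length vector per document.
    return [value for row in total for value in row]


# One pre-parsed sentence: "futbolcu maç kazandı" (the footballer won the match).
example = {"subject": ["futbolcu"], "object": ["maç"], "predicate": ["kazandı"]}
features = document_features([example, example])
```

A flattened vector of this kind could be appended to an existing document feature set before it is passed to a classifier, which is one reading of "adding the representation to the data"; the classifier and the baseline features used in the paper are not stated in the abstract.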