Turkish Document Classification with Coarse-Grained Semantic Matrix


Donmez I., Adali E.

17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Konya, Türkiye, 3-9 April 2016, vol. 9624, pp. 472-484

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume: 9624
  • DOI: 10.1007/978-3-319-75487-1_37
  • City of Publication: Konya
  • Country of Publication: Türkiye
  • Pages: pp. 472-484
  • Affiliated with Istanbul Technical University: Yes

Abstract

In this paper, we present a novel method for document classification that uses a semantic matrix representation of Turkish sentences, concentrating on the phrases in each sentence and the concepts they express. Our model is designed to find the phrases in a sentence, identify their relations to specific concepts, and represent each sentence as a coarse-grained semantic matrix. Predicate features and semantic class types are also added to this representation. The highest success rate reported for Turkish document classification, 97.12, is obtained by adding the coarse-grained semantic matrix representation to the data that produced the best result in previous studies on Turkish document classification.
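
The idea of representing a sentence as a matrix of phrases against concepts can be illustrated with a minimal sketch. It is not the authors' implementation: the phrase roles, the concept inventory, the binary cell values, and the function names `sentence_matrix` and `document_features` are all assumptions made for illustration, and the sentences are assumed to arrive already chunked into (phrase role, concept) pairs.

```python
from typing import List, Tuple

# Hypothetical inventories: the abstract does not list the actual
# phrase types or concepts, so these are placeholders.
PHRASE_ROLES = ["subject", "object", "adverbial", "predicate"]
CONCEPTS = ["person", "location", "time", "organization"]

# A phrase is assumed to be pre-annotated as (phrase role, concept label).
Phrase = Tuple[str, str]


def sentence_matrix(phrases: List[Phrase]) -> List[List[int]]:
    """Build a roles-by-concepts binary matrix for one sentence."""
    matrix = [[0] * len(CONCEPTS) for _ in PHRASE_ROLES]
    for role, concept in phrases:
        # Skip annotations outside the assumed inventories.
        if role in PHRASE_ROLES and concept in CONCEPTS:
            matrix[PHRASE_ROLES.index(role)][CONCEPTS.index(concept)] = 1
    return matrix


def document_features(sentences: List[List[Phrase]]) -> List[int]:
    """Sum the flattened sentence matrices into one document vector."""
    totals = [0] * (len(PHRASE_ROLES) * len(CONCEPTS))
    for phrases in sentences:
        flat = [cell for row in sentence_matrix(phrases) for cell in row]
        totals = [t + c for t, c in zip(totals, flat)]
    return totals
```

Under these assumptions, the resulting document vector could be appended to an existing feature set before training a classifier, which mirrors the abstract's description of adding the matrix representation to previously used data.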