Annotation of semantic roles for the Turkish Proposition Bank


Sahin G. G., Adali E.

LANGUAGE RESOURCES AND EVALUATION, cilt.52, sa.3, ss.673-706, 2018 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 52 Sayı: 3
  • Basım Tarihi: 2018
  • Doi Numarası: 10.1007/s10579-017-9390-y
  • Dergi Adı: LANGUAGE RESOURCES AND EVALUATION
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Sayfa Sayıları: ss.673-706
  • İstanbul Teknik Üniversitesi Adresli: Evet

Özet

In this work, we report large-scale semantic role annotation of arguments in the Turkish dependency treebank, and present the first comprehensive Turkish semantic role labeling (SRL) resource: Turkish Proposition Bank (PropBank). We present our annotation workflow that harnesses crowd intelligence, and discuss the procedures for ensuring annotation consistency and quality control. Our discussion focuses on syntactic variations in realization of predicate-argument structures, and the large lexicon problem caused by complex derivational morphology. We describe our approach that exploits framesets of root verbs to abstract away from syntax and increase self-consistency of the Turkish PropBank. The issues that arise in the annotation of verbs derived via valency changing morphemes, verbal nominals, and nominal verbs are explored, and evaluation results for inter-annotator agreement are provided. Furthermore, semantic layer described here is aligned with universal dependency (UD) compliant treebank and released to enable more researchers to work on the problem. Finally, we use PropBank to establish a baseline score of 79.10 F1 for Turkish SRL using the mate-tool (an open-source SRL tool based on supervised machine learning) enhanced with basic morphological features. Turkish PropBank and the extended SRL system are made publicly available.