Annotation of Financial Entities Using a Comprehensive Scheme in Turkish (Turkish title: Türkçe'de Finansal Varlık İsimlerinin Kapsamlı Tanımlanması ve İşaretlenmesi)


Adali K., Tantuğ A. C.

30th Signal Processing and Communications Applications Conference, SIU 2022, Safranbolu, Türkiye, 15-18 May 2022

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/siu55565.2022.9864782
  • City of Publication: Safranbolu
  • Country of Publication: Türkiye
  • Keywords: Annotation, annotation scheme, financial information extraction, language resource, named entity recognition
  • Affiliated with İstanbul Teknik Üniversitesi (Istanbul Technical University): Yes

Abstract

© 2022 IEEE. Information extraction (IE), the task of turning text into structured form, is also employed in the finance domain to extract information that is highly important for financial concepts such as markets, stocks, and indices. As in many other Natural Language Processing (NLP) applications, annotated corpora containing entities that represent the characteristics of the domain are essential resources for training and evaluating IE models. Unfortunately, creating these resources is difficult, so the scarcity of annotated language resources is one of the most prominent problems for lesser-studied languages such as Turkish. In this paper, we present an ontology of financial concepts and an effort to produce a high-quality Turkish corpus of 500 news documents annotated with these concepts. We use the dataset to train a baseline entity recognition model, which achieves an F-score of 64.5% on the dataset.