TFEEC: Turkish Financial Event Extraction Corpus


Kaynak K. Ş., Tantuğ A. C.

19th International Symposium on Distributed Computing and Artificial Intelligence, DCAI 2022, L'Aquila, Italy, 13 - 15 July 2022, vol. 585 LNNS, pp.49-58

  • Publication Type: Conference Paper / Full Text
  • Volume: 585 LNNS
  • DOI: 10.1007/978-3-031-23210-7_5
  • City: L'Aquila
  • Country: Italy
  • Page Numbers: pp.49-58
  • Keywords: Active learning, Corpus generation, Event extraction, Semi-supervised, Weak supervision
  • Istanbul Technical University Affiliated: Yes

Abstract

Event extraction from news is essential for making accurate financial decisions, and it has therefore long been studied in many languages. However, to the best of our knowledge, no study has been conducted on Turkish financial and economic text mining. To fill this gap, we created an ontology and present a well-defined, high-quality, company-specific event corpus of Turkish economic and financial news. Using our dataset, we conducted a preliminary evaluation of an event extraction model that serves as a baseline for future work. Most event extraction approaches rely on machine learning and require large amounts of labeled data, yet building a training corpus with manually annotated events is a time-consuming and labor-intensive process. To address this problem, we applied active learning and weak supervision to reduce human annotation effort and to automatically produce additional labeled data without degrading model performance. Experiments on our dataset show that both methods are beneficial. Furthermore, training the model on the manually annotated dataset combined with the automatically labeled dataset increased performance by 2.91% for event classification and 13.76% for argument classification.