From Statistical to Deep Learning Models: A Comparative Sentiment Analysis Over Commodity News

Sivri M. S., Korkmaz B. S., Üstündağ A.

International Conference on Intelligent and Fuzzy Systems, INFUS 2021, İstanbul, Turkey, 24 - 26 August 2021, vol.308, pp.155-162

  • Publication Type: Conference Paper / Full Text
  • Volume: 308
  • DOI Number: 10.1007/978-3-030-85577-2_18
  • City: İstanbul
  • Country: Turkey
  • Page Numbers: pp.155-162
  • Keywords: Commodity news analysis, Financial sentiment analysis, Natural language processing, Sentiment analysis
  • Istanbul Technical University Affiliated: Yes


© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Sentiment analysis of news and social media posts is a growing research area, driven by advances in natural language processing and deep learning techniques. Various studies have addressed extracting sentiment scores from news and other sources for specific stocks or stock indices, but there is still a lack of sentiment analysis on more specialized topics such as commodity news. In this paper, natural language processing techniques ranging from statistical methods to deep learning-based methods were applied to commodity news. First, dictionary-based methods were investigated using the dictionaries most common in financial sentiment analysis, such as the Loughran & McDonald and Harvard dictionaries. Then, statistical models were applied to the commodity news using count vectorizer and TF-IDF features. The compression-based normalized compression distance (NCD) was also tested on the labeled data. To improve sentiment extraction, the news data was processed with state-of-the-art deep learning-based models such as ULMFiT, Flair, Word2Vec, XLNet, and BERT. A comprehensive comparison of all tested models was carried out. The final analysis revealed the performance difference between the deep learning-based and statistical models for sentiment analysis of commodity news, and BERT achieved the best performance among the deep learning models on the given data.
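The abstract does not say how NCD is turned into a sentiment classifier. The sketch below is one possible setup, assuming zlib as the compressor and a 1-nearest-neighbor rule over labeled headlines. Neither choice is confirmed by the paper. It computes NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length in bytes, and assigns a new headline the label of its closest training example:

```python
import zlib

def compressed_len(text: str) -> int:
    """Byte length of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: small for similar texts, close to 1 for unrelated ones."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + " " + y)  # compress the concatenation of both texts
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(text: str, labeled: list[tuple[str, str]]) -> str:
    """1-nearest-neighbor: return the label of the training text with the smallest NCD."""
    _, best_label = min(labeled, key=lambda pair: ncd(text, pair[0]))
    return best_label

# Hypothetical labeled headlines (illustrative only, not from the paper's dataset).
labeled = [
    ("gold prices rise as demand surges", "positive"),
    ("oil prices fall on weak demand", "negative"),
]
print(classify("gold prices rise as demand surges again", labeled))
```

The headlines, labels, and the space used to join x and y are illustrative assumptions. On very short texts, zlib's fixed header and checksum inflate every C(.), so NCD values are best read relative to one another rather than as absolute similarities.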