The Problem of Pos Tagging and Stemming for Agglutinative Languages (Turkish, Uyghur, Uzbek Languages)


Boltayevich E. B., Adali E., Mirdjonovna K. S., Xolmo'minovna A. O., Yuldashevna X. Z., Nizomaddin Uktamboy O'G'li X.

8th International Conference on Computer Science and Engineering, UBMK 2023, Burdur, Turkey, 13-15 September 2023, pp.57-62

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/ubmk59864.2023.10286792
  • City: Burdur
  • Country: Turkey
  • Page Numbers: pp.57-62
  • Keywords: information retrieval, IR, part-of-speech, POS tagging, stemming, stemming algorithms
  • Istanbul Technical University Affiliated: Yes

Abstract

The number of possible word forms in agglutinative languages is theoretically unlimited, which makes part-of-speech (POS) tagging of out-of-vocabulary (OOV) words a persistent problem. In agglutinative languages, words are formed by attaching suffixes to a stem. Because phonetic harmony and disharmony occur as suffixes are attached, both phonetic and morphological changes must be analyzed. Many natural language processing (NLP) tasks require reducing word forms to their stem, a process called stemming: all inflectional affixes are removed from a word and the remaining part is lemmatized. Stemming is considered one of the important NLP tasks and is particularly important in information retrieval (IR) systems.
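To make the role of phonetic harmony in stemming concrete, here is a minimal Python sketch for Turkish only. It assumes a hypothetical, deliberately tiny suffix table (plural -lar/-ler, locative -da/-de/-ta/-te, ablative -dan/-den/-tan/-ten) and lowercase input. Suffixes are stripped from the right, and a split is accepted only if the suffix vowel agrees with the last vowel of the remaining stem (two-way vowel harmony) and its initial d/t agrees with whether the stem ends in a voiceless consonant. The function names `stem`, `harmony_ok` and `consonant_ok` are illustrative; this is a toy, not the method proposed in the paper.

```python
# Toy Turkish suffix stripper illustrating why harmony must be checked.
# ASSUMPTION: the suffix table is a hypothetical, tiny subset of Turkish
# inflection; input is assumed to be lowercase already.
from typing import Optional

BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOWELS = BACK_VOWELS | FRONT_VOWELS
VOICELESS = set("çfhkpsşt")  # after these, d-initial suffixes surface as t-

# suffix -> vowel class the stem's last vowel must have (two-way harmony)
SUFFIXES = {
    "lar": "back", "ler": "front",   # plural
    "dan": "back", "den": "front",   # ablative
    "tan": "back", "ten": "front",   # ablative after a voiceless consonant
    "da": "back", "de": "front",     # locative
    "ta": "back", "te": "front",     # locative after a voiceless consonant
}
# Try longer suffixes first, a common convention in suffix strippers.
ORDERED_SUFFIXES = sorted(SUFFIXES, key=len, reverse=True)


def last_vowel(word: str) -> Optional[str]:
    """Return the rightmost vowel of `word`, or None if it has none."""
    for ch in reversed(word):
        if ch in VOWELS:
            return ch
    return None


def harmony_ok(stem: str, suffix: str) -> bool:
    """The suffix's vowel class must match the stem's last vowel (back/front)."""
    v = last_vowel(stem)
    if v is None:
        return False
    return SUFFIXES[suffix] == ("back" if v in BACK_VOWELS else "front")


def consonant_ok(stem: str, suffix: str) -> bool:
    """A d-/t-initial suffix takes t after a voiceless consonant, d otherwise."""
    if suffix[0] not in "dt":
        return True
    return (suffix[0] == "t") == (stem[-1] in VOICELESS)


def stem(word: str, min_stem_len: int = 2) -> str:
    """Repeatedly strip the rightmost suffix whose split passes both checks."""
    changed = True
    while changed:
        changed = False
        for suffix in ORDERED_SUFFIXES:
            if not word.endswith(suffix):
                continue
            candidate = word[: -len(suffix)]
            if (len(candidate) >= min_stem_len
                    and harmony_ok(candidate, suffix)
                    and consonant_ok(candidate, suffix)):
                word = candidate
                changed = True
                break  # restart from the longest suffixes on the shorter word
    return word


if __name__ == "__main__":
    for w in ["kitaplardan", "evlerde", "ağaçta", "kitapda"]:
        print(w, "->", stem(w))
```

With this table, *kitaplardan* reduces to *kitap*, *evlerde* to *ev* and *ağaçta* to *ağaç*, while the ill-formed *kitapda* is left unchanged because the consonant check rejects the split. A practical stemmer would additionally need possessive, derivational and verbal suffixes and stem alternations such as *kitap* → *kitabı*.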