The Feasibility Analysis of Re-ranking for N-Best Lists on English-Turkish Machine Translation


Yildirim E., Tantuğ A. C.

IEEE International Symposium on INnovations in Intelligent SysTems and Applications (INISTA), Bulgaria, 19 - 21 June 2013

  • Publication Type: Conference Paper / Full-Text Paper
  • Country of Publication: Bulgaria
  • Affiliated with Istanbul Technical University: Yes

Abstract

In this paper, we present the results of re-ranking N-best lists in machine translation. The main purpose of this research is to determine the upper bound on the MT improvement that can be gained by reordering candidate translations. We use the Google Translate Research API (1) as our Statistical Machine Translation (SMT) system to obtain N-best lists of candidate Turkish translations for a given English sentence. We evaluate the effect of reordering using three simple methods: unigram count (UC), unigram ratio (UR), and first four characters match (FFCM). We collected 720 sentences as input to the SMT system and used three different sets of Turkish translations of these sentences to evaluate our work on the N-best lists. The success of re-ranking is measured with the BLEU metric. In addition, a more detailed analysis, which is especially necessary for agglutinative languages (e.g. Turkish, Czech, Hungarian, and Finnish), is performed with the BLEU+ MT scoring tool. We observe an improvement in BLEU score from 31.71 for the baseline system to 35.46 for the model re-ranked using UR, a relative improvement of about 11.81%.
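The abstract names the three scoring methods (UC, UR, FFCM) without defining them. The sketch below shows one plausible reading, not the authors' implementation. It assumes that each candidate is scored against the available Turkish translations (which fits the stated goal of finding an upper bound), that tokens are split on whitespace, that UC counts clipped unigram matches, that UR divides that count by the candidate length, and that FFCM matches tokens on their first four characters. All function names are hypothetical.

```python
from collections import Counter


def unigram_count(candidate: str, reference: str) -> int:
    """Assumed UC: clipped count of candidate tokens that occur in the reference."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    return sum(min(n, ref[tok]) for tok, n in cand.items())


def unigram_ratio(candidate: str, reference: str) -> float:
    """Assumed UR: UC divided by the number of candidate tokens."""
    tokens = candidate.split()
    if not tokens:
        return 0.0
    return unigram_count(candidate, reference) / len(tokens)


def ffcm_count(candidate: str, reference: str, k: int = 4) -> int:
    """Assumed FFCM: like UC, but tokens match if their first k characters agree."""
    cand = Counter(tok[:k] for tok in candidate.split())
    ref = Counter(tok[:k] for tok in reference.split())
    return sum(min(n, ref[prefix]) for prefix, n in cand.items())


def rerank(nbest, references, score):
    """Sort candidates best-first, using each candidate's best score over all references.

    sorted() is stable, so tied candidates keep their original N-best order.
    """
    return sorted(
        nbest,
        key=lambda cand: max(score(cand, ref) for ref in references),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper.
    nbest = ["ben eve gidiyorum", "ben evde gidiyor", "eve gidiyorum"]
    references = ["eve gidiyorum"]
    for cand in rerank(nbest, references, unigram_ratio):
        print(cand)
```

Prefix matching is a cheap stand-in for stemming: in the toy data, "gidiyor" and "gidiyorum" share the prefix "gidi", so FFCM credits a match that exact unigram comparison misses. This is the kind of partial credit that matters for a morphologically rich language such as Turkish.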