Evaluation of Wizard-of-Oz and self-play data collection techniques for Turkish goal-oriented dialogue agents


Arslan D., Eryiğit G.

2021 International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2021, Kocaeli, Turkey, 25 - 27 August 2021

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/inista52262.2021.9548636
  • City: Kocaeli
  • Country: Turkey
  • Keywords: Goal-oriented dialogue agent, Self-play, Wizard-of-Oz

Abstract

© 2021 IEEE. As with all natural language processing tasks, the lack of open-source training data needed to develop dialogue agents is a major obstacle to research in the field. Languages that are not widely studied, such as Turkish, suffer from this problem even more. This article compares the Wizard-of-Oz and self-play data collection techniques for building Turkish goal-oriented dialogue systems. Three data sets have been prepared with these techniques and made available to researchers. Although still open to further development, the created resources from the restaurant domain are the first publicly available human-to-human Turkish dialogue data sets and are therefore valuable for further research on Turkish dialogue systems. The two techniques are compared quantitatively on the produced data sets in terms of dialogue act classification and slot identification scores. Since collecting data in every domain with methods like Wizard-of-Oz is costly, an open-source, flexible, and easy-to-use framework implementing self-play is also provided; it may be used to create machine-to-machine dialogue outlines and speed up data collection for low-resource languages like Turkish. In addition, the annotation screen templates designed for crowdsourcing are provided for future studies.