Sponsor
Portland State University. Department of Computer Science
First Advisor
Ameeta Agrawal
Term of Graduation
Spring 2024
Date of Publication
5-29-2024
Document Type
Thesis
Degree Name
Master of Science (M.S.) in Computer Science
Department
Computer Science
Language
English
Subjects
Conversation Models, Large Language Model, Metrics, Multilinguality, Natural Language Processing
Physical Description
1 online resource (ix, 71 pages)
Abstract
Expansive use of large language models (LLMs) as dialogue systems brings increased importance to the evaluation of the responses they generate. Although evaluation of qualities such as coherence and fluency is readily possible with well-established automatic metrics, engagingness is often measured with human evaluation -- a process that can be costly and slows the pace of development. Existing automatic metrics for engagingness have low to moderate correlation with human annotations, evaluate the response without the conversation history, are complicated to implement, or are designed for a specific dataset. Moreover, they have been tested exclusively on English conversations. Given that dialogue systems are increasingly available in languages beyond English, it is important to evaluate systems in more than one language. We propose that LLMs may be used for evaluation of engagingness in dialogue through prompting, and ask how prompt constructs compare in a multilingual setting. Our results give a prompt design taxonomy and an indication of which strategies are the most effective. We find that using selected prompt constructs, including our comprehensive definition of engagingness, gives state-of-the-art performance on evaluation of engagingness in dialogue across multiple languages. We conclude that LLMs can be used for evaluation of engagingness in multiple languages through prompting alone.
Rights
© 2024 Amila Ferron
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Persistent Identifier
https://archives.pdx.edu/ds/psu/42359
Recommended Citation
Ferron, Amila, "Automatic Measurement of Dialogue Engagingness in Multilingual Settings" (2024). Dissertations and Theses. Paper 6659.