First Advisor

Ameeta Agrawal

Term of Graduation

Spring 2024

Date of Publication

5-29-2024

Document Type

Thesis

Degree Name

Master of Science (M.S.) in Computer Science

Department

Computer Science

Language

English

Subjects

Conversation Models, Large Language Model, Metrics, Multilinguality, Natural Language Processing

Physical Description

1 online resource (ix, 71 pages)

Abstract

Expansive use of large language models (LLMs) as dialogue systems brings increased importance to the evaluation of the responses they generate. Although evaluation of qualities such as coherence and fluency is readily possible with well-established automatic metrics, engagingness is often measured with human evaluation, a process that can be costly and slows the pace of development. Existing automatic metrics for engagingness have low to moderate correlation with human annotations, evaluate the response without the conversation history, are complicated to implement, or are designed for a specific dataset. Moreover, they have been tested exclusively on English conversations. Given that dialogue systems are increasingly available in languages beyond English, it is important to evaluate systems in more than one language. We propose that LLMs may be used for evaluation of engagingness in dialogue through prompting, and ask how prompt constructs compare in a multilingual setting. Our results give a prompt design taxonomy and an indication of which strategies are the most effective. We find that using selected prompt constructs, including our comprehensive definition of engagingness, gives state-of-the-art performance on evaluation of engagingness in dialogue across multiple languages. We conclude that LLMs can be used for evaluation of engagingness in multiple languages through prompting alone.

Rights

© 2024 Amila Ferron

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/

This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

Persistent Identifier

https://archives.pdx.edu/ds/psu/42359
