First Advisor

Ameeta Agrawal

Term of Graduation

Spring 2026

Date of Publication

6-4-2026

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.) in Computer Science

Department

Computer Science

Language

English

Subjects

bias, fairness, summarization

Physical Description

1 online resource (xvi, 123 pages)

Abstract

Text summarization models have typically focused on optimizing aspects of quality such as fluency, relevance, and coherence, particularly in the context of news articles. However, summarization models are increasingly being used to summarize diverse sources of text, such as social media data, that encompass a wide demographic user base. It is thus crucial to improve not only the quality of the generated summaries, but also the extent to which they can fairly represent the opinions of diverse groups.

First, we introduce DivSumm, a novel dataset of dialect-diverse tweets with human-written extractive and abstractive summaries, and propose three cluster-based approaches for generating fairer summaries. Our results show that cluster-based preprocessing approaches improve the quality of system-generated summaries without loss in diversity.

Second, we investigate in depth the phenomenon of position bias by analyzing the effect of group ordering in input documents when summarizing tweets from diverse groups. Our results highlight significant position bias, with the models preferring content at the beginning of the input, and motivate the need to incorporate randomized shuffling in multi-document summarization datasets, particularly when summarizing documents from diverse groups.

Third, we propose a fairness metric, FairSummEval, to estimate the fairness of generated summaries from diverse social groups. The results of extensive experiments demonstrate that our metric outperforms other known metrics that have been used in measuring the fairness of abstractive summaries.

Lastly, we introduce ThreadSumm, a novel multi-stage LLM pipeline framework for nested discourse summarization. Together, these contributions aim to provide a new dataset, new methods, and a new metric for fair summarization of text data from diverse social groups without sacrificing textual quality.

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

Persistent Identifier

https://archives.pdx.edu/ds/psu/44914

Recommended Citation

Olabisi, Olubusayo, "Balancing Fairness and Quality in Automatic Text Summarization" (2026). Dissertations and Theses. Paper 7109.

Download

Available for download on Friday, June 04, 2027

Dissertations and Theses

Balancing Fairness and Quality in Automatic Text Summarization
