First Advisor
Ameeta Agrawal
Term of Graduation
January 2026
Date of Publication
6-1-2026
Document Type
Dissertation
Language
English
Subjects
bias, fairness, summarization
Physical Description
1 online resource ( pages)
Abstract
Text summarization models have typically focused on optimizing aspects of quality such as fluency, relevance, and coherence, particularly in the context of news articles. However, summarization models are increasingly being used to summarize diverse sources of text, such as social media data, that encompass a wide demographic user base. It is thus crucial to improve not only the quality of the generated summaries, but also the extent to which they can fairly represent the opinions of diverse groups.
First, we introduce a novel dataset, DivSumm, of dialect-diverse tweets and human-written extractive and abstractive summaries, and propose three cluster-based approaches for generating fairer summaries. Our results show that cluster-based preprocessing approaches improve the quality of system-generated summaries without a loss in diversity.
Second, we investigate in depth the phenomenon of position bias by analyzing the effect of group ordering in input documents when summarizing tweets from diverse groups. Our results highlight significant position bias, with the models preferring content at the beginning of the input, and motivate the need to incorporate randomized shuffling in multi-document summarization datasets, particularly when summarizing documents from diverse groups.
Third, we propose a fairness metric, FairSummEval, to estimate the fairness of generated summaries from diverse social groups. The results of extensive experiments demonstrate that our metric outperforms other known metrics that have been used in measuring the fairness of abstractive summaries.
Lastly, we introduce ThreadSumm, a novel multi-stage LLM pipeline for nested discourse summarization.
These contributions aim to provide a new dataset, methods, and a metric for fair summarization of text data from diverse social groups without sacrificing textual quality.
Rights
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Recommended Citation
Olabisi, Olubusayo, "Balancing Fairness and Quality in Automatic Text Summarization" (2026). Dissertations and Theses. Paper 7109.