Published In

Heliyon

Document Type

Article

Publication Date

5-29-2025

Subjects

Machine learning -- Development

Abstract

The widespread use of social media highlights the need to understand its impact, particularly the role of online social support. In this study, we present a dataset of YouTube comments that initially comprised 66,272 entries and was refined to 42,695, from which a subset of 10,000 comments was selected for detailed analysis without additional filtering. The dataset is annotated for three classification tasks: (1) distinguishing supportive from non-supportive comments, (2) determining whether the support is directed at an individual or a group, and (3) further categorizing group support into six subtypes (Nation, LGBTQ, Black Community, Women, Religion, and Other). To address class imbalance in these tasks, we used K-means clustering to balance the dataset and compared the results with those obtained on the original unbalanced data. We evaluate state-of-the-art transformer models and zero-shot learning techniques, including GPT-3, GPT-4, and GPT-4o. Our approach, evaluated using macro F1-scores, performs strongly on the imbalanced dataset, with our transformer-based model (roberta-base) achieving scores of 0.78, 0.84, and 0.80 on Tasks 1, 2, and 3, respectively. These results also represent a macro F1-score improvement of 0.2% for Task 2 and 0.7% for Task 3 over previous work that used a CNN with GloVe embeddings and traditional machine learning baselines based on TF-IDF and LIWC features.
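The abstract reports macro F1-scores, which average the per-class F1 without weighting by class size, so a minority class such as supportive comments counts as much as the majority class. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `macro_f1` and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is never hit.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels: 2 supportive and 8 non-supportive comments,
# with a classifier that always predicts the majority class.
y_true = ["supportive"] * 2 + ["non-supportive"] * 8
y_pred = ["non-supportive"] * 10
score = macro_f1(y_true, y_pred)
```

In this toy case accuracy is 0.8, yet the supportive class contributes an F1 of 0, which pulls the macro average well below the accuracy. That sensitivity to minority classes is the usual reason for choosing macro F1 on imbalanced data.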

Rights

Copyright (c) 2025 The Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.

Locate the Document

https://doi.org/10.1016/j.heliyon.2025.e43437

DOI

10.1016/j.heliyon.2025.e43437

Persistent Identifier

https://archives.pdx.edu/ds/psu/43683

Publisher

Elsevier BV
