Sponsor
The work was done with partial support from the Mexican Government through grant A1-S-47854 of CONACYT, Mexico, and grants 20241816, 20241819, and 20240951 of the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The authors thank CONACYT for the computing resources provided through the Plataforma de Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE, Mexico, and acknowledge the support of Microsoft through the Microsoft Latin America PhD Award.
Published In
Heliyon
Document Type
Article
Publication Date
5-29-2025
Subjects
Machine learning -- Development
Abstract
The widespread use of social media highlights the need to understand its impact, particularly the role of online social support. In this study, we present a dataset of YouTube comments that initially comprised 66,272 entries and was refined to 42,695, from which a subset of 10,000 comments was selected for detailed analysis without additional filtering. The dataset is annotated for three classification tasks: (1) distinguishing supportive from non-supportive comments, (2) determining whether the support is directed at an individual or a group, and (3) further categorizing group support into six subtypes (Nation, LGBTQ, Black Community, Women, Religion, and Other). To address class imbalance in these tasks, we employed K-means clustering to balance the dataset and compared the results with those obtained on the original imbalanced data. We use state-of-the-art transformer models and zero-shot learning techniques, including GPT-3, GPT-4, and GPT-4o. Our approach, evaluated using macro F1-scores, demonstrates strong performance on the imbalanced dataset, with our transformer-based model (roberta-base) achieving scores of 0.78, 0.84, and 0.80 across the three classification tasks, respectively. These results also represent macro F1-score improvements of 0.2% for Task 2 and 0.7% for Task 3 over previous work that used a CNN with GloVe embeddings and traditional machine learning baselines based on TF-IDF and LIWC features.
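The abstract names two techniques without detailing them: K-means-based balancing and macro F1 evaluation. The Python sketch below illustrates one common way to implement each, under stated assumptions. It assumes comments have already been converted into numeric feature vectors (the toy 2-D points stand in for real embeddings), and it assumes "balancing" means undersampling every larger class down to the minority-class size by keeping, for each K-means centroid, the nearest real sample. The paper's exact balancing procedure may differ, and the function names `kmeans`, `kmeans_undersample`, and `macro_f1` are hypothetical. Macro F1 is computed as the unweighted mean of per-class F1 scores.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Recompute centroids as cluster means; keep old centroid if empty.
        new = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new == centroids:  # converged
            break
        centroids = new
    return centroids


def kmeans_undersample(X, y, seed=0):
    """Hypothetical balancing step (assumption, not the paper's confirmed method):
    shrink each class to the minority-class size by clustering it with k-means
    and keeping the real sample nearest to each centroid."""
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in sorted(counts):
        members = [x for x, lab in zip(X, y) if lab == label]
        if len(members) == target:
            chosen = members
        else:
            chosen, remaining = [], list(members)
            for c in kmeans(members, target, seed=seed):
                nearest = min(remaining, key=lambda p: math.dist(p, c))
                remaining.remove(nearest)  # never pick the same sample twice
                chosen.append(nearest)
        X_out.extend(chosen)
        y_out.extend([label] * len(chosen))
    return X_out, y_out


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (0.0 when a ratio is undefined)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


# Toy data: "N" (non-supportive) is the majority class, "S" (supportive) the minority.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
     [9.0, 0.0], [9.2, 0.1]]
y = ["N"] * 6 + ["S"] * 2

X_bal, y_bal = kmeans_undersample(X, y)
print(dict(Counter(y_bal)))  # {'N': 2, 'S': 2}
print(round(macro_f1(["S", "S", "N", "N"], ["S", "N", "N", "N"]), 4))  # 0.7333
```

Keeping real samples nearest to each centroid, rather than the centroids themselves, means the balanced set contains only genuine comments, which matters when the downstream model (such as roberta-base) consumes raw text rather than vectors. Because macro F1 weights each class equally, a classifier that neglects a rare subtype, such as one of the six group-support categories, is penalized even when overall accuracy stays high.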
Rights
Copyright (c) 2025 The Authors. This work is licensed under a Creative Commons Attribution 4.0 International License.
Locate the Document
https://doi.org/10.1016/j.heliyon.2025.e43437
DOI
10.1016/j.heliyon.2025.e43437
Persistent Identifier
https://archives.pdx.edu/ds/psu/43683
Publisher
Elsevier BV
Citation Details
Kolesnikova, O., Shahiki Tash, M., Ahani, Z., Agrawal, A., Monroy, R., & Sidorov, G. (2025). Advanced machine learning techniques for social support detection on social media. Heliyon, 11(10), e43437.