Sponsor
This work was supported by the National Science Foundation grant number 2244271 and the Massive Data Institute at Georgetown University.
Published In
EPJ Data Science
Document Type
Article
Publication Date
10-19-2025
Subjects
Computer science, Information storage and retrieval systems
Abstract
Digital trace data play an important role in determining where and when people will move during migration crises because of their detailed temporal and spatial granularity. Yet, identifying variables that reliably serve as early indicators of movement remains a challenging task. Within this context, we conduct an in-depth analysis of two types of variables that can be constructed from social media data: sentiment and emotion. Sentiment is conceptually broad and easier to detect from social media posts, while emotion is conceptually nuanced and more difficult to determine. We investigate the potential of both sentiment and emotion of Twitter/X posts as indirect indicators of border crossings from Ukraine by comparing sentiment and emotion to one another at different temporal scales (daily, weekly, monthly), and relating each of them to border crossing counts using a lead-lag analysis. We find that sentiment is a better early warning indicator across temporal scales. We then extend our analysis to consider two other displacement case studies, Sudan and Venezuela, to see how transferable these results are. Finally, there are different approaches for measuring sentiment and emotion, each with different levels of explainability and computational cost (lexicon-based, classic machine learning, and deep learning). We consider all these variations in the three languages associated with our case studies and find that deep learning using Pretrained Language Models, such as mBERT and BETO, performs significantly better than more interpretable approaches, despite relatively small training data sets. In summary, we conclude that migration scholars generally fare better with “simple but broad” as opposed to “nuanced but complex” signals. We also find that, as the temporal resolution decreases, these types of signals are not well correlated and therefore may be less useful for determining when people will move during times of crisis.
Rights
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Locate the Document
https://doi.org/10.1140/epjds/s13688-025-00587-1
DOI
10.1140/epjds/s13688-025-00587-1
Persistent Identifier
https://archives.pdx.edu/ds/psu/44193
Citation Details
Marahrens, H., Agrawal, A., Arab, A., Donato, K., Liu, Y., Wycoff, N., Ahmed, M., Hwang, C., Laghzaoui, L., Liggio, K., Medeiros, B., Park, J., Pihlstrom, R., Salamon, E., Whitlow, M., & Singh, L. (2025). Understanding the role of sentiment and emotion for predicting forced displacement. EPJ Data Science, 14(1).