All Translation Tools Are Not Equal: Investigating the Quality of Language Translation for Forced Migration

Published In

2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA)


As the volume and complexity of forced movement continue to grow, there is an urgent need to use new data sources to better understand emerging crises. Organic sources, such as social media and newspapers, can offer insights in near real time when administrative data are unavailable for timely and detailed analysis. However, to switch flexibly between contexts, we need the ability to contextualize the drivers of movement across different locations and languages. Recent advances in natural language processing, and specifically in neural machine translation, have shown impressive results on standard benchmark datasets for well-studied language pairs. Less is known, however, about how effective these models are in real-world scenarios. To advance our understanding of real-world, contextual translation, we systematically study the performance of multiple widely used off-the-shelf machine translation tools on words associated with drivers of forced movement in both high- and low-resource languages. Our empirical results suggest significant variation in the accuracy and efficiency of these tools, highlighting a problem that must be faced by those conducting migration research in multilingual contexts. We conclude by suggesting strategies for obtaining reasonable translations from off-the-shelf language tools.