Published In
ACM International Conference Proceeding Series
Document Type
Article
Publication Date
7-24-2025
Subjects
Code-mixing -- Computer Science
Abstract
Code-mixing, a linguistic phenomenon in which multiple languages are blended within a single text, has become increasingly prevalent in multilingual societies, particularly in digital communication. The CoLI-Dravidian shared task, organized as part of the Forum for Information Retrieval and Evaluation (FIRE) 2024, invited researchers to develop models capable of classifying words in code-mixed texts involving Dravidian languages (Tamil, Kannada, Malayalam, and Tulu) interwoven with English. The task presents significant challenges due to the complexity of linguistic structures, mixed-language tokens, and dialectal variations, especially in low-resource languages like those in the Dravidian family. The participating teams employed various methodologies, including traditional Machine Learning (ML), Deep Learning (DL), and transformer-based models, to tackle these challenges. This paper presents the key findings of the task, the baselines, and an overview of the submitted methodologies. The top-performing models achieved macro F1 scores ranging from 0.7656 for Tamil to 0.9293 for Kannada, demonstrating that advanced computational techniques can process these complex multilingual texts effectively.
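For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro F1 score can be computed for word-level language tags. It is an illustration only, not the task's official scorer: the function name `macro_f1`, the tag names (`kn`, `en`, `mixed`), and the example sentence are all assumptions made here, not taken from the shared task data.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was wrong
            fn[g] += 1  # missed the true label g
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical word-level tags for one code-mixed sentence (not task data).
gold = ["kn", "en", "kn", "mixed", "en", "kn"]
pred = ["kn", "en", "en", "mixed", "en", "kn"]
score = macro_f1(gold, pred)
print(round(score, 4))
```

Because every label contributes equally to the average regardless of how many words carry it, macro F1 penalizes systems that do well only on frequent tags, which is one reason it is commonly used for imbalanced label sets.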
Rights
Copyright (c) 2025 The Authors. This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI
10.1145/3734947.3735663
Persistent Identifier
https://archives.pdx.edu/ds/psu/44048
Citation Details
Hegde, A., Balouchzahi, F., Butt, S., Coelho, S., G, K., Kumar, H. S., D, S., H. L., S., & Agrawal, A. (2024). CoLI@FIRE2024: Findings of Word-level Code-Mixed Language Identification in Dravidian Languages. Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval Evaluation, 7–10.