Published In

ACM International Conference Proceeding Series

Document Type

Article

Publication Date

7-24-2025

Subjects

Code-mixing -- Computer Science

Abstract

Code-mixing, a linguistic phenomenon in which multiple languages are blended within a single text, has become increasingly prevalent in multilingual societies, particularly in digital communication. Processing such text is difficult, and the CoLI-Dravidian shared task, organized as part of the Forum for Information Retrieval and Evaluation (FIRE) 2024, addressed this challenge by inviting researchers to develop models that classify words in code-mixed texts in which Dravidian languages (Tamil, Kannada, Malayalam, and Tulu) are interwoven with English. The task is demanding because of complex linguistic structures, mixed-language tokens, and dialectal variation, compounded by the low-resource status of the Dravidian languages. Participating teams employed a range of methods, including traditional Machine Learning (ML), Deep Learning (DL), and transformer-based models. This paper presents the key findings of the task, its baselines, and an overview of the submitted methodologies. The top-performing models achieved macro F1 scores ranging from 0.7656 for Tamil to 0.9293 for Kannada, demonstrating that advanced computational techniques can process these complex multilingual texts effectively.
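The results above are reported as macro F1, which computes an F1 score for each word class and then averages them with equal weight, so rare classes count as much as frequent ones. The following is a minimal Python sketch of the metric for word-level labels. The function name `macro_f1` and the tags used in the example are hypothetical illustrations, not the shared task's official scorer or label set.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred.

    Hypothetical helper for illustration; not the official task scorer.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # predicted p where it was wrong
            fn[g] += 1          # missed the true class g
    f1_scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical word-level tags for a five-token code-mixed sentence
gold = ["en", "en", "ta", "ta", "mixed"]
pred = ["en", "ta", "ta", "ta", "en"]
print(round(macro_f1(gold, pred), 4))  # per-class F1: en 0.5, ta 0.8, mixed 0.0
```

Because each class contributes equally, the single missed "mixed" token pulls the score down sharply even though most tokens are labeled correctly, which is why macro F1 is a demanding metric for imbalanced word-level tag sets.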

Rights

Copyright (c) 2025 The Authors. This work is licensed under a Creative Commons Attribution 4.0 International License.

DOI

10.1145/3734947.3735663

Persistent Identifier

https://archives.pdx.edu/ds/psu/44048
