Computer Science Faculty Publications and Presentations

CoLI@FIRE2024: Findings of Word-Level Code-Mixed Language Identification in Dravidian Languages

Asha Hegde, Mangalore University
Fazlourrahman Balouchzahi, Instituto Politécnico Nacional
Sabur Butt, Institute for the Future Education
Sharal Coelho, Mangalore University
Kavya G, Mangalore University
Harshitha S. Kumar, Mangalore University
Sonith D, Mangalore University
Shashirekha H. L., Mangalore University
Ameeta Agrawal, Portland State University

Published In

ACM International Conference Proceeding Series

Document Type

Article

Publication Date

7-24-2025

Subjects

Code-mixing -- Computer Science

Abstract

Code-mixing, a linguistic phenomenon in which multiple languages are blended within a single text, has become increasingly prevalent in multilingual societies, particularly in digital communication. The CoLI-Dravidian shared task, organized as part of the Forum for Information Retrieval and Evaluation (FIRE) 2024, aimed to address the challenges of processing such text by inviting researchers to develop models that classify words in code-mixed texts in which Dravidian languages (Tamil, Kannada, Malayalam, and Tulu) are interwoven with English. The task is difficult because of complex linguistic structures, mixed-language tokens, and dialectal variation, especially in low-resource languages such as those of the Dravidian family. The participating teams employed a range of methodologies, including traditional Machine Learning (ML), Deep Learning (DL), and transformer-based models. This paper presents the main findings of the task, the baselines, and an overview of the submitted methodologies. The top-performing models achieved macro F1 scores ranging from 0.7656 for Tamil to 0.9293 for Kannada, demonstrating that these techniques can process complex multilingual texts effectively.
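
The macro F1 metric reported above averages per-class F1 scores with equal weight, so rare word-level tags count as much as frequent ones. The following is a minimal sketch of that computation, not the organizers' evaluation script. The function name `macro_f1`, the tag names (`kn`, `en`, `mixed`, `name`), and the example sentence are hypothetical and do not come from the task data.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but the word was not p
            fn[g] += 1  # missed a word whose true class was g
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical word-level tags for a six-token Kannada-English sentence.
gold = ["kn", "kn", "en", "mixed", "en", "name"]
pred = ["kn", "en", "en", "mixed", "en", "kn"]
score = macro_f1(gold, pred)
print(f"macro F1 = {score:.3f}")
```

In this toy case four of the six words are tagged correctly, yet the macro F1 is lower than the accuracy because the single `name` token is missed and its class contributes an F1 of zero to the average.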

DOI

10.1145/3734947.3735663

Persistent Identifier

https://archives.pdx.edu/ds/psu/44048

Citation Details

Hegde, A., Balouchzahi, F., Butt, S., Coelho, S., G, K., Kumar, H. S., D, S., H. L., S., & Agrawal, A. (2024). CoLI@FIRE2024: Findings of Word-level Code-Mixed Language Identification in Dravidian Languages. Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval Evaluation, 7–10.

Included in

Computer Sciences Commons