Published In
FIRE 2025: Proceedings of the 17th Annual Meeting of the Forum for Information Retrieval Evaluation
Document Type
Conference Proceeding
Publication Date
1-12-2026
Subjects
Code-mixed Language Identification, Dravidian languages
Abstract
Code-mixing is a linguistic phenomenon in which several languages are combined in a single text. It has become very common in multilingual societies, especially in digital communication. Word-Level Identification of Languages in Dravidian Languages (WILD), a Code-mixed Language Identification (CoLI) shared task organized as part of the Forum for Information Retrieval Evaluation (FIRE) 2025, challenged researchers to develop models capable of classifying words in code-mixed texts involving the Dravidian languages Tamil, Telugu, Malayalam, Kannada, and Tulu, interwoven with English. The task is difficult because of the complexity of linguistic structures, mixed-language tokens, and dialectal variations in low-resource languages such as those of the Dravidian family. The participating teams addressed these challenges with methodologies ranging from traditional Machine Learning (ML) to Deep Learning and transformer-based models. This paper presents the main findings of the task and an overview of the submitted methodologies.
Rights
Copyright (c) 2025 The Authors
This work is licensed under a Creative Commons Attribution 4.0 International License.
Locate the Document
DOI
10.1145/3777867.3778258
Persistent Identifier
https://archives.pdx.edu/ds/psu/44470
Citation Details
Agrawal, A., Hegde, A., Coelho, S., Butt, S., Balouchzahi, F., V, S., & Hosahalli Lakshmaiah, S. (2025). WILD@FIRE2025: Overview of Word-level Code-Mixed Language Identification in Dravidian Languages. Proceedings of the 17th Annual Meeting of the Forum for Information Retrieval Evaluation, 25–27.
