Published In
CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
Document Type
Article
Publication Date
10-21-2024
Abstract
Machine learning has traditionally emphasized developing models for given datasets, but real-world data is often messy, so improving the model alone is frequently not enough to improve performance. Data-Centric AI (DCAI) is an emerging field that systematically improves datasets, leading to significant practical ML advancements. While experienced data scientists have long refined datasets manually through trial and error and intuition, DCAI treats data enhancement as a systematic engineering discipline. DCAI represents a shift in focus from models to the underlying data used for training and evaluation. Despite the dominance of common model architectures and predictable scaling rules, building and using datasets remains labor-intensive and costly, and lacks infrastructure and best practices. The DCAI movement aims to develop efficient, high-productivity open data engineering tools for modern ML systems. This workshop seeks to foster an interdisciplinary DCAI community to address practical data challenges, including data collection, generation, labeling, preprocessing, augmentation, quality evaluation, data debt, and governance. By defining and shaping the DCAI movement, this workshop aims to influence the future of AI and ML, and it invites interested parties to contribute through paper submissions.
Rights
This work is licensed under a Creative Commons Attribution International 4.0 License.
© 2024 Copyright held by the owner/author(s)
DOI
10.1145/3627673.3680118
Persistent Identifier
https://archives.pdx.edu/ds/psu/42687
Citation Details
Fu, Y., Liu, K., & Wang, D. (2024, October). DCAI: The 4th International Workshop on Data-Centric AI. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (pp. 5584-5587).