Traceable Group-Wise Self-Optimizing Feature Transformation Learning: A Dual Optimization Perspective

Published In

Acm Transactions on Knowledge Discovery from Data

Document Type

Citation

Publication Date

5-1-2024

Abstract

Feature transformation aims to reconstruct an effective representation space by mathematically refining the existing features. It serves as a pivotal approach to combat the curse of dimensionality, enhance model generalization, mitigate data sparsity, and extend the applicability of classical models. Existing research predominantly focuses on domain knowledge-based feature engineering or learning latent representations. However, these methods, while insightful, lack full automation and fail to yield a traceable and optimal representation space. An indispensable question arises: Can we concurrently address these limitations when reconstructing a feature space for a machine learning task? Our initial work took a pioneering step towards this challenge by introducing a novel self-optimizing framework. This framework leverages the power of three cascading reinforced agents to automatically select candidate features and operations for generating improved feature transformation combinations. Despite the impressive strides made, there was room for enhancing its effectiveness and generalization capability. In this extended journal version, we advance our initial work from two distinct yet interconnected perspectives: 1) We propose a refinement of the original framework, which integrates a graph-based state representation method to capture the feature interactions more effectively and develop different Q-learning strategies to alleviate Q-value overestimation further. 2) We utilize a new optimization technique (actor-critic) to train the entire self-optimizing framework in order to accelerate the model convergence and improve the feature transformation performance. Finally, to validate the improved effectiveness and generalization capability of our framework, we perform extensive experiments and conduct comprehensive analyses. These provide empirical evidence of the strides made in this journal version over the initial work, solidifying our framework’s standing as a substantial contribution to the field of automated feature transformation. To improve the reproducibility, we have released the associated code and data by the Github link https://github.com/coco11563/TKDD2023_code.

Rights

Copyright © 2024 ACM, Inc.

Locate the Document

https://doi.org/10.1145/3638059

DOI

10.1145/3638059

Persistent Identifier

https://archives.pdx.edu/ds/psu/41683

Share

COinS