Automated Model Selection for Tabular Data
arXiv (2024)
Abstract
Structured data in the form of tabular datasets contain features that are
distinct and discrete, with varying individual and relative importance to the
target. Combinations of features may be more predictive and meaningful than
the contributions of individual features alone. R's linear mixed-effects
models library allows users to specify such feature interactions
in the model design. However, given many features and possible
interactions to select from, model selection becomes an exponentially difficult
task. We aim to automate the model selection process for predictions on tabular
datasets incorporating feature interactions while keeping computational costs
small. The framework includes two distinct approaches for feature selection: a
Priority-based Random Grid Search and a Greedy Search method. The
Priority-based approach efficiently explores feature combinations using prior
probabilities to guide the search. The Greedy method builds the solution
iteratively by adding or removing features based on their impact. Experiments
on synthetic data demonstrate the ability to effectively capture predictive feature
combinations.
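
The two search strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: ordinary least squares stands in for R's mixed-effects models, BIC is the selection score, interactions are products of feature columns, and the greedy variant only adds terms (the abstract also mentions removal). All function names, the default prior of 0.1, and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def candidate_terms(n_features, max_order=2):
    """Main effects and interactions up to max_order, as tuples of column indices."""
    return [c for k in range(1, max_order + 1)
            for c in itertools.combinations(range(n_features), k)]

def design_matrix(X, terms):
    """Intercept column plus one product column per term."""
    cols = [np.ones(len(X))] + [np.prod(X[:, list(t)], axis=1) for t in terms]
    return np.column_stack(cols)

def bic(X, y, terms):
    """BIC of an ordinary least-squares fit (assumed score; lower is better)."""
    A = design_matrix(X, terms)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    n = len(y)
    return n * np.log(rss / n + 1e-12) + A.shape[1] * np.log(n)

def greedy_search(X, y, max_order=2):
    """Forward selection: keep adding the term that lowers the score the most."""
    remaining = candidate_terms(X.shape[1], max_order)
    selected, best = [], bic(X, y, [])
    while remaining:
        scores = [bic(X, y, selected + [t]) for t in remaining]
        i = int(np.argmin(scores))
        if scores[i] >= best:  # no candidate improves the score: stop
            break
        best = scores[i]
        selected.append(remaining.pop(i))
    return selected, best

def priority_random_search(X, y, priors, n_trials=200, max_order=2, seed=0):
    """Sample term subsets, including each term with its prior probability."""
    rng = np.random.default_rng(seed)
    terms = candidate_terms(X.shape[1], max_order)
    p = np.array([priors.get(t, 0.1) for t in terms])  # 0.1 is an assumed default
    best_terms, best = [], bic(X, y, [])
    for _ in range(n_trials):
        mask = rng.random(len(terms)) < p
        trial = [t for t, keep in zip(terms, mask) if keep]
        score = bic(X, y, trial)
        if score < best:
            best_terms, best = trial, score
    return best_terms, best

# Synthetic data: the target depends on x0*x1 and on x2 alone.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] * X[:, 1] + 2.0 * X[:, 2] + 0.1 * rng.normal(size=500)

greedy_terms, greedy_score = greedy_search(X, y)
random_terms, random_score = priority_random_search(
    X, y, priors={(0, 1): 0.9, (2,): 0.9})
```

In this sketch the greedy search fits one model per remaining candidate at every step, while the random search fits a fixed number of models regardless of how many candidates exist, which is one way the prior probabilities can keep cost bounded as the number of possible interactions grows.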