As the field of automated machine learning (AutoML) advances, it becomes increasingly important to incorporate domain knowledge into these systems. Our approach combines the advantages of classical ML classifiers (robustness, predictability and a level of interpretability) and LLMs (domain knowledge and creativity). We introduce Context-Aware Automated Feature Engineering (CAAFE), a feature engineering method for tabular datasets that uses an LLM to iteratively generate additional semantically meaningful features based on the description of the dataset. The method produces both Python code for creating new features and explanations for the utility of the generated features. Despite being methodologically simple, CAAFE improves performance on 11 out of 14 datasets, boosting mean ROC AUC from 0.798 to 0.822 across all datasets, an improvement similar to that achieved by using a random forest instead of logistic regression on our datasets. Furthermore, CAAFE is interpretable: it provides a textual explanation for each generated feature. CAAFE paves the way for more extensive semi-automation in data science tasks and emphasizes the significance of context-aware solutions that can extend the scope of AutoML systems to semantic AutoML. We release our code, a simple demo and a Python package.
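To make the iterative idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It makes several loud assumptions: the LLM is replaced by a hard-coded list of candidate features (`CANDIDATES`), the classifier is a toy mean-difference linear scorer, the dataset is invented, and a feature is kept only if it raises validation ROC AUC (an acceptance rule assumed here; the abstract does not state it). All names (`roc_auc`, `validation_auc`, `feature_loop`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
# (feature name, textual explanation, function computing the feature from a row)
Candidate = Tuple[str, str, Callable[[Row], float]]


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def validation_auc(train: List[Row], valid: List[Row], features: List[str]) -> float:
    """Toy classifier: weight each standardized feature by its class-mean difference."""
    scores = [0.0] * len(valid)
    for f in features:
        col = [r[f] for r in train]
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5 or 1.0
        pos = [(r[f] - mean) / std for r in train if r["label"] == 1]
        neg = [(r[f] - mean) / std for r in train if r["label"] == 0]
        weight = sum(pos) / len(pos) - sum(neg) / len(neg)
        for i, r in enumerate(valid):
            scores[i] += weight * (r[f] - mean) / std
    return roc_auc(scores, [int(r["label"]) for r in valid])


def feature_loop(train, valid, base_features, candidates):
    """Try candidates one by one; keep a feature only if validation AUC improves."""
    features, kept = list(base_features), []
    best = validation_auc(train, valid, features)
    for name, explanation, fn in candidates:
        for row in train + valid:  # adds the new column to each row in place
            row[name] = fn(row)
        auc = validation_auc(train, valid, features + [name])
        if auc > best:  # assumed acceptance rule
            features.append(name)
            kept.append((name, explanation))
            best = auc
    return features, best, kept


# Invented dataset: the label depends on weight / height^2, which no raw column exposes.
rows = [
    {"weight": float(w), "height": 1.5 + 0.1 * h}
    for w in range(50, 111, 5)
    for h in range(6)
]
for r in rows:
    r["label"] = 1 if r["weight"] / r["height"] ** 2 > 25 else 0
train, valid = rows[0::2], rows[1::2]

# Stand-in for LLM output: each candidate pairs code with an explanation.
CANDIDATES: List[Candidate] = [
    ("bmi", "Weight relative to squared height.", lambda r: r["weight"] / r["height"] ** 2),
    ("height_cm", "Height rescaled to centimetres.", lambda r: r["height"] * 100),
]

baseline_auc = validation_auc(train, valid, ["weight", "height"])
features, final_auc, kept = feature_loop(train, valid, ["weight", "height"], CANDIDATES)
print(f"baseline AUC={baseline_auc:.3f}, final AUC={final_auc:.3f}, kept={[n for n, _ in kept]}")
```

In a real setting the candidate list would come from prompting an LLM with the dataset description, and the toy scorer would be replaced by an actual classifier with proper cross-validation. Pairing each feature with its explanation mirrors the interpretability claim in the abstract.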
Author Information
Noah Hollmann (Albert-Ludwigs-Universität Freiburg)
Samuel Gabriel Müller (Universität Freiburg)
Frank Hutter (University of Freiburg and Bosch Center for Artificial Intelligence)
Frank Hutter is a Full Professor for Machine Learning at the Computer Science Department of the University of Freiburg (Germany), where he has been a faculty member since 2013. Before that, he spent eight years at the University of British Columbia (UBC) for his PhD and postdoc. Frank's main research interests lie in machine learning, artificial intelligence and automated algorithm design. For his 2009 PhD thesis on algorithm configuration, he received the CAIAC doctoral dissertation award for the best thesis in AI in Canada that year, and with his coauthors, he has received several best paper awards and prizes in international competitions on automated machine learning, SAT solving, and AI planning. Since 2016, he has held an ERC Starting Grant for a project on automating deep learning based on Bayesian optimization, Bayesian neural networks, and deep reinforcement learning.
More from the Same Authors
-
2021 : Bag of Baselines for Multi-objective Joint Neural Architecture Search and Hyperparameter Optimization »
Sergio Izquierdo · Julia Guerrero-Viu · Sven Hauns · Guilherme Miotto · Simon Schrodi · André Biedenkapp · Thomas Elsken · Difan Deng · Marius Lindauer · Frank Hutter -
2022 : P30: Meta-Learning Real-Time Bayesian AutoML For Small Tabular Data »
Frank Hutter · Katharina Eggensperger -
2022 : On the Importance of Hyperparameters and Data Augmentation for Self-Supervised Learning »
Diane Wagner · Fabio Ferreira · Danny Stoll · Robin Tibor Schirrmeister · Samuel Gabriel Müller · Frank Hutter -
2023 Poster: PFNs4BO: In-Context Learning for Bayesian Optimization »
Samuel Gabriel Müller · Matthias Feurer · Noah Hollmann · Frank Hutter -
2022 Poster: Zero-shot AutoML with Pretrained Models »
Ekrem Öztürk · Fabio Ferreira · Hadi S Jomaa · Lars Schmidt-Thieme · Josif Grabocka · Frank Hutter -
2022 Spotlight: Zero-shot AutoML with Pretrained Models »
Ekrem Öztürk · Fabio Ferreira · Hadi S Jomaa · Lars Schmidt-Thieme · Josif Grabocka · Frank Hutter -
2021 Workshop: 8th ICML Workshop on Automated Machine Learning (AutoML 2021) »
Gresa Shala · Frank Hutter · Joaquin Vanschoren · Marius Lindauer · Katharina Eggensperger · Colin White · Erin LeDell -
2021 Poster: Self-Paced Context Evaluation for Contextual Reinforcement Learning »
Theresa Eimer · André Biedenkapp · Frank Hutter · Marius Lindauer -
2021 Poster: TempoRL: Learning When to Act »
André Biedenkapp · Raghu Rajan · Frank Hutter · Marius Lindauer -
2021 Spotlight: TempoRL: Learning When to Act »
André Biedenkapp · Raghu Rajan · Frank Hutter · Marius Lindauer -
2021 Spotlight: Self-Paced Context Evaluation for Contextual Reinforcement Learning »
Theresa Eimer · André Biedenkapp · Frank Hutter · Marius Lindauer -
2020 Workshop: 7th ICML Workshop on Automated Machine Learning (AutoML 2020) »
Frank Hutter · Joaquin Vanschoren · Marius Lindauer · Charles Weill · Katharina Eggensperger · Matthias Feurer -
2020 : Welcome »
Frank Hutter -
2019 : Closing Remarks »
Frank Hutter -
2019 : Poster Session 1 (all papers) »
Matilde Gargiani · Yochai Zur · Chaim Baskin · Evgenii Zheltonozhskii · Liam Li · Ameet Talwalkar · Xuedong Shang · Harkirat Singh Behl · Atilim Gunes Baydin · Ivo Couckuyt · Tom Dhaene · Chieh Lin · Wei Wei · Min Sun · Orchid Majumder · Michele Donini · Yoshihiko Ozaki · Ryan P. Adams · Christian Geißler · Ping Luo · zhanglin peng · Ruimao Zhang · John Langford · Rich Caruana · Debadeepta Dey · Charles Weill · Xavi Gonzalvo · Scott Yang · Scott Yak · Eugen Hotaj · Vladimir Macko · Mehryar Mohri · Corinna Cortes · Stefan Webb · Jonathan Chen · Martin Jankowiak · Noah Goodman · Aaron Klein · Frank Hutter · Mojan Javaheripi · Mohammad Samragh · Sungbin Lim · Taesup Kim · SUNGWOONG KIM · Michael Volpp · Iddo Drori · Yamuna Krishnamurthy · Kyunghyun Cho · Stanislaw Jastrzebski · Quentin de Laroussilhe · Mingxing Tan · Xiao Ma · Neil Houlsby · Andrea Gesmundo · Zalán Borsos · Krzysztof Maziarz · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune · Pieter Gijsbers · Joaquin Vanschoren · Felix Mohr · Eyke Hüllermeier · Zheng Xiong · Wenpeng Zhang · Wenwu Zhu · Weijia Shao · Aleksandra Faust · Michal Valko · Michael Y Li · Hugo Jair Escalante · Marcel Wever · Andrey Khorlin · Tara Javidi · Anthony Francis · Saurajit Mukherjee · Jungtaek Kim · Michael McCourt · Saehoon Kim · Tackgeun You · Seungjin Choi · Nicolas Knudde · Alexander Tornede · Ghassen Jerfel -
2019 : Welcome »
Frank Hutter -
2019 Workshop: 6th ICML Workshop on Automated Machine Learning (AutoML 2019) »
Frank Hutter · Joaquin Vanschoren · Katharina Eggensperger · Matthias Feurer -
2019 Poster: NAS-Bench-101: Towards Reproducible Neural Architecture Search »
Chris Ying · Aaron Klein · Eric Christiansen · Esteban Real · Kevin Murphy · Frank Hutter -
2019 Oral: NAS-Bench-101: Towards Reproducible Neural Architecture Search »
Chris Ying · Aaron Klein · Eric Christiansen · Esteban Real · Kevin Murphy · Frank Hutter -
2019 Tutorial: Algorithm configuration: learning in the space of algorithm designs »
Kevin Leyton-Brown · Frank Hutter -
2018 Poster: BOHB: Robust and Efficient Hyperparameter Optimization at Scale »
Stefan Falkner · Aaron Klein · Frank Hutter -
2018 Oral: BOHB: Robust and Efficient Hyperparameter Optimization at Scale »
Stefan Falkner · Aaron Klein · Frank Hutter