Timezone: »
We design automated supervised learning systems for data tables that not only contain numeric/categorical columns, but text fields as well. Here we assemble 15 multimodal data tables that each contain some text fields and stem from a real business application. Over this benchmark, we evaluate numerous multimodal AutoML strategies, including a standard two-stage approach where NLP is used to featurize the text such that AutoML for tabular data can then be applied. We propose various practically superior strategies based on multimodal adaptations of Transformer networks and stack ensembling of these networks with classical tabular models. Beyond performing the best in our benchmark, our proposed (fully automated) methodology manages to rank 1st place (against human data scientists) when fit to the raw tabular/text data in two MachineHack prediction competitions and 2nd place (out of 2380 teams) in Kaggle’s Mercari Price Suggestion Challenge.
Author Information
Jonas Mueller (Amazon Web Services)
More from the Same Authors
-
2021 : Multimodal AutoML on Structured Tables with Text Fields »
Xingjian Shi · Jonas Mueller · Nick Erickson · Mu Li · Alex Smola -
2021 : Continuous Doubly Constrained Batch Reinforcement Learning »
Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Pratik Chaudhari · Alex Smola -
2022 : Adaptive Interest for Emphatic Reinforcement Learning »
Martin Klissarov · Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Taesup Kim · Alex Smola -
2022 : Back to the Basics: Revisiting Out-of-Distribution Detection Baselines »
Johnson Kuan · Jonas Mueller -
2022 : Model-Agnostic Label Quality Scoring to Detect Real-World Label Errors »
Jonas Mueller -
2021 : Q&A Contributed Talk »
Jonas Mueller -
2021 Poster: Deep Learning for Functional Data Analysis with Adaptive Basis Layers »
Junwen Yao · Jonas Mueller · Jane-Ling Wang -
2021 Spotlight: Deep Learning for Functional Data Analysis with Adaptive Basis Layers »
Junwen Yao · Jonas Mueller · Jane-Ling Wang -
2020 : 1.2 AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data »
Jonas Mueller -
2020 Poster: Educating Text Autoencoders: Latent Representation Guidance via Denoising »
Tianxiao Shen · Jonas Mueller · Regina Barzilay · Tommi Jaakkola