Machine learning (ML) has achieved considerable success in recent years, but this success often relies on human experts, who construct appropriate features, design learning architectures, set their hyperparameters, and develop new learning algorithms. Driven by the demand for robust, off-the-shelf ML methods from an ever-growing community, the research area of AutoML targets the progressive automation of machine learning, aiming to make effective methods available to everyone. Hence, the workshop targets a broad audience, ranging from core ML researchers in fields connected to AutoML, such as neural architecture search (NAS), hyperparameter optimization, meta-learning, and learning-to-learn, to domain experts aiming to apply ML to new types of problems.
Fri 6:00 a.m. - 6:05 a.m. | Welcome (Intro)
Fri 6:06 a.m. - 6:35 a.m. | Invited Talk by Matthias Feurer: Towards hands-free AutoML (Invited Talk)
Fri 6:35 a.m. - 6:45 a.m. | Q&A Matthias Feurer (Q&A)
Fri 6:46 a.m. - 7:15 a.m. | Invited Talk by Ellen Vitercik: Automated Parameter Optimization for Integer Programming (Invited Talk)
Fri 7:15 a.m. - 7:25 a.m. | Q&A Ellen Vitercik (Q&A)
Fri 7:25 a.m. - 7:26 a.m. | A resource-efficient method for repeated HPO and NAS problems (Spotlight)
In this work we consider the problem of repeated hyperparameter and neural architecture search (HNAS). We propose an extension of Successive Halving that is able to leverage information gained in previous HNAS problems with the goal of saving computational resources. We empirically demonstrate that our solution is able to drastically decrease costs while maintaining accuracy and being robust to negative transfer. Our method is significantly simpler than competing transfer learning approaches, setting a new baseline for transfer learning in HNAS.
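For reference, vanilla Successive Halving, the method this paper extends with cross-task transfer, fits in a few lines. This is a minimal sketch, not the authors' code: the `evaluate` function and the budget schedule are illustrative assumptions, and the transfer component (reusing information from earlier HNAS problems to seed or prune the candidate set) is omitted.

```python
import random

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Evaluate all configs on a small budget, keep the best 1/eta
    fraction, and repeat with an eta-times larger budget."""
    budget = min_budget
    while len(configs) > 1:
        scores = {c: evaluate(c, budget) for c in configs}
        configs = sorted(configs, key=scores.get, reverse=True)
        configs = configs[:max(1, len(configs) // eta)]
        budget *= eta
    return configs[0]

# Toy usage: configs are numbers, higher is better, noise shrinks with budget.
candidates = [random.random() for _ in range(27)]
best = successive_halving(candidates, lambda c, b: c + random.gauss(0, 1.0 / b))
print(best)
```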
Fri 7:26 a.m. - 7:27 a.m.
|
Latency-Aware Neural Architecture Search with Multi-Objective Bayesian Optimization
(
Spotlight
)
SlidesLive Video » When tuning the architecture and hyperparameters of large machine learning models for on-device deployment, it is desirable to understand the optimal trade-offs between on-device latency and model accuracy. In this work, we leverage recent methodological advances in Bayesian optimization over high-dimensional search spaces and multi-objective Bayesian optimization to efficiently explore these trade-offs for a production-scale on-device natural language understanding model at Facebook. |
|
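The accuracy-latency trade-offs explored here are typically summarized as a Pareto front of non-dominated configurations. A minimal sketch of that bookkeeping (not Facebook's production system, and independent of the Bayesian optimization machinery itself):

```python
def pareto_front(points):
    """Return the non-dominated (accuracy, latency) pairs:
    maximize accuracy, minimize latency."""
    front = []
    for acc, lat in sorted(points, key=lambda p: (-p[0], p[1])):
        # After sorting by accuracy (descending), a point is dominated
        # iff an earlier point already achieved lower latency.
        if not front or lat < front[-1][1]:
            front.append((acc, lat))
    return front

configs = [(0.91, 42.0), (0.89, 20.0), (0.93, 80.0), (0.90, 45.0)]
print(pareto_front(configs))  # [(0.93, 80.0), (0.91, 42.0), (0.89, 20.0)]
```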
Fri 7:27 a.m. - 7:28 a.m. | GPy-ABCD: A Configurable Automatic Bayesian Covariance Discovery Implementation (Spotlight)
Gaussian Processes (GPs) are a very flexible class of nonparametric models frequently used in supervised learning tasks because of their ability to fit data with very few assumptions, namely just the type of correlation (kernel) the data is expected to display. Automatic Bayesian Covariance Discovery (ABCD) is an iterative GP regression framework aimed at removing the requirement for even this initial correlation form assumption. An original ABCD implementation exists and is a complex stand-alone system designed to produce long-form text analyses of provided data. This paper presents a lighter, more functional and configurable implementation of the ABCD idea, outputting only fit models and short descriptions: the Python package GPy-ABCD, which was developed as part of an adaptive modelling component for the FRANK query-answering system. It uses a revised model-space search algorithm and removes a search bias which was required in order to retain model explainability in the original system.
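At its core, ABCD is a search over compositional kernels scored by a penalized likelihood. The sketch below illustrates that idea with plain GPy and a fixed depth-two candidate space; it is not GPy-ABCD's actual API or its revised search algorithm, and the kernel vocabulary and BIC scoring are assumptions for illustration.

```python
import itertools
import numpy as np
import GPy

def bic(model):
    # Bayesian information criterion: penalized negative log-likelihood.
    n = model.X.shape[0]
    k = model.param_array.size
    return -2 * model.log_likelihood() + k * np.log(n)

X = np.linspace(0, 10, 100)[:, None]
Y = np.sin(X) + 0.1 * np.random.randn(*X.shape)

base = [GPy.kern.RBF(1), GPy.kern.Linear(1), GPy.kern.StdPeriodic(1)]
# Depth-2 search: base kernels plus all pairwise sums and products.
candidates = list(base)
for k1, k2 in itertools.combinations(base, 2):
    candidates += [k1 + k2, k1 * k2]

scored = []
for kern in candidates:
    m = GPy.models.GPRegression(X, Y, kern.copy())
    m.optimize()
    scored.append((bic(m), m))
best = min(scored, key=lambda t: t[0])[1]
print(best.kern)
```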
Fri 7:28 a.m. - 7:29 a.m. | Bandit Limited Discrepancy Search and Application to Machine Learning Pipeline Optimization (Spotlight)
Optimizing a machine learning (ML) pipeline has been an important topic in AI and ML. Despite recent progress, it remains challenging, due to the potentially many combinations to consider as well as slow training and validation. We present the BLDS algorithm for optimized algorithm selection (ML operations) in a fixed ML pipeline structure. BLDS performs multi-fidelity optimization, selecting ML algorithms trained with smaller computational overhead while controlling its pipeline search with a combination of multi-armed bandits and limited discrepancy search. Our experiments on well-known benchmarks show that BLDS is superior to competing algorithms.
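Limited discrepancy search, one of BLDS's two ingredients, explores pipeline configurations in order of how many choices deviate from a heuristically preferred default. A minimal sketch: the bandit component that allocates training budget across fidelities is omitted, the step and option names are made up, and the naive full enumeration per discrepancy level is for clarity only.

```python
from itertools import product

def lds_order(choices_per_step, max_discrepancy):
    """Yield pipelines in limited-discrepancy order: index 0 at each step
    is the heuristically preferred option; a 'discrepancy' is any step
    where we deviate from it."""
    for d in range(max_discrepancy + 1):
        for combo in product(*[range(len(c)) for c in choices_per_step]):
            if sum(1 for i in combo if i != 0) == d:
                yield [c[i] for c, i in zip(choices_per_step, combo)]

steps = [["scaler", "none"], ["pca", "select_k", "none"], ["rf", "svm", "knn"]]
for pipeline in lds_order(steps, max_discrepancy=1):
    print(pipeline)  # preferred pipeline first, then single deviations
```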
Fri 7:29 a.m. - 7:30 a.m. | Towards Model Selection using Learning Curve Cross-Validation (Spotlight)
Cross-validation (CV) methods such as leave-one-out cross-validation, k-fold cross-validation, and Monte-Carlo cross-validation estimate the predictive performance of a learner by repeatedly training it on a large portion of the given data and testing on the remaining data. These techniques have two drawbacks. First, they can be unnecessarily slow on large datasets. Second, providing only point estimates, they give almost no insights into the learning process of the validated algorithm. In this paper, we propose a new approach to validation based on learning curves (LCCV). Instead of creating train-test splits with a large portion of training data, LCCV iteratively increases the number of training examples. In the context of model selection, it eliminates models that can be safely dismissed from the candidate pool. We run a large-scale experiment on the 67 datasets from the AutoML benchmark and empirically show that in over 90% of cases LCCV leads to similar performance (at most 0.5% difference) as 10-fold CV while providing additional insights into the behaviour of a given model. On top of this, LCCV yields runtime reductions between 20% and over 50% on half of the 67 datasets. It can be incorporated into various AutoML frameworks to speed up the internal evaluation of candidate models, and as such these results are orthogonal to other advances in the field of AutoML.
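The elimination logic can be sketched as follows. This is a simplification under stated assumptions: the real LCCV exploits convexity of learning curves and confidence bounds rather than the naive linear extrapolation used here, and `evaluate` (returning validation accuracy for a model trained on a data subset) is a placeholder.

```python
import numpy as np

def lccv_select(models, X, y, evaluate, anchors=(64, 128, 256, 512)):
    """Grow the training set over 'anchors' and drop a candidate once an
    optimistic extrapolation of its learning curve cannot beat the
    incumbent's final score."""
    best_score, best_model = -np.inf, None
    for model in models:
        curve, pruned = [], False
        for i, n in enumerate(anchors):
            curve.append(evaluate(model, X[:n], y[:n]))
            if len(curve) >= 2:
                # Optimistic: assume the last improvement rate persists.
                slope = max(0.0, curve[-1] - curve[-2])
                optimistic = curve[-1] + slope * (len(anchors) - 1 - i)
                if optimistic < best_score:
                    pruned = True  # cannot catch up: skip larger anchors
                    break
        if not pruned and curve[-1] > best_score:
            best_score, best_model = curve[-1], model
    return best_model
```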
Fri 7:30 a.m. - 7:31 a.m. | Dynamic Pruning of a Neural Network via Gradient Signal-to-Noise Ratio (Spotlight)
While training highly overparameterized neural networks is common practice in deep learning, research into post-hoc weight-pruning suggests that more than 90% of parameters can be removed without loss in predictive performance. To save resources, zero-shot and one-shot pruning attempt to find such a sparse representation at initialization or at an early stage of training. Though efficient, there is no justification why the sparsity structure should not change during training. Dynamic sparsity pruning removes this limitation and allows the structure of the sparse neural network to adapt during training. Recent approaches rely on weight-magnitude pruning, which has been shown to be sub-optimal when applied at earlier training stages. In this work we propose to use the gradient noise to make pruning decisions. The procedure enables us to automatically adjust the sparsity during training without imposing a hand-designed sparsity schedule, while at the same time being able to recover from previous pruning decisions by unpruning connections as necessary. We evaluate our new method on image and tabular datasets and demonstrate that we reach similar performance as the dense model from which we extract the sparse network, while exposing fewer hyperparameters than other dynamic sparsity methods.
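The core criterion, ranking weights by the signal-to-noise ratio of their gradients across minibatches and keeping the strongest, can be sketched in NumPy. The paper's exact statistic, update schedule, and unpruning rule may differ; this only illustrates the SNR idea.

```python
import numpy as np

def snr_prune_mask(grad_samples, sparsity=0.9, eps=1e-12):
    """Given per-minibatch gradients for one weight tensor stacked along
    axis 0, keep the weights whose gradient signal-to-noise ratio
    |mean| / std is largest and prune the rest."""
    mean = grad_samples.mean(axis=0)
    std = grad_samples.std(axis=0)
    snr = np.abs(mean) / (std + eps)
    k = max(1, int(snr.size * (1 - sparsity)))   # number of weights to keep
    threshold = np.partition(snr.ravel(), -k)[-k]
    return snr >= threshold                       # boolean keep-mask

# Toy usage: 32 minibatch gradients for a 256-weight layer; some weights
# receive a consistent signal on top of noise.
grads = np.random.randn(32, 256) + np.linspace(0, 1, 256)
mask = snr_prune_mask(grads, sparsity=0.9)
print(mask.sum(), "weights kept out of", mask.size)
```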
Fri 7:31 a.m. - 7:32 a.m. | AutoML Adoption in ML Software (Spotlight)
Machine learning (ML) has become essential to a vast range of applications, while ML experts are in short supply. To alleviate this problem, AutoML aims to make ML easier and more efficient to use. Even so, it is not clear to what extent AutoML techniques are actually adopted in an engineering context, nor what facilitates or inhibits adoption. To study this, we define AutoML engineering practices, measure their adoption through surveys, and distil first insights into factors influencing adoption from two initial interviews. Depending on the practice, results show around 20 to 30% of the respondents have not adopted it at all and many more only partially, leaving substantial room for increases in adoption. The interviews indicate adoption may in part be inhibited by usability issues with AutoML frameworks and the increased computational resources needed for adoption.
Fri 7:32 a.m. - 7:33 a.m. | Leveraging Theoretical Tradeoffs in Hyperparameter Selection for Improved Empirical Performance (Spotlight) | Parikshit Ram
The tradeoffs in the excess risk incurred from data-driven learning of a single model have been studied by decomposing the excess risk into approximation, estimation and optimization errors. In this paper, we focus on the excess risk incurred in data-driven hyperparameter optimization (HPO) and its interaction with the approximate empirical risk minimization (ERM) necessitated by large data. We present novel bounds for the excess risk in various common scenarios in HPO. Based on these results, we propose practical heuristics that allow us to improve performance or reduce the computational overhead of data-driven HPO, demonstrating over $2\times$ speedup with no loss in predictive performance in our preliminary results.
[link to poster session #1](https://eventhosts.gather.town/app/WmNodofE2Oab573H/automl-poster-session-1)
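The single-model decomposition the abstract alludes to is the classical one (standard notation, not necessarily the paper's): with $f^\ast$ the Bayes-optimal predictor, $\hat f_n$ the empirical risk minimizer over a class $\mathcal{F}$, and $\tilde f_n$ the approximate solution actually returned,

```latex
\mathbb{E}\big[R(\tilde f_n)\big] - R(f^\ast)
  = \underbrace{\inf_{f \in \mathcal{F}} R(f) - R(f^\ast)}_{\text{approximation}}
  + \underbrace{\mathbb{E}\big[R(\hat f_n)\big] - \inf_{f \in \mathcal{F}} R(f)}_{\text{estimation}}
  + \underbrace{\mathbb{E}\big[R(\tilde f_n) - R(\hat f_n)\big]}_{\text{optimization}}
```

HPO adds a further term for selecting the hyperparameter configuration itself; its interaction with approximate ERM is what the paper bounds.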
Fri 7:33 a.m. - 7:34 a.m. | Bag of Baselines for Multi-objective Joint Neural Architecture Search and Hyperparameter Optimization (Spotlight)
While both neural architecture search (NAS) and hyperparameter optimization (HPO) have been studied extensively in recent years, NAS methods typically assume fixed hyperparameters and vice versa. Furthermore, NAS has recently often been framed as a multi-objective optimization problem, in order to take, e.g., resource requirements into account. In this paper, we propose a set of methods that extend current approaches to jointly optimize neural architectures and hyperparameters with respect to multiple objectives. We hope that these methods will serve as simple baselines for future research on multi-objective joint NAS + HPO.
Fri 7:35 a.m. - 7:36 a.m. | Towards Explaining Hyperparameter Optimization via Partial Dependence Plots (Spotlight)
Automated hyperparameter optimization (HPO) can support practitioners to obtain peak performance in machine learning models. However, there is often a lack of valuable insights into the effects of different hyperparameters on the final model performance. This lack of comprehensibility and transparency makes it difficult to trust and understand the automated HPO process and its results. We suggest using interpretable machine learning (IML) to gain insights from the experimental data obtained during HPO and especially discuss the popular case of Bayesian optimization (BO). BO tends to focus on promising regions with potential high-performance configurations and thus induces a sampling bias. Hence, many IML techniques, like Partial Dependence Plots (PDP), carry the risk of generating biased interpretations. By leveraging the posterior uncertainty of the BO surrogate model, we introduce a variant of the PDP with estimated confidence bands. In addition, we propose to partition the hyperparameter space to obtain more confident and reliable PDPs in relevant sub-regions. In an experimental study, we provide quantitative evidence for the increased quality of the PDPs within sub-regions.
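A minimal sketch of the basic construction: a PDP computed from the BO surrogate, with bands from its posterior standard deviation, using scikit-learn's GP as a stand-in surrogate. The paper's estimator and its sub-region partitioning are more refined; averaging posterior standard deviations, as done here, is a simplification.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def pdp_with_bands(surrogate, X_observed, hp_index, grid):
    """Partial dependence of a surrogate on one hyperparameter: for each
    grid value, fix that hyperparameter and average the posterior
    mean/std over the observed values of the remaining hyperparameters."""
    means, stds = [], []
    for v in grid:
        X = X_observed.copy()
        X[:, hp_index] = v
        mu, sigma = surrogate.predict(X, return_std=True)
        means.append(mu.mean())
        stds.append(sigma.mean())
    return np.array(means), np.array(stds)

# Toy usage with a surrogate fit on (config, validation error) pairs.
rng = np.random.default_rng(0)
X_obs = rng.uniform(size=(50, 2))
y_obs = (X_obs[:, 0] - 0.3) ** 2 + 0.1 * rng.normal(size=50)
gp = GaussianProcessRegressor(normalize_y=True).fit(X_obs, y_obs)
mean, std = pdp_with_bands(gp, X_obs, hp_index=0, grid=np.linspace(0, 1, 20))
```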
Fri 7:36 a.m. - 7:37 a.m. | Mutation is all you need (Spotlight)
Neural architecture search (NAS) promises to make deep learning accessible to non-experts by automating architecture engineering of deep neural networks. BANANAS is one state-of-the-art NAS method embedded within the Bayesian optimization framework. Recent experimental findings have suggested that the strong performance of BANANAS on the NAS-Bench-101 benchmark is determined by its path encoding rather than its choice of surrogate model. We present experimental results suggesting that the performance of BANANAS on the NAS-Bench-301 benchmark is determined by its acquisition function optimizer, which minimally mutates the incumbent.
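The acquisition optimizer in question can be sketched in a few lines: sample small mutations of the incumbent and keep the one the surrogate scores best. The operation vocabulary, encoding, and scoring function below are illustrative placeholders, not BANANAS' path encoding or neural predictor.

```python
import random

OPS = ["conv3x3", "conv1x1", "maxpool", "skip"]

def mutate(arch):
    """Minimally mutate an architecture encoded as a list of operations."""
    child = list(arch)
    i = random.randrange(len(child))
    child[i] = random.choice([op for op in OPS if op != child[i]])
    return child

def mutation_acq_opt(incumbent, surrogate_score, n_samples=100):
    """Acquisition optimization by local mutation: score small
    perturbations of the incumbent and return the best one."""
    pool = [mutate(incumbent) for _ in range(n_samples)]
    return max(pool, key=surrogate_score)

incumbent = ["conv3x3", "skip", "maxpool", "conv1x1"]
best = mutation_acq_opt(incumbent, surrogate_score=lambda a: a.count("conv3x3"))
print(best)
```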
Fri 7:37 a.m. - 7:38 a.m. | Meta Learning the Step Size in Policy Gradient Methods (Spotlight)
Policy-based algorithms are among the most widely adopted techniques in model-free RL, thanks to their strong theoretical groundings and good properties in continuous action spaces. Unfortunately, these methods require precise and problem-specific hyperparameter tuning to achieve good performance and, as a consequence, tend to struggle when asked to accomplish a series of heterogeneous tasks. In particular, the selection of the step size has a crucial impact on the ability to learn a high-performing policy, affecting the speed and stability of the training process, and is often the main culprit for poor results. In this paper, we tackle these issues with a meta reinforcement learning approach, introducing a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL with contextual processes. After providing a theoretical Lipschitz bound on the performance in different tasks, we adopt the proposed framework to train a batch RL algorithm to dynamically recommend the most adequate step size for different policies and tasks. In conclusion, we present an experimental campaign to show the advantages of selecting an adaptive learning rate in heterogeneous environments.
Fri 7:40 a.m. - 9:00 a.m. | Poster Session #1 (Poster Session)
Fri 9:00 a.m. - 9:10 a.m. | Contributed Talk: Discovering Weight Initializers with Meta Learning (Contributed Talk)
Deep neural network training largely depends on the choice of the initial weight distribution. However, this choice can often be nontrivial. Existing theoretical results for this problem mostly cover simple architectures, e.g., feedforward networks with ReLU activations. The architectures used for practical problems are more complex and often incorporate many overlapping modules, making them challenging for theoretical analysis. Therefore, practitioners have to use heuristic initializers with questionable optimality and stability. In this study, we propose a task-agnostic approach that discovers initializers for specific network architectures and optimizers by learning the initial weight distributions directly through meta-learning. In several supervised and unsupervised learning scenarios, we show the advantage of our initializers in terms of both faster convergence and higher model performance.
Fri 9:10 a.m. - 9:15 a.m. | Q&A Contributed Talk (Q&A) | Dmitry Baranchuk
Fri 9:15 a.m. - 9:25 a.m. | Contributed Talk: Multimodal AutoML on Structured Tables with Text Fields (Contributed Talk)
We design automated supervised learning systems for data tables that not only contain numeric/categorical columns, but text fields as well. Here we assemble 15 multimodal data tables that each contain some text fields and stem from a real business application. Over this benchmark, we evaluate numerous multimodal AutoML strategies, including a standard two-stage approach where NLP is used to featurize the text such that AutoML for tabular data can then be applied. We propose various practically superior strategies based on multimodal adaptations of Transformer networks and stack ensembling of these networks with classical tabular models. Beyond performing the best in our benchmark, our proposed (fully automated) methodology manages to rank 1st place (against human data scientists) when fit to the raw tabular/text data in two MachineHack prediction competitions and 2nd place (out of 2380 teams) in Kaggle's Mercari Price Suggestion Challenge.
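The standard two-stage baseline mentioned above, featurizing the text and feeding everything to a tabular learner, looks roughly like this in scikit-learn. Column names are placeholders, and the paper's stronger strategies (Transformer adaptations and stack ensembling) are not shown.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Placeholder columns for a table with numeric, categorical, and text fields.
featurize = ColumnTransformer([
    ("num", "passthrough", ["price", "quantity"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["category"]),
    ("txt", TfidfVectorizer(max_features=5000), "description"),  # single column
])
model = Pipeline([
    ("features", featurize),
    ("clf", GradientBoostingClassifier()),
])
# model.fit(df[["price", "quantity", "category", "description"]], df["label"])
```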
Fri 9:25 a.m. - 9:30 a.m. | Q&A Contributed Talk (Q&A) | Jonas Mueller
Fri 9:30 a.m. - 9:40 a.m. | Contributed Talk: Automated Discovery of Adaptive Attacks on Adversarial Defenses (Contributed Talk)
Reliable evaluation of adversarial defenses is a challenging task, currently limited to an expert who manually crafts attacks that exploit the defense's inner workings, or to approaches based on ensembles of fixed attacks, none of which may be effective for the specific defense at hand. Our key observation is that custom attacks are composed from a set of reusable building blocks, such as fine-tuning relevant attack parameters, network transformations, and custom loss functions. Based on this observation, we present an extensible framework that defines a search space over these reusable building blocks and automatically discovers an effective attack on a given model with an unknown defense by searching over suitable combinations of these blocks. We evaluated our framework on 23 adversarial defenses and showed it outperforms AutoAttack, the current state-of-the-art tool for reliable evaluation of adversarial defenses: our discovered attacks are either stronger, producing 3.0%-50.8% additional adversarial examples (10 cases), or are typically 2x faster while enjoying similar adversarial robustness (13 cases).
Fri 9:40 a.m. - 9:45 a.m. | Q&A Contributed Talk (Q&A) | Chengyuan Yao
Fri 9:45 a.m. - 9:46 a.m. | Sequential Automated Machine Learning: Bandits-driven Exploration using a Collaborative Filtering Representation (Spotlight)
The goal of Automated Machine Learning (AutoML) is to make Machine Learning (ML) tools more accessible. Collaborative Filtering (CF) methods have shown great success in automating the creation of machine learning pipelines. In this work, we frame the AutoML problem under a sequential setting where datasets arrive one at a time. On each dataset, an agent can try a small number of pipelines (exploration) before recommending a pipeline for this dataset (recommendation). The goal is to maximize the performance of the recommended pipelines over the sequence of datasets. More specifically, we focus on the exploration policy used for selecting the pipelines to explore before making the recommendation. We propose an approach based on the LinUCB bandit algorithm that leverages the latent representations extracted from matrix factorization (MF). We show that the exploration policy impacts the recommendation performance and that MF-based latent representations are more useful for exploration than for recommendation.
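For reference, a minimal LinUCB over candidate pipelines, where each arm's context vector would be its matrix-factorization embedding. The embeddings, reward model, and constants below are toy placeholders, not the paper's setup.

```python
import numpy as np

class LinUCB:
    """LinUCB: pick the arm maximizing estimated reward plus an
    uncertainty bonus under a shared linear reward model."""
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)       # ridge-regularized Gram matrix
        self.b = np.zeros(dim)
        self.alpha = alpha

    def select(self, arm_features):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        ucb = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
               for x in arm_features]
        return int(np.argmax(ucb))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# Toy usage: 10 pipelines with 4-dim embeddings, 20 exploration steps.
rng = np.random.default_rng(0)
arms = rng.normal(size=(10, 4))
true_theta = rng.normal(size=4)
bandit = LinUCB(dim=4)
for _ in range(20):
    i = bandit.select(arms)
    bandit.update(arms[i], arms[i] @ true_theta + 0.1 * rng.normal())
```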
Fri 9:46 a.m. - 9:47 a.m. | LRTuner: A Learning Rate Tuner for Deep Neural Networks (Spotlight)
One very important hyperparameter for training deep neural networks is the learning rate schedule of the optimizer. The choice of learning rate schedule determines the computational cost of getting close to a minimum, how close you actually get to it, and, most importantly, the kind of local minimum (wide/narrow) attained, which has a significant impact on the generalization accuracy of the network. Current systems employ hand-tuned learning rate schedules, which are painstakingly tuned for each network and dataset. Given that the space of schedules is huge, finding a satisfactory learning rate schedule can be very time consuming. In this paper, we present LRTuner, a method for tuning the learning rate as training proceeds. Our method works with any optimizer, and we demonstrate results with SGD with momentum and Adam. We extensively evaluate LRTuner on multiple datasets, models, and optimizers, and compare favorably against standard learning rate schedules for the given dataset and model, including ResNet-50 on ImageNet, ResNet-18 on CIFAR-10, and SQuAD fine-tuning on BERT. For example, on ImageNet with ResNet-50, LRTuner shows up to 0.2% absolute gains in test accuracy compared to the hand-tuned baseline schedule. Moreover, LRTuner can achieve the same accuracy as the baseline schedule in 29% fewer optimization steps.
Fri 9:47 a.m. - 9:48 a.m. | PonderNet: Learning to Ponder (Spotlight)
In standard neural networks the amount of computation used grows with the size of the inputs, but not with the complexity of the problem being learnt. To overcome this limitation we introduce PonderNet, a new algorithm that learns to adapt the amount of computation based on the complexity of the problem at hand. PonderNet learns end-to-end the number of computational steps to achieve an effective compromise between training prediction accuracy, computational cost, and generalization. On a complex synthetic problem, PonderNet dramatically improves performance over previous adaptive computation methods and additionally succeeds at extrapolation tests where traditional neural networks fail. Our method also matches the current state of the art on a real-world question-answering dataset while using less compute. Finally, PonderNet reaches state-of-the-art results on a complex task designed to test the reasoning capabilities of neural networks.
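The heart of the method is a halting distribution over computation steps. A minimal sketch of that bookkeeping follows; the full model also carries per-step predictions and a KL regularizer toward a geometric prior, both omitted here.

```python
import torch

def ponder_halting(halting_logits):
    """Convert per-step halting logits into the probability p_n of halting
    exactly at step n: p_n = lambda_n * prod_{j<n} (1 - lambda_j)."""
    lam = torch.sigmoid(halting_logits)            # (n_steps,)
    not_halted = torch.cumprod(1 - lam, dim=0)     # prob of surviving step n
    survive = torch.cat([torch.ones(1), not_halted[:-1]])
    return lam * survive

p = ponder_halting(torch.tensor([-2.0, -1.0, 0.0, 3.0]))
print(p, p.sum())  # halting probabilities for steps 1..4 (sum < 1:
                   # remaining mass is "still computing" after step 4)
```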
Fri 9:48 a.m. - 9:49 a.m. | Replacing the Ex-Def Baseline in AutoML by Naive AutoML (Spotlight)
Automated Machine Learning (AutoML) is the problem of automatically finding the pipeline with the best generalization performance on a given dataset. AutoML has received enormous attention in the last decade and has been addressed with sophisticated black-box optimization techniques such as Bayesian optimization, genetic algorithms, and tree search. These approaches are almost never compared to simple baselines to see how much they improve over simple, easy-to-implement approaches. We present Naive AutoML, a very simple baseline for AutoML that exploits meta-knowledge about machine learning problems and makes simplifying, yet effective, assumptions to quickly arrive at high-quality solutions. In 1h experiments, state-of-the-art approaches can hardly improve over Naive AutoML, which in turn comes with advantages such as interpretability and flexibility.
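A compressed sketch of the "naive" spirit: pick the learner by its default-hyperparameter performance, then tune its hyperparameters one at a time, independently. This illustrates the idea only; the authors' procedure also covers other pipeline stages, and the learners and grids below are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def naive_automl(X, y, learners, grids):
    """Stage 1: best learner with defaults. Stage 2: tune that learner's
    hyperparameters one at a time, keeping everything else fixed."""
    def score(model):
        return cross_val_score(model, X, y, cv=5).mean()

    best = max(learners, key=lambda cls: score(cls()))
    params = {}
    for name, values in grids.get(best, {}).items():
        params[name] = max(values,
                           key=lambda v: score(best(**{**params, name: v})))
    return best(**params)

X, y = load_iris(return_X_y=True)
model = naive_automl(
    X, y,
    learners=[LogisticRegression, RandomForestClassifier, KNeighborsClassifier],
    grids={RandomForestClassifier: {"n_estimators": [50, 100, 200],
                                    "max_depth": [None, 5, 10]},
           KNeighborsClassifier: {"n_neighbors": [3, 5, 11]}},
)
```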
Fri 9:49 a.m. - 9:50 a.m. | Neural Fixed-Point Acceleration for Convex Optimization (Spotlight)
Fixed-point iterations are at the heart of numerical computing and are often a computational bottleneck in real-time applications, which typically instead need a fast solution of moderate accuracy. Classical acceleration methods for fixed-point problems focus on designing algorithms with theoretical guarantees that apply to any fixed-point problem. We present neural fixed-point acceleration, a framework to automatically learn to accelerate convex fixed-point problems that are drawn from a distribution, using ideas from meta-learning and classical acceleration algorithms. We apply our framework to SCS, the state-of-the-art solver for convex cone programming, and design models and loss functions to overcome the challenges of learning over unrolled optimization and acceleration instabilities. Our work brings neural acceleration into any optimization problem expressible with CVXPY. This is relevant to AutoML as we (meta-)learn improvements to a convex optimization solver that replaces an acceleration component that is traditionally hand-crafted.
Fri 9:50 a.m. - 9:51 a.m. | Ranking Architectures by their Feature Extraction Capabilities (Spotlight)
The fundamental problem in Neural Architecture Search (NAS) is to efficiently find high-performing architectures in a search space. We propose FEAR, a simple but powerful method for ranking architectures in any search space. FEAR leverages the viewpoint that neural networks are powerful non-linear feature extractors. By training different architectures in the search space to the same training or validation error, then freezing most of the architecture and comparing the usefulness of the extracted features on the task dataset of interest, we obtain quick estimates of relative performance. We validate FEAR on the NATS-Bench topology search space on three different datasets against competing baselines and show strong ranking correlation, especially compared to recently proposed zero-cost methods. FEAR particularly excels at ranking high-performance architectures in the search space. When used in the inner loop of discrete search algorithms like random search, FEAR can cut down the search time by approximately 2.4x without losing accuracy. We additionally conduct an empirical study of recently proposed zero-cost measures for ranking and find that their ranking performance breaks down as training proceeds, and that data-agnostic ranking scores which ignore the dataset do not generalize across dissimilar datasets.
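The ranking signal can be sketched as a frozen-feature probe: treat the partially trained network as a fixed feature extractor and score a cheap linear model on its activations. The real FEAR first trains each architecture to a common training/validation error and freezes most (not all) of the network; `extract_features` below is a placeholder for that frozen forward pass.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def fear_score(extract_features, X, y):
    """Rank an architecture by how well a cheap linear probe performs on
    the features its frozen network extracts."""
    Z = extract_features(X)                        # frozen-network activations
    Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(Z_tr, y_tr)
    return probe.score(Z_te, y_te)

# Toy usage with a random feature map standing in for a network.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = (X[:, 0] > 0).astype(int)
print(fear_score(lambda X: np.tanh(X @ rng.normal(size=(16, 32))), X, y))
```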
Fri 9:51 a.m. - 9:52 a.m. | Incorporating domain knowledge into neural-guided search via in situ priors and constraints (Spotlight) | Mikel Landajuela Larma
Many AutoML problems involve optimizing discrete objects under a black-box reward. Neural-guided search provides a flexible means of searching these combinatorial spaces using an autoregressive recurrent neural network. A major benefit of this approach is that it builds up objects $\textit{sequentially}$; this provides an opportunity to incorporate domain knowledge into the search by directly modifying the logits emitted during sampling. In this work, we formalize a framework for incorporating such $\textit{in situ}$ priors and constraints into neural-guided search, and provide sufficient conditions for enforcing constraints. We integrate several priors and constraints from existing works into this framework, propose several new ones, and demonstrate their efficacy in informing the task of symbolic regression.
[link to poster session #2](https://eventhosts.gather.town/app/5pkQC6W2dWi4qPMO/automl-poster-session-2)
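The mechanism described above, directly modifying logits during sampling, can be sketched in a few lines. The prior values and forbidden-token mask here are toy placeholders for domain knowledge (e.g., grammar constraints in symbolic regression), not the paper's specific priors.

```python
import torch

def apply_in_situ(logits, prior_logprobs=None, forbidden=None):
    """Modify a sampling distribution in situ: add a domain prior to the
    logits and mask out tokens that would violate a hard constraint."""
    if prior_logprobs is not None:
        logits = logits + prior_logprobs                       # soft preference
    if forbidden is not None:
        logits = logits.masked_fill(forbidden, float("-inf"))  # hard constraint
    return logits

# Toy usage: 6 tokens; discourage token 0, forbid tokens 4 and 5.
logits = torch.zeros(6)
prior = torch.tensor([-2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
mask = torch.tensor([False, False, False, False, True, True])
probs = torch.softmax(apply_in_situ(logits, prior, mask), dim=-1)
token = torch.multinomial(probs, 1)
```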
Fri 9:52 a.m. - 9:53 a.m. | Tabular Data: Deep Learning is Not All You Need (Spotlight)
A key element of AutoML systems is setting the types of models that will be used for each type of task. For classification and regression problems with tabular data, the use of tree ensemble models (like XGBoost) is usually recommended. However, several deep learning models for tabular data have recently been proposed, claiming to outperform XGBoost for some use-cases. In this paper, we explore whether these deep models should be a recommended option for tabular data, by rigorously comparing the new deep models to XGBoost on a variety of datasets. In addition to systematically comparing their accuracy, we consider the tuning and computation they require. Our study shows that XGBoost outperforms these deep models across the datasets, including datasets used in the papers that proposed the deep models. We also demonstrate that XGBoost requires much less tuning. On the positive side, we show that an ensemble of the deep models and XGBoost performs better on these datasets than XGBoost alone.
Fri 9:53 a.m. - 9:54 a.m. | Automated Learning Rate Scheduler for Large-batch Training (Spotlight)
Large-batch training has been essential in leveraging large-scale datasets and models in deep learning. While it is computationally beneficial to use large batch sizes, it often requires a specially designed learning rate (LR) schedule to achieve a comparable level of performance as in smaller-batch training. Especially when the number of training epochs is constrained, the use of a large LR and a warmup strategy is critical for the final performance of large-batch training due to the reduced number of updating steps. In this work, we propose an automated LR scheduling algorithm which is effective for neural network training with a large batch size under a given epoch budget. Specifically, the whole schedule consists of two phases: adaptive warmup and predefined decay, where the LR is increased until the training loss no longer decreases and then decreased to zero until the end of training. Whether the training loss has reached its minimum is robustly checked with Gaussian process smoothing in an online manner with a low computational burden. Coupled with adaptive stochastic optimizers such as AdamP and LAMB, the proposed scheduler successfully adjusts the LRs without cumbersome hyperparameter tuning and achieves comparable or better performance than tuned baselines on various image classification benchmarks and architectures with a wide range of batch sizes.
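A minimal sketch of the two-phase idea: multiplicative warmup while a smoothed training loss is still falling, then cosine decay to zero. A simple moving average stands in for the paper's Gaussian-process smoothing, and all constants are illustrative.

```python
import math

class AutoWarmupCosine:
    """Two-phase LR schedule sketch: grow the LR while the smoothed loss
    still decreases, then decay to zero with a cosine."""
    def __init__(self, base_lr=0.01, growth=1.02, window=50, total_steps=10000):
        self.lr, self.growth, self.window = base_lr, growth, window
        self.total_steps, self.losses, self.decaying = total_steps, [], False
        self.peak_lr, self.decay_start = base_lr, 0

    def step(self, step_idx, loss):
        self.losses.append(loss)
        if not self.decaying:
            w = self.window
            if len(self.losses) >= 2 * w:
                recent = sum(self.losses[-w:]) / w
                earlier = sum(self.losses[-2 * w:-w]) / w
                if recent >= earlier:            # smoothed loss stopped falling
                    self.decaying = True
                    self.peak_lr, self.decay_start = self.lr, step_idx
            if not self.decaying:
                self.lr *= self.growth           # adaptive warmup phase
                return self.lr
        frac = (step_idx - self.decay_start) / max(1, self.total_steps - self.decay_start)
        return self.peak_lr * 0.5 * (1 + math.cos(math.pi * min(1.0, frac)))

# In a training loop: lr = sched.step(i, loss_value) each iteration.
sched = AutoWarmupCosine()
```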
Fri 9:54 a.m. - 9:55 a.m. | Adaptation-Agnostic Meta-Training (Spotlight)
Many meta-learning algorithms can be formulated as an interleaved process, in the sense that task-specific predictors are learned during inner-task adaptation and meta-parameters are updated during the meta-update. The normal meta-training strategy needs to differentiate through the inner-task adaptation procedure to optimize the meta-parameters. This leads to a constraint that the inner-task algorithms should be solvable analytically. Under this constraint, only simple algorithms with analytical solutions can be applied as the inner-task algorithms, limiting the model's expressiveness. To lift this limitation, we propose an adaptation-agnostic meta-training strategy. Following the proposed strategy, we can apply stronger algorithms (e.g., an ensemble of different types of algorithms) as the inner-task algorithm to achieve superior performance compared with popular baselines.
Fri 9:55 a.m. - 9:56 a.m. | On-the-fly learning of adaptive strategies with bandit algorithms (Spotlight)
Automation of machine learning model development is increasingly becoming an established research area. While automated model selection and automated data pre-processing have been studied in depth, there is a gap concerning automated model adaptation strategies for streaming data with non-stationarities. This has previously been addressed by heuristic generic adaptation strategies in the batch streaming setting. While showing promising performance, these strategies have some limitations. In this work, we propose using multi-armed bandit algorithms to learn adaptive strategies on-the-fly from incrementally streaming data. Empirical results using established bandit algorithms show performance comparable to two common stream learning algorithms.
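A minimal sketch of the setup: a UCB1 bandit choosing among a few adaptation strategies after each stream batch, with the chosen strategy's observed accuracy fed back as reward. The strategy names and the reward signal are placeholders, not the paper's specific strategies or bandit variant.

```python
import math
import random

class UCB1:
    """UCB1 over a fixed set of adaptation strategies."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def select(self):
        for a, c in enumerate(self.counts):
            if c == 0:
                return a                          # play each arm once first
        t = sum(self.counts)
        return max(range(len(self.counts)),
                   key=lambda a: self.sums[a] / self.counts[a]
                   + math.sqrt(2 * math.log(t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

strategies = ["retrain_from_scratch", "incremental_update", "keep_model"]
bandit = UCB1(len(strategies))
for batch in range(100):                          # stand-in for stream batches
    arm = bandit.select()
    accuracy = random.random()                    # stand-in for evaluation
    bandit.update(arm, accuracy)
```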
Fri 9:56 a.m. - 11:00 a.m. | Poster Session #2 (Poster Session)
Fri 11:01 a.m. - 11:30 a.m. | Invited Talk by Kim Montgomery: Bias, Controlling Bias, and AutoML (Invited Talk)
Fri 11:30 a.m. - 11:40 a.m. | Q&A Kim Montgomery (Q&A)
Fri 11:41 a.m. - 12:10 p.m. | Invited Talk by Mi Zhang: Encoding is an Important Design Decision in Neural Architecture Search (Invited Talk)
Fri 12:10 p.m. - 12:20 p.m. | Q&A Mi Zhang (Q&A)
Fri 12:40 p.m. - 1:30 p.m. | Panel Discussion
Author Information
Gresa Shala (University of Freiburg)
Frank Hutter (University of Freiburg and Bosch Center for Artificial Intelligence)
Frank Hutter is a Full Professor for Machine Learning at the Computer Science Department of the University of Freiburg (Germany), where he has been a faculty member since 2013. Before that, he was at the University of British Columbia (UBC) for eight years, for his PhD and postdoc. Frank's main research interests lie in machine learning, artificial intelligence, and automated algorithm design. For his 2009 PhD thesis on algorithm configuration, he received the CAIAC doctoral dissertation award for the best thesis in AI in Canada that year, and with his coauthors he has received several best paper awards and prizes in international competitions on automated machine learning, SAT solving, and AI planning. Since 2016, he has held an ERC Starting Grant for a project on automating deep learning based on Bayesian optimization, Bayesian neural networks, and deep reinforcement learning.
Joaquin Vanschoren (Eindhoven University of Technology)
Marius Lindauer (Leibniz University Hannover)
Katharina Eggensperger (University of Freiburg)
Colin White (Abacus.AI)
Erin LeDell (H2O.AI)
More from the Same Authors
- 2021: Bag of Baselines for Multi-objective Joint Neural Architecture Search and Hyperparameter Optimization (Sergio Izquierdo · Julia Guerrero-Viu · Sven Hauns · Guilherme Miotto · Simon Schrodi · André Biedenkapp · Thomas Elsken · Difan Deng · Marius Lindauer · Frank Hutter)
- 2021: Automatic Risk Adaptation in Distributional Reinforcement Learning (Frederik Schubert · Theresa Eimer · Bodo Rosenhahn · Marius Lindauer)
- 2022: P30: Meta-Learning Real-Time Bayesian AutoML For Small Tabular Data (Frank Hutter · Katharina Eggensperger)
- 2022: On the Importance of Hyperparameters and Data Augmentation for Self-Supervised Learning (Diane Wagner · Fabio Ferreira · Danny Stoll · Robin Tibor Schirrmeister · Samuel Gabriel Müller · Frank Hutter)
- 2023: CAAFE: Combining Large Language Models with Tabular Predictors for Semi-Automated Data Science (Noah Hollmann · Samuel Gabriel Müller · Frank Hutter)
- 2023 Poster: PFNs4BO: In-Context Learning for Bayesian Optimization (Samuel Gabriel Müller · Matthias Feurer · Noah Hollmann · Frank Hutter)
- 2022 Poster: Zero-shot AutoML with Pretrained Models (Ekrem Öztürk · Fabio Ferreira · Hadi S Jomaa · Lars Schmidt-Thieme · Josif Grabocka · Frank Hutter)
- 2022 Spotlight: Zero-shot AutoML with Pretrained Models (Ekrem Öztürk · Fabio Ferreira · Hadi S Jomaa · Lars Schmidt-Thieme · Josif Grabocka · Frank Hutter)
- 2021 Poster: Self-Paced Context Evaluation for Contextual Reinforcement Learning (Theresa Eimer · André Biedenkapp · Frank Hutter · Marius Lindauer)
- 2021 Poster: TempoRL: Learning When to Act (André Biedenkapp · Raghu Rajan · Frank Hutter · Marius Lindauer)
- 2021 Spotlight: TempoRL: Learning When to Act (André Biedenkapp · Raghu Rajan · Frank Hutter · Marius Lindauer)
- 2021 Spotlight: Self-Paced Context Evaluation for Contextual Reinforcement Learning (Theresa Eimer · André Biedenkapp · Frank Hutter · Marius Lindauer)
- 2020 Workshop: 7th ICML Workshop on Automated Machine Learning (AutoML 2020) (Frank Hutter · Joaquin Vanschoren · Marius Lindauer · Charles Weill · Katharina Eggensperger · Matthias Feurer)
- 2020: Welcome (Frank Hutter)
- 2019: Closing Remarks (Frank Hutter)
- 2019: Poster Session 1 (all papers) (Matilde Gargiani · Yochai Zur · Chaim Baskin · Evgenii Zheltonozhskii · Liam Li · Ameet Talwalkar · Xuedong Shang · Harkirat Singh Behl · Atilim Gunes Baydin · Ivo Couckuyt · Tom Dhaene · Chieh Lin · Wei Wei · Min Sun · Orchid Majumder · Michele Donini · Yoshihiko Ozaki · Ryan P. Adams · Christian Geißler · Ping Luo · zhanglin peng · Ruimao Zhang · John Langford · Rich Caruana · Debadeepta Dey · Charles Weill · Xavi Gonzalvo · Scott Yang · Scott Yak · Eugen Hotaj · Vladimir Macko · Mehryar Mohri · Corinna Cortes · Stefan Webb · Jonathan Chen · Martin Jankowiak · Noah Goodman · Aaron Klein · Frank Hutter · Mojan Javaheripi · Mohammad Samragh · Sungbin Lim · Taesup Kim · SUNGWOONG KIM · Michael Volpp · Iddo Drori · Yamuna Krishnamurthy · Kyunghyun Cho · Stanislaw Jastrzebski · Quentin de Laroussilhe · Mingxing Tan · Xiao Ma · Neil Houlsby · Andrea Gesmundo · Zalán Borsos · Krzysztof Maziarz · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune · Pieter Gijsbers · Joaquin Vanschoren · Felix Mohr · Eyke Hüllermeier · Zheng Xiong · Wenpeng Zhang · Wenwu Zhu · Weijia Shao · Aleksandra Faust · Michal Valko · Michael Y Li · Hugo Jair Escalante · Marcel Wever · Andrey Khorlin · Tara Javidi · Anthony Francis · Saurajit Mukherjee · Jungtaek Kim · Michael McCourt · Saehoon Kim · Tackgeun You · Seungjin Choi · Nicolas Knudde · Alexander Tornede · Ghassen Jerfel)
- 2019: Welcome (Frank Hutter)
- 2019 Workshop: 6th ICML Workshop on Automated Machine Learning (AutoML 2019) (Frank Hutter · Joaquin Vanschoren · Katharina Eggensperger · Matthias Feurer)
- 2019 Poster: NAS-Bench-101: Towards Reproducible Neural Architecture Search (Chris Ying · Aaron Klein · Eric Christiansen · Esteban Real · Kevin Murphy · Frank Hutter)
- 2019 Oral: NAS-Bench-101: Towards Reproducible Neural Architecture Search (Chris Ying · Aaron Klein · Eric Christiansen · Esteban Real · Kevin Murphy · Frank Hutter)
- 2019 Tutorial: Algorithm configuration: learning in the space of algorithm designs (Kevin Leyton-Brown · Frank Hutter)
- 2018 Poster: BOHB: Robust and Efficient Hyperparameter Optimization at Scale (Stefan Falkner · Aaron Klein · Frank Hutter)
- 2018 Oral: BOHB: Robust and Efficient Hyperparameter Optimization at Scale (Stefan Falkner · Aaron Klein · Frank Hutter)