## 8th ICML Workshop on Automated Machine Learning (AutoML 2021)

### Gresa Shala · Frank Hutter · Joaquin Vanschoren · Marius Lindauer · Katharina Eggensperger · Colin White · Erin LeDell

Abstract:

Machine learning (ML) has achieved considerable successes in recent years, but this success often relies on human experts, who construct appropriate features, design learning architectures, set their hyperparameters, and develop new learning algorithms. Driven by the demand for robust, off-the-shelf ML methods from an ever-growing community, the research area of AutoML targets the progressive automation of machine learning, aiming to make effective methods available to everyone. Hence, the workshop targets a broad audience ranging from core ML researchers in different fields of ML connected to AutoML, such as neural architecture search (NAS), hyperparameter optimization, meta-learning, and learning-to-learn, to domain experts aiming to apply ML to new types of problems.

Timezone: America/Los_Angeles

### Schedule

Fri 6:00 a.m. - 6:05 a.m. Welcome (Intro)

Fri 6:06 a.m. - 6:35 a.m. Invited Talk by Matthias Feurer: Towards hands-free AutoML (Invited Talk) Matthias Feurer

Fri 6:35 a.m. - 6:45 a.m. Q&A Matthias Feurer (Q&A) Matthias Feurer

Fri 6:46 a.m. - 7:15 a.m. Invited Talk by Ellen Vitercik: Automated Parameter Optimization for Integer Programming (Invited Talk) Ellen Vitercik

Fri 7:15 a.m. - 7:25 a.m. Q&A Ellen Vitercik (Q&A) Ellen Vitercik

Fri 7:25 a.m. - 7:26 a.m. A resource-efficient method for repeated HPO and NAS problems (Spotlight)

In this work we consider the problem of repeated hyperparameter and neural architecture search (HNAS). We propose an extension of Successive Halving that is able to leverage information gained in previous HNAS problems with the goal of saving computational resources. We empirically demonstrate that our solution is able to drastically decrease costs while maintaining accuracy and being robust to negative transfer. Our method is significantly simpler than competing transfer learning approaches, setting a new baseline for transfer learning in HNAS. link to postersession #1

Fri 7:26 a.m. - 7:27 a.m. Latency-Aware Neural Architecture Search with Multi-Objective Bayesian Optimization (Spotlight)

When tuning the architecture and hyperparameters of large machine learning models for on-device deployment, it is desirable to understand the optimal trade-offs between on-device latency and model accuracy. In this work, we leverage recent methodological advances in Bayesian optimization over high-dimensional search spaces and multi-objective Bayesian optimization to efficiently explore these trade-offs for a production-scale on-device natural language understanding model at Facebook. link to postersession #1 David Eriksson

Fri 7:27 a.m. - 7:28 a.m.
GPy-ABCD: A Configurable Automatic Bayesian Covariance Discovery Implementation (Spotlight)

Gaussian Processes (GPs) are a very flexible class of nonparametric models frequently used in supervised learning tasks because of their ability to fit data with very few assumptions, namely just the type of correlation (kernel) the data is expected to display. Automatic Bayesian Covariance Discovery (ABCD) is an iterative GP regression framework aimed at removing the requirement for even this initial correlation form assumption. An original ABCD implementation exists and is a complex stand-alone system designed to produce long-form text analyses of provided data. This paper presents a lighter, more functional and configurable implementation of the ABCD idea, outputting only fit models and short descriptions: the Python package GPy-ABCD, which was developed as part of an adaptive modelling component for the FRANK query-answering system. It uses a revised model-space search algorithm and removes a search bias which was required in order to retain model explainability in the original system. link to postersession #1 Thomas Fletcher

Fri 7:28 a.m. - 7:29 a.m. Bandit Limited Discrepancy Search and Application to Machine Learning Pipeline Optimization (Spotlight)

Optimizing a machine learning (ML) pipeline has been an important topic of AI and ML. Despite recent progress, this topic remains a challenging problem, due to potentially many combinations to consider as well as slow training and validation. We present the BLDS algorithm for optimized algorithm selection (ML operations) in a fixed ML pipeline structure. BLDS performs multi-fidelity optimization for selecting ML algorithms trained with smaller computational overhead, while controlling its pipeline search based on multi-armed bandit and limited discrepancy search. Our experiments on well-known benchmarks show that BLDS is superior to competing algorithms. link to postersession #1 Akihiro Kishimoto

Fri 7:29 a.m. - 7:30 a.m. Towards Model Selection using Learning Curve Cross-Validation (Spotlight)

Cross-validation (CV) methods such as leave-one-out cross-validation, k-fold cross-validation, and Monte-Carlo cross-validation estimate the predictive performance of a learner by repeatedly training it on a large portion of the given data and testing on the remaining data. These techniques have two drawbacks. First, they can be unnecessarily slow on large datasets. Second, providing only point estimates, they give almost no insights into the learning process of the validated algorithm. In this paper, we propose a new approach for validation based on learning curves (LCCV). Instead of creating train-test splits with a large portion of training data, LCCV iteratively increases the number of training examples used for training. In the context of model selection, it eliminates models that can be safely dismissed from the candidate pool. We run a large-scale experiment on the 67 datasets from the AutoML benchmark, and empirically show that LCCV in over 90% of the cases leads to similar performance (at most 0.5% difference) as 10-fold CV, but provides additional insights on the behaviour of a given model. On top of this, LCCV results in runtime reductions between 20% and over 50% on half of the 67 datasets from the AutoML benchmark. This can be incorporated in various AutoML frameworks, to speed up the internal evaluation of candidate models. As such, these results can be used orthogonally to other advances in the field of AutoML. link to postersession #1 Jan N. van Rijn

Fri 7:30 a.m. - 7:31 a.m. Dynamic Pruning of a Neural Network via Gradient Signal-to-Noise Ratio (Spotlight)

While training highly overparameterized neural networks is common practice in deep learning, research into post-hoc weight-pruning suggests that more than 90% of parameters can be removed without loss in predictive performance.
To save resources, zero-shot and one-shot pruning attempt to find such a sparse representation at initialization or at an early stage of training. Though these approaches are efficient, there is no justification why the sparsity structure should not change during training. Dynamic sparsity pruning undoes this limitation and allows the structure of the sparse neural network to adapt during training. Recent approaches rely on weight magnitude pruning, which has been shown to be sub-optimal when applied at earlier training stages. In this work we propose to use the gradient noise to make pruning decisions. The procedure enables us to automatically adjust the sparsity during training without imposing a hand-designed sparsity schedule, while at the same time being able to recover from previous pruning decisions by unpruning connections as necessary. We evaluate our new method on image and tabular datasets and demonstrate that we reach similar performance as the dense model from which we extract the sparse network, while exposing fewer hyperparameters than other dynamic sparsity methods. link to postersession #1 Julien Siems

Fri 7:31 a.m. - 7:32 a.m. AutoML Adoption in ML Software (Spotlight)

Machine learning (ML) has become essential to a vast range of applications, while ML experts are in short supply. To alleviate this problem, AutoML aims to make ML easier and more efficient to use. Even so, it is not clear to what extent AutoML techniques are actually adopted in an engineering context, nor what facilitates or inhibits adoption. To study this, we define AutoML engineering practices, measure their adoption through surveys, and distil first insights into factors influencing adoption from two initial interviews. Depending on the practice, results show around 20 to 30% of the respondents have not adopted it at all and many more only partially, leaving substantial room for increases in adoption.
The interviews indicate adoption may in part be inhibited by usability issues with AutoML frameworks and the increased computational resources needed for adoption. link to postersession #1 Koen van der Blom

Fri 7:32 a.m. - 7:33 a.m. Leveraging Theoretical Tradeoffs in Hyperparameter Selection for Improved Empirical Performance (Spotlight)

The tradeoffs in the excess risk incurred from data-driven learning of a single model have been studied by decomposing the excess risk into approximation, estimation and optimization errors. In this paper, we focus on the excess risk incurred in data-driven hyperparameter optimization (HPO) and its interaction with approximate empirical risk minimization (ERM) necessitated by large data. We present novel bounds for the excess risk in various common scenarios in HPO. Based on these results, we propose practical heuristics that allow us to improve performance or reduce computational overhead of data-driven HPO, demonstrating over $2 \times$ speedup with no loss in predictive performance in our preliminary results. [link to postersession #1](https://eventhosts.gather.town/app/WmNodofE2Oab573H/automl-poster-session-1) Parikshit Ram

Fri 7:33 a.m. - 7:34 a.m. Bag of Baselines for Multi-objective Joint Neural Architecture Search and Hyperparameter Optimization (Spotlight)

While both neural architecture search (NAS) and hyperparameter optimization (HPO) have been studied extensively in recent years, NAS methods typically assume fixed hyperparameters and vice versa. Furthermore, NAS has recently often been framed as a multi-objective optimization problem, in order to take, e.g., resource requirements into account. In this paper, we propose a set of methods that extend current approaches to jointly optimize neural architectures and hyperparameters with respect to multiple objectives. We hope that these methods will serve as simple baselines for future research on multi-objective joint NAS + HPO.
link to postersession #1 Thomas Elsken · Difan Deng

Fri 7:35 a.m. - 7:36 a.m. Towards Explaining Hyperparameter Optimization via Partial Dependence Plots (Spotlight)

Automated hyperparameter optimization (HPO) can help practitioners obtain peak performance in machine learning models. However, there is often a lack of valuable insights into the effects of different hyperparameters on the final model performance. This lack of comprehensibility and transparency makes it difficult to trust and understand the automated HPO process and its results. We suggest using interpretable machine learning (IML) to gain insights from the experimental data obtained during HPO and especially discuss the popular case of Bayesian optimization (BO). BO tends to focus on promising regions with potential high-performance configurations and thus induces a sampling bias. Hence, many IML techniques, like Partial Dependence Plots (PDP), carry the risk of generating biased interpretations. By leveraging the posterior uncertainty of the BO surrogate model, we introduce a variant of the PDP with estimated confidence bands. In addition, we propose to partition the hyperparameter space to obtain more confident and reliable PDPs in relevant sub-regions. In an experimental study, we provide quantitative evidence for the increased quality of the PDPs within sub-regions. link to postersession #1 Julia Moosbauer · Julia Herbinger

Fri 7:36 a.m. - 7:37 a.m. Mutation is all you need (Spotlight)

Neural architecture search (NAS) promises to make deep learning accessible to non-experts by automating architecture engineering of deep neural networks. BANANAS is one state-of-the-art NAS method that is embedded within the Bayesian optimization framework. Recent experimental findings have demonstrated that the strong performance of BANANAS on the NAS-Bench-101 benchmark is determined by its path encoding and not its choice of surrogate model.
We present experimental results suggesting that the performance of BANANAS on the NAS-Bench-301 benchmark is determined by its acquisition function optimizer, which minimally mutates the incumbent. link to postersession #1 Lennart Schneider

Fri 7:37 a.m. - 7:38 a.m. Meta Learning the Step Size in Policy Gradient Methods (Spotlight)

Policy-based algorithms are among the most widely adopted techniques in model-free RL, thanks to their strong theoretical groundings and good properties in continuous action spaces. Unfortunately, these methods require precise and problem-specific hyperparameter tuning to achieve good performance and, as a consequence, they tend to struggle when asked to accomplish a series of heterogeneous tasks. In particular, the selection of the step size has a crucial impact on the ability to learn a highly performing policy, affecting the speed and the stability of the training process, and often being the main culprit for poor results. In this paper, we tackle these issues with a Meta Reinforcement Learning approach, by introducing a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL with contextual processes. After providing a theoretical Lipschitz bound to the performance in different tasks, we adopt the proposed framework to train a batch RL algorithm to dynamically recommend the most adequate step size for different policies and tasks. In conclusion, we present an experimental campaign to show the advantages of selecting an adaptive learning rate in heterogeneous environments. link to postersession #1 Luca Sabbioni

Fri 7:40 a.m. - 9:00 a.m. Poster Session #1 (Poster Session)

All papers presented as spotlights in the session before.

Fri 9:00 a.m. - 9:10 a.m. Contributed Talk: Discovering Weight Initializers with Meta Learning (Contributed Talk)

Deep neural network training largely depends on the choice of initial weight distribution.
However, this choice can often be nontrivial. Existing theoretical results for this problem mostly cover simple architectures, e.g., feedforward networks with ReLU activations. The architectures used for practical problems are more complex and often incorporate many overlapping modules, making them challenging for theoretical analysis. Therefore, practitioners have to use heuristic initializers with questionable optimality and stability. In this study, we propose a task-agnostic approach that discovers initializers for specific network architectures and optimizers by learning the initial weight distributions directly through the use of Meta-Learning. In several supervised and unsupervised learning scenarios, we show the advantage of our initializers in terms of both faster convergence and higher model performance. link to postersession #2 Dmitry Baranchuk

Fri 9:10 a.m. - 9:15 a.m. Q&A Contributed Talk (Q&A) Dmitry Baranchuk

Fri 9:15 a.m. - 9:25 a.m. Contributed Talk: Multimodal AutoML on Structured Tables with Text Fields (Contributed Talk)

We design automated supervised learning systems for data tables that not only contain numeric/categorical columns, but text fields as well. Here we assemble 15 multimodal data tables that each contain some text fields and stem from a real business application. Over this benchmark, we evaluate numerous multimodal AutoML strategies, including a standard two-stage approach where NLP is used to featurize the text such that AutoML for tabular data can then be applied. We propose various practically superior strategies based on multimodal adaptations of Transformer networks and stack ensembling of these networks with classical tabular models.
Beyond performing the best in our benchmark, our proposed (fully automated) methodology manages to rank 1st place (against human data scientists) when fit to the raw tabular/text data in two MachineHack prediction competitions and 2nd place (out of 2380 teams) in Kaggle’s Mercari Price Suggestion Challenge. link to postersession #2 Jonas Mueller

Fri 9:25 a.m. - 9:30 a.m. Q&A Contributed Talk (Q&A) Jonas Mueller

Fri 9:30 a.m. - 9:40 a.m. Contributed Talk: Automated Discovery of Adaptive Attacks on Adversarial Defenses (Contributed Talk)

Reliable evaluation of adversarial defenses is a challenging task, currently limited to an expert who manually crafts attacks that exploit the defense’s inner workings, or to approaches based on an ensemble of fixed attacks, none of which may be effective for the specific defense at hand. Our key observation is that custom attacks are composed from a set of reusable building blocks, such as fine-tuning relevant attack parameters, network transformations, and custom loss functions. Based on this observation, we present an extensible framework that defines a search space over these reusable building blocks and automatically discovers an effective attack on a given model with an unknown defense by searching over suitable combinations of these blocks. We evaluated our framework on 23 adversarial defenses and showed it outperforms AutoAttack, the current state-of-the-art tool for reliable evaluation of adversarial defenses: our discovered attacks are either stronger, producing 3.0%-50.8% additional adversarial examples (10 cases), or are typically 2x faster while enjoying similar adversarial robustness (13 cases). link to postersession #2 Chengyuan Yao

Fri 9:40 a.m. - 9:45 a.m. Q&A Contributed Talk (Q&A) Chengyuan Yao

Fri 9:45 a.m. - 9:46 a.m.
Sequential Automated Machine Learning: Bandits-driven Exploration using a Collaborative Filtering Representation (Spotlight)

The goal of Automated Machine Learning (AutoML) is to make Machine Learning (ML) tools more accessible. Collaborative Filtering (CF) methods have shown great success in automating the creation of machine learning pipelines. In this work, we frame the AutoML problem under a sequential setting where datasets arrive one at a time. On each dataset, an agent can try a small number of pipelines (exploration) before recommending a pipeline for this dataset (recommendation). The goal is to maximize the performance of the recommended pipelines over the sequence of datasets. More specifically, we focus on the exploration policy used for selecting the pipelines to explore before making the recommendation. We propose an approach based on the LinUCB bandit algorithm that leverages the latent representations extracted from matrix factorization (MF). We show that the exploration policy impacts the recommendation performance and that MF-based latent representations are more useful for exploration than for recommendation. link to postersession #2 Maxime Heuillet

Fri 9:46 a.m. - 9:47 a.m. LRTuner: A Learning Rate Tuner for Deep Neural Networks (Spotlight)

One very important hyper-parameter for training deep neural networks is the learning rate schedule of the optimizer. The choice of learning rate schedule determines the computational cost of getting close to a minimum, how close you actually get to the minimum, and most importantly the kind of local minimum (wide/narrow) attained. The kind of minimum attained has a significant impact on the generalization accuracy of the network. Current systems employ hand-tuned learning rate schedules, which are painstakingly tuned for each network and dataset. Given that the state space of schedules is huge, finding a satisfactory learning rate schedule can be very time consuming.
In this paper, we present LRTuner, a method for tuning the learning rate as training proceeds. Our method works with any optimizer, and we demonstrate results on SGD with Momentum and Adam optimizers. We extensively evaluate LRTuner on multiple datasets, models, and across optimizers. We compare favorably against standard learning rate schedules for the given dataset and models, including ImageNet on Resnet-50, Cifar-10 on Resnet-18, and SQuAD fine-tuning on BERT. For example, on ImageNet with Resnet-50, LRTuner shows up to 0.2% absolute gains in test accuracy compared to the hand-tuned baseline schedule. Moreover, LRTuner can achieve the same accuracy as the baseline schedule in 29% fewer optimization steps. link to postersession #2 Nipun Kwatra

Fri 9:47 a.m. - 9:48 a.m. PonderNet: Learning to Ponder (Spotlight)

In standard neural networks the amount of computation used grows with the size of the inputs, but not with the complexity of the problem being learnt. To overcome this limitation we introduce PonderNet, a new algorithm that learns to adapt the amount of computation based on the complexity of the problem at hand. PonderNet learns end-to-end the number of computational steps to achieve an effective compromise between training prediction accuracy, computational cost and generalization. On a complex synthetic problem, PonderNet dramatically improves performance over previous adaptive computation methods and additionally succeeds at extrapolation tests where traditional neural networks fail. Also, our method matched the current state of the art results on a real world question and answering dataset, but using less compute. Finally, PonderNet reached state of the art results on a complex task designed to test the reasoning capabilities of neural networks. link to postersession #2 Andrea Banino

Fri 9:48 a.m. - 9:49 a.m.
Replacing the Ex-Def Baseline in AutoML by Naive AutoML (Spotlight)

Automated Machine Learning (AutoML) is the problem of automatically finding the pipeline with the best generalization performance on some given dataset. AutoML has received enormous attention in the last decade and has been addressed with sophisticated black-box optimization techniques like Bayesian Optimization, Genetic Algorithms, or Tree Search. These approaches are almost never compared to simple baselines to see how much they improve over simple, easy-to-implement approaches. We present Naive AutoML, a very simple baseline for AutoML that exploits meta-knowledge about machine learning problems and makes simplifying yet effective assumptions to quickly come to high-quality solutions. In 1h experiments, state of the art approaches can hardly improve over Naive AutoML, which in turn comes along with advantages such as interpretability and flexibility. link to postersession #2 Felix Mohr

Fri 9:49 a.m. - 9:50 a.m. Neural Fixed-Point Acceleration for Convex Optimization (Spotlight)

Fixed-point iterations are at the heart of numerical computing and are often a computational bottleneck in real-time applications, which typically instead need a fast solution of moderate accuracy. Classical acceleration methods for fixed-point problems focus on designing algorithms with theoretical guarantees that apply to any fixed-point problem. We present neural fixed-point acceleration, a framework to automatically learn to accelerate convex fixed-point problems that are drawn from a distribution, using ideas from meta-learning and classical acceleration algorithms. We apply our framework to SCS, the state-of-the-art solver for convex cone programming, and design models and loss functions to overcome the challenges of learning over unrolled optimization and acceleration instabilities. Our work brings neural acceleration into any optimization problem expressible with CVXPY.
This is relevant to AutoML as we (meta-)learn improvements to a convex optimization solver that replaces an acceleration component that is traditionally hand-crafted. link to postersession #2 Shobha Venkataraman

Fri 9:50 a.m. - 9:51 a.m. Ranking Architectures by their Feature Extraction Capabilities (Spotlight)

The fundamental problem in Neural Architecture Search (NAS) is to efficiently find high-performing architectures in a search space. We propose FEAR, a simple but powerful method for ranking architectures in any search space. FEAR leverages the viewpoint that neural networks are powerful non-linear feature extractors. By training different architectures in the search space to the same training or validation error and subsequently comparing the usefulness of the features extracted on the task-dataset of interest by freezing most of the architecture, we obtain quick estimates of the relative performance. We validate FEAR on Natsbench topology search space on three different datasets against competing baselines and show strong ranking correlation especially compared to recently proposed zero-cost methods. FEAR especially excels at ranking high-performance architectures in the search space. When used in the inner loop of discrete search algorithms like random search, FEAR can cut down the search time by approximately 2.4x without losing accuracy. We additionally empirically study very recently proposed zero-cost measures for ranking and find that they break down in ranking performance as training proceeds and also that data-agnostic ranking scores which ignore the dataset do not generalize across dissimilar datasets. link to postersession #2 Debadeepta Dey

Fri 9:51 a.m. - 9:52 a.m. Incorporating domain knowledge into neural-guided search via in situ priors and constraints (Spotlight)

Many AutoML problems involve optimizing discrete objects under a black-box reward.
Neural-guided search provides a flexible means of searching these combinatorial spaces using an autoregressive recurrent neural network. A major benefit of this approach is that it builds up objects *sequentially*; this provides an opportunity to incorporate domain knowledge into the search by directly modifying the logits emitted during sampling. In this work, we formalize a framework for incorporating such *in situ* priors and constraints into neural-guided search, and provide sufficient conditions for enforcing constraints. We integrate several priors and constraints from existing works into this framework, propose several new ones, and demonstrate their efficacy in informing the task of symbolic regression. [link to postersession #2](https://eventhosts.gather.town/app/5pkQC6W2dWi4qPMO/automl-poster-session-2) Mikel Landajuela Larma

Fri 9:52 a.m. - 9:53 a.m. Tabular Data: Deep Learning is Not All You Need (Spotlight)

A key element of AutoML systems is setting the types of models that will be used for each type of task. For classification and regression problems with tabular data, the use of tree ensemble models (like XGBoost) is usually recommended. However, several deep learning models for tabular data have recently been proposed, claiming to outperform XGBoost for some use-cases. In this paper, we explore whether these deep models should be a recommended option for tabular data, by rigorously comparing the new deep models to XGBoost on a variety of datasets. In addition to systematically comparing their accuracy, we consider the tuning and computation they require. Our study shows that XGBoost outperforms these deep models across the datasets, including datasets used in the papers that proposed the deep models. We also demonstrate that XGBoost requires much less tuning. On the positive side, we show that an ensemble of the deep models and XGBoost performs better on these datasets than XGBoost alone.
link to postersession #2 Ravid Shwartz-Ziv

Fri 9:53 a.m. - 9:54 a.m. Automated Learning Rate Scheduler for Large-batch Training (Spotlight)

Large-batch training has been essential in leveraging large-scale datasets and models in deep learning. While it is computationally beneficial to use large batch sizes, it often requires a specially designed learning rate (LR) schedule to achieve a comparable level of performance as in smaller batch training. Especially when the number of training epochs is constrained, the use of a large LR and a warmup strategy is critical to the final performance of large-batch training due to the reduced number of updating steps. In this work, we propose an automated LR scheduling algorithm which is effective for neural network training with a large batch size under the given epoch budget. Specifically, the whole schedule consists of two phases: adaptive warmup and predefined decay, where the LR is increased until the training loss no longer decreases and then decreased to zero by the end of training. Here, whether the training loss has reached the minimum value is robustly checked with Gaussian process smoothing in an online manner with a low computational burden. Coupled with adaptive stochastic optimizers such as AdamP and LAMB, the proposed scheduler successfully adjusts the LRs without cumbersome hyperparameter tuning and achieves comparable or better performances than tuned baselines on various image classification benchmarks and architectures with a wide range of batch sizes. link to postersession #2 Chiheon Kim

Fri 9:54 a.m. - 9:55 a.m. Adaptation-Agnostic Meta-Training (Spotlight)

Many meta-learning algorithms can be formulated into an interleaved process, in the sense that task-specific predictors are learned during inner-task adaptation and meta-parameters are updated during meta-update. The normal meta-training strategy needs to differentiate through the inner-task adaptation procedure to optimize the meta-parameters.
This leads to a constraint that the inner-task algorithms should be solved analytically. Under this constraint, only simple algorithms with analytical solutions can be applied as the inner-task algorithms, limiting the model expressiveness. To lift the limitation, we propose an adaptation-agnostic meta-training strategy. Following our proposed strategy, we are able to apply stronger algorithms (e.g., an ensemble of different types of algorithms) as the inner-task algorithm to achieve superior performance compared with popular baselines. link to postersession #2 Jiaxin Chen

Fri 9:55 a.m. - 9:56 a.m. On-the-fly learning of adaptive strategies with bandit algorithms (Spotlight)

Automation of machine learning model development is increasingly becoming an established research area. While automated model selection and automated data pre-processing have been studied in depth, there is, however, a gap concerning automated model adaptation strategies for streaming data with non-stationarities. This has previously been addressed by heuristic generic adaptation strategies in the batch streaming setting. While showing promising performance, these strategies contain some limitations. In this work, we propose using multi-armed bandit algorithms for learning adaptive strategies from incrementally streaming data on-the-fly. Empirical results using established bandit algorithms show a comparable performance to two common stream learning algorithms. link to postersession #2 Rashid Bakirov

Fri 9:56 a.m. - 11:00 a.m. Poster Session #2 (Poster Session)

All papers presented as spotlights and contributed talks in the session before.

Fri 11:01 a.m. - 11:30 a.m. Invited Talk by Kim Montgomery: Bias, Controlling Bias, and AutoML (Invited Talk) Kim Montgomery

Fri 11:30 a.m. - 11:40 a.m. Q&A Kim Montgomery (Q&A) Kim Montgomery

Fri 11:41 a.m. - 12:10 p.m.
Invited Talk by Mi Zhang: Encoding is an Important Design Decision in Neural Architecture Search (Invited Talk) Mi Zhang

Fri 12:10 p.m. - 12:20 p.m. Q&A Mi Zhang (Q&A) Mi Zhang

Fri 12:40 p.m. - 1:30 p.m. Panel Discussion