Invited Talk: Assessing Quality of Information without Ground Truth
Yiling Chen
Invited Talk: Ruth Misener
Invited talk from Ruth Misener.
Title: Autonomous research machines: Self-optimizing new chemistry
Abstract: Our research seeks to boost R&D efficiency in the chemicals industry. As an example, consider "micro reactor flow systems", which are transforming chemical manufacturing by enabling flexible prototyping. Because these high-throughput microfluidic devices can control reaction conditions online, they are ideal for quantitatively characterizing diverse chemical synthesis techniques along new reaction pathways. The challenge is: How do we automate the design of experiments to "self-optimise" new chemistry?
Together with the BASF Data Science for Materials & Chemistry teams, we’re interested in solving Bayesian optimization challenges that may simultaneously exhibit: multiple objectives, mixed-feature spaces, asynchronous decisions, large batch sizes, input constraints, multi-fidelity observations, hierarchical choices, and costs associated with switching between experimental points. We review the machine learning contributions that we’ve found useful towards achieving these goals and discuss our own methodological and software contributions.
This work is a collaboration between Imperial (Jose Pablo Folch, Alexander Thebelt, Shiqiang Zhang, Jan Kronqvist, Calvin Tsay, Ruth Misener) and BASF (Robert Lee, Behrang Shafei, Nathan Sudermann-Merx, David Walz).
Scientists have recognized the need to build bottom-up models for socio-economic systems. Such models are often framed as heterogeneous agents interacting in a network following the rules of a dynamical system. However, the available data is often aggregated and incomplete, so even if we have a good model of reality, inferring the state of the individual agents remains a big open challenge. Moreover, these models are usually costly to simulate because one has to compute the individual interactions of all the agents in the system. We present a methodology to infer the latent states of agents embedded in a network when the data available is sparse, noisy, and low-dimensional. The methodology is based on the ensemble Kalman filter extended with a network localization technique that uses the system’s topology to improve the accuracy of the estimations. Our methodology has the following desired properties for bottom-up socioeconomic models: i) it treats the model as a black box, so it does not assume any closed-form mathematical form of the model a priori, ii) it requires a minimal number of simulations compared to state-of-the-art methods, iii) it exploits the underlying topology of the system to improve its predictions, iv) it works for nonlinear systems, v) it is well-justified from a Bayesian perspective, and vi) it is easy to implement. We validate our methodology in two informative examples: 1) a high-dimensional approximation of the Mackey-Glass chaotic system and 2) the Hegselmann-Krause bounded confidence (nonlinear) model of opinion dynamics embedded in a social network. While we do not use real-world data to showcase our methodology, we add noise and exogenous shocks to the observations, obtaining accurate predictions in both the observation and the latent state spaces. We aim to help bridge the gap between bottom-up modeling and data assimilation techniques in a computationally efficient way.
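As a rough illustration of the core method described above, a single ensemble Kalman analysis step with a topology-based localization mask might look as follows. The function name, the linear observation operator, and the all-ones placeholder mask are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def enkf_update(X, y, H, R, mask):
    """One stochastic EnKF analysis step.

    X    : (n_state, n_ens) ensemble of latent agent states
    y    : (n_obs,) observation vector
    H    : (n_obs, n_state) linear observation operator
    R    : (n_obs, n_obs) observation-noise covariance
    mask : (n_state, n_state) localization mask, e.g. derived from
           the network topology connecting the agents
    """
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)         # ensemble anomalies
    P = (A @ A.T) / (n_ens - 1) * mask            # localized sample covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
    # perturbed observations, one per ensemble member
    Y = y[:, None] + np.random.multivariate_normal(
        np.zeros(len(y)), R, size=n_ens).T
    return X + K @ (Y - H @ X)

# toy usage: 4 agents, 50 ensemble members, only agents 0 and 1 observed
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 50))
H = np.eye(2, 4)
R = 0.1 * np.eye(2)
mask = np.ones((4, 4))  # replace with a network-based taper in practice
Xa = enkf_update(X, np.array([1.0, -1.0]), H, R, mask)
```

Note that the model itself never appears here: the forecast ensemble X can come from any black-box simulator, which is what makes the approach model-agnostic.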
Blas Kolic
Physicist working on social and economic complex systems. I'm developing data assimilation methods to infer the latent states of agent-based models. I also study social media and their interaction networks.
Invited Talk: The Value Equivalence Principle for Model-Based RL
Christopher Grimm
Invited talk: Self-supervised learning for speech generation
Self-supervised learning (SSL) for speech has demonstrated great success on inference tasks such as speech recognition. However, it is less studied for generative tasks where the goal is to synthesize speech. In this talk, I will share our recent work on building unconditional and conditional generative speech models leveraging SSL. Instead of representing speech with traditional features like spectrogram, we showed that discrete units derived from self-supervised models serve as better generative modeling targets for several tasks. Specifically, we presented the first text-free spoken language models for prosodically rich speech as well as spoken dialogues, and achieved SOTA performance on speech-to-speech translation without intermediate text output.
Wei-Ning Hsu
Invited Talk: Embracing Subjectivity In Machine Learning Benchmarks
Kurt Bollacker
Invited Talk: Solving the Right Problems: Making ML Models Relevant to Healthcare and the Life Sciences
The first-generation models for drug discovery and clinical applications were mostly direct modifications of algorithms developed for NLP, computer vision, and other well-established application areas. However, deployment of these models revealed the significant mismatch between their basic assumptions and the needs of these new life sciences applications. Examples include challenging generalization scenarios, unknown biases in the collected data, and the inability of domain experts to validate model predictions. In my talk, I will illustrate some of these problems, and introduce our initial solutions to them.
Regina Barzilay
Regina Barzilay is an Israeli-American computer scientist. She is a professor at the Massachusetts Institute of Technology and a faculty lead for artificial intelligence at the MIT Jameel Clinic. Her research interests are in natural language processing and applications of deep learning to chemistry and oncology.
Invited Talk: Physics-infused learning with ABM
Research at the intersection of physics and machine learning has outsized potential to advance both fields: models and algorithms can be embedded with, or informed by, physics knowledge, and learned models as well as simulators of complex processes and systems can advance experimentation towards understanding. In practice, this physics-ML intersection is almost entirely ML surrogates for accelerating an existing numerical simulator. But can we look to ML & AI to discover physics knowledge we don't yet have governing equations for, to recover missing physics and fill gaps in human-expert understanding? It is this perspective we explore in this talk, in particular the use of agent-based modeling (ABM) as a new abstraction for computational fluid dynamics (CFD), pulling in advanced reinforcement learning (RL) methods from the AI field. We introduce multi-agent RL as an automated discovery tool of turbulence and other fluid dynamics models, leveraging the emergent phenomena of ABM to surface the unresolved subgrid-scale physics. These methods, although nascent, can significantly advance prediction and control of industrial aerodynamics and environmental flows in critical areas like nuclear fusion plasma and atmospheric transport of contaminants.
Alexander Lavin
Founder, CEO of Pasteur Labs & Institute for Simulation Intelligence
Invited talk: Cooperative conversational AI
The development of machines that effectively converse with humans is a challenging problem that requires combining complex technologies, such as speech recognition, dialogue systems, and speech synthesis. Current solutions mainly rely on independent modules combined in plain unidirectional pipelines. To reach higher levels of human-computer interactions, we have to radically rethink current conversational AI architectures with a novel cooperative framework. We need to replace standard pipelines with "cooperative networks of deep networks" where all the modules automatically learn how to cooperate, communicate, and interact. This keynote will discuss some novel ideas toward this ambitious goal and will introduce a novel toolkit called SpeechBrain designed to easily implement this holistic approach to Conversational AI.
Mirco Ravanelli
Invited Talk: Christopher Langmead
Invited talk from Christopher Langmead.
Title: Active Learning for the Design of Therapeutic Proteins
Abstract: I will discuss ongoing work at Amgen to use Active Learning to train predictive models relevant to the design of therapeutic proteins.
DiffWave is a versatile diffusion probabilistic model for conditional and unconditional waveform generation. The model is non-autoregressive and converts a white-noise signal into a structured waveform through a Markov chain with a constant number of steps at synthesis. DiffWave produces high-fidelity audio in different waveform generation tasks, including neural vocoding conditioned on mel spectrograms, class-conditional generation, and unconditional generation. DiffWave matches a strong WaveNet vocoder in terms of speech quality (MOS: 4.44 versus 4.43), while synthesizing orders of magnitude faster. In particular, it significantly outperforms autoregressive and GAN-based waveform models in the challenging unconditional generation task in terms of audio quality and sample diversity.
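The reverse Markov chain described above can be sketched as a standard DDPM-style ancestral sampling loop. The noise schedule, step count, and stand-in noise predictor below are illustrative assumptions, not DiffWave's actual configuration (the real denoiser is a conditioned dilated-convolution network).

```python
import numpy as np

T = 50
betas = np.linspace(1e-4, 0.05, T)  # variance schedule (illustrative)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(x, t):
    # stand-in for the learned noise predictor epsilon_theta(x_t, t)
    return np.zeros_like(x)

x = np.random.randn(16000)  # start from white noise (1 s at 16 kHz)
for t in reversed(range(T)):
    z = np.random.randn(*x.shape) if t > 0 else 0.0
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    # one denoising step of the reverse Markov chain
    x = (x - coef * eps_model(x, t)) / np.sqrt(alphas[t]) + np.sqrt(betas[t]) * z
```

The constant number of steps (here T = 50) is what makes synthesis orders of magnitude faster than sample-by-sample autoregressive generation.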
Zhifeng Kong
Invited Talk: Design for Inference in Drug Discovery and Development
Abstract coming soon...
Aviv Regev
Aviv Regev is a computational biologist and systems biologist and Executive Vice President and Head of Genentech Research and Early Development in Genentech/Roche. She is a core member at the Broad Institute of MIT and Harvard and professor at the Department of Biology of the Massachusetts Institute of Technology.
In 2020, Regev became the Head and Executive Vice President of Genentech Research and Early Development, based in South San Francisco, and a member of the extended Corporate Executive Committee of Roche. Previously, she was a Core Institute Member (now on leave), Chair of the Faculty, Founding Director of the Klarman Cell Observatory, and co-Director of the Cell Circuits Program at the Broad Institute of MIT and Harvard. She was also a professor in the Department of Biology at the Massachusetts Institute of Technology (now on leave), as well as an Investigator at the Howard Hughes Medical Institute. Regev's research includes work on gene expression (with Eran Segal and David Botstein) and the use of π-calculus to represent biochemical processes. Regev’s team has been a leading pioneer of experimental and computational methods for single-cell genomics.
Invited Talk: Synthetic Control Methods and Difference-In-Differences
There is a fast-growing literature in econometrics on estimating causal effects in settings with panel or longitudinal data, building on the recent difference-in-differences and synthetic control literatures. This is driven by an empirical literature with many applications. These range from settings with few cross-sectional units and many or few time periods to settings with many cross-sectional units; sometimes there are few treated units, or even only one, and sometimes many. I will review some of this recent literature, including some of the examples, focusing in particular on the synthetic difference-in-differences estimator and some of the relations with the matrix completion literature. I will also discuss implications for randomized experiments, and some of the remaining challenges in the literature.
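As a minimal illustration of the synthetic control idea underlying this line of work, one can compute nonnegative, sum-to-one weights on control units so that their weighted combination reproduces the treated unit's pre-treatment path. The toy data and solver choice below are assumptions for illustration, not the estimators discussed in the talk.

```python
import numpy as np
from scipy.optimize import minimize

# Toy pre-treatment panel: rows are time periods, columns are control units.
rng = np.random.default_rng(0)
Y0 = rng.normal(size=(10, 5))                  # 5 controls, T0 = 10 periods
w_true = np.array([0.6, 0.4, 0.0, 0.0, 0.0])
y1 = Y0 @ w_true + 0.01 * rng.normal(size=10)  # treated unit's pre-period path

# Synthetic control weights: nonnegative, summing to one, chosen to
# match the treated unit's pre-treatment outcomes as closely as possible.
obj = lambda w: np.sum((y1 - Y0 @ w) ** 2)
res = minimize(obj, np.full(5, 0.2),
               bounds=[(0.0, 1.0)] * 5,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
w_hat = res.x
# the post-period counterfactual for the treated unit is then Y0_post @ w_hat
```

The synthetic difference-in-differences estimator mentioned in the abstract additionally reweights time periods and allows a level shift; the simplex-constrained fit above is only the basic building block.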
Guido Imbens
Guido Imbens is The Applied Econometrics Professor at the Stanford Graduate School of Business and Professor of Economics in the Economics Department at Stanford University. Currently he is also the Amman Mineral Faculty Fellow at the GSB. He has held tenured positions at UCLA, UC Berkeley, and Harvard University before joining Stanford in 2012. Imbens specializes in econometrics, and in particular methods for drawing causal inferences from experimental and observational data. He has published extensively in the leading economics and statistics journals. Together with Donald Rubin he has published a book, Causal Inference in Statistics, Social and Biomedical Sciences. Guido Imbens is a fellow of the Econometric Society, the Royal Holland Society of Sciences and Humanities, the Royal Netherlands Academy of Sciences, the American Academy of Arts and Sciences, and the American Statistical Association. He holds honorary doctorates from the University of St. Gallen and Brown University. In 2017 he received the Horace Mann medal at Brown University. In 2021 he shared the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel with David Card and Joshua Angrist for "methodological contributions to the analysis of causal relationships." Currently Imbens is Editor of Econometrica.
Mohit Bansal
Invited talk: Invited Talks 3, Amy Zhang, Richard Zemel and Liting Sun
Amy Zhang
Liting Sun
Richard Zemel
Christina Heinze-Deml - Marzyeh Ghassemi
Christina Heinze-Deml
Marzyeh Ghassemi
Dr. Marzyeh Ghassemi is an Assistant Professor at MIT in Electrical Engineering and Computer Science (EECS) and Institute for Medical Engineering & Science (IMES), and a Vector Institute faculty member holding a Canadian CIFAR AI Chair and Canada Research Chair. She holds MIT affiliations with the Jameel Clinic and CSAIL.
Professor Ghassemi holds a Herman L. F. von Helmholtz Career Development Professorship, and was named a CIFAR Azrieli Global Scholar and one of MIT Tech Review’s 35 Innovators Under 35. Previously, she was a Visiting Researcher with Alphabet’s Verily. She is currently on leave from the University of Toronto Departments of Computer Science and Medicine. Prior to her PhD in Computer Science at MIT, she received an MSc. degree in biomedical engineering from Oxford University as a Marshall Scholar, and B.S. degrees in computer science and electrical engineering as a Goldwater Scholar at New Mexico State University.
Causal representation learning tackles the problem of discovering high-level variables from low-level observations. In this talk, I will discuss how modular architectures such as Neural Interpreters and Neural Attentive Circuits implement inductive biases from the causal principle of independent mechanisms. Leveraging dynamic connectivity graphs and conditional computation, I will showcase their scalability and interesting properties for robust recognition, efficient transfer, and reasoning.
Francesco Locatello
Invited talk: Reinforcement learning in continuous-time and space
In this talk, we will introduce a continuous-time reinforcement learning (CTRL) framework. Our talk starts with a categorization of RL problems and naturally motivates a continuous-time perspective to RL. We then introduce a model-based CTRL approach, which solves physical control tasks using neural ordinary differential equations as a sub-routine. We conclude by briefly introducing recent approaches to CTRL.
Invited Talk: Deep neural network approximations for PDEs
Most of the numerical approximation methods for PDEs in the scientific literature suffer from the so-called curse of dimensionality (CoD) in the sense that the number of computational operations and/or the number of parameters employed in the corresponding approximation scheme grows exponentially in the PDE dimension and/or the reciprocal of the desired approximation precision. Recently, certain deep learning-based approximation methods for PDEs have been proposed and various numerical simulations for such methods suggest that deep neural network (DNN) approximations might have the capacity to indeed overcome the CoD in the sense that the number of real parameters used to describe the approximating DNNs grows at most polynomially in both the PDE dimension and the reciprocal of the prescribed approximation accuracy. In this talk, we show that solutions of suitable Kolmogorov PDEs can be approximated by DNNs without the CoD.
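Schematically, overcoming the CoD means a bound of the following form (constants, norms, and the precise class of Kolmogorov PDEs vary across the actual theorems):

```latex
\exists\, c > 0 \;\; \forall\, d \in \mathbb{N},\ \varepsilon \in (0,1]:
\quad \text{there exists a DNN } \Phi_{d,\varepsilon} \text{ with }
\| u_d - \Phi_{d,\varepsilon} \| \le \varepsilon
\quad\text{and}\quad
\#\mathrm{params}(\Phi_{d,\varepsilon}) \le c\, d^{\,c}\, \varepsilon^{-c},
```

where u_d denotes the solution of the d-dimensional PDE. The point is that the parameter count grows only polynomially, rather than exponentially, in both d and 1/ε.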
Michelle Girvan
High-value payments systems (HVPSs) are used to settle transactions between large financial institutions and are considered the core national financial infrastructure. In collaboration with the Bank of Canada, we have been exploring the use of reinforcement learning techniques to understand the behaviour of banks participating in the Canadian HVPS. This understanding could help regulators design policies to ensure the safety and efficiency of these systems.
Pablo Samuel Castro
Pablo was born and raised in Quito, Ecuador, and moved to Montreal after high school to study at McGill. He stayed in Montreal for the next 10 years, finished his bachelors, worked at a flight simulator company, and then eventually obtained his masters and PhD at McGill, focusing on Reinforcement Learning. After his PhD Pablo did a 10-month postdoc in Paris before moving to Pittsburgh to join Google. He has worked at Google for almost 6 years, and is currently a research Software Engineer in Google Brain in Montreal, focusing on fundamental Reinforcement Learning research, as well as Machine Learning and Music. Aside from his interest in coding/AI/math, Pablo is an active musician (https://www.psctrio.com), loves running (5 marathons so far, including Boston!), and discussing politics and activism.
Invited Talk: Towards a Mathematical Theory of Machine Learning
Given a machine learning model, what is the class of functions that can be approximated by this particular model efficiently, in the sense that the convergence rates for the approximation, estimation, and optimization errors do not deteriorate as the dimensionality goes up? We address this question for three classes of machine learning models: the random feature model, two-layer neural networks, and the residual neural network model. Along the way, we will also summarize the current status of the theoretical foundations of deep learning and discuss some of the key open questions.
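As a concrete instance of the first model class, here is a minimal random feature regression: the random first-layer weights are frozen and only the linear output layer is trained, via a ridge fit. The dimensions, the cosine feature map, and the target function are illustrative choices, not ones taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 5, 500          # samples, input dimension, random features

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)  # toy target with noise

# frozen random first layer: only the output coefficients are learned
W = rng.normal(size=(d, m))
b = rng.uniform(0.0, 2.0 * np.pi, size=m)
Phi = np.cos(X @ W + b)        # random feature map

lam = 1e-3                     # ridge regularization
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)

train_mse = np.mean((Phi @ a - y) ** 2)
```

Because only the outer coefficients are trained, the model is linear in its trainable parameters, which is exactly what makes its approximation and estimation behavior analytically tractable compared to fully trained networks.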
Weinan E
Weinan E is a professor at the Center for Machine Learning Research (CMLR) and the School of Mathematical Sciences at Peking University. He is also a professor at the Department of Mathematics and Program in Applied and Computational Mathematics at Princeton University. His main research interests are numerical algorithms, machine learning, and multi-scale modeling, with applications to chemistry, materials science, and fluid mechanics.
Weinan E was awarded the ICIAM Collatz Prize in 2003, the SIAM Kleinman Prize in 2009 and the SIAM von Karman Prize in 2014, the SIAM-ETH Peter Henrici Prize in 2019, and the ACM Gordon-Bell Prize in 2020. He is a member of the Chinese Academy of Sciences, a fellow of SIAM, AMS and IOP. Weinan E is an invited plenary speaker at ICM 2022. He has also been an invited speaker at ICM 2002, ICIAM 2007 as well as the AMS National Meeting in 2003. In addition, he has been an invited speaker at APS, ACS, AIChe annual meetings, the World Congress of Computational Mechanics, and the American Conference of Theoretical Chemistry.
Invited talk: Frontiers and challenges in music audio generation
Despite notable recent progress on generative modeling of text, images, and speech, generative modeling of music audio remains a challenging frontier for machine learning. A primary obstacle of modeling audio is the extreme sequence lengths of audio waveforms, which are impractical to model directly with standard methods. A challenge more specific to modeling music audio is scaling to critical capacity, an elusive threshold of model size beyond which coherent generation emerges. In this talk, I will present strategies from my work which seek to overcome the practical challenges of modeling audio by either (1) exploring featurizations which reduce superfluous information in waveforms, or (2) proposing new methods which can process waveforms directly. I will also share insights from ongoing work on achieving critical capacity for generating broad music audio, i.e., music audio not constrained to a particular instrument or genre.
Invited talk: A hierarchical representation learning approach for source separation, transcription, and music generation
With interpretable music representation learning, music source separation problems are well connected with transcription problems, and transcription problems can be transformed into music arrangement problems. In particular, Gus will discuss two recently developed models. The first one used a pitch-timbre disentanglement to achieve source separation, transcription, and synthesis. The second one used cross-modal chord-texture disentanglement to solve audio-to-symbolic piano arrangement. In the end, Gus will show his vision of a unified hierarchical representation-learning framework that bridges music understanding and generation.
Gus Xia
Invited Talk: A Model-Based Reinforcement Learning Wishlist
Erin Talvitie
Erin Talvitie is an associate professor of Computer Science at Harvey Mudd College. She graduated from Oberlin College in 2004 with majors in Computer Science and Mathematics and received her Ph.D. in Artificial Intelligence from the University of Michigan in 2010. She was a founding member of the Department of Computer Science at Franklin & Marshall College before moving on to Harvey Mudd College in 2019. Her research interests focus on model-based reinforcement learning -- specifically scaling model-based approaches up to complex, high-dimensional problems -- with the aim of working toward artificial autonomous agents that can learn to act flexibly and competently in unknown environments. She is the recipient of an NSF Graduate Research Fellowship, an NSF CAREER grant, outstanding reviewer awards from AAAI and NeurIPS, a best paper nomination from AAMAS, and a best paper award from RLDM.
Brandon Amos
We require systems to monitor species in real time and in greater detail to quickly understand which conservation and sustainability efforts are most effective and take corrective action. Current ecological monitoring systems generate data far faster than researchers can analyze it, making scaling up impossible without automated data processing. Pre-training, particularly methods that require minimal human supervision, is clearly well-aligned with this problem setting where large amounts of unlabeled data are available. However, ecological data collected in the field presents a number of challenges that current pre-training methods are often not designed to tackle. These include strong spatiotemporal correlations and domain shifts, imperfect data quality, fine-grained categories, and long-tailed distributions. I will discuss gaps between the current pre-training paradigm and what is needed for usable, impactful computer vision based environmental monitoring systems, and outline several interesting future directions at the intersection of pre-training and environmental monitoring.
Sara Beery
In this talk, I am going to cover our recent works in the self-supervised learning space for visual representation pre-training. First is SimSiam, a non-contrastive, momentum-free framework that, to our surprise, can successfully avoid trivial solutions and achieve very competitive performance relative to more complicated methods like MoCo. Second is the Masked Autoencoder (MAE), which simply and directly reconstructs input signals by predicting natural image patches, as a further simplification of self-supervised frameworks for computer vision. MAE adopts a BERT-like algorithm with crucial changes for images, and exhibits BERT-like scaling behaviors, among other intriguing properties different from contrastive learning.
Xinlei Chen
Invited Talk: Exploring the Limits of Large Scale Pre-training
Recent developments in large-scale machine learning suggest that by scaling up data, model size and training time properly, one might observe that improvements in pre-training transfer favorably to most downstream tasks. In this work, we systematically study this phenomenon and establish that, as we increase the upstream accuracy, the performance of downstream tasks saturates. In particular, we investigate more than 4800 experiments on Vision Transformers, MLP-Mixers and ResNets with parameter counts ranging from ten million to ten billion, trained on the largest scale of available image data (JFT, ImageNet21K) and evaluated on more than 20 downstream image recognition tasks. We propose a model for downstream performance that reflects the saturation phenomenon and captures the nonlinear relationship between the performance of upstream and downstream tasks. Delving deeper to understand the reasons that give rise to these phenomena, we show that the saturation behavior we observe is closely related to the way that representations evolve through the layers of the models.
Hanie Sedghi
Hanie Sedghi is a Senior Research Scientist at Google DeepMind, where she leads the DeepPhenomena team. The focus of her research has been understanding deep learning models to push their boundaries; not just for (out-of-distribution) generalization, but also the broader sense of algorithmic and scientific reasoning capabilities (of large language models). She is a workshop chair for NeurIPS 2022 as well as tutorial chair for ICML 2022 and 2023, a program chair for CoLLAs 2023, and has been an area chair for NeurIPS, ICLR and ICML and a member of the JMLR Editorial Board for the last few years. Prior to Google, Hanie was a Research Scientist at the Allen Institute for Artificial Intelligence and, before that, a postdoctoral fellow at UC Irvine. She received her PhD from the University of Southern California with a minor in Mathematics.
I will describe our experience with two generations of large language models for code at Google. These models show a range of abilities, including generating small programs from natural language descriptions and engaging in dialog about code, incorporating human feedback to improve solutions. However, in a deeper sense, these models seem not to understand the code that they write, in the sense that they are generally unable to predict the output of a program given a specific input. I will discuss our subsequent efforts to improve the "code understanding" abilities of LMs, by asking them to emit intermediate computation steps as tokens onto a "scratchpad". These same models are able to perform complex multi-step computations when asked to perform the operation "step by step", showing the results of intermediate computations, even operations that the LM could not perform directly.
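The scratchpad idea can be illustrated with a prompt that asks the model to emit intermediate computation steps before its final answer. The wording and helper function below are an illustrative paraphrase, not the exact prompt or API used in the work described above.

```python
# Build a scratchpad-style prompt: instead of asking directly for a
# program's output, ask the model to trace execution step by step and
# write intermediate state before committing to a final answer.
def scratchpad_prompt(code: str, test_input: str) -> str:
    return (
        "Trace the execution of the following program step by step,\n"
        "writing the variable state after each line, then give the\n"
        "final output.\n\n"
        f"{code}\n\n"
        f"Input: {test_input}\n"
        "Scratchpad:\n"
    )

prompt = scratchpad_prompt(
    "def f(x):\n    y = x * 2\n    return y + 1",
    "x = 3",
)
```

The key point is that the intermediate steps are emitted as ordinary tokens, so the same pretrained model can "show its work" without any architectural change.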
Charles Sutton
Invited Talk: How Neural Networks See, Learn and Forget
Neural networks have been at the heart of machine learning breakthroughs for over a decade. But in just the past couple of years, new advances in model architectures, pretraining and scaling challenge our assumptions on how they function. In this talk I provide some insights into the workings of modern machine learning. Motivated by the ubiquity of Transformer architectures across tasks and data modalities, I discuss the recent successes of Transformers in computer vision and key similarities and differences to convolutional architectures. Next, I overview some of the salient properties of pretraining on Transformer representations and the effect of scale. I draw connections to results on catastrophic forgetting, the way in which forgetting manifests in representations and new mitigation methods suggested by these insights. I conclude with some open questions in these directions.
Invited Talk: Neural Scaling of Deep Chemical Models
Massive scale, both in terms of data availability and computation, enables significant breakthroughs in key application areas of deep learning such as natural language processing (NLP) and computer vision. There is emerging evidence that scale may be a key ingredient in scientific deep learning, but the importance of physical priors in scientific domains makes the strategies and benefits of scaling uncertain. Here, we investigate neural scaling behavior in large chemical models by varying model and dataset sizes over many orders of magnitude, studying models with over one billion parameters, pre-trained on datasets of up to ten million datapoints. We consider large language models for generative chemistry and graph neural networks for machine-learned interatomic potentials. To enable large-scale scientific deep learning studies under resource constraints, we develop the Training Performance Estimation (TPE) framework to reduce the costs of scalable hyperparameter optimization by up to 90%. Using this framework, we discover empirical neural scaling relations for deep chemical models and investigate the interplay between physical priors and scale. Potential applications of large, pre-trained models for "prompt engineering" and unsupervised representation learning of molecules are shown.
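For intuition, an empirical scaling relation of the commonly used form L(N) = a·N^(-b) + c can be recovered from loss measurements with a simple log-log fit. The synthetic data, functional form, and constants below are illustrative assumptions, not the relations or the TPE framework reported in the work above.

```python
import numpy as np

# Synthetic "losses" generated from a known power law with an
# irreducible error floor c (noiseless, for illustration).
N = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 1e9])  # model sizes
L = 2.5 * N ** -0.35 + 0.1                     # loss at each size

# With the floor c known (or estimated), the fit is linear in
# log-log space: log(L - c) = log(a) - b * log(N).
c = 0.1
slope, intercept = np.polyfit(np.log(N), np.log(L - c), 1)
b_hat, a_hat = -slope, np.exp(intercept)
```

Fits of this kind are what make it possible to extrapolate how much additional data or model capacity a target loss would require, which is where cheap hyperparameter estimation frameworks pay off.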
Connor Coley
Nathan C. Frey
Invited Talk: Caroline Uhler
Invited talk from Caroline Uhler.
Title: Optimal Design of Interventions for Causal Discovery in Genomics
Caroline Uhler
Caroline Uhler joined the MIT faculty in 2015 as the Henry L. and Grace Doherty assistant professor in the Department of Electrical Engineering and Computer Science and the Institute for Data, Systems, and Society. She holds an MSc in mathematics, a BSc in biology, and an MEd in high school mathematics education from the University of Zurich. She obtained her PhD in statistics, with a designated emphasis in computational and genomic biology, from the University of California, Berkeley. Before joining MIT, she spent a semester as a research fellow in the program on Theoretical Foundations of Big Data Analysis at the Simons Institute at UC Berkeley, held postdoctoral positions at the Institute for Mathematics and its Applications at the University of Minnesota and at ETH Zurich, and spent three years as an assistant professor at IST Austria. She is an elected member of the International Statistical Institute, a Sloan Research Fellow, and she received an NSF Career Award, a Sofja Kovalevskaja Award from the Humboldt Foundation and a START Award from the Austrian Science Foundation. Her research focuses on mathematical statistics and computational biology, in particular on graphical models and causal inference.
Invited Talk: TBA
J.D. Maddox
J.D. Maddox is an independent consultant serving as an advisor to the U.S. Global Engagement Center. He is an expert on the subjects of influence and political violence. He is the CEO of Inventive Insights LLC, a small national security consultancy, and is an adjunct professor of national security studies at George Mason University’s Schar School. He is also a frequent author of national security commentaries. J.D. recently developed the US Government’s premier counter-disinformation technology testing and implementation effort.
In 2021, J.D. ran a strong centrist campaign for the Virginia House of Delegates, asserting common principles of security, economic opportunity, and human dignity.
Prior to his role at Inventive Insights, J.D. served as a CIA branch chief, Deputy Coordinator of the U.S. Global Engagement Center, an advisor to the Secretary of Homeland Security, and a U.S. Army Psychological Operations team leader. J.D. has participated as a leader in many of his generation’s most important national security operations, such as combating the terrorist group ISIL in Iraq, responding to the 9/11 attacks, and devising feasible solutions to disinformation. He also has regularly advised US Presidents, Members of Congress, Cabinet Secretaries and other world leaders on high-consequence security issues, such as nuclear threats and counterterrorism operations.
As an adjunct professor, J.D. teaches the graduate-level course "Disinformation and Policy Responses" at George Mason University, and has taught “National Security Challenges” since 2011. He has also lectured at Georgetown University, New York University, George Washington University, National Defense University, National Intelligence University, American University, the Center for Strategic and International Studies, the U.S. Foreign Service Institute and more.
J.D.’s analytic insights have recently appeared in news media including The New York Times, USA Today, The Columbia Journalism Review, MSNBC, Reuters, CBS 60 Minutes (upcoming), and more. He often speaks at national and international forums.
His policy scholarship has been published by Lawfare, West Point’s Modern War Institute, George Washington University’s Program on Extremism, the University of Southern California’s Public Diplomacy Magazine, George Mason University’s Journal of Narrative & Conflict, and elsewhere.
Three of our recent sequence models - Chinchilla, Flamingo, and Gato - leveraged one another and combined several state-of-the-art pre-training techniques. This talk will describe how this combination yielded even stronger capabilities for achieving complex tasks in the few-shot setting, beyond what could have been expected from their training regimes.
Oriol Vinyals
Oriol Vinyals is a Research Scientist at Google. He works in deep learning with the Google Brain team. Oriol holds a Ph.D. in EECS from the University of California, Berkeley, and a Master's degree from the University of California, San Diego. He is a recipient of the 2011 Microsoft Research PhD Fellowship. He was an early adopter of the new deep learning wave at Berkeley, and in his thesis he focused on non-convex optimization and recurrent neural networks. At Google Brain he continues working on his areas of interest, which include artificial intelligence, with particular emphasis on machine learning, language, and vision.
Invited Talk: Invited Talks 1, Bernhard Schölkopf and David Lopez-Paz
Bernhard Schölkopf - What is a causal representation? David Lopez-Paz - On invariance
Bernhard Schölkopf
Bernhard Schölkopf received degrees in mathematics (London) and physics (Tübingen), and a doctorate in computer science from the Technical University Berlin. He has held research positions at AT&T Bell Labs, at GMD FIRST, Berlin, at the Australian National University, Canberra, and at Microsoft Research Cambridge (UK). In 2001, he was appointed scientific member of the Max Planck Society and director at the MPI for Biological Cybernetics; in 2010 he founded the Max Planck Institute for Intelligent Systems. For further information, see www.kyb.tuebingen.mpg.de/~bs.
David Lopez-Paz
Jean Fan
Tamara Broderick
Invited Talk: Invited Talk #1 - How to communicate your results
Andrew Fitzgibbon
Andrew Fitzgibbon has been closely involved in the delivery of three groundbreaking computer vision systems over two decades. In 2000, he was computer vision lead on the Emmy-award-winning 3D camera tracker “Boujou”; in 2009 he introduced large-scale synthetic training data to Kinect for Xbox 360, and in 2019 was science lead on the team that shipped fully articulated hand tracking on HoloLens 2. His passion is bringing the power of mathematics to the crucible of real-world engineering. He has numerous research awards, including ten “best paper” prizes at leading conferences, and is a Fellow of the UK’s Royal Academy of Engineering.
Invited Talk: Emma Brunskill
Emma Brunskill
Emma Brunskill is an associate tenured professor in the Computer Science Department at Stanford University. Brunskill’s lab aims to create AI systems that learn from few samples to robustly make good decisions and is part of the Stanford AI Lab, the Stanford Statistical ML group, and AI Safety @Stanford. Brunskill has received an NSF CAREER award, an Office of Naval Research Young Investigator Award, a Microsoft Faculty Fellow award and an alumni impact award from the computer science and engineering department at the University of Washington. Brunskill and her lab have received multiple best paper nominations and awards both for their AI and machine learning work (UAI best paper, Reinforcement Learning and Decision Making Symposium best paper twice) and for their work in AI for education (Intelligent Tutoring Systems Conference, Educational Data Mining conference x3, CHI).
Deborah Raji
Invited Talk: Time Value of Data and AI Strategy
Ehsan Valavi
Ehsan is a Ph.D. candidate in Technology and Operations Management at Harvard Business School. His research interests lie at the interface of digitization, strategy, and operations management. He is currently interested in studying the growth of digital firms and the challenges they face in various business areas. His recent research has focused on the scalability of Artificial Intelligence (AI) based solutions and the value of data for digital firms.
He completed his undergraduate studies in Electrical Engineering (Telecommunications) at the University of Tehran and has a master's degree in communication systems from the Swiss Federal Institute of Technology in Lausanne (EPFL). He also holds another master's degree in Decision, Risk, and Operations Management from Columbia Business School.
Xavier Bouthillier
Invited Talk: How Will Interactive Theorem Provers Develop? Sir Timothy Gowers (Recorded Talk, but with Live Q&A at 13:30!)
In my talk, I will present two ideas related to human-machine collaboration which emphasize the human element. The objective is to develop human+AI systems that support and enhance the human experience.
In particular, I will advocate for the development of intelligent systems that
(1) help us embrace boredom and
(2) understand, model and possibly mimic human cognitive biases
We have implemented the first idea; the second is a new research area that we started only a couple of months ago.
Invited Talk: Creating Human-Computer Partnerships. Wendy Mackay
One of the key differences between AI and HCI research is that AI measures success in terms of more effective algorithms, whereas HCI focuses on improving interaction and enhancing human skills over time. I argue that better AI algorithms are neither necessary nor sufficient for creating more effective intelligent systems. Instead, we need to create human-computer partnerships that take advantage of machine learning but leave the user in control. I describe several projects that use generative theories of interaction to design intelligent interactive systems that users find discoverable, expressive and appropriable.
Invited Talk: Machine-only to human-machine collaboration from practical AI deployments. Ernest Mwebaze
A strong temptation in the "AI for social good" space is a bias towards more efficient solutions through automation or a machine-only intervention. In this talk I give several examples where going in with this assumption results in sub-optimal solutions and the need for a human-machine collaboration becomes evident. A side effect of this is usually a move from simple automation to a more context-aware design of a potential solution. I highlight the need for considering context as an integral factor in human-machine collaboration through examples from practical deployments of AI solutions in the developing world context.
Theresa Reiker
Invited Talk: A Case Study of Real-World Kernel Exploitation
A walk-through of the process security researchers go through to find modern kernel exploits, and a discussion of potential ways to improve bug finding and categorization with AI. We present CVE-2022-29968, an original Linux kernel exploit we developed, and discuss the current challenges researchers face with respect to exploit categorization and automated discovery.
Joseph Ravichandran
Michael Wang
Generative models are typically based on explicit representations of probability distributions (e.g., autoregressive models or VAEs) or implicit sampling procedures (e.g., GANs). We propose an alternative approach based on directly modeling the vector field of gradients of the data distribution (scores). Our framework allows flexible architectures and requires neither sampling during training nor adversarial training methods. Additionally, score-based generative models enable exact likelihood evaluation through connections with continuous-time normalizing flows and stochastic differential equations. We produce samples comparable to GANs, achieving new state-of-the-art inception scores, and excellent likelihoods on image datasets.
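As a toy illustration of how scores drive sampling (a minimal sketch, not the talk's actual models; function names and parameters here are my own), the code below runs unadjusted Langevin dynamics using the analytic score of a standard normal, -x. In a real score-based model, a trained neural network replaces this analytic score, but the update rule is the same.

```python
import numpy as np

def score(x):
    # Analytic score of a standard normal: d/dx log p(x) = -x.
    # In a real score-based model, a neural network estimates this.
    return -x

def langevin_sample(n_samples=5000, n_steps=500, eps=0.1, seed=0):
    """Unadjusted Langevin dynamics driven by the score function."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=n_samples)  # arbitrary initialization
    for _ in range(n_steps):
        # Half-step along the score plus Gaussian noise of matching scale.
        x = x + 0.5 * eps * score(x) + np.sqrt(eps) * rng.standard_normal(n_samples)
    return x

samples = langevin_sample()
# The samples approximately follow N(0, 1), the distribution whose score we used.
```

Note that only the score (a gradient field) is needed, never the normalized density itself, which is what makes this formulation attractive for generative modeling.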
Stefano Ermon
Can Neural ODE architectures provide a continuous-time extension of residual neural networks? I will show that this depends on the specific numerical solver chosen for training Neural ODE models. If the trained model is supposed to be a flow generated from an ODE, it should be possible to choose another numerical solver with equal or smaller numerical error without loss of performance. But if training relies on a solver with overly coarse discretization, then testing with another solver of equal or smaller numerical error results in a sharp drop in accuracy. In such cases, the combination of vector field and numerical method cannot be interpreted as a flow generated from an ODE, which arguably poses a fatal breakdown of the continuous-in-time concept. I will examine the specific effects which lead to this breakdown and discuss how to ensure that the model maintains continuous-time properties.
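The solver-dependence can be illustrated on a fixed vector field standing in for a trained model (a hypothetical minimal example of my own, not taken from the talk): integrating dx/dt = -x with an overly coarse Euler discretization produces a value far from the true flow, while an accurate RK4 integration recovers it.

```python
import numpy as np

def f(x):
    # Fixed vector field standing in for a trained Neural ODE model:
    # dx/dt = -x, whose exact flow is x(t) = x0 * exp(-t).
    return -x

def euler(x0, t1, n_steps):
    x, h = x0, t1 / n_steps
    for _ in range(n_steps):
        x = x + h * f(x)
    return x

def rk4(x0, t1, n_steps):
    x, h = x0, t1 / n_steps
    for _ in range(n_steps):
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

exact = np.exp(-5.0)            # true flow at t = 5 from x0 = 1
coarse = euler(1.0, 5.0, 4)     # overly coarse discretization (h = 1.25)
fine = rk4(1.0, 5.0, 100)       # accurate solver: agrees with the flow
```

If a model's training loss was only ever evaluated through the coarse discretization, the learned vector field may compensate for its error, and swapping in the accurate solver at test time then degrades performance, which is the failure mode the talk examines.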
Katharina Ott
Existing analyses of optimization in deep learning are either continuous, focusing on variants of gradient flow (GF), or discrete, directly treating variants of gradient descent (GD). GF is amenable to theoretical analysis, but is stylized and disregards computational efficiency. The extent to which it represents GD is an open question in deep learning theory. My talk will present a recent study of this question. Viewing GD as an approximate numerical solution to the initial value problem of GF, I will show that the degree of approximation depends on the curvature around the GF trajectory, and that over deep neural networks (NNs) with homogeneous activations, GF trajectories enjoy favorable curvature, suggesting they are well approximated by GD. I will then use this finding to translate an analysis of GF over deep linear NNs into a guarantee that GD efficiently converges to global minimum almost surely under random initialization. Finally, I will present experiments suggesting that over simple deep NNs, GD with conventional step size is indeed close to GF. An underlying theme of the talk will be the possibility of GF (or modifications thereof) to unravel mysteries behind deep learning.
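The GD-as-discretized-GF view can be made concrete on a one-dimensional quadratic loss, where both trajectories have closed forms (a toy construction of my own, not from the talk): GD with step size eta is forward Euler applied to the gradient flow dw/dt = -grad f(w), and for small eta the two nearly coincide.

```python
import numpy as np

# Toy setting: 1D quadratic loss f(w) = 0.5 * a * w**2, gradient a * w.
a = 2.0       # curvature of the loss
w0 = 1.0      # initialization
eta = 0.01    # GD step size = Euler discretization step for the flow
k = 200       # number of GD steps, i.e. continuous time t = k * eta

# GD iterate in closed form: w_{j+1} = w_j - eta * a * w_j.
gd = w0 * (1.0 - eta * a) ** k
# Gradient-flow solution of dw/dt = -a * w evaluated at t = k * eta.
gf = w0 * np.exp(-a * k * eta)
# For small eta * a the two trajectories track each other closely.
```

The gap between the two grows with the curvature a relative to the step size eta, which mirrors the talk's point that favorable curvature along the GF trajectory is what licenses translating GF analyses into GD guarantees.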
Nadav Cohen
Kyle Cranmer
Professor of Physics and Data Science at NYU. Executive director of Moore-Sloan data science environment at NYU. Member of ATLAS collaboration at CERN’s Large Hadron Collider (LHC). NeurIPS2016 keynote. Organizer of Deep Learning for Physical Sciences workshop at NeurIPS 2017.
Open Images is a large-scale public dataset of 9M images richly annotated with 16M boxes, 2.8M instance segmentations, 3.3M relationship annotations, 675k localized narratives, and 60M image-level labels. Collecting and annotating such a large and varied dataset has been a gargantuan effort not free of significant challenges. In this talk I'll present and discuss some of the learnings from creating Open Images, the challenges we found, and the solutions we came up with.