
Workshop
Beyond Bayes: Paths Towards Universal Reasoning Systems
Zenna Tavares · Emily Mackevicius · Elias Bingham · Nan Rosemary Ke · Talia Ringer · Armando Solar-Lezama · Nada Amin · John Krakauer · Robert O Ness · Alexis Avedisian

Fri Jul 22 05:45 AM -- 03:00 PM (PDT) @ Ballroom 2

A long-standing objective of AI research has been to discover theories of reasoning that are general: accommodating various forms of knowledge and applicable across a diversity of domains. The last two decades have brought steady advances toward this goal, notably in the form of mature theories of probabilistic and causal inference, and in the explosion of reasoning methods built upon the deep learning revolution. However, these advances have only further exposed gaps in our basic understanding of reasoning and limitations in the flexibility and composability of automated reasoning technologies. This workshop aims to reinvigorate work on the grand challenge of developing a computational foundation for reasoning in minds, brains, and machines.

Fri 5:45 a.m. - 6:00 a.m.  Doors Open & Welcome (Opening Remarks)

Fri 6:00 a.m. - 6:45 a.m.  Opening Keynote: Cognitive Science of Reasoning (Keynote)
Keynote talk: Thomas Icard, Associate Professor of Philosophy and (by courtesy) of Computer Science at Stanford University.
Bio: Thomas Icard works at the intersection of philosophy, cognitive science, and computer science, especially on topics that sit near the boundary between the normative (how we ought to think and act) and the descriptive (how we in fact do think and act). Much of his research concerns the theory and application of logic, probability, and causal modeling and inference. Current topics of interest include explanation, the quantitative/qualitative interface, and reasoning with limited resources.
Moderator/MC: Zenna Tavares
Speakers: Zenna Tavares, Thomas Icard

Fri 6:45 a.m. - 7:00 a.m.  Contributed Spotlight Talks: Part 1 (Spotlight Talks)
Spotlight talks highlight research appearing in the afternoon poster session. Each presenter gives a brief, five-minute talk; attendees are welcome to ask the presenters questions during the ICML plenary break immediately following.
9:45 AM: Talk 1 (P23): Language Model Cascades (David Dohan, Winnie Xu)
9:50 AM: Talk 2 (P08): Map Induction: Compositional Spatial Submap Learning for Efficient Exploration in Novel Environments (Sugandha Sharma)
9:55 AM: Talk 3 (P18): Abstract Interpretation for Generalized Heuristic Search in Model-Based Planning (Tan Zhi-Xuan, Vikash K. Mansinghka)
Speakers: David Dohan, Winnie Xu, Sugandha Sharma, Tan Zhi-Xuan

Fri 7:00 a.m. - 7:30 a.m.  ICML Plenary Break

Fri 7:30 a.m. - 8:35 a.m.  Session 1: New Reasoning Problems and Modes of Reasoning (Talks & Panel Discussion)
Moderator/MC: Robert Ness
Individual talks (40 min): Talk 1 (20 min): Nan Rosemary Ke; Talk 2 (20 min): Armando Solar-Lezama
Q&A and group panel discussion (25 min); Thomas Icard will join this panel discussion.
Speakers: Robert Ness, Rosemary Nan Ke, Armando Solar-Lezama

Fri 8:35 a.m. - 9:00 a.m.  Contributed Spotlight Talks: Part 2 (Spotlight Talks)
Spotlight talks highlight research appearing in the afternoon poster session. Each presenter gives a brief, five-minute talk; attendees are welcome to ask the presenters questions during the ICML plenary break immediately following.
11:35 AM: Talk 4 (P26): Biological Mechanisms for Learning Predictive Models of the World and Generating Flexible Predictions (Ching Fang)
11:40 AM: Talk 5 (P02): Designing Perceptual Puzzles by Differentiating Probabilistic Programs (Kartik Chandra)
11:45 AM: Talk 6 (P10): Combining Functional and Automata Synthesis to Discover Causal Reactive Programs (Ria Das)
11:50 AM: Talk 7 (P14): Logical Activation Functions (Scott Lowe)
Speakers: Ching Fang, Kartik Chandra, Scott C Lowe, Ria Das

Fri 9:00 a.m. - 10:30 a.m.  ICML Plenary Lunch Break

Fri 10:30 a.m. - 12:00 p.m.  Session 2: Reasoning in Brains vs Machines (Talks & Panel Discussion)
Moderator/MC: Emily Mackevicius
Individual talks (60 min): Talk 1 (20 min): Kim Stachenfeld, Senior Research Scientist, DeepMind; Talk 2 (20 min): Tyler Bonnen, PhD student, Department of Psychology, Stanford University; Talk 3 (20 min): Ishita Dasgupta, Research Scientist, DeepMind
Q&A and group panel discussion (30 min)
Speakers: Emily Mackevicius, Kim Stachenfeld, Tyler Bonnen, Ishita Dasgupta

Fri 12:00 p.m. - 12:30 p.m.  ICML Plenary Break

Fri 12:30 p.m. - 1:55 p.m.
Session 3: New Computational Technologies for Reasoning (Talks & Panel Discussion)
Moderator/MC: Armando Solar-Lezama
Individual talks (60 min): Talk 1 (20 min): Guy Van den Broeck, Associate Professor and Samueli Fellow, Computer Science Department, UCLA; Talk 2 (20 min): Jan-Willem van de Meent, Associate Professor (Universitair Hoofddocent), University of Amsterdam; Talk 3: Charles Sutton, Research Scientist at Google Brain and Reader (equivalent to Associate Professor: http://bit.ly/1W9UhqT) in Machine Learning at the University of Edinburgh (joining virtually)
Q&A and group panel discussion (25 min)
Speakers: Armando Solar-Lezama, Guy Van den Broeck, Jan-Willem van de Meent, Charles Sutton

Fri 1:55 p.m. - 2:00 p.m.  Closing Remarks (Stage Program Concludes)

Fri 2:00 p.m. - 3:00 p.m.  Poster Session (In Person Only; Attending Authors in Sidebar)

P01: Maximum Entropy Function Learning (Poster)
Authors: Simon Segert, Jonathan Cohen
Abstract: Understanding how people generalize and extrapolate from limited amounts of data remains an outstanding challenge. We study this question in the domain of scalar function learning and propose a simple model based on the Principle of Maximum Entropy (Jaynes, 1957). Through computational modeling, we demonstrate that the theory makes two specific predictions about people's extrapolation judgments, which we validate through experiments. Moreover, we show that existing Gaussian Process models of function learning cannot account for these effects.
Presenter: Simon Segert

P02: Designing Perceptual Puzzles by Differentiating Probabilistic Programs (Poster)
Authors: Kartik Chandra, Tzu-Mao Li, Joshua B. Tenenbaum, Jonathan Ragan-Kelley
Abstract: We design new visual illusions by finding "adversarial examples" for principled models of human perception, specifically for probabilistic models, which treat vision as Bayesian inference.
To perform this search efficiently, we design a differentiable probabilistic programming language, whose API exposes MCMC inference as a first-class differentiable function. We demonstrate our method by automatically creating illusions for three features of human vision: color constancy, size constancy, and face perception.
Presenter: Kartik Chandra

P03: Interoception as Modeling, Allostasis as Control (Poster)
Authors: Eli Zachary Sennesh, Jordan Theriault, Dana Brooks, Jan-Willem van de Meent, Lisa Feldman Barrett, Karen Quigley
Abstract: The brain regulates the body by anticipating its needs and attempting to meet them before they arise, a process called allostasis. Allostasis requires a model of the changing sensory conditions within the body, a process called interoception. In this paper, we examine how interoception may provide performance feedback for allostasis. We suggest studying allostasis in terms of control theory, reviewing control theory's applications to related issues in physiology, motor control, and decision making. We synthesize these by relating them to the important properties of allostatic regulation as a control problem. We then sketch a novel formalism for how the brain might perform allostatic control of the viscera by analogy to skeletomotor control, including a mathematical view on how interoception acts as performance feedback for allostasis. Finally, we suggest ways to test implications of our hypotheses.
Presenter: Eli Sennesh

P04: People Construct Simplified Mental Representations to Plan (Poster)
Authors: Mark K Ho, David Abel, Carlos G. Correa, Michael Littman, Jonathan Cohen, Thomas L. Griffiths
Abstract: One of the most striking features of human cognition is the ability to plan. Two aspects of human planning stand out: its efficiency and its flexibility.
Efficiency is especially impressive because plans must often be made in complex environments, and yet people successfully plan solutions to many everyday problems despite having limited cognitive resources. Standard accounts in psychology, economics, and artificial intelligence have suggested that human planning succeeds because people have a complete representation of a task and then use heuristics to plan future actions in that representation. However, this approach generally assumes that task representations are fixed. Here we propose that task representations can be controlled, and that such control provides opportunities to quickly simplify problems and more easily reason about them. We propose a computational account of this simplification process and, in a series of preregistered behavioural experiments, show that it is subject to online cognitive control and that people optimally balance the complexity of a task representation against its utility for planning and acting. These results demonstrate how strategically perceiving and conceiving problems facilitates the effective use of limited cognitive resources.
Presenter: Mark K Ho

P05: Using Language and Programs to Instill Human Inductive Biases in Machines (Poster)
Authors: Sreejan Kumar, Carlos G. Correa, Ishita Dasgupta, Raja Marjieh, Michael Hu, Robert D. Hawkins, Nathaniel Daw, Jonathan Cohen, Karthik R Narasimhan, Thomas L. Griffiths
Abstract: Strong inductive biases are a key component of human intelligence, allowing people to quickly learn a variety of tasks. Although meta-learning has emerged as an approach for endowing neural networks with useful inductive biases, agents trained by meta-learning may acquire very different strategies from humans. We show that co-training these agents on predicting representations from natural language task descriptions and from programs induced to generate such tasks guides them toward human-like inductive biases.
Human-generated language descriptions and program induction with library learning both result in more human-like behavior in downstream meta-reinforcement learning agents than less abstract controls (synthetic language descriptions, program induction without library learning), suggesting that the abstraction supported by these representations is key.
Presenter: Sreejan Kumar

P06: Automatic Inference with Pseudo-Marginal Hamiltonian Monte Carlo (Poster)
Authors: Jinlin Lai, Daniel Sheldon
Abstract: Pseudo-marginal Hamiltonian Monte Carlo (PM-HMC) is a technique for sampling parameters from the posterior of Bayesian models. However, its usage within probabilistic programming frameworks is under-explored. We show that PM-HMC can be used to simplify the sampling problem for non-reparameterizable models, which complements existing methods in this area.
Presenter: Jinlin Lai

P07: MoCa: Cognitive Scaffolding for Language Models in Causal and Moral Judgment Tasks (Poster)
Authors: Allen Nie, Atharva Amdekar, Christopher J Piech, Tatsunori Hashimoto, Tobias Gerstenberg
Abstract: Human common-sense understanding of the physical and social world is organized around intuitive theories. Two key building blocks of these intuitive theories are causality and morality. Causal and moral judgments come naturally to people: who did what, and why? There is a rich literature in psychology and cognitive science studying people's causal and moral intuitions. This work has revealed a number of factors that systematically influence people's judgments, such as the presence of norms, and whether the agent was aware of their action's potential consequences. Here, we investigate whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with those of human participants. We find that without any annotations, LLMs and human participants are misaligned (only 56%-60% agreement).
However, LLMs can accurately annotate which factors are present in a scenario given simple expert-written instructions. We show how these annotations can guide LLMs to match participants' judgments more closely (69.7%-72% agreement). These results suggest that insights from cognitive science can help scaffold language models to align more closely with human intuitions in a challenging common-sense evaluation task.
Presenter: Allen Nie

P08: Map Induction: Compositional Spatial Submap Learning for Efficient Exploration in Novel Environments (Poster)
Authors: Sugandha Sharma, Aidan Curtis, Marta Kryven, Joshua B. Tenenbaum, Ila R Fiete
Abstract: Humans are expert explorers and foragers. Understanding the computational cognitive mechanisms that support this capability can advance the study of the human mind and enable more efficient exploration algorithms. We hypothesize that humans explore new environments by inferring the structure of unobserved spaces through re-use of spatial information collected from previously explored spaces. Taking inspiration from the neuroscience of repeating map fragments and ideas about program induction, we present a novel "Map Induction" framework, which generates map proposals for unseen environments from compositions of already-seen spaces in a hierarchical Bayesian framework. The model thus explicitly reasons about unseen spaces through a distribution of strong spatial priors. We introduce a new behavioral Map Induction Task (MIT), which involves foraging for rewards, to compare human performance with state-of-the-art existing models and Map Induction. We show that Map Induction predicts human behavior better than the non-inductive baselines. We also show that Map Induction, when used to augment state-of-the-art approximate planning algorithms, improves their performance.
Presenter: Sugandha Sharma

P09: Towards a Neuroscience of "Stories": Metric Space Learning in the Hippocampus (Poster)
Authors: Zhenrui Liao, Attila Losonczy
Abstract: The ability to recall, structure, and reason about learned knowledge is the sine qua non of general intelligence. Memory is not solely a task of faithfully replaying past experiences, but of constructing models ("stories") to understand them. We present a theory of how sophisticated world models can be constructed from the selfsame primitives of place cells, sequences, and cognitive maps well known from rodent studies. Our central hypothesis is that the hippocampus is able to learn general metric spaces as cognitive maps. We test this theory by training mice to learn and solve relational queries in a virtual reality (VR) concept space that cannot be embedded in 2D Euclidean space. By performing two-photon calcium imaging of hippocampal area CA1 during this task, we find that neural representations agree with the predictions of our theory. This work experimentally tests a formalization of the widely used conceptual model of the "cognitive map" decoupled from Euclidean space, with implications for how such maps are used to solve abstract reasoning tasks.
Presenter: Zhenrui Liao

P10: Combining Functional and Automata Synthesis to Discover Causal Reactive Programs (Poster)
Authors: Ria Das, Joshua B. Tenenbaum, Armando Solar-Lezama, Zenna Tavares
Abstract: While program synthesis has recently garnered interest as an alternative to deep-learning-based approaches to AI, it still faces several limitations. One is that existing methods cannot learn models with time-varying latent state, a common feature of real-world systems. We develop a new synthesis approach that overcomes this challenge by uniting two disparate communities within synthesis: functional synthesis and automata synthesis.
We instantiate our algorithm in the domain of causal learning in 2D, Atari-style grid worlds, and our ongoing evaluation shows promising results.
Presenter: Ria Das

P11: MetaCOG: Learning a Meta-cognition to Recover what Objects are Actually There (Poster)
Authors: Marlene Berke, Zhangir Azerbayev, Mario Belledonne, Zenna Tavares, Julian Jara-Ettinger
Abstract: Humans do not unconditionally trust what they see, but instead use their meta-cognition to recognize when a percept might be unreliable or false, such as when we realize that we mistook one object for another. Inspired by this capacity, we propose a formalization of meta-cognition for object detection, and we present MetaCOG, an instantiation of this approach. MetaCOG is a probabilistic model that learns, without supervision, a meta-cognition for object detection systems and uses this meta-cognition to refine beliefs about the locations and semantic labels of objects in a scene. We find that MetaCOG can quickly learn an accurate meta-cognitive representation of object detectors and use this meta-cognition to infer the objects in the world responsible for the detections.
Presenter: Marlene Berke

P12: Desiderata for Abstraction (Poster)
Authors: Simon Alford, Zenna Tavares, Kevin Ellis
Abstract: The concept of abstraction has been a recurring focus within artificial intelligence research. This paper makes the case for the continued importance of abstraction as a foundation of general learning and reasoning systems. We clarify a notion of and purpose for abstraction as yet unsatisfied by pure deep learning approaches, briefly review approaches to abstraction for program analysis, list a number of areas where state-of-the-art deep learning approaches stand to be improved by incorporating abstraction learning, and discuss some considerations of what it means to learn good abstractions.
Presenter: Simon Alford

P13: Estimating Categorical Counterfactuals via Deep Twin Networks (Poster)
Authors: Athanasios Vlontzos, Bernhard Kainz, Ciarán Mark Gilligan-Lee
Abstract: Counterfactual inference is a powerful tool, capable of solving challenging problems in high-profile sectors. To perform counterfactual inference, one requires knowledge of the underlying causal mechanisms. However, causal mechanisms cannot be uniquely determined from observations and interventions alone. This raises the question of how to choose the causal mechanisms so that the resulting counterfactual inference is trustworthy in a given domain. This question has been addressed in causal models with binary variables, but the case of categorical variables remains unanswered. We address this challenge by introducing, for causal models with categorical variables, the notion of counterfactual ordering: a principle that posits desirable properties causal mechanisms should possess, and we prove that it is equivalent to specific functional constraints on the causal mechanisms. To learn causal mechanisms satisfying these constraints, and perform counterfactual inference with them, we introduce deep twin networks. These are deep neural networks that, when trained, are capable of twin-network counterfactual inference, an alternative to the abduction-action-prediction method. We empirically test our approach on diverse real-world and semi-synthetic data from medicine, epidemiology, and finance, reporting accurate estimation of counterfactual probabilities while demonstrating the issues that arise with counterfactual reasoning when counterfactual ordering is not enforced.
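The twin-network construction behind this abstract can be illustrated in a toy, binary-variable setting (not the paper's deep, categorical version): duplicate the outcome variable, share the exogenous noise between the factual copy and an intervened-on counterfactual copy, condition on the factual evidence, and read off the twin. A minimal sketch, assuming a hypothetical mechanism Y = X XOR U:

```python
# Toy structural causal model: Y = X XOR U, with P(U = 1) = p_u.
# The twin network shares the exogenous noise U between the factual
# outcome Y and a counterfactual copy Y* in which X is set to x_cf,
# so a single conditioning pass performs counterfactual inference.

def twin_counterfactual(x_obs, y_obs, x_cf, p_u=0.3):
    """P(Y* = 1 | X = x_obs, Y = y_obs, do(X* = x_cf)) for Y = X XOR U."""
    num = den = 0.0
    for u in (0, 1):
        p = p_u if u == 1 else 1.0 - p_u
        if (x_obs ^ u) == y_obs:   # factual branch must match the evidence
            den += p
            if (x_cf ^ u) == 1:    # read the counterfactual twin's outcome
                num += p
    return num / den
```

Because XOR makes U fully identifiable from the evidence, the counterfactual here is deterministic; the paper's contribution is the non-trivial categorical case, where counterfactual ordering constrains which mechanisms are admissible.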
Presenter: Alexander Fabian Spies

P14: Logical Activation Functions: Logit-space Equivalents of Probabilistic Boolean Operators (Poster)
Authors: Scott C Lowe, Robert Earle, Jason d'Eon, Thomas Trappenberg, Sageev Oore
Abstract: The choice of activation functions and their motivation is a long-standing issue within the neural network community. Neuronal representations within artificial neural networks are commonly understood as logits, representing the log-odds score of presence of features within the stimulus. We derive logit-space operators equivalent to probabilistic Boolean logic gates AND, OR, and XNOR for independent probabilities. Such theories are important to formalize more complex dendritic operations in real neurons, and these operations can be used as activation functions within a neural network, introducing probabilistic Boolean logic as the core operation of the network. Since these functions involve taking multiple exponents and logarithms, they are computationally expensive and not well suited to direct use within neural networks. Consequently, we construct efficient approximations named $\text{AND}_\text{AIL}$ (the AND operator Approximate for Independent Logits), $\text{OR}_\text{AIL}$, and $\text{XNOR}_\text{AIL}$, which use only comparison and addition operations, have well-behaved gradients, and can be deployed as activation functions in neural networks. Like MaxOut, $\text{AND}_\text{AIL}$ and $\text{OR}_\text{AIL}$ are generalizations of ReLU to two dimensions. While our primary aim is to formalize dendritic computations within a logit-space probabilistic-Boolean framework, we deploy these new activation functions, both in isolation and in conjunction, to demonstrate their effectiveness on a variety of tasks including image classification, transfer learning, abstract reasoning, and compositional zero-shot learning.
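The exact (expensive) operator this abstract describes follows directly from probability: for independent features with probabilities sigmoid(x) and sigmoid(y), the AND probability is their product, and the logit-space AND is the log-odds of that product. A minimal numeric sketch of the exact operator (the paper's cheap AIL approximations are not reproduced here):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def and_logits(x, y):
    """Exact logit-space AND for two independent features: the log-odds
    that both are present, given their individual logits x and y."""
    return logit(sigmoid(x) * sigmoid(y))

# Two features each at even odds (logit 0): both present with
# probability 0.25, so the AND logit is log(0.25 / 0.75) = -log(3).
```

The chain of exp and log calls is exactly the cost the AIL approximations avoid by using only comparisons and additions.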
Presenter: Scott C Lowe

P15: Bias of Causal Identification using Non-IID Data (Poster)
Authors: Chi Zhang, Karthika Mohan, Judea Pearl
Abstract: Traditional causal inference techniques assume data are independent and identically distributed (IID) and thus ignore interactions among units. In this paper, we analyze the bias of causal identification in linear models when IIDness is falsely assumed. Specifically, we discuss 1) when it is safe to apply traditional IID methods to non-IID data, 2) how large the bias is if IID methods are blindly applied, and 3) how to correct the bias. We present the results through a real-world example: estimating the causal effect of vaccination on sickness.
Presenter: Chi Zhang

P16: Bayesian Reasoning with Trained Neural Networks (Poster)
Authors: Jakob Knollmüller, Torsten Ensslin
Abstract: We show how to use trained neural networks to perform Bayesian reasoning in order to solve tasks outside their initial scope. Deep generative models provide prior knowledge, and classification/regression networks impose constraints. The tasks at hand are formulated as Bayesian inference problems, which we approximately solve through variational or sampling techniques. The approach builds on top of already trained networks, and the addressable questions grow super-exponentially with the number of available networks. In its simplest form, the approach yields conditional generative models; multiple simultaneous constraints, however, constitute more elaborate questions. We compare the approach to specifically trained generators, show how to solve riddles, and demonstrate its compatibility with state-of-the-art architectures.
Presenter: Jakob Knollmüller

P17: Correcting Model Bias with Sparse Implicit Processes (Poster)
Authors: Simon Rodriguez Santana, Luis A. Ortega, Daniel Hernández-Lobato, Bryan Zaldivar
Abstract: Model selection in machine learning (ML) is a crucial part of the Bayesian learning procedure.
Model choice may impose strong biases on the resulting predictions, which can hinder the performance of methods such as Bayesian neural networks and neural samplers. On the other hand, newly proposed approaches for Bayesian ML exploit features of approximate inference in function space with implicit stochastic processes (a generalization of Gaussian processes). The approach of Sparse Implicit Processes (SIP) is particularly successful in this regard, since it is fully trainable and achieves flexible predictions. Here, we expand on the original experiments to show that SIP is capable of correcting model bias when the data-generating mechanism differs strongly from the one implied by the model. We use synthetic datasets to show that SIP provides predictive distributions that reflect the data better than the exact predictions of the initial, wrongly assumed model.
Presenter: Simon R Santana

P18: Abstract Interpretation for Generalized Heuristic Search in Model-Based Planning (Poster)
Authors: Tan Zhi-Xuan, Joshua B. Tenenbaum, Vikash Mansinghka
Abstract: Domain-general model-based planners often derive their generality by constructing search heuristics through formal analysis of symbolic world models. One approach to constructing these heuristics is to plan in a relaxed or abstracted model: the cost of a solution in the relaxed model serves as an (optimistic) estimate of the true cost, providing guidance in heuristic search algorithms. Some of the abstractions used by these heuristics are also used in model checking, while others are similar to those used in abstract interpretation of program semantics. However, they have typically been limited to propositional variables, with a few numeric extensions.
Here we illustrate how abstract interpretation can serve as a unifying framework for these abstraction-based heuristics, extending the reach of heuristic search to richer world models that make use of more complex datatypes (e.g., sets), functions (e.g., trigonometry), and even models with uncertainty and probabilistic effects.
Presenter: Tan Zhi-Xuan

P19: Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI (Poster)
Authors: Suzanna Sia, Anton Belyy, Amjad Almahairi, Madian Khabsa, Luke Zettlemoyer, Lambert Mathias
Abstract: Evaluating an explanation's faithfulness is desirable for many reasons, such as trust, interpretability, and diagnosing the sources of a model's errors. In this work, which focuses on the NLI task, we introduce the methodology of Faithfulness-through-Counterfactuals, which first generates a counterfactual hypothesis based on the logical predicates expressed in the explanation, and then evaluates whether the model's prediction on the counterfactual is consistent with that expressed logic (i.e., whether the new formula is logically satisfiable). In contrast to existing approaches, this does not require any explanations for training a separate verification model. We first validate the efficacy of automatic counterfactual hypothesis generation, leveraging the few-shot priming paradigm. Next, we conduct a sensitivity analysis to validate that our metric is sensitive to unfaithful explanations.
Presenter: Suzanna Sia

P20: Learning to Reason about and to Act on Cascading Events (Poster)
Authors: Yuval Atzmon, Eli Meirom, Shie Mannor, Gal Chechik
Abstract: Training agents to control a dynamic environment is a fundamental problem in AI. Many environments can be characterized by a small set of qualitatively distinct events. These events form chains or cascades that capture the semantic behavior of the system.
We often wish to change the system's behavior using a local intervention that propagates through the cascade until reaching a goal. For instance, one may reroute a truck in a logistics chain to meet a special delivery, or trigger a biochemical cascade to switch the state of a cell. We introduce a new supervised learning setup called Cascade. An agent observes a system with known dynamics evolving from some initial conditions. It is given a structured semantic instruction and needs to make a localized intervention that triggers a cascade of events, such that the system reaches an alternative (counterfactual) behavior. We provide a test-bed for this problem, consisting of physical objects. The problem is hard because the cascades make the search space highly fragmented and discontinuous. We combine semantic tree search with an event-driven forward model and devise an algorithm that learns to efficiently search in exponentially large semantic trees of continuous spaces. We demonstrate that our approach learns to effectively follow instructions to intervene in previously unseen complex scenes. It can also reason about alternative outcomes when provided an observed cascade of events.
Presenter: Eli Meirom

P21: Reverse-Mode Automatic Differentiation and Optimization of GPU Kernels via Enzyme (Poster)
Authors: William S. Moses, Valentin Churavy, Ludger Paehler, Jan Hückelheim, Sri Hari Krishna Narayanan, Michel Schanen, Johannes Doerfert (originally published at SC '21)
Abstract: Computing derivatives is key to many algorithms in scientific computing and machine learning, such as optimization, uncertainty quantification, and stability analysis. Enzyme is an LLVM compiler plugin that performs reverse-mode automatic differentiation (AD) and thus generates high-performance gradients of programs in languages including C/C++, Fortran, Julia, and Rust. Prior to this work, Enzyme and other AD tools were not capable of generating gradients of GPU kernels.
Our paper presents a combination of novel techniques that make Enzyme the first fully automatic reverse-mode AD tool to generate gradients of GPU kernels. Since, unlike other tools, Enzyme performs automatic differentiation within a general-purpose compiler, we are able to introduce several novel GPU- and AD-specific optimizations. To show the generality and efficiency of our approach, we compute gradients of five GPU-based HPC applications, executed on NVIDIA and AMD GPUs. All benchmarks run within an order of magnitude of the original program's execution time. Without GPU- and AD-specific optimizations, gradients of GPU kernels either fail to run from a lack of resources or have infeasible overhead. Finally, we demonstrate that increasing the problem size, by either increasing the number of threads or increasing the work per thread, does not substantially increase the overhead from differentiation.
Presenter: William Moses

P22: Type Theory for Inference and Learning in Minds and Machines (Poster)
Author: Felix Anthony Sosa
Abstract: A unique property of human reasoning is generating reasonable hypotheses to novel queries: if I asked you what you ate yesterday, you would respond with a food, not a time. The thought that one would respond otherwise can be seen as humorous at best, or pathological at worst. While understanding how people generate hypotheses is of central importance to cognitive science, no satisfying formal system has been proposed. Motivated by this property of human reasoning, we speculate that a core component of any reasoning system is a type theory: a formal imposition of structure on the kinds of computations an agent can perform and how they're performed. We further motivate this system with three empirical observations: adaptive constraints on learning and inference (i.e.
generating reasonable hypotheses), how people draw distinctions between improbability and impossibility, and the ability people have to reason about things at varying levels of abstraction.
Presenter: Felix Sosa

P23: Language Model Cascades (Poster)
Authors: David Dohan, Aitor Lewkowycz, Jacob Austin, Winnie Xu, Yuhuai Wu, David Bieber, Raphael Gontijo-Lopes, Henryk Michalewski, Rif A. Saurous, Jascha Sohl-Dickstein, Kevin Patrick Murphy, Charles Sutton
Abstract: Prompted models have demonstrated impressive few-shot learning abilities. Repeated interactions at test time with a single model, or the composition of multiple models together, further expand these capabilities. These compositions are probabilistic models, and may be expressed in the language of graphical models with random variables whose values are complex data types such as strings. Cases with control flow and dynamic structure require techniques from probabilistic programming, and allow implementing disparate model structures and inference strategies in a unified language. We describe several existing techniques from this perspective, including scratchpads and chain of thought, verifiers, STaR, selection-inference, and tool use. We refer to the resulting programs as language model cascades.
Presenters: David Dohan, Winnie Xu

P24: Unifying Generative Models with GFlowNets (Poster)
Authors: Dinghuai Zhang, Ricky T. Q. Chen, Nikolay Malkin, Yoshua Bengio
Abstract: There are many frameworks for deep generative modeling, each often presented with its own specific training algorithms and inference methods. We present a short note on the connections between existing deep generative models and the GFlowNet framework (Bengio et al., 2021), shedding light on their overlapping traits and providing a unifying viewpoint through the lens of learning with Markovian trajectories.
This provides a means for unifying training and inference algorithms, and provides a route to construct an agglomeration of generative models. Link » Dinghuai Zhang · Ricky T. Q. Chen 🔗 - P25: Proving Theorems using Incremental Learning and Hindsight Experience Replay ( Poster )  link » Authors: Eser Aygün, Laurent Orseau, Ankit Anand, Xavier Glorot, Stephen Marcus McAleer, Vlad Firoiu, Lei M Zhang, Doina Precup, Shibl Mourad Abstract: Traditional automated theorem proving systems for first-order logic depend on speed-optimized search and many handcrafted heuristics designed to work over a wide range of domains. Machine learning approaches in the literature either depend on these traditional provers to bootstrap themselves by leveraging these heuristics, or struggle due to limited existing proof data. The latter issue can be explained by the lack of a smooth difficulty gradient in theorem proving datasets; large gaps in difficulty between different theorems can make training harder or even impossible. In this paper, we adapt the idea of hindsight experience replay from reinforcement learning to the automated theorem proving domain, so as to use the intermediate data generated during unsuccessful proof attempts. We build a first-order logic prover by disabling all the smart clause-scoring heuristics of the state-of-the-art E prover and replacing them with a clause-scoring neural network learned by using hindsight experience replay in an incremental learning setting. Clauses are represented as graphs and presented to transformer networks with spectral features. We show that provers trained in this way can outperform previous machine learning approaches and compete with the state-of-the-art heuristic-based theorem prover E in its best configuration, on the popular benchmarks MPTP2078, M2k and Mizar40. The proofs generated by our algorithm are also almost always significantly shorter than E's proofs.
Link » Shibl Mourad 🔗 - P26: Biological Mechanisms for Learning Predictive Models of the World and Generating Flexible Predictions ( Poster )  link » Authors: Ching Fang, Dmitriy Aronov, Larry Abbott, Emily L Mackevicius Abstract: The predictive nature of the hippocampus is thought to support many cognitive behaviors, from memory to inferential reasoning. Inspired by the reinforcement learning literature, researchers have formalized this notion by describing the hippocampus as a predictive map called the successor representation (SR). The SR captures a number of observations about hippocampal activity. However, the algorithm does not provide a neural mechanism for how such representations arise. Here, we show that the dynamics of a recurrent neural network naturally calculate the SR when the synaptic weights match the transition probability matrix. Interestingly, the predictive horizon can be flexibly modulated simply by changing the network gain. We derive simple, biologically plausible learning rules to learn the SR in a recurrent network. We show that our model matches electrophysiological data. Taken together, our results suggest that predictive maps of the world are accessible in biological circuits and can support a broad range of cognitive functions. Link » Ching Fang 🔗 - P27: Explanatory Paradigms in Neural Networks ( Poster )  link » Authors: Mohit Prabhushankar, Ghassan AlRegib Abstract: In this article, we present a leap-forward expansion to the study of explainability in neural networks by considering explanations as answers to abstract reasoning-based questions. With P as the prediction from a neural network, these questions are 'Why P?', 'What if not P?', and 'Why P, rather than Q?' for a given contrast prediction Q. The answers to these questions are observed correlations, counterfactuals, and contrastive explanations respectively. Together, these explanations constitute the abductive reasoning scheme.
The term observed refers to the specific case of post-hoc explainability, when an explanatory technique explains the decision P after a trained neural network has made the decision. The primary advantage of viewing explanations through the lens of abductive reasoning-based questions is that explanations can be used as reasons while making decisions. The post-hoc field of explainability, which previously only justified decisions, becomes active by being involved in the decision-making process and providing limited, but relevant and contextual, interventions. The contributions of this article are: (i) realizing explanations as reasoning paradigms, (ii) providing a probabilistic definition of observed explanations and their completeness, (iii) creating a taxonomy for evaluation of explanations, (iv) positioning gradient-based complete explainability's replicability and reproducibility across multiple applications and data modalities, and (v) code repositories, publicly available at https://github.com/olivesgatech/Explanatory-Paradigms. Link » Mohit Prabhushankar · Ghassan AlRegib 🔗 - P28: On the Generalization and Adaption Performance of Causal Models ( Poster )  link » Authors: Nino Scherrer, Anirudh Goyal, Stefan Bauer, Yoshua Bengio, Nan Rosemary Ke Abstract: Learning models that offer robust out-of-distribution generalization and fast adaptation is a key challenge in modern machine learning. Modelling causal structure into neural networks holds the promise of accomplishing robust zero- and few-shot adaptation. Recent advances in differentiable causal discovery have proposed to factorize the data generating process into a set of modules, i.e. one module for the conditional distribution of every variable, where only causal parents are used as predictors. Such a modular decomposition of knowledge allows adaptation to distribution shifts by updating only a subset of parameters.
In this work, we systematically study the generalization and adaptation performance of such causal models by comparing them to monolithic models and to structured models where the set of predictors is not constrained to causal parents. Our analysis shows that causal models outperform other models on both zero- and few-shot adaptation in low-data regimes and offer robust generalization. We also find that the effects are more significant for sparser graphs than for denser graphs. Link » Rosemary Nan Ke 🔗 - P29: Predicting Human Similarity Judgments Using Large Language Models ( Poster )  link » Authors: Raja Marjieh, Ilia Sucholutsky, Theodore Sumers, Nori Jacoby, Thomas L. Griffiths Abstract: Similarity judgments provide a well-established method for accessing mental representations, with applications in psychology, neuroscience and machine learning. However, collecting similarity judgments can be prohibitively expensive for naturalistic datasets as the number of comparisons grows quadratically in the number of stimuli. We leverage recent advances in language models and online recruitment, proposing an efficient domain-general procedure for predicting human similarity judgments based on text descriptions. Crucially, the number of descriptions required grows only linearly with the number of stimuli, drastically reducing the amount of data required. We test this procedure on six datasets of naturalistic images and show that our models outperform previous approaches based on visual information. Link » Ilia Sucholutsky 🔗 - P30: Meta-Learning Real-Time Bayesian AutoML For Small Tabular Data ( Poster )  link » Authors: Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter Abstract: We present TabPFN, an AutoML method that is competitive with the state of the art on small tabular datasets while being over 1,000× faster.
TabPFN not only outperforms boosted trees, the state-of-the-art standalone method, but is on par with complex AutoML systems that tune and select ensembles of a range of methods. Our method is fully entailed in the weights of a single neural network, and a single forward pass directly yields predictions for a new dataset. TabPFN is meta-learned using the Transformer-based Prior-Data Fitted Network (PFN) architecture and approximates Bayesian inference with a prior that is based on assumptions of simplicity and causal structures. The prior contains a large space of structural causal models with a bias for small architectures and thus low complexity. Furthermore, we extend the PFN approach to differentiably calibrate the prior's hyperparameters on real data. By doing so, we separate our abstract prior assumptions from their heuristic calibration on real data. Afterwards, the calibrated hyperparameters are fixed and TabPFN can be applied to any new tabular dataset at the push of a button. Finally, on 30 datasets from the OpenML-CC18 suite we show that our method outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems, with predictions produced in less than a second. Our code and pretrained models are available at https://anonymous.4open.science/r/TabPFN-2AEE. Link » Frank Hutter · Katharina Eggensperger 🔗 - P31: Can Humans Do Less-Than-One-Shot Learning? ( Poster )  link » Authors: Maya Malaviya, Ilia Sucholutsky, Kerem Oktar, Thomas L. Griffiths Abstract: Being able to learn from small amounts of data is a key characteristic of human intelligence, but exactly how small? In this paper, we introduce a novel experimental paradigm that allows us to examine classification in an extremely data-scarce setting, asking whether humans can learn more categories than they have exemplars (i.e., can humans do "less-than-one-shot" learning?).
An experiment conducted using this paradigm reveals that people are capable of learning in such settings, and provides several insights into the underlying mechanisms. First, people can accurately infer and represent high-dimensional feature spaces from very little data. Second, having inferred the relevant spaces, people use a form of prototype-based categorization (as opposed to exemplar-based) to make categorical inferences. Finally, systematic, machine-learnable patterns in responses indicate that people may have efficient inductive biases for dealing with this class of data-scarce problems. Link » Maya Malaviya 🔗 - P32: Collapsed Inference for Bayesian Deep Learning ( Poster )  link » Authors: Zhe Zeng, Guy Van den Broeck Abstract: Bayesian deep learning performs well at providing prediction accuracy and calibrated uncertainty. Current research has focused on scalability by imposing simplistic assumptions on posteriors and predictive distributions, which harms prediction performance. While an accurate estimation of the posterior is critical to performance, doing so is computationally expensive and prohibitive in practice, since it would require running a long Monte Carlo chain. In this paper, we explore a trade-off between reliable inference and algorithm scalability. The main idea is to use collapsed samples: while doing full Bayesian inference, we sample some of the stochastic weights and maintain tractable conditional distributions for the others, which are amenable to exact inference. This is made possible by encoding Bayesian ReLU neural networks into probabilistic Satisfiability Modulo Theories models and leveraging a recently proposed tool that can perform exact inference on such models. We illustrate our proposed collapsed Bayesian deep learning algorithm on regression tasks. Empirical results show significant improvements over existing Bayesian deep learning approaches.
Link » Zhe Zeng 🔗 - P33: ViRel: Unsupervised Visual Relations Discovery with Graph-level Analogy ( Poster )  link » Authors: Daniel Zeng, Tailin Wu, Jure Leskovec Abstract: Visual relations form the basis of understanding our compositional world, as relationships between visual objects capture key information in a scene. It is then advantageous to learn relations automatically from the data, as learning with predefined labels cannot capture all possible relations. However, current relation learning methods typically require supervision, and are not designed to generalize to scenes with more complicated relational structures than those seen during training. Here, we introduce ViRel, a method for unsupervised discovery and learning of Visual Relations with graph-level analogy. In a setting where scenes within a task share the same underlying relational subgraph structure, our learning method of contrasting isomorphic and non-isomorphic graphs discovers the relations across tasks in an unsupervised manner. Once the relations are learned, ViRel can then retrieve the shared relational graph structure for each task by parsing the predicted relational structure. Using a dataset based on grid-world and the Abstract Reasoning Corpus, we show that our method achieves above 95% accuracy in relation classification, discovers the relation graph structure for most tasks, and further generalizes to unseen tasks with more complicated relational structures. Link » Daniel Zeng 🔗 - P34: ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time ( Poster )  link » Authors: Tailin Wu, Megan Tjandrasuwita, Zhengxuan Wu, Xuelin Yang, Kevin Liu, Rok Sosic, Jure Leskovec Abstract: Humans have the remarkable ability to recognize and acquire novel visual concepts in a zero-shot manner.
Given a high-level, symbolic description of a novel concept in terms of previously learned visual concepts and their relations, humans can recognize novel concepts without seeing any examples. Moreover, they can acquire new concepts by parsing and communicating symbolic structures using learned visual concepts and relations. Endowing machines with these capabilities is pivotal to improving their generalization capability at inference time. In this work, we introduce Zero-shot Concept Recognition and Acquisition (ZeroC), a neuro-symbolic architecture that can recognize and acquire novel concepts in a zero-shot way. ZeroC represents concepts as graphs of constituent concept models (as nodes) and their relations (as edges). To allow inference-time composition, we employ energy-based models (EBMs) to model concepts and relations. We design the ZeroC architecture so that it allows a one-to-one mapping between the symbolic graph structure of a concept and its corresponding EBM, which allows acquiring new concepts, communicating their graph structure, and applying them to classification and detection tasks at inference time. We introduce algorithms for learning and inference with ZeroC. We evaluate ZeroC on a challenging grid-world dataset designed to probe zero-shot concept recognition and acquisition, and demonstrate its capability. Link » Xuelin Yang · Tailin Wu 🔗 - P35: Hybrid AI Integration Using Implicit Representations With Scruff ( Poster )  link » Authors: Avi Pfeffer, Michael Harradon, Sanja Cvijic, Joseph Campolongo Abstract: In this paper, we present a general framework for building hybrid AI systems called Scruff. The key idea behind Scruff is implicit programming, which provides a principled framework for combining different kinds of model components in a joint model. It enables algorithms that typically have been defined only for neural or symbolic components to generalize and jointly execute on all components while maintaining their meaning.
Scruff's implicit programming is based on hierarchical predictive processing (HPP). Implicit programming enables non-generative components, such as neural networks, to be included in generative predictive processing models as if they were generative. These components are interpreted as participating implicitly in a generative process, through their ability to support well-defined mathematical operations on the process. This powerful abstraction enables algorithms such as importance sampling, belief propagation, and gradient descent to generalize across representations while maintaining their well-understood properties. Link » Avi Pfeffer 🔗 - P36: Large Language Models are Zero-Shot Reasoners ( Poster )  link » Authors: Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa Abstract: Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and are generally known as excellent few-shot learners with task-specific exemplars. Notably, chain-of-thought (CoT) prompting, a recent technique for eliciting complex multi-step reasoning through step-by-step answer examples, has achieved state-of-the-art performance in arithmetic and symbolic reasoning, difficult system-2 tasks that do not follow the standard scaling laws for LLMs. While these successes are often attributed to LLMs' ability for few-shot learning, we show that LLMs are decent zero-shot reasoners by simply adding "Let's think step by step" before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performance on diverse benchmark reasoning tasks including arithmetic (MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled Objects), without any hand-crafted few-shot examples, e.g.
increasing the accuracy on MultiArith from 17.7% to 78.7% and on GSM8K from 10.4% to 40.7% with an off-the-shelf 175B-parameter model. The versatility of this single prompt across very diverse reasoning tasks hints at untapped and understudied fundamental zero-shot capabilities of LLMs, suggesting that high-level, multi-task broad cognitive capabilities may be extracted through simple prompting. We hope our work not only serves as the minimal strongest zero-shot baseline for the challenging reasoning benchmarks, but also highlights the importance of carefully exploring and analyzing the enormous zero-shot knowledge hidden inside LLMs before crafting finetuning datasets or few-shot exemplars. Link » Shixiang Gu 🔗 - P37: Structured, Flexible, and Robust: Benchmarking and Improving Large Language Models Towards More Human-like Behavior in Out-of-Distribution Reasoning Tasks ( Poster )  link » Authors: Katherine M. Collins, Catherine Wong, Jiahai Feng, Megan Wei, Joshua B. Tenenbaum Abstract: Human language offers a powerful window into our thoughts -- we tell stories, give explanations, and express our beliefs and goals through words. Abundant evidence also suggests that language plays a developmental role in structuring our learning. Here, we ask: how much of human-like thinking can be captured by learning statistical patterns in language alone? We first contribute a new challenge benchmark for comparing humans and distributional large language models (LLMs). Our benchmark contains two problem-solving domains (planning and explanation generation) and is designed to require generalization to new, out-of-distribution problems expressed in language. We find that humans are far more robust than LLMs on this benchmark. Next, we propose a hybrid Parse-and-Solve model, which augments distributional LLMs with a structured symbolic reasoning module.
We find that this model shows more robust adaptation to out-of-distribution planning problems, demonstrating the promise of hybrid AI models for more human-like reasoning. Link » Jiahai Feng 🔗
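The successor-representation result highlighted in P26 above rests on a simple fixed-point identity: the SR is M = (I - γT)^{-1} for transition matrix T and discount γ, and iterating a linear recurrent network whose weights equal T converges to M applied to its input. The following NumPy sketch illustrates that identity only; it is not the authors' code, and the example transition matrix, gain value, and function names are illustrative assumptions.

```python
import numpy as np

def successor_representation(T, gamma):
    """Closed-form SR: M = (I - gamma*T)^{-1} for transition matrix T."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

def recurrent_sr(T, gamma, b, n_steps=500):
    """Iterate linear recurrent dynamics x <- b + gamma * T @ x to steady state.

    Converges when gamma times the spectral radius of T is below 1, which
    holds here since T is row-stochastic and gamma < 1.
    """
    x = np.zeros_like(b, dtype=float)
    for _ in range(n_steps):
        x = b + gamma * (T @ x)
    return x

# Hypothetical 3-state random walk (illustrative, not from the paper)
T = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
gamma = 0.8                  # the "network gain" sets the predictive horizon
M = successor_representation(T, gamma)
b = np.eye(3)[0]             # one-hot input at state 0
x = recurrent_sr(T, gamma, b)
assert np.allclose(x, M @ b)  # the dynamics recover the SR applied to the input
```

Raising γ toward 1 lengthens the predictive horizon encoded in M, which is the flexible-modulation-by-gain point the abstract makes.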