Workshop
Knowledge and Logical Reasoning in the Era of Data-driven Learning
Nezihe Merve Gürel · Bo Li · Theodoros Rekatsinas · Beliz Gunel · Alberto Sangiovanni Vincentelli · Paroma Varma
Meeting Room 301
Thinking fast and automatically vs. slow and deliberately (System I and System II, respectively) is a popular analogy when comparing data-driven learning to good old-fashioned symbolic reasoning. Underlying this analogy are the different capabilities of the two systems, or the lack thereof. While data-driven learning (System I) has striking performance advantages over symbolic reasoning (System II), it lacks abilities such as abstraction, comprehensibility and contextual awareness. Symbolic reasoning, on the other hand, tackles those issues but tends to lag behind data-driven learning when it comes to speedy, efficient and automated decision-making. To combat the issues on both sides, there is an increasing consensus among the machine learning and artificial intelligence communities to draw out the best of both worlds and unify data-driven approaches with rule-based, symbolic, logical and commonsense reasoning. This workshop aims to discuss emerging advances and challenges on this topic, in particular at the intersection of data-driven paradigms and knowledge and logical reasoning. We focus on both directions of this intersection:
Knowledge and Logical Reasoning for Data-driven Learning: In this direction, we will investigate the role of rule-based, knowledge and logical reasoning to enable more deliberate and trustworthy data-driven learning.
Data-driven Learning for Knowledge and Logical Reasoning: In this reverse direction, we will explore the capabilities of data-driven approaches to derive knowledge, logical and commonsense reasoning from data.
Schedule
Fri 12:00 p.m. - 12:15 p.m.
|
Opening Remarks
(
Opening
)
SlidesLive Video |
Nezihe Merve Gürel 🔗 |
Fri 12:15 p.m. - 12:45 p.m.
|
Generalization on the Unseen, Logic Reasoning and Degree Curriculum
(
Invited Talk
)
SlidesLive Video This presentation considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette of an 'extrapolating' or 'reasoning' learner. We then study how different network architectures trained by (S)GD perform under GOTU and provide both theoretical and experimental evidence that for a class of network models including instances of Transformers, random features models, and diagonal linear networks, a min-degree interpolator (MDI) is learned on the unseen. We also provide evidence that other instances with larger learning rates or mean-field networks reach leaky MDIs. These findings lead to two implications: (1) we provide an explanation for the length generalization problem (e.g., Anil et al. 2022); (2) we introduce a curriculum learning algorithm called Degree-Curriculum that learns monomials more efficiently by incrementing supports. |
Samy Bengio 🔗 |
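As a rough illustration of the Degree-Curriculum idea described in the abstract above, the snippet below stages training data for a Boolean target by gradually growing the allowed support of the inputs. It is a hedged toy sketch under our own assumptions (the target monomial, stage schedule, and sampling scheme are invented), not the speaker's implementation.

```python
# Toy sketch of a Degree-Curriculum-style schedule (illustrative assumptions only):
# train on Boolean inputs whose support (number of 1s) is bounded, then grow the bound.
import random

def sample_bounded_support(n_bits, max_support, n_samples, rng):
    """Sample Boolean vectors with at most `max_support` ones."""
    samples = []
    for _ in range(n_samples):
        k = rng.randint(0, max_support)
        ones = set(rng.sample(range(n_bits), k))
        samples.append([1 if i in ones else 0 for i in range(n_bits)])
    return samples

def target(x):
    """Example target: the degree-3 monomial x0 * x1 * x2 (an assumption for illustration)."""
    return x[0] * x[1] * x[2]

rng = random.Random(0)
for stage, max_support in enumerate([1, 2, 3, 8], start=1):
    batch = sample_bounded_support(n_bits=8, max_support=max_support, n_samples=4, rng=rng)
    labels = [target(x) for x in batch]
    print(f"stage {stage}: support <= {max_support}, example labels {labels}")
    # a real run would train the model on this stage before growing the support
```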
Fri 12:45 p.m. - 1:15 p.m.
|
AI can Learn from Data. But can it Learn to Reason?
(
Invited Talk
)
SlidesLive Video Many expect that AI will go from powering chatbots to providing mental health services. That it will go from advertisement to deciding who is given bail. The expectation is that AI will solve society’s problems by simply being more intelligent than we are. Implicit in this bullish perspective is the assumption that AI will naturally learn to reason from data: that it can form trains of thought that “make sense”, similar to how a mental health professional or judge might reason about a case, or more formally, how a mathematician might prove a theorem. This talk will investigate the question whether this behavior can be learned from data, and how we can design the next generation of AI techniques that can achieve such capabilities, focusing on constrained language generation, neuro-symbolic learning and tractable deep generative models. |
Guy Van den Broeck 🔗 |
Fri 1:15 p.m. - 1:30 p.m.
|
ICML Coffee Break
(
Break
)
|
🔗 |
Fri 1:30 p.m. - 2:00 p.m.
|
Reasoning Biases in Language Models
(
Invited Talk
)
SlidesLive Video While two systems of reasoning have been a useful abstraction, emergent reasoning (in humans and LLMs) seems to be more intertwined. I'll start by presenting some work highlighting the challenges of interpreting emergent reasoning as two distinct systems and present a few directions for unifying the systems -- focusing on using soft supervision signals from system 2 sources, toward improving traditionally system 1 agents. |
Ishita Dasgupta 🔗 |
Fri 2:00 p.m. - 2:15 p.m.
|
Bayesian Neural Networks with Domain Knowledge
(
Contributed Talk
)
SlidesLive Video Prior knowledge about particular domains can help inform deep learning models to perform better and exhibit desirable behavior, combatting some of the issues with unfair or biased datasets. In this paper, we propose a general framework via variational inference to incorporate such prior information into Bayesian neural networks (BNNs). We learn an informative prior over neural network weights that assigns high probability mass to neural network weights that capture our domain knowledge, leading to a predictor (through posterior averaging) that also exhibits this behavior. We demonstrate that this approach improves upon standard BNNs and is comparable to frequentist approaches across many datasets with different types of prior information, including fairness, physics rules, and healthcare knowledge. |
Dylan Sam 🔗 |
Fri 2:15 p.m. - 2:30 p.m.
|
Neural Priority Queues for GNNs
(
Contributed Talk
)
SlidesLive Video Graph Neural Networks (GNNs) have shown considerable success in neural algorithmic reasoning. Many traditional algorithms make use of an explicit memory in the form of a data structure. However, there has been limited exploration on augmenting GNNs with external memory. In this paper, we present Neural Priority Queues, a differentiable analogue to algorithmic priority queues, for GNNs. We propose and motivate a desiderata for memory modules, and show that Neural PQs exhibit the desiderata, and reason about their use with algorithmic reasoning. This is further demonstrated by empirical results on the CLRS-30 dataset. Furthermore, we find the Neural PQs useful in capturing long-range interactions, as empirically shown on a dataset from the Long-Range Graph Benchmark. |
Petar Veličković 🔗 |
Fri 2:30 p.m. - 2:45 p.m.
|
Not All Neuro-Symbolic Concepts Are Created Equal: Analysis and Mitigation of Reasoning Shortcuts
(
Contributed Talk
)
SlidesLive Video Neuro-Symbolic (NeSy) predictive models hold the promise of improved compliance with given constraints, systematic generalization, and interpretability, as they allow one to infer labels that are consistent with some prior knowledge by reasoning over high-level concepts extracted from sub-symbolic inputs. It was recently shown that NeSy predictors are affected by reasoning shortcuts: they can attain high accuracy by leveraging concepts with unintended semantics, thus falling short of their promised advantages. Yet, a systematic characterization of reasoning shortcuts and of potential mitigation strategies is missing. This work fills this gap by characterizing them as unintended optima of the learning objective and identifying four key conditions behind their occurrence. Based on this, we derive several natural mitigation strategies, and analyze their efficacy both theoretically and empirically. Our analysis shows reasoning shortcuts are difficult to deal with, casting doubt on the trustworthiness and interpretability of existing NeSy solutions. |
Emanuele Marconato 🔗 |
Fri 3:15 p.m. - 4:00 p.m.
|
Lunch Break
(
Break
)
|
🔗 |
Fri 4:00 p.m. - 4:30 p.m.
|
Avenging Polanyi's Revenge: Exploiting the Approximate Omniscience of LLMs in Planning without Deluding Yourself In the Process
(
Invited Talk
)
SlidesLive Video LLMs are on track to reverse what seemed like an inexorable shift of AI from explicit to tacit knowledge tasks. Trained as they are on everything ever written on the web, LLMs exhibit "approximate omniscience"--they can provide answers to all sorts of queries, with nary a guarantee. This could herald a new era for knowledge-based AI systems--with LLMs taking the role of (blowhard?) experts. But first, we have to stop confusing the impressive form of the generated knowledge for correct content, and resist the temptation to ascribe reasoning powers to approximate retrieval by these n-gram models on steroids. We have to focus instead on LLM-Modulo techniques that complement the unfettered idea generation of LLMs with careful vetting by model-based AI systems. In this talk, I will reify this vision and attendant caveats in the context of the role of LLMs in planning tasks. |
Subbarao Kambhampati 🔗 |
Fri 4:30 p.m. - 5:00 p.m.
|
Concept Learning Across Domains and Modalities
(
Invited Talk
)
SlidesLive Video |
Jiajun Wu 🔗 |
Fri 5:00 p.m. - 5:30 p.m.
|
Knowledge and Skill Acquisition through Language Model Pre-training and Instruction-tuning
(
Invited Talk
)
SlidesLive Video |
Xi Victoria Lin 🔗 |
Fri 5:30 p.m. - 6:00 p.m.
|
Large Neural Models' Self-Learning Symbolic Knowledge
(
Invited Talk
)
SlidesLive Video Recent large neural models have shown impressive performance on various data modalities, including natural language, vision, programming language and molecules. However, they still show a surprising deficiency (near-random performance) in acquiring certain types of knowledge, such as structured knowledge and action knowledge. In this talk, I propose a two-way knowledge acquisition framework to make symbolic and neural learning approaches mutually enhance each other. In the first stage, we will elicit and acquire explicit symbolic knowledge from large neural models. In the second stage, we will leverage the acquired symbolic knowledge to augment and enhance these big models. I will present two recent case studies to demonstrate this framework: (1) the first task is to induce event schemas (stereotypical structures of events and their connections) from large language models by incremental prompting and verification [Li et al., ACL 2023], and to apply the induced schemas to enhance event extraction and event prediction; (2) in the second task, we observed that current large video-language models rely on object recognition abilities as a shortcut for action understanding. We utilize a Knowledge Patcher network to elicit new action knowledge from the current models and a Knowledge Fuser component to integrate the Patcher into frozen video-language models. |
🔗 |
Fri 6:00 p.m. - 6:15 p.m.
|
ICML Coffee Break
(
Break
)
|
🔗 |
Fri 6:15 p.m. - 7:15 p.m.
|
Panel on Reasoning Capabilities of LLMs
(
Panel
)
SlidesLive Video |
Guy Van den Broeck · Ishita Dasgupta · Subbarao Kambhampati · Jiajun Wu · Xi Victoria Lin · Samy Bengio · Beliz Gunel 🔗 |
Fri 7:15 p.m. - 7:55 p.m.
|
Poster Session 2
(
Poster Session
)
|
🔗 |
Fri 7:55 p.m. - 8:00 p.m.
|
Closing Remarks
(
Remarks
)
|
🔗 |
-
|
SQA3D: Situated Question Answering in 3D Scenes
(
Poster
)
We propose a new task to benchmark scene understanding and knowledge-intensive reasoning of embodied agents: SQA3D (Situated Question Answering in 3D Scenes). Given a scene context (e.g., a 3D scan), SQA3D requires the tested agent to first understand its situation (position, orientation, etc.) in the 3D scene as described by text, then reason about its surrounding environment and answer a question under that situation. Based upon 650 scenes from ScanNet, we provide a dataset centered around 6.8k unique situations, along with 20.4k descriptions and 33.4k diverse reasoning questions for these situations. These questions examine a wide spectrum of reasoning capabilities for an intelligent agent, ranging from spatial relation comprehension to commonsense understanding, navigation, and multi-hop reasoning. SQA3D poses a significant challenge to current multi-modal models, especially 3D reasoning models. We evaluate various state-of-the-art approaches and find that the best one only achieves an overall score of 47.20%, while amateur human participants can reach 90.06%. We believe SQA3D could facilitate future embodied AI research with stronger situation understanding and reasoning capabilities. Code and data will be released. |
Xiaojian Ma · Silong Yong · Zilong Zheng · Qing Li · Yitao Liang · Song-Chun Zhu · Siyuan Huang 🔗 |
-
|
Retrieval-Augmented Multimodal Language Modeling
(
Poster
)
Recent multimodal models such as DALL-E and CM3 have achieved remarkable progress in text-to-image and image-to-text generation. However, these models store all their knowledge (e.g., the appearance of the Eiffel Tower) in the model parameters, requiring increasingly larger models and training data to capture more knowledge. To integrate knowledge in a more scalable and modular way, we propose a retrieval-augmented multimodal model, which enables a base multimodal model (generator) to refer to relevant text and images fetched by a retriever from external memory (e.g., documents on the web). Specifically, for the retriever, we use a pretrained CLIP, and for the generator, we train a CM3 Transformer on the LAION dataset. Our resulting model, named Retrieval-Augmented CM3 (RA-CM3), is the first multimodal model that can retrieve and generate both text and images. We show that RA-CM3 significantly outperforms baseline multimodal models such as DALL-E and CM3 on both image and caption generation tasks (12 FID and 17 CIDEr improvements on MS-COCO), while requiring much less compute for training (<30% of DALL-E). Moreover, we show that RA-CM3 exhibits novel capabilities such as faithful image generation and multimodal in-context learning (e.g., image generation from demonstrations). |
Michihiro Yasunaga · Armen Aghajanyan · Weijia Shi · Rich James · Jure Leskovec · Percy Liang · Mike Lewis · Luke Zettlemoyer · Wen-tau Yih 🔗 |
-
|
On the Aggregation of Rules for Knowledge Graph Completion
(
Poster
)
Rule learning approaches for knowledge graph completion are efficient, interpretable and competitive with purely neural models. The rule aggregation problem is concerned with finding one plausibility score for a candidate that was simultaneously predicted by multiple rules. Although the problem is ubiquitous, as data-driven rule learning can result in noisy and large rule sets, it is underrepresented in the literature and its theoretical foundations have not been studied before in this context. In this work, we demonstrate that existing aggregation approaches can be expressed as marginal inference over the predicting rules. In particular, we show that the common Max-aggregation strategy, which scores candidates based on the rule with the highest confidence, has a probabilistic interpretation. Finally, we propose an efficient and overlooked baseline which combines the previous strategies and is competitive with more expensive approaches. |
Patrick Betz · Stefan Lüdtke · Christian Meilicke · Heiner Stuckenschmidt 🔗 |
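To make the aggregation problem from the abstract above concrete, the snippet below contrasts Max-aggregation with a noisy-or combination of rule confidences for a single candidate. This is a hedged sketch with made-up confidence values, not the authors' code; noisy-or is shown only as one common alternative aggregation.

```python
# Two common ways to aggregate the confidences of rules that all predict the same candidate.

def max_aggregation(confidences):
    """Score the candidate by the single most confident rule (Max-aggregation)."""
    return max(confidences)

def noisy_or_aggregation(confidences):
    """Treat rules as independent noisy predictors: 1 - prod(1 - c_i)."""
    score = 1.0
    for c in confidences:
        score *= (1.0 - c)
    return 1.0 - score

if __name__ == "__main__":
    rule_confidences = [0.9, 0.6, 0.3]  # hypothetical confidences of rules firing for one candidate
    print(max_aggregation(rule_confidences))       # 0.9
    print(noisy_or_aggregation(rule_confidences))  # 0.972
```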
-
|
Large Language Model Programs
(
Poster
)
In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples. The possibility of parameterising an LLM through such in-context examples widens their capability at a much lower cost than finetuning. We extend this line of reasoning and present a method which further expands the capabilities of an LLM by embedding it within an algorithm or program. To demonstrate the benefits of this approach, we present an illustrative example of evidence-supported question answering. We obtain a 6.4% improvement over the chain-of-thought baseline through a more algorithmic approach without any finetuning. Furthermore, we highlight recent work from this perspective and discuss the advantages and disadvantages in comparison to the standard approaches. |
Imanol Schlag · Sainbayar Sukhbaatar · Asli Celikyilmaz · Wen-tau Yih · Jason Weston · Jürgen Schmidhuber · Xian Li 🔗 |
-
|
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
(
Poster
)
Large language models (LLMs) have shown promise in proving formal theorems in proof assistants such as Lean. However, existing methods are difficult to reproduce or build upon, due to private code, data, interactive environments, and distributed systems with hundreds of GPUs. These have created substantial barriers to research on machine learning for theorem proving. In this paper, we break the barriers through open toolkits, benchmarks, and models. We introduce LeanDojo: A tool for extracting data from Lean and interacting with the proof environment programmatically. LeanDojo features fine-grained annotations of premises in proofs, providing valuable data for premise selection—a key bottleneck in theorem proving. Using LeanDojo, we construct a challenging benchmark for theorem proving. Furthermore, we develop ReProver (Retrieval-Augmented Prover): the first LLM-based prover augmented with retrieval for selecting premises from a vast math library. And we design novel recipes for training the premise retriever. Experimental results demonstrate the effectiveness of our method over non-retrieval baselines and GPT-4. ReProver is the first LLM-based prover trainable on a single GPU and without proprietary datasets. We will release our data, model, and code to facilitate future research. |
Kaiyu Yang · Aidan Swope · Alexander Gu · Rahul Chalamala · Shixing Yu · Saad Godil · Ryan Prenger · Animashree Anandkumar 🔗 |
-
|
Semantically Adversarial Scene Generation with Explicit Knowledge Guidance for Autonomous Driving
(
Poster
)
Generating adversarial scenes that potentially fail autonomous driving systems provides an effective way to improve their robustness. Extending purely data-driven generative models, recent specialized models satisfy additional controllable requirements such as embedding a traffic sign in a driving scene by manipulating patterns implicitly at the neuron level. In this paper, we introduce a method to incorporate domain knowledge explicitly in the generation process to achieve Semantically Adversarial Generation (SAG). To be consistent with the composition of driving scenes, we first categorize the knowledge into two types, the property of objects and the relationship among objects. We then propose a tree-structured variational auto-encoder (T-VAE) to learn hierarchical scene representation. By imposing semantic rules on the properties of nodes and edges into the tree structure, explicit knowledge integration enables controllable generation. To demonstrate the advantage of structural representation, we construct a synthetic example to illustrate the controllability and explainability of our method in a succinct setting. We further extend to realistic environments for autonomous vehicles, showing that our method efficiently identifies adversarial driving scenes against different state-of-the-art 3D point cloud segmentation models and satisfies the traffic rules specified as explicit knowledge. |
Wenhao Ding · Haohong Lin · Bo Li · Ding Zhao 🔗 |
-
|
Towards true discovery of the differential equations
(
Poster
)
Differential equation discovery, a machine learning subfield, is used to develop interpretable models, particularly in nature-related applications. By expertly incorporating the general parametric form of the equation of motion and appropriate differential terms, algorithms can autonomously uncover equations from data. This paper explores the prerequisites and tools for independent equation discovery without expert input, eliminating the need for equation form assumptions. We focus on addressing the challenge of assessing the adequacy of discovered equations when the correct equation is unknown, with the aim of providing insights for reliable equation discovery without prior knowledge of the equation form. |
Alexander Hvatov · Roman Titov 🔗 |
-
|
VAEL: Bridging Variational Autoencoders and Probabilistic Logic Programming
(
Poster
)
We present VAEL, a neuro-symbolic generative model integrating variational autoencoders (VAE) with the reasoning capabilities of probabilistic logic (L) programming. Besides standard latent subsymbolic variables, our model exploits a probabilistic logic program to define a further structured representation, which is used for logical reasoning. The entire process is end-to-end differentiable. Once trained, VAEL can solve new unseen generation tasks by (i) leveraging the previously acquired knowledge encoded in the neural component and (ii) exploiting new logical programs on the structured latent space. Our experiments provide support on the benefits of this neuro-symbolic integration both in terms of task generalization and data efficiency. To the best of our knowledge, this work is the first to propose a general-purpose end-to-end framework integrating probabilistic logic programming into a deep generative model. |
Eleonora Misino · Giuseppe Marra · Emanuele Sansone 🔗 |
-
|
Explanatory Learning: Towards Artificial Scientific Discovery
(
Poster
)
Explanations are the fuel of progress, the fundamental tool through which humans have increased their agency, earning more and more control over their future throughout history. So far, the production of explanations has been a unique prerogative of humans, who have greatly improved the process over the last centuries with the emergence of the scientific method. In this work, we try to formalize this epistemological breakthrough to make it digestible by a machine, with the ultimate goal of building an artificial scientist and breaking the monopoly of humans in producing new symbolic explanations. Our Explanatory Learning (EL) construction builds on the Machine Learning field. Unlike traditional AI methods based on human-coded interpreters, such as program synthesis, EL builds upon the notion that a true artificial scientist can only emerge when a machine is capable of autonomously interpreting symbols. Consequently, EL necessitates a learned interpreter, trained on a limited set of raw strings hiding explanations, paired with observations of the corresponding phenomena, akin to a science book written in hieroglyphics. To exemplify the challenges of EL, we present Odeen, a basic environment that simulates a small universe full of phenomena to explain. Finally, we introduce Critical Rationalist Networks (CRNs), a deep learning approach to EL aligned with the Popperian view of knowledge acquisition. Using Odeen as a testbed, we show how CRNs outperform standard empiricist end-to-end approaches of similar size and architecture (Transformers) in discovering explanations for unseen phenomena. |
Antonio Norelli · Giorgio Mariani · Luca Moschella · Andrea Santilli · Giambattista Parascandolo · Simone Melzi · Emanuele Rodola 🔗 |
-
|
A*Net: A Scalable Path-based Reasoning Approach for Knowledge Graphs
(
Poster
)
Reasoning on large-scale knowledge graphs has long been dominated by embedding methods. While path-based methods possess the inductive capacity that embeddings lack, their scalability is limited by the exponential number of paths. Here we present A*Net, a scalable path-based method for knowledge graph reasoning. Inspired by the A* algorithm for shortest path problems, our A*Net learns a priority function to select important nodes and edges at each iteration, to reduce the time and memory footprint for both training and inference. The ratio of selected nodes and edges can be specified to trade off between performance and efficiency. Experiments on both transductive and inductive knowledge graph reasoning benchmarks show that A*Net achieves competitive performance with existing state-of-the-art path-based methods, while merely visiting 10% of nodes and 10% of edges at each iteration. On the million-scale dataset ogbl-wikikg2, A*Net not only achieves a new state-of-the-art result, but also converges faster than embedding methods. A*Net is the first path-based method for knowledge graph reasoning at such scale. |
Zhaocheng Zhu · Xinyu Yuan · Mikhail Galkin · Louis-Pascal Xhonneux · Ming Zhang · Maxime Gazeau · Jian Tang 🔗 |
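The selection idea in the A*Net abstract above can be illustrated with a small sketch: at each iteration only the highest-priority nodes propagate messages. This is a hedged toy version; the graph, scores, and the priority function (here just the node score itself, standing in for a learned neural priority network) are invented for illustration.

```python
# One message-passing iteration that only relaxes edges leaving the top-k priority nodes.

def propagate_top_k(node_scores, edges, keep_ratio=0.5):
    """Pick the top fraction of nodes by priority and relax only their outgoing edges."""
    k = max(1, int(len(node_scores) * keep_ratio))
    frontier = sorted(node_scores, key=node_scores.get, reverse=True)[:k]
    updated = dict(node_scores)
    for u in frontier:
        for (src, dst, weight) in edges:
            if src == u:
                # combine messages with max, as in a shortest-path-style relaxation
                updated[dst] = max(updated[dst], node_scores[u] * weight)
    return updated

# toy graph: scores relative to a query node "q", edges as (src, dst, relation weight)
scores = {"q": 1.0, "a": 0.0, "b": 0.0, "c": 0.0}
edges = [("q", "a", 0.8), ("q", "b", 0.5), ("a", "c", 0.9)]
for _ in range(2):
    scores = propagate_top_k(scores, edges, keep_ratio=0.5)
print(scores)  # only a subset of nodes is expanded per iteration
```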
-
|
Neuro-Symbolic Continual Learning: Knowledge, Reasoning Shortcuts and Concept Rehearsal
(
Poster
)
We introduce Neuro-Symbolic Continual Learning, where a model has to solve a sequence of neuro-symbolic tasks, that is, it has to map sub-symbolic inputs to high-level concepts and compute predictions by reasoning consistently with prior knowledge. Our key observation is that neuro-symbolic tasks, although different, often share concepts whose semantics remains stable over time. Traditional approaches fall short: existing continual strategies ignore knowledge altogether, while stock neuro-symbolic architectures suffer from catastrophic forgetting. We show that leveraging prior knowledge by combining neuro-symbolic architectures with continual strategies does help avoid catastrophic forgetting, but also that doing so can yield models affected by reasoning shortcuts. These undermine the semantics of the acquired concepts, even when detailed prior knowledge is provided upfront and inference is exact, and, in turn, undermine continual performance. To overcome these issues, we introduce COOL, a COncept-level cOntinual Learning strategy tailored for neuro-symbolic continual problems that acquires high-quality concepts and remembers them over time. Our experiments on three novel benchmarks highlight how COOL attains sustained high performance on neuro-symbolic continual learning tasks in which other strategies fail. |
Emanuele Marconato · Gianpaolo Bontempo · ELISA FICARRA · Simone Calderara · Andrea Passerini · Stefano Teso 🔗 |
-
|
Modeling Human Few-Shot Learning using Bayesian Inference over Natural Language
(
Poster
)
We give a computational model of how humans learn abstract symbolic concepts from few examples. Our model performs Bayesian inference over utterances in natural language. For efficient inference, it uses a large language model as a proposal distribution, and can be fit to human data in order to tune its prior to match human patterns of generalization. We evaluate our model on a generative concept learning setup, as well as a logical concept learning domain. |
Kevin Ellis 🔗 |
-
|
DiversiGATE: A Comprehensive Framework for Reliable Large Language Models
(
Poster
)
In this paper, we introduce DiversiGATE, a unified framework that consolidates diverse methodologies for LLM verification. The proposed framework comprises two main components, Diversification and Aggregation, which provide a holistic perspective on existing verification approaches such as Self-Consistency, MathPrompter and WebGPT. Furthermore, we propose a novel SelfLearner model that conforms to the DiversiGATE framework and can learn from its own outputs and refine its performance over time, leading to improved accuracy. To evaluate the effectiveness of SelfLearner, we conducted a rigorous series of experiments, including tests on synthetic data as well as on popular arithmetic reasoning benchmarks such as GSM8K. Our results demonstrate that our approach outperforms traditional LLMs, achieving a considerable improvement from 54.8% to 61.8% on the GSM8K benchmark. |
Shima Imani · Ali Beyram · Harsh Shrivastava 🔗 |
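The Diversification and Aggregation pattern mentioned in the abstract above can be sketched in a few lines, in the spirit of Self-Consistency. This is a hedged illustration under our own assumptions: `sample_answer` is a stub standing in for a stochastic LLM call and is not part of the paper's code.

```python
# Diversify: sample several candidate answers; Aggregate: keep the majority answer.
from collections import Counter
import random

def sample_answer(question, rng):
    """Stub: pretend a stochastic LLM returns a numeric answer with occasional errors."""
    return rng.choice([42, 42, 42, 41, 43])

def diversify_and_aggregate(question, n_samples=9, seed=0):
    rng = random.Random(seed)
    candidates = [sample_answer(question, rng) for _ in range(n_samples)]  # diversification
    answer, votes = Counter(candidates).most_common(1)[0]                  # aggregation
    return answer, votes / n_samples

answer, agreement = diversify_and_aggregate("What is 6 * 7?")
print(answer, agreement)  # majority answer plus its agreement rate as a crude confidence
```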
-
|
OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning
(
Poster
)
A key aspect of human intelligence is the ability to imagine: composing learned concepts in novel ways to make sense of new scenarios. Such capacity is not yet attained by machine learning systems. In this work, in the context of visual reasoning, we show how modularity can be leveraged to derive a compositional data augmentation framework inspired by imagination. Our method, denoted Object-centric Compositional Neural Module Network (OC-NMN), decomposes visual generative reasoning tasks into a series of primitives applied to objects, without using a domain-specific language. We show that our modular architectural choices can be used to generate new training tasks that lead to better out-of-distribution generalization. We compare our model to existing and new baselines on a proposed visual reasoning benchmark that consists of applying arithmetic operations to MNIST digits. |
Rim Assouel · Pau Rodriguez · Perouz Taslakian · David Vazquez · Yoshua Bengio 🔗 |
-
|
Look, Remember and Reason: Visual Reasoning with Grounded Rationales
(
Poster
)
Large language models have recently shown human-level performance on a variety of reasoning tasks. However, the ability of these models to perform complex visual reasoning has not yet been studied in detail. A key challenge in many visual reasoning tasks is that the visual information needs to be tightly integrated into the reasoning process. We propose to address this challenge by drawing inspiration from human visual problem solving, which depends on a variety of low-level visual capabilities. It can often be cast as the three-step process of "Look, Remember, Reason": visual information is incrementally extracted using low-level visual routines in a step-by-step fashion until a final answer is reached. We follow the same paradigm to enable existing large language models, with minimal changes to the architecture, to solve visual reasoning problems. To this end, we introduce rationales over the visual input that allow us to integrate low-level visual capabilities, such as object recognition and tracking, as surrogate tasks. We show competitive performance on diverse visual reasoning tasks from the CLEVR, CATER, and ACRE datasets against state-of-the-art models designed specifically for these tasks. |
Apratim Bhattacharyya · Sunny Panchal · Reza Pourreza · Pulkit Madan · Mingu Lee · Roland Memisevic 🔗 |
-
|
Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents
(
Poster
)
We investigate the challenge of task planning for multi-task embodied agents in open-world environments. Two main difficulties are identified: 1) executing plans in an open-world environment (e.g., Minecraft) necessitates accurate and multi-step reasoning due to the long-term nature of the tasks, and 2) since vanilla planners do not consider how easily the current agent can achieve a given sub-task when ordering parallel sub-goals within a complicated plan, the resulting plan could be inefficient or even infeasible. To this end, we propose "Describe, Explain, Plan and Select" (DEPS), an interactive planning approach based on Large Language Models (LLMs). DEPS facilitates better error correction of the initial LLM-generated plan by integrating a description of the plan execution process and providing self-explanation of feedback when encountering failures during the extended planning phases. Furthermore, it includes a goal selector, a trainable module that ranks parallel candidate sub-goals based on the estimated steps to completion, consequently refining the initial plan. Our experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks and nearly double the overall performance. Further testing reveals our method's general effectiveness in popularly adopted non-open-ended domains as well (i.e., ALFWorld and tabletop manipulation). The ablation and exploratory studies detail how our design beats the counterparts and provide a promising update on the ObtainDiamond grand challenge with our approach. |
Zihao Wang · Shaofei Cai · Guanzhou Chen · Anji Liu · Xiaojian Ma · Yitao Liang 🔗 |
-
|
Recursive Algorithmic Reasoning
(
Poster
)
Learning models that execute algorithms can enable us to address a key problem in deep learning: generalizing to out-of-distribution data. However, neural networks are currently unable to execute recursive algorithms because they do not have arbitrarily large memory to store and recall state. To address this, we (1) propose a way to augment graph neural networks (GNNs) with a stack, and (2) develop an approach for capturing intermediate algorithm trajectories that improves algorithmic alignment with recursive algorithms over previous methods. The stack allows the network to learn to store and recall a portion of the state of the network at a particular time, analogous to the action of a call stack in a recursive algorithm. This augmentation permits the network to reason recursively. We empirically demonstrate that our proposals significantly improve generalization to larger input graphs over prior work on depth-first search (DFS). |
Dulhan Jayalath · Jonas Jürß · Petar Veličković 🔗 |
-
|
EXPLAIN, AGREE and LEARN: A Recipe for Scalable Neural-Symbolic Learning
(
Poster
)
Recent progress in the field of neural-symbolic AI (NeSy) has demonstrated that neural networks can benefit greatly from an integration with symbolic reasoning methods in terms of interpretability, data-efficiency and generalisation performance. Unfortunately, the symbolic component can lead to intractable computations for more complicated domains. This computational bottleneck has prevented the successful application of NeSy to more practical problems. We present EXPLAIN, AGREE and LEARN, an alternative paradigm that addresses the scalability problem of logic-based NeSy learning. EXPLAIN leverages sampling to obtain a representative set of possible explanations for the logic component driven by a newly introduced diversity criterion. Then AGREE assigns importance to the sampled explanations based on the neural predictions. This defines the learning objective, which for sufficiently many samples is guaranteed to coincide with the objective used by exact NeSy approaches, such as DeepProbLog. Using this objective, LEARN updates the neural component with direct supervision on its outputs, without the need to propagate the gradient through the logic component. Our approximate paradigm and its theoretical guarantees are experimentally supported and shown to compete with existing exact NeSy frameworks, while outperforming them in terms of scalability. |
Victor Verreet · Lennert De Smet · Emanuele Sansone 🔗 |
-
|
Semantic Conditioning at Inference: Improving Neural-based Systems with Logical Background Knowledge
(
Poster
)
Neuro-symbolic AI is a growing field of research aiming to combine the learning capabilities of neural networks with the reasoning abilities of symbolic systems. This hybridization can take many shapes. In this paper, we propose an approach that leverages logical background knowledge to improve a neural-based system on a task of structured multi-label classification. In the literature, two main neuro-symbolic approaches have been proposed for this integration: semantic conditioning and semantic regularization. We introduce a third neuro-symbolic technique called semantic conditioning at inference (SCI), a modification of semantic conditioning which only constrains the system during inference. We also develop a methodology to quantitatively estimate the overall improvements brought by SCI and apply it to several vision datasets. The results indicate that SCI can be used to improve the parameter and data efficiency of neural-based systems while increasing their asymptotic accuracy. |
Arthur Ledaguenel · Céline Hudelot · Mostepha Khouadjia 🔗 |
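A tiny sketch of what constraining predictions only at inference can look like: enumerate joint label assignments, discard those violating a logical rule, and renormalize. This is a hedged toy example under our own assumptions (a two-label "dog implies animal" rule and made-up sigmoid outputs), not the paper's method or code.

```python
# Condition a factorized label distribution on background knowledge at inference time.
import itertools

p = {"dog": 0.7, "animal": 0.4}  # independent sigmoid outputs of a neural classifier (toy values)

def joint_prob(assignment):
    """Probability of a joint assignment under the independent per-label distribution."""
    prob = 1.0
    for label, value in assignment.items():
        prob *= p[label] if value else (1.0 - p[label])
    return prob

def consistent(assignment):
    """Background knowledge: dog -> animal."""
    return (not assignment["dog"]) or assignment["animal"]

assignments = [dict(zip(p, values)) for values in itertools.product([0, 1], repeat=len(p))]
mass = {tuple(a.values()): joint_prob(a) for a in assignments if consistent(a)}
z = sum(mass.values())
for values, prob in mass.items():
    print(dict(zip(p, values)), round(prob / z, 3))  # renormalized, rule-consistent predictions
```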
-
|
Continuous-Discrete Message Passing for Graph Logic Reasoning
(
Poster
)
The message-passing principle is used in the most popular neural networks for graph-structured data. However, current message-passing approaches use black-box neural models that transform features over a continuous domain, thus limiting the reasoning capability of GNNs. Traditional neural networks fail to model reasoning over discrete variables. In this work, we explore a novel type of message passing based on a differentiable satisfiability solver. Our model learns logical rules that encode which messages are passed from one node to another and how. The rules are learned in a relaxed continuous space, which renders the training process end-to-end differentiable and thus enables standard gradient-based training. Our experiments show that MaxSAT-GNN learns arithmetic operations and is on par with state-of-the-art GNNs when tested on graph-structured data. |
Cristóbal Corvalán Morbiducci · Francesco Alesiani · Markus Zopf 🔗 |
-
|
Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples
(
Poster
)
Given the intractably large size of the space of proofs, any model that is capable of general deductive reasoning must generalize to proofs of greater complexity. Recent studies have shown that large language models (LLMs) possess some abstract deductive reasoning ability given chain-of-thought prompts. However, they have primarily been tested on proofs using modus ponens or of a specific size, and from the same distribution as the in-context examples. To measure the general deductive reasoning ability of LLMs, we test on a broad set of deduction rules and measure their ability to generalize to more complex proofs from simpler demonstrations from multiple angles: depth-, width-, and compositional generalization. To facilitate systematic exploration, we construct a new synthetic and programmable reasoning dataset that enables control over deduction rules and proof complexity. Our experiments on four LLMs of various sizes and training objectives show that they are able to generalize to longer and compositional proofs. However, they require explicit demonstrations to produce hypothetical subproofs, specifically in proof by cases and proof by contradiction. |
Abulhair Saparov · Richard Yuanzhe Pang · Vishakh Padmakumar · Nitish Joshi · Seyed Mehran Kazemi · Najoung Kim · He He 🔗 |
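The controllable proof complexity described in the abstract above is easy to picture with a small generator that chains modus ponens steps to an arbitrary depth. This is a hedged sketch with invented predicate names, not the authors' dataset-generation code.

```python
# Generate a depth-controllable chain of modus ponens steps (toy deduction example).

def make_modus_ponens_chain(depth):
    """Build facts, rules, a step-by-step proof, and the final conclusion."""
    facts = ["P0(a)"]
    rules = [f"forall x: P{i}(x) -> P{i+1}(x)" for i in range(depth)]
    proof = [f"from P{i}(a) and the rule P{i}(x) -> P{i+1}(x), conclude P{i+1}(a)"
             for i in range(depth)]
    conclusion = f"P{depth}(a)"
    return facts, rules, proof, conclusion

facts, rules, proof, conclusion = make_modus_ponens_chain(depth=3)
print(facts)
print(rules)
print("\n".join(proof))
print("goal:", conclusion)  # deeper chains probe depth generalization
```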
-
|
Evidence of Meaning in Language Models Trained on Programs
(
Poster
)
We present evidence that language models can learn meaning despite being trained only to perform next token prediction on text, specifically a corpus of programs. Each program is preceded by a specification in the form of (textual) input-output examples. Working with programs enables us to precisely define concepts relevant to meaning in language (e.g., correctness and semantics), making program synthesis well-suited as an intermediate testbed for characterizing the presence (or absence) of meaning in language models. We first train a Transformer model on the corpus of programs, then probe the trained model's hidden states as it completes a program given a specification. Despite providing no inductive bias toward learning the semantics of the language, we find that a linear probe is able to extract abstractions of both current and future program states from the model states. Moreover, there is a strong, statistically significant correlation between the accuracy of the probe and the model's ability to generate a program that implements the specification. To evaluate whether the semantics are represented in the model states rather than learned by the probe, we design a novel experimental procedure that intervenes on the semantics of the language while preserving the lexicon and syntax. We also demonstrate that the model learns to generate correct programs that are, on average, shorter than those in the training set, which is evidence that language model outputs may differ from the training distribution in semantically meaningful ways. In summary, this paper does not propose any new techniques for training language models, but develops an experimental framework for and provides insights into the acquisition and representation of (formal) meaning in language models. |
Charles Jin · Martin Rinard 🔗 |
-
|
Neurosymbolic AI for Reasoning on Biomedical Knowledge Graphs
(
Poster
)
Biomedical datasets are often modeled as knowledge graphs (KGs) because they capture the multi-relational, heterogeneous, and dynamic natures of biomedical systems. KG completion (KGC), can, therefore, help researchers make predictions to inform tasks like drug repositioning. While previous approaches for KGC were either rule-based or embedding-based, hybrid approaches based on neurosymbolic artificial intelligence are becoming more popular. Many of these methods possess unique characteristics which make them even better suited toward biomedical challenges. Here, we survey such approaches with an emphasis on their utilities and prospective benefits for biomedicine. |
Lauren Nicole DeLong · Ramon Fernández Mir · Zonglin Ji · Fiona Niamh Coulter Smith · Jacques D. Fleuriot 🔗 |
-
|
Neural Priority Queues for GNNs
(
Poster
)
We present Neural Priority Queues, a differentiable analogue to algorithmic priority queues for GNNs. We propose and motivate a desiderata for memory modules, and show that Neural PQs exhibit the desiderata, and reason about their use with algorithmic reasoning. This is further demonstrated by empirical results on the CLRS-30 dataset. Furthermore, we empirically show the effectiveness of Neural PQs with long-range reasoning. |
Rishabh Jain · Petar Veličković · Pietro Lió 🔗 |
-
|
Exposing Attention Glitches with Flip-Flop Language Modeling
(
Poster
)
Why do large language models sometimes output factual inaccuracies and exhibit erroneous reasoning? The brittleness of these models, particularly when executing long chains of reasoning, currently seems to be an inevitable price to pay for their advanced capabilities of coherently synthesizing knowledge, pragmatics, and abstract thought. Towards making sense of this fundamentally unsolved problem, this work identifies and analyzes the phenomenon of attention glitches, in which the Transformer architecture's inductive biases intermittently fail to capture robust reasoning. To isolate the issue, we introduce flip-flop language modeling (FFLM), a parametric family of synthetic benchmarks designed to probe the extrapolative behavior of neural language models. This simple generative task requires a model to copy binary symbols over long-range dependencies, ignoring the tokens in between. We find that Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which we can eliminate using various regularization techniques. Our preliminary mechanistic analyses show why the remaining errors may be very difficult to diagnose and resolve. We hypothesize that attention glitches account for (some of) the closed-domain hallucinations in natural LLMs. |
Bingbin Liu · Jordan Ash · Surbhi Goel · Akshay Krishnamurthy · Cyril Zhang 🔗 |
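The flip-flop language modeling task described above is simple enough to sketch with a small data generator: sequences of (write/read/ignore, bit) pairs in which every "read" must echo the most recently written bit. The token names and parameters below are illustrative assumptions, not the paper's exact benchmark configuration.

```python
# Generate a toy flip-flop sequence: "w" writes a bit, "r" reads back the last write, "i" is a distractor.
import random

def generate_fflm_sequence(length, rng):
    tokens = []
    last_written = None
    for t in range(length):
        op = "w" if t == 0 else rng.choice(["w", "r", "i"])  # the first operation must be a write
        if op == "w":
            bit = rng.choice("01")
            last_written = bit
        elif op == "r":
            bit = last_written            # a correct model must recall the most recent write
        else:  # "i": ignore
            bit = rng.choice("01")        # distractor bit the model should skip over
        tokens += [op, bit]
    return " ".join(tokens)

rng = random.Random(0)
print(generate_fflm_sequence(8, rng))
```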
-
|
Does End-to-End Visual Pretraining Help Reasoning?
(
Poster
)
We aim to investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks, with the help of visual pretraining. A positive result would refute the common belief that explicit visual abstraction (e.g. object detection) is essential for compositional generalization on visual reasoning, and confirm the feasibility of a neural network |
Chen Sun · Calvin Luo · Xingyi Zhou · Anurag Arnab · Cordelia Schmid 🔗 |
-
|
On the Planning Abilities of Large Language Models - A Critical Investigation
(
Poster
)
Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities. We aim to evaluate (1) the effectiveness of LLMs in generating plans autonomously in commonsense planning tasks and (2) the potential of LLMs as a source of heuristic guidance for other agents (AI planners) in their planning tasks. We conduct a systematic study by generating a suite of instances on domains similar to the ones employed in the International Planning Competition and evaluate LLMs in two distinct modes: autonomous and heuristic. Our findings reveal that LLMs’ ability to generate executable plans autonomously is rather limited, with the best model (GPT-4) having an average success rate of ~12% across the domains. However, the results in the heuristic mode show more promise. In the heuristic mode, we demonstrate that LLM-generated plans can improve the search process for underlying sound planners and additionally show that external verifiers can help provide feedback on the generated plans and back-prompt the LLM for better plan generation. |
Karthik Valmeekam · Matthew Marquez · Sarath Sreedharan · Subbarao Kambhampati 🔗 |
-
|
Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning
(
Poster
)
There is a growing interest in applying pre-trained large language models (LLMs) to planning problems. However, methods that use LLMs directly as planners are currently impractical due to several factors, including limited correctness of plans, strong reliance on feedback from interactions with simulators or even the actual environment, and the inefficiency in utilizing human feedback. In this work, we introduce a novel alternative paradigm that constructs an explicit world (domain) model in planning domain definition language (PDDL) and then uses it to plan with sound domain-independent planners. To address the fact that LLMs may not generate a fully functional PDDL model initially, we employ LLMs as an interface between PDDL and sources of corrective feedback, such as PDDL validators and human domain experts. Our framework not only enjoys the correctness guarantee offered by the external planners but also reduces human involvement by allowing users to correct domain models at the beginning, rather than inspecting and correcting (through interactive prompting) every generated plan as in previous work. On two IPC domains and a Household domain that is more complicated than commonly used benchmarks such as ALFWorld, we demonstrate that GPT-4 can be leveraged to produce high-quality PDDL models for over 40 actions, and the corrected PDDL models are then used to successfully solve 48 challenging planning tasks. |
Lin Guan · Karthik Valmeekam · Sarath Sreedharan · Subbarao Kambhampati 🔗 |
-
|
On The Ability of Transformers To Learn Recursive Patterns
(
Poster
)
Neural networks have in recent years shown promise for helping software engineers write programs and even formally verify them. While semantic information plays a crucial part in these processes, it remains unclear to what degree popular neural architectures like transformers are capable of modeling that information. This paper examines the behavior of neural networks learning algorithms relevant to programs and formal verification proofs through the lens of mechanistic interpretability, focusing in particular on structural recursion. Structural recursion is at the heart of tasks on which symbolic tools currently outperform neural models, like inferring semantic relations between datatypes and emulating program behavior. We evaluate the ability of transformer models to learn to emulate the behavior of structurally recursive functions from input-output examples. Our evaluation includes empirical and conceptual analyses of the limitations and capabilities of transformer models in approximating these functions, as well as reconstructions of the "shortcut" algorithms the model learns. By reconstructing these algorithms, we are able to correctly predict 91% of failure cases for one of the approximated functions. Our work provides a new foundation for understanding the behavior of neural networks that fail to solve the very tasks they are trained for. |
Dylan Zhang · Curt Tigges · Talia Ringer · Stella Biderman · Maxim Raginsky 🔗 |
-
|
Reasoning Ability Emerges in Large Language Models as Aggregation of Reasoning Paths
(
Poster
)
This study focuses on the emergence of reasoning abilities in large language models (LLMs). While LLMs have shown remarkable capabilities in complex reasoning tasks, the exact origin of this ability and its relationship to pre-training and fine-tuning stages remain unclear. Previous research has explored in-context learning but has not fully addressed reasoning abilities such as logical reasoning or math deduction. The paper proposes investigating reasoning in LLMs through reasoning over knowledge graphs. The experiments demonstrate the importance of the pre-training sequence in enabling effective reasoning. The findings suggest that LLMs acquire reasoning abilities during pre-training rather than fine-tuning. Furthermore, training LLMs with next-token prediction enables them to aggregate relevant reasoning paths and derive new conclusions. The empirical results support the explanation of LLMs predicting unseen facts using a path ranking algorithm. |
Xinyi Wang · William Wang 🔗 |
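A crude way to picture the "aggregation of reasoning paths" framing from the abstract above is a path-counting score over a toy knowledge graph: candidate tails are scored by how many short relation paths connect them to the query head. This is only a hedged stand-in for illustration, with an invented graph; it is not the paper's path ranking algorithm.

```python
# Score candidate entities by counting length-bounded paths from a query head.
from collections import defaultdict

edges = [("alice", "mother_of", "bob"), ("bob", "father_of", "carol"),
         ("alice", "spouse_of", "dan"), ("dan", "parent_of", "carol")]
adj = defaultdict(list)
for head, relation, tail in edges:
    adj[head].append(tail)

def path_counts(head, max_len=2):
    """Count how many paths of length <= max_len reach each node from `head`."""
    scores = defaultdict(int)
    frontier = [head]
    for _ in range(max_len):
        next_frontier = []
        for node in frontier:
            for neighbor in adj[node]:
                scores[neighbor] += 1
                next_frontier.append(neighbor)
        frontier = next_frontier
    return dict(scores)

print(path_counts("alice"))  # "carol" is reachable via two distinct 2-hop paths
```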
-
|
Exploring the Impact of Disentangling Extraction and Reasoning in Multi-hop Spatial Reasoning
(
Poster
)
Spatial reasoning over text is challenging, as models not only need to extract direct spatial information from the text but must also reason over it and infer implicit spatial relations. Recent studies highlight the struggles even large-scale language models encounter when it comes to performing spatial reasoning over text. In this paper, we explore the potential benefits of disentangling the processes of information extraction and reasoning to address this challenge. To explore this, we design various models that disentangle extraction and reasoning (either symbolic or neural) and compare them with pretrained language model baselines, which have state-of-the-art results. Our experimental results consistently demonstrate the efficacy of disentangling, showcasing its ability to enhance models' generalizability within realistic data domains. |
Roshanak Mirzaee · Parisa Kordjamshidi 🔗 |
-
|
Plan, Eliminate, and Track --- Language Models are Good Teachers for Embodied Agents.
(
Poster
)
Pre-trained large language models (LLMs) capture procedural knowledge about the world. Recent work has leveraged LLMs' ability to generate abstract plans to simplify challenging control tasks, either by action scoring or by action modeling (fine-tuning). However, the transformer architecture inherits several constraints that make it difficult for an LLM to directly serve as the agent: e.g., limited input lengths, fine-tuning inefficiency, bias from pre-training, and incompatibility with non-text environments. To maintain compatibility with a low-level trainable actor, we propose to instead use the knowledge in LLMs to simplify the control problem, rather than to solve it. We propose the Plan, Eliminate, and Track (PET) framework. The Plan module translates a task description into a list of high-level sub-tasks. The Eliminate module masks out irrelevant objects and receptacles from the observation for the current sub-task. Finally, the Track module determines whether the agent has accomplished each sub-task. On the ALFWorld instruction-following benchmark, the PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications. |
Yue Wu · So Yeon Min · Yonatan Bisk · Ruslan Salakhutdinov · Amos Azaria · Yuanzhi Li · Tom Mitchell · Shrimai Prabhumoye 🔗 |
-
|
SPRING: Studying Papers and Reasoning to play Games
(
Poster
)
Open-world survival games pose significant challenges for AI algorithms due to their multi-tasking, deep exploration, and goal prioritization requirements. Despite reinforcement learning (RL) being popular for solving games, its high sample complexity limits its effectiveness in complex open-world games like Crafter or Minecraft. We propose a novel approach, SPRING, that reads the game's original academic paper and uses the knowledge learned to reason about and play the game through a large language model (LLM). Prompted with the LaTeX source as game context and a description of the agent's current observation, our SPRING framework employs a directed acyclic graph (DAG) with game-related questions as nodes and dependencies as edges. We identify the optimal action to take in the environment by traversing the DAG and computing LLM responses for each node in topological order, with the LLM's answer to the final node directly translating to environment actions. In our experiments, we study the quality of in-context "reasoning" induced by different forms of prompts in the Crafter open-world environment. Our experiments suggest that LLMs, when prompted with a consistent chain of thought, have great potential for completing sophisticated high-level trajectories. Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training. |
Yue Wu · Shrimai Prabhumoye · So Yeon Min · Yonatan Bisk · Ruslan Salakhutdinov · Amos Azaria · Tom Mitchell · Yuanzhi Li 🔗 |
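The DAG traversal described in the SPRING abstract above reduces to querying an LLM once per question node, in topological order, with the answers to its prerequisite questions as context. The sketch below is a hedged illustration: `ask_llm` is a stub and the question nodes are invented, not the paper's actual prompt graph.

```python
# Traverse a question DAG in topological order, feeding earlier answers as context.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

dag = {  # node -> set of prerequisite nodes (illustrative questions)
    "What do I observe?": set(),
    "What is my top sub-goal?": {"What do I observe?"},
    "What action should I take?": {"What do I observe?", "What is my top sub-goal?"},
}

def ask_llm(question, context):
    """Stub standing in for an LLM call conditioned on the game manual plus earlier answers."""
    return f"<answer to '{question}' given {len(context)} earlier answers>"

answers = {}
for question in TopologicalSorter(dag).static_order():
    context = {dep: answers[dep] for dep in dag[question]}
    answers[question] = ask_llm(question, context)

action = answers["What action should I take?"]  # the final node's answer maps to an environment action
print(action)
```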
-
|
Not All Neuro-Symbolic Concepts Are Created Equal: Analysis and Mitigation of Reasoning Shortcuts
(
Poster
)
Neuro-Symbolic (NeSy) predictive models hold the promise of improved compliance with given constraints, systematic generalization, and interpretability, as they allow one to infer labels that are consistent with some prior knowledge by reasoning over high-level concepts extracted from sub-symbolic inputs. It was recently shown that NeSy predictors are affected by reasoning shortcuts: they can attain high accuracy by leveraging concepts with unintended semantics, thus falling short of their promised advantages. Yet, a systematic characterization of reasoning shortcuts and of potential mitigation strategies is missing. This work fills this gap by characterizing them as unintended optima of the learning objective and identifying four key conditions behind their occurrence. Based on this, we derive several natural mitigation strategies, and analyze their efficacy both theoretically and empirically. Our analysis shows reasoning shortcuts are difficult to deal with, casting doubt on the trustworthiness and interpretability of existing NeSy solutions. |
Emanuele Marconato · Stefano Teso · Antonio Vergari · Andrea Passerini 🔗 |
-
|
Parallel Algorithms Align with Neural Execution
(
Poster
)
Neural algorithmic reasoners are parallel processors. Teaching them sequential algorithms contradicts this nature, rendering a significant share of their computations redundant. Parallel algorithms however may exploit their full computational power, therefore requiring fewer layers to be executed. This drastically reduces training times, as we observe when comparing parallel implementations of searching, sorting and finding strongly connected components to their sequential counterparts on the CLRS framework. Additionally, parallel versions achieve strongly superior predictive performance in most cases. |
Valerie Engelmayer · Dobrik Georgiev · Petar Veličković 🔗 |
-
|
Learning and Leveraging Verifiers to Improve Planning Capabilities of Pre-trained Language Models
(
Poster
)
There have been widespread claims in the literature about the emergent reasoning capabilities of pre-trained large language models. However, recent studies have found that their ability to plan remains questionable. Through experiments using GPT-2, we empirically demonstrate that the performance of a finetuned baseline remains poor because it violates the pre-conditions of actions in the plans it generates. To improve the planning capabilities of a finetuned LLM, we train a verifier that can classify actions as valid or invalid in a particular state. By randomly sampling actions from the same dataset, we generate examples of invalid actions, which are then used to train a verifier that can check for action applicability. In the presence of diverse sampling from a generator and a verifier that can prune invalid trajectories, we show significant gains in the success rate on the Blocksworld domain. Additionally, we show that finetuning the GPT-2 generator itself to create the verifier generalizes better than finetuning the base GPT-2. Lastly, we investigate the role of the sampling temperature, which can be used to control the exploration-exploitation tradeoff. |
Daman Arora · Subbarao Kambhampati 🔗 |
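The negative-sampling step described in the abstract above (actions observed in other states treated as invalid for the current state) can be sketched with a few lines of data wrangling. This is a hedged toy version with invented Blocksworld-style strings, not the authors' data pipeline, and it ignores the (rare) case where a sampled action happens to be valid.

```python
# Build (state, action, label) examples for an action-applicability verifier.
import random

dataset = [  # (state description, action actually taken in that state)
    ("A on B, C on table", "unstack A from B"),
    ("A on table, B on table", "stack A on B"),
    ("C on A, B on table", "unstack C from A"),
]

def build_verifier_examples(dataset, rng, negatives_per_state=1):
    examples = []
    for i, (state, action) in enumerate(dataset):
        examples.append((state, action, 1))  # observed action: labeled valid
        for _ in range(negatives_per_state):
            j = rng.choice([k for k in range(len(dataset)) if k != i])
            examples.append((state, dataset[j][1], 0))  # action sampled from another state: labeled invalid
    return examples

rng = random.Random(0)
for example in build_verifier_examples(dataset, rng):
    print(example)
```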
-
|
Latent Space Representations of Neural Algorithmic Reasoners
(
Poster
)
Neural Algorithmic Reasoning (NAR) is a research area focused on designing neural architectures that can reliably capture classical computation, largely by learning to execute algorithms. A typical approach is to rely on Graph Neural Network (GNN) architectures, which encode inputs in high-dimensional latent spaces that are repeatedly transformed over the execution of the algorithm. In this work we perform a detailed analysis in order to understand the structure of the latent space induced by the GNN when executing algorithms. We identify two possible failure modes: (i) loss of resolution, making it hard to distinguish similar values; (ii) inability to deal with values outside the range observed during training. We propose to solve the first issue by relying on a softmax aggregator, and to decay the latent space in order to deal with out-of-range values. We show that these changes lead to improvements on the majority of algorithms from CLRS-30 when using the state-of-the-art Triplet-GMPNN processor. |
Vladimir V. Mirjanić · Razvan Pascanu · Petar Veličković 🔗 |
-
|
Addressing Discrepancies in Semantic and Visual Alignment in Neural Networks
(
Poster
)
For the task of image classification, neural networks primarily rely on visual patterns. In robust networks, we would expect visually similar classes to be represented similarly. We consider the problem of when semantically similar classes are visually dissimilar, and when visual similarity is present among non-similar classes. We propose a data augmentation technique with the goal of better aligning semantically similar classes with arbitrary (non-visual) semantic relationships. We leverage recent work in diffusion-based semantic mixing to generate semantic hybrids of two classes, and these hybrids are added to the training set as augmented data. We evaluate whether the method increases semantic alignment by evaluating model performance on adversarially perturbed data, with the idea that it should be easier for an adversary to switch one class to a similarly represented class. Results demonstrate that there is an increase in alignment of semantically similar classes when using our proposed data augmentation method. |
Natalie Abreu · Nathan Vaska · Victoria Helus 🔗 |
-
|
Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data
(
Poster
)
The impressive advances and applications of large language and joint language-and-visual understanding models have led to an increased need for methods of probing their potential reasoning capabilities. However, the difficulty of gathering naturally-occurring data for complex multi-modal reasoning tasks bottlenecks the evaluation of AI methods on tasks which are not already covered by an academic dataset. In this work, we leverage recent advances in high resolution text-to-image generation to develop a framework for generating evaluation data for multi-modal reasoning tasks. We apply this framework to generate context-dependent anomaly data, creating a synthetic dataset on a challenging task which is not well covered by existing datasets. We benchmark the performance of a state-of-the-art visual question answering (VQA) model against data generated with this method, and demonstrate that while the task is tractable, the model performs significantly worse on the context-dependent anomaly detection task than on standard VQA tasks. |
Nathan Vaska · Victoria Helus 🔗 |
-
|
Towards More Likely Models for AI Planning
(
Poster
)
This is the first work to look at the application of large language models (LLMs) for the purpose of model space edits in automated planning tasks. To set the stage for this sangam, we start by enumerating the different flavors of model space problems that have been studied so far in the AI planning literature and explore the effect of an LLM on those tasks with detailed illustrative examples. We also empirically demonstrate how the performance of an LLM contrasts with combinatorial search (CS) -- an approach that has been traditionally used to solve model space tasks in planning -- both with the LLM in the role of a standalone model space reasoner and in the role of a statistical modeling tool in concert with the CS approach as part of a two-stage process. Our experiments show promising results, suggesting further forays of LLMs into the exciting world of model space reasoning for planning tasks in the future. |
Turgay Caglar · sirine belhaj · Tathagata Chakraborti · Michael Katz · Sarath Sreedharan 🔗 |
-
|
A Pseudo-Semantic Loss for Deep Generative Models with Logical Constraints
(
Poster
)
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning. This often requires maximizing the likelihood of a symbolic constraint w.r.t. the neural network's output distribution. Such output distributions are typically assumed to be fully-factorized. This limits the applicability of neuro-symbolic learning to the more expressive auto-regressive distributions, e.g., transformers. Under such distributions, computing the likelihood of even simple constraints is #P-hard. Instead of attempting to enforce the constraint on the entire output distribution, we propose to do so on a random, local approximation thereof. More precisely, we optimize the likelihood of the constraint under a pseudolikelihood-based approximation centered around a model sample. Our approximation is factorized, allowing the reuse of solutions to sub-problems—a main tenet for efficiently computing neuro-symbolic losses. Moreover, it is a local, high-fidelity approximation of the likelihood, exhibiting low entropy and KL-divergence around the model sample. We evaluate our approach on Sudoku and shortest-path prediction cast as auto-regressive generation, and observe that we greatly improve upon the base model's ability to predict logically-consistent outputs. We also evaluate on the task of detoxifying large language models. Using a simple constraint disallowing a list of toxic words, we are able to steer the model's outputs away from toxic generations, achieving SoTA detoxification compared to previous approaches. |
Kareem Ahmed · Kai-Wei Chang · Guy Van den Broeck 🔗 |
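A toy sketch of the factorized-constraint idea above for one especially simple constraint ("no toxic token appears"): under a pseudolikelihood-style approximation, each position contributes an independent conditional, so the constraint probability factorizes into per-position terms. The vocabulary, toxic-token list, and per-position logits are hypothetical; the paper's method handles arbitrary logical constraints and reuses sub-problem solutions, which is not shown here.

```python
import torch

# Toy vocabulary and a constraint: "no toxic token appears in the output".
vocab = ["good", "fine", "bad1", "bad2"]
toxic_ids = torch.tensor([2, 3])

def constraint_log_prob(cond_logits):
    """Probability of the constraint under a factorized (pseudolikelihood-style)
    approximation: product over positions of the mass on non-toxic tokens.
    cond_logits[t] approximates p(y_t | y_sample_{-t}, x)."""
    probs = torch.softmax(cond_logits, dim=-1)          # (T, |V|)
    p_ok = 1.0 - probs[:, toxic_ids].sum(dim=-1)        # (T,)
    return torch.log(p_ok.clamp_min(1e-12)).sum()

# Hypothetical per-position conditionals around a sampled sequence of length 3.
cond_logits = torch.randn(3, len(vocab), requires_grad=True)
loss = -constraint_log_prob(cond_logits)   # pseudo-semantic-style loss for this constraint
loss.backward()
print(float(loss), cond_logits.grad.shape)
```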
-
|
Asynchronous Algorithmic Alignment with Cocycles
(
Poster
)
State-of-the-art neural algorithmic reasoners make use of message passing in graph neural networks (GNNs). But typical GNNs blur the distinction between the definition and invocation of the message function, forcing a node to send messages to its neighbours at every layer, synchronously. When applying GNNs to learn to execute dynamic programming algorithms, however, on most steps only a handful of the nodes would have meaningful updates to send. One hence runs the risk of inefficiency by sending too much irrelevant data across the graph, with many intermediate GNN steps having to learn identity functions. In this work, we explicitly separate the concepts of node state update and message function invocation. With this separation, we obtain a mathematical formulation that allows us to reason about asynchronous computation in both algorithms and neural networks. |
Andrew Dudzik · Tamara von Glehn · Razvan Pascanu · Petar Veličković 🔗 |
-
|
Learning with Explanation Constraints
(
Poster
)
In addition to the labeled data that we use to train models in supervised learning, we may have prior information about how models should behave. In this paper, we formalize this notion as learning from explanation constraints and provide a learning theoretic framework to analyze how such explanations can improve the learning of our models. For what models would explanations be helpful? Our first key contribution addresses this question via the definition of what we call EPAC models (models that satisfy these constraints in expectation over new data), and we analyze this class of models using standard learning theoretic tools. Our second key contribution is to characterize these restrictions (in terms of their Rademacher complexities) for a canonical class of explanations given by gradient information for linear models and two layer neural networks. Finally, we provide an algorithmic solution for our framework, via a variational approximation that achieves better performance and satisfies these constraints more frequently, when compared to simpler augmented Lagrangian methods to incorporate these explanations. We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments. |
Rattana Pukdee · Dylan Sam · Nina Balcan · Pradeep Ravikumar 🔗 |
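A minimal sketch of the simpler penalty-based variant mentioned in the abstract above, not the paper's variational approximation: for a linear model, the input gradient is just the weight vector, so an explanation constraint of the form "feature 4 is irrelevant" becomes a penalty on that weight. The data, penalty strength, and constrained feature index are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
X = torch.randn(256, 5)
y = X[:, 0] * 2.0 - X[:, 1] + 0.1 * torch.randn(256)   # feature 4 is irrelevant

w = torch.zeros(5, requires_grad=True)
irrelevant = 4          # explanation constraint: d f / d x_4 should be ~0
lam = 10.0              # penalty strength (a fixed, augmented-Lagrangian-style multiplier)

opt = torch.optim.SGD([w], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    pred = X @ w
    mse = ((pred - y) ** 2).mean()
    # For a linear model the input gradient equals the weight vector itself.
    penalty = w[irrelevant] ** 2
    (mse + lam * penalty).backward()
    opt.step()

print(w.detach())   # the weight on the constrained feature is driven toward 0
```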
-
|
BoardgameQA: Natural Language Reasoning with Contradictory Information
(
Poster
)
Automated reasoning with unstructured natural text is an important field of research with many potential applications, and is rapidly growing thanks to recent advancements in Language Models (LMs). Existing benchmarks for automated reasoning assume access to a consistent set of knowledge over which a model reasons, which does not capture common real-world scenarios where information is noisy and sometimes contradictory. In many applications, conflicts can often be resolved by imposing preferences over information sources (e.g., based on the credibility of the source). In this paper, we formulate the problem of reasoning with contradictory information guided by preferences over sources as the classical AI problem of defeasible reasoning, and develop a question-answering benchmark called BoardgameQA for measuring the defeasible reasoning capacity of LMs. BoardgameQA also incorporates reasoning with implicit background knowledge, to better reflect reasoning problems in downstream applications. Experiments with three types of models (finetuning with and without proof steps and few-shot prompting) show that LMs perform poorly when reasoning with conflicting information, especially in the few-shot case, and the amount of background knowledge required compounds this difficulty even further. |
Mehran Kazemi · Quan Yuan · Deepti Bhatia · Najoung Kim · Xin Xu · Vaiva Imbrasaite · Deepak Ramachandran 🔗 |
-
|
Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting
(
Poster
)
Language models can be prompted to reason through problems in a manner that significantly improves performance. However, \textit{why} reason-based prompting improves performance is unclear. Recent work showed that using logically \textit{invalid} Chain-of-Thought (CoT) prompting achieves almost the same performance gains as logically valid CoT prompting and that editing CoT prompts to replace problem-specific information with either abstract information or out-of-distribution information typically doesn't harm performance. Critics have responded that these findings are based on too few and too easy tasks to draw broad generalizations. To resolve this dispute, we test whether logically invalid CoT prompts offer the same level of performance gains as logically valid prompts on the hardest subset of tasks in the BIG-Bench benchmark, termed BIG-Bench Hard (BBH). We find that the logically \textit{invalid} reasoning prompts do indeed achieve similar performance gains on BBH tasks as logically valid reasoning prompts. We also discover that some CoT prompts used by previous works contain logical errors. This suggests that confounders beyond logically valid reasoning are responsible for performance improvements. |
Rylan Schaeffer · Kateryna Pistunova · Samar Khanna · Sarthak Consul · Sanmi Koyejo 🔗 |
-
|
(Un)interpretability of Transformers: a case study with Dyck grammars
(
Poster
)
Transformers are typically trained on large datasets using the next-token prediction or masked language modeling objectives. Do these \emph{data-driven} training approaches guide Transformers to approximately implement some known \emph{rule-based} algorithms? Prior works seek to understand the algorithm implemented by a learned Transformer by peering and probing individual aspects of the model, such as the weight matrices or the attention patterns. In this work, through a combination of theoretical results and carefully controlled experiments on synthetic data, we take a critical view of methods that exclusively focus on individual parts of the model, rather than consider the network as a whole. We consider a simple synthetic setup of learning a Dyck language. Theoretically, we show that the set of models that can solve this task (exactly or approximately) satisfy a structural characterization derived from ideas in formal languages (the pumping lemma). We use this characterization to show that the set of optima is qualitatively rich: in particular, the attention pattern of a single layer can be ``nearly randomized'', while preserving the functionality of the network. We also show via extensive experiments that these constructions are not merely a theoretical artifact: even with severe constraints to the architecture of the model, vastly different solutions can be reached via standard training. Thus, interpretability claims based on individual heads or weight matrices in the Transformer can be misleading. |
Kaiyue Wen · Yuchen Li · Bingbin Liu · Andrej Risteski 🔗 |
-
|
dPASP: A Comprehensive Differentiable Probabilistic Answer Set Programming Environment For Neurosymbolic Learning and Reasoning
(
Poster
)
We present dPASP, a novel declarative probabilistic logic programming framework for differentiable neuro-symbolic reasoning. The framework allows for the specification of discrete probabilistic models with neural predicates, logic constraints and interval-valued probabilistic choices, thus supporting models that combine low-level perception (images, texts, etc.), common-sense reasoning, and (vague) statistical knowledge. To support all such features, we discuss several semantics for probabilistic logic programs that can express nondeterministic, contradictory, incomplete and/or statistical knowledge. We also discuss how gradient-based learning can be performed with neural predicates and probabilistic choices under selected semantics. We then describe an implemented package that supports inference and learning in the language, along with several example programs. The package requires minimal user knowledge of deep learning systems' inner workings, while allowing end-to-end training of rather sophisticated models and loss functions. |
Renato Geh · Jonas Goncalves · Igor Silveira · Denis D Maua · Fabio Cozman 🔗 |
-
|
Building One-class Detector for Anything: Open-vocabulary Zero-shot OOD Detection Using Text-image Models
(
Poster
)
We focus on the challenge of out-of-distribution (OOD) detection in deep learning models, a crucial aspect of ensuring reliability. Despite considerable effort, the problem remains significantly challenging in deep learning models due to their propensity to output over-confident predictions for OOD inputs. We propose a novel one-class open-set OOD detector that leverages text-image pre-trained models in a zero-shot fashion and incorporates various descriptions of in-domain and OOD data. Our approach is designed to detect anything not in-domain and offers the flexibility to detect a wide variety of OOD inputs, defined via fine- or coarse-grained labels, or even in natural language. We evaluate our approach on challenging benchmarks including large-scale datasets containing fine-grained, semantically similar classes, distributionally shifted images, and multi-object images containing a mixture of in-domain and OOD objects. Our method shows superior performance over previous methods on all benchmarks. |
Yunhao Ge · Jie Ren · Jiaping Zhao · Kaifeng Chen · Andrew Gallagher · Laurent Itti · Balaji Lakshminarayanan 🔗 |
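A minimal sketch of zero-shot OOD scoring with text descriptions, assuming image and text embeddings have already been produced by a CLIP-style text-image model; the particular scoring rule here (max in-domain similarity minus max OOD similarity) and the random placeholder embeddings are illustrative choices, not necessarily the paper's exact detector.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def ood_score(image_emb, in_domain_text_embs, ood_text_embs):
    """Score > 0 suggests the image matches the in-domain descriptions better
    than the OOD descriptions; score <= 0 flags it as out-of-distribution."""
    img = normalize(image_emb)
    in_sims = normalize(in_domain_text_embs) @ img
    ood_sims = normalize(ood_text_embs) @ img
    return float(in_sims.max() - ood_sims.max())

# Hypothetical embeddings standing in for CLIP-style encoder outputs.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)
in_domain = rng.normal(size=(3, 512))   # e.g. "a photo of a dog", one per breed
ood = rng.normal(size=(5, 512))         # e.g. "a photo of something else"
print(ood_score(image_emb, in_domain, ood))
```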
-
|
Bayesian Neural Networks with Domain Knowledge
(
Poster
)
Prior knowledge about particular domains can help inform deep learning models to perform better and exhibit desirable behavior, combatting some of the issues with unfair or biased datasets. In this paper, we propose a general framework via variational inference to incorporate such prior information into Bayesian neural networks (BNNs). We learn an informative prior over neural network weights that assigns high probability mass to neural network weights that capture our domain knowledge, leading to a predictor (through posterior averaging) that also exhibits this behavior. We demonstrate that this approach improves upon standard BNNs and is comparable to frequentist approaches across many datasets with different types of prior information, including fairness, physics rules, and healthcare knowledge. |
Dylan Sam · Rattana Pukdee · Daniel Jeong · Yewon Byun · Zico Kolter 🔗 |
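A minimal sketch of the "learn an informative prior that places mass on rule-satisfying weights" idea from the abstract above: prior parameters for a tiny network are tuned so that sampled predictors obey a hypothetical monotonicity rule on synthetic probe inputs. The rule, network shape, and training loop are all illustrative assumptions; the paper's variational treatment of the BNN posterior is omitted.

```python
import torch

# Tiny 1-hidden-layer network whose weights are drawn from a learned Gaussian prior.
def forward(x, w1, b1, w2, b2):
    return torch.tanh(x @ w1 + b1) @ w2 + b2

shapes = {"w1": (1, 16), "b1": (16,), "w2": (16, 1), "b2": (1,)}
mu = {k: torch.zeros(s, requires_grad=True) for k, s in shapes.items()}
log_std = {k: torch.full(s, -1.0, requires_grad=True) for k, s in shapes.items()}

probes = torch.linspace(-2, 2, 64).unsqueeze(1)      # synthetic probe inputs
opt = torch.optim.Adam(list(mu.values()) + list(log_std.values()), lr=1e-2)

for _ in range(300):
    opt.zero_grad()
    # Reparameterized samples from the prior.
    w = {k: mu[k] + torch.exp(log_std[k]) * torch.randn(shapes[k]) for k in shapes}
    out = forward(probes, w["w1"], w["b1"], w["w2"], w["b2"])
    # Hypothetical domain rule: predictions should be non-decreasing in x.
    violation = torch.relu(out[:-1] - out[1:]).mean()
    violation.backward()
    opt.step()

print(float(violation))   # the prior now favors weights whose predictor obeys the rule
```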
-
|
A Model-Theoretic Approach to Natural Language Inference
(
Poster
)
Natural Language Inference (NLI) is a natural language processing task that seeks to identify whether one sentence entails another. The traditional approach to NLI has been to train a model on a large corpus of data. However, these models are often black boxes and offer no explanation for their decisions. In this work, we apply model theory, a framework adopted from formal logic and linguistics, to solving NLI tasks. To simulate the model-theoretic hypothesis of entailment, we use a language model to generate natural language contexts for a pair of sentences and then define a new classification method that evaluates these contexts and determines entailment relations. Because this approach applies a logical framework to language, it provides much more interpretability than traditional NLI methods. This work-in-progress paper applies this method to preexisting NLI datasets and demonstrates that our method shows promise in achieving high levels of accuracy without requiring model training or the availability of a large corpus of training data. |
Dennis Tang 🔗 |
-
|
Disaster Occurrence Detection through GNN Models using Disaster Knowledge Graphs
(
Poster
)
In the context of the increasing scale and complexity of disasters caused by rapid climate change, a comprehensive understanding of disaster big data is essential for effective detection and response. The disaster knowledge graph proposed in this paper fills this gap by capturing the connections between various disaster-related data sources and their potential for growth across heterogeneous datasets. We generate time-series disaster graphs every minute using SNS data (e.g., Twitter) with a specific focus on disasters. Then, we create disaster knowledge graphs to represent the relationships between various data sources and try to predict their potential developments. For disaster detection, we label and annotate knowledge graphs and then detect sudden changes in time-series disaster knowledge graphs. To that end, we assess the effectiveness of three state-of-the-art GNN models for graph-based event classification: Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), and SAGEConv. Our experiments show promising results in detecting disaster events using structural data and connectivity patterns within disaster graphs. As a result, our approach can combine the strength of GNNs with a curated disaster knowledge graph to allow for a thorough analysis of real-time social media data for better disaster management and response strategies. |
Seonhyeong Kim · Irshad Khan · Young-Woo Kwon 🔗 |
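A minimal sketch of graph-level event classification with a GCN, assuming PyTorch Geometric is available; the node features, edges, and two-class setup (event / no event) are hypothetical placeholders for a minute-level knowledge-graph snapshot, and the paper's GAT and SAGEConv variants are not shown.

```python
import torch
from torch_geometric.nn import GCNConv, global_mean_pool
from torch_geometric.data import Data

class DisasterGraphClassifier(torch.nn.Module):
    """Binary classifier over knowledge-graph snapshots (event / no event)."""
    def __init__(self, in_dim, hidden=32, num_classes=2):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, num_classes)

    def forward(self, x, edge_index, batch):
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        return self.head(global_mean_pool(h, batch))

# One hypothetical minute-level snapshot: 4 nodes (tweets/entities), 3 edges.
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
data = Data(x=x, edge_index=edge_index)

model = DisasterGraphClassifier(in_dim=8)
batch = torch.zeros(4, dtype=torch.long)       # all nodes belong to graph 0
logits = model(data.x, data.edge_index, batch)
print(logits.shape)   # (1, 2): one prediction per snapshot
```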
-
|
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
(
Poster
)
Generating a sequence of intermediate steps, \emph{a.k.a.}, a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetic and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness. Conceptually, CoT empowers the model with the ability to perform inherently serial computation, which is otherwise lacking in transformers, especially when depth is low. Formally, given input length $n$, we show that constant-depth transformers can solve $\mathsf{NC}^1$-complete problems, such as the word problem of $S_5$, provided with $O(n)$ steps of CoT and $\mathsf{poly}(n)$ embedding size. We further show constant-depth transformers can solve any problem in $\mathsf{P/poly}$ provided with $O(\mathsf{poly}(n))$ steps of CoT and $O(\log(n))$ embedding size. In contrast, it is shown (Liu et al., 2022) that constant-depth transformers without CoT can only solve problems in $\mathsf{TC}^0$. Under the unproven but widely believed assumption that $\mathsf{TC}^0\subsetneq \mathsf{NC}^1 \subsetneq \mathsf{P/poly}$, allowing a longer chain of thought fundamentally increases the expressiveness of transformers. Empirically, enabling CoT dramatically improves the accuracy for tasks that are hard for parallel computation, including the composition of permutation groups, iterated squaring, and circuit value problems, especially for low-depth transformers.
|
Zhiyuan Li · Hong Liu · Denny Zhou · Tengyu Ma 🔗 |
-
|
Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting
(
Poster
)
Images contain rich relational knowledge that can help machines understand the world. Existing methods for visual knowledge extraction often rely on a pre-defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge. In this work, we present a first exploration of a new paradigm of open visual knowledge extraction. To achieve this, we present OpenVik, which consists of an open relational region detector to detect regions potentially containing relational knowledge and a visual knowledge generator that generates format-free knowledge by prompting a large multimodality model with the detected region of interest. We also explore two data enhancement techniques for diversifying the generated format-free visual knowledge. Extensive knowledge quality evaluations highlight the correctness and uniqueness of the extracted open visual knowledge by OpenVik. Moreover, integrating our extracted knowledge across various visual reasoning applications shows consistent improvements, indicating the real-world applicability of OpenVik. |
Hejie Cui · Xinyu Fang · Zihan Zhang · Ran Xu · Xuan Kan · Xin Liu · Manling Li · Yangqiu Song · Carl Yang 🔗 |
-
|
Towards A Unified Neural Architecture for Visual Recognition and Reasoning
(
Poster
)
Recognition and reasoning are two pillars of visual understanding. However, these tasks have an imbalance in focus; whereas recent advances in neural networks have shown strong empirical performance in visual recognition, there has been comparatively less success in solving visual reasoning. Intuitively, unifying these two tasks under a singular framework is desirable, as they are mutually dependent and beneficial. Motivated by the recent success of multi-task transformers for visual recognition and language understanding, we propose a unified neural architecture for visual recognition and reasoning tasks with a generic interface (e.g., tokens) for all tasks. Our framework enables the principled investigation of how different visual recognition tasks, datasets, and inductive biases can help enable spatiotemporal reasoning capabilities. Notably, we find that object detection, which requires spatial localization of individual objects, is the most beneficial recognition task for reasoning. We further demonstrate via probing that implicit object-centric representations emerge automatically inside our framework. We also discover that visual reasoning and object detection respond to drastically different model components; certain architectural choices such as the backbone model of the visual encoder have a significant impact on visual reasoning, but little on object detection. Given the results of our experiments, we believe that a fruitful direction forward is to consider visual reasoning a first-class citizen alongside visual recognition, as they are strongly correlated but benefit from potentially different design choices. |
Calvin Luo · Boqing Gong · Ting Chen · Chen Sun 🔗 |
-
|
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
(
Poster
)
While the successes of transformers across many domains are indisputable, accurate understanding of what knowledge transformers learn to encode is still largely lacking. Their capabilities have been probed on benchmarks which include a variety of structured and reasoning tasks -- but mathematical understanding is lagging substantially behind. Recent lines of work have begun studying representational aspects of this question: that is, the size/depth/complexity of attention-based networks to perform certain tasks. However, there is no guarantee the learning dynamics will converge to the constructions proposed. In our paper, we provide fine-grained mechanistic understanding of how transformers learn "semantic structure", understood as capturing co-occurrence structure of words. Precisely, we show, through a combination of mathematical analysis and experiments on Wikipedia data and synthetic data modeled by Latent Dirichlet Allocation (LDA), that the embedding layer and the self-attention layer encode the topical structure. In the former case, this manifests as higher average inner product of embeddings between same-topic words. In the latter, it manifests as higher average pairwise attention between same-topic words. The mathematical results involve several assumptions to make the analysis tractable, which we verify on data, and might be of independent interest as well. |
Yuchen Li · Yuanzhi Li · Andrej Risteski 🔗 |
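A minimal sketch of the diagnostic described in the abstract above: compare the average inner product between embeddings of same-topic versus different-topic words. The word list, topic assignments, and embeddings are hypothetical stand-ins for an embedding layer trained on LDA-style or Wikipedia data.

```python
import numpy as np

def topic_alignment_scores(embeddings, topic_of_word):
    """Average inner product between embeddings of same-topic vs. different-topic words."""
    same, diff = [], []
    words = list(embeddings)
    for i, wi in enumerate(words):
        for wj in words[i + 1:]:
            score = float(np.dot(embeddings[wi], embeddings[wj]))
            (same if topic_of_word[wi] == topic_of_word[wj] else diff).append(score)
    return np.mean(same), np.mean(diff)

# Hypothetical word embeddings and (LDA-style) topic assignments.
rng = np.random.default_rng(0)
topics = {"goal": 0, "match": 0, "coach": 0, "senate": 1, "ballot": 1, "vote": 1}
centers = {0: rng.normal(size=16), 1: rng.normal(size=16)}
embs = {w: centers[t] + 0.3 * rng.normal(size=16) for w, t in topics.items()}

same_avg, diff_avg = topic_alignment_scores(embs, topics)
print(same_avg > diff_avg)   # True when same-topic embeddings align more strongly
```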
-
|
Large Language Models are Zero-Shot Multi-Tool Users
(
Poster
)
We introduce Actions, a framework and programming environment to facilitate the implementation of tool-augmented language models (LMs). Concretely, we augment LMs with the ability to call actions (arbitrary Python functions), and experiment with different ways of tool discovery and invocation. We find that, while previous works heavily rely on few-shot prompting to teach tool use, a zero-shot, instruction-only approach is enough to achieve competitive performance. At the same time, Actions' zero-shot approach also offers a much simpler programming interface, not requiring any involved demonstrations. Building on this, we show how Actions enables LLMs to automatically discover and combine multiple tools to solve complex tasks. Overall, we find that inline tool use, as enabled by Actions, outperforms existing tool augmentation approaches, both in arithmetic reasoning tasks and text-based question answering. Our implementation extends the open source LMQL programming language for LM interaction and is available at ANONYMIZED (upon publication). |
Luca Beurer-Kellner · Marc Fischer · Martin Vechev 🔗 |
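A generic sketch of zero-shot, instruction-only tool use in the spirit of the abstract above; this is not the LMQL or Actions API. The tool names, JSON call format, and dispatch logic are hypothetical: plain Python functions are registered as tools, their docstrings form the instruction text shown to the model, and JSON-formatted calls emitted by the model are executed.

```python
import json

# Tools are plain Python functions; their docstrings are the only "documentation"
# the model sees (zero-shot, no demonstrations).
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

def lookup(city: str) -> str:
    """Return a fact about a city."""
    return {"Paris": "capital of France"}.get(city, "unknown")

TOOLS = {f.__name__: f for f in (add, lookup)}

def tool_instructions():
    lines = ["You may call tools by emitting JSON like"
             ' {"tool": "add", "args": {"a": 1, "b": 2}}.']
    lines += [f"- {name}: {fn.__doc__}" for name, fn in TOOLS.items()]
    return "\n".join(lines)

def dispatch(model_output: str):
    """If the model emitted a tool call, run it and return the result."""
    try:
        call = json.loads(model_output)
        return TOOLS[call["tool"]](**call["args"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return model_output          # plain text answer, no tool used

print(tool_instructions())
print(dispatch('{"tool": "add", "args": {"a": 19, "b": 23}}'))   # 42
```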
-
|
Training LLMs with Noisy Algorithmic Chain of Thought
(
Poster
)
Much recent effort has gone into distilling large language model chain of thought (CoT) capabilities by training smaller models directly on sampled traces. However, less attention is paid to the quality or \textit{noisiness} of distilled CoT and how this impacts supervised performance. We begin studying this problem in the highly controlled setting of algorithmically solvable tasks on lists of integers. To do so, we develop the \textit{TInt} framework to generate highly customizable noisy algorithmic chains of thought for evaluating arbitrary functions on lists of integers. Using this framework, we first benchmark performance baselines for arithmetic and list median finding tasks with and without CoT, while studying best practices for designing good algorithmic CoT. We then introduce three types of noise to the tasks, studying the effect on performance. We find training with algorithmic CoT is remarkably robust to \textit{static noise}, which preserves CoT form while mutating content, even when the entire dataset is contaminated. However, \textit{dynamic noise}, which alters both the form and content of CoT, is more destructive even at lower dataset noise levels. |
Alex Havrilla 🔗 |
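A minimal sketch, not the TInt framework itself, of the static-versus-dynamic noise distinction for a list-median CoT: static noise keeps the step structure but corrupts the digits inside one step, while dynamic noise alters the trace's form by dropping an intermediate step. The trace format and noise operators are illustrative assumptions.

```python
import random

def median_cot(xs):
    """Step-by-step trace for finding the median of a list of integers."""
    steps = [f"input: {xs}"]
    s = sorted(xs)
    steps.append(f"sorted: {s}")
    mid = len(s) // 2
    steps.append(f"middle index: {mid}")
    steps.append(f"median: {s[mid]}")
    return steps

def static_noise(steps, rng):
    """Preserve the form of the trace; replace the digits in one step with random digits."""
    i = rng.randrange(len(steps))
    noisy = steps[:]
    noisy[i] = "".join(str(rng.randint(0, 9)) if c.isdigit() else c
                       for c in noisy[i])
    return noisy

def dynamic_noise(steps, rng):
    """Alter the form as well: drop one intermediate step entirely."""
    i = rng.randrange(1, len(steps) - 1)
    return steps[:i] + steps[i + 1:]

rng = random.Random(0)
trace = median_cot([7, 1, 5, 9, 3])
print(trace)
print(static_noise(trace, rng))
print(dynamic_noise(trace, rng))
```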
-
|
The Role of Semantic Parsing in Understanding Procedural Text
(
Poster
)
In this paper, we investigate whether symbolic semantic representations, extracted from deep semantic parsers, can help to reason over the states of involved entities in a procedural text. We consider a deep semantic parser (TRIPS) and semantic role labeling as two sources of semantic parsing knowledge. First, we propose PROPOLIS, a symbolic parsing-based procedural reasoning framework. Second, we integrate semantic parsing information into state-of-the-art neural models to conduct procedural reasoning. Our experiments indicate that explicitly incorporating such semantic knowledge improves procedural understanding. This paper presents new metrics for evaluating procedural reasoning tasks that clarify the challenges and identify differences among neural, symbolic, and integrated models. |
Hossein Rajaby Faghihi · Parisa Kordjamshidi · Choh Man Teng · James Allen 🔗 |
-
|
Partial Label Learning meets Active Learning: Enhancing Annotation Efficiency through Binary Questioning
(
Poster
)
Supervised learning is an effective approach to machine learning, but it can be expensive to acquire labeled data. Active learning (AL) and partial label learning (PLL) are two techniques that can be used to reduce the annotation costs of supervised learning. AL is a strategy for reducing the annotation budget by selecting and labeling the most informative samples, while PLL is a weakly supervised learning approach to learn from partially annotated data by identifying the true hidden label. In this paper, we propose a novel approach that combines AL and PLL techniques to improve annotation efficiency. Our method leverages AL to select informative binary questions and PLL to identify the true label from the set of possible answers. We conduct extensive experiments on various benchmark datasets and show that our method achieves state-of-the-art (SoTA) performance with significantly reduced annotation costs. Our findings suggest that our method is a promising solution for cost-effective annotation in real-world applications. |
Shivangana Rawat · Chaitanya Devaguptapu · Vineeth Balasubramanian 🔗 |
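A minimal sketch of one plausible way to combine the two ideas in the abstract above: pick a binary question ("is the true label in subset S?") whose posterior mass is close to 0.5, then shrink the partial-label candidate set according to the answer. The posterior, the greedy subset-selection rule, and the simulated annotator are illustrative assumptions, not necessarily the paper's selection criterion.

```python
import numpy as np

def best_binary_question(posterior):
    """Greedily pick a label subset S whose posterior mass is closest to 0.5;
    asking "is the true label in S?" then yields roughly one bit of information."""
    order = np.argsort(posterior)[::-1]
    subset, mass = [], 0.0
    for c in order:
        if abs(mass + posterior[c] - 0.5) < abs(mass - 0.5):
            subset.append(int(c))
            mass += posterior[c]
    return subset, mass

def answer_and_shrink(candidates, subset, true_label):
    """Simulated annotator answer; shrink the partial-label candidate set."""
    in_subset = true_label in subset
    return [c for c in candidates if (c in subset) == in_subset]

posterior = np.array([0.35, 0.30, 0.20, 0.10, 0.05])   # hypothetical model output
subset, mass = best_binary_question(posterior)
print(subset, round(mass, 2))
print(answer_and_shrink(list(range(5)), subset, true_label=2))
```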
-
|
Learning to Initiate and Reason in Event-Driven Cascading Processes
(
Poster
)
Training agents to control a dynamic environment is a fundamental task in AI. In many environments, the dynamics can be summarized by a small set of events that capture the semantic behavior of the system. Typically, these events form chains or cascades. We often wish to change the system behavior using a single intervention that propagates through the cascade. For instance, one may trigger a biochemical cascade to switch the state of a cell or, in logistics, reroute a truck to meet an unexpected, urgent delivery. We introduce a new supervised learning setup called Cascade. An agent observes a system with known dynamics evolving from some initial state. The agent is given a structured semantic instruction and needs to make an intervention that triggers a cascade of events, such that the system reaches an alternative (counterfactual) behavior. We provide a test-bed for this problem, consisting of physical objects. We combine semantic tree search with an event-driven forward model and devise an algorithm that learns to efficiently search in exponentially large semantic trees. We demonstrate that our approach learns to follow instructions to intervene in new, complex scenes. When provided with an observed cascade of events, it can also reason about alternative outcomes. |
Yuval Atzmon · Eli Meirom · Shie Mannor · Gal Chechik 🔗 |
-
|
LLM-grounded Text-to-Image Diffusion Models
(
Poster
)
Recent advancements in text-to-image generation with diffusion models have yielded remarkable results in synthesizing highly realistic and diverse images. However, these models still encounter difficulties when it comes to generating images based on prompts that demand spatial or common sense reasoning. We propose to equip diffusion models with enhanced reasoning capabilities by using off-the-shelf pretrained large language models (LLMs) in a novel two-stage generation process. First, we adapt an LLM to be a text-guided layout generator through in-context learning. When provided with a prompt describing the image to be generated, the LLM outputs a scene layout in the form of captioned bounding boxes along with a background caption. Second, we steer a diffusion model with a novel controller to generate images conditioned on the layout. Both stages utilize frozen pretrained models without any LLM or diffusion model parameter optimization. We validate the superiority of our design by demonstrating its ability to outperform the base diffusion model in accurately generating images according to prompts that require both language and spatial reasoning. Furthermore, our method naturally supports dialog-based scene specification and is able to handle prompts in languages that are not well-supported by the underlying diffusion model. |
Long (Tony) Lian · Boyi Li · Adam Yala · Trevor Darrell 🔗 |
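A minimal sketch of the first stage described in the abstract above: parsing an LLM's layout output into a background caption plus captioned bounding boxes that a layout-conditioned diffusion controller could consume. The JSON schema and example response are hypothetical; the paper's in-context prompting and the diffusion-side controller are not shown.

```python
import json

# Hypothetical LLM response for the prompt "a cat chasing a ball on a beach":
llm_response = """
{"background": "a sandy beach under a clear sky",
 "boxes": [{"caption": "a cat running", "xywh": [0.10, 0.45, 0.30, 0.35]},
           {"caption": "a red ball",    "xywh": [0.55, 0.60, 0.15, 0.15]}]}
"""

def parse_layout(text):
    """Turn the LLM's text output into a layout the image generator can use:
    a background caption plus captioned boxes in normalized [x, y, w, h] form."""
    layout = json.loads(text)
    boxes = [(b["caption"], tuple(b["xywh"])) for b in layout["boxes"]]
    assert all(0.0 <= v <= 1.0 for _, box in boxes for v in box)
    return layout["background"], boxes

background, boxes = parse_layout(llm_response)
print(background)
for caption, box in boxes:
    print(caption, box)
```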
-
|
Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning
(
Poster
)
Large language models have shown astonishing performance on a wide range of reasoning tasks. In this paper, we investigate whether they could reason about real-world events and help improve the prediction accuracy of event sequence models. We design a modeling and prediction framework where a large language model performs abductive reasoning to assist an event sequence model: the event model proposes predictions on future events given the past; instructed by a few expert-annotated demonstrations, the language model learns to suggest possible causes for each proposal; a search module finds out the previous events that match the causes; a scoring function learns to examine whether the retrieved events could actually cause the proposal. Through extensive experiments on two challenging real-world datasets (Amazon Review and GDELT), we demonstrate that our framework---thanks to the reasoning ability of language models---could significantly outperform the state-of-the-art event sequence models. |
Xiaoming Shi · Siqiao Xue · Kangrui Wang · Fan Zhou · James Zhang · Jun Zhou · Chenhao Tan · Hongyuan Mei 🔗 |
-
|
What’s left can’t be right - The remaining positional incompetence of contrastive vision-language models
(
Poster
)
Contrastive vision-language models like CLIP have been found to lack spatial understanding capabilities. In this paper we discuss the possible causes of this phenomenon by analysing both datasets and embedding space. By focusing on simple left-right positional relations, we show that this behaviour is entirely predictable, even with large-scale datasets, demonstrate that these relations can be taught using synthetic data, and show that this approach can generalise well to natural images - improving the performance on left-right relations on Visual Genome Relationships. The code for all our experiments and analysis can be found on GitHub. |
Nils Hoehing · Ellen Rushe · Anthony Ventresque 🔗 |
-
|
Deep Neuro-Symbolic Weight Learning in Neural Probabilistic Soft Logic
(
Poster
)
In this work, we extend the expressive power of the neuro-symbolic framework Neural Probabilistic Soft Logic (NeuPSL). We introduce NeuPSL Deep Weights, which uses deep neural network predictions to parameterize the weights of symbolic rules. To demonstrate the applicability of NeuPSL Deep Weights, we introduce a unique synthetic dataset specifically designed to challenge learning methods that do not utilize both data-driven learning (System 1) and deliberate symbolic reasoning (System 2). Across variations of this synthetic dataset, we show how NeuPSL Deep Weights outperforms traditional PSL rule weights and existing joint System 1 and System 2 neural methods, such as graph neural networks. |
Connor Pryor · Charles Dickens · Lise Getoor 🔗 |
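A minimal sketch of the core idea above: a neural network predicts the weight of a symbolic rule instead of using a fixed hand-tuned weight, here with a Łukasiewicz implication as the soft rule and a hinge-style PSL penalty. The network, context features, and atoms are hypothetical; NeuPSL's actual inference and learning machinery is considerably more involved.

```python
import torch

# Neural module that maps context features to a non-negative rule weight.
weight_net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(),
                                 torch.nn.Linear(8, 1), torch.nn.Softplus())

def lukasiewicz_implication(a, b):
    """Soft truth value of the rule a -> b under Lukasiewicz logic: min(1, 1 - a + b)."""
    return torch.clamp(1.0 - a + b, max=1.0)

def weighted_rule_loss(context, a, b):
    """Hinge-style PSL penalty, scaled by a *predicted* rule weight
    instead of a fixed hand-tuned one."""
    w = weight_net(context).squeeze(-1)
    return (w * (1.0 - lukasiewicz_implication(a, b))).mean()

# Hypothetical batch: context features plus soft truth values of two atoms.
context = torch.randn(16, 4)
a = torch.rand(16)     # e.g. neural prediction "image contains smoke"
b = torch.rand(16)     # e.g. label "fire hazard"
loss = weighted_rule_loss(context, a, b)
loss.backward()
print(float(loss))
```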
-
|
Equivariance Is Not All You Need: Characterizing the Utility of Equivariant Graph Neural Networks for Particle Physics Tasks
(
Poster
)
Incorporating inductive biases into ML models is an active area of ML research, especially when ML models are applied to data about the physical world. Equivariant Graph Neural Networks (GNNs) have recently become a popular method for learning from physics data because they directly incorporate the symmetries of the underlying physical system. Drawing from the relevant literature around group equivariant networks, this paper presents a comprehensive evaluation of the proposed benefits of equivariant GNNs by using real-world particle physics reconstruction tasks as an evaluation test-bed. We demonstrate that many of the theoretical benefits generally associated with equivariant networks may not hold for realistic systems and introduce compelling directions for future research that will benefit both the scientific theory of ML and physics applications. |
Savannah Thais · Daniel Murnane 🔗 |
-
|
Revealing the Intrinsic Ability of Generative Language Models in Relation Prediction
(
Poster
)
Traditional paradigms for relation prediction usually require concatenating a pre-trained architecture with a specialized relation predictor and further fine-tuning to adapt to the new domain. Recently, large generative language models (GLMs) have exhibited powerful capabilities in text generation across general domains without the need for further fine-tuning. A natural question then arises: can we develop an accurate relation predictor using pre-trained GLMs without further fine-tuning? To answer this question, we first establish a data pipeline to obtain four relation prediction datasets from text generation datasets, and further pre-train the GLMs on the same domains. Second, we propose a closed-form relation predictor, which does not require additional fine-tuning. Finally, we conduct experiments using BART and T5 models of different sizes to compare our method with the baseline. We observe significant improvements in performance. For example, on the Delve (1K) dataset and with the BART-large model, our method achieves an FPR of 5.30% at 95% TPR, whereas the baseline yields approximately 40% FPR. |
Qi Li · Lyuwen Wu · Luoyi Fu · Xinbing Wang · SHIYU LIANG 🔗 |
-
|
Augmenting the Knowledge to Large Model from Federated Small Models
(
Poster
)
Personalized Federated Learning (pFL) is a type of Federated Learning (FL) that divides a model into personalized and shared parts to address heterogeneity problems in distributed data environments. pFL can optimize for each client's distributed data using the personalized part. However, since the models of participating clients in pFL are usually shallow and narrow, their potential for performance improvement is limited, much like System 1 in dual-system theory. In this paper, we aim to address the performance constraints caused by the limited capacity of clients while transferring knowledge in the opposite direction of conventional knowledge distillation methods. The proposed approach, Knowledge Augmentation, transfers the knowledge of the clients' small models to a large model in the central server, which operates like System 2 in dual-system theory. To guarantee client privacy, the large model uses the output of the personalized part as input data rather than sharing local data. |
Miru Kim · Minhae Kwon 🔗 |
-
|
Explicit Planning Helps Language Models in Logical Reasoning
(
Poster
)
Language models have been shown to perform remarkably well on a wide range of natural language processing tasks. In this paper, we propose a novel system that uses language models to perform multi-step logical reasoning. Our system incorporates explicit planning into its inference procedure and is thus able to make more informed reasoning decisions at each step by looking ahead at their future effects. Moreover, we propose a training strategy that safeguards the planning process from being led astray by spurious features. Our full system significantly outperforms other competing methods on multiple standard datasets. When using a T5 model as its core component, our system performs competitively compared to GPT-3 despite having only about 1B parameters (i.e., 175 times smaller than GPT-3). When using GPT-3.5, it significantly outperforms chain-of-thought prompting on the challenging PrOntoQA dataset. We have conducted extensive empirical studies to demonstrate that explicit planning plays a crucial role in the system's performance. |
Hongyu Zhao · Kangrui Wang · Mo Yu · Hongyuan Mei 🔗 |
-
|
Evaluating the Causal Reasoning Abilities of Large Language Models
(
Poster
)
Large language models have developed at a breathtaking pace, quickly advancing in their ability to generate, summarize, and work with long and short-form text. As these advances become further integrated into society, however, it becomes necessary to question and evaluate how capable these models actually are of true reasoning, rather than simply mimicking their large training corpora. We argue that eliciting reasoning from language models is the new "explainability method" and introduce CReDETS, a novel and first-of-its-kind causal reasoning dataset with annotated hand-written explanations. We benchmark the latest and most powerful generation of transformer neural network models, GPT-3, GPT-3.5 (ChatGPT), and GPT-4, and discuss their accuracy, coherence, and consistency. Our results show that even the most recent LLMs have stark weaknesses in reasoning ability that must be ameliorated before they can be integrated into public-facing applications worldwide. |
Isha Puri · Hima Lakkaraju 🔗 |