Workshop
Localized Learning: Decentralized Model Updates via Non-Global Objectives
David I. Inouye · Mengye Ren · Mateusz Malinowski · Michael Eickenberg · Gao Huang · Eugene Belilovsky
Meeting Room 310
Despite being widely used, global end-to-end learning has several key limitations. It requires centralized computation, making it feasible only on a single device or a carefully synchronized cluster. This restricts its use on unreliable or resource-constrained devices, such as commodity hardware clusters or edge computing networks. As model sizes increase, the synchronization that end-to-end training requires across devices becomes a bottleneck for every form of parallelism. Global learning also requires a large memory footprint, which is costly and limits the learning capability of single devices. Moreover, end-to-end learning updates have high latency, which may prevent their use in real-time applications such as learning on streaming video. Finally, global backpropagation is thought to be biologically implausible, as biological synapses update in a local and asynchronous manner. To overcome these limitations, this workshop will delve into the fundamentals of localized learning, which is broadly defined as any training method that updates model parts through non-global objectives.

Schedule

Sat 12:00 p.m. - 12:05 p.m.
Opening Remarks
David I. Inouye

Sat 12:05 p.m. - 12:50 p.m.
Geoffrey Hinton: Can the brain do weight-sharing? (Keynote)
Many different forms of weight-sharing have proved to be important in artificial neural networks. Backpropagation shares weights between the forward and backward passes. Convolutional nets share weights between spatial filters at different locations. Recurrent nets share weights over time. Self-supervised contrastive learning shares weights between different image patches. Transformers share weights between word fragments at different positions within a document. Most significantly, multiple copies of a model running on different hardware share weights to allow very large datasets to be processed in parallel. I will discuss attempts to achieve the effect of weight-sharing in biologically plausible models. These attempts typically try to achieve weight-sharing by re-using the same weights at different times, and I will conclude that the main feature that makes digital intelligence superior to biological intelligence is the ability to have many copies of exactly the same model running on different hardware -- something that biology cannot do.

Sat 12:50 p.m. - 1:15 p.m.
Emergent learning that outperforms global objectives (Invited Talk)
Learning algorithms are often top-down and prescriptive, directly descending the gradient of a prescribed loss function. This includes backpropagation, its more localized approximations such as Equilibrium Propagation or Predictive Coding, as well as local self-supervised objectives, as in the Forward-Forward algorithm. Other algorithms could instead be characterized as emergent or descriptive, where network-wide function is learned from the bottom up, from mere descriptions of processes in synapses (i.e. connections) and neuronal units. This latter type of learning, which results e.g. from so-called Hebbian plastic synapses, spike timing-dependent plasticity (STDP), and short-term plasticity, fully satisfies the constraints of biological and neuromorphic circuitry, because its entire premise is the textbook neuroscience mechanisms local to each synapse. However, such emergent learning rules have struggled to be useful in tasks that are difficult by modern machine learning standards. In contrast, our recent work shows that learning resulting from plasticity is applicable to previously unattainable problem settings and can even outperform global loss-driven networks under certain conditions. Specifically, the talk will focus on short-term STDP, short-term plasticity neurons (STPN), SoftHebb, i.e. our version of Hebbian learning in circuits with soft competition, and on their advantages in sequence modelling, adversarial robustness, learning speed, and unsupervised deep learning. The picture will be completed with a mention of our related works on neuromorphic nanodevices that emulate the biophysics of plastic synapses through the physics of analog electronics and photonics.
Timoleon (Timos) Moraitis

Sat 1:15 p.m. - 2:00 p.m.
Morning Poster Session (Poster Session)

Sat 2:00 p.m. - 2:25 p.m.
Local learning in recurrent networks modelling motor cortex (Invited Talk)
Animals use afferent feedback to rapidly correct ongoing movements in the presence of a perturbation. Repeated exposure to a predictable perturbation leads to behavioural adaptation that counteracts its effects. Primary motor cortex (M1) is intimately involved in both processes, integrating inputs from various sensorimotor brain regions to update the motor output. Here, we investigate whether feedback-based motor control and motor adaptation may share a common implementation in M1 circuits. We trained a recurrent neural network to control its own output through an error feedback signal, which allowed it to recover rapidly from external perturbations. Implementing a biologically plausible plasticity rule based on this same feedback signal also enabled the network to learn to counteract persistent perturbations through a trial-by-trial process, in a manner that reproduced several key aspects of human adaptation. Moreover, the resultant network activity changes were also present in neural population recordings from monkey M1. Online movement correction and longer-term motor adaptation may thus share a common implementation in neural circuits.
Claudia Clopath

Sat 2:25 p.m. - 2:50 p.m.
Local Learning for Higher Parallelism (Invited Talk)
Edouard Oyallon

Sat 2:50 p.m. - 3:00 p.m.
Dual Propagation: Accelerating Contrastive Hebbian Learning with Dyadic Neurons (Best Contributed Paper)
Activity-difference-based learning algorithms---such as contrastive Hebbian learning and equilibrium propagation---have been proposed as biologically plausible alternatives to error back-propagation. However, on traditional digital chips these algorithms suffer from having to solve a costly inference problem twice, making these approaches more than two orders of magnitude slower than back-propagation. In the analog realm equilibrium propagation may be promising for fast and energy-efficient learning, but states still need to be inferred and stored twice. Inspired by lifted neural networks and compartmental neuron models, we propose a simple energy-based compartmental neuron model, termed dual propagation, in which each neuron is a dyad with two intrinsic states. At inference time these intrinsic states encode the error/activity duality through their difference and their mean, respectively. The advantage of this method is that only a single inference phase is needed and that inference can be solved in layer-wise closed form. Experimentally, we show on common computer vision datasets, including Imagenet32x32, that dual propagation performs equivalently to back-propagation both in terms of accuracy and runtime.
Rasmus Kjær Høier

Sat 3:00 p.m. - 4:30 p.m.
Lunch Break and Informal Poster Session (Break)

Sat 4:30 p.m. - 5:15 p.m.
Irina Rish: Backpropagation Alternatives and Scalable AI (Keynote)
Irina Rish

Sat 5:15 p.m. - 5:40 p.m.
Training Spiking Neural Networks with Local Tandem Learning (Invited Talk)
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors. However, there is a lack of an efficient and generalized training method for deep SNNs, especially for deployment on analog computing substrates. In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL). The LTL rule follows the teacher-student learning approach by mimicking the intermediate feature representations of a pre-trained ANN. By decoupling the learning of network layers and leveraging highly informative supervisor signals, we demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity. Our experimental results have also shown that the SNNs thus trained can achieve comparable accuracies to their teacher ANNs on CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. Moreover, the proposed LTL rule is hardware friendly. It can be easily implemented on-chip to perform fast parameter calibration and provide robustness against the notorious device non-ideality issues. It, therefore, opens up a myriad of opportunities for training and deployment of SNN on ultra-low-power mixed-signal neuromorphic computing chips.
Qu Yang
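
Since the abstract centers on decoupled, layer-wise mimicking of a pre-trained ANN, a minimal rate-based sketch may help picture the idea. This is not the authors' LTL rule: the spiking dynamics are replaced by an ordinary ReLU student, and all layer sizes and names are made up for illustration.

```python
# Hypothetical sketch of layer-wise teacher mimicking with decoupled (local) updates.
# The spiking dynamics of the actual LTL rule are replaced by a plain ReLU student.
import torch
import torch.nn as nn

torch.manual_seed(0)
teacher = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                        nn.Linear(256, 128), nn.ReLU()).eval()
student_layers = nn.ModuleList([
    nn.Sequential(nn.Linear(784, 256), nn.ReLU()),
    nn.Sequential(nn.Linear(256, 128), nn.ReLU()),
])
opts = [torch.optim.Adam(layer.parameters(), lr=1e-3) for layer in student_layers]

x = torch.randn(32, 784)                        # stand-in for a batch of images
with torch.no_grad():                           # teacher targets, one per layer
    t1 = teacher[1](teacher[0](x))
    t2 = teacher[3](teacher[2](t1))

s_in = x
for layer, opt, target in zip(student_layers, opts, (t1, t2)):
    out = layer(s_in)
    loss = nn.functional.mse_loss(out, target)  # purely local supervisory signal
    opt.zero_grad(); loss.backward(); opt.step()
    s_in = out.detach()                         # block gradients across layers
```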

Sat 5:40 p.m. - 6:05 p.m.
Lessons of Local Learning in Training LLMs (Invited Talk)
This presentation summarizes the main idea of the interlocking backpropagation paper and discusses the intriguing intersection of large language models and local learning, investigating the unique challenges and opportunities. Additionally, we'll touch on state-of-the-art LLM training paradigms and the potential of local learning within them.
Stephen Gou

Sat 6:05 p.m. - 6:15 p.m.
Understanding Predictive Coding as an Adaptive Trust-Region Method (Best Contributed Paper)
Predictive coding (PC) is a brain-inspired local learning algorithm that has recently been suggested to provide advantages over backpropagation (BP) in biologically relevant scenarios. While theoretical work has mainly focused on showing how PC can approximate BP in various limits, the putative benefits of "natural" PC are less understood. Here we develop a theory of PC as an adaptive trust-region (TR) algorithm that uses second-order information. We show that the learning dynamics of PC can be interpreted as interpolating between BP's loss gradient direction and a TR direction found by the PC inference dynamics. Our theory suggests that PC should escape saddle points faster than BP, a prediction which we prove in a shallow linear model and support with experiments on deeper networks. This work lays a foundation for understanding PC in deep and wide networks.
Francesco Innocenti
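
For readers less familiar with the PC inference dynamics referenced above, here is a minimal sketch of generic predictive coding on a toy two-layer linear network (standard PC, not the paper's trust-region analysis; sizes and learning rates are arbitrary): activities relax by descending the prediction-error energy, and weights are then updated from purely local errors and presynaptic activities.

```python
# Minimal generic predictive-coding sketch (linear layers, squared prediction errors).
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(0, 0.1, (32, 16)), rng.normal(0, 0.1, (10, 32))
lr_x, lr_w, T = 0.1, 1e-3, 50

def pc_step(x0, y):
    """One PC training step on a single (input x0, target y) pair."""
    x1 = W1 @ x0                      # initialize activities with a forward pass
    for _ in range(T):                # inference: relax activities on the energy
        e1 = x1 - W1 @ x0             # prediction error at the hidden layer
        e2 = y - W2 @ x1              # prediction error at the (clamped) output
        x1 += lr_x * (-e1 + W2.T @ e2)
    # local weight updates from converged errors and presynaptic activities
    return np.outer(e1, x0), np.outer(e2, x1)

x0, y = rng.normal(size=16), rng.normal(size=10)
dW1, dW2 = pc_step(x0, y)
W1 += lr_w * dW1
W2 += lr_w * dW2
```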

Sat 6:15 p.m. - 6:30 p.m.
Short Break and Informal Poster Session (Break)

Sat 6:30 p.m. - 7:15 p.m.
Panel - Geoffrey Hinton, Irina Rish, Edouard Oyallon, Timoleon Moraitis - Localized Learning: Past, Present and Future (Discussion Panel)
Mengye Ren

Sat 7:15 p.m. - 8:00 p.m.
Afternoon Poster Session (Poster Session)

Decentralized Plasticity in Reservoir Dynamical Networks for Pervasive Environments (Poster)
We propose a framework for localized learning with Reservoir Computing dynamical neural systems in pervasive environments, where data is distributed and dynamic. We use biologically plausible intrinsic plasticity (IP) learning to optimize the non-linearity of system dynamics based on local objectives, and extend it to account for data uncertainty. We develop two algorithms for federated and continual learning, FedIP and FedCLIP, which respectively extend IP to client-server topologies and to prevent catastrophic forgetting in streaming data scenarios. Results on real-world datasets from human monitoring show that our approach improves performance and robustness, while preserving privacy and efficiency.
Valerio De Caro · Davide Bacciu · Claudio Gallicchio

Localizing Partial Model for Personalized Federated Learning (Poster)
Federated learning trains models across multiple devices using decentralized data, without exchanging actual data. However, standard federated learning approaches face limitations in terms of high communication and computational costs, personalization, and vulnerability to attacks. To address these challenges, we propose a novel partial sharing algorithm. The partial sharing algorithm trains local models by dividing them into personalized and shared parts. This allows clients to generate personalized models that are optimized for individual client data while exposing only the shared portion of the local model to potential attacks. Through experiments, we evaluate the personalization and robustness of our proposed algorithm and demonstrate its superior performance compared to existing approaches.
Heewon Park · Miru Kim · Minhae Kwon
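
A minimal sketch of the partial-sharing idea as we read the abstract (the split point, model sizes, and update schedule below are assumptions, not the authors' exact algorithm): each client model has a shared part that the server averages and a personalized part that never leaves the client.

```python
# Hypothetical sketch: FedAvg applied only to the shared part of each client model.
import copy
import torch
import torch.nn as nn

class ClientModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(20, 32), nn.ReLU())   # aggregated
        self.personal = nn.Linear(32, 2)                            # stays local
    def forward(self, x):
        return self.personal(self.shared(x))

clients = [ClientModel() for _ in range(4)]

def local_update(model, steps=5):
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    for _ in range(steps):
        x = torch.randn(16, 20)                     # placeholder client data
        y = torch.randint(0, 2, (16,))
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

for rnd in range(3):                                # a few federated rounds
    for m in clients:
        local_update(m)
    # server averages only the shared sub-module's parameters
    avg = copy.deepcopy(clients[0].shared.state_dict())
    for k in avg:
        avg[k] = torch.stack([m.shared.state_dict()[k] for m in clients]).mean(0)
    for m in clients:
        m.shared.load_state_dict(avg)
```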

Learning Recurrent Models with Temporally Local Rules (Poster)
Fitting generative models to sequential data typically involves two recursive computations through time, one forward and one backward. The latter could be a computation of the loss gradient (as in backpropagation through time), or an inference algorithm (as in the RTS/Kalman smoother). The backward pass in particular is computationally expensive (since it is inherently serial and cannot exploit GPUs), and difficult to map onto biological processes. Work-arounds have been proposed; here we explore a very different one: requiring the generative model to learn the joint distribution over current and previous states, rather than merely the transition probabilities. We show on toy datasets that different architectures employing this principle can learn aspects of the data typically requiring the backward pass.
Azwar Abdulsalam · Joseph Makin
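
To make the stated principle concrete, here is a small hedged sketch (not the authors' architectures): rather than maximizing a transition likelihood with gradients flowing backward through time, a stand-in model is fit to temporally adjacent pairs (x_{t-1}, x_t), so each update touches only the current and previous states.

```python
# Toy sketch: fit a model to adjacent-state pairs instead of unrolling through time.
import torch
import torch.nn as nn

torch.manual_seed(0)
T, d = 200, 8
seq = torch.cumsum(torch.randn(T, d) * 0.1, dim=0)   # a toy smooth sequence
pairs = torch.cat([seq[:-1], seq[1:]], dim=1)        # (x_{t-1}, x_t) pairs

# a small autoencoder as a stand-in generative model of the joint (x_{t-1}, x_t)
model = nn.Sequential(nn.Linear(2 * d, 16), nn.Tanh(), nn.Linear(16, 2 * d))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(50):
    recon = model(pairs)
    loss = nn.functional.mse_loss(recon, pairs)      # temporally local objective
    opt.zero_grad(); loss.backward(); opt.step()
```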

Metric Compatible Training for Online Backfilling in Large-Scale Retrieval (Poster)
In large-scale retrieval systems, model upgrades require backfilling, which is the process of re-extracting all gallery embeddings from upgraded models. However, it inevitably incurs a prohibitively large computational cost and can even entail downtime of the service. To alleviate this bottleneck, backward-compatible learning has been proposed to learn the feature space of the new model while remaining compatible with that of the old model. Although it sidesteps this challenge by tackling query-side representations, this leads to suboptimal solutions in principle because gallery embeddings cannot benefit from model upgrades. We address this dilemma by introducing an online backfilling algorithm, which enables us to achieve a progressive performance improvement during the backfilling process without sacrificing the full performance of the new model after the completion of backfilling. To this end, we first show that a simple distance rank merge is a reasonable option for online backfilling. Then, we incorporate a reverse transformation module and metric-compatible contrastive learning, resulting in desirable merge results during backfilling with no extra overhead. Extensive experiments show the benefit of our framework on four standard benchmarks in various settings.
Seonguk Seo · Mustafa Gokhan Uzunbas · Bohyung Han · Xuefei Cao · Joena Zhang · Taipeng Tian · Ser Nam Lim
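
As a concrete reading of the "simple distance rank merge" baseline mentioned above (details assumed): while a backfill is in progress, gallery items already re-embedded by the new model are ranked against the new query embedding, the remaining items against the old query embedding, and the two ranked lists are merged by distance. The naive merge implicitly assumes the two embedding spaces are comparable, which is what the paper's reverse transformation module and metric-compatible contrastive learning are designed to ensure.

```python
# Hypothetical sketch of distance rank merge during an in-progress backfill.
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 1000
gallery_old = rng.normal(size=(n, d))            # embeddings from the old model
gallery_new = rng.normal(size=(n, d))            # embeddings from the new model
backfilled = np.zeros(n, dtype=bool)
backfilled[:300] = True                          # 30% of the gallery re-extracted so far

def search(query_old, query_new, k=10):
    """Rank backfilled items with the new model, the rest with the old model."""
    d_new = np.linalg.norm(gallery_new[backfilled] - query_new, axis=1)
    d_old = np.linalg.norm(gallery_old[~backfilled] - query_old, axis=1)
    ids = np.concatenate([np.where(backfilled)[0], np.where(~backfilled)[0]])
    dists = np.concatenate([d_new, d_old])
    order = np.argsort(dists)[:k]                # merge the two ranked lists by distance
    return ids[order], dists[order]

ids, dists = search(rng.normal(size=d), rng.normal(size=d))
```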

Towards Modular Machine Learning Pipelines (Poster)
Pipelines of Machine Learning (ML) components are a popular and effective approach to divide and conquer many business-critical problems. A pipeline architecture implies a specific division of the overall problem; however, current ML training approaches do not enforce this implied division. Consequently ML components can become coupled to one another after they are trained, which causes insidious effects. For instance, even when one coupled ML component in a pipeline is improved in isolation, the end-to-end pipeline performance can degrade. In this paper, we develop a conceptual framework to study ML coupling in pipelines and design new modularity regularizers that can eliminate coupling during ML training. We show that the resulting ML pipelines become modular (i.e., their components can be trained independently of one another) and discuss the tradeoffs of our approach versus existing approaches to pipeline optimization.
Aditya Modi · JIVAT NEET KAUR · Maggie Makar · Pavan Mallapragada · Amit Sharma · Emre Kiciman · Adith Swaminathan

Lightweight Learner for Shared Knowledge Lifelong Learning (Poster)
In Lifelong Learning (LL), agents continually learn as they encounter new conditions and tasks. Most current LL is limited to a single agent that learns tasks sequentially. Dedicated LL machinery is then deployed to mitigate the forgetting of old tasks as new tasks are learned. This is inherently slow. We propose a new Shared Knowledge Lifelong Learning (SKILL) challenge, which deploys a decentralized population of LL agents that each sequentially learn different tasks, with all agents operating independently and in parallel. After learning their respective tasks, agents share and consolidate their knowledge over a decentralized communication network, so that, in the end, all agents can master all tasks. We present one solution to SKILL which uses Lightweight Lifelong Learning (LLL) agents, where the goal is to facilitate efficient sharing by minimizing the fraction of the agent that is specialized for any given task. Each LLL agent thus consists of a common task-agnostic immutable part, where most parameters are, and individual task-specific modules that contain fewer parameters but are adapted to each task. Agents share their task-specific modules, plus summary information ("task anchors") representing their tasks in the common task-agnostic latent space of all agents. Receiving agents register each received task-specific module using the corresponding anchor. Thus, every agent improves its ability to solve new tasks each time new task-specific modules and anchors are received. If all agents can communicate with all others, eventually all agents become identical and can solve all tasks. On a new, very challenging SKILL-102 dataset with 102 image classification tasks (5,033 classes in total, 2,041,225 training, 243,464 validation, and 243,464 test images), we achieve much higher (and SOTA) accuracy over 8 LL baselines, while also achieving near perfect parallelization.
Yunhao Ge · Yuecheng Li · Di Wu · Ao Xu · Adam Jones · Amanda Rios · Iordanis Fostiropoulos · shixian wen · Po-Hsuan Huang · Zachary W. Murdock · Gozde Sahin · Shuo Ni · Kiran Lekkala · Sumedh Sontakke · Laurent Itti
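
A minimal sketch of the sharing-and-routing mechanism as we read it (hypothetical names, heavily simplified): each agent keeps a frozen task-agnostic backbone, learns a small task-specific head, and shares the head together with a "task anchor" (here, the mean latent of the task's data); a receiving agent routes inputs to the head whose anchor is nearest in the shared latent space.

```python
# Hypothetical sketch: frozen shared backbone + shared task heads routed by anchors.
import torch
import torch.nn as nn

torch.manual_seed(0)
backbone = nn.Sequential(nn.Linear(32, 16), nn.ReLU()).eval()   # task-agnostic, frozen
for p in backbone.parameters():
    p.requires_grad_(False)

def learn_task(x, y, n_classes):
    """Train a task-specific head and compute its anchor in the shared latent space."""
    head = nn.Linear(16, n_classes)
    opt = torch.optim.Adam(head.parameters(), lr=1e-2)
    z = backbone(x)
    for _ in range(100):
        loss = nn.functional.cross_entropy(head(z), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return head, z.mean(0)                                       # (module, anchor)

# two "agents" each learn one task, then exchange their (head, anchor) pairs
tasks = [(torch.randn(64, 32) + 1.0, torch.randint(0, 3, (64,)), 3),
         (torch.randn(64, 32) - 1.0, torch.randint(0, 5, (64,)), 5)]
shared = [learn_task(*t) for t in tasks]

def predict(x):
    z = backbone(x)
    # route each sample to the head whose anchor is closest in latent space
    dists = torch.stack([(z - anchor).norm(dim=1) for _, anchor in shared], dim=1)
    choice = dists.argmin(dim=1)
    return [shared[int(c)][0](z[i:i+1]).argmax(1).item() for i, c in enumerate(choice)]

print(predict(torch.randn(4, 32)))
```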

Internet Learning: Preliminary Steps Towards Highly Fault-Tolerant Learning on Device Networks (Poster)
Distributed machine learning has grown in popularity due to data privacy, edge computing, and large model training. A subset of this class, Vertical Federated Learning (VFL), aims to provide privacy guarantees in the scenario where each party shares the same sample space but only holds a subset of features. While VFL tackles key privacy challenges, it often assumes perfect hardware or communication (and may perform poorly under other conditions). This assumption hinders the broad deployment of VFL, particularly on edge devices, which may need to conserve power and may connect or disconnect at any time. To address this gap, we define the paradigm of Internet Learning (IL), which defines a context of which VFL is a subset and takes good performance under extremely dynamic conditions of the participating data entities as the primary goal. As IL represents a fundamentally different paradigm, it will likely require novel learning algorithms beyond end-to-end backpropagation, which requires careful synchronization across devices. In light of this, we provide some potential approaches for the IL context and present preliminary analysis and experimental results on a toy problem.
Surojit Ganguli · Avi Amalanshu · Amritanshu Ranjan · David I. Inouye

Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation (Poster)
Conventional distributed Graph Neural Network (GNN) training relies either on inter-instance communication or periodic fallback to centralized training, both of which create overhead and constrain their scalability. In this work, we propose a streamlined framework for distributed GNN training that eliminates these costly operations, yielding improved scalability, convergence speed, and performance over state-of-the-art approaches. Our framework (1) comprises independent trainers that asynchronously learn local models from locally-available parts of the training graph, and (2) synchronizes these local models only through periodic (time-based) model aggregation. Contrary to prevailing belief, our theoretical analysis shows that it is not essential to maximize the recovery of cross-instance node dependencies to achieve performance parity with centralized training. Instead, our framework leverages randomized assignment of nodes or super-nodes (i.e., collections of original nodes) to partition the training graph to enhance data uniformity and minimize discrepancies in gradient and loss function across instances. Experiments on social and e-commerce networks with up to 1.3 billion edges show that our proposed framework achieves state-of-the-art performance and a 2.31x speedup compared to the fastest baseline, despite using less training data.
Jiong Zhu · Aishwarya Naresh Reganti · Edward Huang · Charles Dickens · Nikhil Rao · Karthik Subbian · Danai Koutra
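
A minimal sketch of the time-based aggregation described above (GNN message passing is elided; a plain model over randomly partitioned node features stands in, with names and schedules assumed): independent trainers take local steps on their partitions and only periodically average parameters, with no per-step inter-instance communication.

```python
# Hypothetical sketch: independent local training + periodic (time-based) averaging.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
n_trainers, d = 4, 16
# randomized partitions: each trainer holds a random subset of node features/labels
feats = [torch.randn(256, d) for _ in range(n_trainers)]
labels = [torch.randint(0, 4, (256,)) for _ in range(n_trainers)]

models = [nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 4))
          for _ in range(n_trainers)]
opts = [torch.optim.SGD(m.parameters(), lr=0.05) for m in models]

def aggregate(models):
    avg = copy.deepcopy(models[0].state_dict())
    for k in avg:
        avg[k] = torch.stack([m.state_dict()[k] for m in models]).mean(0)
    for m in models:
        m.load_state_dict(avg)

for step in range(1, 201):
    for m, o, x, y in zip(models, opts, feats, labels):     # independent local steps
        loss = nn.functional.cross_entropy(m(x), y)
        o.zero_grad(); loss.backward(); o.step()
    if step % 50 == 0:                                      # periodic synchronization
        aggregate(models)
```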

Co-Dream: Collaborative data synthesis with decentralized models (Poster)
We present a framework for distributed optimization that addresses the decentralized and siloed nature of data in the real world. Existing works in Federated Learning address it by learning a centralized model from decentralized data. Our framework, Co-Dream, instead focuses on learning the representation of data itself. By starting with random data and jointly synthesizing samples from distributed clients, we aim to create proxies that represent the global data distribution. Importantly, this collaborative synthesis is achieved using only local models, ensuring privacy comparable to sharing the model itself. The collaboration among clients is facilitated through federated optimization in the data space, leveraging shared input gradients based on local loss. This collaborative data synthesis offers various benefits over collaborative model learning, including lower dimensionality, parameter-independent communication, and adaptive optimization. We empirically validate the effectiveness of our framework and compare its performance with traditional federated learning approaches through benchmarking experiments.
Abhishek Singh · Gauri Gupta · Charles Lu · Yogesh Koirala · Sheshank Shankar · Mohammed Ehab · Ramesh Raskar
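
Our reading of the data-space federated optimization, in a minimal hedged form (names, losses, and the aggregation scheme are assumptions): clients share only the gradient of their local loss with respect to a common synthetic batch, the gradients are averaged, and the synthetic data is updated, so no model parameters ever leave a client.

```python
# Hypothetical sketch: collaboratively "dreaming" data via averaged input gradients.
import torch
import torch.nn as nn

torch.manual_seed(0)
clients = [nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
           for _ in range(3)]                    # stand-ins for pre-trained local models
dream = torch.randn(8, 16, requires_grad=True)   # shared synthetic batch
target = torch.randint(0, 4, (8,))               # desired labels for the dreams
opt = torch.optim.Adam([dream], lr=0.1)

for step in range(100):
    grads = []
    for model in clients:                        # each client computes input gradients
        loss = nn.functional.cross_entropy(model(dream), target)
        g, = torch.autograd.grad(loss, dream)
        grads.append(g)
    opt.zero_grad()
    dream.grad = torch.stack(grads).mean(0)      # aggregate gradients in data space
    opt.step()
```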

Energy-Based Learning Algorithms: A Comparative Study (Poster)
This work compares seven energy-based learning algorithms, namely contrastive learning (CL), equilibrium propagation (EP), coupled learning (CpL), and different variants of these algorithms depending on the type of perturbation used. The algorithms are compared on deep convolutional Hopfield networks (DCHNs) and evaluated on five vision tasks (MNIST, Fashion-MNIST, SVHN, CIFAR-10 and CIFAR-100). The results reveal that while all algorithms perform similarly on the simplest task (MNIST), differences in performance become evident as task complexity increases. Perhaps surprisingly, we find that negative perturbations yield significantly better results than positive ones, and the centered variant of EP emerges as the top-performing algorithm. Lastly, we report new state-of-the-art DCHN simulations on all five datasets (both in terms of speed and accuracy), achieving a 13.5x speedup compared to Laborieux et al. (2021).
Benjamin Scellier · Maxence Ernoult · Jack Kendall · Suhas Kumar
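
For readers unfamiliar with the perturbation variants being compared, here is a minimal sketch of equilibrium propagation with a centered (two-sided) nudge on a toy linear Hopfield-style energy (a generic textbook formulation, not the paper's DCHN setup): the state is relaxed at nudging strengths +beta and -beta, and the weight gradient is estimated from the difference of energy gradients at the two equilibria.

```python
# Generic EP sketch with a centered nudge (not the paper's convolutional Hopfield nets).
import torch

torch.manual_seed(0)
nx, nh, ny, beta, lr = 16, 32, 4, 0.1, 0.05
W1 = torch.randn(nx, nh) * 0.1
W2 = torch.randn(nh, ny) * 0.1

def energy(x, h, o):
    return 0.5 * (h @ h + o @ o) - x @ W1 @ h - h @ W2 @ o

def relax(x, y, nudge, steps=100, dt=0.1):
    """Gradient-descend the total energy E + nudge * C over the state (h, o)."""
    h, o = torch.zeros(nh), torch.zeros(ny)
    for _ in range(steps):
        h = h.detach().requires_grad_(True)
        o = o.detach().requires_grad_(True)
        F = energy(x, h, o) + nudge * 0.5 * ((o - y) ** 2).sum()
        gh, go = torch.autograd.grad(F, (h, o))
        h, o = h - dt * gh, o - dt * go
    return h.detach(), o.detach()

x, y = torch.randn(nx), torch.randn(ny)
hp, op = relax(x, y, +beta)          # positively nudged equilibrium
hm, om = relax(x, y, -beta)          # negatively nudged equilibrium
# centered estimate: (dE/dW at +beta  -  dE/dW at -beta) / (2 * beta)
gW1 = (-torch.outer(x, hp) + torch.outer(x, hm)) / (2 * beta)
gW2 = (-torch.outer(hp, op) + torch.outer(hm, om)) / (2 * beta)
W1 -= lr * gW1
W2 -= lr * gW2
```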

Associative memory and deep learning with Hebbian synaptic and structural plasticity (Poster)
The brain achieves complex information processing and cognitive functions leveraging synaptic learning mechanisms that are local, asynchronous, online and Hebbian in nature. Our work here investigates a neural network model with localized Hebbian plasticity that can perform associative memory and multilayer representation learning. This functionality is achieved with a brain-like modular hybrid architecture combining feedforward and recurrent processing pathways. We evaluate the model on the MNIST and F-MNIST datasets and propose that several aspects of the model are attractive for machine learning and brain-like neuromorphic hardware design.
Naresh Balaji Ravichandran · Anders Lansner · Pawel Herman

Dataset Pruning Using Early Exit Networks (Poster)
We present EEPrune, a novel dataset pruning algorithm that leverages early exit networks during training. EEPrune utilizes the innate ability of early exit networks to assess the difficulty of individual samples and applies different criteria to decide whether to prune them. Specifically, for a training sample to be discarded, the confidence level of the model at the early exit should be above a certain threshold, along with a correct classification at both the early exit and final layers. We describe several other variants of our EEPrune algorithm. Extensive experiments on CIFAR-10, CIFAR-100 and Tiny Imagenet datasets demonstrate that EEPrune and its variations consistently outperform other dataset pruning methods.
Alperen Gormez · Erdem Koyuncu
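
A minimal sketch of the pruning criterion as described in the abstract (tensor names and the threshold value are assumptions): a training sample is discarded only if the early exit is confident above a threshold and both the early-exit and final predictions are correct.

```python
# Hypothetical sketch of the EEPrune keep/discard decision from pre-computed outputs.
import torch

torch.manual_seed(0)
n, c, tau = 1000, 10, 0.9
labels = torch.randint(0, c, (n,))
early_logits = torch.randn(n, c)        # stand-ins for early-exit outputs
final_logits = torch.randn(n, c)        # stand-ins for final-layer outputs

early_prob = early_logits.softmax(dim=1)
conf, early_pred = early_prob.max(dim=1)
final_pred = final_logits.argmax(dim=1)

discard = (conf > tau) & (early_pred == labels) & (final_pred == labels)
keep_indices = torch.nonzero(~discard).squeeze(1)   # indices of the pruned training set
print(f"kept {keep_indices.numel()} of {n} samples")
```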

Dual Propagation: Accelerating Contrastive Hebbian Learning with Dyadic Neurons (Poster)
Activity-difference-based learning algorithms---such as contrastive Hebbian learning and equilibrium propagation---have been proposed as biologically plausible alternatives to error back-propagation. However, on traditional digital chips these algorithms suffer from having to solve a costly inference problem twice, making these approaches more than two orders of magnitude slower than back-propagation. In the analog realm equilibrium propagation may be promising for fast and energy-efficient learning, but states still need to be inferred and stored twice. Inspired by lifted neural networks and compartmental neuron models, we propose a simple energy-based compartmental neuron model, termed dual propagation, in which each neuron is a dyad with two intrinsic states. At inference time these intrinsic states encode the error/activity duality through their difference and their mean, respectively. The advantage of this method is that only a single inference phase is needed and that inference can be solved in layer-wise closed form. Experimentally, we show on common computer vision datasets, including Imagenet32x32, that dual propagation performs equivalently to back-propagation both in terms of accuracy and runtime.
Rasmus Kjær Høier · D. Staudt · Christopher Zach

MOLE: MOdular Learning FramEwork via Mutual Information Maximization (Poster)
This paper introduces an asynchronous and local learning framework for neural networks, named the Modular Learning Framework (MOLE). This framework modularizes neural networks by layers, defines the training objective via mutual information for each module, and sequentially trains each module by mutual information maximization. MOLE turns training into local optimization with gradients isolated across modules, a scheme that is more biologically plausible than BP. We run experiments on vector-, grid- and graph-type data. In particular, this framework is capable of solving both graph- and node-level tasks for graph-type data. Therefore, MOLE has been experimentally proven to be universally applicable to different types of data.
Tianchao Li · Yulong Pei

Preventing Dimensional Collapse in Contrastive Local Learning with Subsampling (Poster)
This paper presents an investigation of the challenges of training Deep Neural Networks (DNNs) via self-supervised objectives, using local learning as a parallelizable alternative to traditional backpropagation. In our approach, DNNs are segmented into distinct blocks, each updated independently via gradients provided by small local auxiliary Neural Networks (NNs). Despite the evident computational benefits, extensive splits often result in performance degradation. Through analysis of a synthetic example, we identify a layer-wise dimensional collapse as a major factor behind such performance losses. To counter this, we propose a novel and straightforward sampling strategy based on blockwise feature-similarity, explicitly designed to evade such dimensional collapse.
Louis Fournier · Adeetya Patel · Michael Eickenberg · Edouard Oyallon · Eugene Belilovsky
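
A minimal sketch of the gradient-isolated block training that the paper builds on (the contrastive objective and the proposed similarity-based subsampling are omitted; a simple local classification head stands in for the auxiliary network): gradients never cross block boundaries because each block's input is detached.

```python
# Hypothetical sketch: blocks updated only by small local auxiliary heads (no global BP).
import torch
import torch.nn as nn

torch.manual_seed(0)
blocks = nn.ModuleList([nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
                        nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
                        nn.Sequential(nn.Linear(64, 64), nn.ReLU())])
aux_heads = nn.ModuleList([nn.Linear(64, 10) for _ in blocks])   # local auxiliary NNs
opts = [torch.optim.Adam(list(b.parameters()) + list(h.parameters()), lr=1e-3)
        for b, h in zip(blocks, aux_heads)]

x = torch.randn(128, 32)
y = torch.randint(0, 10, (128,))

h = x
for block, head, opt in zip(blocks, aux_heads, opts):
    h = block(h)
    loss = nn.functional.cross_entropy(head(h), y)   # purely local objective
    opt.zero_grad(); loss.backward(); opt.step()
    h = h.detach()                                   # gradient isolation between blocks
```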

The Local Inconsistency Resolution Algorithm (Poster)
We present a generic algorithm for learning and approximate inference across a broad class of statistical models that unifies many approaches in the literature. Our algorithm, called local inconsistency resolution (LIR), has an intuitive epistemic interpretation. It is based on the theory of probabilistic dependency graphs (PDGs), an expressive class of graphical models rooted in information theory, which can capture inconsistent beliefs.
Oliver Richardson

Gradient Scaling on Deep Spiking Neural Networks with Spike-Dependent Local Information (Poster)
Deep spiking neural networks (SNNs) are promising because they combine the model capacity of deep architectures with the energy efficiency of spiking operations. To train deep SNNs, spatio-temporal backpropagation (STBP) with surrogate gradients was recently proposed. Although deep SNNs have been successfully trained with STBP, they cannot fully utilize spike information. In this work, we propose gradient scaling with local spike information, namely the relation between pre- and post-synaptic spikes. By considering the causality between spikes, we can enhance the training performance of deep SNNs. According to our experiments, we achieve higher accuracy with fewer spikes by adopting gradient scaling on image classification tasks such as CIFAR10 and CIFAR100.
Seongsik Park · Jeonghee Jo · Jongkil Park · Yeonjoo Jeong · Jaewook Kim · Suyoun Lee · Joon Young Kwak · Inho Kim · Jong-keuk Park · Kyeong Lee · Hwang Weon · Hyun Jae Jang

Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation (Poster)
We propose a new model, the independent linear Markov game, for multi-agent reinforcement learning with a large state space and a large number of agents. This is a class of Markov games with independent linear function approximation, where each agent has its own function approximation for the state-action value functions that are marginalized by other players' policies. We design new algorithms for learning the Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that only scale polynomially with each agent's own function class complexity, thus breaking the curse of multiagents. In contrast, existing works for Markov games with function approximation have sample complexity bounds that scale with the size of the joint action space when specialized to the canonical tabular Markov game setting, which is exponentially large in the number of agents. Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle non-stationarity incurred by multiple agents and the use of function approximation; (2) separating learning Markov equilibria and exploration in the Markov games, which allows us to use the full-information no-regret learning oracle instead of the stronger bandit-feedback no-regret learning oracle used in the tabular setting. Furthermore, we propose an iterative-best-response type algorithm that can learn pure Markov Nash equilibria in independent linear Markov potential games, with applications in learning in congestion games. In the tabular case, by adapting the policy replay mechanism for independent linear Markov games, we propose an algorithm with $\widetilde{O}(\epsilon^{-2})$ sample complexity to learn Markov CCE, which improves the state-of-the-art result $\widetilde{O}(\epsilon^{-3})$ of Daskalakis et al. (2022), where $\epsilon$ is the desired accuracy, and also significantly improves other problem parameters. Furthermore, we design the first provably efficient algorithm for learning Markov CE that breaks the curse of multiagents.
Qiwen Cui · Kaiqing Zhang · Simon Du

Understanding Predictive Coding as a Second-Order Trust-Region Method (Poster)
Predictive coding (PC) is a brain-inspired local learning algorithm that has recently been suggested to provide advantages over backpropagation (BP) in biologically relevant scenarios. While theoretical work has mainly focused on the conditions under which PC can approximate or equal BP, how PC in its "natural regime" differs from BP is less understood. Here we develop a theory of PC as an adaptive trust-region (TR) method that uses second-order information. We show that the weight update of PC can be interpreted as shifting BP's loss gradient towards a TR direction found by the PC inference dynamics. Our theory suggests that PC should escape saddle points faster than BP, a prediction which we prove in a shallow linear model and support with experiments on deep networks. This work lays a theoretical foundation for understanding other suggested benefits of PC.
Francesco Innocenti · Ryan Singh · Christopher Buckley

Unlocking the Potential of Similarity Matching: Scalability, Supervision and Pre-training (Poster)
While effective, the backpropagation (BP) algorithm exhibits limitations in terms of biological plausibility, computational cost, and suitability for online learning. As a result, there has been a growing interest in developing alternative biologically plausible learning approaches that rely on local learning rules. This study focuses on the primarily unsupervised similarity matching (SM) framework, which aligns with observed mechanisms in biological systems and offers online, localized, and biologically plausible algorithms. i) To scale SM to large datasets, we propose an implementation of Convolutional Nonnegative SM using PyTorch. ii) We introduce a localized supervised SM objective reminiscent of canonical correlation analysis, facilitating stacking SM layers. iii) We leverage the PyTorch implementation for pre-training architectures such as LeNet and compare the evaluation of features against BP-trained models. This work combines biologically plausible algorithms with computational efficiency, opening multiple avenues for further explorations.
Yanis Bahroun · Shagesh Sridharan · Atithi Acharya · Dmitri Chklovskii · Anirvan Sengupta
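
For readers new to the SM framework, a heavily hedged sketch of a classical Hebbian/anti-Hebbian similarity-matching network may help (update constants and dynamics are simplified; this is not the paper's convolutional nonnegative PyTorch implementation): recurrent dynamics compute the output for each input, after which feedforward weights receive a Hebbian update and lateral weights an anti-Hebbian one, all using only locally available quantities.

```python
# Hedged sketch of a Hebbian/anti-Hebbian similarity-matching network (simplified).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, eta = 20, 5, 0.01
W = rng.normal(0, 0.1, (d_out, d_in))     # feedforward (Hebbian) weights
M = np.zeros((d_out, d_out))              # lateral (anti-Hebbian) weights, zero diagonal

def infer(x, steps=50, dt=0.1):
    y = np.zeros(d_out)
    for _ in range(steps):                # recurrent dynamics toward a fixed point
        y = y + dt * (W @ x - M @ y - y)
    return y

for t in range(2000):                     # online, local updates per sample
    x = rng.normal(size=d_in)
    y = infer(x)
    W += eta * (np.outer(y, x) - W)       # Hebbian feedforward update
    M += eta * (np.outer(y, y) - M)       # anti-Hebbian lateral update
    np.fill_diagonal(M, 0.0)
```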

Beyond weight plasticity: Local learning with propagation delays in spiking neural networks (Poster)
We propose a novel local learning rule for spiking neural networks in which spike propagation times undergo activity-dependent plasticity. Our plasticity rule aligns pre-synaptic spike times to produce a stronger and more rapid response. Inputs are encoded by latency coding and outputs decoded by matching similar patterns of output spiking activity. We demonstrate the use of this method in a three-layer feedforward network with inputs from a database of handwritten digits. Networks consistently showed improved classification accuracy after training, and training with this method also allowed networks to generalize to an input class unseen during training. Our proposed method takes advantage of the ability of spiking neurons to support many different time-locked sequences of spikes, each of which can be activated by different input activations. The proof-of-concept shown here demonstrates the great potential for local delay learning to expand the memory capacity and generalizability of spiking neural networks.
Jørgen Farner · Ola Ramstad · Stefano Nichele · Kristine Heiney
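
Purely as an illustration of delay (rather than weight) plasticity, here is a toy alignment rule, explicitly not the authors' rule: each synaptic delay is nudged so that its spike's arrival time moves toward the earliest arrival for a repeated latency-coded pattern, making presynaptic spikes coincide at the postsynaptic neuron.

```python
# Toy illustration of activity-dependent delay plasticity (illustrative rule only).
import numpy as np

rng = np.random.default_rng(0)
n_syn, eta = 10, 0.2
delays = rng.uniform(1.0, 10.0, n_syn)            # per-synapse propagation delays (ms)
pattern = rng.uniform(0.0, 5.0, n_syn)            # repeated latency-coded input (ms)

for trial in range(200):
    arrivals = pattern + delays                   # postsynaptic arrival times
    target = arrivals.min()                       # align toward the earliest arrival
    delays -= eta * (arrivals - target)           # shrink delays of late-arriving spikes
    delays = np.clip(delays, 0.1, None)           # delays stay positive

print("arrival-time spread after training:", (pattern + delays).ptp())
```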

Auto-Aligning Multiagent Incentives with Global Objectives (Poster)
The general ability to achieve a singular task with a set of decentralized, intelligent agents is an important goal in multiagent research. The complex interaction between individual agents' incentives makes designing their objectives such that the resulting multiagent system aligns with a desired global goal particularly challenging. In this work, instead of considering the problem of designing suitable incentives from scratch, we assume a multiagent system with given preset incentives and consider automatically modifying these incentives online to achieve a new goal. This reduces the search space over possible individual incentives and takes advantage of the effort instilled by the previous system designer. We demonstrate the promise as well as the limitations of re-purposing multiagent systems in this way, both theoretically and empirically, on a variety of domains. Surprisingly, we show that training a diverse multiagent system to align with a modified global objective ($g \rightarrow g'$) can, in at least one case, lead to better generalization performance in unseen test scenarios, when evaluated on the original objective ($g$).
Minae Kwon · John Agapiou · Edgar Duéñez-Guzmán · Romuald Elie · Georgios Piliouras · Kalesha Bullard · Ian Gemp

Layer-Wise Feedback Alignment is Conserved in Deep Neural Networks (Poster)
In the quest to enhance the efficiency and bio-plausibility of training deep neural networks, Feedback Alignment (FA), which replaces the backward pass weights with random matrices in the training process, has emerged as an alternative to traditional backpropagation. While the appeal of FA lies in its circumvention of computational challenges and its plausible biological alignment, the theoretical understanding of this learning rule remains partial. This paper uncovers a set of conservation laws underpinning the learning dynamics of FA, revealing intriguing parallels between FA and Gradient Descent (GD). Our analysis reveals that FA harbors implicit biases akin to those exhibited by GD, challenging the prevailing narrative that these learning algorithms are fundamentally different. Moreover, we demonstrate that these conservation laws elucidate sufficient conditions for layer-wise alignment with feedback matrices in ReLU networks. We further show that this implies over-parameterized two-layer linear networks trained with FA converge to minimum-norm solutions. The implications of our findings offer avenues for developing more efficient and biologically plausible alternatives to backpropagation through an understanding of the principles governing learning dynamics in deep networks.
Zach Robertson · Sanmi Koyejo
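
For context on the learning rule analyzed above, here is a minimal sketch of feedback alignment on a two-layer network (a standard textbook formulation, with arbitrary sizes and a toy regression target): the backward pass uses a fixed random matrix B in place of the transpose of the forward weights.

```python
# Minimal feedback-alignment sketch: fixed random feedback weights replace W2.T.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 10, 32, 1, 0.01
W1 = rng.normal(0, 0.1, (n_hid, n_in))
W2 = rng.normal(0, 0.1, (n_out, n_hid))
B = rng.normal(0, 0.1, (n_hid, n_out))     # fixed random feedback matrix

X = rng.normal(size=(256, n_in))
y = X @ rng.normal(size=(n_in, n_out))     # a toy linear regression target

for epoch in range(200):
    h = np.maximum(W1 @ X.T, 0.0)          # (n_hid, batch) ReLU hidden layer
    out = W2 @ h                           # (n_out, batch)
    e = out - y.T                          # output error
    dW2 = e @ h.T / len(X)
    dh = (B @ e) * (h > 0)                 # feedback alignment: B instead of W2.T
    dW1 = dh @ X / len(X)
    W1 -= lr * dW1
    W2 -= lr * dW2
```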