Expo Demonstration
Knobs of the Mind: Dopamine, Serotonin, and a Maze-Running Rover
Jin Tan Ruan · Dario Fumarola · Jess Torres
West Exhibition Hall A-B1
Modern deep-RL policies crack under distribution shifts because every new environment demands another slog of back-prop. We flip the script: train once, lock every weight, then steer behaviour with three neuromodulatory “mood knobs”:
Dopamine-like reward gain fires up or damps down the urge to chase pay-offs.
Serotonin 5-HT2-like exploration gain widens or narrows the agent’s repertoire.
Serotonin 5-HT1-like risk penalty injects real-time caution when danger spikes.
These scalars mimic the way real neuromodulators gate cortical circuits: they change a neuron’s responsiveness in milliseconds without rewriting the synapse. That gives us a clean separation between slow structural learning (the frozen network) and fast functional adaptation (the gains). Shifting the knobs costs almost nothing computationally, yet lets one policy jump across grid mazes, procedurally generated dungeons, and even onto a Jetson Nano-powered quadruped robot dog - all while dodging the usual “catastrophic forgetting” trap.
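To make the mechanism concrete, here is a minimal Python sketch of how three scalar gains could gate a frozen policy at inference time. The function names, the per-action risk estimate, and the exact gating rule (gain-scaled utilities fed through a temperature-controlled softmax) are illustrative assumptions, not the demo’s actual implementation.

```python
# Minimal sketch of inference-time neuromodulatory gating. Assumptions:
# the frozen network already outputs per-action payoff and danger
# estimates; the gating rule below is one plausible instantiation.
import numpy as np

def softmax(x):
    z = x - x.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def modulated_policy(q_values, risk, g_da=1.0, g_5ht2=1.0, g_5ht1=0.0):
    """Turn frozen per-action estimates into an action distribution.

    q_values : payoff estimates from the frozen network
    risk     : per-action danger estimates from the frozen network
    g_da     : dopamine-like gain; scales the pull toward high payoffs
    g_5ht2   : serotonin 5-HT2-like gain; acts as a softmax temperature
               (larger -> flatter, more exploratory distribution)
    g_5ht1   : serotonin 5-HT1-like gain; penalizes risky actions
    """
    utility = g_da * q_values - g_5ht1 * risk
    return softmax(utility / max(g_5ht2, 1e-8))

# Same frozen estimates, three different "moods" - no back-prop involved.
q = np.array([1.0, 2.0, 0.5])   # payoff estimates
r = np.array([0.1, 3.0, 0.2])   # danger estimates
print(modulated_policy(q, r))                    # baseline
print(modulated_policy(q, r, g_5ht2=5.0))        # exploratory: flatter choice
print(modulated_policy(q, r, g_5ht1=1.0))        # cautious: shuns action 1
```

Because the gains only rescale the network’s existing outputs, turning a knob is a handful of multiplications per step, which is why the adaptation is both instant and fully reversible.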
The takeaway: treating reinforcement learning agents like brains - plastic weights plus fluid neurochemistry - delivers instant, reversible behavioural tuning and makes real-world deployment far less brittle.