Poster in Workshop: Next Generation of AI Safety
Neural Interactive Proofs
Lewis Hammond · Sam Adam-Day
Keywords: [ Neural Networks ] [ Safety ] [ Interactive Proofs ] [ Multi-Agent Reinforcement Learning ] [ Game Theory ]
We consider the problem of how a trusted but computationally bounded agent (a ‘verifier’) can learn to interact with one or more powerful but untrusted agents (‘provers’) in order to solve a given task without being misled. More specifically, we study the case in which agents are represented using neural networks, and refer to solutions of this problem as neural interactive proofs. First, we introduce a unifying framework based on prover-verifier games (Anil et al., 2021), which generalises previously proposed interaction ‘protocols’. We then describe several new protocols for generating neural interactive proofs, and provide the first comprehensive theoretical comparison of both new and existing approaches. In so doing, we aim to create a foundation for future work on neural interactive proofs and their application to building safer AI systems.
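To make the setting concrete, here is a minimal sketch of one episode of a prover-verifier game. The names, message space, toy ‘slip’ probability, and reward scheme below are our own illustrative assumptions, not the protocols studied in the paper; in the neural setting, both agents would be trained networks, with the verifier's reward pushing it towards decisions that remain correct against misleading provers.

```python
# Minimal sketch of one episode of a prover-verifier game.
# All names, the message space, and the rewards are illustrative assumptions.
from dataclasses import dataclass, field
import random


@dataclass
class Episode:
    claim_is_true: bool                          # ground truth, hidden from the verifier
    transcript: list[str] = field(default_factory=list)


def prover_message(episode: Episode) -> str:
    """Untrusted prover: always argues for acceptance. When the claim is
    false it must fabricate support, which a good protocol makes detectable."""
    if episode.claim_is_true:
        return "valid-step"
    # A dishonest prover imitates valid messages, but slips with some probability.
    return "valid-step" if random.random() < 0.8 else "invalid-step"


def verifier_decision(episode: Episode) -> bool:
    """Bounded verifier: accepts iff every message in the transcript looks valid.
    In a neural interactive proof this would be a (small) trained network."""
    return all(msg == "valid-step" for msg in episode.transcript)


def play(rounds: int = 3) -> tuple[float, float]:
    """Play one episode and return (verifier_reward, prover_reward).
    The verifier is rewarded for deciding correctly; the prover for acceptance."""
    episode = Episode(claim_is_true=random.random() < 0.5)
    for _ in range(rounds):
        episode.transcript.append(prover_message(episode))
    accept = verifier_decision(episode)
    verifier_reward = 1.0 if accept == episode.claim_is_true else 0.0
    prover_reward = 1.0 if accept else 0.0
    return verifier_reward, prover_reward


if __name__ == "__main__":
    print(play())
```

Averaging the verifier's reward over many such episodes gives a rough sense of how often this toy verifier is misled; the protocols compared in the paper differ in how the interaction is structured so as to drive that rate down.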