Position: 'AI Alignment' Encompasses Competing Technical Priorities
Abstract
The ML literature contains many distinct concepts falling under the heading of 'AI alignment'. After identifying three concepts of AI alignment and situating these ideals in the context of their corresponding research programs, we argue that realistic interventions may promote 'AI alignment' under one conception while being actively counterproductive from the perspective of others. We suggest that tensions between alignment ideals emerge from differences in background threat models, alongside differences in both methodological and normative orientations. In light of our analysis, researchers who take their work to further the goal of 'AI alignment' should do three things. First, they should distinguish between 'AI alignment' as a high-level ideal and the specific 'alignment proxies' used in empirical research. Second, they should use more granular concepts to identify the source, in addition to the nature, of possible AI harms and benefits. Third, they should explicitly specify the non-technical background commitments motivating specific conceptions of 'AI alignment'.