Advanced Machine Intelligence: Orthogonality, Instrumental Convergence and the Dangers of Value Misalignment

Edward Potts, 2021, Stage 3

The object of this dissertation is artificial intelligence (AI), and in particular AI risk, or AI safety. I argue for the truth of Bostrom’s (2012) orthogonality thesis – contextualised with reference to Hume’s (2007) is-ought distinction – and of the instrumental convergence thesis, developed initially by Omohundro (2008) in terms of the “Basic AI Drives”. In combination, these theses show that the default outcome of advanced AI (AGI and ASI) is existential catastrophe, and thereby establish the importance of ensuring that the value systems of advanced artificial agents are human-compatible. I consider two main approaches to the value alignment problem – direct specification and value learning – and point out the flaws in each. While this project does not offer its own approach to value alignment, the central concern of AI safety, it does emphasise the need for AI research to undergo a perspectival shift and focus on the search for one. That is, the AI community should be concerned foremost with AI safety rather than AI capability.
