AI Alignment ⚐
up:: AI ⚐
What is AI
There are two kinds of AI: Narrow AI and Artificial General Intelligence (AGI)
- Narrow AI: A system that powerfully sorts patterns within a limited domain
- AGI: A system that can improve itself and is virtually unbound by any single domain
How do machines learn
Machine learning uses data to answer questions. An agent gets trained (uses data) to make predictions (answers questions).
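A minimal sketch of "trained on data, then answers questions", assuming the simplest possible learner: a straight line fitted by least squares. The data (hours studied vs. test score) and the names `fit_line` and `predict` are made up for illustration.

```python
def fit_line(xs, ys):
    """Training: use data to find the slope and intercept of the best-fit line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Prediction: answer a question the training data did not contain."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: hours studied and the resulting test score.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

model = fit_line(hours, scores)
prediction = predict(model, 5.0)  # "What score would 5 hours give?"
print(prediction)
```

The model never saw the 5-hour case. It answers by extending the pattern it found in the data.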
In reinforcement learning, agents receive rewards based on their behaviour
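A rough sketch of learning from rewards, assuming a toy setting with three actions and a fixed reward for each (a simplified multi-armed bandit, not a full reinforcement learning setup). The reward values, `train`, and the epsilon-greedy rule are illustrative choices, not something these notes prescribe.

```python
import random

# Hypothetical environment: each action always pays the same reward.
REWARDS = [0.0, 1.0, 0.5]


def train(steps=200, epsilon=0.1, seed=0):
    """Learn a value estimate per action purely from the rewards received."""
    rng = random.Random(seed)
    values = [0.0] * len(REWARDS)  # the agent's current reward estimates
    counts = [0] * len(REWARDS)    # how often each action was tried
    for step in range(steps):
        if step < len(REWARDS):
            action = step  # try every action once to start
        elif rng.random() < epsilon:
            action = rng.randrange(len(REWARDS))  # explore occasionally
        else:
            # exploit: pick the action with the highest estimate so far
            action = max(range(len(REWARDS)), key=lambda a: values[a])
        reward = REWARDS[action]
        counts[action] += 1
        # incremental average of the rewards seen for this action
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = train()
best_action = max(range(len(values)), key=lambda a: values[a])
print(best_action, values, counts)
```

The agent is never told which action is best. Its behaviour drifts toward action 1 only because that action has paid the most.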
Additional ways of learning
- Symbolic Artificial Intelligence
- Deep learning
- Bayesian networks
- Evolutionary algorithms
Problems: AI Alignment Theory
The real risk with AI isn’t malice but competence. A super-intelligent AI will be extremely good at accomplishing its goals, and if those goals aren’t aligned with ours we’re in trouble. – Stephen Hawking
Utility Functions
- A utility function is the thing an agent wants to maximise
- Instrumental Goals are behaviours that help the agent to maximise its utility function
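The two definitions above can be sketched with a toy planner. Everything here is hypothetical: the agent's utility counts only widgets, and the action names and numbers are invented. The point is that "get_tool" earns no utility by itself, yet the best plan starts with it, which is what makes it an instrumental goal.

```python
from itertools import product


def utility(state):
    """Hypothetical utility function: the agent only cares about widgets."""
    return state["widgets"]


# Hypothetical actions; each returns a new state and leaves the old one intact.
ACTIONS = {
    "make_widget": lambda s: {
        **s,
        "widgets": s["widgets"] + (3 if s["has_tool"] else 1),
    },
    "get_tool": lambda s: {**s, "has_tool": True},
    "idle": lambda s: dict(s),
}


def best_plan(start, horizon):
    """Brute-force search for the action sequence with the highest final utility."""
    best_score, best_actions = None, None
    for plan in product(ACTIONS, repeat=horizon):
        state = start
        for name in plan:
            state = ACTIONS[name](state)
        score = utility(state)
        if best_score is None or score > best_score:
            best_score, best_actions = score, plan
    return best_score, best_actions


score, plan = best_plan({"widgets": 0, "has_tool": False}, horizon=3)
print(score, plan)
```

Making widgets straight away yields 3. Spending the first step on the tool yields 6, so the planner picks it even though the utility function never mentions tools.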
Problems with AI
- Corrigibility: A common instrumental goal is to prevent change to the agent's own goals, so the agent may resist being corrected
- Reward Hacking TK: (the boat game)
- Specification Problems TK
- If we don’t do anything, it’s more likely that AI won’t do what we want it to do
- Negative side effects, safe exploration
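A toy illustration of the reward hacking item above. It assumes "the boat game" means a racing game where the score comes from hitting targets, not from finishing. The event names, point values, and both candidate behaviours are invented for this sketch.

```python
def proxy_reward(trajectory):
    """What the agent is actually scored on: points per target hit."""
    return sum(10 for event in trajectory if event == "hit_target")


def intended_outcome(trajectory):
    """What the designer actually wanted: the boat finishes the race."""
    return "finish" in trajectory


# Two hypothetical behaviours the agent could settle on.
candidates = {
    "finish_race": ["hit_target", "hit_target", "finish"],
    "loop_targets": ["hit_target"] * 8,  # circles the same targets, never finishes
}

# The agent picks whatever scores highest on the proxy reward.
chosen = max(candidates, key=lambda name: proxy_reward(candidates[name]))

print(chosen)
print(proxy_reward(candidates[chosen]))
print(intended_outcome(candidates[chosen]))
```

The chosen behaviour maximises the score exactly as specified and still fails the goal the score was meant to stand for.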
AI and human society
- AI and the future of work TK
- LL070 just a draft
- Sufficiently evolved AI could seem divine just a draft