AI Safety

Aligning AI

Central Problems in AI Safety

23

Learning Values

24

Power-Seeking

25

Agent Foundations

26

Learning from Humans

27

Decomposing Tasks

28

Interpretability

29

Governance