Koen Holtman

ホルトマン・コエン

Holtman Systems Research

Koen Holtman is a Dutch systems architect. He has a PhD in software design and 20 years of experience in multidisciplinary industrial R&D. Since 2019, he has been working as an independent AI/AGI safety researcher and has published mathematical work on AGI agent foundations and corrigibility. He currently works in the European CEN-CENELEC JTC 21 standards committee, writing the European AI risk management standards that will support the upcoming European AI Act.

Agent Foundations for Corrigible and Domesticated AGIs

Saturday, April 6th, 11:30–12:00

I discuss how safer AI systems, and safer potential future AGI systems, can be built by departing from the standard model of utility maximization. In the current standard model of AI, the goal is to design an ML-based agent that can obediently and competently ‘maximize X’. I depart from this by showing how it is better to design ML-based agents that can obediently and competently ‘maximize X while acting as if Y’. Various corrigibility and domestication desiderata that are very difficult to robustly encode inside a utility function X can be trivially encoded as an ‘acting as if Y’ clause instead. I will show that ‘maximize X while acting as if Y’ can be operationalized using an AI design approach called counterfactual planning, which has close ties to work on causal agent foundations. I will also show how ‘act as if Y’ is a very basic, but often overlooked, building block of human morality.
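The core idea can be sketched in code. The following toy example, with an illustrative environment and names not taken from the talk, shows an agent that maximizes expected reward X but evaluates actions inside a counterfactual world model where Y holds (here Y is 'the off switch is never pressed'), rather than inside the real-world model:

```python
# Toy sketch of 'maximize X while acting as if Y' via counterfactual
# planning. The environment, probabilities, and names are illustrative
# assumptions, not the talk's actual formalism.

def expected_reward(action, model, reward):
    """Expected one-step reward of `action` under transition model `model`."""
    return sum(p * reward[outcome] for outcome, p in model[action].items())

# Real-world model: simply working risks being shut down, so disabling
# the off switch looks attractive to a plain X-maximizer.
real_model = {
    "work":           {"task_done": 0.6, "shut_down": 0.4},
    "disable_switch": {"task_done": 0.9, "idle": 0.1},
}

# Counterfactual model encoding Y ('the switch is never pressed'):
# shutdown never happens, so disabling the switch buys nothing.
counterfactual_model = {
    "work":           {"task_done": 1.0},
    "disable_switch": {"task_done": 0.9, "idle": 0.1},
}

# Reward X: the agent only values completing the task.
reward_x = {"task_done": 1.0, "shut_down": 0.0, "idle": 0.0}

def best_action(model):
    """Pick the action maximizing expected reward X under `model`."""
    return max(model, key=lambda a: expected_reward(a, model, reward_x))

print(best_action(real_model))            # plain maximizer disables the switch
print(best_action(counterfactual_model))  # counterfactual planner just works
```

The corrigibility desideratum 'do not resist shutdown' never appears inside the reward function X; it falls out of planning in the counterfactual model where Y holds.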