Agenda

Friday, April 5th
09:00–09:30  Registration and Reception
09:30–10:00  Opening Ceremonies
10:00–10:30  Ryan Kidd, ML Alignment & Theory Scholars Program (MATS)
“Insights from two years of AI safety field-building at MATS”
10:30–11:00  Jesse Hoogland, Timaeus
“The Structure and Development of Neural Networks”
11:00–11:30  Miki Aoyagi, CST, Nihon University
“Consideration on the Learning Efficiency of Multiple-Layered Neural Networks with Linear Units”
11:30–12:00  Stan van Wingerden, Timaeus
“Singular Learning Theory and Alignment”
12:00–13:30  Lunch Break
13:30–14:00  Keynote: Dan Hendrycks, Center for AI Safety
“The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning”
14:00–14:30  Scott Emmons, Center for Human-Compatible AI, UC Berkeley
“Challenges with Partial Observability of Human Evaluators in Reward Learning”
14:30–15:00  Oliver Klingefjord, Meaning Alignment Institute
“What are human values, and how do we align to them?”
15:00–15:50  James Fox, LISA / University of Oxford
& Matt MacDermott, Imperial College London / Causal Incentives Group
“Towards Causal Foundations of Safe AGI”
15:50–17:00  Poster Session
Saturday, April 6th
09:00–09:30  Registration and Reception
09:30–09:45  Welcome Back
09:45–10:00  Andrei Krutikov, Noeon Research
“Noeon Research’s AI safety agenda”
10:00–10:30  Manuel Baltieri, Chief Researcher, Araya Inc.; Board Member, ISAL (International Society for Artificial Life)
“The Role of the Free Energy Principle in AI Safety: Markov Blankets and Beyond”
10:30–11:00  Martin Biehl, Cross Labs, Cross Compass
“When can we see a Moore machine as an agent?”
11:00–11:30  Tim Parker, IRIT
“Formalizing Ethics In Logic”
11:30–12:00  Koen Holtman, Holtman Research Systems
“Agent Foundations for Corrigible and Domesticated AGIs”
12:00–13:30  Lunch Break
13:30–14:00  Robert Miles, Independent
“Research Communication is Vital and You Can Do Better”
14:00–14:30  Hoagy Cunningham, Anthropic
“Finding distributed features in LLMs with sparse autoencoders”
14:30–15:00  Oskar John Hollinsworth, SERI MATS
“Linear Representations of Sentiment in Large Language Models”
15:00–15:30  Coffee/Tea Break
15:30–16:00  Aleksandar Petrov, University of Oxford
“Universal Approximation via Prefix-tuning a Single Transformer Attention Head”
16:00–16:30  Noah Y. Siegel, Google DeepMind
“Are LLM-Generated Explanations Faithful?”
16:30–16:45  Best Poster Award and Closing Ceremonies