Agenda
| Friday, April 5th | |
| --- | --- |
| 09:00–09:30 | Registration and Reception |
| 09:30–10:00 | Opening Ceremonies |
| 10:00–10:30 | Ryan Kidd, ML Alignment & Theory Scholars Program (MATS) “Insights from two years of AI safety field-building at MATS” |
| 10:30–11:00 | Jesse Hoogland, Timaeus “The Structure and Development of Neural Networks” |
| 11:00–11:30 | Miki Aoyagi, CST, Nihon University “Consideration on the Learning Efficiency of Multiple-Layered Neural Networks with Linear Units” |
| 11:30–12:00 | Stan van Wingerden, Timaeus “Singular Learning Theory and Alignment” |
| 12:00–13:30 | Lunch Break |
| 13:30–14:00 | Keynote: Dan Hendrycks, Center for AI Safety “The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning” |
| 14:00–14:30 | Scott Emmons, Center for Human-Compatible AI, UC Berkeley “Challenges with Partial Observability of Human Evaluators in Reward Learning” |
| 14:30–15:00 | Oliver Klingefjord, Meaning Alignment Institute “What are human values, and how do we align to them?” |
| 15:00–15:50 | James Fox, LISA / University of Oxford & Matt MacDermott, Imperial College London / Causal Incentives Group “Towards Causal Foundations of Safe AGI” |
| 15:50–17:00 | Poster Session |

| Saturday, April 6th | |
| --- | --- |
| 09:00–09:30 | Registration and Reception |
| 09:30–09:45 | Welcome back |
| 09:45–10:00 | Andrei Krutikov, Noeon Research “Noeon Research’s AI safety agenda” |
| 10:00–10:30 | Manuel Baltieri, Chief Researcher at Araya Inc. and Board Member of the International Society for Artificial Life (ISAL) “The Role of the Free Energy Principle in AI Safety: Markov Blankets and Beyond” |
| 10:30–11:00 | Martin Biehl, Cross Labs, Cross Compass “When can we see a Moore machine as an agent?” |
| 11:00–11:30 | Tim Parker, IRIT “Formalizing Ethics In Logic” |
| 11:30–12:00 | Koen Holtman, Holtman Research Systems “Agent Foundations for Corrigible and Domesticated AGIs” |
| 12:00–13:30 | Lunch Break |
| 13:30–14:00 | Robert Miles, Independent “Research Communication is Vital and You Can Do Better” |
| 14:00–14:30 | Hoagy Cunningham, Anthropic “Finding distributed features in LLMs with sparse autoencoders” |
| 14:30–15:00 | Oskar John Hollinsworth, SERI MATS “Linear Representations of Sentiment in Large Language Models” |
| 15:00–15:30 | Coffee/Tea Break |
| 15:30–16:00 | Aleksandar Petrov, University of Oxford “Universal Approximation via Prefix-tuning a Single Transformer Attention Head” |
| 16:00–16:30 | Noah Y. Siegel, Google DeepMind “Are LLM-Generated Explanations Faithful?” |
| 16:30–16:45 | Best Poster Award and Closing Ceremonies |