Agenda

Friday, April 5th
09:00–09:30  Registration and Reception
09:30–10:00  Opening Ceremonies
10:00–10:30  Ryan Kidd, ML Alignment & Theory Scholars Program (MATS)
“Insights from two years of AI safety field-building at MATS”
10:30–11:00  Jesse Hoogland, Timaeus
“The Structure and Development of Neural Networks”
11:00–11:30  Miki Aoyagi, CST, Nihon University
“Consideration on the Learning Efficiency of Multiple-Layered Neural Networks with Linear Units”
11:30–12:00  Stan van Wingerden, Timaeus
“Singular Learning Theory and Alignment”
12:00–13:30  Lunch Break
13:30–14:00  Keynote: Dan Hendrycks, Center for AI Safety
“The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning”
14:00–14:30  Scott Emmons, Center for Human-Compatible AI, UC Berkeley
“Challenges with Partial Observability of Human Evaluators in Reward Learning”
14:30–15:00  Oliver Klingefjord, Meaning Alignment Institute
“What are human values, and how do we align to them?”
15:00–15:50  James Fox, LISA / University of Oxford
& Matt MacDermott, Imperial College London / Causal Incentives Group
“Towards Causal Foundations of Safe AGI”
15:50–17:00  Poster Session
Saturday, April 6th
09:00–09:30  Registration and Reception
09:30–09:45  Welcome Back
09:45–10:00  Andrei Krutikov, Noeon Research
“Noeon Research’s AI safety agenda”
10:00–10:30  Manuel Baltieri, Chief Researcher, Araya Inc.; Board Member, ISAL (International Society for Artificial Life)
“The Role of the Free Energy Principle in AI Safety: Markov Blankets and Beyond”
10:30–11:00  Martin Biehl, Cross Labs, Cross Compass
“When can we see a Moore machine as an agent?”
11:00–11:30  Tim Parker, IRIT
“Formalizing Ethics In Logic”
11:30–12:00  Koen Holtman, Holtman Research Systems
“Agent Foundations for Corrigible and Domesticated AGIs”
12:00–13:30  Lunch Break
13:30–14:00  Robert Miles, Independent
“Research Communication is Vital and You Can Do Better”
14:00–14:30  Hoagy Cunningham, Anthropic
“Finding distributed features in LLMs with sparse autoencoders”
14:30–15:00  Oskar John Hollinsworth, SERI MATS
“Linear Representations of Sentiment in Large Language Models”
15:00–15:30  Coffee/Tea Break
15:30–16:00  Aleksandar Petrov, University of Oxford
“Universal Approximation via Prefix-tuning a Single Transformer Attention Head”
16:00–16:30  Noah Y. Siegel, Google DeepMind
“Are LLM-Generated Explanations Faithful?”
16:30–16:45  Best Poster Award and Closing Ceremonies