Noah Y. Siegel
シーゲル・ノア
Google DeepMind
Noah Y. Siegel is a senior research engineer at Google DeepMind. He currently researches large language model reasoning, including faithfulness and debate. He previously worked on deep reinforcement learning for robotic control.

Are LLM-Generated Explanations Faithful?
Saturday, April 6th, 16:00–16:30
In order to oversee advanced AI systems, it is important to understand their reasons for generating a given output. When prompted, large language models (LLMs) can provide natural language explanations or reasoning traces that sound plausible and receive high ratings from human annotators. However, it is unclear to what extent these explanations truly capture the factors responsible for the model’s predictions: the most “human-like” explanation may differ from the one that is most faithful to the model’s true decision-making process. In this talk, I’ll give an overview of ways researchers have tried to measure and improve faithfulness, and discuss our own work on improving faithfulness metrics.