Dr. Deborah Diaz

Google DeepMind

Links

Website @deborahdiaz

Research Topics

Value Alignment, Mechanistic Interpretability, AI Alignment, Deep Learning

About

Dr. Deborah Diaz is a researcher specializing in value alignment, mechanistic interpretability, AI alignment, and deep learning. They have published at venues including NeurIPS, ICML, and ICLR and are active in the academic community.

Scaling Monosemanticity: Extracting Interpretable Features from Large Language Models

NeurIPS 2024, 156 citations

Co-authors: Neel Nanda, Jan Leike

Toward Understanding of Circuits in Transformers

ICML 2023, 243 citations

Co-authors: Neel Nanda

Activation Patching: A Causal Lens on Neural Networks

ICLR 2023, 189 citations

Co-authors: Paul Christiano