Projects Directory
Discover active AI safety research projects across academic institutions, independent organizations, and community-driven initiatives. Find collaboration opportunities and track research outputs.
Circuits-Based Neural Network Interpretability
Reverse-engineering the internal computations of neural networks by identifying circuits: subnetworks of features and weights that implement specific, human-interpretable behaviors.
Constitutional AI Development
Developing AI systems that follow a set of principles (a 'constitution') to guide their behavior without extensive human feedback on every output.
Agent Foundations Research Agenda
Theoretical research on the mathematical foundations of aligned AI agents, including decision theory, logical uncertainty, and embedded agency.
Global AI Governance Mapping Project
Mapping and comparing AI governance initiatives, regulations, and policy proposals across national governments and international bodies.
Scalable Oversight Methods
Developing techniques for humans to effectively oversee AI systems even when the AI is performing tasks the human cannot directly evaluate.
Systematic Red Teaming for Large Language Models
Developing comprehensive methodologies for identifying vulnerabilities, harmful outputs, and failure modes in large language models.
AI Deception Detection Research
Investigating methods to detect when AI systems are being deceptive or strategically withholding information from users or overseers.
Eliciting Latent Knowledge (ELK)
Researching training methods that get AI systems to report what they internally know, even when their outputs would otherwise be optimized to merely appear correct to human evaluators.