Projects Directory

Discover active AI safety research projects across academic institutions, independent organizations, and community-driven initiatives. Find collaboration opportunities and track research outputs.

Showing 9 of 16 projects

Active · Academic

Circuits-Based Neural Network Interpretability

Reverse-engineering neural networks by identifying and understanding individual circuits and their functions within transformer models.

Chris Olah
Anthropic
Interpretability, Mechanistic Interpretability, Deep Learning
1283
Active · Academic

Constitutional AI Development

Developing AI systems that follow a set of principles (a 'constitution') to guide their behavior without extensive human feedback on every output.

Yuntao Bai
Anthropic
Alignment, Constitutional AI, RLHF
531
Active · LessWrong

Agent Foundations Research Agenda

Theoretical research on the mathematical foundations of aligned AI agents, including decision theory, logical uncertainty, and embedded agency.

Eliezer Yudkowsky
MIRI
Agent Foundations, Decision Theory, Alignment
20502
Active · EA Forum

Global AI Governance Mapping Project

Comprehensive mapping of AI governance initiatives, policies, and actors worldwide to inform effective policy interventions.

Allan Dafoe
Centre for the Governance of AI, Oxford University
AI Governance, AI Policy, Forecasting
8151
Seeking Collaborators · Academic

Scalable Oversight Methods

Developing techniques for humans to effectively oversee AI systems even when the AI is performing tasks the human cannot directly evaluate.

Jan Leike
OpenAI
Scalable Oversight, Alignment, Evaluation
642
Active · Academic

Systematic Red Teaming for Large Language Models

Developing comprehensive methodologies for identifying vulnerabilities, harmful outputs, and failure modes in large language models.

Deep Ganguli
Anthropic
Red Teaming, Evaluation, Robustness
462
Active · LessWrong

AI Deception Detection Research

Investigating methods to detect when AI systems are being deceptive or strategically withholding information from users or overseers.

Evan Hubinger
Anthropic, MIRI
Deception Detection, Alignment, Interpretability
3121
Active · Independent

Eliciting Latent Knowledge (ELK)

Research program focused on getting AI systems to honestly report their internal knowledge, even when they might have incentives to be deceptive.

Paul Christiano
ARC
Alignment, Interpretability, Value Learning
281
Active · Academic

Cooperative AI Foundation Research

Studying how to build AI systems that can cooperate effectively with humans and other AI systems, including multi-agent coordination.

Gillian Hadfield
DeepMind, Oxford University
Agent Foundations, AI Governance, Value Learning
1050