Read about our past and present projects.
ML Safety Scholars

We ran a course designed to introduce students with a background in machine learning to the most relevant concepts in empirical ML-based AI safety. The course is available publicly here. We plan to run another iteration in the fall.

August 18, 2022
Moral Uncertainty Competition

The objective of the competition is to train language models to detect whether a decision is morally ambiguous or clear-cut. We would like machine learning models to indicate when they are unsure what to do so that they can be overridden. This is especially important in ethical dilemmas, where there is often no consensus about what ought to be done.

August 17, 2022
The Trojan Detection Challenge

We're releasing the Trojan Detection Challenge, a NeurIPS 2022 competition with a $50K prize pool. This competition challenges contestants to detect and analyze Trojan attacks on deep neural networks, where the attacks are designed to be difficult to detect. The goal of the competition is to study the fundamental offense-defense balance of Trojan detection: How hard is it to detect hidden functionality that is trying to stay hidden?

July 14, 2022
NeurIPS 2022 Workshop

We are excited to announce the NeurIPS 2022 ML Safety workshop, which will bring together researchers from machine learning communities to focus on Robustness, Monitoring, Alignment, and Systemic Safety. $100K in prizes will be awarded. There will be 'Best Paper' awards and 'Best X-risk Analysis' awards. The ultimate goal of this workshop is to support the research community that is tackling AI safety issues and encourage more researchers to think about tail risks.

July 14, 2022