As AI capabilities continue to improve dramatically, the need for safety research has become increasingly apparent. But given the relative youth of the field, much of the conceptual groundwork has yet to be done.
The CAIS Philosophy Fellowship invites philosophers from a variety of backgrounds to acquire an in-depth understanding of the current state of AI safety and contribute to novel and field-orienting research directions.
How the work of philosophers contributes to the broader sociotechnical AI safety community:
Identify a lack of conceptual clarity in the existing AI safety literature.
Dissect the problem using rigorous conceptual analysis and relevant philosophical literature.
Publish conceptual research to inform sociotechnical strategy.
Artificial intelligence is reshaping many aspects of day-to-day life. As AI systems continue on a trajectory to outperform humans on a wide range of cognitive tasks, questions about their properties and potential harms grow increasingly urgent.
Philosophers are experts at thinking hard about abstract conceptual problems with no clear answers. Their expertise in working with imprecise concepts makes them the ideal candidates to address the conceptual issues that are characteristic of AI safety.
Frameworks for analyzing which concerns are most urgent, which are most likely to cause serious harm, and how to navigate these risks enable researchers and key decision-makers to reassess their strategies.
This fellowship addresses the need for conceptual clarification through research and field-building efforts.
Our team of philosophers critiques and builds on the existing conceptual AI safety literature, producing new conceptual frameworks to guide technical research.
Thus far, our fellows have collectively produced eighteen original papers, soon to be published, covering topics including interpretability, corrigibility, and multipolar scenarios.
We aim for the influence of this fellowship to extend beyond our current cohort, promoting and incentivizing conceptual AI safety research within the broader academic philosophy community.
To date, our fellows have received $50,000 in funding to run a workshop connecting technical and conceptual AI safety researchers, organized numerous other workshops, and arranged a special issue of Philosophical Studies.
Gregory S. Kavka Distinguished Professor of Philosophy at the University of Michigan - Ann Arbor
Professor of Philosophy at the University of Oxford, Former Director of the Global Priorities Institute
Clark Professor of Philosophy at Yale University
Alexander von Humboldt Professor of Ethics and Philosophy of AI at the University of Erlangen-Nuremberg
Millstone Family Professor of Philosophy and Professor of Cognitive Science at Yale University
AI Research Scientist at DeepMind
Assistant Professor of Computer Science and AI at UC Berkeley
Assistant Professor of Computer Science and AI at Cambridge University
Chauncey Stillman Professor of Ethics at Duke University
Professor of Philosophy at Princeton University
Associate Professor of Philosophy at the University of California, Berkeley
Hastings Center senior advisor, ethicist, and scholar at Yale’s Center for Bioethics
Research Scientist at DeepMind
Stay up to date on the latest news and research from the CAIS Philosophy Fellowship. Sign up for email alerts and announcements of future programs.
October 24, 2023
Draft Published - Elliott Thornley: The Shutdown Problem: Three Theorems
October 7, 2023
Publication - Jacqueline Harding: Operationalising Representation in Natural Language Processing (British Journal for the Philosophy of Science)
August 28, 2023
Draft Published - Peter Park, Simon Goldstein, and CAIS contributors: AI Deception: A Survey of Examples, Risks, and Potential Solutions
August 9, 2023
Journal Article - Nathaniel Sharadin: Predicting and Preferring (Inquiry)
August 2, 2023
Journal Article - Simon Goldstein & Cameron Kirk-Giannini: Language Agents Reduce the Risk of Existential Catastrophe (AI & Society).
July 21, 2023
Workshop - 1st AI Impacts Workshop. This workshop, hosted by the AI & Humanity Lab at the University of Hong Kong, will focus on the topic of benchmarking for ML and AI systems. (March 14-15, 2024).
July 19, 2023
Draft Published - Mitchell Barrington: Absolutist AI.
July 14, 2023
Op-Ed - Jacqueline Harding & Cameron Kirk-Giannini: AI's Future Worries Us. So Does Its Present (Boston Globe).
July 6, 2023
Op-Ed - Nathaniel Sharadin: Hong Kong can be a leader in mitigating the dangers of AI (Hong Kong Free Press).
July 4, 2023
Blog Post - Simon Goldstein & Cameron Kirk-Giannini: A Case for AI Wellbeing (Daily Nous).
June 30, 2023
Publication - Jacqueline Harding, William D'Alessandro, Nicholas Laskowski, & Robert Long: AI Language Models Cannot Replace Human Research Participants (AI & Society).
June 20, 2023
Call for Papers - Submissions for the Philosophical Studies special issue on AI Safety (edited by Cameron Kirk-Giannini and Dan Hendrycks) are due by November 1!
June 14, 2023
Draft Published - J. Dmitri Gallow: Instrumental Convergence?
June 13, 2023
Draft Published - Frank Hong: Group Prioritarianism: Why AI Should Not Replace Humanity.
June 9, 2023
Op-ed - Nathaniel Sharadin: Growing threat of AI misuse makes the need for effective, targeted regulation all the more urgent (South China Morning Post).
June 8, 2023
Op-ed - Nathaniel Sharadin: Most AI Research Shouldn't Be Publicly Released (Bulletin of the Atomic Scientists).
June 6, 2023
Media - Nathaniel Sharadin on Bloomberg Radio London (from 20:00).
June 1, 2023
Media - Nathaniel Sharadin on BBC Radio (from 2:12:00).
May 31, 2023
Draft Published - Simon Goldstein: Shutdown-Seeking AI
May 31, 2023
Media - Simon Goldstein on SBS News.
May 27, 2023
Draft Published - William D'Alessandro: Is Deontological AI Safe?
May 23, 2023
Draft Published - Cameron Kirk-Giannini & Simon Goldstein: The Polarity Problem.
May 12, 2023
Draft Published - Simon Goldstein: Aggregating Utilities for Corrigible AI.
April 27, 2023
Op-ed - Simon Goldstein & Cameron Kirk-Giannini: Is it Ethical to Create Generative Agents? Is it Safe? (ABC News).
February 20, 2023
Draft Published - Elliott Thornley: There are No Coherence Theorems.