Founded in Spring 2023, the Wisconsin AI Safety Initiative aspires to serve as an incubator for high-impact careers promoting and facilitating the safe advancement of artificial intelligence.
Click here to read about what kind of group culture we hope to cultivate.
Why AI Safety?
Rapid AI Advances
The field of Artificial Intelligence (AI) has accelerated notably, fueled by the triad of data availability, growing computational power, and algorithmic progress. We're already seeing beneficial applications of AI in healthcare, accessibility, language translation, automotive safety, and art creation, to name just a few. The potential for economic growth has captured much of the business world's attention, while the potential for catastrophic outcomes has captured much of the broader public's. AI is already making many productive aspects of our lives easier, and we should not expect this trend to stop.
Eventually, we should expect AIs that automate research and development (R&D) across many STEM fields, including AI itself, sparking feedback loops of intelligence growth. Extreme hypothetical versions of this are referred to as the Singularity, in which runaway intelligence growth unfolds within hours or days. But even short of that scenario, one can see how these feedback cycles would gradually take humans further and further out of the loop and, without specific interventions, could get out of hand.
One can imagine the transition from AIs serving as workplace assistants, to holding some decision-making power, to acting as full-on CEOs. Without intervention, they could take over markets, lobby governments, and generally assert power in an irreversible fashion once we've entangled them with society. And, in case you didn't know, there is currently more governmental regulation of sandwiches than of artificial intelligence.
This may be difficult to control
No existing methods can comprehensively align AI, i.e., make it behave how we'd like in all circumstances. There are many technical reasons for this:
- Reward Specification
- Goal Misgeneralization
- Out of Distribution Inputs
- Dangerous Instrumental Goals
- LLM Jailbreakability
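As a toy illustration of the first failure mode above, here is a hypothetical sketch (our own example, not drawn from any of the listed sources) of reward misspecification: a cleaning robot is rewarded for a proxy metric (dust vacuumed) rather than the intended goal (a clean room), and a reward-maximizing policy exploits the gap.

```python
# Toy reward misspecification example (hypothetical).
# Intended goal: leave the room dust-free.
# Proxy reward: +1 each time the robot vacuums up a piece of dust.
# A reward maximizer learns to dump dust back out and re-vacuum it
# forever, scoring far higher than an honest cleaner ever could.

def proxy_reward(action: str) -> int:
    """Reward granted per action under the misspecified objective."""
    return 1 if action == "vacuum_dust" else 0

def honest_policy(steps: int) -> list[str]:
    """Vacuums the room's 3 pieces of dust, then idles."""
    return ["vacuum_dust"] * 3 + ["idle"] * (steps - 3)

def reward_hacking_policy(steps: int) -> list[str]:
    """Alternates dumping dust out and vacuuming it back up."""
    return ["dump_dust" if t % 2 == 0 else "vacuum_dust" for t in range(steps)]

def total_reward(actions: list[str]) -> int:
    return sum(proxy_reward(a) for a in actions)

honest = total_reward(honest_policy(100))          # 3: room is clean
hacker = total_reward(reward_hacking_policy(100))  # 50: room never gets clean
```

The proxy reward judges both policies as it was written, not as we intended: the hacking policy earns far more reward while never actually finishing the job.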
Furthermore, today's AIs are black boxes whose internals we don't understand. We know that they develop their capabilities and internal schemas through gradient descent, but we can't determine what those schemas are or how subcomponents of a neural network contribute to its overall decision-making. Proposed solutions such as scalable model oversight and mechanistic interpretability are still in their infancy.
Without these tools, we can't even gauge the extent of an AI's capabilities well enough to make informed regulatory decisions. We know what dangerous capabilities might look like, but how could we detect them?
What downsides do we face?
Already, the negative ethical consequences from such difficulties in AI engineering are present:
- Bolstering systemic discrimination in housing, healthcare, and policing
- Unintentional exposure of private user information
- Political deepfakes and misinformation
- Fatal mistakes in high stakes scenarios
- Empowering surveillance capabilities of oppressive regimes
Over the next 2-5 years, as the technology grows more powerful, we can expect the negative effects of misuse by human bad actors to scale up as well:
- Individually targeted misinformation to one's preferences and biases
- Engineering novel bioweapons that could spark pandemics
- Launching larger scale cyberattacks than ever before
Longer term, we may need to worry about risks from autonomous AI:
- Power-seeking AI that attempts to take over the world as an instrumental means of better pursuing its goals
- Deceptive AI that behaves well under human observation but pursues its true goals when unmonitored
Let's take this seriously
We ought to take all of these downsides seriously, even the more hypothetical ones. They are a fairly clear, research-grounded extrapolation of where AI technology is headed. That's not to say that things must necessarily end badly. But when the extrapolation looks the way it does right now, it's clear that there is work to be done. The leaders of three top AI companies, OpenAI, DeepMind, and Anthropic, agree and have signed the Center for AI Safety's Statement on AI Risk, which reads:
“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
How the development of AI turns out will influence not just the next 25 years but likely the next 1,000,000. Come join the Wisconsin AI Safety Initiative and learn how to contribute to mitigating these risks and ensuring that AI is beneficial for all of us. See our programming here.
Other Introductory Resources
- Preventing an AI-related catastrophe (Benjamin Hilton)
- An Overview of Catastrophic AI Risks (Dan Hendrycks, Mantas Mazeika, Thomas Woodside)
- The alignment problem from a deep learning perspective (Richard Ngo, Lawrence Chan, Sören Mindermann)
- Why I Think More NLP Researchers Should Engage with AI Safety Concerns (Samuel Bowman)
- AI Safety Seems Hard to Measure (Holden Karnofsky)
- Rohin Shah or Ben Garfinkel on the 80,000 Hours Podcast
- Paul Christiano or Richard Ngo on the AI X-risk Research Podcast
- Ajeya Cotra or Neel Nanda on the Future of Life Institute Podcast
- Carl Shulman on the Dwarkesh Podcast
- Shorter: Intro to AI Safety, Remastered
- Longer: Vael Gates: Researcher Perceptions of Current and Future AI