WAISI & XLab

WAISI Technical AI Safety Workshop Program

Most AI Safety communities introduce members interested in technical AI safety through the pipeline of Intro Technical Fellowship → Paper Reading Sessions → Alignment Research Engineer Accelerator (ARENA) → Research Programs (SPAR, XLab SRF, MATS). However, most university groups have struggled to run ARENA sessions for a few key reasons: the steep learning curve, the significant time commitment, and the lack of experienced TAs. The technical workshop program aims to address these issues with ARENA-style workshops on AI Safety topics built around shorter, more manageable exercises that still preserve the rigor of research-style work.

Transferable Adversarial Materials (TAM): Defeating ISR AUASs and LAWSs via Disruptive and Adversarial Material

Within the past decade, small, portable Unmanned Aerial Systems (UASs) operated by individual infantry units have proven to be vital battlefield assets in intelligence, surveillance, and reconnaissance (ISR) roles, as one-way loitering munitions, and as reusable bomb-dropping platforms. Many countries are attempting to integrate AI vision models into these systems to automate navigation and target identification and to reduce vulnerability to jamming. We aim to demonstrate the effectiveness of a Transferable Adversarial Material (TAM): a deformable material that can be deployed in a variety of settings to deceive military-purpose computer vision models analogous to those being fielded in autonomous UASs (AUASs).
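For intuition, a minimal sketch of the standard recipe behind physically deployable adversarial patches is shown below: optimize a small texture under random transformations (Expectation Over Transformation) so that it fools a classifier regardless of viewing angle, scale, or lighting. The surrogate model, transformation ranges, target class, and synthetic scenes here are illustrative assumptions, not the project's actual models or data.

```python
# Illustrative EOT-style adversarial patch optimization (assumptions noted above).
import torch
import torch.nn.functional as F
import torchvision.transforms as T
from torchvision.models import resnet50, ResNet50_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

# Surrogate classifier standing in for a fielded vision model.
model = resnet50(weights=ResNet50_Weights.DEFAULT).to(device).eval()
normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

# Optimize the patch in an unconstrained space; sigmoid keeps pixels in [0, 1].
patch_logits = torch.zeros(3, 64, 64, device=device, requires_grad=True)
optimizer = torch.optim.Adam([patch_logits], lr=0.05)

# Random scale / rotation / lighting jitter approximates physical deployment
# conditions (viewing angle, distance, illumination).
augment = T.Compose([
    T.RandomAffine(degrees=20, scale=(0.7, 1.3)),
    T.ColorJitter(brightness=0.3, contrast=0.3),
])

target_class = 859  # placeholder "benign" label the patch should induce


def paste_patch(images: torch.Tensor, patch: torch.Tensor) -> torch.Tensor:
    """Paste the patch at a random location in each image of the batch."""
    out = images.clone()
    _, _, H, W = images.shape
    ph, pw = patch.shape[1:]
    for i in range(images.shape[0]):
        y = torch.randint(0, H - ph + 1, (1,)).item()
        x = torch.randint(0, W - pw + 1, (1,)).item()
        out[i, :, y:y + ph, x:x + pw] = patch
    return out


for step in range(500):
    # Random tensors stand in for real ISR-style imagery in [0, 1].
    scenes = torch.rand(8, 3, 224, 224, device=device)
    patch = augment(torch.sigmoid(patch_logits)).clamp(0, 1)
    patched = paste_patch(scenes, patch)
    logits = model(normalize(patched))
    # Push every patched image toward the chosen target class.
    labels = torch.full((8,), target_class, device=device, dtype=torch.long)
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A physical material extends this idea by printing (or weaving) the optimized texture and relying on the transformation averaging to survive the camera-to-world gap; transferability comes from attacking an ensemble of surrogate models rather than the single classifier used here.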

Our Research Catalog

Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models?
Hyeong Kyu Choi, Xiaojin Zhu, Yixuan Li
NeurIPS 2025 Spotlight
Aug 24, 2025
Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders
James Oldfield, Shawn Im, Yixuan Li, Mihalis A Nicolaou, Ioannis Patras, Grigorios G Chrysos
NeurIPS 2025
May 27, 2025
Visual Instruction Bottleneck Tuning
Changdae Oh, Jiatong Li, Shawn Im, Yixuan Li
NeurIPS 2025
May 20, 2025
On the Robustness Tradeoff in Fine-Tuning
Kunyang Li, Jean-Charles Noirot Ferrand, Ryan Sheatsley, Blaine Hoak, Yohan Beugin, Eric Pauley, Patrick McDaniel
ICCV 2025
Mar 19, 2025
Alignment and Adversarial Robustness: Are More Human-Like Models More Secure?
Blaine Hoak, Kunyang Li, Patrick McDaniel
Feb 17, 2025
Can Your Uncertainty Scores Detect Hallucinated Entity?
Min-Hsuan Yeh, Max Kamachee, Seongheon Park, Yixuan Li
TMLR 2025
Feb 17, 2025
A Unified Understanding and Evaluation of Steering Methods
Shawn Im, Yixuan Li
Feb 4, 2025
How Contaminated Is Your Benchmark? Quantifying Dataset Leakage in Large Language Models with Kernel Divergence
Hyeong Kyu Choi, Maxim Khanov, Hongxin Wei, Yixuan Li
ICML 2025
Feb 2, 2025
Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach
Changdae Oh, Zhen Fang, Shawn Im, Xuefeng Du, Yixuan Li
ICML 2025
Feb 1, 2025
Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs
Jean-Charles Noirot Ferrand, Yohan Beugin, Eric Pauley, Ryan Sheatsley, Patrick McDaniel
ACM 2024
Jan 27, 2025
Err on the Side of Texture: Texture Bias on Real Data
Blaine Hoak, Ryan Sheatsley, Patrick McDaniel
SaTML 2025
Dec 13, 2024
Improving Bilingual Capabilities of Language Models to Support Diverse Linguistic Practices in Education
Anand Syamkumar, Nora Tseng, Kaycie Barron, Shanglin Yang, Shamya Karumbaiah, Rheeya Uppaal, Junjie Hu
ACM 2024
Nov 6, 2024
Safety-Aware Fine-Tuning of Large Language Models
Hyeong Kyu Choi, Xuefeng Du, Yixuan Li
NeurIPS 2024, Workshop on Safe Generative AI
Oct 13, 2024
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition
Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, Zack Sifakis, Angeliki Giannou, Ziqian Lin, Liu Yang, Saurabh Agarwal, Grigorios G Chrysos, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
ICML 2025 Spotlight
Oct 8, 2024
On the Generalization of Preference Learning with DPO
Shawn Im, Yixuan Li
Aug 6, 2024
PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences
Daiwei Chen, Yi Chen, Aniket Rege, Ramya Korlakai Vinayak
ICLR 2025
Jun 12, 2024
Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity
Rheeya Uppaal, Apratim Dey, Yiting He, Yiqiao Zhong, Junjie Hu
ICLR 2025
May 22, 2024
PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning
Hyeong Kyu Choi, Yixuan Li
ICML 2024
May 3, 2024
Understanding the Learning Dynamics of Alignment with Human Feedback
Shawn Im, Yixuan Li
ICML 2024
Mar 27, 2024
ARGS: Alignment as Reward-Guided Search
Maxim Khanov, Jirayu Burapacheep, Yixuan Li
ICLR 2024
Jan 23, 2024
Debate Helps Supervise Unreliable Experts
Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, Samuel R. Bowman
Nov 15, 2023
The Efficacy of Transformer-based Adversarial Attacks in Security Domains
Kunyang Li, Kyle Domico, Jean-Charles Noirot Ferrand, Patrick McDaniel
Oct 17, 2023
The Space of Adversarial Strategies
Ryan Sheatsley, Blaine Hoak, Eric Pauley, Patrick McDaniel
Aug 10, 2023
Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection
Rheeya Uppaal, Junjie Hu, Yixuan Li
ACL 2023
May 22, 2023
The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning
Zhenmei Shi, Jiefeng Chen, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, Somesh Jha
ICLR 2023
Feb 28, 2023

Faculty Collaborators

Assistant Professor
Learning (robust) representations and generative modeling
Assistant Professor in the Department of Computer Sciences
Reinforcement learning and autonomous agents
Assistant Professor in the Department of Computer Sciences
Natural language processing and machine learning
Professor in the Department of Computer Sciences
Adversarial machine learning, privacy, and formal methods
Associate Professor in the Electrical and Computer Engineering Department
Theory and algorithms for deep learning with foundation models
Associate Professor in the Department of Computer Sciences
Algorithmic and theoretical foundations of reliable machine learning
Professor in the Department of Computer Sciences
Mobile security, adversarial ML, and systems security research
Associate Professor in the Electrical and Computer Engineering Department
Machine learning, coding theory, and optimization
Assistant Professor in the Department of Computer Sciences
Fundamentals of data-driven systems and machine learning
Professor in the Department of Biostatistics
Image analysis, computer vision, and ML in biostatistics
Assistant Professor in the Electrical and Computer Engineering Department
Machine learning, statistical inference, and crowdsourcing
Assistant Professor in the Department of Statistics
LLM evaluations, high dimensional statistics, and deep learning theory