Recent Research by WAISI Members
| Title | Authors | Date |
| --- | --- | --- |
| Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models? | Hyeong Kyu Choi, Xiaojin Zhu, Yixuan Li | Aug 24, 2025 |
| Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders | James Oldfield, Shawn Im, Yixuan Li, Mihalis A Nicolaou, Ioannis Patras, Grigorios G Chrysos | May 27, 2025 |
| Visual Instruction Bottleneck Tuning | Changdae Oh, Jiatong Li, Shawn Im, Yixuan Li | May 20, 2025 |
| On the Robustness Tradeoff in Fine-Tuning | Kunyang Li, Jean-Charles Noirot Ferrand, Ryan Sheatsley, Blaine Hoak, Yohan Beugin, Eric Pauley, Patrick McDaniel | Mar 19, 2025 |
| Alignment and Adversarial Robustness: Are More Human-Like Models More Secure? | Blaine Hoak, Kunyang Li, Patrick McDaniel | Feb 17, 2025 |
| Can Your Uncertainty Scores Detect Hallucinated Entity? | Min-Hsuan Yeh, Max Kamachee, Seongheon Park, Yixuan Li | Feb 17, 2025 |
| A Unified Understanding and Evaluation of Steering Methods | Shawn Im, Yixuan Li | Feb 4, 2025 |
| How Contaminated Is Your Benchmark? Quantifying Dataset Leakage in Large Language Models with Kernel Divergence | Hyeong Kyu Choi, Maxim Khanov, Hongxin Wei, Yixuan Li | Feb 2, 2025 |
| Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach | Changdae Oh, Zhen Fang, Shawn Im, Xuefeng Du, Yixuan Li | Feb 1, 2025 |
| Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs | Jean-Charles Noirot Ferrand, Yohan Beugin, Eric Pauley, Ryan Sheatsley, Patrick McDaniel | Jan 27, 2025 |
| Err on the Side of Texture: Texture Bias on Real Data | Blaine Hoak, Ryan Sheatsley, Patrick McDaniel | Dec 13, 2024 |
| Improving Bilingual Capabilities of Language Models to Support Diverse Linguistic Practices in Education | Anand Syamkumar, Nora Tseng, Kaycie Barron, Shanglin Yang, Shamya Karumbaiah, Rheeya Uppaal, Junjie Hu | Nov 6, 2024 |
| Safety-Aware Fine-Tuning of Large Language Models | Hyeong Kyu Choi, Xuefeng Du, Yixuan Li | Oct 13, 2024 |
| Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition | Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, Zack Sifakis, Angeliki Giannou, Ziqian Lin, Liu Yang, Saurabh Agarwal, Grigorios G Chrysos, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos | Oct 8, 2024 |
| On the Generalization of Preference Learning with DPO | Shawn Im, Yixuan Li | Aug 6, 2024 |
| PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences | Daiwei Chen, Yi Chen, Aniket Rege, Ramya Korlakai Vinayak | Jun 12, 2024 |
| Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity | Rheeya Uppaal, Apratim Dey, Yiting He, Yiqiao Zhong, Junjie Hu | May 22, 2024 |
| PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning | Hyeong Kyu Choi, Yixuan Li | May 3, 2024 |
| Understanding the Learning Dynamics of Alignment with Human Feedback | Shawn Im, Yixuan Li | Mar 27, 2024 |
| ARGS: Alignment as Reward-Guided Search | Maxim Khanov, Jirayu Burapacheep, Yixuan Li | Jan 23, 2024 |
| Debate Helps Supervise Unreliable Experts | Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, Samuel R. Bowman | Nov 15, 2023 |
| The Efficacy of Transformer-based Adversarial Attacks in Security Domains | Kunyang Li, Kyle Domico, Jean-Charles Noirot Ferrand, Patrick McDaniel | Oct 17, 2023 |
| The Space of Adversarial Strategies | Ryan Sheatsley, Blaine Hoak, Eric Pauley, Patrick McDaniel | Aug 10, 2023 |
| Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection | Rheeya Uppaal, Junjie Hu, Yixuan Li | May 22, 2023 |
| The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning | Zhenmei Shi, Jiefeng Chen, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, Somesh Jha | Feb 28, 2023 |