Recent Research by WAISI Members
Can Your Uncertainty Scores Detect Hallucinated Entity? | Min-Hsuan Yeh, Max Kamachee, Seongheon Park, Yixuan Li | Feb 17, 2025 |
A Unified Understanding and Evaluation of Steering Methods | Shawn Im, Yixuan Li | Feb 4, 2025 |
How Contaminated Is Your Benchmark? Quantifying Dataset Leakage in Large Language Models with Kernel Divergence | Hyeong Kyu Choi, Maxim Khanov, Hongxin Wei, Yixuan Li | Feb 2, 2025 |
Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach | Changdae Oh, Zhen Fang, Shawn Im, Xuefeng Du, Yixuan Li | Feb 1, 2025 |
Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs | Jean-Charles Noirot Ferrand, Yohan Beugin, Eric Pauley, Ryan Sheatsley, Patrick McDaniel | Jan 27, 2025 |
Err on the Side of Texture: Texture Bias on Real Data | Blaine Hoak, Ryan Sheatsley, Patrick McDaniel | Dec 13, 2024 |
Improving Bilingual Capabilities of Language Models to Support Diverse Linguistic Practices in Education | Anand Syamkumar, Nora Tseng, Kaycie Barron, Shanglin Yang, Shamya Karumbaiah, Rheeya Uppaal, Junjie Hu | Nov 6, 2024 |
Safety-Aware Fine-Tuning of Large Language Models | Hyeong Kyu Choi, Xuefeng Du, Yixuan Li | Oct 13, 2024 |
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition | Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, Zack Sifakis, Angeliki Giannou, Ziqian Lin, Liu Yang, Saurabh Agarwal, Grigorios G Chrysos, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos | Oct 8, 2024 |
On the Generalization of Preference Learning with DPO | Shawn Im, Yixuan Li | Aug 6, 2024 |
PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences | Daiwei Chen, Yi Chen, Aniket Rege, Ramya Korlakai Vinayak | Jun 12, 2024 |
Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity | Rheeya Uppaal, Apratim Dey, Yiting He, Yiqiao Zhong, Junjie Hu | May 22, 2024 |
PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning | Hyeong Kyu Choi, Yixuan Li | May 3, 2024 |
Understanding the Learning Dynamics of Alignment with Human Feedback | Shawn Im, Yixuan Li | Mar 27, 2024 |
ARGS: Alignment as Reward-Guided Search | Maxim Khanov, Jirayu Burapacheep, Yixuan Li | Jan 23, 2024 |
Debate Helps Supervise Unreliable Experts | Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, Samuel R. Bowman | Nov 15, 2023 |
The Efficacy of Transformer-based Adversarial Attacks in Security Domains | Kunyang Li, Kyle Domico, Jean-Charles Noirot Ferrand, Patrick McDaniel | Oct 17, 2023 |
The Space of Adversarial Strategies | Ryan Sheatsley, Blaine Hoak, Eric Pauley, Patrick McDaniel | Aug 10, 2023 |
Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection | Rheeya Uppaal, Junjie Hu, Yixuan Li | May 22, 2023 |
The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning | Zhenmei Shi, Jiefeng Chen, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, Somesh Jha | Feb 28, 2023 |