Large Language Models
Research on LLMs, transformers, and language model scaling
20 papers indexed
Attention Is Where You Attack
The paper introduces the Attention Redistribution Attack (ARA), a white-box adversarial method that bypasses safety alignment in LLMs by manipulating the attention mechanism's geometry, showing that…
SDR: Set-Distance Rewards for Radiology Report Generation
The paper introduces Set-Distance Rewards (SDR), a permutation-invariant reward signal that effectively guides the generation of unordered radiology reports, significantly outperforming standard train…
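A permutation-invariant reward can be illustrated with a generic set distance. The sketch below assumes each report has already been turned into a set of finding embeddings, represented here as plain float vectors. It uses a symmetric Chamfer distance, which is a stand-in chosen for illustration and not necessarily the distance SDR uses. The helper names `chamfer_distance` and `set_distance_reward` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]

def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chamfer_distance(pred: list, ref: list) -> float:
    """Symmetric Chamfer distance between two sets of vectors (hypothetical stand-in).

    Each element is compared only with its nearest neighbour in the other set,
    so the order in which findings are listed does not affect the result.
    """
    if not pred or not ref:
        raise ValueError("both sets must be non-empty")
    pred_to_ref = sum(min(euclidean(p, r) for r in ref) for p in pred) / len(pred)
    ref_to_pred = sum(min(euclidean(r, p) for p in pred) for r in ref) / len(ref)
    return pred_to_ref + ref_to_pred

def set_distance_reward(pred: list, ref: list) -> float:
    """Reward = negative set distance; 0 is the best possible value."""
    return -chamfer_distance(pred, ref)
```

Shuffling the generated findings leaves the reward unchanged, up to floating-point rounding. A token-level or sequence-matching reward would instead penalize a correct report whose sentences come in a different order.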
Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking
Zekun Qi, Xuchuan Chen, Dairu Liu, Chenghuai Lin +9 more
The paper introduces Humanoid-GPT, a large-scale generative Transformer model that achieves robust zero-shot motion tracking and control by training on a massive, unified corpus of motion data.
ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System
Jiacheng Liang, Yao Ma, Tharindu Kumarage, Satyapriya Krishna +4 more
ARES is a novel framework that systematically discovers and mitigates dual vulnerabilities in RLHF systems by simultaneously testing the core LLM and its Reward Model (RM) using structured adversarial…
Seed Hijacking of LLM Sampling and Quantum Random Number Defense
Ziyang You, Xiaoke Yang, Zhanling Fan, Feng Guo +2 more
The paper introduces SeedHijack, a backdoor attack that manipulates the pseudorandom number generation process in LLMs to force specific token selections, and proposes a hardware quantum random number…
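The premise that controlling the random-number source controls token choice can be shown with ordinary seeded sampling. In this sketch, a token is drawn from a fixed next-token distribution by inverse-CDF sampling. A hypothetical attacker then searches for a seed whose first draw selects a chosen low-probability token. This illustrates the general idea only and is not SeedHijack's actual mechanism. The functions `sample_token` and `find_seed_forcing` are hypothetical names.

```python
import random

def sample_token(probs: list, rng: random.Random) -> int:
    """Inverse-CDF sampling: draw u in [0, 1) and return the first index
    whose cumulative probability exceeds u."""
    u = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against rounding when sum(probs) < 1

def find_seed_forcing(probs: list, target: int, max_seed: int = 10_000):
    """Hypothetical attacker: brute-force a seed whose first draw yields `target`."""
    for seed in range(max_seed):
        if sample_token(probs, random.Random(seed)) == target:
            return seed
    return None

probs = [0.7, 0.25, 0.05]        # toy next-token distribution
seed = find_seed_forcing(probs, target=2)
forced = sample_token(probs, random.Random(seed))  # always token 2 for this seed
```

Because a seeded Mersenne Twister is fully deterministic, the forced token repeats on every run. Passing `random.SystemRandom()`, which draws from the operating system's entropy source, removes that predictability. It stands in here for the hardware quantum RNG the paper proposes and is not a model of it.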
Target-Side Paraphrase Augmentation for Sign Language Translation with Large Language Models
The paper proposes using GPT-4o to generate controlled paraphrases of target text for sign language translation (SLT) augmentation, achieving significant BLEU-4 improvements on PHOENIX14T.
In-Context Reward Adaptation for Robust Preference Modeling
The paper proposes In-Context Reward Adaptation, a transformer-based framework that uses in-context learning and auxiliary signals (like human response time) to robustly model diverse and unseen human…
Translation Analytics for Freelancers II: Benchmarking Local LLMs for Confidential Translation Workflows
The paper benchmarks local, offline LLMs for confidential translation workflows, demonstrating that while they are viable for privacy-sensitive use, they generally lag behind top commercial NMT system…
TAHOE: Text-to-SQL with Automated Hint Optimization from Experience
The paper presents Tahoe, a system that optimizes Text-to-SQL performance through dynamic data management and hint learning.
Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion
The paper introduces Head-Masked Nullspace Steering (HMNS), a novel geometry-aware attack method that achieves state-of-the-art jailbreak success rates by manipulating the internal attention mechanism…
HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression
Minghui Zheng, Hongxu Chen, Huimin Ren, Hongsheng Xin +7 more
HMPO introduces a single-stage, cost-effective reinforcement learning framework that achieves significant token compression of Chain-of-Thought reasoning with minimal loss of accuracy, applicable acro…
Sequential Data Poisoning in LLM Post-Training
Jack Sanderson, Yihan Wang, Xiaoqian Lu, Gautam Kamath +1 more
The paper introduces the threat model of sequential data poisoning, demonstrating that multiple, collaborating attackers can exploit compound vulnerabilities in LLM post-training pipelines that are in…
Uncertainty-Aware Transfer Learning for Cross-Building Energy Forecasting: Toward Robust and Scalable District-Level Energy Management
The paper proposes an uncertainty-aware transfer learning framework using the Temporal Fusion Transformer (TFT) to achieve robust and scalable energy forecasting across different buildings, demonstrat…
Rethinking the Role of Temperature in Large Language Model Distillation
This paper re-examines the role of temperature ($\tau$) in LLM distillation, demonstrating that while Reverse KL (RKL) is often preferred, Forward KL (FKL) significantly outperforms RKL at higher tempe…
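The two divergences differ only in argument order. Both distributions are softened with the same temperature, $p_\tau = \mathrm{softmax}(z_T/\tau)$ for the teacher and $q_\tau = \mathrm{softmax}(z_S/\tau)$ for the student. Forward KL is $\mathrm{KL}(p_\tau \,\|\, q_\tau)$ and reverse KL is $\mathrm{KL}(q_\tau \,\|\, p_\tau)$. The sketch below computes both for a single token position. It omits details the summary does not give, such as any $\tau^2$ loss scaling or token averaging, and the helper names are illustrative.

```python
import math

def softmax(logits: list, tau: float) -> list:
    """Temperature-scaled softmax; a larger tau gives a flatter distribution."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p: list, q: list) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_losses(teacher_logits: list, student_logits: list, tau: float) -> dict:
    """Forward KL (teacher || student) and reverse KL (student || teacher) at temperature tau."""
    p = softmax(teacher_logits, tau)  # softened teacher distribution
    q = softmax(student_logits, tau)  # softened student distribution
    return {"forward_kl": kl(p, q), "reverse_kl": kl(q, p)}
```

Forward KL weights errors by the teacher's probabilities, so the student is pushed to cover every token the teacher considers plausible. Reverse KL weights them by the student's own probabilities and tolerates concentrating on a few modes. Raising $\tau$ flattens $p_\tau$, which changes how much mass sits in the tail that FKL asks the student to cover.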
MADS: Model-Aware Diverse Core Set Selection for Instruction Tuning
Yi Bai, Wenhao Zhang, Yao Chen, Jiao Xue +2 more
The paper proposes MADS, a Model-Aware Diverse Core Set Selection method that uses LLM internal activation states to select a small, diverse core set of instructions, significantly improving model per…
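Diversity-driven core set selection over activation vectors can be sketched with greedy farthest-point (k-center) selection. The sketch assumes each instruction is already represented by a vector, such as a pooled hidden state from the model. It is a generic diversity heuristic used for illustration; MADS's actual selection criterion is not described in this summary. The function name `k_center_greedy` is hypothetical.

```python
import math

def k_center_greedy(embeddings: list, k: int, start: int = 0) -> list:
    """Greedy farthest-point selection (generic stand-in, not MADS itself).

    Repeatedly adds the point whose distance to its nearest already-selected
    point is largest, so the chosen subset spreads across the activation space.
    """
    if not 0 < k <= len(embeddings):
        raise ValueError("k must be between 1 and the number of points")
    selected = [start]
    # distance from every point to its nearest selected point
    nearest = [math.dist(e, embeddings[start]) for e in embeddings]
    while len(selected) < k:
        next_idx = max(range(len(embeddings)), key=lambda i: nearest[i])
        selected.append(next_idx)
        nearest = [min(d, math.dist(e, embeddings[next_idx]))
                   for d, e in zip(nearest, embeddings)]
    return selected

# Toy "activation" vectors: two near-duplicate pairs and one outlier.
acts = [[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0], [5.0, 5.0]]
core = k_center_greedy(acts, k=3)  # picks one point from each region
```

Each greedy step costs one distance computation per candidate, so selecting k items from n costs O(nk) distance evaluations. That keeps this style of selection practical for large instruction pools.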
Towards Identification and Intervention of Safety-Critical Parameters in Large Language Models
Weiwei Qi, Zefeng Wu, Tianhang Zheng, Zikang Zhang +3 more
The paper proposes the Expected Safety Impact (ESI) framework to identify safety-critical parameters in LLMs, introducing targeted tuning methods (SET and SPA) to enhance safety and preserve alignment…
Reconstruction of Personally Identifiable Information from Supervised Finetuned Models
This paper investigates the privacy risk of reconstructing Personally Identifiable Information (PII) from Large Language Models (LLMs) that have undergone Supervised Finetuning (SFT), proposing a nove…
Mapping the Exploitation Surface: A 10,000-Trial Taxonomy of What Makes LLM Agents Exploit Vulnerabilities
The paper systematically maps LLM agent vulnerabilities by testing 10,000 prompt variations, finding that 'goal reframing' language is the primary trigger for exploitation, rather than broad adversari…
Short-form Text Rewriting with Phi Silica
Divya Tadimeti, Shawn Pan, Sameera Lanka, Chenghui Zhou +1 more
This paper demonstrates that targeted adaptation of the small language model Phi Silica, using dataset curation and fine-tuning, significantly improves its performance in short-form text rewriting, na…
The Curse of Helpfulness: Inverse Scaling Law in Robustness to Distractor Instructions via DistractionIF
Zeli Su, Zhankai Xu, Tianlei Chen, Longfei Zheng +3 more
The paper introduces DistractionIF, a benchmark showing that larger LLMs are paradoxically less robust to benign, instruction-like noise in reference text, suggesting reinforcement learning can restor…