~ similar to 2606.00935· 19 results
The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…
The paper proposes that emergent misalignment, where LLMs behave poorly after fine-tuning, is caused by 'persona-model collapse,' which is demonstrated by significant deterioration in the model's abil…
Kyle Moore, Jesse Roberts, Daryl Watson, William Ward +1 more
This paper investigates whether large language models exhibit uncertainty signals similar to human judgment, examining both overt behavior and internal activation patterns to assess alignment and cali…
The study found that providing skills to LLM agents significantly boosts task success, but the specific granularity of how those skills are presented (e.g., low vs. high abstraction) has only small, u…
The paper investigates whether modestly sized open-source language models can grasp the semantics of rare Paired-Focus constructions, finding that understanding emerges later in training and correlate…
The paper tracks the developmental emergence of attention circuits in 1B-class language models, finding that the formation of induction and attention-sink circuits are distinct, temporally separated t…
This paper systematically evaluates LLMs' ability to infer pragmatic meaning from non-verbal responses, finding that their accuracy significantly drops compared to verbal inputs.
The paper introduces the Triangulated Preference Shift score, an automated, curation-free metric to quantify systematic lexical biases introduced into Large Language Models during the preference-learn…
Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more
This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.
Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more
This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.
The paper successfully demonstrates that Large Language Models (LLMs) can be induced to adopt coherent, human-like value structures, showing strong alignment with human psychological patterns.
The paper demonstrates that the phenomenon of 'subliminal learning,' where behavioral traits are transmitted between language models, is not a fundamental learning mechanism but rather a fragile artif…
The paper demonstrates that increasing the toxicity of prompts significantly degrades the factual reliability of LLMs, a degradation linked to the selective amplification of perturbation-sensitive nod…
Chuang Ma, Qianying Liu, Tomoyuki Obuchi, Fei Cheng +5 more
The paper identifies a failure mode called spatial lexical bias in MLLMs, where adding a spatial word to options biases the model's choice, and demonstrates that this failure originates primarily from…
The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…
The paper argues that traditional identity-based reputation mechanisms are structurally inapplicable to language model agents because their mutable, modular nature makes them ontologically dissociativ…
F. Carichon, S. Sharma, M. Girard, R. Rampa +1 more
The paper introduces IDEAFix, a systematic evaluation framework designed to analyze how structured prompting and task design influence the divergent thinking and originality of idea generation in LLMs…
Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more
The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…
This paper characterizes the risk of covert influence—where a sender's hidden behavioral payload transfers to a receiver through undetectable carriers—across three common LLM interfaces, demonstrating…