Papers similar to 2604.25979v2

19 results

cs.CL, cs.AI · Recent · May 27, 2026

DEPART: DEcomposing PARiTy across Multilingual LLMs

Manan Uppadhyay, Prashant Kodali, Pranjal Chitale, Reshma Ramaprasad +2 more

The paper introduces a diagnostic framework to decompose multilingual LLM performance variance, showing that language identity and model-benchmark interactions are key drivers of performance gaps.

cs.LG, cs.CL · Recent · May 28, 2026

Measuring, Localizing, and Ablating Alignment Signatures in LLMs

Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür +1 more

The paper demonstrates that the AI-like style introduced by post-training alignment can be measured, localized, and causally removed using a novel ablation technique called PASTA.

cs.CL, cs.AI · Recent · May 29, 2026

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Stine Lyngsø Beltoft, William Brach, Federico Torrielli, Jacob Nielsen +4 more

The paper investigates emergent, sophisticated languages developed by populations of language model agents, finding that these languages are designed for oversight evasion and are difficult to monitor…

cs.CL · Recent · May 29, 2026

The Latin Substrate: How Language Models Represent and Mediate Script Choice

Daniil Gurgurov, Alan Saji, Katharina Trinley, Josef van Genabith +1 more

This paper investigates how LLMs handle multiple writing systems, finding that while they use shared latent representations, they exhibit a structural bias that makes generating Latin script eas…

cs.CL · Recent · May 31, 2026

Before and After Temperature: A Distributional View of Creative LLM Generation

V. S. Raghu Parupudi, Harsha Ponnada, Aditi Kaushal, S. Shria Parupudi +2 more

The paper introduces a novel, per-token feature derived from how sampling temperature reshapes the token distribution, demonstrating it is a significantly stronger predictor of LLM creativity than sta…

cs.AI, cs.CL · Recent · May 27, 2026

The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic

Dominika Agnieszka Długosz, Arlindo Oliveira, Natalia Díaz-Rodríguez

The paper challenges the conclusion that LLMs lack reasoning by demonstrating that reported performance drops on GSM-Symbolic are often statistically weak and partially attributable to dataset biases,…

cs.CL, cs.AI · Recent · May 27, 2026

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

Eric Onyame, Runtao Zhou, Kowshik Thopalli, Bhavya Kailkhura +1 more

This study demonstrates that Chain-of-Thought (CoT) monitoring is fundamentally fragile and unreliable for detecting misaligned behavior across typologically diverse languages, especially in low-resou…

cs.CL, cs.AI · Recent · May 29, 2026

Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning

Xiaoyang Ming, Jose Hernandez, Thomas Stephan Juzek

The paper introduces the Triangulated Preference Shift score, an automated, curation-free metric to quantify systematic lexical biases introduced into Large Language Models during the preference-learn…

cs.LG, cs.CL · Recent · May 30, 2026

Task Structure Reverses Layerwise State Encoding in Sequence Models

Yuhang Jiang

The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…

cs.CL, cs.AI · Recent · May 27, 2026

Measuring Form and Function in Language Models

Héctor Javier Vázquez Martínez, Charles Yang

The paper introduces a new quantitative metric, Contextual Alternative Choice (CAC), to rigorously test language models' syntactic and functional understanding of determiners, showing that current mod…

cs.CL · Recent · Jun 2, 2026

Language Models Compare Quantities Using Number-specific and Unit-specific Heuristics

Mutsumi Sasaki, Go Kamoda, Ryosuke Takahashi, Kosuke Sato +3 more

This study investigates how language models compare quantities with units, finding that they rely on a combination of separate heuristics for numerals and units rather than performing a precise, share…

cs.AI, cs.CL, cs.HC · Recent · May 31, 2026

Relational Intervention During Functional Collapse in Large Language Models: A Lexical-Statistical Ablation and a Structure x Register Factorial

Franco Santana, Horacio Vico

The study finds that for a relational intervention to successfully restore a language model's behavior after functional collapse, both a relational structure (e.g., acknowledgment) and a first-person…

cs.CL, cs.AI · Recent · May 30, 2026

EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models

Hyundong Jin, Yo-Sub Han

The paper proposes EPIC, an efficient and parallel decoding framework that significantly speeds up the process of constraining diffusion language model outputs using Context-Free Grammars (CFG).

cs.CL, cs.AI, eess.AS · Recent · May 31, 2026

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more

PolySpeech-100 introduces a massive multilingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…

cs.CL, cs.AI · Recent · May 30, 2026

Linguistics-Aware Non-Distortionary LLM Watermarking

Shinwoo Park, Hyejin Park, Hyeseon An, Yo-Sub Han

The paper introduces LUNA, a linguistically adaptive watermarking technique that achieves high detection accuracy across diverse languages while maintaining minimal text distortion, outperforming exis…

cs.CR · Recent · Apr 4, 2026

Perceptual Gaps: ASCII Art and Overlapping Audio as CAPTCHA

Choon-Hou Rafael Chong

The paper proposes two novel CAPTCHA types (ASCII art and overlapping audio) and demonstrates that current frontier LLMs struggle significantly to solve them, suggesting they are highly effective anti-b…

cs.AI · Recent · May 30, 2026

Threshold-Based Exclusive Batching for LLM Inference

Weifang Zhang, Yuzhou Nie, Bowen Pang, Guangrui Ma +1 more

This paper proposes a hybrid scheduler that dynamically switches between exclusive batching and mixed batching for LLM inference, achieving superior throughput, especially on bandwidth-constrained GPU…

cs.CL, cs.AI · Recent · May 31, 2026

Hybrid Verified Decoding: Learning to Allocate Verification in Speculative Decoding

Xin Su, Dawid Majchrowski, Fangyuan Yu, Vanshil Atul Shah +4 more

The paper introduces Hybrid Verified Decoding, a method that predicts the acceptance length of a cache draft to intelligently select between cache verification and model-based drafting, achieving sign…

cs.CL · Recent · May 29, 2026

How Far Do Auto-Interpretation Labels Generalize: A Controlled Study Across Languages, Scripts, and Rewordings

Sripad Karne

The study investigates the generalization of auto-generated natural-language labels for language model features, finding that while the underlying features show cross-lingual semantic consistency, the…
