ArXivCSExplorer
Built by Teycir Ben Soltane

20 results for “Structure vs content”

CS papers only

Hybrid search: keyword + semantic, ranked by combined score.

cs.CL, cs.AI • Recent • May 29, 2026

Language Models Learn Constructional Semantics, Not To Mention Syntax: Investigating LM Understanding of Paired-Focus Constructions

Wesley Scivetti, Ethan Wilcox, Nathan Schneider, Kanishka Misra +1 more

The paper investigates whether modestly sized open-source language models can grasp the semantics of rare Paired-Focus constructions, finding that understanding emerges later in training and correlate…

cs.AI • Recent • May 31, 2026

AnyEdit++: Adaptive Long-Form Knowledge Editing via Bayesian Surprise

Bowen Tian, Caixue He, Jiemin Wu, Jingying Wang +3 more

AnyEdit++ introduces a structure-aware framework that uses Bayesian Surprise to adaptively segment long-form knowledge, significantly improving the coherence and accuracy of knowledge editing in LLMs.

cs.IR, cs.AI, cs.CL • Recent • May 29, 2026

On the impact of retrieved content representations in RAG Pipelines

Jonathan J Ross, Bevan Koopman, Anton van der Vegt, Guido Zuccon

The paper systematically compares multiple content representations for RAG pipelines and finds that answer retention—the ability of the representation to preserve the original answer-bearing content—i…

cs.AI, cs.CL • Recent • May 28, 2026

Demystifying Data Organization for Enhanced LLM Training

Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang +7 more

This paper proposes four guidelines and two novel data ordering methods (STR and SAW) to systematically optimize data organization, significantly enhancing the stability and performance of LLM trainin…

cs.CL • Recent • May 29, 2026

Semantic Triplet Restoration: A Novel Protocol for Hierarchical Table Understanding in Large Language Models

Yibin Zhao, Fangxin Shang, Dingrui Yang, Yuqi Wang

The paper introduces Semantic Triplet Restoration (STR), a novel protocol that converts complex table structures into atomic semantic triplets, improving table question answering by providing explicit…

cs.CL, cs.AI, cs.LG • Recent • Jun 1, 2026

Multilinguality of Large Language Models From a Structural Perspective

Haruki Sakajo, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe

This paper analyzes the multilinguality of LLMs by examining their structural properties, finding that low-resource languages are structurally more distinct from English than high-resource languages,…

cs.IR, cs.CL, cs.HC • Empirical • Recent • Jun 10, 2026

Factions Within, Uncertain Across: Within-Document Reader Sub-Groups in Social Highlighting

Kazuki Nakayashiki, Keisuke Watanabe

This paper investigates whether a group of people highlighting the same document forms a single consensus or is internally structured into reader sub-groups.

cs.CL • Recent • May 29, 2026

Language Models Can Resolve Reference Compositionally, But It's Not Their Native Strength: The Case of the Personal Relation Task

Bart Evelo, Meaghan Fowlie, Denis Paperno

The paper investigates compositional abilities in LLMs and humans using the Personal Relation Task, finding that LLMs excel at the structured (Intensional) task while humans are better at the real-wor…

cs.IR, cs.CL • Dataset • Recent • Jun 9, 2026

A PubMed-Scale Dataset of Structured Biomedical Abstracts

Chia-Hsuan Chang, Haerin Song, Brian Ondov, Hua Xu

The authors introduce Structured PubMed, a comprehensive corpus of section-labeled biomedical abstracts compiled from the complete PubMed database.

cs.LG, cs.AI • Recent • May 27, 2026

Learning Compositional Latent Structure with Vector Networks

Niclas Pokel, Benjamin F. Grewe

The paper introduces the Vector Network (VN), a novel recurrent architecture that replaces fixed weight matrices with reusable weight atoms, enabling superior compositional generalization by making st…

cs.SD, cs.AI • Recent • May 29, 2026

AnchorSteer: Self-Discovered Concept Injection for Structure-Preserving Music Editing

Chih-Heng Chang, Keng-Seng Ho, Chih-Yu Tsai, Kuan-Lin Chen +2 more

AnchorSteer introduces a framework that achieves high-fidelity, structure-preserving music editing by decoupling semantic concept injection from structural constraints.

cs.CL • Recent • May 30, 2026

ProtStructQA: A Denotation Threshold in Protein Structural Reasoning

Aravind Mandiga, Guoming Li, Jin Lu, Ismailcem Budak Arpinar +2 more

The paper introduces ProtStructQA, an executable benchmark that tests protein structural reasoning by requiring language models to generate measurable 3D coordinates, revealing a capability-dependent…

cs.CL • Recent • May 31, 2026

Worlds Within Words: Translating Culture in Ancient Chinese Texts with Multi-Agent Coordination

Xiaoqi He, Kaixin Lan, Mu You, Tao Fang +2 more

The paper proposes MACAT, a Multi-Agent Culture-Aware Translation framework, to selectively translate culture-loaded words in ancient Chinese texts, achieving superior performance over existing method…

cs.CL • Recent • May 28, 2026

COMPOSE: Composing Future Theorems from Citations and Formal Structure

David Busbib, Michael Werman

The paper introduces COMPOSE, a dual-graph framework that generates plausible future mathematical theorems by simultaneously conditioning a language model on both the scientific citation context and t…

cs.CL, cs.AI • Recent • May 27, 2026

The Attentional White Bear Effect in Transformer Language Models

Rebecca Ramnauth, Brian Scassellati

The paper demonstrates that content suppression techniques used in language models only mask prohibited content at the output level, failing to eliminate the underlying concepts from the model's inter…

cs.CL • Recent • May 30, 2026

Chunking Methods on Retrieval-Augmented Generation - Effectiveness Evaluation Against Computational Cost and Limitations

Mateusz Śmigielski, Michał Rajkowski, Mateusz Zbrocki, Michał Bernacki-Janson +4 more

This study systematically evaluates a wide range of chunking methods for Retrieval-Augmented Generation (RAG) to assess their effectiveness and highlight the overlooked challenges associated with chun…

cs.CL, cs.AI, cs.LG • Recent • May 30, 2026

Short-form Text Rewriting with Phi Silica

Divya Tadimeti, Shawn Pan, Sameera Lanka, Chenghui Zhou +1 more

This paper demonstrates that targeted adaptation of the small language model Phi Silica, using dataset curation and fine-tuning, significantly improves its performance in short-form text rewriting, na…

cs.CV, cs.AI, cs.LG • Recent • May 29, 2026

Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation

Nan Bao, Yifan Zhao, Wenzhuang Wang, Jia Li

The paper proposes a disentangled representation framework to significantly improve few-shot layout-to-image generation by separating semantic identity from local visual details, thereby mitigating re…

cs.CL, cs.AI • Recent • Jun 1, 2026

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

Elia Cunegatti, Marcus Vukojevic, Erik Nielsen, Giovanni Iacca

The paper proposes SubFit, a novel compression technique that achieves superior LLM compression by replacing non-contiguous, submodule-level components (Attention and FeedForward) with lightweight res…

cs.CL, cs.IR • Recent • Jun 3, 2026

Caliper: Probing Lexical Anchors versus Causal Structure in LLMs

Zhenyu Yu, Shuigeng Zhou

This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.
