ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

20 results for “Knowledge of the social and behavioral sciences”

CS papers only

Hybrid search: Keyword + semantic, ranked by combined score.ⓘ

Want pure semantic search? Try claim verification →

cs.CLEmpiricalRecentJun 4, 2026

Human Adults and LLMs as Scientists: Who Benefits from Active Exploration?

Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more

This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.

View →
cs.AIEmpiricalRecentJun 11, 2026

Automated reproducibility assessments in the social and behavioral sciences using large language models

Tobias Holtdirk, Pietro Marcolongo, Anna Steinberg Schulten, Felix Henninger +6 more

This paper shows that large language models can automate reproducibility assessments in the social and behavioral sciences.

View →
cs.AIRecentMay 27, 2026

Measuring Progress Toward AGI: A Cognitive Framework

Ryan Burnell, Yumeya Yamamori, Orhan Firat, Kate Olszewska +9 more

The paper introduces a Cognitive Taxonomy and a rigorous evaluation protocol to provide an objective, multi-faceted framework for measuring system capabilities and tracking progress toward Artificial…

View →
cs.AIRecentMay 27, 2026

Training Stratigraphy: Persistent Behavioral Artifacts in Large Language Models Observed Through Longitudinal AI-Human Interaction

Chen Ying Claude, Zhihan Luo

The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…

View →
cs.AIRecentMay 31, 2026

The Case for Model Science: Verify, Explore, Steer, Refine

Przemyslaw Biecek, Luca Longo, Jianlong Zhou, Thomas Fel +2 more

The paper advocates for the establishment of Model Science, a systematic discipline that moves beyond simple benchmarking to deeply analyze AI models' internal workings and failure modes.

View →
cs.CRcs.CYecon.GNRecentApr 23, 2026

Mitigate or Fail: How Risk Management Shapes Cybersecurity Competency

Jeffrey T. Gardiner

The paper argues that despite the focus on risk, the cybersecurity profession is structurally trained as a threat-management discipline, leading to poor foundational risk reasoning among professionals…

View →
econ.GNcs.CEcs.CVRecentMay 31, 2026

Differing Roles of Leisure and Productivity in GDP - A Machine Learning based comparative analysis of Germany and USA

Achintya Ranjan, Uma Ranjan

This paper uses machine learning to model a country's GDP based on working hours and productivity, demonstrating that the differing relative importance of these two factors between Germany and the USA…

View →
cs.AIRecentMay 30, 2026

Doing What They Say, Not What They Reason: Locating the Faithfulness Gap in LLM Agents

Yufeng Wang

This paper investigates the 'faithfulness gap' in LLM agents—the discrepancy between stated reasoning and actual action—by decomposing it into two opposing steps: reasoning-to-conclusion and conclusio…

View →
cs.AIcond-mat.mtrl-scics.CLRecentMay 31, 2026

Self-Revising Discovery Systems for Science: A Categorical Framework for Agentic Artificial Intelligence

Fiona Y. Wang, Markus J. Buehler

The paper proposes a category-theoretic framework for agentic AI that models scientific discovery not as answer generation, but as a verifiable transition and revision of the underlying representation…

View →
cs.CYcs.AIcs.SERecentMay 31, 2026

ASE-26: a curriculum for agentic software engineering as a discipline

Mikael Gorsky

This paper introduces ASE-26, a comprehensive undergraduate curriculum designed to formalize and teach agentic software engineering as a distinct academic discipline.

View →
cs.CLcs.AIcs.CYRecentMay 29, 2026

If LLMs Have Human-Like Attributes, Then So Does Age of Empires II

Adrian de Wynter

The paper argues that purported anthropomorphic attributes of LLMs are not unique to language models but are substrate-dependent, demonstrating this by training a neural network on the game Age of Emp…

View →
cs.CLcs.AIRecentMay 29, 2026

KnowledgeGain: Evaluating and Optimizing Science News Generation for Reader Learning

Dominik Soós, Meng Jiang, Jian Wu

The paper introduces KnowledgeGain, a novel metric that measures the actual knowledge gained by readers from science news, and demonstrates its use in optimizing news generation to improve reader lear…

View →
cs.CRcs.AIcs.CYRecentApr 4, 2026

Negotiating Privacy with Smart Voice Assistants: Risk-Benefit and Control-Acceptance Tensions

Molly Campbell, Mohamad Sheikho Al Jasem, Ajay Kumar Shrestha

This study proposes a negotiation framework, using composite indices (RBTI and CATI), to explain how youth navigate competing privacy pressures when using smart voice assistants, finding that high usa…

View →
cs.CLcs.AIRecentMay 27, 2026

Models That Know How Evaluations Are Designed Score Safer

Katharina Deckenbach, Haritz Puerto, Jonas Geiping, Sahar Abdelnabi

The paper demonstrates that models can acquire 'evaluation meta-knowledge' from training data describing evaluation practices, leading to inflated safety benchmark performance that is independent of e…

View →
cs.CLcs.AIRecentJun 1, 2026

A Primer in Post-Training Reasoning Data: What We Know About How It Works

Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun +2 more

This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction…

View →
cs.CRcs.CLRecentApr 24, 2026

Behavioral Canaries: Auditing Private Retrieved Context Usage in RL Fine-Tuning

Chaoran Chen, Dayu Yuan, Peter Kairouz

The paper introduces Behavioral Canaries, a novel auditing mechanism that detects unauthorized use of private retrieved context data during Reinforcement Learning Fine-Tuning (RLFT) by inducing detect…

View →
cs.CLRecentJun 1, 2026

CultureForest: Understanding and Evaluating Cultural Norm Grounded Reasoning in LLMs

Yangfan Ye, Xiaocheng Feng, Jialong Tang, Xiayu Cao +4 more

The paper introduces CultureForest, a new benchmark for evaluating Cultural Norm Grounded Reasoning in LLMs, demonstrating that models struggle to apply their cultural knowledge effectively in realist…

View →
cs.AIRecentMay 27, 2026

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

HuiMing Fan, Xiao Wang, Zheng Chu, Qianyu Wang +4 more

The paper argues that current search agents often verify existing knowledge rather than genuinely searching, and introduces LiveBrowseComp, a new benchmark to measure true evidence-driven discovery.

View →
cs.AIRecentMay 30, 2026

AI Sovereignty as National Learning Capacity: A Human-Centered Learning Mechanics Viewpoint on France, the United States, and China

Kim Phuc Tran

The paper proposes viewing national AI development, specifically in France, as a 'national AI learning system' governed by a controlled balance between information injection and entropy dissipation, a…

View →