Papers similar to 2605.28359

~ similar to 2605.28359· 20 results

cs.AIcs.CERecentJun 1, 2026

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Maksym Chikita +3 more

The paper proposes the Interaction-Native Knowledge Harness (InKH), an architecture that absorbs complex context into financial LLM agents, significantly improving performance, reducing latency, and e…

View →

cs.CLcs.CERecentMay 27, 2026

FinBoardBench: Benchmarking Dynamic Wealth Management and Strategic Financial Reasoning of LLMs via Board Game Simulations

Xuesi Hu, Peng Wang, Jinpeng Miao, Xilin Tao +6 more

The paper introduces FinBoardBench, a novel evaluation suite using financial board games to demonstrate that current LLMs, despite strong static reasoning, fail at complex, dynamic wealth management a…

View →

cs.SEcs.AIcs.CLRecentMay 29, 2026

BlueFin: Benchmarking LLM Agents on Financial Spreadsheets

Srivatsa Kundurthy, Clara Na, Colton Moraine, Anoushka Mohta +5 more

The paper introduces BlueFin, a challenging benchmark for evaluating LLM agents on complex financial spreadsheet tasks, finding that even frontier models perform poorly, scoring less than 50% on avera…

View →

cs.AIq-fin.PMRecentMay 27, 2026

PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management

Yuxuan Zhao, Sijia Chen, Ningxin Su

The paper introduces PortBench, a comprehensive benchmark that evaluates LLMs for portfolio management by assessing both correlation awareness and performance across a full, multi-stage decision pipel…

View →

cs.CRRecentMay 28, 2026

When AI Meets Wall Street: A Survey on Trustworthy AI in Fintech

Qingwen Zeng, Zhenghao Zhao, Yitian Yang, Yiqi Zhu +5 more

This paper proposes a unified, lifecycle-centric framework and a detailed taxonomy to survey and analyze novel, finance-specific attack surfaces and vulnerabilities in AI systems used within the finan…

View →

cs.AIRecentMay 27, 2026

FundaPod: A Multi-Persona Agent Pod Platform with Knowledge Graph Memory for AI-Assisted Fundamental Investment Research

Di Zhu, Lei Nico Zheng, Zihan Chen

FundaPod is a multi-persona agent platform designed for fundamental investment research, enabling AI agents with distinct viewpoints to independently gather evidence and surface disagreements for huma…

View →

q-fin.GNcs.CYcs.LGRecentJun 1, 2026

Auditing Asset-Specific Preferences in Financial Large Language Models: Evidence from Bitcoin Representations and Portfolio Allocation

Wenbin Wu

The paper demonstrates that large language models (LLMs) exhibit measurable, controllable biases toward specific assets like Bitcoin, identifying an internal feature that can causally shift portfolio…

View →

q-fin.PMcs.AIRecentMay 29, 2026

Regime-Adaptive Continual Learning for Portfolio Management

Chaofan Pan, Lingfei Ren, Linbo Xiong, Yonghao Li +2 more

The paper proposes ReCAP, a novel continual learning framework for portfolio management, which adaptively combines policies from a library based on detected market regimes to achieve superior long-ter…

View →

cs.AIcs.CRRecentMay 27, 2026

Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents

Jay Yu, Amy Zhao, Danning Sui

The paper analyzes the nascent DeFi investment agent market, finding that while token valuations are high, current deployments are heterogeneous, lack clear autonomous execution, and exhibit poor risk…

View →

cs.AIcs.CRRecentMay 27, 2026

Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents

Jay Yu, Amy Zhao, Danning Sui

The paper empirically analyzes the nascent DeFi investment agent market, finding that while token valuations are high, current deployments lack robust autonomous execution and exhibit poor risk-adjust…

View →

cs.GTcs.AIcs.CLRecentMay 29, 2026

Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information

Antonio Valerio Miceli-Barone, Vaishak Belle, Shay B. Cohen

The paper simulates bargaining scenarios using LLM agents to analyze how optimizing agents for financial profit affects their honesty and trust, finding that while fine-tuning improves deal-making, it…

View →

cs.CRcs.CYRecentMar 25, 2026

Infrastructure for Valuable, Tradable, and Verifiable Agent Memory

Mengyuan Li, Lei Gao, Haoxuan Xu, Jiate Li +4 more

The paper proposes an infrastructure, clawgang and meowtrade, to transform private, non-transferable agent memories into verifiable, tradable economic commodities.

View →

cs.CVcs.AIRecentMay 28, 2026

Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset

Qian Chen, Xianyin Zhang, Yanzhi Liu, Lifan Guo +2 more

This paper introduces CFMME, a comprehensive Chinese financial multimodal benchmark, and evaluates current Large Vision-Language Models (LVLMs), finding that while state-of-the-art models perform mode…

View →

cs.AIcs.CLRecentJun 1, 2026

AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents

Yiheng Shu, Bernal Jiménez Gutiérrez, Saisri Padmaja Jonnalagedda, Yuguang Yao +2 more

The paper introduces AGENTCL, a rigorous evaluation framework that uses controlled task streams to accurately measure an agent's ability to accumulate and reuse knowledge across multiple tasks, thereb…

View →

q-fin.TRcs.CRq-fin.GNRecentMay 1, 2026

ForesightFlow: An Information Leakage Score Framework for Prediction Markets

Maksym Nechepurenko

The paper introduces ForesightFlow, an Information Leakage Score (ILS) framework, to quantify pre-event information leakage in prediction markets, and proposes a necessary extension to analyze empiric…

View →

cs.AIRecentMay 30, 2026

ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more

The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…

View →

cs.CRRecentApr 23, 2026

Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

Zihan Wang, Rui Zhang, Yu Liu, Chi Liu +3 more

This paper presents the first systematic study of black-box skill stealing attacks against proprietary LLM agents, demonstrating that structured agent skills can be easily extracted, posing a signific…

View →

cs.CLRecentMay 29, 2026

Learning Whom to Trust: Market-Feedback Adaptive Retrieval for Frozen LLMs in Event-Driven Financial RAG

Zijie Zhao, Roy E. Welsch

The paper proposes a market-feedback adaptive retrieval system for frozen LLMs in financial RAG, significantly improving event-impact prediction by learning which evidence sources to prioritize.

View →

cs.MAcs.AIRecentMay 29, 2026

Design and Evaluation of Multi-Agent AI Oracle Systems for Prediction Market Resolution

Tarun Kota

The paper evaluates multi-agent LLM oracle systems for prediction market resolution, finding that independent aggregation with confidence-weighted voting significantly outperforms single-model baselin…

View →

cs.AIRecentMay 27, 2026

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

HuiMing Fan, Xiao Wang, Zheng Chu, Qianyu Wang +4 more

The paper argues that current search agents often verify existing knowledge rather than genuinely searching, and introduces LiveBrowseComp, a new benchmark to measure true evidence-driven discovery.

View →