Papers similar to 2605.27864

~ similar to 2605.27864· 20 results

cs.AIcs.CERecentJun 1, 2026

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Maksym Chikita +3 more

The paper proposes the Interaction-Native Knowledge Harness (InKH), an architecture that absorbs complex context into financial LLM agents, significantly improving performance, reducing latency, and e…

View →

cs.AIq-fin.TRRecentMay 27, 2026

From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets

Taojie Zhu, Wentao Zhao, Rui Sun, Beidi Luan +6 more

The paper introduces KTD-Fin, a novel benchmark that evaluates LLM trading agents by masking historical market data and decomposing returns, finding that LLM agents' profits are largely due to passive…

View →

cs.CLcs.CERecentMay 27, 2026

FinBoardBench: Benchmarking Dynamic Wealth Management and Strategic Financial Reasoning of LLMs via Board Game Simulations

Xuesi Hu, Peng Wang, Jinpeng Miao, Xilin Tao +6 more

The paper introduces FinBoardBench, a novel evaluation suite using financial board games to demonstrate that current LLMs, despite strong static reasoning, fail at complex, dynamic wealth management a…

View →

cs.CRRecentMay 28, 2026

When AI Meets Wall Street: A Survey on Trustworthy AI in Fintech

Qingwen Zeng, Zhenghao Zhao, Yitian Yang, Yiqi Zhu +5 more

This paper proposes a unified, lifecycle-centric framework and a detailed taxonomy to survey and analyze novel, finance-specific attack surfaces and vulnerabilities in AI systems used within the finan…

View →

cs.AIRecentMay 30, 2026

ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more

The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…

View →

cs.SEcs.AIcs.CLRecentMay 29, 2026

BlueFin: Benchmarking LLM Agents on Financial Spreadsheets

Srivatsa Kundurthy, Clara Na, Colton Moraine, Anoushka Mohta +5 more

The paper introduces BlueFin, a challenging benchmark for evaluating LLM agents on complex financial spreadsheet tasks, finding that even frontier models perform poorly, scoring less than 50% on avera…

View →

cs.AIq-fin.PMRecentMay 27, 2026

PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management

Yuxuan Zhao, Sijia Chen, Ningxin Su

The paper introduces PortBench, a comprehensive benchmark that evaluates LLMs for portfolio management by assessing both correlation awareness and performance across a full, multi-stage decision pipel…

View →

cs.AIcs.CRRecentMay 27, 2026

Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents

Jay Yu, Amy Zhao, Danning Sui

The paper analyzes the nascent DeFi investment agent market, finding that while token valuations are high, current deployments are heterogeneous, lack clear autonomous execution, and exhibit poor risk…

View →

cs.AIcs.CRRecentMay 27, 2026

Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents

Jay Yu, Amy Zhao, Danning Sui

The paper empirically analyzes the nascent DeFi investment agent market, finding that while token valuations are high, current deployments lack robust autonomous execution and exhibit poor risk-adjust…

View →

cs.AIRecentMay 27, 2026

AIBuildAI-2: A Knowledge-Enhanced Agent for Automatically Building AI Models

Ruiyi Zhang, Peijia Qin, Qi Cao, Li Zhang +1 more

The paper introduces AIBuildAI-2, a knowledge-enhanced agent that significantly improves the automatic building of AI models by integrating an external, evolving knowledge system, achieving state-of-t…

View →

cs.HCcs.AIRecentMay 29, 2026

Extending the UXR Point of View Pyramid: A Generative AI-Augmented Methodology for Human-Centred AI Systems

Festus Fatai Adedoyin, Huseyin Dogan, Melike Akca, Abiodun Adedeji

The paper extends the User Experience Research (UXR) Points of View (PoV) framework into an AI-augmented methodology specifically designed for guiding the development and governance of high-stakes, hu…

View →

q-fin.PMcs.AIRecentMay 29, 2026

Regime-Adaptive Continual Learning for Portfolio Management

Chaofan Pan, Lingfei Ren, Linbo Xiong, Yonghao Li +2 more

The paper proposes ReCAP, a novel continual learning framework for portfolio management, which adaptively combines policies from a library based on detected market regimes to achieve superior long-ter…

View →

cs.CRcs.AIcs.CLRecentApr 29, 2026

LATTICE: Evaluating Decision Support Utility of Crypto Agents

Aaron Chan, Tengfei Li, Tianyi Xiao, Angela Chen +2 more

The paper introduces LATTICE, a novel benchmark for evaluating how well crypto agents assist user decision-making, finding that different agents excel in different specific areas rather than having a…

View →

cs.CLcs.AIRecentMay 31, 2026

TimeSage-MT: A Multi-Turn Benchmark for Evaluating Agentic Time Series Reasoning

Yaxuan Kong, Qingren Yao, Yuqi Nie, Yichen Li +6 more

The paper introduces TimeSage-MT, a comprehensive multi-turn benchmark designed to rigorously test an LLM agent's ability to perform complex, evolving time series analysis, revealing critical gaps in…

View →

cs.MAcs.AIRecentMay 29, 2026

Design and Evaluation of Multi-Agent AI Oracle Systems for Prediction Market Resolution

Tarun Kota

The paper evaluates multi-agent LLM oracle systems for prediction market resolution, finding that independent aggregation with confidence-weighted voting significantly outperforms single-model baselin…

View →

cs.CRRecentMay 9, 2026

Toward Web 4.0: Bidirectional Trust between AI Agents and Blockchain

Yunfeng Xia, Chao Li, Lei Li, Chenhao Zhang +3 more

The paper systematizes the interaction between autonomous AI agents and blockchain platforms using a bidirectional trust framework, identifying significant gaps in current standards and proposing a ta…

View →

cs.AIcs.CLcs.LGRecentMay 29, 2026

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

Tianyi Zhou, Dongrui Liu, Leitao Yuan, Jing Shao +1 more

COLLEAGUE.SKILL introduces an automated system that distills heterogeneous traces of human expertise and role-specific knowledge into portable, inspectable, and usable AI skill packages.

View →

cs.CLRecentMay 30, 2026

Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents

Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao +1 more

The paper proposes MERIT, a dual-level, multi-horizon memory retrieval framework that significantly improves the performance of interactive text-to-SQL agents by providing both global and local memory…

View →

cs.AIcs.CLRecentJun 1, 2026

AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents

Yiheng Shu, Bernal Jiménez Gutiérrez, Saisri Padmaja Jonnalagedda, Yuguang Yao +2 more

The paper introduces AGENTCL, a rigorous evaluation framework that uses controlled task streams to accurately measure an agent's ability to accumulate and reuse knowledge across multiple tasks, thereb…

View →

cs.CRcs.CYRecentMar 25, 2026

Infrastructure for Valuable, Tradable, and Verifiable Agent Memory

Mengyuan Li, Lei Gao, Haoxuan Xu, Jiate Li +4 more

The paper proposes an infrastructure, clawgang and meowtrade, to transform private, non-transferable agent memories into verifiable, tradable economic commodities.

View →