Similar to 2605.27864 · 20 results
Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Maksym Chikita +3 more
The paper proposes the Interaction-Native Knowledge Harness (InKH), an architecture that absorbs complex context into financial LLM agents, significantly improving performance, reducing latency, and e…
Taojie Zhu, Wentao Zhao, Rui Sun, Beidi Luan +6 more
The paper introduces KTD-Fin, a novel benchmark that evaluates LLM trading agents by masking historical market data and decomposing returns, finding that LLM agents' profits are largely due to passive…
Xuesi Hu, Peng Wang, Jinpeng Miao, Xilin Tao +6 more
The paper introduces FinBoardBench, a novel evaluation suite using financial board games to demonstrate that current LLMs, despite strong static reasoning, fail at complex, dynamic wealth management a…
Qingwen Zeng, Zhenghao Zhao, Yitian Yang, Yiqi Zhu +5 more
This paper proposes a unified, lifecycle-centric framework and a detailed taxonomy to survey and analyze novel, finance-specific attack surfaces and vulnerabilities in AI systems used within the finan…
Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more
The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…
Srivatsa Kundurthy, Clara Na, Colton Moraine, Anoushka Mohta +5 more
The paper introduces BlueFin, a challenging benchmark for evaluating LLM agents on complex financial spreadsheet tasks, finding that even frontier models perform poorly, scoring less than 50% on avera…
The paper introduces PortBench, a comprehensive benchmark that evaluates LLMs for portfolio management by assessing both correlation awareness and performance across a full, multi-stage decision pipel…
The paper analyzes the nascent DeFi investment agent market, finding that while token valuations are high, current deployments are heterogeneous, lack clear autonomous execution, and exhibit poor risk…
The paper empirically analyzes the nascent DeFi investment agent market, finding that while token valuations are high, current deployments lack robust autonomous execution and exhibit poor risk-adjust…
Ruiyi Zhang, Peijia Qin, Qi Cao, Li Zhang +1 more
The paper introduces AIBuildAI-2, a knowledge-enhanced agent that significantly improves the automatic building of AI models by integrating an external, evolving knowledge system, achieving state-of-t…
The paper extends the User Experience Research (UXR) Points of View (PoV) framework into an AI-augmented methodology specifically designed for guiding the development and governance of high-stakes, hu…
Chaofan Pan, Lingfei Ren, Linbo Xiong, Yonghao Li +2 more
The paper proposes ReCAP, a novel continual learning framework for portfolio management, which adaptively combines policies from a library based on detected market regimes to achieve superior long-ter…
Aaron Chan, Tengfei Li, Tianyi Xiao, Angela Chen +2 more
The paper introduces LATTICE, a novel benchmark for evaluating how well crypto agents assist user decision-making, finding that different agents excel in different specific areas rather than having a…
Yaxuan Kong, Qingren Yao, Yuqi Nie, Yichen Li +6 more
The paper introduces TimeSage-MT, a comprehensive multi-turn benchmark designed to rigorously test an LLM agent's ability to perform complex, evolving time series analysis, revealing critical gaps in…
The paper evaluates multi-agent LLM oracle systems for prediction market resolution, finding that independent aggregation with confidence-weighted voting significantly outperforms single-model baselin…
Yunfeng Xia, Chao Li, Lei Li, Chenhao Zhang +3 more
The paper systematizes the interaction between autonomous AI agents and blockchain platforms using a bidirectional trust framework, identifying significant gaps in current standards and proposing a ta…
Tianyi Zhou, Dongrui Liu, Leitao Yuan, Jing Shao +1 more
COLLEAGUE.SKILL introduces an automated system that distills heterogeneous traces of human expertise and role-specific knowledge into portable, inspectable, and usable AI skill packages.
Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao +1 more
The paper proposes MERIT, a dual-level, multi-horizon memory retrieval framework that significantly improves the performance of interactive text-to-SQL agents by providing both global and local memory…
The paper introduces AGENTCL, a rigorous evaluation framework that uses controlled task streams to accurately measure an agent's ability to accumulate and reuse knowledge across multiple tasks, thereb…
Mengyuan Li, Lei Gao, Haoxuan Xu, Jiate Li +4 more
The paper proposes an infrastructure, clawgang and meowtrade, to transform private, non-transferable agent memories into verifiable, tradable economic commodities.