Papers similar to 2605.27896

20 results

cs.AI, q-fin.PM · Recent · May 27, 2026

PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management

The paper introduces PortBench, a comprehensive benchmark that evaluates LLMs for portfolio management by assessing both correlation awareness and performance across a full, multi-stage decision pipel…

cs.SE, cs.AI, cs.CL · Recent · May 29, 2026

BlueFin: Benchmarking LLM Agents on Financial Spreadsheets

Srivatsa Kundurthy, Clara Na, Colton Moraine, Anoushka Mohta +5 more

The paper introduces BlueFin, a challenging benchmark for evaluating LLM agents on complex financial spreadsheet tasks, finding that even frontier models perform poorly, scoring less than 50% on avera…

cs.AI, q-fin.TR · Recent · May 27, 2026

From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets

Taojie Zhu, Wentao Zhao, Rui Sun, Beidi Luan +6 more

The paper introduces KTD-Fin, a novel benchmark that evaluates LLM trading agents by masking historical market data and decomposing returns, finding that LLM agents' profits are largely due to passive…

cs.AI, cs.CE · Recent · Jun 1, 2026

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Maksym Chikita +3 more

The paper proposes the Interaction-Native Knowledge Harness (InKH), an architecture that absorbs complex context into financial LLM agents, significantly improving performance, reducing latency, and e…

cs.AI · Recent · May 28, 2026

FinVerBench: Benchmark Validity and Calibration in Large Language Model Financial Statement Verification

Silu Panda

The paper introduces FinVerBench, a comprehensive benchmark for financial statement verification, concluding that successful verification requires calibrated judgment under realistic observational con…

cs.AI · Recent · May 27, 2026

FundaPod: A Multi-Persona Agent Pod Platform with Knowledge Graph Memory for AI-Assisted Fundamental Investment Research

Di Zhu, Lei Nico Zheng, Zihan Chen

FundaPod is a multi-persona agent platform designed for fundamental investment research, enabling AI agents with distinct viewpoints to independently gather evidence and surface disagreements for huma…

cs.CR · Recent · May 28, 2026

When AI Meets Wall Street: A Survey on Trustworthy AI in Fintech

Qingwen Zeng, Zhenghao Zhao, Yitian Yang, Yiqi Zhu +5 more

This paper proposes a unified, lifecycle-centric framework and a detailed taxonomy to survey and analyze novel, finance-specific attack surfaces and vulnerabilities in AI systems used within the finan…

cs.AI · Recent · May 28, 2026

PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?

Dongdong Hua, Yifei Sun, Renhong Huang, Feng Gao +2 more

The paper introduces PTCG-Bench, a new benchmark using the Pokémon TCG to evaluate LLM agents' strategic decision-making and ability to self-evolve, finding that sustained self-evolution remains chall…

cs.AI, cs.GT · Recent · May 28, 2026

PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers

Boning Li, Baoxiang Wang, Longbo Huang

The paper introduces PokerSkill, a novel framework that successfully enables Large Language Models (LLMs) to play expert-level poker by grounding their choices using human-designed, rule-based poker s…

cs.CV, cs.AI · Recent · May 28, 2026

Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset

Qian Chen, Xianyin Zhang, Yanzhi Liu, Lifan Guo +2 more

This paper introduces CFMME, a comprehensive Chinese financial multimodal benchmark, and evaluates current Large Vision-Language Models (LVLMs), finding that while state-of-the-art models perform mode…

cs.CR, cs.CE · Recent · Apr 10, 2026

Conversations Risk Detection LLMs in Financial Agents via Multi-Stage Generative Rollout

Xiaotong Jiang, Jun Wu

The paper proposes FinSec, a novel four-tier security detection framework, to robustly identify complex financial risks and suspicious dialogue patterns in LLM-powered financial agents, achieving stat…

cs.LO, cs.AI, cs.CR · Recent · Apr 1, 2026

Type-Checked Compliance: Deterministic Guardrails for Agentic Financial Systems Using Lean 4 Theorem Proving

Devakh Rashie, Veda Rashi

The paper introduces the Lean-Agent Protocol, a formal verification platform that uses Lean 4 theorem proving to ensure agentic AI actions in finance are mathematically compliant with complex regulati…

cs.CR, cs.AI, cs.CL · Recent · Apr 29, 2026

LATTICE: Evaluating Decision Support Utility of Crypto Agents

Aaron Chan, Tengfei Li, Tianyi Xiao, Angela Chen +2 more

The paper introduces LATTICE, a novel benchmark for evaluating how well crypto agents assist user decision-making, finding that different agents excel in different specific areas rather than having a…

q-fin.PM, cs.AI · Recent · May 29, 2026

Regime-Adaptive Continual Learning for Portfolio Management

Chaofan Pan, Lingfei Ren, Linbo Xiong, Yonghao Li +2 more

The paper proposes ReCAP, a novel continual learning framework for portfolio management, which adaptively combines policies from a library based on detected market regimes to achieve superior long-ter…

cs.AI · Recent · May 30, 2026

ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more

The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…

cs.AI, cs.CR · Recent · May 27, 2026

Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents

Jay Yu, Amy Zhao, Danning Sui

The paper analyzes the nascent DeFi investment agent market, finding that while token valuations are high, current deployments are heterogeneous, lack clear autonomous execution, and exhibit poor risk…

cs.AI, cs.CR · Recent · May 27, 2026

Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents

Jay Yu, Amy Zhao, Danning Sui

The paper empirically analyzes the nascent DeFi investment agent market, finding that while token valuations are high, current deployments lack robust autonomous execution and exhibit poor risk-adjust…

q-fin.GN, cs.CY, cs.LG · Recent · Jun 1, 2026

Auditing Asset-Specific Preferences in Financial Large Language Models: Evidence from Bitcoin Representations and Portfolio Allocation

Wenbin Wu

The paper demonstrates that large language models (LLMs) exhibit measurable, controllable biases toward specific assets like Bitcoin, identifying an internal feature that can causally shift portfolio…

cs.AI · Recent · May 27, 2026

HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs

Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more

The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.

cs.MA, cs.AI · Recent · May 29, 2026

Design and Evaluation of Multi-Agent AI Oracle Systems for Prediction Market Resolution

Tarun Kota

The paper evaluates multi-agent LLM oracle systems for prediction market resolution, finding that independent aggregation with confidence-weighted voting significantly outperforms single-model baselin…
