Similar to 2604.09056v1 · 20 results
The paper introduces the Lean-Agent Protocol, a formal verification platform that uses Lean 4 theorem proving to ensure agentic AI actions in finance are mathematically compliant with complex regulati…
Qingwen Zeng, Zhenghao Zhao, Yitian Yang, Yiqi Zhu +5 more
This paper proposes a unified, lifecycle-centric framework and a detailed taxonomy to survey and analyze novel, finance-specific attack surfaces and vulnerabilities in AI systems used within the finan…
This paper investigates the practical barriers preventing the trustworthy deployment of AI-driven Cyber Threat Intelligence (CTI) in the highly regulated financial sector, identifying four key socio-t…
The study compared the cybersecurity risk assessment capabilities of five popular large language models (LLMs) against human experts, finding that LLMs consistently underestimated risks and require ma…
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
Xuesi Hu, Peng Wang, Jinpeng Miao, Xilin Tao +6 more
The paper introduces FinBoardBench, a novel evaluation suite using financial board games to demonstrate that current LLMs, despite strong static reasoning, fail at complex, dynamic wealth management a…
The paper proposes a nine-dimension risk assessment framework for institutional DeFi adoption, significantly enhancing existing methodologies by incorporating new dimensions such as composability…
The paper demonstrates that large language models (LLMs) exhibit measurable, controllable biases toward specific assets like Bitcoin, identifying an internal feature that can causally shift portfolio…
The paper introduces GuardPhish, a large-scale dataset and evaluation framework, demonstrating that even high-performing open-source LLMs can generate actionable phishing content despite accurate inte…
Srivatsa Kundurthy, Clara Na, Colton Moraine, Anoushka Mohta +5 more
The paper introduces BlueFin, a challenging benchmark for evaluating LLM agents on complex financial spreadsheet tasks, finding that even frontier models perform poorly, scoring less than 50% on avera…
The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…
This paper introduces Swiss-Bench 003, an expanded evaluation framework assessing LLM reliability and adversarial security across eight dimensions using 808 Swiss-specific items, revealing that self-g…
The paper proposes CyberAId, a hybrid multi-agent system designed to enhance cybersecurity for financial institutions by integrating specialized LLM subagents with existing SIEM/XDR telemetry, address…
The paper introduces an LLM-based framework that uses vulnerability-specific prompting and a large-scale dataset to achieve high-precision, scalable detection of multiple smart contract vulnerabilitie…
Bowen Cai, Weiheng Bai, Hangyun Tang, Youshui Lu +1 more
The paper introduces FAUDITOR, a specialized, self-learning fuzzer that detects complex Monetarily Exploitable Vulnerabilities (MEVuls) in smart contracts by integrating NLP-processed auditor knowledg…
This paper benchmarks LLMs for smart contract security analysis, concluding that while LLMs show potential, their reliability is limited by lexical bias and requires integration with traditional stati…
Qian Chen, Xianyin Zhang, Yanzhi Liu, Lifan Guo +2 more
This paper introduces CFMME, a comprehensive Chinese financial multimodal benchmark, and evaluates current Large Vision-Language Models (LVLMs), finding that while state-of-the-art models perform mode…
Qian'ang Mao, Jiaxin Wang, Ya Liu, Li Zhu +2 more
The paper develops a unified, cross-layer security framework for autonomous LLM agents operating in agentic commerce, identifying key attack vectors and proposing a layered defense architecture.
The paper introduces SecLens-R, a multi-stakeholder evaluation framework, demonstrating that LLM performance for vulnerability detection varies significantly depending on the specific priorities (e.g.…
SCAFDS introduces a novel, seven-stage graph attention system that models fraud propagation using co-occurrence edge features and generates forensically traceable SAR narratives, significantly improvi…