~ similar to 2605.29493· 20 results
The paper introduces UA-Legal-Bench, a comprehensive Ukrainian legal reasoning benchmark built from a massive judicial corpus, demonstrating that LLM performance is highly task-dependent and that simp…
The paper proposes a unified evidentiary framework combining cryptographic provenance, statistical watermarking, and zero-knowledge attestation to address the legal challenges posed by synthetic media…
Junyu Lu, Qi Wei, Peishuo Zheng, Jie Zhang +5 more
The paper introduces Prosecution Decision Prediction (PDP), a new legal AI task that assesses prosecutorial review decisions, showing that current state-of-the-art LLMs perform significantly worse on…
The paper introduces BenGER, a comprehensive benchmark for evaluating LLMs on German legal reasoning, demonstrating that closed-flagship models perform best and that human-AI co-creation significantly…
The paper introduces HOPM, a hierarchical online prompt mutation framework that significantly improves the performance of language models in high-stakes evidence document generation by integrating dua…
The authors created ImmigrationQA, a large source-grounded QA dataset for U.S. immigration law, and fine-tuned a small language model (Llama 3.2 3B) on it, achieving a significant performance boost ov…
The paper introduces Multi-Legal-Bench, a novel cross-jurisdictional benchmark evaluating LLMs on five standardized legal reasoning tasks across six diverse countries, demonstrating that cross-lingual…
Alex Leung, Rex Zhang, Ervin Ling, Kentaroh Toyoda +1 more
This paper maps the emerging insurability frontier of AI risk by coding 55 AI threat classes against 26 insurance products, identifying four tiers of coverage: affirmative, silent, excluded, and outsi…
Ethan Zhao, Maksym Taranukhin, Wei Cui, Moira Aikenhead +1 more
The paper introduces CanLegalRAGBench, a new Canadian legal QA benchmark, and evaluates RAG systems, finding that while open-source models are competitive, automatic evaluations struggle with nuanced…
This paper analyzes top-tier cybersecurity papers to find evidence of generative AI's influence, finding a post-2022 increase in AI-associated marker words and a general drift toward higher lexical co…
The paper evaluates an automated legal triage system (FETCH) that uses follow-up questions, demonstrating that while low-cost LLMs are effective for classification, generating high-quality questions r…
This study analyzes ClinicalTrials.gov records to track the rising trend of AI in clinical trials and demonstrates that a hybrid human-AI screening approach is viable but requires clearer reporting of…
Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more
Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.
Jonghyun Chung, Rishabh Chaddha, Sanket Badhe, Debanshu Das +2 more
This survey proposes a proactive, lifecycle-based framework, utilizing the C5 Interaction Model, to detect emerging adversarial synthetic narratives generated by GenAI, moving beyond traditional react…
Jonghyun Chung, Rishabh Chaddha, Sanket Badhe, Debanshu Das +2 more
This survey proposes a proactive, lifecycle-based framework, utilizing the C5 Interaction Model, to detect emerging adversarial synthetic narratives generated by Generative AI, moving beyond tradition…
Shuning Zhang, Eve He, Xiao Zhan, Shijing He +3 more
This paper investigates how Generative AI enables scalable, hyper-realistic fraud in Chinese e-commerce by fabricating product defect evidence, proposing new defense mechanisms like verifiable materia…
The paper introduces a novel guardrail orchestration layer that improves the compliance and efficiency of high-stakes multimodal document generation by scoring multiple generated candidates against we…
The paper empirically investigates whether AI-generated reviews can improve the drafting process of academic papers, finding that AI reviews cover many human-identified issues but also introduce novel…
Yanhui Sun, Wu Liu, Haifeng Ming, Xinru Wang +2 more
The paper introduces CyberJurors, a multi-agent framework and the VerdictBench benchmark to simulate and solve complex e-commerce dispute verdicts by modeling the reasoning and consensus process of cr…
This paper reviews recent EU AI regulatory documents to clarify definitions and synthesize current provisions regarding security, privacy, and autonomous agentic AI.