Papers similar to 2606.01737

20 results

cs.CL · Recent · May 28, 2026

CanLegalRAGBench: Evaluating Retrieval-Augmented Generation on Canadian Case Law

Ethan Zhao, Maksym Taranukhin, Wei Cui, Moira Aikenhead +1 more

The paper introduces CanLegalRAGBench, a new Canadian legal QA benchmark, and evaluates RAG systems, finding that while open-source models are competitive, automatic evaluations struggle with nuanced…

cs.CL, cs.AI, cs.MA · Recent · May 27, 2026

LegalGraphRAG: Multi-Agent Graph Retrieval-Augmented Generation for Reliable Legal Reasoning

Zerui Chen, Qinggang Zhang, Zhishang Xiang, Zhimin Wei +4 more

LegalGraphRAG introduces a multi-agent, hierarchical graph retrieval-augmented generation framework to overcome the limitations of traditional RAG in legal domains, achieving state-of-the-art reliable…

cs.CL, cs.AI, cs.CV · Recent · Jun 1, 2026

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning

Yusong Zhao, Yuejin Xie, Youliang Yuan, Junjie Hu +3 more

The paper introduces PaSBench-Video, a comprehensive streaming video benchmark designed to rigorously test multimodal LLMs' ability to issue proactive safety warnings, finding that current models stru…

cs.CV, cs.LG, eess.IV · Recent · Jun 3, 2026

An Open-Source Two-Stage Computer Vision Pipeline for Fine-Grained Vehicle Classification using Vision Transformers

Gandhimathi Padmanaban, Fred Feng

This paper presents an open-source computer vision pipeline for classifying vehicle body types from naturalistic roadway video.

cs.CR, cs.AI, cs.MM · Recent · Apr 9, 2026

Multimodal Reasoning with LLM for Encrypted Traffic Interpretation: A Benchmark

Longgang Zhang, Xiaowei Fu, Fuxiang Huang, Lei Zhang

The paper introduces a new benchmark (BGTD) and a multimodal framework (mmTraffic) that enables explainable, evidence-grounded interpretation of encrypted network traffic using LLMs.

cs.CR, cs.CL · Recent · Apr 17, 2026

TWGuard: A Case Study of LLM Safety Guardrails for Localized Linguistic Contexts

Hua-Rong Chu, Kuan-Chun Wang, Yao-Te Huang

The paper introduces TWGuard, a linguistic context-optimized safety guardrail model, demonstrating that tailoring AI safety mechanisms to specific local linguistic contexts significantly improves perf…

cs.CV · Recent · Jun 1, 2026

Vision-language Models for Driver Monitoring Systems: A Driver Activity Description Dataset

David J. Lerch, Sarath Mulugurthi, Manuel Martin, Frederik Diederichs +1 more

The paper addresses the difficulty of using general vision-language models (VLMs) for fine-grained driver behavior recognition by creating a new, richly described dataset and demonstrating that fine-t…

cs.CV · Recent · Jun 1, 2026

Training-Free Composed Video Retrieval via Visual Representation-Guided Video-LLM Reasoning

Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai +1 more

The paper proposes a training-free framework, Visual Representation-Guided Video-LLM Reasoning, to perform composed video retrieval by using visual examples and text instructions, achieving strong per…

cs.CL, cs.AI · Recent · May 28, 2026

Multi-Legal-Bench: Evaluating LLMs on Legal Reasoning Across Jurisdictions, Languages, and Legal Traditions

Volodymyr Ovcharov

The paper introduces Multi-Legal-Bench, a novel cross-jurisdictional benchmark evaluating LLMs on five standardized legal reasoning tasks across six diverse countries, demonstrating that cross-lingual…

cs.CL, cs.AI · Recent · May 27, 2026

The Cases LJP Never Sees: Prosecution Decision Prediction for More Complete Criminal Liability Assessment

Junyu Lu, Qi Wei, Peishuo Zheng, Jie Zhang +5 more

The paper introduces Prosecution Decision Prediction (PDP), a new legal AI task that assesses prosecutorial review decisions, showing that current state-of-the-art LLMs perform significantly worse on…

cs.AI, cs.SI · Recent · May 27, 2026

CyberJurors: A Multi-Agent Simulation Task for E-Commerce Disputes Verdict

Yanhui Sun, Wu Liu, Haifeng Ming, Xinru Wang +2 more

The paper introduces CyberJurors, a multi-agent framework and the VerdictBench benchmark to simulate and solve complex e-commerce dispute verdicts by modeling the reasoning and consensus process of cr…

cs.CL · Recent · May 29, 2026

LLM Judges Inconsistently Disagree Across Safety Criteria and Harm Categories

Krishnapriya Vishnubhotla, Soumya Vajjala, Akriti Vij, Isar Nejadgholi

The paper evaluates the inconsistency of using LLMs as automated judges for multi-dimensional safety evaluations, finding that LLMs are unreliable for nuanced safety issues like financial advice but m…

cs.AI · Recent · May 27, 2026

Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reasoning

Yang Zhang, Xiaoshuai Sun, Rui Zhao, Wujin Sun +4 more

The paper proposes CSMR, a cognitive scheduling framework that allows a language model to dynamically decide when to acquire task-relevant visual evidence, significantly improving multimodal reasoning…

cs.CL · Recent · May 31, 2026

Efficient RAG with Intent-Aware Retrieval and Semantics-Preserving Chunking

Fachrina Dewi Puspitasari, Chaoning Zhang, Jiaquan Zhang, Zhicheng Wang +5 more

The paper proposes InSemRAG, an enhanced RAG framework that improves retrieval accuracy and knowledge integrity by incorporating intent-aware retrieval and semantics-preserving chunking, achieving sta…

cs.IR, cs.AI, cs.MA · Recent · Jun 1, 2026

TechGraphRAG: An Agentic Graph-Augmented RAG Framework for Technical Literature Reasoning

Kanwar Bharat Singh

The paper introduces TechGraphRAG, an advanced, agentic RAG framework that enhances technical literature reasoning by integrating multi-step query refinement, external database searching, and knowledg…

cs.CV · Recent · Jun 1, 2026

Reason-Then-Retrieve for CoVR-R with Structured Edit Prompts and Dense-Sparse Fusion

DongQing Liu, MengShi Qi, HongWei Ji

The paper proposes a zero-shot reason-then-retrieve pipeline using Qwen3.5-27B to solve the challenging task of composed video retrieval (CoVR-R), achieving high performance on both validation and bli…

cs.CV, cs.CR · Recent · Mar 17, 2026

KidsNanny: A Two-Stage Multimodal Content Moderation Pipeline Integrating Visual Classification, Object Detection, OCR, and Contextual Reasoning for Child Safety

Viraj Panchal, Tanmay Talsaniya, Parag Patel, Meet Patel

KidsNanny is a two-stage multimodal content moderation pipeline that achieves high accuracy and efficiency in detecting child safety threats, particularly excelling in text-embedded content.

cs.CL, cs.AI · Recent · May 27, 2026

BenGER: Benchmarking LLM Systems on Subsumption-Based Legal Reasoning in German Law

Sebastian Nagl, Ann-Kristin Mayrhofer, Martin Heidebach, Aleyna Koçak +5 more

The paper introduces BenGER, a comprehensive benchmark for evaluating LLMs on German legal reasoning, demonstrating that closed-flagship models perform best and that human-AI co-creation significantly…

cs.CL · Recent · Jun 1, 2026

TVIR: Building Deep Research Agents Towards Text-Visual Interleaved Report Generation

Xinkai Ma, Zhiqi Bai, Dingling Zhang, Pei Liu +20 more

The paper introduces TVIR, a new benchmark and multi-agent framework for deep research, to evaluate and improve the generation of factually reliable, text-visual interleaved reports.

cs.DC, cs.AI, cs.CL · Recent · Jun 1, 2026

Compliance-Scored Best-of-N Guardrail Orchestration for Multimodal Document Generation in Payments Dispute Defense

Nataraj Agaram Sundar, Tejas Morabia

The paper introduces a novel guardrail orchestration layer that improves the compliance and efficiency of high-stakes multimodal document generation by scoring multiple generated candidates against we…
