ArXivCSExplorer
Built by Teycir Ben Soltane

Similar to 2605.31360 · 20 results

cs.CR · Recent · May 28, 2026

SAMD: A Tool for Identifying False Data Injection Scenarios in AI/ML-enabled Medical Devices

Mohammadreza Hallajiyan, Xueren Ge, Athish Pranav Dharmalingam, Gargi Mitra +3 more

The paper introduces SAMD, an automated tool that uses STPA-Sec to identify potential false data injection attack scenarios in AI/ML-enabled medical devices during the design phase.

cs.AI · cs.CY · Recent · May 27, 2026

Operational AI Deployment Assurance: Governance-State Orchestration Under Threshold-Sensitive Deployment Conditions -- A Governance Framework for High-Stakes AI Systems

Khalid Adnan Alsayed

The paper proposes Operational AI Deployment Assurance (OADA), a governance framework that translates complex AI evaluation metrics and operational uncertainties into actionable, deployment-oriented a…

cs.AI · cs.LG · cs.SE · Recent · May 27, 2026

From paper to benchmark: agentic, framework-based reproduction of under-specified methods in machine health intelligence

Raffael Theiler, Ludovico Comito, David Leko, Leandro Von Krannichfeldt +2 more

The paper introduces an agentic, framework-based system to transform under-specified academic papers into standardized, comparable, and executable benchmarks for industrial Prognostics and Health Mana…

cs.AI · Recent · Jun 1, 2026

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

Junqi Liu, Salena Song, Yuhan Wang, Jiawei Mao +11 more

The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validatio…

cs.CR · cs.LG · cs.SE · Recent · Apr 8, 2026

Data Leakage in Automotive Perception: Practitioners' Insights

Md Abu Ahammed Babu, Sushant Kumar Pandey, Darko Durisic, Andras Balint +1 more

This study investigates how industrial practitioners perceive and manage data leakage in automotive perception systems, finding that leakage control is a socio-technical coordination problem requiring…

cs.HC · cs.AI · cs.LG · Recent · May 27, 2026

SmartIterator: Visual Analytics Workflows for Supervising Unsupervised Data Grouping

Gennady Andrienko, Natalia Andrienko

The paper introduces SmartIterator (SI), a visual analytics framework that systematically guides analysts through the complex process of evaluating and understanding how data groupings change across p…

cs.CR · cs.AI · Recent · Mar 18, 2026

Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare

Saikat Maiti

The paper proposes and validates a comprehensive four-layer Zero Trust security architecture designed to mitigate critical vulnerabilities in autonomous AI agents handling Protected Health Information…

cs.LG · cs.AI · Recent · May 29, 2026

From Rashomon Theory to PRAXIS: Efficient Decision Tree Rashomon Sets

Zakk Heile, Hayden McTavish, Varun Babbar, Margo Seltzer +1 more

The paper introduces PRAXIS, a novel algorithm that efficiently approximates the computation of 'Rashomon sets' for decision trees, significantly reducing memory and runtime complexity.

cs.CR · cs.AI · cs.LG · Recent · May 22, 2026

Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection

Ahmed Sabbah, Mohammed Kharma, Radi Jarrar, Samer Zein +1 more

This study longitudinally evaluates the adversarial robustness of Android malware detection systems over a decade, finding that temporal separation significantly degrades robustness due to concept dri…

cs.AI · Recent · May 27, 2026

Benchmarking AI for low-resource contexts: Thinking beyond leaderboards

Aakash Pant, Kavya Shah, Apoorv Agnihotri, Sneha Nikam +2 more

The paper critiques current AI benchmarking practices for low-resource settings, arguing that evaluation must shift focus from isolated model performance to the holistic performance of the deployed sy…

cs.LG · cs.AI · Recent · May 30, 2026

TabChange: Precise Attribute Changes in Tabular Data

Arjun Dahal, Yu Lei, Raghu N. Kacker, Richard Kuhn

TabChange proposes a novel framework to generate natural and minimally altered counterfactual instances in tabular data by precisely controlling attribute modifications based on their relationship str…

cs.AI · Recent · May 27, 2026

Trends in AI and Human-AI Interaction in Clinical Trials -- A Hybrid Human-AI Exploration

Sandra Woolley, Tim Collins, Khalid Khattak, Illia Chernomorets +2 more

This study analyzes ClinicalTrials.gov records to track the rising trend of AI in clinical trials and demonstrates that a hybrid human-AI screening approach is viable but requires clearer reporting of…

cs.AI · cs.LG · Recent · May 30, 2026

MOSAIC: Modular Orchestration for Structured Agentic Intelligence and Composition

Yifan Bao, Xinyu Xi, Xinyu Liu, Wen Ge +7 more

MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…

cs.CR · Recent · Apr 5, 2026

Styx: Collaborative and Private Data Processing With TEE-Enforced Sticky Policy

Shixuan Zhao, Weicheng Wang, Ninghui Li, Zhiqiang Lin

Styx is a novel framework that enhances data privacy and security in collaborative data processing, such as joint AI training, by integrating sticky policies with Trusted Execution Environments (TEEs)…

cs.AI · cs.LG · eess.SP · Recent · May 27, 2026

Picid: A Modular Evaluation Infrastructure for Reproducible PHM Across Tasks and Domains

Lev Telyatnikov, Raffael Theiler, Leandro Von Krannichfeldt, Olga Fink

The paper introduces Picid, a modular evaluation infrastructure that standardizes and formalizes the entire Prognostics and Health Management (PHM) evaluation pipeline to ensure reproducible and fair…

cs.LG · cs.IR · Empirical · Recent · Jun 10, 2026

DeMix: Debugging Training Data with Mixed Data Error Types by Investigating Influence Vectors

Jiale Deng, Yanyan Shen, Xiaogang Shi, Chai Junjun

This paper proposes DeMix, a novel framework for simultaneously diagnosing erroneous samples and their error types in machine learning models.

cs.LG · cs.AI · cs.CL · Recent · May 28, 2026

Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents

Matt Turk

The paper introduces the Causal Sensitivity Score (CSS), an interventional metric that reveals that standard coverage-based evaluations fail to detect critical responsiveness deficits in clinical LLMs…

cs.AI · cs.CL · cs.ET · Recent · Jun 1, 2026

ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents

Yuxing Lu, Yushuhong Lin, Wenqi Shi, J. Ben Tamo +3 more

The paper introduces ClinEnv, a novel interactive, multi-stage benchmark designed to evaluate LLMs' decision-making and information-gathering process during longitudinal inpatient medical simulations.

cs.CV · cs.AI · Recent · May 31, 2026

Data Collection for Training Quality-Control AI in Carpet Manufacturing

Akbar Erkinov

The paper proposes an end-to-end, deployable blueprint for an in-line machine-vision system that not only inspects carpet defects in real-time but also systematically collects and labels defect data t…

stat.ML · cs.CR · cs.LG · Recent · Apr 5, 2026

The Hiremath Early Detection (HED) Score: A Measure-Theoretic Evaluation Standard for Temporal Intelligence

Prakul Sunil Hiremath

The paper introduces the Hiremath Early Detection (HED) Score, a new measure-theoretic standard that accurately quantifies the time-value of early detection, significantly outperforming traditional me…
