~ similar to 2605.31360· 20 results
The paper introduces SAMD, an automated tool that uses STPA-Sec to identify potential false data injection attack scenarios in AI/ML-enabled medical devices during the design phase.
The paper proposes Operational AI Deployment Assurance (OADA), a governance framework that translates complex AI evaluation metrics and operational uncertainties into actionable, deployment-oriented a…
The paper introduces an agentic, framework-based system to transform under-specified academic papers into standardized, comparable, and executable benchmarks for industrial Prognostics and Health Mana…
Junqi Liu, Salena Song, Yuhan Wang, Jiawei Mao +11 more
The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validatio…
This study investigates how industrial practitioners perceive and manage data leakage in automotive perception systems, finding that leakage control is a socio-technical coordination problem requiring…
The paper introduces SmartIterator (SI), a visual analytics framework that systematically guides analysts through the complex process of evaluating and understanding how data groupings change across p…
The paper proposes and validates a comprehensive four-layer Zero Trust security architecture designed to mitigate critical vulnerabilities in autonomous AI agents handling Protected Health Information…
Zakk Heile, Hayden McTavish, Varun Babbar, Margo Seltzer +1 more
The paper introduces PRAXIS, a novel algorithm that efficiently approximates the computation of 'Rashomon sets' for decision trees, significantly reducing memory and runtime complexity.
Ahmed Sabbah, Mohammed Kharma, Radi Jarrar, Samer Zein +1 more
This study longitudinally evaluates the adversarial robustness of Android malware detection systems over a decade, finding that temporal separation significantly degrades robustness due to concept dri…
Aakash Pant, Kavya Shah, Apoorv Agnihotri, Sneha Nikam +2 more
The paper critiques current AI benchmarking practices for low-resource settings, arguing that evaluation must shift focus from isolated model performance to the holistic performance of the deployed sy…
TabChange proposes a novel framework to generate natural and minimally altered counterfactual instances in tabular data by precisely controlling attribute modifications based on their relationship str…
This study analyzes ClinicalTrials.gov records to track the rising trend of AI in clinical trials and demonstrates that a hybrid human-AI screening approach is viable but requires clearer reporting of…
MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…
Styx is a novel framework that enhances data privacy and security in collaborative data processing, such as joint AI training, by integrating sticky policies with Trusted Execution Environments (TEEs)…
The paper introduces Picid, a modular evaluation infrastructure that standardizes and formalizes the entire Prognostics and Health Management (PHM) evaluation pipeline to ensure reproducible and fair…
This paper proposes DeMix, a novel framework for simultaneously diagnosing erroneous samples and their error types in machine learning models.
The paper introduces the Causal Sensitivity Score (CSS), an interventional metric that reveals that standard coverage-based evaluations fail to detect critical responsiveness deficits in clinical LLMs…
Yuxing Lu, Yushuhong Lin, Wenqi Shi, J. Ben Tamo +3 more
The paper introduces ClinEnv, a novel interactive, multi-stage benchmark designed to evaluate LLMs' decision-making and information-gathering process during longitudinal inpatient medical simulations.
The paper proposes an end-to-end, deployable blueprint for an in-line machine-vision system that not only inspects carpet defects in real-time but also systematically collects and labels defect data t…
The paper introduces the Hiremath Early Detection (HED) Score, a new measure-theoretic standard that accurately quantifies the time-value of early detection, significantly outperforming traditional me…