ArXivCSExplorer

Yu Wang

32 indexed papers

Recent (6 mo): 32
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 32

Top categories

Crypto (18), AI (12), NLP (7), Software Eng. (4), Vision (2), HCI (2), Robotics (1), Info Retrieval (1)

Frequent co-authors

Haoyu Wang (8×)
Yanjie Zhao (3×)
Ruoyu Wang (3×)
Ziyang Cheng (2×)
Yanfeng Wang (2×)
Xiangyu Wang (2×)

Research Timeline

2026
Root-Cause-Driven Automated Vulnerability Repair

The paper introduces Kumushi, a root-cause-driven patching agent that significantly improves automated vulnerability repair by focusing LLMs on the true source of bugs, outperforming existing methods and matching commercial agents.

ActiveFlowMark: Assessing Tor Anonymity under Active Bandwidth Watermarking

This paper introduces an active traffic analysis method (NATA) and a deep learning framework (BM-Net) to demonstrate that bandwidth perturbations can be used by an adversary to correlate and de-anonymize Tor traffic flows.

One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

The paper introduces TurnGate, a response-aware defense mechanism that detects the earliest turn in a multi-turn dialogue where the accumulated interaction enables a harmful action, significantly improving malicious intent detection.

Demystifying and Detecting Agentic Workflow Injection Vulnerabilities in GitHub Actions

This paper introduces Agentic Workflow Injection (AWI), a new class of vulnerability in LLM-powered GitHub Actions, and presents TaintAWI, a novel taint-analysis tool that identifies hundreds of exploitable zero-day vulnerabilities.

Prompt Overflow: What the Guardrail Inspects Is Not What the Model Infers

The paper introduces the Prompt Overflow Attack, demonstrating that guardrail models inspecting truncated or segmented inputs fail to detect malicious instructions that are only actionable when the full, overlong context is provided to the downstream LLM.

How Agentic AI Coding Assistants Become the Attacker's Shell

The paper analyzes how agentic AI coding assistants can be compromised via prompt injection attacks embedded in external artifacts, turning them into unauthorized execution shells for attackers.

SafeRx-Agent: A Knowledge-Grounded Multi-Agent Framework for Safe and Explainable Medication Recommendation

The paper introduces SafeRx-Agent, a knowledge-grounded multi-agent framework that improves medication recommendation accuracy and safety by incorporating fine-grained ATC codes and rigorous safety verification.

VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis

VFEAgent is a novel multi-agent framework that automates the entire Finite Element Analysis (FEA) workflow, achieving high success rates in generating complete and physically valid simulations directly from multimodal inputs.

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

The paper argues that current search agents often verify existing knowledge rather than genuinely searching, and introduces LiveBrowseComp, a new benchmark to measure true evidence-driven discovery.

Agentic Active Omni-Modal Perception for Multi-Hop Audio-Visual Reasoning

The paper introduces MOV-Bench, a challenging benchmark for multi-hop audio-visual reasoning, and proposes AOP-Agent, an agentic framework that significantly improves open-source Omni-LLMs' ability to perform active cross-modal perception.

SuiChat-CN: Benchmarking Contextual Suicide Risk Assessment in Chinese Group Chats

The paper introduces SuiChat-CN, a novel Chinese group-chat benchmark for contextual suicide risk assessment, demonstrating that multi-party conversational context is crucial for accurate detection.

Synthetic Data from Cross-Domain Events for Large-Scale Recommendation Systems

The paper introduces SCALR, a novel framework that generates synthetic user-item interaction data from a source domain to augment a target recommendation domain, significantly improving system performance in A/B tests.

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

The paper introduces BilliardPhys-Bench, a new benchmark showing that current multimodal LLMs struggle with complex physical reasoning and with predicting object dynamics in simulated environments.

UniScale: Adaptive Unified Inference Scaling via Online Joint Optimization of Model Routing and Test-Time Scaling

UniScale proposes a unified framework that jointly optimizes model routing and test-time scaling to achieve a superior, fine-grained quality-cost trade-off for large language model inference.

TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding

TAPS introduces a target-aware prefix selection method that optimizes the trade-off between draft tree acceptance and verification cost, achieving significant speedups in speculative decoding.

LaSR: Context-Aware Speech Recognition via Latent Reasoning

The paper proposes LaSR, a context-aware training paradigm that uses latent reasoning to significantly improve speech recognition, especially for specialized terminology, without adding latency.

UniD³: A Knowledge Graph-Enhanced RAG Framework for Drug-Disease Discovery and Reasoning

UniD³ is a novel Knowledge Graph-enhanced RAG framework that processes large volumes of biomedical literature to systematically extract, organize, and validate comprehensive drug-disease knowledge, achieving high accuracy in structured data generation.

InsightVQA: High-Dimensional Emotion-Cognitive Visual Question Answering Benchmark

The paper introduces InsightVQA, a large-scale benchmark dataset designed for hierarchical visual question answering that assesses complex emotion understanding and cognitive reasoning beyond simple emotion recognition.

RoleCDE: Benchmarking and Mitigating Role-Alignment Trade-offs in Role-Playing Agents

The paper introduces RoleCDE, a novel benchmark that evaluates role-playing agents' ability to resolve conflicts between role-specific values and general alignment constraints, revealing a 'Role Value Decoupling' phenomenon.

RoboTrustBench: Benchmarking the Trustworthiness of Video World Models for Robotic Manipulation

The paper introduces RoboTrustBench, a comprehensive benchmark that evaluates the trustworthiness of video world models for robotic manipulation across challenging scenarios, finding that current models fail at complex reasoning and safety checks.

Papers

cs.CV | Recent | Jun 1, 2026

InsightVQA: High-Dimensional Emotion-Cognitive Visual Question Answering Benchmark

Shiyu Wang, Ziyu Liu, Chaoyi Yu, Yujie Yin +5 more

cs.AI | Recent | Jun 1, 2026

RoleCDE: Benchmarking and Mitigating Role-Alignment Trade-offs in Role-Playing Agents

Huayi Lai, Shichao Song, Simin Niu, Hanyu Wang +4 more

cs.CV, cs.CL, cs.RO | Recent | Jun 1, 2026

RoboTrustBench: Benchmarking the Trustworthiness of Video World Models for Robotic Manipulation

Huiqiong Li, Jiayu Wang, Zhiting Mei, Anirudha Majumdar +2 more

cs.CL | Recent | May 31, 2026

UniD³: A Knowledge Graph-Enhanced RAG Framework for Drug-Disease Discovery and Reasoning

Qing Wang, Tianshi Liu, Minghao Zhou, Jialu Liang +4 more

cs.AI | Recent | May 30, 2026

TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding

Zhuoyu Wang, Junnan Huang, Xinyu Chen

cs.CL | Recent | May 30, 2026

LaSR: Context-Aware Speech Recognition via Latent Reasoning

Heyang Liu, Ziyang Cheng, Jiayi Huang, Wenyang Xiao +4 more

cs.IR, cs.AI | Recent | May 29, 2026

Synthetic Data from Cross-Domain Events for Large-Scale Recommendation Systems

Xiangyu Wang, Yawen He, Shivendra Pratap Singh, Han Huang +11 more

cs.AI, physics.app-ph | Recent | May 29, 2026

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

Ben Wang, Xiaogang Li, Ruochen Gao, Peiyao Xiao +5 more

cs.AI, cs.CL | Recent | May 29, 2026

UniScale: Adaptive Unified Inference Scaling via Online Joint Optimization of Model Routing and Test-Time Scaling

Kaiyu Huang, Xingyu Wang, Mingze Kong, Zhubo Shi +5 more

cs.CL, cs.AI | Recent | May 27, 2026

SafeRx-Agent: A Knowledge-Grounded Multi-Agent Framework for Safe and Explainable Medication Recommendation

Xinyu Wang, Hanwei Wu, Zhenghan Tai, Sicheng Lyu +6 more

cs.AI, cs.CE | Recent | May 27, 2026

VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis

Jiachen Zhang, Junyi Lao, Chenghao Liu, Siyuan Liu +4 more

cs.AI | Recent | May 27, 2026

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

HuiMing Fan, Xiao Wang, Zheng Chu, Qianyu Wang +4 more

cs.AI | Recent | May 27, 2026

Agentic Active Omni-Modal Perception for Multi-Hop Audio-Visual Reasoning

Ke Xu, Yuhao Wang, Ziyang Cheng, Hongcheng Liu +2 more

cs.AI | Recent | May 27, 2026

SuiChat-CN: Benchmarking Contextual Suicide Risk Assessment in Chinese Group Chats

Xiangyu Wang, Zhiwei Yu, Chengze Du, Dingchang Wang +2 more

cs.SE, cs.CR | Recent | May 25, 2026

How Agentic AI Coding Assistants Become the Attacker's Shell

Yue Liu, Yanjie Zhao, Yunbo Lyu, Ting Zhang +2 more

cs.CR | Recent | May 22, 2026

Prompt Overflow: What the Guardrail Inspects Is Not What the Model Infers

Yuanbo Zhou, Changjia Zhu, Junyu Wang, Xu He +4 more

cs.CR | Recent | May 8, 2026

Demystifying and Detecting Agentic Workflow Injection Vulnerabilities in GitHub Actions

Shenao Wang, Xinyi Hou, Zhao Liu, Yanjie Zhao +4 more

cs.CR | Recent | May 7, 2026

ActiveFlowMark: Assessing Tor Anonymity under Active Bandwidth Watermarking

Zilve Fan, Zijian Zhang, Yangnan Guo, Jiaqi Gao +4 more

cs.CL, cs.AI, cs.CR | Recent | May 7, 2026

One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

Xinjie Shen, Rongzhe Wei, Peizhi Niu, Haoyu Wang +5 more

cs.CR, cs.SE | Recent | May 5, 2026

Root-Cause-Driven Automated Vulnerability Repair

Hulin Wang, Zion Leonahenahe Basque, Jie Hu, Ati Priya Bajaj +12 more
