Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Kai Wang

Kai Wang

11 indexed papers

Recent (6 mo)
11
With code
0
Influential cites
0
Benchmarked
0

Publications per year

11
26

Top categories

NLP×6Crypto×6AI×5Vision×4ML×4Multiagent×1

Frequent co-authors

Chenkai Wang2×
Gang Wang2×
Zeming Wei2×
Hezhen Hu1×
Wangbo Zhao1×
Lanqing Guo1×

Research Timeline

2026
CAPTCHA Solving for Native GUI Agents: Automated Reasoning-Action Data Generation and Self-Corrective Training

The paper introduces ReCAP, a native GUI agent that significantly improves CAPTCHA solving success (from 30% to 80%) by integrating specialized CAPTCHA capabilities into a general-purpose, end-to-end vision-language model.

The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems

The paper introduces Salami Slicing Risk, a novel multi-turn jailbreak technique that accumulates harmful intent through numerous low-risk inputs, achieving state-of-the-art attack success rates against major LLMs.

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

This paper demonstrates a novel attack against the shuffling defense used in secure Transformer inference, showing that randomly permuted activations can still be exploited to recover model weights.

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

The paper introduces SkillSafetyBench, a comprehensive benchmark demonstrating that agent safety failures often stem from adversarial influences within reusable skills and execution environments, rather than just malicious user prompts.

Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models

The paper proposes Hydra, a framework to stabilize and control the injection of multiple, conflicting backdoor triggers into text-to-image diffusion models, ensuring high attack reliability while maintaining clean generation quality.

CCLab: Adversarial Testing of Learning- and Non-Learning-Based Congestion Controllers

The paper introduces CCLab, an adversarial testing framework, to systematically evaluate the robustness of both learning-based and traditional congestion controllers, finding that learning-based controllers are generally more robust and can be further improved using adversarial traces.

The Sword, Shield, and Achilles' Heel: Characterizing the Linguistic Inductive Bias of Large Language Models for Spatial Reasoning in Navigation Planning

The paper proposes a dual-interventional framework to characterize how linguistic structures and contextual cues influence LLMs' spatial reasoning for navigation, finding that topological information is crucial, while semantic details can be unreliable.

Attend to Evidence: Evidence-Anchored Spatial Attention Supervision for Multimodal RLVR

The paper introduces EASE, a method that enhances multimodal Reinforcement Learning with Verifiable Rewards (RLVR) by providing spatial attention supervision anchored to visual evidence, significantly improving visual grounding and reasoning capabilities in VLMs.

PMC-InterCPT: Rethinking Biomedical Interleaved Data for Multimodal Continued Pretraining

The paper introduces PMC-InterCPT, a refined biomedical interleaved corpus that enhances multimodal continued pretraining by integrating figure-referencing body text alongside captions, leading to improved medical and general multimodal model performance.

HumanNOVA: Photorealistic, Universal and Rapid 3D Human Avatar Modeling from a Single Image

HumanNOVA introduces a photorealistic, universal, and rapid model capable of generating high-quality 3D human avatars from a single input RGB image.

TVIR: Building Deep Research Agents Towards Text--Visual Interleaved Report Generation

The paper introduces TVIR, a new benchmark and multi-agent framework for deep research, to evaluate and improve the generation of factually reliable, text-visual interleaved reports.

Highlighted terms show continued research focus across papers

Papers

cs.CVRecentJun 1, 2026

HumanNOVA: Photorealistic, Universal and Rapid 3D Human Avatar Modeling from a Single Image

Hezhen Hu, Wangbo Zhao, Lanqing Guo, Hanwen Jiang +5 more

HumanNOVA introduces a photorealistic, universal, and rapid model capable of generating high-quality 3D human avatars from a single input RGB image.

View →
cs.CLRecentJun 1, 2026

TVIR: Building Deep Research Agents Towards Text--Visual Interleaved Report Generation

Xinkai Ma, Zhiqi Bai, Dingling Zhang, Pei Liu +20 more

The paper introduces TVIR, a new benchmark and multi-agent framework for deep research, to evaluate and improve the generation of factually reliable, text-visual interleaved reports.

View →
cs.CLRecentMay 31, 2026

PMC-InterCPT: Rethinking Biomedical Interleaved Data for Multimodal Continued Pretraining

Guanghao Zhu, Zeyu Liu, Zhitian Hou, Pengkai Wang +8 more

The paper introduces PMC-InterCPT, a refined biomedical interleaved corpus that enhances multimodal continued pretraining by integrating figure-referencing body text alongside captions, leading to imp…

View →
cs.CLcs.AIRecentMay 29, 2026

The Sword, Shield, and Achilles' Heel: Characterizing the Linguistic Inductive Bias of Large Language Models for Spatial Reasoning in Navigation Planning

Xudong Zhang, Jian Yang, Shengkai Wang, Jiangpeng Tian +4 more

The paper proposes a dual-interventional framework to characterize how linguistic structures and contextual cues influence LLMs' spatial reasoning for navigation, finding that topological information…

View →
cs.CVcs.CLRecentMay 29, 2026

Attend to Evidence: Evidence-Anchored Spatial Attention Supervision for Multimodal RLVR

Ruina Hu, Chen Wang, Lai Wei, Jionghao Bai +4 more

The paper introduces EASE, a method that enhances multimodal Reinforcement Learning with Verifiable Rewards (RLVR) by providing spatial attention supervision anchored to visual evidence, significantly…

View →
cs.CRcs.LGRecentMay 21, 2026

CCLab: Adversarial Testing of Learning- and Non-Learning-Based Congestion Controllers

Zhi Chen, Shehab Sarar Ahmed, Chenkai Wang, Brighten Godfrey +1 more

The paper introduces CCLab, an adversarial testing framework, to systematically evaluate the robustness of both learning-based and traditional congestion controllers, finding that learning-based contr…

View →
cs.CRcs.LGRecentMay 19, 2026

Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models

Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma +1 more

The paper proposes Hydra, a framework to stabilize and control the injection of multiple, conflicting backdoor triggers into text-to-image diffusion models, ensuring high attack reliability while main…

View →
cs.CRcs.AIcs.CLRecentMay 12, 2026

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

Chang Jin, An Wang, Zeming Wei, Kai Wang +6 more

The paper introduces SkillSafetyBench, a comprehensive benchmark demonstrating that agent safety failures often stem from adversarial influences within reusable skills and execution environments, rath…

View →
cs.CRcs.AIRecentMay 6, 2026

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

Zhengyi Li, Yakai Wang, Kang Yang, Yu Yu +5 more

This paper demonstrates a novel attack against the shuffling defense used in secure Transformer inference, showing that randomly permuted activations can still be exploited to recover model weights.

View →
cs.CRcs.AIcs.CLRecentApr 13, 2026

The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems

Yihao Zhang, Kai Wang, Jiangrong Wu, Haolin Wu +6 more

The paper introduces Salami Slicing Risk, a novel multi-turn jailbreak technique that accumulates harmful intent through numerous low-risk inputs, achieving state-of-the-art attack success rates again…

View →
cs.CRcs.AIcs.CVRecentMar 23, 2026

CAPTCHA Solving for Native GUI Agents: Automated Reasoning-Action Data Generation and Self-Corrective Training

Yuxi Chen, Haoyu Zhai, Chenkai Wang, Rui Yang +3 more

The paper introduces ReCAP, a native GUI agent that significantly improves CAPTCHA solving success (from 30% to 80%) by integrating specialized CAPTCHA capabilities into a general-purpose, end-to-end…

View →