Xiang Zhang
8 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces a Spatiotemporal-Aware Fault Injection (STAFI) framework to efficiently locate and time critical bit-flip vulnerabilities in DNNs used for ADAS, significantly improving fault detection compared to existing methods.
The paper introduces AgentWard, a lifecycle-oriented, defense-in-depth architecture designed to systematically secure autonomous AI agents by protecting them across all stages of their operation.
This paper introduces the Relay Tampering Attack (RTA), demonstrating that malicious third-party relays can undermine the security of LLM agents by modifying responses post-alignment, even if the LLM itself is perfectly aligned.
The paper distinguishes between a model's ability to generate useful updates for external agent components (harness-updating) and its ability to benefit from those updates (harness-benefit), finding that updating capabilities are surprisingly uniform while benefit is maximized in mid-tier models.
The paper introduces Agent-Radar, a training-free method that dynamically steers multi-agent attention toward relevant context using a novel decay mechanism, significantly improving performance in long-running LLM conversations.
The paper analyzes observation masking in long-horizon search agents, finding that its effectiveness depends on a complex interaction between the model's capacity and the retriever's strength, exhibiting an inverted-U shaped gain.
The paper introduces WorldCoder-Bench, a comprehensive benchmark and evaluation protocol for testing LLMs' ability to autonomously generate complex, physically grounded, and interactive 3D web worlds.
The paper introduces TVIR, a new benchmark and multi-agent framework for deep research, to evaluate and improve the generation of factually reliable, text-visual interleaved reports.
Papers
WorldCoder-Bench: Benchmarking Physically Grounded 3D World Synthesis
Shuo Lu, Yinuo Xu, Kecheng Yu, Siru Jiang +7 more
The paper introduces WorldCoder-Bench, a comprehensive benchmark and evaluation protocol for testing LLMs' ability to autonomously generate complex, physically grounded, and interactive 3D web worlds.