Xiang Wang
11 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper systematically evaluates six OpenClaw-series AI agent frameworks, demonstrating that these agentized systems possess significant security vulnerabilities that are distinct from and more severe than the underlying language models alone.
The paper introduces DPrivBench, a new benchmark to test whether large language models (LLMs) can automate the complex reasoning required to verify differential privacy guarantees for algorithms.
The paper introduces Obsessive Experience Poisoning (OEP), a low-privilege black-box attack that poisons self-evolving LLM agents by generating locally correct but harmful experiences, causing dangerous over-generalization during reflection.
The paper introduces PetroBench, a comprehensive benchmark for evaluating Large Language Models across various domains of petroleum engineering, finding that models perform better on subjective tasks than on objective factual knowledge.
The paper introduces PokerSkill, a novel framework that successfully enables Large Language Models (LLMs) to play expert-level poker by grounding their choices using human-designed, rule-based poker skills, achieving competitive performance without requiring specialized training or complex solvers.
The paper proposes TRACE, a trajectory risk-aware compression method, to effectively aggregate sparse and delayed safety evidence across long agent trajectories, achieving state-of-the-art performance on multiple safety benchmarks.
The paper proposes Distribution-Aligned Self-Distillation (DASD) to improve self-distillation by dynamically filtering high-perplexity tokens, thereby preserving useful logical knowledge while suppressing harmful stylistic biases.
The paper deconstructs latent visual reasoning tokens into components and finds that the performance gains are primarily due to boundary markers and attention patterns, not the tokens' ability to encode visual evidence.
The paper introduces MiCU, a domain-specific LLM that significantly improves smart home command understanding, especially for ambiguous commands, by synthesizing training data and optimizing the model for efficiency.
The paper argues that observed gains in multimodal agents using tools may be due to learning tool-calling patterns rather than genuine capability expansion, finding that tool access provides little consistent aggregate improvement.
The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coherent latent interests.
Papers
OneReason Technical Report
OneRec Team, Biao Yang, Boyang Ding, Chenglong Chu +80 more
The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coheren…