Built by Teycir Ben Soltane
ArXivCSExplorer

Xiang Wang

11 indexed papers

Recent (6 mo): 11
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 11

Top categories

AI ×10, NLP ×3, Crypto ×3, Vision ×2, ML ×2, Info Retrieval ×1, Game Theory ×1

Frequent co-authors

Garvin Guo (2×)
Yu Chen (2×)
Shuai Li (2×)
Xinpei Zhao (2×)
Huaxing Liu (2×)
OneRec Team (1×)

Research Timeline

2026
A Systematic Security Evaluation of OpenClaw and Its Variants

The paper systematically evaluates six OpenClaw-series AI agent frameworks, demonstrating that these agentized systems possess significant security vulnerabilities that are distinct from and more severe than the underlying language models alone.

DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy

The paper introduces DPrivBench, a new benchmark to test whether large language models (LLMs) can automate the complex reasoning required to verify differential privacy guarantees for algorithms.

OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences

The paper introduces Obsessive Experience Poisoning (OEP), a low-privilege black-box attack that poisons self-evolving LLM agents by generating locally correct but harmful experiences, causing dangerous over-generalization during reflection.

PetroBench: A Benchmark for Large Language Models in Petroleum Engineering

The paper introduces PetroBench, a comprehensive benchmark for evaluating Large Language Models across various domains of petroleum engineering, finding that models perform better on subjective tasks than on objective factual knowledge.

PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers

The paper introduces PokerSkill, a framework that enables Large Language Models (LLMs) to play expert-level poker by grounding their choices in human-designed, rule-based poker skills, achieving competitive performance without specialized training or solvers.

TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety

The paper proposes TRACE, a trajectory risk-aware compression method, to effectively aggregate sparse and delayed safety evidence across long agent trajectories, achieving state-of-the-art performance on multiple safety benchmarks.

Robust Reasoning via Dynamic Token Selection for Distribution-Aligned Self-Distillation

The paper proposes Distribution-Aligned Self-Distillation (DASD) to improve self-distillation by dynamically filtering high-perplexity tokens, thereby preserving useful logical knowledge while suppressing harmful stylistic biases.

Beyond Visual Memory: Mechanistic Diagnostics of Latent Visual Reasoning

The paper deconstructs latent visual reasoning tokens into components and finds that the performance gains are primarily due to boundary markers and attention patterns, not the tokens' ability to encode visual evidence.

MiCU: End-to-End Smart Home Command Understanding with Large Language Model

The paper introduces MiCU, a domain-specific LLM that significantly improves smart home command understanding, especially for ambiguous commands, by synthesizing training data and optimizing the model for efficiency.

Do Multimodal Agents Really Benefit from Tool Use? A Systematic Study of Capability Gains

The paper argues that observed gains in multimodal agents using tools may be due to learning tool-calling patterns rather than genuine capability expansion, finding that tool access provides little consistent aggregate improvement.

OneReason Technical Report

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coherent latent interests.


Papers

cs.IR, cs.AI, cs.CL | Recent | Jun 4, 2026

OneReason Technical Report

OneRec Team, Biao Yang, Boyang Ding, Chenglong Chu +80 more

cs.CV, cs.AI | Recent | Jun 1, 2026

Do Multimodal Agents Really Benefit from Tool Use? A Systematic Study of Capability Gains

Garvin Guo, Donglei Yu, Yu Chen, Xiang Wang +5 more

cs.CV, cs.AI | Recent | May 31, 2026

Beyond Visual Memory: Mechanistic Diagnostics of Latent Visual Reasoning

Garvin Guo, Yu Chen, Xiang Wang, Shuai Li +3 more

cs.CL, cs.AI | Recent | May 31, 2026

MiCU: End-to-End Smart Home Command Understanding with Large Language Model

Haowei Han, Kexin Hu, Weiwei Cai, Debiao Zhang +5 more

cs.AI | Recent | May 30, 2026

TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety

Zhepei Hong, Lin Wang, Liting Li, Haokai Ma +4 more

cs.CL | Recent | May 30, 2026

Robust Reasoning via Dynamic Token Selection for Distribution-Aligned Self-Distillation

Ruiqi Zhang, Lingxiang Wang, Hainan Zhang, Zhiming Zheng

cs.AI, cs.GT | Recent | May 28, 2026

PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers

Boning Li, Baoxiang Wang, Longbo Huang

cs.AI | Recent | May 27, 2026

PetroBench: A Benchmark for Large Language Models in Petroleum Engineering

Xiang Wang, Tingting Zhang, Sen Wang, Ying Wu +3 more

cs.CR, cs.AI, cs.LG | Recent | May 18, 2026

OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences

Kaixiang Wang, Jiong Lou, Zhaojiacheng Zhou, Jie Li

cs.LG, cs.AI, cs.CR | Recent | Apr 17, 2026

DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy

Erchi Wang, Pengrun Huang, Eli Chien, Om Thakkar +3 more

cs.CR, cs.AI | Recent | Apr 3, 2026

A Systematic Security Evaluation of OpenClaw and Its Variants

Yuhang Wang, Haichang Gao, Zhenxing Niu, Zhaoxiang Liu +3 more
