Xiang Wang

11 indexed papers

Top categories

AI ×10, NLP ×3, Crypto ×3, Vision ×2, ML ×2, Info Retrieval ×1, Game Theory ×1

Frequent co-authors

Garvin Guo (2×)

Yu Chen (2×)

Shuai Li (2×)

Xinpei Zhao (2×)

Huaxing Liu (2×)

OneRec Team (1×)

Research Timeline

2026

A Systematic Security Evaluation of OpenClaw and Its Variants

The paper systematically evaluates six OpenClaw-series AI agent frameworks, demonstrating that these agent systems have significant security vulnerabilities that are distinct from, and more severe than, those of the underlying language models alone.

DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy

The paper introduces DPrivBench, a new benchmark to test whether large language models (LLMs) can automate the complex reasoning required to verify differential privacy guarantees for algorithms.

OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences

The paper introduces Obsessive Experience Poisoning (OEP), a low-privilege black-box attack that poisons self-evolving LLM agents by generating locally correct but harmful experiences, causing dangerous over-generalization during reflection.

PetroBench: A Benchmark for Large Language Models in Petroleum Engineering

The paper introduces PetroBench, a comprehensive benchmark for evaluating Large Language Models across various domains of petroleum engineering, finding that models perform better on subjective tasks than on objective factual knowledge.

PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers

The paper introduces PokerSkill, a framework that enables Large Language Models (LLMs) to play expert-level poker by grounding their choices in human-designed, rule-based poker skills, achieving competitive performance without specialized training or complex solvers.

TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety

The paper proposes TRACE, a trajectory risk-aware compression method, to effectively aggregate sparse and delayed safety evidence across long agent trajectories, achieving state-of-the-art performance on multiple safety benchmarks.

Robust Reasoning via Dynamic Token Selection for Distribution-Aligned Self-Distillation

The paper proposes Distribution-Aligned Self-Distillation (DASD) to improve self-distillation by dynamically filtering high-perplexity tokens, thereby preserving useful logical knowledge while suppressing harmful stylistic biases.

Beyond Visual Memory: Mechanistic Diagnostics of Latent Visual Reasoning

The paper deconstructs latent visual reasoning tokens into components and finds that the performance gains are primarily due to boundary markers and attention patterns, not the tokens' ability to encode visual evidence.

MiCU: End-to-End Smart Home Command Understanding with Large Language Model

The paper introduces MiCU, a domain-specific LLM that significantly improves smart home command understanding, especially for ambiguous commands, by synthesizing training data and optimizing the model for efficiency.

Do Multimodal Agents Really Benefit from Tool Use? A Systematic Study of Capability Gains

The paper argues that observed gains in multimodal agents using tools may be due to learning tool-calling patterns rather than genuine capability expansion, finding that tool access provides little consistent aggregate improvement.

OneReason Technical Report

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coherent latent interests.

Papers

cs.IR, cs.AI, cs.CL · Recent · Jun 4, 2026

OneReason Technical Report

OneRec Team, Biao Yang, Boyang Ding, Chenglong Chu +80 more

cs.CV, cs.AI · Recent · Jun 1, 2026