Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Xuan Zhang

Xuan Zhang

13 indexed papers

Recent (6 mo)
13
With code
0
Influential cites
0
Benchmarked
0

Publications per year

13
26

Top categories

AI×12Crypto×5NLP×2Vision×2Info Retrieval×1ML×1

Frequent co-authors

Yuxuan Zhang4×
Wenxuan Zhang2×
Mingxuan Zhang2×
Jiahui Han2×
Dadi Guo2×
Songze Li2×

Research Timeline

2026
A Security Analysis of the OpenClaw AI Agent Framework

This paper analyzes 470 security advisories in the OpenClaw AI agent framework, demonstrating that the system's structural weakness lies in per-layer trust enforcement, enabling cross-layer remote code execution (RCE) attacks.

TraceScope: Interactive URL Triage via Decoupled Checklist Adjudication

TraceScope is an interactive, sandboxed triage pipeline that analyzes complex phishing URLs by simulating human interaction and verifying suspicious behavior against a detailed checklist, achieving high detection rates even against advanced, evasive threats.

A Comparative Evaluation of AI Agent Security Guardrails

This paper comparatively evaluates DKnownAI Guard against three competitors, demonstrating that DKnownAI Guard achieves superior performance in detecting both agent-specific threats and harmful content.

MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection

MIRA proposes a novel source-aware filtering framework that discovers and anchors evaluation rubrics during data selection, significantly improving code-oriented mid-training data quality while reducing token usage.

iLoRA: Bayesian Low-Rank Adaptation with Latent Interaction Graphs for Microbiome Diagnosis

iLoRA introduces a novel Bayesian graph-conditioned LoRA framework that jointly learns prediction and latent interaction structure, significantly improving microbiome diagnosis by modeling microbe-microbe cross-talk.

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

The paper identifies and demonstrates that post-conclusion continuation in answer-correct long-CoT traces is harmful during LLM fine-tuning, proposing a method to cut this continuation.

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to show that unnecessary and sensitive data acquisition is a widespread and critical privacy vulnerability.

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to demonstrate that unnecessary acquisition of sensitive data is a widespread and critical privacy vulnerability.

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that current state-of-the-art models fail on complex, domain-specific structures.

MindClaw: Closed-Loop Embodied Mental-State Reasoning for Precision Intervention

The paper introduces MindClaw, a closed-loop framework that enables embodied agents to perform real-time mental-state reasoning and intervene with precision, significantly outperforming standard VLM baselines.

Collaborative Space Object Detection with Multi-Satellite Viewpoints in LEO Constellations

This paper demonstrates that fusing multi-viewpoint data from multiple satellites significantly enhances the accuracy of space object detection in congested LEO constellations, establishing multi-view fusion as an effective strategy.

QUBRIC: Co-Designing Queries and Rubrics for RL Beyond Verifiable Rewards

QUBRIC introduces a co-design framework that simultaneously optimizes queries and rubrics, overcoming the bottleneck of vague rubrics derived from open-ended questions, leading to significant gains in RL performance.

CompRank: Efficient LLM Reranking via Token-Level Compression and Decoding-Free Scoring

This paper proposes CompRank, a token-efficient reranking framework for large language models that reduces redundant computation and achieves strong reranking performance.

Highlighted terms show continued research focus across papers

Papers

cs.IREmpiricalRecentJun 10, 2026

CompRank: Efficient LLM Reranking via Token-Level Compression and Decoding-Free Scoring

Xuan Lu, Haohang Huang, Yingqi Fan, Junlong Tong +4 more

This paper proposes CompRank, a token-efficient reranking framework for large language models that reduces redundant computation and achieves strong reranking performance.

View →
cs.CLcs.AIRecentJun 2, 2026

QUBRIC: Co-Designing Queries and Rubrics for RL Beyond Verifiable Rewards

Rongzhi Zhang, Rui Feng, Zhihan Zhang, Jingfeng Yang +7 more

QUBRIC introduces a co-design framework that simultaneously optimizes queries and rubrics, overcoming the bottleneck of vague rubrics derived from open-ended questions, leading to significant gains in…

View →
cs.CVcs.AIRecentJun 1, 2026

Collaborative Space Object Detection with Multi-Satellite Viewpoints in LEO Constellations

Xingyu Qu, Wenxuan Zhang, Peng Hu

This paper demonstrates that fusing multi-viewpoint data from multiple satellites significantly enhances the accuracy of space object detection in congested LEO constellations, establishing multi-view…

View →
cs.CLcs.AIcs.CVRecentMay 31, 2026

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

Minglai Yang, Xinyan Velocity Yu, Pengyuan Li, Xinyu Guo +21 more

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that curre…

View →
cs.AIRecentMay 31, 2026

MindClaw: Closed-Loop Embodied Mental-State Reasoning for Precision Intervention

Ruoxuan Zhang, Qiaoqiao Wan, Zhengguang Wang, Chenghao Yu +3 more

The paper introduces MindClaw, a closed-loop framework that enables embodied agents to perform real-time mental-state reasoning and intervene with precision, significantly outperforming standard VLM b…

View →
cs.CRcs.AIRecentMay 29, 2026

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

Mingxuan Zhang, Jiahui Han, Dadi Guo, Songze Li +4 more

The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to show that unnecessary and sensitive data acquisition is a widespread and critical privacy vul…

View →
cs.CRcs.AIRecentMay 29, 2026

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

Mingxuan Zhang, Jiahui Han, Dadi Guo, Songze Li +4 more

The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to demonstrate that unnecessary acquisition of sensitive data is a widespread and critical priva…

View →
cs.AIRecentMay 28, 2026

MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection

Haowen Wang, Yaxin Du, Jian Yang, Jiajun Wu +8 more

MIRA proposes a novel source-aware filtering framework that discovers and anchors evaluation rubrics during data selection, significantly improving code-oriented mid-training data quality while reduci…

View →
cs.LGcs.AIRecentMay 28, 2026

iLoRA: Bayesian Low-Rank Adaptation with Latent Interaction Graphs for Microbiome Diagnosis

Yang Song, Yixuan Zhang, Lingfa Meng, Tongyuan Hu +4 more

iLoRA introduces a novel Bayesian graph-conditioned LoRA framework that jointly learns prediction and latent interaction structure, significantly improving microbiome diagnosis by modeling microbe-mic…

View →
cs.AIRecentMay 28, 2026

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

Chen He, Yuhao Wu, Lei Wang, Wenxuan Zhang +1 more

The paper identifies and demonstrates that post-conclusion continuation in answer-correct long-CoT traces is harmful during LLM fine-tuning, proposing a method to cut this continuation.

View →
cs.CRcs.AIRecentApr 27, 2026

A Comparative Evaluation of AI Agent Security Guardrails

Qi Li, Jiu Li, Pingtao Wei, Jianjun Xu +7 more

This paper comparatively evaluates DKnownAI Guard against three competitors, demonstrating that DKnownAI Guard achieves superior performance in detecting both agent-specific threats and harmful conten…

View →
cs.CRcs.AIRecentApr 23, 2026

TraceScope: Interactive URL Triage via Decoupled Checklist Adjudication

Haolin Zhang, William Reber, Yuxuan Zhang, Guofei Gu +1 more

TraceScope is an interactive, sandboxed triage pipeline that analyzes complex phishing URLs by simulating human interaction and verifying suspicious behavior against a detailed checklist, achieving hi…

View →
cs.CRcs.AIRecentMar 29, 2026

A Security Analysis of the OpenClaw AI Agent Framework

Surada Suwansathit, Yuxuan Zhang, Guofei Gu

This paper analyzes 470 security advisories in the OpenClaw AI agent framework, demonstrating that the system's structural weakness lies in per-layer trust enforcement, enabling cross-layer remote cod…

View →