Heng Zhang

39 indexed papers

Top categories

Crypto ×24, AI ×21, NLP ×14, ML ×10, Vision ×6, Sound ×4, Robotics ×3, Audio and Speech Processing ×2

Frequent co-authors

Jiaheng Zhang (9×)

Xiangzheng Zhang (5×)

Wenjie Qu (5×)

Kun Wang (4×)

Peng Wang (3×)

Qiaosheng Zhang (3×)

Research Timeline

2026

Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows

The paper introduces Harness-Bench, a diagnostic benchmark that measures how different system 'harnesses' affect LLM agent performance in realistic workflows, showing that agent capability must be reported at the model-harness configuration level.

LoSATok: Low-dimensional Semantic-Acoustic Tokenizer for Cross-Domain Audio Understanding and Generation

LoSATok proposes a low-dimensional semantic-acoustic tokenizer that efficiently compresses high-dimensional audio features into a compact latent space, significantly improving the performance and efficiency of audio generation models.

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex, open-world agentic scenarios.

AliMark: Enhancing Robustness of Sentence-Level Watermarking Against Text Paraphrasing

AliMark proposes a novel watermarking framework that treats sentence-level watermarking as a bit sequence alignment problem, significantly enhancing robustness against structural text perturbations like sentence splitting and merging.

DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents

The paper introduces DynSess, a novel session-level framework that evaluates and optimizes role-playing agents by assessing long-horizon conversational quality, significantly outperforming existing turn-level methods.

HunterAgent: Neuro-Symbolic Attack Trace Reconstruction under Anti-Forensics

HunterAgent is a neuro-symbolic framework that reconstructs causal attack chains from fragmented, anti-forensics-corrupted logs, achieving high accuracy while drastically reducing hallucination.

Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models

The paper proposes AsyMoE, a novel Mixture of Experts architecture for Large Vision-Language Models that explicitly models the inherent asymmetry between visual and linguistic modalities, achieving significant performance gains and efficiency improvements.

Are Full Rollouts Necessary for On-Policy Distillation?

This paper proposes two horizon-control strategies, Progressive OPD (POPD) and Truncated OPD (TOPD), demonstrating that full rollouts are often unnecessary for On-Policy Distillation, leading to significant improvements in training efficiency.

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

The paper introduces MineExplorer, a new benchmark in Minecraft, to evaluate the sustained open-world exploration capabilities of MLLM agents, finding that long-horizon coordination remains a significant challenge.

Temporally-Aligned Evaluation for Audio-Driven Talking Head Generation

The paper proposes a sequence-alignment framework using Soft Dynamic Time Warping to evaluate audio-driven talking-head generation, demonstrating that this approach provides more robust and fair comparisons than traditional frame-wise metrics.

A Primer in Post-Training Reasoning Data: What We Know About How It Works

This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction, and scalability.

Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs

The paper introduces Tree-like Self-Play (TSP), a novel framework that treats secure code generation as a fine-grained decision process, significantly improving LLM security by forcing the model to self-correct localized vulnerabilities.

MedPMC: A Systematic Framework for Scaling High-Fidelity Medical Multimodal Data for Foundation Models

The paper introduces MedPMC, a framework that transforms permissively licensed literature into high-fidelity infrastructure for medical multimodal models, resulting in improved performance on various benchmarks.

WebSwarm: Recursive Multi-Agent Orchestration for Deep-and-Wide Web Search

The paper proposes WebSwarm, a multi-agent search system that dynamically instantiates agentic search nodes for task decomposition, recursive expansion, and agent collaboration.

FabriVLA: A Lightweight Vision-Language-Action Model for Precise Multi-Task Manipulation

The paper introduces FabriVLA, a lightweight Vision-Language-Action model that achieves strong performance on the Meta-World MT50 benchmark using a compact 1B-scale VLM backbone and a flow-matching action head.

BoxTwin: Learning Elastoplastic Articulated Object Dynamics from Videos

The paper presents BoxTwin, an interactive digital twin framework that learns the full dynamics of elastoplastic articulated objects from videos, accurately tracking joint trajectories and reproducing post-contact plastic behavior.

No Training, Better Flights: Test-Time Scaled VLMs for UAV Navigation

This paper proposes a test-time scaling approach for Vision-Language Models in Unmanned Aerial Vehicle navigation, enabling self-correction and generation of more accurate and reliable flight plans.

Pushing the Frontier of Full-Song Generation: Hierarchical Autoregressive Planning Meets Flow-Matching Rendering

This paper introduces a unified framework for generating high-quality full-length music from lyrics, text descriptions, and musical attributes. The framework consists of a semantic-aware tokenizer, a hybrid-LM, FullDiT, and a two-level melody module.

Papers

cs.SD, cs.AI, eess.AS | Empirical | Recent | Jul 22, 2026

Pushing the Frontier of Full-Song Generation: Hierarchical Autoregressive Planning Meets Flow-Matching Rendering

Junyu Dai, Xinyue Fan, Weiqin Li, Xiangang Li +12 more

cs.CV, cs.RO | Empirical