Heng Zhang
33 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper proposes DMN, a compositional jailbreak framework that utilizes distributed instructions, multimodal evidence, and a number chain task across multiple images to significantly enhance the attack success rate against multimodal LLMs.
CachePrune introduces a privacy-aware, fine-grained KV cache sharing mechanism that allows LLM inference systems to safely reuse cache entries across users' requests, significantly improving efficiency while eliminating side-channel leakage.
SAMark introduces a self-anchored text watermarking framework that achieves high robustness (up to 90.2% TP@FP1%) against challenging paragraph-level paraphrasing attacks by establishing a step-independent green region in semantic space.
GradSentry introduces a novel backdoor sample filtering method that uses the spectral entropy of individual sample gradients to detect poisoned data during LLM fine-tuning, proving effective even at high poison ratios.
The paper introduces Harness-Bench, a diagnostic benchmark that measures how different system 'harnesses' affect LLM agent performance in realistic workflows, showing that agent capability must be reported at the model-harness configuration level.
LoSATok proposes a low-dimensional semantic-acoustic tokenizer that efficiently compresses high-dimensional audio features into a compact latent space, significantly improving the performance and efficiency of audio generation models.
The paper introduces FinBoardBench, a novel evaluation suite using financial board games to demonstrate that current LLMs, despite strong static reasoning, fail at complex, dynamic wealth management and strategic decision-making.
The paper proposes BiCoT, a novel watermarking framework that embeds ownership signals into the internal structure of Chain-of-Thought reasoning traces, achieving robust detection without compromising the model's reasoning fidelity.
The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex, open-world agentic scenarios.
AliMark proposes a novel watermarking framework that treats sentence-level watermarking as a bit sequence alignment problem, significantly enhancing robustness against structural text perturbations like sentence splitting and merging.
The paper introduces DynSess, a novel session-level framework that evaluates and optimizes role-playing agents by assessing long-horizon conversational quality, significantly outperforming existing turn-level methods.
The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex open-world agent deployments.
AliMark proposes a novel framework that enhances the robustness of sentence-level watermarking by reformulating the problem as a bit sequence encoding and alignment task, significantly improving resilience against structural text perturbations like sentence splitting and merging.
HunterAgent is a neuro-symbolic framework that reconstructs causal attack chains from fragmented, anti-forensics-corrupted logs, achieving high accuracy while drastically reducing hallucination.
The paper proposes AsyMoE, a novel Mixture of Experts architecture for Large Vision-Language Models that explicitly models the inherent asymmetry between visual and linguistic modalities, achieving significant performance gains and efficiency improvements.
This paper proposes two horizon-control strategies, Progressive OPD (POPD) and Truncated OPD (TOPD), demonstrating that full rollouts are often unnecessary for On-Policy Distillation, leading to significant improvements in training efficiency.
The paper introduces MineExplorer, a new benchmark in Minecraft, to evaluate the sustained open-world exploration capabilities of MLLM agents, finding that long-horizon coordination remains a significant challenge.
The paper proposes a sequence-alignment framework using Soft Dynamic Time Warping to evaluate audio-driven talking-head generation, demonstrating that this approach provides more robust and fair comparisons than traditional frame-wise metrics.
This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction, and scalability.
The paper introduces Tree-like Self-Play (TSP), a novel framework that treats secure code generation as a fine-grained decision process, significantly improving LLM security by forcing the model to self-correct localized vulnerabilities.
Papers
Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs
Wenqi Chen, Ziyan Zhang, Bing Wang, Lin Liu +2 more
The paper introduces Tree-like Self-Play (TSP), a novel framework that treats secure code generation as a fine-grained decision process, significantly improving LLM security by forcing the model to se…