Ge Zhang
3 indexed papers
Research Timeline
HarmfulSkillBench is a large-scale benchmark showing that even a small percentage of publicly available skills can be misused for harmful actions, and that integrating such skills into agent workflows significantly lowers LLM refusal rates.
TraceGraph introduces a graph-based framework to map agent decision-making across pooled trajectories, revealing hidden differences in agent behavior and improving performance by targeting known failure regions.
TriLens is a white-box detector that monitors the entropy of three internal streams (attention, feed-forward, residual) at every layer of a language model to detect hallucinations by tracking how internal certainty forms.
Papers
TriLens: Per-Layer Logit-Lens Entropy for White-Box Hallucination Detection
Bohan Yang, Yijun Gong, Zhi Zhang, Ge Zhang +2 more