Meng Han
3 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
AttnDiff introduces a data-efficient white-box framework that extracts intrinsic attention-based fingerprints to verify the provenance and detect unauthorized derivation of large language models (LLMs) despite common model laundering techniques.
The paper introduces CORDON-MAS, a compartmentalized framework that defends Retrieval-Augmented Generation (RAG) against knowledge poisoning by enforcing strict information-flow control, significantly reducing attack success rates.
TriLens is a white-box detector that monitors the entropy of three internal streams (attention, feed-forward, residual) at every layer of a language model to detect hallucinations by tracking how internal certainty forms.
Papers
TriLens: Per-Layer Logit-Lens Entropy for White-Box Hallucination Detection
Bohan Yang, Yijun Gong, Zhi Zhang, Ge Zhang +2 more
TriLens is a white-box detector that monitors the entropy of three internal streams (attention, feed-forward, residual) at every layer of a language model to detect hallucinations by tracking how inte…