Gongshen Liu
3 indexed papers
Research Timeline
GradSentry introduces a novel backdoor sample filtering method that uses the spectral entropy of individual sample gradients to detect poisoned data during LLM fine-tuning, proving effective even at high poison ratios.
The paper introduces MineExplorer, a new benchmark in Minecraft, to evaluate the sustained open-world exploration capabilities of MLLM agents, finding that long-horizon coordination remains a significant challenge.
The paper introduces HLL, a benchmark that tests whether multimodal agents can stand in for humans at verification steps (such as CAPTCHAs) in complex, real-world workflows, finding that current agents are still brittle and fail under realistic conditions.
Papers
HLL: Can Agents Cross Humanity's Last Line of Verification?
Xinhao Song, Su Su, Sirui Song, Hongliang Wu +5 more