Hui Li
22 indexed papers
Research Timeline
Styx is a novel framework that enhances data privacy and security in collaborative data processing, such as joint AI training, by integrating sticky policies with Trusted Execution Environments (TEEs).
The paper proposes Look One Step Ahead (LOSA), a novel framework that enables efficient, privacy-preserving, and robust service provisioning in dynamic air-ground integrated networks by decoupling planning into a look-ahead phase and a real-time execution phase.
The paper introduces SecGoal, a benchmark dataset and framework, demonstrating that fine-tuning smaller LLMs on this dataset significantly improves the precision of extracting formalizable security goals from natural language protocol documents.
The EvoSafety framework enhances LLM safety by externalizing attack and defense mechanisms, enabling persistent, transferable, and model-agnostic robustness against adversarial prompts.
The paper introduces Neo, an agentic program analysis framework that successfully detects zero-day privilege escalation vulnerabilities in complex, polyglot microservices by combining LLMs with advanced code analysis.
The paper proposes ESC-Skills, a skill-centric framework that discovers and self-evolves executable emotional support skills to improve the interpretability and emotional quality of conversational AI.
The paper demonstrates that the valence structure learned by modern LLMs aligns with human EEG emotional representations, but finds that further supervised alignment is ineffective due to a phenomenon called saturation regularity.
The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of whether the evidence is presented in a single prompt or gradually across multiple turns.
The paper introduces $\text{RLR}^3$, a novel framework that extends verifiable rewards in Reinforcement Learning to handle partially verifiable, multi-criteria vision-language tasks by integrating robust rubric scoring.
The paper proposes a unified framework that decouples long-video reasoning into semantic and visual evidence, significantly improving performance on the HD-EPIC VQA Challenge.
The paper introduces SpatialAct, a challenging benchmark that reveals a significant 'reasoning-to-action gap,' showing that current VLMs struggle to maintain coherent spatial understanding and perform reliable actions in multi-turn 3D environments.
PatchWorld introduces a gradient-free framework to create executable Python world models from offline trajectories, achieving high planning scores by inducing symbolic belief-state programs.
EvoGens is an evolution-inspired framework that treats scientific idea generation as an evolutionary search, significantly boosting the novelty and diversity of generated research ideas compared to existing LLM-based methods.
The paper introduces SkyShield, the first front-view monocular semantic occupancy benchmark for low-altitude urban UAV flight, along with a novel metric and model to address the unique safety challenges of aerial navigation.
The paper introduces DSL-LLaDA, a method that lightly adapts a pre-trained masked diffusion language model to perform continuous denoising in embedding space, significantly improving text generation quality and robustness, especially under low step budgets.
The paper introduces D3IM, a novel parameter-free sampler that enables direct revision of visible tokens in Masked Diffusion Language Models, and proposes SCOPE to mitigate the model's tendency to perpetuate errors.
The paper proposes GIM-World, a geometry-aware implicit memory framework that significantly improves long-horizon video world models by explicitly encoding 3D scene geometry into a compact memory state.
OctoT2I introduces a self-evolving, agentic routing framework that efficiently selects and combines multiple Text-to-Image models, achieving high performance while significantly boosting inference speed and energy efficiency.
This paper investigates the vulnerability of LLM-based automatic grading systems to prompt injection (PI) attacks, demonstrating that current systems are highly susceptible to manipulation that can lead to unfairly high scores.
The paper proposes Astra, an agentic framework that equips Vision-Language Models (VLMs) with the ability to perform spatial reasoning by actively generating and utilizing imagined visual evidence from a world simulator.
Papers
Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators
Chenming Zhu, Jingli Lin, Yilin Long, Peizhou Cao +3 more
The paper proposes Astra, an agentic framework that equips Vision-Language Models (VLMs) with the ability to perform spatial reasoning by actively generating and utilizing imagined visual evidence fro…