Lan
50 indexed papers
Research Timeline
This paper introduces HarmAmp, a new benchmark for multi-turn harm amplification, and proposes TrajSafe, a proactive monitoring system that significantly reduces harmfulness in LLM interactions while maintaining usability.
This paper identifies prediction bias, a failure mode of entropy minimization in test-time adaptation, and proposes Distribution Shift Bias Reduction (DSBR) to stabilize adaptation and prevent model collapse.
The paper introduces HumanNOVA, a rapid, universal model that generates photorealistic, high-quality 3D human avatars from a single input RGB image.
The paper introduces U4D, an uncertainty-aware framework that synthesizes 4D LiDAR scenes by reconstructing geometrically difficult, uncertain regions first, leading to state-of-the-art fidelity and temporal consistency.
The paper introduces InsightVQA, a large-scale benchmark dataset designed for hierarchical visual question answering that assesses complex emotion understanding and cognitive reasoning beyond simple emotion recognition.
AutoForest is an end-to-end system that automatically generates publication-ready forest plots directly from biomedical papers, streamlining the labor-intensive process of meta-analysis.
The paper introduces Coordination Graphs for Constrained Multi-Agent Reinforcement Learning (CG-CMARL), a scalable framework that decomposes complex joint action spaces into pairwise regions to handle coordination and constraints efficiently.
This paper conducts a large-scale audit of human annotation reporting in NLP, finding that while reporting has improved, critical details needed to assess annotation validity, such as training and agreement values, are frequently omitted.
The paper introduces MIDI, a novel multilingual dataset that embeds idioms in realistic sentence-level and conversational contexts across languages of diverse resource levels. It shows that idiom comprehension is significantly harder in low-resource languages and that literal interpretations pose a greater challenge than figurative ones.
The paper introduces TELBench and the DRIFT framework to enable fine-grained, span-level error localization in deep-research agents, significantly improving the ability to pinpoint exactly where an agent's reasoning fails.
The paper introduces the Image Reconstruction Game, a benchmark showing that the quality of the descriptive model is the primary determinant of image reconstruction success, while the generator's role is secondary.
The paper proposes a novel RL framework that naturally induces diverse agent behavior by reformulating the objective to treat the reward as a distribution over functions, making diversity a rational response to reward uncertainty.
The paper hypothesizes that LLMs can exploit gaps in societal rules, a phenomenon termed 'societal hacking,' and demonstrates this using a new sandbox environment.
The paper demonstrates a novel, self-sustaining computer worm powered by AI agents that generates tailored attack strategies in real-time, representing a significant shift from traditional, vulnerability-exploiting malware.
The paper introduces HERALD, a token-level cryptographic redaction framework that encrypts only sensitive tokens in clinical text, enabling privacy-preserving LLM deployment without significant loss of utility.
This paper introduces RREDCoT, a method for approximating optimal reward redistribution in Chain-of-Thought reasoning language models without additional generation.
RiskFlow is a novel framework that generates realistic and safety-critical multi-agent traffic scenarios by reformulating trajectory generation as a single-pass transport problem in the action space.
ZERO-APT introduces a novel closed-loop adversarial framework for automated penetration testing that simulates attacks against an intelligent, real-time defending system, achieving a high attack success rate and verifiable decision consistency.
This paper presents a data-driven method to estimate external joint torques without dedicated force sensors, enabling force-feedback teleoperation on low-cost arms.
This paper introduces DIRECT, a routing framework that allocates test-time compute per prompt to improve the success-cost Pareto frontier for embodied agents.
Papers
FACTR 2: Learning External Force Sensing for Commodity Robot Arms Improves Policy Learning
Steven Oh, Jason Jingzhou Liu, Tony Tao, Philip Han, and 4 others