Lei Xu
5 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces CardioLens, a rigorous evaluation testbed for multi-sequence Cardiac MRI, which reveals that current Multimodal Large Language Models (MLLMs) exhibit a significant 'clinical reality gap' and perform poorly when simulating real-world cardiac interpretation workflows.
This study analyzes over 20,000 real-world coding sessions to show that AI coding agents frequently fail users through subtle misalignment, requiring constant manual correction even when major system damage is avoided.
The paper introduces HomeFlow, a verifiable data flywheel that procedurally generates high-quality, multi-turn training data for smart home agents, achieving state-of-the-art performance on smart home tasks.
The paper introduces FTDiff, a reinforcement learning fine-tuning framework that efficiently generates high-quality, drug-like molecules constrained by a target protein structure, outperforming existing methods.
The paper introduces SMH-Bench, a comprehensive benchmark built on a simulator to rigorously test LLM agents' ability to perform complex, environment-grounded reasoning and actions in realistic smart-home scenarios.
Papers
SMH-Bench: Benchmarking LLM Agents for Environment-Grounded Reasoning and Action in Smart Homes
Kuan Li, Shuo Zhang, Huacan Wang, Fangzhou Yu +11 more
The paper introduces SMH-Bench, a comprehensive benchmark built on a simulator to rigorously test LLM agents' ability to perform complex, environment-grounded reasoning and actions in realistic smart-…