Yanfeng Wang
5 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces MOV-Bench, a challenging benchmark for multi-hop audio-visual reasoning, and proposes AOP-Agent, an agentic framework that significantly improves open-source Omni-LLMs' ability to perform active cross-modal perception.
The paper introduces SkillBrew, a multi-objective framework that treats skill bank curation as a constrained optimization problem to build efficient and well-curated skill repositories for LLM agents.
The paper proposes LaSR, a context-aware training paradigm that uses latent reasoning to significantly improve speech recognition, especially for specialized terminology, without adding latency.
The paper introduces MCP-Persona, a novel benchmark designed to evaluate LLM agents' performance on real-world, personalized applications using the Model Context Protocol (MCP), revealing that current state-of-the-art agents struggle with such personalized tool use.
This paper introduces MedReCo and MedReCo-VLM, a framework that enables entity-aware cross-image reasoning for medical imaging, allowing AI to compare current scans with prior studies and analogous cases based on structured clinical reports.
Papers
A Vision-language Framework for Comparative Reasoning in Radiology
Tengfei Zhang, Ziheng Zhao, Lisong Dai, Xiaoman Zhang +4 more
This paper introduces MedReCo and MedReCo-VLM, a framework that enables entity-aware cross-image reasoning for medical imaging, allowing AI to compare current scans with prior studies and analogous ca…