Ao Zhang

50 indexed papers

Top categories

AI ×31, NLP ×17, ML ×8, Robotics ×8, Vision ×6, Sound ×5, Info Retrieval ×5, Distributed ×3

Frequent co-authors

Chao Zhang (4×)

Hao Zhang (3×)

Research Timeline

2026

TRACE: Temporal Relationship-Aware Conversational Entrainment Detection in Dyadic Speech

This paper introduces DyadEE, a dataset for emotional entrainment detection in conversational interactions, and TRACE, a window-level framework for modeling dyadic interaction using emotion fine-tuned Whisper representations. TRACE achieves the highest accuracy of 97.01% on DyadEE.

ShopX: A Foundation Model for Intent-to-Item Fulfillment in Agentic Shopping

This paper proposes ShopX, a model-centric framework for intent-driven shopping experiences using a single foundation model for intent understanding, execution planning, and item-space operations.

When RAG Meets Query Planning: Logical Query Trees for Resolving Exploratory Reasoning Problems

The paper introduces PlanRAG, a framework for Retrieval-Augmented Generation (RAG) that models exploratory reasoning problems as logical query trees, addressing representation and optimization gaps between structured SQL and unstructured natural language.

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

The paper proposes Fuzzy-Function Programming and introduces Program-as-Weights (PAW), a compact, locally-executable neural artifact for everyday programming tasks.

SPEARBench: A Benchmark for Naturalness Evaluation in Streaming Speech-to-Speech Language Models

The paper introduces SPEARBench, a benchmark for evaluating naturalness in speech-to-speech language models using a multidimensional protocol.

Embedded Blockchain Infrastructure Management (eBIM): A RISC-V-Empowered Hardware-Software Co-Design Framework Towards Trustworthy Blockchain

This paper proposes eBIM, a software-hardware collaborative paradigm for blockchain infrastructure management using RISC-V, and surveys related research and technologies.

GIFT: Geometry-Informed Low-precision Gradient Communication for LLM Pretraining

The paper presents GIFT, a method for reducing communication volume in large language model pretraining by transforming gradients into a near-isotropic space before quantization.

WaspMOT: A Benchmark for Long-Term Multi-Object Tracking of Trichogramma Wasps

The paper introduces WaspMOT, a new benchmark for long-term multi-object tracking, and evaluates five tracking-by-detection methods.

Native Video-Action Pretraining for Generalizable Robot Control

This paper introduces LingBot-VA 2.0, a video-action foundation model designed for embodiment, with semantic visual-action tokenization, causal pretraining, sparse MoE backbone, and enhanced asynchronous inference.

SolarChain-Eval: A Physics-Constrained Benchmark for Trustworthy Economic Agents in Decentralized Energy Markets

This paper proposes SolarChain-Eval, a physics-constrained benchmark for evaluating trustworthy economic agents in decentralized energy markets.

Learning Agile Navigation in Crowded Environments for Quadruped Robots

This paper proposes VOP-Nav, a novel navigation system for quadruped robots that combines the geometric safety of Velocity Obstacles with the agile adaptability of end-to-end learning.

JoyNexus: Service-Oriented Multi-Tenant Post-Training for VLA Models

JoyNexus is a multi-tenant service for VLA model supervised fine-tuning, reinforcement learning, and evaluation, which decouples services, introduces group batching, and improves training efficiency.

RecGPT-V3 Technical Report

RecGPT-V3 is a stateful, hybrid-modal recommender system that uses a Memory Hub for user memory and a Hybrid-modal Foundation Model for joint reasoning over text tags and Semantic IDs, achieving consistent gains in user experience and commercial outcomes.

SALMONN-2: Advancing General-Purpose Hearing Abilities with Self-Supervised Representations

The paper proposes SALMONN-2, an audio large language model (ALLM) built on a unified self-supervised learning (SSL) encoder, and presents a multi-layer feature fusion adapter to better exploit the encoder's hierarchical representations. It also explores multimodal in-context learning in ALLMs and shows that a general-purpose SSL encoder performs comparably to specialized audio encoders.

Spatial Semantic Communication: When Semantic Transmission Meets Index Modulation

This paper proposes a spatial semantic communication (SSC) system using fluid antenna-index modulation (FA-IM) technology, which synergizes residual quantization (RQ) and IM for efficient semantic transmission.

ReferTrack: Referring Then Tracking for Embodied Visual Tracking

The paper introduces ReferTrack, a method for embodied visual tracking using a single forward-facing camera, achieving state-of-the-art performance on EVT-Bench.

RPPNet: Perceptually-Grouped Rhythm-Pitch Primitives for Long-Term Structure Melody Generation via Boundary-Aware Modeling

This paper proposes RPPNet, a two-stage deep learning architecture for music generation with variable structural boundaries, which automatically derives grouping of Rhythm-Pitch Primitive sequences from acoustic cues and musical psychology.

RL-MACRO: A Cybernetic Closed-Loop Intelligence Framework for Multimodal Adaptive Robotic Craniotomy

This paper proposes RL-MACRO, a cybernetic closed-loop intelligence framework for autonomous robotic craniotomy, which includes a CNN-LSTM observer for temperature reconstruction, an offline Implicit Q-Learning policy, and a novel dual-head Actor for coordinating cutting parameters.

SoundscapeAgent: Agentic Soundscape Construction for Controllable Synthesis and Scalable Audio-Language Supervision

The paper introduces an agentic soundscape construction framework for controllable compositional audio generation, which makes explicit the scene planning, source selection, temporal layout, and rendering steps.

SpecBox: Speculative Sandbox Scheduling for Efficient LLM Agent Serving

This paper presents SpecBox, a runtime system for LLM agents that uses speculative sandbox preallocation to improve resource utilization and reduce interactive tail latency.

Papers

cs.DC · cs.AI · cs.LG · Empirical · Recent · Jul 27, 2026

SpecBox: Speculative Sandbox Scheduling for Efficient LLM Agent Serving

Yihui Zhang, Tianyu Wo, Jinghao Wang, Xiaoyang Sun +6 more

This paper presents SpecBox, a runtime system for LLM agents that uses speculative sandbox preallocation to improve resource utilization and reduce interactive tail latency.

cs.RO · Empirical