Akshay Sivaraman
1 indexed paper
Recent (6 mo)
1With code
0Influential cites
0Benchmarked
0Publications per year
126
Top categories
NLP×1
Frequent co-authors
Research Timeline
2026
CRAB-Bench: Evaluating LLM Agents under Complex Task Dependencies and Human-aligned User Simulation
The paper introduces CRAB-Bench and RUSE, a rigorous evaluation framework that tests LLM agents on complex, interdependent tasks with realistic human user interactions, revealing significant performance gaps in current models.
Highlighted terms show continued research focus across papers