Peng Zhou
2 indexed papers
Research Timeline
The paper introduces PetroBench, a comprehensive benchmark for evaluating Large Language Models across various domains of petroleum engineering, finding that models perform better on subjective tasks than on objective factual knowledge.
This paper analyzes failure modes in collaborative visual reasoning systems, demonstrating that naive shared workspaces can amplify hallucinations and proposing diagnostics for improving communication fidelity.
Papers
Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents