Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows