Connecting the Dots: Benchmarking Reflective Memory in Long-Horizon Dialogue