LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments