"Active exploration" | ArxivCSExplorer

20 results for “Active exploration”

CS papers only

Hybrid search: Keyword + semantic, ranked by combined score.

cs.CL | Empirical | Recent | Jun 4, 2026

Human Adults and LLMs as Scientists: Who Benefits from Active Exploration?

Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more

This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.

cs.LG | cs.AI | Empirical | Recent | Jun 10, 2026

ATLAS: Active Theory Learning for Automated Science

Noémi Éltető, Nathaniel D. Daw, Kimberly L. Stachenfeld, Kevin J. Miller

This paper introduces ATLAS, an active learning framework for discovering interpretable behavioral models in cognitive science.

cs.CL | Recent | May 29, 2026

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

Tianjie Ju, Yueqing Sun, Zheng Wu, Wei Zhang +6 more

The paper introduces MineExplorer, a new benchmark in Minecraft, to evaluate the sustained open-world exploration capabilities of MLLM agents, finding that long-horizon coordination remains a signific…

cs.AI | Recent | Jun 1, 2026

Joint Agent Memory and Exploration Learning via Novelty Signals

Shizuo Tian, Xiaohong Weng, Rui Kong, Yuxuan Chen +8 more

The JAMEL framework addresses the challenge of effective exploration in open-ended environments by jointly training agent memory and exploration policies using natural, novelty-driven signals.

cs.RO | cs.AI | Recent | May 29, 2026

TARIC: Memory-Augmented Traversability-Aware Outdoor VLN under Interrupted Semantic Cues

Tianle Zeng, Hanjing Ye, Jianwei Peng, Jingwen Yu +2 more

The paper proposes a memory-augmented, traversability-aware framework for outdoor VLN that maintains stable, goal-consistent guidance even when semantic cues are interrupted or unavailable.

cs.LG | cs.AI | stat.ML | Recent | May 28, 2026

Active Timepoint Selection for Learning Measure-Valued Trajectories

Nicolas Huynh, Mihaela van der Schaar

The paper proposes a novel active learning framework using Linearized Optimal Transport to strategically select measurement timepoints, thereby minimizing uncertainty when inferring continuous probabi…

cs.AI | Recent | May 28, 2026

Uncertainty-Aware and Temporally Regulated Expert Advice in Reinforcement Learning for Autonomous Driving

Ahmed Abouelazm, Felix Klingebiel, Philip Schörner, J. Marius Zöllner

The paper introduces an uncertainty-aware framework that uses regulated expert advice to guide safe and efficient exploration for autonomous driving policies, significantly improving performance in co…

cs.LG | cs.AI | Recent | May 29, 2026

When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?

Stephane Hatgis-Kessell, Emma Brunskill

The paper introduces Prompted Policy Optimization (PromptPO), an LLM-based method that successfully optimizes policies for various sequential RL tasks, demonstrating that LLMs can replace classical RL…

cs.AI | cs.CL | Empirical | Recent | Jun 11, 2026

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

Amy Xin, Jiening Siow, Junjie Wang, Zijun Yao +4 more

This paper presents EurekAgent, an environment-engineered agent system for metric-driven autonomous scientific discovery.

cs.RO | cs.AI | Recent | Jun 2, 2026

Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation

Roohan Ahmed Khan, Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Dzmitry Tsetserukou

The paper introduces AgenticRL, a self-refining reinforcement learning framework that uses a multimodal GPT agent to automatically design, refine, and deploy reward functions for complex UAV navigatio…

cs.AI | cs.CV | cs.RO | Recent | May 28, 2026

Planning with the Views via Scene Self-Exploration

Kangrui Wang, Linjie Li, Zhengyuan Yang, Shiqi Chen +6 more

The paper addresses the challenge of multi-turn view planning for VLMs by proposing an iterative framework that uses self-exploration and view graph distillation, significantly improving planning perf…

cs.LG | cs.AI | Recent | May 29, 2026

Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

Soichiro Nishimori, Paavo Parmas, Sotetsu Koyamada, Tadashi Kozuno +3 more

The paper introduces ReMax, a novel objective function that naturally encourages stochastic exploration in policy gradient reinforcement learning by evaluating expected maximum returns over multiple s…

cs.LG | cs.AI | Recent | May 29, 2026

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

Yiming Ren, Yiran Xu, Zicheng Lin, Chufan Shi +7 more

The paper proposes S2L-PO, a framework that uses smaller, naturally diverse models as structured explorers to enhance the policy-level diversity and performance of larger language models during traini…

cs.CL | Recent | May 29, 2026

EMBGuard: Constructing Hazard-Aware Guardrails for Safe Planning in Embodied Agents

Dongwook Choi, Taeyoon Kwon, Bogyung Jeong, Minju Kim +5 more

EMBGuard introduces a novel, MLLM-based safety guardrail that explicitly identifies and explains physical hazards from (visual observation, action) pairs, enabling safer planning for embodied agents.

cs.CL | cs.AI | Recent | May 30, 2026

SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering

Qiming Shi, Zhaolu Kang, Yunfan Zhou, Di Weng +1 more

SPADER is a novel reinforcement learning framework that addresses the challenges of Multi-Answer Question Answering by improving credit assignment and promoting diverse exploration during long-horizon…

cs.CV | Recent | Jun 1, 2026

Active Exploring like a Pigeon: Reinforcing Spatial Reasoning via Agentic Vision-Language Models

Wei Deng, Xianlin Zhang, Mengshi Qi

The paper proposes an agentic pipeline for spatial reasoning by introducing a dynamic cognitive map and Spatial Assertion Codes (SAC), achieving state-of-the-art performance on complex spatial tasks.

cs.AI | Recent | May 27, 2026

GONDOR to the Rescue: Satisficing Planning with Low Memory

Yonatan Vernik, Alexander Tuisov, Alexander Shleyfman

The paper introduces GONDOR, a memory-efficient extension of Greedy Best-First Search (GBFS) that enables search continuation under strict memory constraints by periodically compressing the search tre…

cs.RO | cs.AI | Recent | May 27, 2026

Visualizing Latent Phase Structures in Locomotion Policies: A Multi-Environment Study with Temporal Feature Extension

Daisuke Yasui, Toshitaka Matuki, Hiroshi Sato

The paper proposes a novel framework to visualize and uncover latent, structured motion phases in deep reinforcement learning locomotion policies by augmenting state observations with action and next-…

cs.AI | Recent | May 29, 2026

Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

Weile Chen, Bingchen Miao, Qifan Yu, Wendong Bu +5 more

The paper proposes SCALE, a self-improving web agent framework that uses adversarial roles and graph exploration to autonomously discover agent limitations and enhance adaptability in complex web envi…

cs.CL | cs.AI | Recent | May 27, 2026

VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild

Xiaohongshu Inc

The paper introduces VibeSearchBench, a new benchmark designed to evaluate long-horizon, proactive search capabilities, demonstrating that current state-of-the-art LLM agents are still significantly i…
