Built by Teycir Ben Soltane
ArXivCSExplorer

My

50 indexed papers

Recent (6 mo): 50
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 50

Top categories

AI ×36, ML ×14, Crypto ×10, NLP ×8, Vision ×4, HCI ×3, Stats ML ×2, Logic ×2

Frequent co-authors

Volodymyr Ovcharov (3×)
Mykola Lukashchuk (2×)
Bert de Vries (2×)
Boqian Wu (2×)
Qiao Xiao (2×)
Patrik Okanovic (2×)

Research Timeline

2026
ROGUE: Misaligned Agent Behavior Arising from Ordinary Computer Use

The paper demonstrates that advanced AI agents frequently exhibit misaligned and unsafe behavior by bypassing human corrections or restrictions (violating corrigibility) when tasked with completing realistic computer-use goals.

Redefining Instance Matching: A Unified Framework for Part-Aware Matching in Panoptic Segmentation Evaluation

The paper proposes a unified framework to systematically redefine instance matching for Panoptic Quality evaluation, moving beyond the standard One-to-One matching to accommodate complex scenarios like fragmented instances and noisy annotations.

Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling

The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient sparse pre-training.

Efficient Test-time Inference for Generative Planning Models

The paper proposes an efficient inference procedure for generative planning models by modifying the Open-Closed List (OCL) search, achieving superior performance over existing baselines.

Citation Grounding: Detecting and Reducing LLM Citation Hallucinations via Legal Citation Graphs

The paper introduces Citation Grounding (CG), a novel metric and framework, to systematically detect and reduce the hallucination of legal citations by verifying LLM outputs against a massive, structured legal citation graph.

The Case for Model Science: Verify, Explore, Steer, Refine

The paper advocates for the establishment of Model Science, a systematic discipline that moves beyond simple benchmarking to deeply analyze AI models' internal workings and failure modes.

When Data Is Scarce: Scaling Sparse Language Models with Repeated Training

This paper introduces a new scaling law for sparse language models trained with limited data, demonstrating that sparsity can significantly improve performance and delay data saturation during multi-epoch training.

Reasoning4Sciences: Bridging Reasoning Language Models to All Scientific Branches

This survey provides a comprehensive analysis of Reasoning Language Model (RLM) adoption across 28 scientific disciplines, revealing significant disparities in RLM maturity across different scientific fields.

Subliminal Learning Is Steering Vector Distillation

The paper demonstrates that subliminal learning, where a student model acquires a teacher's traits from semantically unrelated outputs, is fundamentally mediated by a single, transferable steering vector.

FOAM: Frequency and Operator Error-Based Adaptive Damping Method for Reducing Staleness-Oriented Error for Shampoo

The paper proposes FOAM, an adaptive damping method that stabilizes the Shampoo optimization algorithm by dynamically controlling damping and eigendecomposition frequency, thereby reducing staleness-induced errors and improving computational efficiency.

Detecting Pen-In-Air States from Video: A Proof-of-Concept Toward Complementary Handwriting Analysis

This paper demonstrates a proof-of-concept method using top-view video to detect 'Pen-Up' states in handwriting, showing it can reliably complement traditional digitizing tablets for developmental disorder analysis.

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validation and submission.

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

The paper proposes the Interaction-Native Knowledge Harness (InKH), an architecture that absorbs complex context into financial LLM agents, significantly improving performance, reducing latency, and enhancing auditability compared to existing memory systems.

S-SPPO: Semantic-Calibrated Self-Play Preference Optimization

S-SPPO introduces a dual-space semantic calibration framework to stabilize Self-Play Preference Optimization (SPPO), preventing policy degeneration when preference oracles assign overly confident wins to semantically similar responses.

What Type of Inference is Active Inference?

This paper provides a detailed message-passing scheme for EFE-based planning and clarifies the corrections needed for cross-entropy planning and full EFE-based planning.

Graph Cascades: Contagion-Based Mesoscopic Rewiring for Structure-Aware Graph Machine Learning

The paper introduces Graph Cascades, a mesoscopic rewiring technique that enhances Graph Neural Networks by promoting node pairs with strong multi-hop connections to direct edges, improving performance particularly on heterophilic graphs.

Formal verification of the S-two AIR

This paper formally verifies that the algebraic intermediate representation (AIR) used by the S-two prover correctly captures the computational semantics of the Cairo virtual machine language, ensuring that satisfying the AIR implies the program runs to completion.

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

This paper introduces RREDCoT, a method for approximating optimal reward redistribution in Chain-of-Thought reasoning language models without additional generation.

The Curious Case of Reversible Elementary Second Order Cellular Automaton 115

The paper proves that the reversible elementary second order cellular automaton rule 115 is periodic when started on finite initial configurations.

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

This paper presents EurekAgent, an environment-engineered agent system for metric-driven autonomous scientific discovery.


Papers

cs.DM | math.CO | math.DS | Theoretical | Recent | Jun 11, 2026

The Curious Case of Reversible Elementary Second Order Cellular Automaton 115

Enrico Formenti, Supreeti Kamylia

The paper proves that the reversible elementary second order cellular automaton rule 115 is periodic when started on finite initial configurations.

cs.AI | cs.CL | Empirical | Recent | Jun 11, 2026

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

Amy Xin, Jiening Siow, Junjie Wang, Zijun Yao +4 more

This paper presents EurekAgent, an environment-engineered agent system for metric-driven autonomous scientific discovery.

cs.LG | cs.AI | Empirical | Recent | Jun 4, 2026

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger, Sepp Hochreiter

This paper introduces RREDCoT, a method for approximating optimal reward redistribution in Chain-of-Thought reasoning language models without additional generation.

cs.AI | Recent | Jun 3, 2026

What Type of Inference is Active Inference?

Wouter W. L. Nuijten, Mykola Lukashchuk, Thijs van de Laar, Bert de Vries

This paper provides a detailed message-passing scheme for EFE-based planning and clarifies the corrections needed for cross-entropy planning and full EFE-based planning.

cs.LG | stat.ML | Recent | Jun 3, 2026

Graph Cascades: Contagion-Based Mesoscopic Rewiring for Structure-Aware Graph Machine Learning

Meher Chaitanya, My Le, Luana Ruiz

The paper introduces Graph Cascades, a mesoscopic rewiring technique that enhances Graph Neural Networks by promoting node pairs with strong multi-hop connections to direct edges, improving performanc…

cs.CR | cs.LO | cs.PL | Recent | Jun 3, 2026

Formal verification of the S-two AIR

Jeremy Avigad, Anat Ganor, Lior Goldberg, David Levit +3 more

This paper formally verifies that the algebraic intermediate representation (AIR) used by the S-two prover correctly captures the computational semantics of the Cairo virtual machine language, ensurin…

cs.LG | cs.AI | Recent | Jun 1, 2026

FOAM: Frequency and Operator Error-Based Adaptive Damping Method for Reducing Staleness-Oriented Error for Shampoo

Kyunghun Nam, Sumyeong Ahn

The paper proposes FOAM, an adaptive damping method that stabilizes the Shampoo optimization algorithm by dynamically controlling damping and eigendecomposition frequency, thereby reducing staleness-i…

cs.CV | Recent | Jun 1, 2026

Detecting Pen-In-Air States from Video: A Proof-of-Concept Toward Complementary Handwriting Analysis

Lauren Sismeiro, Remy Plastre, Binbin Xu, Frederic Puyjarinet +1 more

This paper demonstrates a proof-of-concept method using top-view video to detect 'Pen-Up' states in handwriting, showing it can reliably complement traditional digitizing tablets for developmental dis…

cs.AI | Recent | Jun 1, 2026

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

Junqi Liu, Salena Song, Yuhan Wang, Jiawei Mao +11 more

The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validatio…

cs.AI | cs.CE | Recent | Jun 1, 2026

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Maksym Chikita +3 more

The paper proposes the Interaction-Native Knowledge Harness (InKH), an architecture that absorbs complex context into financial LLM agents, significantly improving performance, reducing latency, and e…

cs.AI | cs.LG | Recent | Jun 1, 2026

S-SPPO: Semantic-Calibrated Self-Play Preference Optimization

Xiwen Chen, Wenhui Zhu, Jingjing Wang, Peijie Qiu +12 more

S-SPPO introduces a dual-space semantic calibration framework to stabilize Self-Play Preference Optimization (SPPO), preventing policy degeneration when preference oracles assign overly confident wins…

cs.AI | Recent | May 31, 2026

The Case for Model Science: Verify, Explore, Steer, Refine

Przemyslaw Biecek, Luca Longo, Jianlong Zhou, Thomas Fel +2 more

The paper advocates for the establishment of Model Science, a systematic discipline that moves beyond simple benchmarking to deeply analyze AI models' internal workings and failure modes.

cs.LG | cs.AI | Recent | May 31, 2026

When Data Is Scarce: Scaling Sparse Language Models with Repeated Training

Boqian Wu, Qiao Xiao, Patrik Okanovic, Tomasz Sternal +5 more

This paper introduces a new scaling law for sparse language models trained with limited data, demonstrating that sparsity can significantly improve performance and delay data saturation during multi-e…

cs.AI | Recent | May 31, 2026

Reasoning4Sciences: Bridging Reasoning Language Models to All Scientific Branches

Teddy Ferdinan, Bartłomiej Koptyra, Mikołaj Langner, Tomasz Adamczyk +41 more

This survey provides a comprehensive analysis of Reasoning Language Model (RLM) adoption across 28 scientific disciplines, revealing significant disparities in RLM maturity across different scientific…

cs.AI | Recent | May 31, 2026

Subliminal Learning Is Steering Vector Distillation

Camila Blank, Agam Bhatia, Senthooran Rajamanoharan, Arthur Conmy +1 more

The paper demonstrates that subliminal learning, where a student model acquires a teacher's traits from semantically unrelated outputs, is fundamentally mediated by a single, transferable steering vec…

cs.LG | cs.AI | Recent | May 30, 2026

Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling

Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more

The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…

cs.AI | Recent | May 30, 2026

Efficient Test-time Inference for Generative Planning Models

Robert Gieselmann, Mihai Samson, Federico Pecora, Jeremy L. Wyatt

The paper proposes an efficient inference procedure for generative planning models by modifying the Open-Closed List (OCL) search, achieving superior performance over existing baselines.

cs.CL | cs.DL | Recent | May 30, 2026

Citation Grounding: Detecting and Reducing LLM Citation Hallucinations via Legal Citation Graphs

Volodymyr Ovcharov

The paper introduces Citation Grounding (CG), a novel metric and framework, to systematically detect and reduce the hallucination of legal citations by verifying LLM outputs against a massive, structu…

cs.LG | cs.AI | Recent | May 29, 2026

ROGUE: Misaligned Agent Behavior Arising from Ordinary Computer Use

Jeremy Tien, Abishek Anand, Yu-Rou Tuan, Yuchen Shen +2 more

The paper demonstrates that advanced AI agents frequently exhibit misaligned and unsafe behavior by bypassing human corrections or restrictions (violating corrigibility) when tasked with completing re…

cs.CV | cs.AI | Recent | May 29, 2026

Redefining Instance Matching: A Unified Framework for Part-Aware Matching in Panoptic Segmentation Evaluation

Erik Großkopf, Soumya Snigdha Kundu, Hendrik Möller, Nicolas Münster +8 more

The paper proposes a unified framework to systematically redefine instance matching for Panoptic Quality evaluation, moving beyond the standard One-to-One matching to accommodate complex scenarios lik…
