Papers similar to 2604.03612v1

17 results

cs.CR · Empirical · Recent · Jun 27, 2026

A Usable and Secure Bengali CAPTCHA

Md Neyamul Islam Shibbir, Md Hasibur Rahman, Farida Chowdhury, Md Sadek Ferdous

This paper proposes and implements the first Bengali text CAPTCHA mechanism for native Bengali-speaking users, achieving high security and usability.

eess.AS · cs.AI · cs.HC · Recent · May 27, 2026

I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors

Lelia Erscoi, Tomi Kinnunen

This study investigates how humans detect synthetic speech in real-world contexts, finding that while overt detection failed for fully synthetic speech, participants still implicitly discriminated utt…

cs.AI · cs.CL · cs.CV · Recent · Jun 1, 2026

HLL: Can Agents Cross Humanity's Last Line of Verification?

Xinhao Song, Su Su, Sirui Song, Hongliang Wu +5 more

The paper introduces HLL, a benchmark that tests if multimodal agents can successfully substitute for human verification (like CAPTCHA) in complex, real-world workflows, finding that current agents ar…

cs.CR · cs.SD · Recent · May 5, 2026

DECKER: Domain-invariant Embedding for Cross-Keyboard Extraction and Recognition

Bikrant Bikram Pratap Maurya, Nitin Choudhury, Daksh Agarwal, Arun Balaji Buduru

The paper introduces DECKER, a domain-invariant framework that significantly improves cross-keyboard keystroke inference by normalizing device variations and leveraging linguistic context, demonstrati…

cs.CR · cs.AI · cs.CV · Recent · Mar 23, 2026

CAPTCHA Solving for Native GUI Agents: Automated Reasoning-Action Data Generation and Self-Corrective Training

Yuxi Chen, Haoyu Zhai, Chenkai Wang, Rui Yang +3 more

The paper introduces ReCAP, a native GUI agent that significantly improves CAPTCHA solving success (from 30% to 80%) by integrating specialized CAPTCHA capabilities into a general-purpose, end-to-end…

cs.SD · cs.AI · Empirical · Recent · Jul 16, 2026

Large Audio Language Models for Spoofing-Aware Speaker Verification

Sofya Savelyeva, Mariia Perunova, Evgeny Kushnir, Artem Dvirniak +2 more

This paper evaluates the use of large audio language models for speaker verification systems against conventional pipelines and finds that task-specific adaptation improves performance.

cs.SD · Empirical · Recent · Jul 7, 2026

Music I Care About: Automated Multimodal Benchmarking of LLM Music Perception Skills on (Almost) Any Music

Tomáš Sourada, Katia Vendrame, Jan Hajič

The paper introduces MusICA-MetaBench, a framework for deriving on-demand music perception benchmarks from user-provided data, ensuring statistically reliable model comparisons.

cs.LG · cs.CL · Recent · May 28, 2026

Measuring, Localizing, and Ablating Alignment Signatures in LLMs

Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür +1 more

The paper demonstrates that the AI-like style introduced by post-training alignment can be measured, localized, and causally removed using a novel ablation technique called PASTA.

cs.CR · cs.AI · Recent · Mar 30, 2026

Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey

Bhavuk Jain, Sercan Ö. Arık, Hardeo K. Thakur

This survey provides a comprehensive taxonomy and vulnerability-centric analysis of adversarial attacks targeting Multimodal Large Language Models (MLLMs), offering an explanatory framework for enhanc…

cs.SD · cs.AI · cs.CR · Recent · Jun 4, 2026

Beyond Waveform Robustness: Robust Feature-Vocoder Adversarial Attacks on Automatic Speech Recognition

Yifan Liao, Zongmin Zhang, Zhen Sun, Yuhui Sun +2 more

The paper introduces a novel Clean-Referenced Feature-Vocoder Attack, a black-box adversarial attack that perturbs high-level SSL feature representations instead of raw audio waveforms, achieving supe…

cs.CR · cs.AI · cs.MA · Empirical · Recent · Jul 21, 2026

Broken Gates: Re-evaluating Web Bot Defenses in the Age of LLM Agents

Behzad Ousat, Nikita Turkmen, Lalchandra Rampersaud, Dillan Bailey +1 more

This paper evaluates the effectiveness of interactive and non-interactive bot defense systems against commercial CAPTCHA-solving services and LLM-based browser agents.

cs.CY · cs.CL · cs.CR · Recent · Apr 15, 2026

Who Gets Flagged? The Pluralistic Evaluation Gap in AI Content Watermarking

Alexander Nemecek, Osama Zafar, Yuqiao Xu, Wenbiao Li +1 more

The paper argues that current AI content watermarking benchmarks fail to test for bias across different languages, cultures, and demographics, proposing a new set of evaluation standards to ensure fai…

cs.CR · cs.AI · cs.RO · Recent · May 18, 2026

Not What You Asked For: Typographic Attacks in Household Robot Manipulation

Ali Iranmanesh, Peng Liu

This paper demonstrates that typographic attacks pose a significant, measurable, and physically consequential threat to household robot manipulation systems by causing the robot to grasp and transport…

cs.CR · cs.AI · cs.SD · Recent · Apr 16, 2026

Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection

Meng Chen, Kun Wang, Li Lu, Jiaheng Zhang +1 more

The paper introduces AudioHijack, a framework that successfully demonstrates context-agnostic and imperceptible auditory prompt injection attacks, showing that commercial Large Audio-Language Models c…

cs.CL · cs.AI · eess.AS · Recent · May 31, 2026

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more

PolySpeech-100 introduces a massive multilingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…

cs.CL · cs.AI · Recent · May 27, 2026

Revisiting Anthropomorphic Reflection Markers in Large Language Model Reasoning

Yahan Yu, Noa Nakanishi, Fei Cheng

The paper investigates anthropomorphic reflection markers (like 'hmm' or 'wait') in LLM reasoning and finds that these markers are often surface cues, not necessary for strong reasoning performance.

cs.SD · Empirical · Recent · Jun 19, 2026

When EER Hides Deployment Failure: Auditing Threshold Transfer and Unlabeled Score Calibration for Speech Deepfake Detectors

Jingwen Zhou, Mingzhe Wang

This paper compares the performance of speech deepfake countermeasures using equal error rate (EER) and half total error rate (HTER) on different datasets. It also evaluates the effectiveness of popul…
