Architectures

Neural network architectures, attention mechanisms, and model design

20 papers indexed

cs.AI, cs.CL, cs.LG | Recent | Jun 1, 2026

Forget Attention: Importance-Aware Attention Is All You Need

The paper proposes SISA (SSM-Informed Softmax Attention), a novel hybrid attention mechanism that integrates state-space model (SSM) importance signals directly into the attention score, achieving sta…

cs.CV, cs.AI | Recent | May 29, 2026

Zamba2-VL Technical Report

Hassan Shapourian, Kasra Hejazi, Olabode M. Sule, Beren Millidge

Zamba2-VL is a new suite of vision-language models built on the Zamba2 hybrid architecture, achieving state-of-the-art performance and significantly improved inference efficiency compared to leading T…

cs.IR, cs.AI, cs.LG | Empirical | Recent | Jun 26, 2026

Bifocal Diffusion Language Models: Asymmetric Bidirectional Context for Parallel Generation

Yuhang Chen, Xianfeng Wu, Jinhao Duan, Mingfu Liang +10 more

This paper introduces Bifocal dLLMs (R2LM), a new paradigm for discrete diffusion language models that combines causal and bidirectional attention for improved throughput and generation quality.

cs.CR, cs.LG | Recent | Apr 2, 2026

AEGIS: Adversarial Entropy-Guided Immune System -- Thermodynamic State Space Models for Zero-Day Network Evasion Detection

Vickson Ferrel

AEGIS introduces a novel physics-based system that analyzes encrypted network traffic flow dynamics, achieving state-of-the-art zero-day evasion detection with high accuracy and low latency.

cs.LG, cs.CL | Recent | May 30, 2026

Task Structure Reverses Layerwise State Encoding in Sequence Models

Yuhang Jiang

The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…

cs.AI, cs.HC, cs.LG | Recent | May 27, 2026

CaMBRAIN: Real-time, Continuous EEG Inference with Causal State Space Models

Abhilash Durgam, Nyle Siddiqui, Jeffrey A. Chan-Santiago, Qiushi Fu +2 more

CaMBRAIN introduces a novel Mamba-based State Space Model (SSM) for real-time, continuous EEG inference, achieving state-of-the-art results with significantly higher throughput than existing methods.

cs.CV | Empirical | Recent | Jul 24, 2026

IR275K: A Benchmark for Infrared Multi-Frame Super-Resolution Toward Efficient Remote Sensing

Jie Deng, Heyang Wang, Changxin Wang, Junkai Shen +5 more

This paper introduces IR275K, a curated benchmark for multi-frame super-resolution in infrared remote sensing, and evaluates CGMamba, a lightweight state-space model, achieving state-of-the-art perfor…

cs.IR | Recent | Jun 2, 2026

MARS: Multi-rate Aggregation of Recency Signals for Sequential Recommendation across Sparse and Dense Regimes

Zhenyu Yu, Shuigeng Zhou

MARS proposes an encoder-agnostic aggregation operator that explicitly models multi-scale temporal structure in sequential recommendation, achieving state-of-the-art performance across both sparse and…

cs.LG, cs.AI, cs.NE | Empirical | Recent | Jun 30, 2026

EVOTS: Evolutionary Transformer Search for Time Series Forecasting

AbdElRahman ElSaid, Damir Pulatov

This paper introduces EVOTS, an evolutionary neural architecture search framework for discovering task-adaptive Transformer-like models for multivariate time-series forecasting.

cs.CL, cs.CR, cs.LG | Recent | Apr 3, 2026

Learning the Signature of Memorization in Autoregressive Language Models

David Ilić, Kostadin Cvejoski, David Stanojević, Evgeny Grigorenko

The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…

cs.CR, cs.AI, cs.CL | Recent | Apr 4, 2026

Safety, Security, and Cognitive Risks in State-Space Models: A Systematic Threat Analysis with Spectral, Stateful, and Capacity Attacks

Manoj Parmar

This paper provides the first systematic threat analysis of State-Space Models (SSMs) in safety-critical applications, introducing novel attack classes and formal metrics to quantify their security an…

cs.CV | Empirical | Recent | Jul 16, 2026

DAPGNet: Dynamic Adaptive Physics-Guided Graph Diffusion Network for Hyperspectral Image Classification

Pengkun Wang, Weijia Cao, Ning Wang, Xiaofei Yang

This paper proposes DAPGNet, a dynamic adaptive physics-guided graph diffusion network for hyperspectral image classification, which achieves state-of-the-art performance on four datasets.

cs.CL, cs.AI, cs.CY | Empirical | Recent | Jul 22, 2026

Learning the Arabic Dialect Continuum as a Continuous Space: A Regression Approach to Speaker Origin Prediction

Mohamed Aziz Khadraoui, Adel Ammar, Bilel Benjdira, Zahid Khan +2 more

A regression-based approach is presented for Arabic dialect geolocation using a hierarchical neural architecture and spherical geodesic loss, achieving a median localization error of 481.2 km.

cs.CR, cs.AI, cs.CV | Recent | Apr 6, 2026

SE-Enhanced ViT and BiLSTM-Based Intrusion Detection for Secure IIoT and IoMT Environments

Afrah Gueriani, Hamza Kheddar, Ahmed Cherif Mazari, Seref Sagiroglu +1 more

The paper proposes an SE ViT-BiLSTM hybrid model for enhanced intrusion detection in IIoT and IoMT environments, achieving superior performance on real-world datasets, especially after data balancing.

eess.SP | Empirical | Recent | Jun 18, 2026

ConsisFormer: Compute-Efficient Transformer for Wireless Foundation Models Based on Channel Consistency

Yuwei Wang, Li Sun, Tingting Yang, Liwen Jing +3 more

This paper proposes ConsisFormer, a compute-efficient Transformer design for wireless foundation models (WFMs) using short-term channel consistency and adaptive token aggregation.

cs.CV, cs.AI | Recent | May 28, 2026

SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

Yuyang Zhao, Yicheng Pan, Qiyuan He, Jincheng Yu +5 more

SANA-Streaming introduces a novel, efficient framework that enables real-time, high-resolution streaming video-to-video editing by combining a hybrid diffusion transformer with specialized training an…

cs.SD, cs.AI, cs.MM | Recent | May 27, 2026

EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction

Chong Jing, Zitong Lan, Junan Zhang, Zhizheng Wu

EigeNet introduces a geometry-informed multi-modal Transformer framework to achieve state-of-the-art few-shot novel view Room Impulse Response (RIR) prediction by effectively integrating spatial geome…

cs.AI, cs.LG, cs.NE | Position | Recent | Jul 22, 2026

The Giant Hippocampus: From Structural Monoculture to a System of Systems

Jaeho Seol

This paper argues for the importance of modularity and heterogeneity in AI architectures, contrasting the Transformer model with the structure of the cortex.

cs.CL, cs.AI, cs.LG | Recent | May 30, 2026

Detection vs. Execution: Single-Bucket Probes Miss Half the Mamba-2 State Sink

Yuhang Jiang

The paper demonstrates that in Mamba-2, single-bucket probes can detect a large functional signature (detection layer) that is not fully responsible for the actual computation (execution layer), chall…

cs.AR, cs.AI, cs.NE | Recent | Jun 4, 2026

ITP-STDP: An Intrinsic-Timing Power-of-Two Learning Engine for On-Chip SNN Training

Haihang Xia, Xinyu Zhao, Xuecheng Wang, John Goodenough +4 more

This paper proposes and validates a novel hardware architecture, ITP-STDP, to significantly reduce the energy consumption and hardware overhead associated with training Spiking Neural Networks (SNNs).
