Architectures
Neural network architectures, attention mechanisms, and model design
20 papers indexed
Forget Attention: Importance-Aware Attention Is All You Need
The paper proposes SISA (SSM-Informed Softmax Attention), a novel hybrid attention mechanism that integrates state-space model (SSM) importance signals directly into the attention score, achieving sta…
Zamba2-VL Technical Report
Zamba2-VL is a new suite of vision-language models built on the Zamba2 hybrid architecture, achieving state-of-the-art performance and significantly improved inference efficiency compared to leading T…
AEGIS: Adversarial Entropy-Guided Immune System -- Thermodynamic State Space Models for Zero-Day Network Evasion Detection
AEGIS introduces a novel physics-based system that analyzes encrypted network traffic flow dynamics, achieving state-of-the-art zero-day evasion detection with high accuracy and low latency.
Task Structure Reverses Layerwise State Encoding in Sequence Models
The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…
CaMBRAIN: Real-time, Continuous EEG Inference with Causal State Space Models
CaMBRAIN introduces a novel Mamba-based state-space model (SSM) for real-time, continuous EEG inference, achieving state-of-the-art results with significantly higher throughput than existing methods.
MARS: Multi-rate Aggregation of Recency Signals for Sequential Recommendation across Sparse and Dense Regimes
MARS proposes an encoder-agnostic aggregation operator that explicitly models multi-scale temporal structure in sequential recommendation, achieving state-of-the-art performance across both sparse and…
Learning the Signature of Memorization in Autoregressive Language Models
The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…
Safety, Security, and Cognitive Risks in State-Space Models: A Systematic Threat Analysis with Spectral, Stateful, and Capacity Attacks
This paper provides the first systematic threat analysis of State-Space Models (SSMs) in safety-critical applications, introducing novel attack classes and formal metrics to quantify their security an…
SE-Enhanced ViT and BiLSTM-Based Intrusion Detection for Secure IIoT and IoMT Environments
The paper proposes an SE-enhanced ViT-BiLSTM hybrid model for intrusion detection in IIoT and IoMT environments, achieving superior performance on real-world datasets, especially after data balancing.
SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer
Yuyang Zhao, Yicheng Pan, Qiyuan He, Jincheng Yu +5 more
SANA-Streaming introduces a novel, efficient framework that enables real-time, high-resolution streaming video-to-video editing by combining a hybrid diffusion transformer with specialized training an…
EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction
EigeNet introduces a geometry-informed multi-modal Transformer framework to achieve state-of-the-art few-shot novel view Room Impulse Response (RIR) prediction by effectively integrating spatial geome…
ITP-STDP: An Intrinsic-Timing Power-of-Two Learning Engine for On-Chip SNN Training
Haihang Xia, Xinyu Zhao, Xuecheng Wang, John Goodenough +4 more
This paper proposes and validates a novel hardware architecture, ITP-STDP, to significantly reduce the energy consumption and hardware overhead associated with training Spiking Neural Networks (SNNs).
Richer Representations for Neural Algorithmic Reasoning via Auxiliary Reconstruction
Jiafu Huang, Chao Peng, Chenyang Xu, Zhengfeng Yang +6 more
The paper proposes using an auxiliary reconstruction task, specifically one that captures intra-state feature dependencies, to improve the quality of state representations learned by the encoder in ne…
Detection vs. Execution: Single-Bucket Probes Miss Half the Mamba-2 State Sink
The paper demonstrates that in Mamba-2, single-bucket probes can detect a large functional signature (detection layer) that is not fully responsible for the actual computation (execution layer), chall…
LALE: Lightweight-Transformer Architecture for Land-Cover Estimation
LALE introduces a novel lightweight architecture that efficiently combines local convolutional features and global transformer context for land-cover segmentation, achieving superior efficiency and pe…
Can Visual Mamba Improve AI-Generated Image Detection? An In-Depth Investigation
This study systematically evaluates Vision Mamba models for detecting AI-generated images, finding that while they show promise, their current strengths and limitations must be understood relative to…
MambaNetBurst: Direct Byte-level Network Traffic Classification without Tokenization or Pretraining
MambaNetBurst introduces a compact, tokenizer-free byte-level classifier using a Mamba-2 backbone to achieve strong network traffic classification without requiring pre-training or complex data prepro…
On Efficient Scaling of GNNs via IO-Aware Layers Implementations
This paper develops specialized, I/O-aware GPU kernels for common GNN layer types, achieving significant speedups and memory reductions compared to existing frameworks.
LiteGuard: Efficient Task-Agnostic Model Fingerprinting with Enhanced Generalization
LiteGuard proposes an efficient task-agnostic model fingerprinting framework that achieves enhanced generalization and significantly reduces computational overhead compared to existing methods like Me…
Parameter-Efficient Fine-Tuning of Large Pretrained Models for Instance Segmentation Tasks
This paper investigates the application of Parameter-Efficient Fine-Tuning (PEFT) methods, specifically adapters and LoRA, to large pretrained models for instance segmentation, demonstrating that thes…