~ similar to 2605.29843· 18 results
Senmiao Wang, Tiantian Fang, Haoran Zhang, Yushun Zhang +3 more
This paper proposes a preconditioning layer for stable weight conditioning in LLM training.
Senmiao Wang, Tiantian Fang, Haoran Zhang, Yushun Zhang +3 more
This paper proposes a preconditioning layer for stable weight conditioning in LLM training.
SPARQLe is a hardware-software co-design framework that exploits the inherent sub-precision sparsity of LLM activations to reduce memory traffic and enable efficient computation on lower-bit datapaths…
The paper introduces TSVD, a novel framework that efficiently pre-trains LLMs by enforcing both low rank and strict weight orthonormality, achieving performance comparable to full-parameter models wit…
Vu Minh Chau, Nguyen Ngoc Kiet, Pham Quang Minh, Mai Xuan Ngoc +2 more
This paper optimizes the decoding of Hamming Quasi-Cyclic (HQC) codes for post-quantum cryptography on NPU-integrated mobile devices by redesigning the core kernels to leverage the Hexagon Vector eXte…
Vu Minh Chau, Nguyen Ngoc Kiet, Pham Quang Minh, Mai Xuan Ngoc +2 more
This paper optimizes the decoding of Hamming Quasi-Cyclic (HQC) codes for post-quantum cryptography on NPU-integrated mobile devices by redesigning the kernels to leverage the Hexagon Vector eXtension…
The paper introduces Rotated Robustness (RoR), a training-free defense that uses orthogonal transformations to prevent catastrophic model collapse in LLMs caused by hardware bit-flip attacks.
The paper introduces Logit-aware Final-block Quantization (LFQ), an enhancement to block-wise quantization that quantizes the final Transformer block using a cross-entropy loss to significantly boost…
This paper introduces a novel full-space quantization-driven architecture (FQA) to create highly efficient and accurate hardware approximations of nonlinear activation functions using piecewise polyno…
The paper proposes SubFit, a novel compression technique that achieves superior LLM compression by replacing non-contiguous, submodule-level components (Attention and FeedForward) with lightweight res…
Weifang Zhang, Yuzhou Nie, Bowen Pang, Guangrui Ma +1 more
This paper proposes a hybrid scheduler that dynamically switches between exclusive batching and mixed batching for LLM inference, achieving superior throughput, especially on bandwidth-constrained GPU…
Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more
The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…
This study empirically benchmarks classical and quantum machine learning models for image recognition, finding that while quantum models offer superior accuracy and resource efficiency at high dimensi…
Yifei Zuo, Dhruv Pai, Zhichen Zeng, Alec Dewulf +2 more
The paper introduces Parallax, a scalable and numerically stable parameterized Local Linear Attention mechanism that significantly improves LLM performance and efficiency compared to existing methods…
The paper introduces Curvature-Conditioned Query (CCQ), a novel read-time contraction mechanism that improves linear attention's performance on long-context and retrieval tasks by incorporating the ge…
EigeNet introduces a geometry-informed multi-modal Transformer framework to achieve state-of-the-art few-shot novel view Room Impulse Response (RIR) prediction by effectively integrating spatial geome…
Clark Hash is a stateless, deterministic quantization method that significantly reduces the storage size of neural embeddings while maintaining high accuracy for cosine similarity search.
Xinjue Wang, Xiuheng Wang, Yejun Zhang, Sergiy A. Vorobyov +2 more
The paper investigates whether using fine-grained, tensorized adapters (CP components) instead of standard LoRA ranks improves the accuracy-budget trade-off in PEFT, finding that while they fill budge…