20 results for “quantization”
Hanna Foerster, Ilia Shumailov, Cheng Zhang, Yiren Zhao +2 more
This paper identifies a critical privacy vulnerability, termed Quantamination, where dynamic quantization in popular ML frameworks can leak sensitive user data across batch boundaries.
This paper systematically analyzes combining dimensionality reduction and quantization to compress text embeddings, showing that this combined approach achieves substantial compression (e.g., 0.1% siz…
HARP introduces a novel, adaptive, learnable orthogonal processor that significantly improves the robustness and accuracy of extreme low-bit LLM quantization compared to fixed methods.
The paper argues that large activation spikes in LLMs are structural vector biases, and proposes a novel quantization framework (INSERTQUANT) to eliminate these spikes, enabling robust low-bit quantiz…
This paper introduces a novel full-space quantization-driven architecture (FQA) to create highly efficient and accurate hardware approximations of nonlinear activation functions using piecewise polyno…
The paper introduces Logit-aware Final-block Quantization (LFQ), an enhancement to block-wise quantization that quantizes the final Transformer block using a cross-entropy loss to significantly boost…
Jinyang Du, Shenghao Jin, Ziqian Xu, Ruihao Gong +4 more
The paper proposes a compression pipeline combining few-step distillation and low-bit quantization to significantly reduce the deployment cost and parameter footprint of large dual-expert video diffus…
The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…
The paper proposes a Privacy-Preserving Product-Quantization Approximate Nearest Neighbor (PPPQ-ANN) framework that achieves practical performance and strong privacy guarantees for large-scale nearest…
This study empirically benchmarks classical and quantum machine learning models for image recognition, finding that while quantum models offer superior accuracy and resource efficiency at high dimensi…
The paper introduces QADR, a novel hybrid quantum-classical framework that efficiently trains variational quantum circuits by localizing entanglement reduction, thereby overcoming the exponential memo…
The paper introduces Automatically Differentiable Nonlinear Tensor Networks (ADNTNs) to achieve massive, structured compression of deep neural networks, demonstrating compression ratios up to 77,000x…
Clark Hash is a stateless, deterministic quantization method that significantly reduces the storage size of neural embeddings while maintaining high accuracy for cosine similarity search.
The paper investigates which quantum encodings can be applied directly to classical data point clouds while preserving the topological invariants necessary for topological data analysis (TDA).
The paper introduces 'contrastive privacy,' a formal, model-agnostic, and quantitative method for evaluating the semantic success of AI-based sanitization across multiple media modalities.
Quantum Gatekeeper is a robust, multi-factor context-bound image steganography framework that embeds payloads using LSB and derives a gate key from a Variational Quantum Circuit (VQC), ensuring recove…
The paper introduces Quantum Tunneling-Aware Machine Learning (QTAML) and a compensation algorithm (TAC) that accurately models and compensates for quantum tunneling errors in AI inference, achieving…
The paper reformulates nonreversible perturbations of Fokker–Planck dynamics as gauge fields, providing a unified operator viewpoint to analyze relaxation processes and develop methods for learning o…
The paper introduces GEM, an effective concept erasure framework for Rectified Flow Transformers, by unifying trajectory-based unlearning with classic teacher-guided flow matching.