20 results for “text embedding”
CS papers onlyHybrid search: Keyword + semantic, ranked by combined score.ⓘ
Want pure semantic search? Try claim verification →
This paper systematically analyzes combining dimensionality reduction and quantization to compress text embeddings, showing that this combined approach achieves substantial compression (e.g., 0.1% siz…
Marek Šuppa, Andrej Ridzik, Daniel Hládek, Natália Kňažeková +1 more
This paper introduces SkMTEB, a comprehensive text embedding benchmark for Slovak, and develops efficient, locally-deployable Slovak embeddings.
Yu-Che Tsai, Kuan-Yu Chen, Yuan-Hao Chen, Yu-Han Chang +3 more
PromptEmbedder introduces a dual-LLM framework that efficiently and transferably adapts text embeddings by decoupling task-specific knowledge from the backbone model, significantly reducing computatio…
PASA introduces a robust, semantic-level watermarking technique that embeds and detects watermarks in the latent embedding space, successfully resisting semantic-invariant attacks like paraphrasing.
Equilibrated Diffusion introduces a frequency-aware approach to image customization, disentangling style and subject content embeddings to achieve superior subject fidelity and text adherence.
This paper analyzes the failure of current embedding-based defenses in multi-agent LLM systems and proposes using token-level confidence scores (logits) for improved robustness.
Weijun Li, Arnaud Grivet Sébert, Qiongkai Xu, Annabelle McIver +1 more
The paper proposes an empirical calibration method, TeDA, to provide a more comparable and interpretable assessment of privacy loss for text rewriting mechanisms under Local Differential Privacy (LDP)…
XMark introduces a novel multi-bit watermarking technique that reliably embeds binary messages into LLM-generated text while maintaining high text quality and robust performance even with limited toke…
This paper introduces robustness indicators to systematically analyze how multilingual text embedding model rankings change based on dataset composition and aggregation methods, revealing that only a…
Clark Hash is a stateless, deterministic quantization method that significantly reduces the storage size of neural embeddings while maintaining high accuracy for cosine similarity search.
SWAN introduces a novel, training-free framework that embeds watermarks directly into the semantic structure of a sentence using Abstract Meaning Representation (AMR), achieving superior robustness ag…
Yuexin Li, Wenjie Qu, Linyu Wu, Yulin Chen +4 more
AliMark proposes a novel framework that enhances the robustness of sentence-level watermarking by reformulating the problem as a bit sequence encoding and alignment task, significantly improving resil…
Yuexin Li, Wenjie Qu, Linyu Wu, Yulin Chen +4 more
AliMark proposes a novel watermarking framework that treats sentence-level watermarking as a bit sequence alignment problem, significantly enhancing robustness against structural text perturbations li…
This paper proposes a lightweight encoder-based MEL solution called FAST-MEL that meets three objectives: high linking accuracy, computational efficiency, and storage efficiency.
Yuanfan Li, Qi Zhou, Chengzhengxu Li, Zhaohan Zhang +4 more
The paper introduces MGTEVAL, a comprehensive and extensible platform designed to systematically evaluate the performance, robustness, and efficiency of machine-generated text detectors.
Zhixin Cai, Jun Bai, Yang Liu, Jiaqi Li +6 more
Xetrieval introduces an embedding-level framework to mechanistically explain dense retrieval decisions by decomposing high-dimensional embeddings into sparse, human-interpretable features.
Jiahao Huo, Wenjie Qu, Yibo Yan, Kening Zheng +4 more
SAMark introduces a self-anchored text watermarking framework that achieves high robustness (up to 90.2% TP@FP1%) against challenging paragraph-level paraphrasing attacks by establishing a step-indepe…
Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more
The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…
The paper proposes a novel global sketch-based watermarking technique for diffusion language models that controls the entire sequence's statistics, offering an order-agnostic and context-independent a…