ArXivCSExplorer
Built by Teycir Ben Soltane

20 results for “text embedding”

CS papers only

Hybrid search: keyword + semantic, ranked by combined score.

cs.CL · Recent · May 31, 2026

When Is 0.1% Enough? Analyzing the Combined Effects of Dimensionality Reduction and Quantization on Text Embedding Compression

Riku Kisako, Hayato Tsukagoshi, Ryohei Sasano

This paper systematically analyzes combining dimensionality reduction and quantization to compress text embeddings, showing that this combined approach achieves substantial compression (e.g., 0.1% siz…

cs.CL, cs.AI, cs.LG · Empirical · Recent · Jun 11, 2026

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

Marek Šuppa, Andrej Ridzik, Daniel Hládek, Natália Kňažeková +1 more

This paper introduces SkMTEB, a comprehensive text embedding benchmark for Slovak, and develops efficient, locally-deployable Slovak embeddings.

cs.CL, cs.AI · Recent · May 27, 2026

PromptEmbedder: Efficient and Transferable Text Embedding via Dual-LLM Soft Prompting

Yu-Che Tsai, Kuan-Yu Chen, Yuan-Hao Chen, Yu-Han Chang +3 more

PromptEmbedder introduces a dual-LLM framework that efficiently and transferably adapts text embeddings by decoupling task-specific knowledge from the backbone model, significantly reducing computatio…

cs.CR, cs.AI · Recent · May 9, 2026

PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks

Zhenxin Ai, Haiyun He

PASA introduces a robust, semantic-level watermarking technique that embeds and detects watermarks in the latent embedding space, successfully resisting semantic-invariant attacks like paraphrasing.

cs.CV · Recent · Jun 1, 2026

Equilibrated Diffusion: Frequency-aware Textual Embedding for Equilibrated Image Customization

Liyuan Ma, Xueji Fang, Guo-Jun Qi

Equilibrated Diffusion introduces a frequency-aware approach to image customization, disentangling style and subject content embeddings to achieve superior subject fidelity and text adherence.

cs.CR, cs.LG, cs.MA · Recent · May 1, 2026

When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems

Lingxi Zhang, Guangtao Zheng, Hanjie Chen

This paper analyzes the failure of current embedding-based defenses in multi-agent LLM systems and proposes using token-level confidence scores (logits) for improved robustness.

cs.CR, cs.CL · Recent · Mar 24, 2026

Beyond Theoretical Bounds: Empirical Privacy Loss Calibration for Text Rewriting Under Local Differential Privacy

Weijun Li, Arnaud Grivet Sébert, Qiongkai Xu, Annabelle McIver +1 more

The paper proposes an empirical calibration method, TeDA, to provide a more comparable and interpretable assessment of privacy loss for text rewriting mechanisms under Local Differential Privacy (LDP)…

cs.CL, cs.AI, cs.CR · Recent · Apr 6, 2026

XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts

Jiahao Xu, Rui Hu, Olivera Kotevska, Zikai Zhang

XMark introduces a novel multi-bit watermarking technique that reliably embeds binary messages into LLM-generated text while maintaining high text quality and robust performance even with limited toke…

cs.CL, cs.AI · Recent · May 29, 2026

On the Robustness of Multilingual Text Embedding Rankings Across Learning Tasks, Languages, and Benchmark Datasets

Ana Gjorgjevikj, Barbara Koroušić Seljak, Tome Eftimov

This paper introduces robustness indicators to systematically analyze how multilingual text embedding model rankings change based on dataset composition and aggregation methods, revealing that only a…

cs.AI · Recent · May 27, 2026

Clark Hash: Stateless Sparse Johnson-Lindenstrauss Quantization for Neural Embeddings

Stanislav Kirdey, Clark Labs Inc

Clark Hash is a stateless, deterministic quantization method that significantly reduces the storage size of neural embeddings while maintaining high accuracy for cosine similarity search.

cs.CL, cs.AI, cs.CR · Recent · May 5, 2026

SWAN: Semantic Watermarking with Abstract Meaning Representation

Ziping Ye, Gourab Dey, Christos Christodoulopoulos, Charith Peris +6 more

SWAN introduces a novel, training-free framework that embeds watermarks directly into the semantic structure of a sentence using Abstract Meaning Representation (AMR), achieving superior robustness ag…

cs.CR, cs.AI, cs.CL · Recent · May 28, 2026

AliMark: Enhancing Robustness of Sentence-Level Watermarking Against Text Paraphrasing

Yuexin Li, Wenjie Qu, Linyu Wu, Yulin Chen +4 more

AliMark proposes a novel framework that enhances the robustness of sentence-level watermarking by reformulating the problem as a bit sequence encoding and alignment task, significantly improving resil…

cs.IR · Empirical · Recent · Jun 10, 2026

FAST-MEL: A Fast, Accurate, and Storage Efficient Solution for Multimodal Entity Linking

Derrien Thomas, Laurent Amsaleg, Pascale Sébillot

This paper proposes a lightweight encoder-based MEL solution called FAST-MEL that meets three objectives: high linking accuracy, computational efficiency, and storage efficiency.

cs.CR, cs.CL · Recent · Apr 28, 2026

MGTEVAL: An Interactive Platform for Systematic Evaluation of Machine-Generated Text Detectors

Yuanfan Li, Qi Zhou, Chengzhengxu Li, Zhaohan Zhang +4 more

The paper introduces MGTEVAL, a comprehensive and extensible platform designed to systematically evaluate the performance, robustness, and efficiency of machine-generated text detectors.

cs.AI, cs.IR · Recent · May 28, 2026

Xetrieval: Mechanistically Explaining Dense Retrieval

Zhixin Cai, Jun Bai, Yang Liu, Jiaqi Li +6 more

Xetrieval introduces an embedding-level framework to mechanistically explain dense retrieval decisions by decomposing high-dimensional embeddings into sparse, human-interpretable features.

cs.CR, cs.AI, cs.CL · Recent · May 25, 2026

SAMark: A Self-Anchored Text Watermarking with Paragraph-Level Paraphrase Robustness

Jiahao Huo, Wenjie Qu, Yibo Yan, Kening Zheng +4 more

SAMark introduces a self-anchored text watermarking framework that achieves high robustness (up to 90.2% TP@FP1%) against challenging paragraph-level paraphrasing attacks by establishing a step-indepe…

cs.CL · Recent · May 28, 2026

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more

The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…

cs.CR, cs.CL, cs.LG · Recent · Jun 3, 2026

Global Sketch-Based Watermarking for Diffusion Language Models

Daniel Zhao

The paper proposes a novel global sketch-based watermarking technique for diffusion language models that controls the entire sequence's statistics, offering an order-agnostic and context-independent a…
