"Multimodal entity linking"

20 results for “Multimodal entity linking”

CS papers only

Hybrid search: Keyword + semantic, ranked by combined score.

cs.IR | Empirical | Recent | Jun 10, 2026

FAST-MEL: A Fast, Accurate, and Storage Efficient Solution for Multimodal Entity Linking

Derrien Thomas, Laurent Amsaleg, Pascale Sébillot

This paper proposes a lightweight encoder-based MEL solution called FAST-MEL that meets three objectives: high linking accuracy, computational efficiency, and storage efficiency.

cs.CL | cs.AI | Recent | May 28, 2026

EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL

Huawei Zheng, Sen Yang, Zhaorui Yang, Yuhui Zhang +11 more

EviLink addresses the ambiguity of schema linking in Text-to-SQL by treating it as an uncertainty-aware inference over multiple plausible SQL paths, significantly improving recall and efficiency.

cs.CL | cs.AI | Recent | May 30, 2026

MLLM-Microscope: Unlocking Hidden Structure Within Multimodal Large Language Models

Ravil Mussabayev, Rustam Mussabayev

The paper introduces MLLM-Microscope, a system that analyzes the internal structure of multimodal large language models (MLLMs), finding that modality fusion significantly impacts the linearity and di…

cs.CL | cs.AI | Recent | May 29, 2026

Beyond Agreement: Scoring Panel-Surfaced Biomedical Entity Candidates for Curator Triage

Shuheng Cao, Ruiqi Chen, Renjie Cao, Zhenhao Zhang +2 more

The paper introduces BioConCal, a supervised scoring mechanism that evaluates biomedical NER candidates surfaced by multiple LLMs, significantly improving the quality of the candidate pool for human c…

cs.CL | Recent | Jun 1, 2026

When Meaning Travels: A Granular Lens on Hybrid-MoE's Role in Idiomatic Understanding for Language Models

Sarmistha Das, Vaibhav Vishal, Shreyas Guha, Amaan Ali +2 more

This paper introduces a Hybrid Mixture-of-Experts (HybridMoE) framework and a specialized corpus (Varnika) to significantly improve language models' ability to understand and retain figurative, cultur…

cs.AI | cs.DB | cs.IR | Recent | May 29, 2026

Vector Linking via Cross-Model Local Isometric Consistency

Ziying Chen, Yang Cao, He Sun, Beining Yang +1 more

The paper proposes a novel geometric embedding hashing method to recover object correspondences (vector links) between two embedding clouds generated by different black-box encoders using only a small…

cs.IR | cs.AI | cs.CL | Recent | May 29, 2026

Reading Between the Citations: A Typed Claim Network for Scientific Literature

Ning Ding, Sergio J. Rodríguez Méndez, Pouya G. Omran

The paper introduces a typed claim network that models cross-document references by explicitly labeling the stance (e.g., agreement, disagreement) of a citation, significantly improving downstream tas…

cs.CL | cs.AI | Recent | May 28, 2026

Towards Localized and Disentangled Knowledge Editing for Multimodal Large Language Models

Leijiang Gu, Zhen Zeng, Feng Li, Xinjian Gao +1 more

The paper proposes Localized and Disentangled Knowledge Editing (LDKE), a framework that significantly improves knowledge editing in Multimodal Large Language Models by ensuring edits are both precise…

cs.CL | cs.LG | Recent | May 29, 2026

Scaling Multi-Hop Training Data via Graph-Constrained Path Selection

Pengyu Chen, Yonggang Zhang, Mingming Chen, Jun Song +2 more

The paper proposes a graph-constrained approach to scale multi-hop training data by decoupling path discovery from path verbalization, significantly expanding the usable corpus size for LLMs.

cs.DC | cs.AI | Recent | Jun 1, 2026

Boosting Multimodal Federated Learning via Chained Modality Optimization

Zixin Zhang, Fan Qi, Shuai Li, Xiaoshan Yang +1 more

The paper proposes FedMChain, a novel federated learning framework that structures multimodal training into sequential phases to mitigate modality competition and improve model performance while reduc…

cs.CL | cs.IR | Empirical | Recent | Jun 10, 2026

uva-irlab-conv at SemEval-2026 Task 8: Multi-Turn RAG with Learned Sparse Retrieval and Listwise Reranking

Simon Lupart, Kidist Amde Mekonnen, Zahra Abbasiantaeb, Mohammad Aliannejadi

This paper proposes a multi-turn retrieval-augmented generation pipeline for conversational systems across four domains.

cs.CV | cs.AI | cs.CL | Recent | Jun 1, 2026

Cross-modal linkage risk in clinical vision-language models

Soroosh Tayebi Arasteh, Mahshad Lotfinia, Sven Nebelung, Daniel Truhn

The paper demonstrates that clinical vision-language models (VLMs) pose a significant privacy risk by allowing de-identified images to be re-linked to original reports, and proposes a targeted differe…

cs.IR | cs.AI | Recent | May 29, 2026

LLMs Need Encoders for Semantic IDs Too

Xiangyi Chen, Zelun Wang, Xinyi Li, Yi-Ping Hsu +2 more

The paper proposes PrefixMem, a dedicated encoder for Semantic IDs (SIDs), demonstrating that structured, prefix-conditioned representations significantly improve the accuracy and recall of generative…

cs.CL | cs.AI | Recent | May 28, 2026

Do Language Models Track Entities Across State Changes?

Zilu Tang, Qiao Zhao, Gabriel Franco, Derry Wijaya +3 more

The paper investigates how language models perform entity tracking across state changes and finds that LMs use a non-incremental, parallel aggregation strategy rather than maintaining a true internal…

cs.IR | cs.AI | Recent | May 29, 2026

MIMO: Multilingual Information Retrieval via Monolingual Objectives

Youngjoon Jang, Seongtae Hong, Heuiseok Lim

The paper proposes MIMO, a two-stage framework that improves Multilingual Information Retrieval (MLIR) by stabilizing cross-lingual alignment and enhancing retrieval discrimination using a combination…

cs.CL | Recent | May 29, 2026

Semantic Triplet Restoration: A Novel Protocol for Hierarchical Table Understanding in Large Language Models

Yibin Zhao, Fangxin Shang, Dingrui Yang, Yuqi Wang

The paper introduces Semantic Triplet Restoration (STR), a novel protocol that converts complex table structures into atomic semantic triplets, improving table question answering by providing explicit…

cs.AI | cs.IR | Recent | May 28, 2026

HiKEY: Hierarchical Multimodal Retrieval for Open-Domain Document Question Answering

Joongmin Shin, Gyuho Shim, Jeongbae Park, Jaehyung Seo +1 more

HiKEY proposes a hierarchical, tree-based multimodal retrieval framework that significantly improves open-domain document question answering by addressing document routing and evidence fragmentation.

cs.CV | cs.AI | cs.CL | Recent | Jun 1, 2026

Multimodal Approaches for Visually-Rich Document Type Classification: A Comparative Analysis

Catyana Heyne, Jürgen Frikel, Filippo Riccio

The paper systematically compares multimodal transformer and LLM approaches for document type classification, finding that specialized multimodal Transformers outperform LLM-based models, especially w…

cs.IR | cs.AI | cs.MA | Recent | Jun 1, 2026

TechGraphRAG: An Agentic Graph-Augmented RAG Framework for Technical Literature Reasoning

Kanwar Bharat Singh

The paper introduces TechGraphRAG, an advanced, agentic RAG framework that enhances technical literature reasoning by integrating multi-step query refinement, external database searching, and knowledg…

cs.CL | cs.AI | cs.CV | Recent | Jun 4, 2026

Benchmarking Open-Source Layout Detection Models for Data Snapshot Extraction from Institutional Documents

AJ Carl P. Dy, Aivin V. Solatorio

This paper introduces a new benchmark dataset and evaluation framework for 'data snapshot extraction,' focusing on identifying and localizing semantically meaningful analytical artifacts within operat…
