20 results for “Multimodal entity linking”
CS papers onlyHybrid search: Keyword + semantic, ranked by combined score.ⓘ
Want pure semantic search? Try claim verification →
This paper proposes a lightweight encoder-based MEL solution called FAST-MEL that meets three objectives: high linking accuracy, computational efficiency, and storage efficiency.
Huawei Zheng, Sen Yang, Zhaorui Yang, Yuhui Zhang +11 more
EviLink addresses the ambiguity of schema linking in Text-to-SQL by treating it as an uncertainty-aware inference over multiple plausible SQL paths, significantly improving recall and efficiency.
The paper introduces MLLM-Microscope, a system that analyzes the internal structure of multimodal large language models (MLLMs), finding that modality fusion significantly impacts the linearity and di…
Shuheng Cao, Ruiqi Chen, Renjie Cao, Zhenhao Zhang +2 more
The paper introduces BioConCal, a supervised scoring mechanism that evaluates biomedical NER candidates surfaced by multiple LLMs, significantly improving the quality of the candidate pool for human c…
Sarmistha Das, Vaibhav Vishal, Shreyas Guha, Amaan Ali +2 more
This paper introduces a Hybrid Mixture-of-Experts (HybridMoE) framework and a specialized corpus (Varnika) to significantly improve language models' ability to understand and retain figurative, cultur…
Ziying Chen, Yang Cao, He Sun, Beining Yang +1 more
The paper proposes a novel geometric embedding hashing method to recover object correspondences (vector links) between two embedding clouds generated by different black-box encoders using only a small…
The paper introduces a typed claim network that models cross-document references by explicitly labeling the stance (e.g., agreement, disagreement) of a citation, significantly improving downstream tas…
Leijiang Gu, Zhen Zeng, Feng Li, Xinjian Gao +1 more
The paper proposes Localized and Disentangled Knowledge Editing (LDKE), a framework that significantly improves knowledge editing in Multimodal Large Language Models by ensuring edits are both precise…
Pengyu Chen, Yonggang Zhang, Mingming Chen, Jun Song +2 more
The paper proposes a graph-constrained approach to scale multi-hop training data by decoupling path discovery from path verbalization, significantly expanding the usable corpus size for LLMs.
Zixin Zhang, Fan Qi, Shuai Li, Xiaoshan Yang +1 more
The paper proposes FedMChain, a novel federated learning framework that structures multimodal training into sequential phases to mitigate modality competition and improve model performance while reduc…
This paper proposes a multi-turn retrieval-augmented generation pipeline for conversational systems across four domains.
The paper demonstrates that clinical vision-language models (VLMs) pose a significant privacy risk by allowing de-identified images to be re-linked to original reports, and proposes a targeted differe…
Xiangyi Chen, Zelun Wang, Xinyi Li, Yi-Ping Hsu +2 more
The paper proposes PrefixMem, a dedicated encoder for Semantic IDs (SIDs), demonstrating that structured, prefix-conditioned representations significantly improve the accuracy and recall of generative…
Zilu Tang, Qiao Zhao, Gabriel Franco, Derry Wijaya +3 more
The paper investigates how language models perform entity tracking across state changes and finds that LMs use a non-incremental, parallel aggregation strategy rather than maintaining a true internal…
The paper proposes MIMO, a two-stage framework that improves Multilingual Information Retrieval (MLIR) by stabilizing cross-lingual alignment and enhancing retrieval discrimination using a combination…
The paper introduces Semantic Triplet Restoration (STR), a novel protocol that converts complex table structures into atomic semantic triplets, improving table question answering by providing explicit…
Joongmin Shin, Gyuho Shim, Jeongbae Park, Jaehyung Seo +1 more
HiKEY proposes a hierarchical, tree-based multimodal retrieval framework that significantly improves open-domain document question answering by addressing document routing and evidence fragmentation.
The paper systematically compares multimodal transformer and LLM approaches for document type classification, finding that specialized multimodal Transformers outperform LLM-based models, especially w…
The paper introduces TechGraphRAG, an advanced, agentic RAG framework that enhances technical literature reasoning by integrating multi-step query refinement, external database searching, and knowledg…
This paper introduces a new benchmark dataset and evaluation framework for 'data snapshot extraction,' focusing on identifying and localizing semantically meaningful analytical artifacts within operat…