20 results for “large language models”
CS papers onlyHybrid search: Keyword + semantic, ranked by combined score.ⓘ
Want pure semantic search? Try claim verification →
This paper analyzes the multilinguality of LLMs by examining their structural properties, finding that low-resource languages are structurally more distinct from English than high-resource languages,…
The paper proposes an aggressive, parameter-efficient method to prune non-essential experts from Mixture-of-Experts (MoE) LLMs, significantly compressing the model while maintaining high machine trans…
Marek Šuppa, Andrej Ridzik, Daniel Hládek, Natália Kňažeková +1 more
This paper introduces SkMTEB, a comprehensive text embedding benchmark for Slovak, and develops efficient, locally-deployable Slovak embeddings.
Boqian Wu, Qiao Xiao, Patrik Okanovic, Tomasz Sternal +5 more
This paper introduces a new scaling law for sparse language models trained with limited data, demonstrating that sparsity can significantly improve performance and delay data saturation during multi-e…
The paper proposes EPIC, an efficient and parallel decoding framework that significantly speeds up the process of constraining diffusion language model outputs using Context-Free Grammars (CFG).
The paper introduces MIDI, a novel multilingual dataset that embeds idioms in realistic sentence and conversational contexts across diverse resource levels, revealing that idiom comprehension is signi…
The paper introduces PortBERT, a family of RoBERTa-based language models for Portuguese, which achieves competitive performance while explicitly balancing efficiency and accuracy.
ProbScale is a novel framework that combines neural scaling laws and language model probing to identify highly efficient, task-specific subnetworks within pre-trained Small Language Models, achieving…
This paper demonstrates that fine-tuning small language models (SLMs) on a synthetic, solution-rich Windows event log dataset allows them to outperform larger LLMs in identifying issues and providing…
The paper introduces TSVD, a novel framework that efficiently pre-trains LLMs by enforcing both low rank and strict weight orthonormality, achieving performance comparable to full-parameter models wit…
Junhyuck Kim, Jihun Yun, Haechan Kim, Gyeongman Kim +2 more
The paper introduces a systematic framework to convert large Mixture-of-Experts (MoE) models into memory-efficient, fully dense architectures, achieving superior performance compared to traditional pr…
This study systematically analyzes strategies for creating reliable multilingual LLMs-as-a-judge, finding that fine-tuning smaller models with in-domain data is effective, while zero-shot evaluation w…
This study systematically evaluates a wide range of chunking methods for Retrieval-Augmented Generation (RAG) to assess their effectiveness and highlight the overlooked challenges associated with chun…
Pengyu Chen, Yonggang Zhang, Mingming Chen, Jun Song +2 more
The paper proposes a graph-constrained approach to scale multi-hop training data by decoupling path discovery from path verbalization, significantly expanding the usable corpus size for LLMs.
Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang +7 more
This paper proposes four guidelines and two novel data ordering methods (STR and SAW) to systematically optimize data organization, significantly enhancing the stability and performance of LLM trainin…
Jiarui Feng, Hanqing Zeng, Karish Grover, Ruizhong Qiu +10 more
The paper proposes DAG-MoE, a novel sparse Mixture-of-Experts framework that replaces standard weighted-sum aggregation with structural aggregation to enhance model performance and enable multi-step r…
The paper proposes using Maximum Independent Set (MIS) algorithms on similarity graphs to select a maximally diverse and non-redundant subset of prompts for LLM benchmarking, achieving consistent rank…
The paper introduces MLLM-Microscope, a system that analyzes the internal structure of multimodal large language models (MLLMs), finding that modality fusion significantly impacts the linearity and di…
Yaxin Luo, Jiacheng Cui, Xiaohan Zhao, Xinyi Shang +4 more
The paper introduces LLMSurgeon, a framework that estimates the domain-level data mixture of a Large Language Model (LLM) using only generated text, thereby providing a post-hoc method to audit the mo…
PrunePath introduces a budget-adaptive structured sparsification framework that efficiently prunes Feed-forward networks in large language models, achieving hardware-friendly sparsity and measurable s…