20 results for “DeepSeek-V3”
CS papers onlyHybrid search: Keyword + semantic, ranked by combined score.ⓘ
Want pure semantic search? Try claim verification →
The paper systematically analyzes the benefits and limits of Attention-FFN Disaggregation (AFD) for Mixture-of-Experts (MoE) LLM serving, demonstrating that AFD is crucial for achieving high throughpu…
This paper investigates the redundancy of the prompt KV cache during language model decoding, finding that the structure provided by chat templates is the primary source of redundancy, not the actual…
肖代替了视觉令牌的永久删除,通过可恢复的路由来改进视觉语言模型的性能
Geng Li, Guohao Chen, Ting Chen, Shilin Shan +5 more
OccamToken introduces a training-free, adaptive token pruning framework that replaces fixed token budgets with relative evidence testing against a register-based reference, significantly improving VLM…
Bohan Yang, Yijun Gong, Zhi Zhang, Ge Zhang +2 more
TriLens is a white-box detector that monitors the entropy of three internal streams (attention, feed-forward, residual) at every layer of a language model to detect hallucinations by tracking how inte…
This paper addresses the vulnerability of DNNs used in robotic semantic segmentation to adversarial attacks by proposing specialized detection strategies to enhance safety in robotic perception system…
DeepIPCv3 is a novel multi-modal framework that fuses LiDAR and DVS event streams using cross-modal attention to achieve state-of-the-art, highly reactive avoidance maneuvers for sudden pedestrian cro…
Liang He, Jingbo Wen, Qishi Zhan, Yixiong Chen +3 more
BudgetDraft introduces an acceptance-aware multi-view training method that trains a sparse-KV speculative decoder to maintain high acceptance rates across varying context lengths and sparsity levels,…
Vincent-Daniel Yun, Youngrae Kim, Woosang Lim, YoungJin Heo +2 more
The paper proposes Locality-Aware Redundancy Pruning (LoRP), a training-free method that prunes LLM layers by exploiting localized inter-layer redundancy, leading to improved efficiency while maintain…
TAPS introduces a target-aware prefix selection method that optimizes the trade-off between draft tree acceptance and verification cost, achieving significant speedups in speculative decoding.
This paper presents an open-source computer vision pipeline for classifying vehicle body types from naturalistic roadway video.
This study re-evaluates LLM package hallucination rates on a new cohort of frontier models, finding a significant reduction in overall hallucination rates but identifying a persistent, model-agnostic…
This paper proposes a 3D CNN detector that leverages temporal artifacts to accurately identify high-quality deepfake videos, demonstrating robust detection even after social media re-encoding.
Qian Chen, Xianyin Zhang, Yanzhi Liu, Lifan Guo +2 more
This paper introduces CFMME, a comprehensive Chinese financial multimodal benchmark, and evaluates current Large Vision-Language Models (LVLMs), finding that while state-of-the-art models perform mode…
PrunePath introduces a budget-adaptive structured sparsification framework that efficiently prunes Feed-forward networks in large language models, achieving hardware-friendly sparsity and measurable s…
Haobo Zhang, Zhenhua Xu, Junxian Li, Shangfeng Sheng +2 more
AttnDiff introduces a data-efficient white-box framework that extracts intrinsic attention-based fingerprints to verify the provenance and detect unauthorized derivation of large language models (LLMs…
The paper introduces a structured benchmark (TGAD) showing that current text-guided anomaly detection models often overstate their language conditioning, as performance significantly degrades when the…
Clark Hash is a stateless, deterministic quantization method that significantly reduces the storage size of neural embeddings while maintaining high accuracy for cosine similarity search.
Physical AI inference (batch-1 decode) is primarily memory-bandwidth-bound, but the observed latency gap between fast and slow GPUs is not solely due to memory bandwidth, as launch-side overheads beco…
Yongjie Wang, Xinyue Zhang, Kunhong Yao, Zhiwei Zeng +3 more
The paper introduces the concept of Search-Time Contamination (STC), demonstrating that deep research agents can leak information from public benchmarks via web search, leading to an overestimation of…