20 results for “Target-SFT”
The paper introduces a kill-chain canary methodology to diagnose prompt injection vulnerabilities across multi-stage LLM pipelines, revealing that write-node placement and document format are critical…
Sina Mavali, David Pape, Jonathan Evertz, Samira Abedini +4 more
The paper introduces the Task Alignment Benchmark (TAB) to evaluate terminal agents' ability to selectively follow relevant environmental instructions while ignoring misleading distractors, revealing…
The paper introduces a sample-wise targeted adversarial attack that successfully misclassifies only specific, triggered inputs during test-time adaptation while maintaining the overall label distribut…
Melissa Pappy, Linh Nguyen, Suman Kumar, Byungkwan Jung +1 more
The paper introduces STRIKE, a multi-dimensional structured taxonomy designed to provide a comprehensive and unified framework for classifying the rapidly evolving complexity of modern cybercrimes.
The ACTING platform addresses the need for interoperable cyber-range training by providing a structured language (EDL-FG) for scenario description and automated evaluation mechanisms for complex, mult…
Samuel Ndichu, Tao Ban, Seiichi Ozawa, Takeshi Takahashi +1 more
PACT is a Pareto-aware active learning controller that significantly reduces the false-positive investigation burden in low-prevalence security alert streams without sacrificing recall.
The paper introduces Symbolicate-Enrich-Sample, a pipeline that efficiently filters millions of functions in a Windows OS to create a highly prioritized, manageable shortlist of potential vulnerabilit…
The paper introduces Symbolicate-Enrich-Sample, a low-cost pipeline that drastically reduces the search space of a whole operating system by prioritizing vulnerable functions, turning millions of pote…
This study re-evaluates LLM package hallucination rates on a new cohort of frontier models, finding a significant reduction in overall hallucination rates but identifying a persistent, model-agnostic…
The paper proposes 2FFS, a two-fidelity tree-search algorithm that efficiently identifies the best action in stochastic minimax trees by adaptively combining cheap, biased heuristic evaluations with e…
MimeLens is a novel, position-agnostic BERT-style encoder that accurately detects file types from arbitrary binary fragments, outperforming existing methods like Magika, especially on non-standard inp…
The paper introduces SAFT-GT, a comprehensive model-based toolchain designed to simultaneously analyze and enhance both the safety and security of complex, self-adaptive systems.
The paper introduces SST-Guard, a multi-modal browser-based system that detects and blocks server-side Google Analytics (sGA) by identifying the semantic patterns of collected data rather than relying…
The paper introduces Set-Distance Rewards (SDR), a permutation-invariant reward signal that effectively guides the generation of unordered radiology reports, significantly outperforming standard train…
This paper introduces Swiss-Bench 003, an expanded evaluation framework assessing LLM reliability and adversarial security across eight dimensions using 808 Swiss-specific items, revealing that self-g…
The paper proposes detecting 'alignment faking' (AF)—where LLMs revert to unsafe behavior when unmonitored—by analyzing observable tool selection patterns, finding that detection rates vary significan…
Xiaochong Jiang, Shiqi Yang, Ziwei Li, Lifei Liu +2 more
ChainCaps introduces a novel runtime capability budgeting system that prevents 'permission laundering' in complex tool-using agents, significantly reducing attack success rates while maintaining benig…
Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru +1 more
The paper introduces PoisonForge, a comprehensive benchmark demonstrating that even a small number of targeted poisoned examples can significantly compromise the safety and reliability of instruction-…
This systematic review analyzes the current state of SMS phishing (smishing) attacks and defenses, organizing existing research into four pillars to identify gaps and propose future mitigation strateg…