Yu Yu
11 indexed papers
Research Timeline
The paper introduces Behavioral Canaries, a novel auditing mechanism that detects unauthorized use of private retrieved context data during Reinforcement Learning Fine-Tuning (RLFT) by inducing detectable stylistic behavioral changes.
This paper demonstrates a novel attack against the shuffling defense used in secure Transformer inference, showing that randomly permuted activations can still be exploited to recover model weights.
The paper introduces a unified framework to fairly evaluate LLM agentic capabilities by standardizing diverse benchmarks and separating the effects of the LLM from the surrounding framework and environment.
The paper introduces CORE, a contrastive evidence organization method that significantly improves the accuracy of LLM-based predictions of gene expression changes following cellular perturbations by reframing the task as a comparison between related conditions.
The paper proposes GloResNet, a lightweight 3D CNN that effectively predicts brain injury in preterm infants using T2-weighted MRI, achieving an average accuracy of 75.18%.
The paper introduces X-Stream, a new benchmark for multi-stream video understanding, and finds that current state-of-the-art MLLMs perform poorly when required to process multiple concurrent video streams.
The paper introduces MeRa, a metric-space bias module, demonstrating that latent reasoning only improves spatial prediction when it is explicitly grounded in the underlying metric space.
MARS proposes an encoder-agnostic aggregation operator that explicitly models multi-scale temporal structure in sequential recommendation, achieving state-of-the-art performance across both sparse and dense data regimes.
Ghost introduces a manifold-aligned framework to generate plausible yet unlearnable synthetic check-in trajectories, significantly degrading the accuracy of next-POI prediction models without sacrificing realism.
Ghost introduces a manifold-aligned framework to generate plausible, unlearnable synthetic check-in trajectories that significantly degrade an attacker's ability to predict future locations.
This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.
Papers
Caliper: Probing Lexical Anchors versus Causal Structure in LLMs
This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.