Hong Xu
4 indexed papers
Research Timeline
The paper introduces Dummy-Aware Weighted Attack (DAWA), a novel evaluation method that significantly reduces the reported robustness of Dummy Classes-based defenses by simultaneously targeting both the true and dummy class labels.
The paper introduces Contextual Belief Management (CBM) to address how LLMs should manage accumulating information over long interactions, showing that reinforcement learning significantly improves belief state accuracy.
UniScale proposes a unified framework that jointly optimizes model routing and test-time scaling to achieve a superior, fine-grained quality-cost trade-off for large language model inference.
The paper tracks the developmental emergence of attention circuits in 1B-class language models, finding that induction and attention-sink circuits form in distinct, temporally separated transitions.
Papers
When Do Attention Circuits Form? Developmental Trajectories of Capability and Attention-Sink Emergence Across Three 1B-Class Architectures