Aleksandr Beznosikov
2 indexed papers
Research Timeline
HARP introduces a novel, adaptive, learnable orthogonal processor that significantly improves the robustness and accuracy of extreme low-bit LLM quantization compared with fixed, non-learnable methods.
The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models and proposes lightweight controls, such as FP16 planning and loop rescue, that restore accuracy and achieve a practical end-to-end speedup.
Papers
Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery