Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling