Scarlett Li
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year: 1 (2026)
Top categories
AI ×1, NLP ×1
Research Timeline
2026
Demystifying Data Organization for Enhanced LLM Training
This paper proposes four guidelines and two novel data ordering methods (STR and SAW) to systematically optimize data organization, significantly enhancing the stability and performance of LLM training.
Papers
cs.AI, cs.CL | Recent | May 28, 2026
Demystifying Data Organization for Enhanced LLM Training
Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang +7 more