Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization