Distributed Training 3

FSDP2 Under the Hood - A Deep Dive into PyTorch's Fully Sharded Data Parallel Implementation
Jan 3, 2026

PyTorch Performance and GPU Memory Optimization Handbook
Jul 20, 2025

LLM Training 101
Sep 1, 2024