Stas Bekman is currently focused on deep learning training technologies, specifically exploring and troubleshooting tools such as DeepSpeed ZeRO++ and FlashAttention 4 (FA4), while also compiling resources on LLM/VLM training.
Recent Activity
Tensors, for human consumption
Highlights: Lovely Tensors provides intuitive tensor visualization and analysis tools for PyTorch, making complex tensor operations more accessible and interpretable. It focuses on human-friendly representations of tensor data through clear visualizations and statistical summaries.
Worth reading: It addresses the common pain point of debugging and understanding tensor operations in deep learning workflows with practical, ready-to-use visualization utilities.
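To give a feel for the kind of output this enables, here is a minimal plain-Python sketch of the one-line statistical summary Lovely Tensors prints for a tensor (the real library monkey-patches `torch.Tensor`'s repr; this standalone function and its format string are illustrative assumptions, not the library's API):

```python
import math

def summarize(values, name="t"):
    """One-line, human-friendly summary of a flat list of numbers,
    roughly in the spirit of what Lovely Tensors prints for a tensor:
    element count, value range, mean, and standard deviation."""
    n = len(values)
    lo, hi = min(values), max(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return (f"{name}[{n}] x∈[{lo:.3f}, {hi:.3f}] "
            f"μ={mean:.3f} σ={math.sqrt(var):.3f}")

print(summarize([0.1, -0.5, 2.0, 0.4]))
# → t[4] x∈[-0.500, 2.000] μ=0.500 σ=0.925
```

The real library adds this kind of summary (plus NaN/Inf flags and device/dtype info) to every tensor's repr after a single `lt.monkey_patch()` call.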
@stas00
Highlights: Suggests that DeepSpeed's ZeRO++ feature is now available in the master branch and worth trying.
Worth reading: Provides timely update on availability of an important distributed training optimization.
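For anyone wanting to try it, a minimal configuration sketch is below. The three keys are the ones DeepSpeed's ZeRO++ tutorial introduces (quantized weights, quantized gradients, and hierarchical partitioning); the partition size of 8 is an assumption for a single 8-GPU node, not a value from the post:

```json
{
  "zero_optimization": {
    "stage": 3,
    "zero_quantized_weights": true,
    "zero_quantized_gradients": true,
    "zero_hpz_partition_size": 8
  }
}
```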
@stas00
Highlights: Warns about a potential issue when experimenting with FA4 (likely FlashAttention 4) related to loading cutlass.cute.
Worth reading: Heads-up for practitioners experimenting with cutting-edge attention implementations.
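Since the reported failure mode is at import time, one defensive pattern is to probe the dependency up front and fall back to a standard attention path. This is a hedged sketch based only on the post's mention of `cutlass.cute`; the function name and fallback logic are assumptions, not FA4's actual API:

```python
def flash_attention_4_available():
    """Best-effort preflight check before enabling FA4 kernels.

    The post warns that FA4 can fail when cutlass.cute cannot be
    loaded, so probing the import early lets the caller fall back
    to a standard attention implementation instead of crashing.
    """
    try:
        import cutlass.cute  # noqa: F401 -- the dependency the post flags
    except ImportError as exc:
        print(f"FA4 disabled, cutlass.cute not importable: {exc}")
        return False
    return True

use_fa4 = flash_attention_4_available()
```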
@stas00
Highlights: Acknowledges a contribution that enhanced the 'Machine Learning Engineering Open book'.
Worth reading: Highlights collaborative improvement of an open educational resource for ML engineering.
@stas00
Highlights: Discusses a hardware bandwidth limit (450 GB/s) and significant protocol overhead affecting performance.
Worth reading: Technical insight into performance bottlenecks in high-speed data transfer systems.
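The 450 GB/s figure makes a useful back-of-envelope input: effective throughput is the peak derated by protocol overhead. A minimal sketch of that arithmetic is below; the 80% efficiency factor and the 16 GB payload are illustrative assumptions, not numbers from the post:

```python
def transfer_time_s(payload_gb, peak_gbps=450.0, efficiency=0.8):
    """Time (seconds) to move `payload_gb` gigabytes over a link with
    the given peak bandwidth (GB/s), derated by a protocol-overhead
    efficiency factor. The 450 GB/s peak comes from the post; the 0.8
    efficiency is an assumed derating, not a measured value."""
    return payload_gb / (peak_gbps * efficiency)

# Moving 16 GB of fp16 weights at 80% link efficiency:
t = transfer_time_s(16)
print(f"{t * 1000:.1f} ms")  # 16 GB / 360 GB/s ≈ 44.4 ms
```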
@stas00
Highlights: Stas Bekman is compiling training logbooks/chronicles for LLMs and VLMs, which he considers a valuable resource.
Worth reading: Provides insight into practical documentation and tracking of AI model training processes.
@stas00
Highlights: Notes that the PyTorch team is developing an important new tool, likely related to machine learning infrastructure.
Worth reading: Shows engagement with core ML tooling developments and community updates.
@stas00
Highlights: Discusses memory savings achieved by enabling TiledMLP, a technical optimization for ML models.
Worth reading: Offers practical insight into memory efficiency techniques for ML systems.
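The memory saving is easy to see with a little arithmetic: a transformer MLP materializes a `(batch, seq, expansion * hidden)` intermediate activation, and tiling the sequence dimension means only one tile's intermediate is live at a time. The sketch below illustrates this; the hidden size, expansion factor, and tile count are assumptions for illustration, not values from the post or TiledMLP's actual implementation:

```python
def mlp_peak_activation_bytes(batch, seq, hidden, expansion=4,
                              bytes_per_el=2, tiles=1):
    """Peak bytes held by the MLP's intermediate activation.

    Without tiling, the full (batch, seq, expansion*hidden) intermediate
    is materialized at once; with `tiles` sequence chunks processed one
    at a time, only one chunk's intermediate is live, so peak memory
    drops by roughly the tile count.
    """
    per_token = expansion * hidden * bytes_per_el
    return batch * (seq // tiles) * per_token

full = mlp_peak_activation_bytes(1, 4096, 8192)             # no tiling
tiled = mlp_peak_activation_bytes(1, 4096, 8192, tiles=8)   # 8 seq tiles
print(f"{full / 2**30:.2f} GiB -> {tiled / 2**30:.3f} GiB")
# → 0.25 GiB -> 0.031 GiB
```

The trade-off is extra kernel launches and recomputed layer inputs per tile, paid for an N-fold reduction in peak activation memory.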
@stas00
Highlights: Creative visualization of PyTorch memory profiling results for Llama-8B model, blending technical analysis with artistic presentation.
Worth reading: Demonstrates innovative approaches to visualizing and understanding model memory usage patterns.