Stas Bekman

Hugging Face, training expert

Recent Interests: AI

Stas Bekman is currently focused on deep-learning training technologies: exploring and troubleshooting tools such as DeepSpeed ZeRO++ and FlashAttention 4 (FA4), while also compiling resources on LLM/VLM training.

Recent Activity: 1 star · 20 x-posts

xl0/lovely-tensors
Jupyter Notebook · 1,384 stars · starred by stas00

Tensors, for human consumption

Highlights: Lovely Tensors provides intuitive tensor visualization and analysis tools for PyTorch, making complex tensor operations more accessible and interpretable. It focuses on human-friendly representations of tensor data through clear visualizations and statistical summaries.

Worth reading: It addresses the common pain point of debugging and understanding tensor operations in deep learning workflows with practical, ready-to-use visualization utilities.

deep-learning · library · pytorch · statistics · visualization
Tooling
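As a rough illustration of what lovely-tensors does, here is a minimal stdlib-only sketch of the kind of compact summary it prints for a tensor. The function name and output format are mine, not the library's API; this only mimics the style of its output.

```python
# Stdlib-only sketch of the idea behind lovely-tensors: show a compact,
# human-readable summary (count, range, mean) instead of a wall of raw
# numbers. Mimics the *style* of its output; not the library's actual API.
import statistics

def lovely(values):
    """Summarize a flat list of floats the way lovely-tensors summarizes a tensor."""
    if not values:
        return "empty"
    lo, hi = min(values), max(values)
    mu = statistics.fmean(values)
    return f"n={len(values)} x∈[{lo:g}, {hi:g}] μ={mu:g}"

print(lovely([0.5, -1.0, 2.0, 0.0]))  # → n=4 x∈[-1, 2] μ=0.375
```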
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: Suggests that DeepSpeed's ZeRO++ feature is now available in the master branch and worth trying.

Worth reading: Provides timely update on availability of an important distributed training optimization.

Infra · Deployment
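For readers who were holding off: ZeRO++ is switched on through extra flags inside an ordinary ZeRO stage-3 DeepSpeed config. The sketch below follows the key names in the DeepSpeed ZeRO++ tutorial; the values are illustrative, not tuned recommendations.

```python
# Hedged sketch of enabling ZeRO++ in a DeepSpeed config (as a Python dict).
# Key names follow the DeepSpeed ZeRO++ tutorial; values are illustrative.
ds_config = {
    "zero_optimization": {
        "stage": 3,                        # ZeRO++ builds on ZeRO stage 3
        "zero_quantized_weights": True,    # qwZ: quantize all-gathered weights
        "zero_hpz_partition_size": 8,      # hpZ: secondary partition; assumes 8 GPUs per node
        "zero_quantized_gradients": True,  # qgZ: quantize gradient communication
    }
}
```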
If you're trying out FA4, you're likely to run into not being able to load cutlass.cute

Highlights: Warns about a potential issue when experimenting with FA4 (likely FlashAttention 4) related to loading cutlass.cute.

Worth reading: Heads-up for practitioners experimenting with cutting-edge attention implementations.

Infra · Tooling
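A defensive pattern for the failure mode the post warns about: check whether `cutlass.cute` loads before starting an FA4 run. The module path is taken from the post; the assumption that the failure surfaces as an `ImportError` is mine.

```python
# Guarded-import sketch for the FA4 issue the post warns about: detect an
# unloadable cutlass.cute up front instead of crashing mid-experiment.
# Assumes the failure surfaces as ImportError; that is not confirmed.
def cute_available() -> bool:
    try:
        import cutlass.cute  # noqa: F401  # DSL used by FlashAttention 4 kernels
    except ImportError as exc:
        print(f"cutlass.cute not importable: {exc}")
        return False
    return True

print("FA4 prerequisites met" if cute_available() else "fix cutlass.cute first")
```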
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Acknowledges a contribution that enhanced the 'Machine Learning Engineering Open book'.

Worth reading: Highlights collaborative improvement of an open educational resource for ML engineering.

Tooling
Classical Jensen math. Unidirectional bandwidth is topped at 450GB/s, and then there comes a protocol overhead of two digit percentage.

Highlights: Discusses a hardware bandwidth limit (450GB/s) and significant protocol overhead affecting performance.

Worth reading: Technical insight into performance bottlenecks in high-speed data transfer systems.

Infra
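The arithmetic behind the post can be sketched directly. The 15% overhead below is an assumed stand-in for the unspecified "two digit percentage"; only the 450 GB/s peak comes from the post.

```python
# Back-of-the-envelope version of the post's point: a link advertised at
# 450 GB/s unidirectional delivers less once protocol overhead is subtracted.
# The 15% overhead is an assumed illustration, not a measured figure.
PEAK_GBPS = 450.0
OVERHEAD = 0.15  # assumed; the post only says "two digit percentage"
effective_gbps = PEAK_GBPS * (1 - OVERHEAD)
print(f"effective unidirectional bandwidth ~ {effective_gbps:.1f} GB/s")  # → 382.5
```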
I have been compiling LLM/VLM training logbooks/chronicles. This is one of the best sources to...

Highlights: Stas Bekman is compiling training logbooks/chronicles for LLMs and VLMs, which he considers a valuable resource.

Worth reading: Provides insight into practical documentation and tracking of AI model training processes.

LLM · Evaluation
The @PyTorch team are working on a new super important tool: https://t.co/rnfpDuvgOI

Highlights: Highlights the PyTorch team's development of an important new tool, likely related to machine learning infrastructure.

Worth reading: Shows engagement with core ML tooling developments and community updates.

Tooling · Infra
To remind - this is the memory saving you get when enabling TiledMLP :) Left: normal memory...

Highlights: Discusses memory savings achieved by enabling TiledMLP, a technical optimization for ML models.

Worth reading: Offers practical insight into memory efficiency techniques for ML systems.

Infra · Tooling
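The idea the post illustrates can be sketched in plain Python: run the MLP over tiles of rows so that only one tile's intermediate activations are live at a time, trading a small amount of loop overhead for a much lower peak memory footprint. The function names and the toy per-row "MLP" are mine, not the actual TiledMLP implementation.

```python
# Conceptual sketch of MLP tiling (the idea behind "TiledMLP" as described
# in the post): instead of materializing intermediates for every row at
# once, process rows in tiles so peak memory scales with tile_size rows.
def mlp_row(x):
    # stand-in for up-projection + activation + down-projection on one row
    return [max(0.0, v) * 2.0 for v in x]

def tiled_mlp(rows, tile_size=2):
    out = []
    for i in range(0, len(rows), tile_size):
        tile = rows[i:i + tile_size]          # only this tile is "live"
        out.extend(mlp_row(r) for r in tile)  # peak memory ~ tile_size rows
    return out
```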
Modern art. Artist: PyTorch memory profiler Model: Llama-8B The piece on the left is the...

Highlights: Creative visualization of PyTorch memory profiling results for Llama-8B model, blending technical analysis with artistic presentation.

Worth reading: Demonstrates innovative approaches to visualizing and understanding model memory usage patterns.

LLM · Tooling · Evaluation
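The "art" in question comes from PyTorch's CUDA memory snapshot tooling. A hedged sketch of producing such a timeline follows; note these are private `torch.cuda.memory` helpers that may change between releases, and the model step is elided.

```python
# Hedged sketch of capturing the kind of memory timeline shown in the post,
# using PyTorch's CUDA memory snapshot helpers (private API, may change).
# Guards make the sketch a no-op when torch or CUDA is unavailable.
def dump_memory_snapshot(path="memory_snapshot.pickle"):
    try:
        import torch
    except ImportError:
        print("PyTorch not installed; skipping")
        return False
    if not torch.cuda.is_available():
        print("CUDA not available; skipping")
        return False
    torch.cuda.memory._record_memory_history(max_entries=100_000)
    # ... run the model forward/backward here ...
    torch.cuda.memory._dump_snapshot(path)  # view at pytorch.org/memory_viz
    torch.cuda.memory._record_memory_history(enabled=None)  # stop recording
    return True
```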
1 repo · 20 x-posts · All time