Stas Bekman

Hugging Face, training expert

Recent Activity · 108 x-posts

If you're trying out FA4, you're likely to run into not being able to load cutlass.cute

Highlights: Stas Bekman warns about a common issue with FA4 (Flash Attention 4) involving cutlass.cute loading.

Worth reading: Useful for developers experimenting with Flash Attention 4.

Infra · Tooling
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can ...

Highlights: Acknowledges contribution to the ML Engineering Open Book.

Worth reading: Shows collaborative development of ML resources.

LLM · Tooling
This is a long overdue section of the ML Engineering Understanding Training Loss Patterns ...

Highlights: Announces a new section on training loss patterns in ML Engineering.

Worth reading: Provides educational content on understanding training loss.

Fine-tuning · LLM
Modern art. Artist: PyTorch memory profiler Model: Llama-8B The piece on the left is the ...

Highlights: Humorously compares PyTorch memory profiler output to modern art.

Worth reading: Illustrates memory profiling challenges in LLM training.

Infra · LLM
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to ...

Highlights: Stas Bekman compiles LLM/VLM training logbooks, providing a valuable resource for training insights.

Worth reading: Essential for anyone involved in training large language or vision models.

LLM · Fine-tuning · Infra
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can ...

Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book, expanding its capabilities.

Worth reading: Highlights collaborative improvements to open-source ML engineering resources.

Tooling · Infra
This is a long overdue section of the ML Engineering Understanding Training Loss Patterns ...

Highlights: Introduces a new section on understanding training loss patterns in ML engineering.

Worth reading: Provides crucial knowledge for diagnosing and improving model training.

LLM · Fine-tuning · Evaluation
Modern art. Artist: PyTorch memory profiler Model: Llama-8B The piece on the left is the ...

Highlights: Uses PyTorch memory profiler output as a form of modern art, showing memory patterns of Llama-8B.

Worth reading: Creative visualization of memory profiling, useful for understanding model memory usage.

Infra · Tooling
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Announces a contribution to the Machine Learning Engineering Open Book.

Worth reading: Highlights community collaboration in ML engineering resources.

Tooling · Infra
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: Encourages trying DeepSpeed ZeRO++ as it should be functional on master.

Worth reading: Useful for those waiting to test DeepSpeed ZeRO++ optimizations.

Infra · Fine-tuning
When dealing with tight gpu memory situations try to change a PyTorch version to both newer and older and it might just do the trick. I get massively different memory usage patterns with pt 2.4 to 2.8. e.g. today I tried last night's nightly and it was using 6GB less memory

Highlights: Switching to a newer or older PyTorch version can change GPU memory usage substantially; usage varied widely across pt 2.4 to 2.8, and one nightly build used 6GB less memory.

Worth reading: Practical tip for ML practitioners facing memory constraints during training.

Infra · Fine-tuning
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to ...

Highlights: Stas Bekman compiles training logbooks for LLM/VLM, providing valuable resources.

Worth reading: Curated logbooks help practitioners learn from real training experiences.

LLM
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can ...

Highlights: The ML Engineering Open Book receives contributions to expand its content.

Worth reading: Open-source ML engineering book is actively improved by the community.

Tooling · LLM
This is a long overdue section of the ML Engineering Understanding Training Loss Patterns ...

Highlights: New section on understanding training loss patterns added to ML Engineering resources.

Worth reading: Helps practitioners diagnose and improve model training by analyzing loss curves.

Fine-tuning · Evaluation
Classical Jensen math. Unidirectional bandwidth is topped at 450GB/s, and then there comes a protocol overhead of two digit percentage.

Highlights: Discusses bandwidth limitations and protocol overhead in GPU interconnects.

Worth reading: Provides insight into hardware performance constraints relevant to ML infrastructure.

Infra
Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

Highlights: Introduces a new performance metric for matrix multiplication.

Worth reading: Relevant for evaluating and optimizing ML model performance.

Evaluation · Infra
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Announces a contribution to the Machine Learning Engineering Open Book.

Worth reading: Highlights community contributions to open-source ML education.

Tooling
If you're trying out FA4, you're likely to run into not being able to load cutlass.cute

Highlights: Warns about a common issue with FA4 and cutlass.cute loading.

Worth reading: Practical troubleshooting tip for ML engineers using FA4.

Infra · Tooling
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to ...

Highlights: Compiling LLM/VLM training logbooks as a key resource.

Worth reading: Provides curated training logs for LLM/VLM practitioners.

LLM · Tooling
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can ...

Highlights: ML Engineering Open Book updated with contribution from @omarnomad.

Worth reading: Highlights collaborative improvement of ML engineering resources.

LLM · Infra · Tooling
This is a long overdue section of the ML Engineering Understanding Training Loss Patterns ...

Highlights: New section on understanding training loss patterns in ML Engineering.

Worth reading: Essential for debugging and improving model training.

LLM · Fine-tuning
Modern art. Artist: PyTorch memory profiler Model: Llama-8B The piece on the left is the ...

Highlights: Visualizing PyTorch memory profiling output as modern art.

Worth reading: Creative take on memory profiling for LLM training.

LLM · Infra · Tooling
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Acknowledges contribution to the Machine Learning Engineering Open Book.

Worth reading: Highlights community contributions to ML engineering resources.

Tooling
Modern art. Artist: PyTorch memory profiler Model: Llama-8B The piece on the left is the

Highlights: Humorously compares PyTorch memory profiler output to modern art.

Worth reading: Illustrates memory profiling challenges in LLM training.

Infra · LLM
Classical Jensen math. Unidirectional bandwidth is topped at 450GB/s, and then there comes a protocol overhead of two digit percentage.

Highlights: Notes that unidirectional bandwidth tops out at 450GB/s and that protocol overhead then takes a double-digit percentage of it.

Worth reading: Provides insight into hardware performance constraints relevant to ML infrastructure.

Infra
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can ...

Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book.

Worth reading: Highlights collaborative development of ML engineering resources.

Tooling
Good news! Ulysses Sequence Parallelism from the Snowflake AI Research and the Deepspeed ...

Highlights: Announces Ulysses Sequence Parallelism from Snowflake AI Research and DeepSpeed.

Worth reading: Relevant for those interested in sequence parallelism techniques for large models.

Infra
To remind - this is the memory saving you get when enabling TiledMLP :) Left: normal memory ...

Highlights: Demonstrates memory savings from TiledMLP.

Worth reading: Useful for ML practitioners looking to optimize memory usage.

Fine-tuning
Classical Jensen math. Unidirectional bandwidth is topped at 450GB/s, and then there comes a protocol overhead of two digit percentage. 1.

Highlights: Notes that unidirectional bandwidth tops out at 450GB/s and that protocol overhead then takes a double-digit percentage of it.

Worth reading: Insightful for understanding hardware bottlenecks in ML systems.

Infra
Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

Highlights: Introduces a new performance metric for matrix multiplication.

Worth reading: Relevant for ML engineers optimizing compute performance.

Infra · Tooling
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book.

Worth reading: Shows community collaboration in ML education.

Tooling
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to

Highlights: Compiling LLM/VLM training logbooks/chronicles as a valuable resource.

Worth reading: Provides curated insights on training large models.

LLM · Fine-tuning
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Machine Learning Engineering Open Book updated with contribution.

Worth reading: Open book resource for ML engineering practices.

Tooling · Deployment
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: DeepSpeed ZeRO++ now available on master branch.

Worth reading: Important update for distributed training efficiency.

Infra · LLM
Modern art. Artist: PyTorch memory profiler Model: Llama-8B The piece on the left is the

Highlights: PyTorch memory profiler visualization of Llama-8B training.

Worth reading: Visual insight into memory profiling for large models.

Tooling · LLM
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Stas Bekman thanks a contributor for improving the Machine Learning Engineering Open Book.

Worth reading: Highlights community contributions to ML educational resources.

Tooling
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to

Highlights: Stas has been compiling training logbooks for LLM/VLM, indicating a focus on documenting training processes.

Worth reading: Useful for researchers and engineers looking for practical training insights.

LLM · Infra · Fine-tuning
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Stas acknowledges a contribution to the Machine Learning Engineering Open Book, showing community collaboration.

Worth reading: Highlights open-source contributions to ML education.

Tooling · Infra
A huge thank you note to Yih-Dar SHIEH who has been doing an amazing QA work for @huggingface for

Highlights: Stas thanks Yih-Dar SHIEH for his QA work at Hugging Face.

Worth reading: Shows appreciation for behind-the-scenes work in ML community.

Tooling · Infra
This is a long overdue section of the ML Engineering Understanding Training Loss Patterns

Highlights: Stas announces a new section on understanding training loss patterns in ML Engineering.

Worth reading: Important for practitioners debugging training runs.

LLM · Fine-tuning · Infra
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to ...

Highlights: Stas Bekman has been compiling LLM/VLM training logbooks, which are valuable resources for understanding training processes.

Worth reading: Provides curated knowledge on training large models, useful for practitioners.

LLM
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can ...

Highlights: Acknowledges contribution to the Machine Learning Engineering Open Book, enhancing its content.

Worth reading: Highlights collaborative improvements to an open-source ML resource.

Tooling · LLM
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should ...

Highlights: Announces that DeepSpeed ZeRO++ is now usable, encouraging adoption.

Worth reading: Relevant for those interested in efficient distributed training.

Infra
Modern art. Artist: PyTorch memory profiler Model: Llama-8B The piece on the left is the ...

Highlights: Uses humor to illustrate PyTorch memory profiling results for Llama-8B model.

Worth reading: Shows a creative visualization of memory usage, useful for debugging.

Tooling · LLM
Classical Jensen math. Unidirectional bandwidth is topped at 450GB/s, and then there comes a protocol overhead of two digit percentage.

Highlights: Notes that unidirectional bandwidth tops out at 450GB/s and that protocol overhead then takes a double-digit percentage of it.

Worth reading: Relevant for understanding performance bottlenecks in high-bandwidth systems.

Infra
Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

Highlights: Introduces a new performance metric for matrix multiplication, likely for ML workloads.

Worth reading: Provides insight into performance measurement for ML systems.

Infra · Tooling
If you're trying out FA4, you're likely to run into not being able to load cutlass.cute

Highlights: Warns about a common issue with Flash Attention 4 and cutlass library loading.

Worth reading: Useful for developers experimenting with FA4 in ML frameworks.

Fine-tuning · Tooling
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book.

Worth reading: Highlights community contributions to open-source ML resources.

Tooling
If you're trying out FA4, you're likely to run into not being able to load cutlass.cute

Highlights: Warns about a common issue with FA4 and cutlass.cute loading.

Worth reading: Helpful for developers using FlashAttention-4.

Infra · Tooling
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Acknowledges contribution to the Machine Learning Engineering Open Book.

Worth reading: Shows community contribution to ML education.

Tooling
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: Stas notes that DeepSpeed ZeRO++ is now available on master branch, encouraging users to try it.

Worth reading: Relevant for ML engineers using DeepSpeed for distributed training.

Infra · Deployment
Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

Highlights: Stas introduces a new metric called Maximum Achievable Matmul for evaluating performance.

Worth reading: Important for understanding ML model performance benchmarking.

Evaluation · Infra
If you're trying out FA4, you're likely to run into not being able to load cutlass.cute

Highlights: Stas warns about a common issue with FA4 involving cutlass.cute loading.

Worth reading: Useful for developers experimenting with FA4 (Flash Attention 4).

Infra · Tooling
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Stas thanks a contributor for enhancing the Machine Learning Engineering Open Book.

Worth reading: Highlights community contributions to ML educational resources.

LLM · Tooling
If you're trying out FA4, you're likely to run into not being able to load cutlass.cute

Highlights: Stas Bekman warns about a common issue with FA4 (Flash Attention 4) involving cutlass.cute loading.

Worth reading: Useful for developers using Flash Attention 4 who may encounter this error.

Tooling · Infra
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Stas Bekman thanks a contributor for improving the Machine Learning Engineering Open Book.

Worth reading: Highlights community contributions to open-source ML education resources.

Tooling
Classical Jensen math. Unidirectional bandwidth is topped at 450GB/s, and then there comes a protocol overhead of two digit percentage. 1.

Highlights: Stas Bekman discusses bandwidth limitations and protocol overhead in high-performance computing.

Worth reading: Insight into hardware constraints affecting ML training throughput.

Infra
Classical Jensen math. Unidirectional bandwidth is topped at 450GB/s, and then there comes a protocol overhead of two digit percentage. 1.

Highlights: Stas Bekman discusses bandwidth limitations and protocol overhead in computing.

Worth reading: Insightful for understanding hardware bottlenecks in ML systems.

Infra
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Stas Bekman acknowledges a contribution to the Machine Learning Engineering Open Book, expanding its content.

Worth reading: Highlights community contributions to open-source ML education.

Tooling
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Stas Bekman thanks a contributor for enhancing the Machine Learning Engineering Open Book.

Worth reading: Highlights community contributions to open-source ML education.

Tooling
Modern art. Artist: PyTorch memory profiler Model: Llama-8B The piece on the left is the ...

Highlights: Uses a PyTorch memory profiler on Llama-8B to create a visual representation of memory usage.

Worth reading: Creative visualization of memory profiling, blending technical insight with art.

Infra · Tooling
Modern art. Artist: PyTorch memory profiler Model: Llama-8B The piece on the left is the

Highlights: Humorous take on PyTorch memory profiler output as modern art.

Worth reading: Illustrates memory profiling challenges in LLM training.

LLM · Infra · Tooling
If you're trying out FA4, you're likely to run into not being able to load cutlass.cute

Highlights: Warns about a common issue with FA4 and cutlass.cute loading.

Worth reading: Useful for developers experimenting with FlashAttention 4.

Infra · Tooling
If you're trying out FA4, you're likely to run into not being able to load cutlass.cute

Highlights: Stas Bekman warns about a common issue with FlashAttention-4 where cutlass.cute fails to load.

Worth reading: Useful troubleshooting tip for developers experimenting with FA4.

Infra · Tooling
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Stas Bekman acknowledges a contribution to the Machine Learning Engineering Open Book, adding new content.

Worth reading: Shows community collaboration on an open-source ML engineering resource.

Tooling
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book.

Worth reading: Highlights community contributions to ML education.

Tooling · LLM
The @PyTorch team are working on a new super important tool: https://t.co/rnfpDuvgOI This

Highlights: Stas Bekman highlights a new PyTorch tool (meta-pytorch/torchft) as super important for the ML community.

Worth reading: Announces a significant new tool from the PyTorch team that could impact ML infrastructure.

Infra · Tooling
If you're trying out FA4, you're likely to run into not being able to load cutlass.cute

Highlights: Warns about a common issue with FA4 (Flash Attention 4) related to loading cutlass.cute.

Worth reading: Helpful for developers using Flash Attention 4.

Infra · Tooling
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can ...

Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book, expanding its content.

Worth reading: Highlights collaborative improvement of an open resource for ML engineering.

Tooling · LLM
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book.

Worth reading: Shows community collaboration on ML resources.

Tooling · LLM
Classical Jensen math. Unidirectional bandwidth is topped at 450GB/s, and then there comes a protocol overhead of two digit percentage. 1.

Highlights: Stas Bekman discusses bandwidth limitations and protocol overhead in computing.

Worth reading: Important for understanding hardware bottlenecks in ML systems.

Infra
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Stas Bekman thanks a contributor for enhancing the Machine Learning Engineering Open Book.

Worth reading: Highlights community contributions to an open ML engineering resource.

Tooling · Deployment
Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

Highlights: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul.

Worth reading: Important for understanding GPU compute utilization in ML workloads.

Infra · Deployment
Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

Highlights: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul for evaluating ML hardware.

Worth reading: Novel metric that could help benchmark and optimize matrix multiplication performance.

Infra · Tooling
Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

Highlights: Introduces a new performance metric for matrix multiplication, likely for benchmarking ML models.

Worth reading: Provides a novel metric for evaluating ML performance.

Evaluation · Infra
Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

Highlights: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul.

Worth reading: Provides a novel metric for evaluating matrix multiplication performance in ML systems.

Infra · Evaluation
Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

Highlights: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul for evaluating compute efficiency.

Worth reading: Provides a novel metric for benchmarking ML hardware performance.

Infra · Evaluation
Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

Highlights: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul.

Worth reading: Useful for understanding GPU compute efficiency in ML workloads.

Infra · Evaluation
Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

Highlights: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul for evaluating matrix multiplication efficiency.

Worth reading: Important for understanding performance benchmarking in ML systems.

Evaluation · Infra
Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

Highlights: Introduces a new performance metric for matrix multiplication.

Worth reading: Important for understanding GPU compute efficiency.

Infra · Evaluation
Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

Highlights: Introduces a new performance metric for matrix multiplication.

Worth reading: Important for understanding ML hardware performance.

Infra · Evaluation
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Stas Bekman thanks @omarnomad for a contribution to the Machine Learning Engineering Open Book, expanding its capabilities.

Worth reading: Shows community collaboration on an open resource for ML engineering.

Tooling · LLM
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can ...

Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book, enhancing its content.

Worth reading: Shows collaborative development of open-source ML resources.

Infra · Tooling
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can ...

Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book, enhancing its utility.

Worth reading: Highlights collaborative improvements to open-source ML resources.

Tooling · LLM
To remind - this is the memory saving you get when enabling TiledMLP :) Left: normal memory ...

Highlights: Demonstrates memory savings from TiledMLP, a technique for efficient ML training.

Worth reading: Practical tip for reducing memory usage in large model training.

Infra · Fine-tuning
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

Highlights: Machine Learning Engineering Open Book updated with community contribution.

Worth reading: Highlights collaborative improvement of ML engineering resources.

Tooling
Modern art. Artist: PyTorch memory profiler Model: Llama-8B The piece on the left is the ...

Highlights: Humorously compares PyTorch memory profiler output to modern art, highlighting memory usage patterns.

Worth reading: Engaging way to understand memory profiling in LLM training.

Infra · LLM
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: Stas Bekman suggests that DeepSpeed ZeRO++ is now ready to try on the master branch.

Worth reading: Relevant for ML practitioners interested in memory optimization techniques.

Infra · Fine-tuning
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: Stas Bekman notes that DeepSpeed ZeRO++ looks ready to try on the master branch, for those who had been holding off.

Worth reading: Provides an update on the availability of DeepSpeed ZeRO++ for efficient training.

Infra · Fine-tuning
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: Stas Bekman points out that DeepSpeed ZeRO++ is now available on master branch, encouraging users to try it.

Worth reading: Important for ML engineers using DeepSpeed for large model training.

Infra · Deployment
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: Announcement that DeepSpeed ZeRO++ is available on master branch, encouraging users to try it.

Worth reading: Relevant for ML practitioners using DeepSpeed for distributed training.

Infra · Deployment
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: Stas Bekman suggests that DeepSpeed ZeRO++ is now ready to try on the master branch.

Worth reading: Relevant for those interested in DeepSpeed ZeRO++ optimization for large model training.

Infra · Fine-tuning
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: Stas Bekman indicates that DeepSpeed ZeRO++ is ready to try on the master branch.

Worth reading: Relevant for ML engineers interested in distributed training optimizations.

Infra · Fine-tuning
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should ...

Highlights: Announces that DeepSpeed ZeRO++ is now ready for use, encouraging adoption.

Worth reading: Relevant for those optimizing memory and speed in large model training.

Infra · Fine-tuning
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: Stas Bekman notes that DeepSpeed ZeRO++ is now available on master branch, encouraging users to try it.

Worth reading: Relevant for ML practitioners interested in memory optimization techniques like ZeRO++.

Infra · Fine-tuning
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should ...

Highlights: Announces that DeepSpeed ZeRO++ is now usable, encouraging users to try it.

Worth reading: Relevant for those interested in memory optimization for large model training.

Infra · Fine-tuning
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: Encourages trying DeepSpeed ZeRO++ as it may now work on master branch.

Worth reading: Relevant for those using DeepSpeed for distributed training.

Infra · Fine-tuning
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: Stas Bekman notes that DeepSpeed ZeRO++ is now available on the master branch, encouraging users to try it.

Worth reading: Relevant for ML practitioners using DeepSpeed for large model training.

Infra · Deployment
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

Highlights: Encourages trying DeepSpeed ZeRO++, which looks to be usable on the master branch.

Worth reading: Relevant for ML practitioners interested in memory optimization.

Infra · Fine-tuning
This is a long overdue section of the ML Engineering Understanding Training Loss Patterns

Highlights: New section on understanding training loss patterns in ML Engineering.

Worth reading: Essential for diagnosing training issues in deep learning.

Fine-tuning
This is a long overdue section of the ML Engineering Understanding Training Loss Patterns

Highlights: Introduces a section on understanding training loss patterns in ML Engineering.

Worth reading: Essential for diagnosing training issues in ML models.

LLM · Fine-tuning
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to

Highlights: Compiling logbooks/chronicles for LLM/VLM training, sharing a valuable resource.

Worth reading: Provides curated training logs for LLM/VLM practitioners.

LLM · Fine-tuning · Infra
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to ...

Highlights: Stas Bekman has been compiling logbooks/chronicles for LLM/VLM training, sharing valuable resources.

Worth reading: Provides a curated source for training large models, useful for practitioners.

LLM
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to

Highlights: Stas Bekman has been compiling logbooks/chronicles of LLM/VLM training, which he considers one of the best sources for understanding training processes.

Worth reading: Provides a curated collection of training experiences and insights for LLM/VLM practitioners.

LLM · Fine-tuning · Tooling
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to ...

Highlights: Stas Bekman curates training logbooks for LLMs and VLMs, providing a valuable resource.

Worth reading: Essential for practitioners tracking training methodologies and best practices.

LLM · Fine-tuning · Infra
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to ...

Highlights: Stas Bekman compiles LLM/VLM training logbooks, providing a valuable resource for understanding training processes.

Worth reading: Offers curated insights into large model training, useful for practitioners.

LLM · Fine-tuning · Infra
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to

Highlights: Compiling LLM/VLM training logbooks as a key resource.

Worth reading: Provides curated training knowledge for ML practitioners.

LLM · Tooling
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to

Highlights: Compiling LLM/VLM training logbooks as a valuable resource.

Worth reading: Provides curated logbooks for LLM/VLM training insights.

LLM · Fine-tuning
108 x-posts · All time