Recent Activity
@stas00
Highlights: Stas Bekman warns about a common issue with FA4 (Flash Attention 4) involving cutlass.cute loading.
Worth reading: Useful for developers experimenting with Flash Attention 4.
@stas00
Highlights: Acknowledges contribution to the ML Engineering Open Book.
Worth reading: Shows collaborative development of ML resources.
@stas00
Highlights: Announces a new section on training loss patterns in ML Engineering.
Worth reading: Provides educational content on understanding training loss.
@stas00
Highlights: Humorously compares PyTorch memory profiler output to modern art.
Worth reading: Illustrates memory profiling challenges in LLM training.
@stas00
Highlights: Stas Bekman compiles LLM/VLM training logbooks, providing a valuable resource for training insights.
Worth reading: Essential for anyone involved in training large language or vision models.
@stas00
Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book, expanding its capabilities.
Worth reading: Highlights collaborative improvements to open-source ML engineering resources.
@stas00
Highlights: Introduces a new section on understanding training loss patterns in ML engineering.
Worth reading: Provides crucial knowledge for diagnosing and improving model training.
@stas00
Highlights: Uses PyTorch memory profiler output as a form of modern art, showing memory patterns of Llama-8B.
Worth reading: Creative visualization of memory profiling, useful for understanding model memory usage.
@stas00
Highlights: Announces a contribution to the Machine Learning Engineering Open Book.
Worth reading: Highlights community collaboration in ML engineering resources.
@stas00
Highlights: Encourages trying DeepSpeed ZeRO++ as it should be functional on master.
Worth reading: Useful for those waiting to test DeepSpeed ZeRO++ optimizations.
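For readers who want to try ZeRO++ from master, here is a minimal sketch of a config. ZeRO++ is switched on through ZeRO stage 3 config keys. The key names below follow the DeepSpeed ZeRO++ documentation as we understand it. The values are placeholders, not recommendations. In particular, `GPUS_PER_NODE` is a hypothetical setting used as the hpZ secondary-partition size.

```python
import json

# Hypothetical cluster setting: hpZ usually keeps its secondary weight
# partition within one node, so its size is typically the GPUs per node.
GPUS_PER_NODE = 8

ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "zero_optimization": {
        "stage": 3,  # ZeRO++ builds on ZeRO stage 3
        # qwZ: quantize weights to cut all-gather communication volume
        "zero_quantized_weights": True,
        # hpZ: secondary copy of partitioned weights inside each node
        "zero_hpz_partition_size": GPUS_PER_NODE,
        # qgZ: quantized gradient communication
        "zero_quantized_gradients": True,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

# Serialize so the config can be saved as a JSON file for DeepSpeed.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

`deepspeed.initialize` also accepts the dict directly, so writing the JSON file is optional.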
@stas00
Highlights: Changing PyTorch version can significantly reduce GPU memory usage, with differences of up to 6 GB between versions.
Worth reading: Practical tip for ML practitioners facing memory constraints during training.
@stas00
Highlights: Stas Bekman compiles training logbooks for LLM/VLM, providing valuable resources.
Worth reading: Curated logbooks help practitioners learn from real training experiences.
@stas00
Highlights: The ML Engineering Open Book receives contributions to expand its content.
Worth reading: Open-source ML engineering book is actively improved by the community.
@stas00
Highlights: New section on understanding training loss patterns added to ML Engineering resources.
Worth reading: Helps practitioners diagnose and improve model training by analyzing loss curves.
@stas00
Highlights: Discusses bandwidth limitations and protocol overhead in GPU interconnects.
Worth reading: Provides insight into hardware performance constraints relevant to ML infrastructure.
@stas00
Highlights: Introduces a new performance metric for matrix multiplication.
Worth reading: Relevant for evaluating and optimizing ML model performance.
@stas00
Highlights: Announces a contribution to the Machine Learning Engineering Open Book.
Worth reading: Highlights community contributions to open-source ML education.
@stas00
Highlights: Warns about a common issue with FA4 and cutlass.cute loading.
Worth reading: Practical troubleshooting tip for ML engineers using FA4.
@stas00
Highlights: Compiling LLM/VLM training logbooks as a key resource.
Worth reading: Provides curated training logs for LLM/VLM practitioners.
@stas00
Highlights: ML Engineering Open Book updated with contribution from @omarnomad.
Worth reading: Highlights collaborative improvement of ML engineering resources.
@stas00
Highlights: New section on understanding training loss patterns in ML Engineering.
Worth reading: Essential for debugging and improving model training.
@stas00
Highlights: Visualizing PyTorch memory profiling output as modern art.
Worth reading: Creative take on memory profiling for LLM training.
@stas00
Highlights: Acknowledges contribution to the Machine Learning Engineering Open Book.
Worth reading: Highlights community contributions to ML engineering resources.
@stas00
Highlights: Discusses bandwidth limitations and protocol overhead in the context of Jensen's math.
Worth reading: Provides insight into hardware performance constraints relevant to ML infrastructure.
@stas00
Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book.
Worth reading: Highlights collaborative development of ML engineering resources.
@stas00
Highlights: Announces Ulysses Sequence Parallelism from Snowflake AI Research and DeepSpeed.
Worth reading: Relevant for those interested in sequence parallelism techniques for large models.
@stas00
Highlights: Demonstrates memory savings from TiledMLP.
Worth reading: Useful for ML practitioners looking to optimize memory usage.
@stas00
Highlights: Discusses bandwidth limits and protocol overhead in the context of Jensen's math.
Worth reading: Insightful for understanding hardware bottlenecks in ML systems.
@stas00
Highlights: Introduces a new performance metric for matrix multiplication.
Worth reading: Relevant for ML engineers optimizing compute performance.
@stas00
Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book.
Worth reading: Shows community collaboration in ML education.
@stas00
Highlights: Compiling LLM/VLM training logbooks/chronicles as a valuable resource.
Worth reading: Provides curated insights on training large models.
@stas00
Highlights: Machine Learning Engineering Open Book updated with a contribution.
Worth reading: Open book resource for ML engineering practices.
@stas00
Highlights: DeepSpeed ZeRO++ is now available on the master branch.
Worth reading: Important update for distributed training efficiency.
@stas00
Highlights: PyTorch memory profiler visualization of Llama-8B training.
Worth reading: Visual insight into memory profiling for large models.
@stas00
Highlights: Stas Bekman thanks a contributor for improving the Machine Learning Engineering Open Book.
Worth reading: Highlights community contributions to ML educational resources.
@stas00
Highlights: Stas has been compiling training logbooks for LLM/VLM, indicating a focus on documenting training processes.
Worth reading: Useful for researchers and engineers looking for practical training insights.
@stas00
Highlights: Stas acknowledges a contribution to the Machine Learning Engineering Open Book, showing community collaboration.
Worth reading: Highlights open-source contributions to ML education.
@stas00
Highlights: Stas thanks a contributor for QA work at Hugging Face, emphasizing quality assurance in ML.
Worth reading: Shows appreciation for behind-the-scenes work in the ML community.
@stas00
Highlights: Stas announces a new section on understanding training loss patterns in ML Engineering.
Worth reading: Important for practitioners debugging training runs.
@stas00
Highlights: Stas Bekman has been compiling LLM/VLM training logbooks, which are valuable resources for understanding training processes.
Worth reading: Provides curated knowledge on training large models, useful for practitioners.
@stas00
Highlights: Acknowledges contribution to the Machine Learning Engineering Open Book, enhancing its content.
Worth reading: Highlights collaborative improvements to an open-source ML resource.
@stas00
Highlights: Announces that DeepSpeed ZeRO++ is now usable, encouraging adoption.
Worth reading: Relevant for those interested in efficient distributed training.
@stas00
Highlights: Uses humor to illustrate PyTorch memory profiling results for the Llama-8B model.
Worth reading: Shows a creative visualization of memory usage, useful for debugging.
@stas00
Highlights: Discusses bandwidth limitations and protocol overhead in computing, referencing Jensen's math.
Worth reading: Relevant for understanding performance bottlenecks in high-bandwidth systems.
@stas00
Highlights: Introduces a new performance metric for matrix multiplication, likely for ML workloads.
Worth reading: Provides insight into performance measurement for ML systems.
@stas00
Highlights: Warns about a common issue with Flash Attention 4 and cutlass library loading.
Worth reading: Useful for developers experimenting with FA4 in ML frameworks.
@stas00
Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book.
Worth reading: Highlights community contributions to open-source ML resources.
@stas00
Highlights: Warns about a common issue with FA4 and cutlass.cute loading.
Worth reading: Helpful for developers using FlashAttention-4.
@stas00
Highlights: Acknowledges contribution to the Machine Learning Engineering Open Book.
Worth reading: Shows community contribution to ML education.
@stas00
Highlights: Stas notes that DeepSpeed ZeRO++ is now available on the master branch, encouraging users to try it.
Worth reading: Relevant for ML engineers using DeepSpeed for distributed training.
@stas00
Highlights: Stas introduces a new metric called Maximum Achievable Matmul for evaluating performance.
Worth reading: Important for understanding ML model performance benchmarking.
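The entries only name the metric, so the following is a hedged sketch of the arithmetic behind it. We assume that Maximum Achievable Matmul is reported in TFLOPS, as the best throughput measured across a sweep of matmul shapes rather than the spec-sheet peak. The `timings` mapping is a hypothetical input you would fill from your own GPU benchmark runs. The function names are illustrative, not taken from any existing tool.

```python
def matmul_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput of one (m x k) @ (k x n) matmul in TFLOPS.

    A dense matmul performs m*n*k multiply-adds, i.e. 2*m*n*k FLOPs.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2 * m * n * k / seconds / 1e12


def max_achievable_tflops(
    timings: dict[tuple[int, int, int], float],
) -> tuple[tuple[int, int, int], float]:
    """Return the (shape, TFLOPS) pair with the highest measured throughput.

    `timings` (hypothetical input) maps (m, n, k) shapes to the measured
    seconds per matmul for that shape.
    """
    shape, secs = max(timings.items(), key=lambda kv: matmul_tflops(*kv[0], kv[1]))
    return shape, matmul_tflops(*shape, secs)


def efficiency(achieved_tflops: float, peak_tflops: float) -> float:
    """Fraction of a reference peak (e.g. the spec-sheet number) actually reached."""
    return achieved_tflops / peak_tflops
```

Taking the maximum over many shapes gives a realistic ceiling for what the hardware can deliver. Comparing a training run's achieved TFLOPS against that ceiling, rather than against the theoretical peak, gives a fairer picture of how much headroom remains.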
@stas00
Highlights: Stas warns about a common issue with FA4 involving cutlass.cute loading.
Worth reading: Useful for developers experimenting with FA4 (Flash Attention 4).
@stas00
Highlights: Stas thanks a contributor for enhancing the Machine Learning Engineering Open Book.
Worth reading: Highlights community contributions to ML educational resources.
@stas00
Highlights: Stas Bekman warns about a common issue with FA4 (Flash Attention 4) involving cutlass.cute loading.
Worth reading: Useful for developers using Flash Attention 4 who may encounter this error.
@stas00
Highlights: Stas Bekman thanks a contributor for improving the Machine Learning Engineering Open Book.
Worth reading: Highlights community contributions to open-source ML education resources.
@stas00
Highlights: Stas Bekman discusses bandwidth limitations and protocol overhead in high-performance computing.
Worth reading: Insight into hardware constraints affecting ML training throughput.
@stas00
Highlights: Stas Bekman discusses bandwidth limitations and protocol overhead in computing.
Worth reading: Insightful for understanding hardware bottlenecks in ML systems.
@stas00
Highlights: Stas Bekman acknowledges a contribution to the Machine Learning Engineering Open Book, expanding its content.
Worth reading: Highlights community contributions to open-source ML education.
@stas00
Highlights: Stas Bekman thanks a contributor for enhancing the Machine Learning Engineering Open Book.
Worth reading: Highlights community contributions to open-source ML education.
@stas00
Highlights: Uses a PyTorch memory profiler on Llama-8B to create a visual representation of memory usage.
Worth reading: Creative visualization of memory profiling, blending technical insight with art.
@stas00
Highlights: Humorous take on PyTorch memory profiler output as modern art.
Worth reading: Illustrates memory profiling challenges in LLM training.
@stas00
Highlights: Warns about a common issue with FA4 and cutlass.cute loading.
Worth reading: Useful for developers experimenting with FlashAttention 4.
@stas00
Highlights: Stas Bekman warns about a common issue with FlashAttention-4 where cutlass.cute fails to load.
Worth reading: Useful troubleshooting tip for developers experimenting with FA4.
@stas00
Highlights: Stas Bekman acknowledges a contribution to the Machine Learning Engineering Open Book, adding new content.
Worth reading: Shows community collaboration on an open-source ML engineering resource.
@stas00
Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book.
Worth reading: Highlights community contributions to ML education.
@stas00
Highlights: Stas Bekman highlights meta-pytorch/torchft, a PyTorch library for fault-tolerant training, as super important for the ML community.
Worth reading: Announces a significant new tool from the PyTorch team that could impact ML infrastructure.
@stas00
Highlights: Warns about a common issue with FA4 (Flash Attention 4) related to loading cutlass.cute.
Worth reading: Helpful for developers using Flash Attention 4.
@stas00
Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book, expanding its content.
Worth reading: Highlights collaborative improvement of an open resource for ML engineering.
@stas00
Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book.
Worth reading: Shows community collaboration on ML resources.
@stas00
Highlights: Stas Bekman discusses bandwidth limitations and protocol overhead in computing.
Worth reading: Important for understanding hardware bottlenecks in ML systems.
@stas00
Highlights: Stas Bekman thanks a contributor for enhancing the Machine Learning Engineering Open Book.
Worth reading: Highlights community contributions to an open ML engineering resource.
@stas00
Highlights: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul.
Worth reading: Important for understanding GPU compute utilization in ML workloads.
@stas00
Highlights: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul for evaluating ML hardware.
Worth reading: Novel metric that could help benchmark and optimize matrix multiplication performance.
@stas00
Highlights: Introduces a new performance metric for matrix multiplication, likely for benchmarking ML models.
Worth reading: Provides a novel metric for evaluating ML performance.
@stas00
Highlights: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul.
Worth reading: Provides a novel metric for evaluating matrix multiplication performance in ML systems.
@stas00
Highlights: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul for evaluating compute efficiency.
Worth reading: Provides a novel metric for benchmarking ML hardware performance.
@stas00
Highlights: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul.
Worth reading: Useful for understanding GPU compute efficiency in ML workloads.
@stas00
Highlights: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul for evaluating matrix multiplication efficiency.
Worth reading: Important for understanding performance benchmarking in ML systems.
@stas00
Highlights: Introduces a new performance metric for matrix multiplication.
Worth reading: Important for understanding GPU compute efficiency.
@stas00
Highlights: Introduces a new performance metric for matrix multiplication.
Worth reading: Important for understanding ML hardware performance.
@stas00
Highlights: Stas Bekman thanks Omar Nomad for a contribution to the Machine Learning Engineering Open Book, expanding its capabilities.
Worth reading: Shows community collaboration on an open resource for ML engineering.
@stas00
Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book, enhancing its content.
Worth reading: Shows collaborative development of open-source ML resources.
@stas00
Highlights: Acknowledges a contribution to the Machine Learning Engineering Open Book, enhancing its utility.
Worth reading: Highlights collaborative improvements to open-source ML resources.
@stas00
Highlights: Demonstrates memory savings from TiledMLP, a technique for efficient ML training.
Worth reading: Practical tip for reducing memory usage in large model training.
@stas00
Highlights: Machine Learning Engineering Open Book updated with community contribution.
Worth reading: Highlights collaborative improvement of ML engineering resources.
@stas00
Highlights: Humorously compares PyTorch memory profiler output to modern art, highlighting memory usage patterns.
Worth reading: Engaging way to understand memory profiling in LLM training.
@stas00
Highlights: Stas Bekman suggests that DeepSpeed ZeRO++ is now ready to try on the master branch.
Worth reading: Relevant for ML practitioners interested in memory optimization techniques.
@stas00
Highlights: Stas Bekman notes that DeepSpeed ZeRO++ is now ready to try on the master branch, for those who had been holding off.
Worth reading: Provides an update on the availability of DeepSpeed ZeRO++ for efficient training.
@stas00
Highlights: Stas Bekman points out that DeepSpeed ZeRO++ is now available on the master branch, encouraging users to try it.
Worth reading: Important for ML engineers using DeepSpeed for large model training.
@stas00
Highlights: Announces that DeepSpeed ZeRO++ is available on the master branch, encouraging users to try it.
Worth reading: Relevant for ML practitioners using DeepSpeed for distributed training.
@stas00
Highlights: Stas Bekman suggests that DeepSpeed ZeRO++ is now ready to try on the master branch.
Worth reading: Relevant for those interested in DeepSpeed ZeRO++ optimization for large model training.
@stas00
Highlights: Stas Bekman indicates that DeepSpeed ZeRO++ is ready to try on the master branch.
Worth reading: Relevant for ML engineers interested in distributed training optimizations.
@stas00
Highlights: Announces that DeepSpeed ZeRO++ is now ready for use, encouraging adoption.
Worth reading: Relevant for those optimizing memory and speed in large model training.
@stas00
Highlights: Stas Bekman notes that DeepSpeed ZeRO++ is now available on the master branch, encouraging users to try it.
Worth reading: Relevant for ML practitioners interested in memory optimization techniques like ZeRO++.
@stas00
Highlights: Announces that DeepSpeed ZeRO++ is now usable, encouraging users to try it.
Worth reading: Relevant for those interested in memory optimization for large model training.
@stas00
Highlights: Encourages trying DeepSpeed ZeRO++ as it may now work on the master branch.
Worth reading: Relevant for those using DeepSpeed for distributed training.
@stas00
Highlights: Stas Bekman notes that DeepSpeed ZeRO++ is now available on the master branch, encouraging users to try it.
Worth reading: Relevant for ML practitioners using DeepSpeed for large model training.
@stas00
Highlights: Encourages trying Microsoft DeepSpeed's ZeRO++ now that it is in master.
Worth reading: Relevant for ML practitioners interested in memory optimization.
@stas00
Highlights: New section on understanding training loss patterns in ML Engineering.
Worth reading: Essential for diagnosing training issues in deep learning.
@stas00
Highlights: Introduces a section on understanding training loss patterns in ML Engineering.
Worth reading: Essential for diagnosing training issues in ML models.
@stas00
Highlights: Compiling logbooks/chronicles for LLM/VLM training, sharing a valuable resource.
Worth reading: Provides curated training logs for LLM/VLM practitioners.
@stas00
Highlights: Stas Bekman has been compiling logbooks/chronicles for LLM/VLM training, sharing valuable resources.
Worth reading: Provides a curated source for training large models, useful for practitioners.
@stas00
Highlights: Stas Bekman has been compiling logbooks/chronicles of LLM/VLM training, which he considers one of the best sources for understanding training processes.
Worth reading: Provides a curated collection of training experiences and insights for LLM/VLM practitioners.
@stas00
Highlights: Stas Bekman curates training logbooks for LLMs and VLMs, providing a valuable resource.
Worth reading: Essential for practitioners tracking training methodologies and best practices.
@stas00
Highlights: Stas Bekman compiles LLM/VLM training logbooks, providing a valuable resource for understanding training processes.
Worth reading: Offers curated insights into large model training, useful for practitioners.
@stas00
Highlights: Compiling LLM/VLM training logbooks as a key resource.
Worth reading: Provides curated training knowledge for ML practitioners.
@stas00
Highlights: Compiling LLM/VLM training logbooks as a valuable resource.
Worth reading: Provides curated logbooks for LLM/VLM training insights.