
Andrej Karpathy

AI educator, ex-Tesla/OpenAI

Recent Interests: AI

He is actively exploring the practical applications of LLMs, particularly for building personal knowledge bases and enhancing coding workflows, while also experimenting with and tuning small-scale models like nanochat.

Recent Activity · 10 videos · 15 x-posts

Building makemore Part 2: MLP

Andrej Karpathy

Sep 12, 02:43 PM · 524,893 views · YouTube

Highlights: This video demonstrates building a multilayer perceptron (MLP) for character-level language modeling, covering essential ML fundamentals like training, hyperparameter tuning, and evaluation. It provides practical insights into handling train/dev/test splits and diagnosing under/overfitting in neural networks.

Worth watching: A hands-on implementation of MLPs with clear explanations of core machine learning concepts, accessible both to beginners and to practitioners looking to solidify their understanding.
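
As a rough, illustrative sketch (not code from the video), the train/dev/test split the summary mentions might look like this; the function name and fractions here are hypothetical:

```python
import random

def split_dataset(examples, train_frac=0.8, dev_frac=0.1, seed=42):
    """Shuffle examples and split them into train/dev/test portions."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n = len(examples)
    n_train = int(train_frac * n)
    n_dev = int(dev_frac * n)
    train = examples[:n_train]
    dev = examples[n_train:n_train + n_dev]
    test = examples[n_train + n_dev:]  # remainder goes to test
    return train, dev, test

train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))  # 80 10 10
```

The dev split is used for hyperparameter tuning, with the test split held out for a final evaluation — the workflow the video walks through for diagnosing under/overfitting.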

Building makemore Part 3: Activations & Gradients, BatchNorm

Andrej Karpathy

Oct 4, 04:41 PM · 489,022 views · YouTube

Highlights: This video examines the statistical challenges in training deep neural networks, focusing on how improperly scaled activations and gradients can cause instability. It introduces Batch Normalization as a key technique to stabilize training by normalizing layer inputs.

Worth watching: Offers practical insights into diagnosing and fixing common deep learning training issues, with clear visualizations of internal network behavior from an expert in the field.
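
A minimal sketch of the normalization step at the heart of Batch Normalization (illustrative only; the real layer also learns a per-feature gain and bias and tracks running statistics for inference):

```python
def batchnorm(batch, eps=1e-5):
    """Normalize a batch of scalar activations to zero mean, unit variance.

    eps guards against division by zero when the batch variance is tiny.
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

out = batchnorm([1.0, 2.0, 3.0, 4.0])
# out is roughly [-1.34, -0.45, 0.45, 1.34]: zero mean, unit variance
```

Normalizing each layer's inputs this way is what keeps activations and gradients well-scaled as depth grows, which is the instability the video diagnoses.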

Building makemore Part 4: Becoming a Backprop Ninja

Andrej Karpathy

Oct 11, 05:56 PM · 335,387 views · YouTube

Highlights: This video demonstrates manual backpropagation through a complete 2-layer MLP with BatchNorm, covering gradients from cross entropy loss through embedding tables. It builds intuitive understanding of gradient flow at the tensor level, beyond scalar implementations like micrograd, while reinforcing core deep learning concepts.

Worth watching: Essential viewing for developers wanting to move beyond autograd black boxes and truly understand gradient computation in neural networks, presented by one of the field's most effective educators.
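
The spirit of the exercise can be sketched on a toy two-step expression, checking hand-derived gradients against finite differences (illustrative only; the video works through a full 2-layer MLP with BatchNorm, not this toy):

```python
def forward(w, x, b):
    # tiny two-step graph: z = w*x + b, loss = z**2
    z = w * x + b
    return z * z

def manual_grads(w, x, b):
    # backprop by hand through the same two steps
    z = w * x + b
    dz = 2 * z       # d(z^2)/dz
    dw = dz * x      # chain rule: dz/dw = x
    db = dz * 1.0    # dz/db = 1
    return dw, db

w, x, b, h = 0.5, 2.0, -0.5, 1e-6
dw, db = manual_grads(w, x, b)
# numerical check with central differences
num_dw = (forward(w + h, x, b) - forward(w - h, x, b)) / (2 * h)
num_db = (forward(w, x, b + h) - forward(w, x, b - h)) / (2 * h)
```

Comparing analytic gradients to numerical estimates like this is the standard sanity check when replacing autograd with hand-written backward passes.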

Building makemore Part 5: Building a WaveNet

Andrej Karpathy

Nov 21, 12:32 AM · 268,962 views · YouTube

Highlights: This video demonstrates how to evolve a simple 2-layer MLP into a deeper, tree-like neural network architecture that resembles DeepMind's WaveNet (2016). It shows the practical implementation process using PyTorch's torch.nn module while explaining the underlying mechanics of deep learning development.

Worth watching: Provides a clear, hands-on walkthrough of building a complex neural network from simpler components, offering valuable insights into both PyTorch fundamentals and the architectural thinking behind influential models like WaveNet.
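
The tree-like idea can be caricatured in a few lines: fuse consecutive pairs level by level until the whole context is combined (a toy sketch, with string concatenation standing in for the learned linear layers the video actually uses):

```python
def fuse_pairs(seq, combine):
    """One WaveNet-style level: fuse each pair of consecutive units into one."""
    assert len(seq) % 2 == 0, "each level halves the sequence length"
    return [combine(seq[i], seq[i + 1]) for i in range(0, len(seq), 2)]

# toy stand-in for a learned layer: just concatenate the strings
level0 = list("abcdefgh")                            # 8 characters
level1 = fuse_pairs(level0, lambda a, b: a + b)      # 4 pairs
level2 = fuse_pairs(level1, lambda a, b: a + b)      # 2 quadruples
level3 = fuse_pairs(level2, lambda a, b: a + b)      # 1 group covering all 8
```

Each level halves the sequence, so information from the full context is merged gradually rather than squashed in a single step — the key structural difference from a flat MLP.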

Let's build GPT: from scratch, in code, spelled out.

Andrej Karpathy

Jan 17, 04:33 PM · 7,077,458 views · YouTube

Highlights: This video provides a hands-on coding tutorial where Andrej Karpathy builds a GPT model from scratch, implementing the transformer architecture described in 'Attention is All You Need' and connecting it to real-world applications like GPT-2/3 and ChatGPT. It demonstrates the practical implementation of autoregressive language modeling while showing GitHub Copilot (itself a GPT model) assisting in writing the code, creating a meta-learning experience.

Worth watching: Demystifies complex AI concepts through clear, practical coding examples and connects theoretical papers to real implementations, making advanced transformer architectures accessible to developers and enthusiasts.
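
A bare-bones sketch of the causal self-attention at the core of the transformer, in plain Python (illustrative only; real implementations use batched tensor operations and learned query/key/value projections):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def causal_self_attention(q, k, v):
    """q, k, v: lists of equal-length vectors, one per position.

    Causal mask: position t attends only to positions 0..t.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # scaled dot-product scores against the visible positions
        scores = [sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        # weighted sum of the visible value vectors
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(len(v[0]))])
    return out

q = k = [[1.0, 0.0], [0.0, 1.0]]   # one query/key per position
v = [[2.0, 0.0], [0.0, 3.0]]
out = causal_self_attention(q, k, v)
# position 0 can only attend to itself, so out[0] == v[0]
```

The causal mask is what makes the model autoregressive: each position's prediction depends only on earlier tokens, matching how text is generated left to right.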

[1hr Talk] Intro to Large Language Models

Andrej Karpathy

Nov 23, 02:27 AM · 3,554,837 views · YouTube

Highlights: This talk demystifies Large Language Models (LLMs) by explaining them as a new computing paradigm analogous to operating systems, where models like ChatGPT serve as the core technical component. It covers their fundamental workings, future trajectory, and unique security challenges in an accessible way for general audiences.

Worth watching: Andrej Karpathy provides a clear, foundational understanding of LLMs from one of the field's leading educators, making complex concepts accessible while addressing practical implications and security considerations that remain highly relevant.

Let's build the GPT Tokenizer

Andrej Karpathy

Feb 20, 05:11 PM · 1,069,817 views · YouTube

Highlights: The video explains that tokenizers are a separate, crucial component in LLMs, using Byte Pair Encoding to translate between text and tokens. It demonstrates building the GPT tokenizer from scratch, highlighting its distinct training process and core encode/decode functions.

Worth watching: Clarifies a fundamental yet often overlooked part of how LLMs process text, presented clearly by an expert in the field.
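
One round of the Byte Pair Encoding idea described here can be sketched in a few lines (illustrative only; the `most_common_pair` and `merge` helper names are hypothetical, and a real tokenizer repeats this merge many times to build its vocabulary):

```python
from collections import Counter

def most_common_pair(ids):
    """Find the most frequent adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with the new token id."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # start from raw UTF-8 bytes
pair = most_common_pair(ids)               # here: (97, 97), i.e. "aa"
ids = merge(ids, pair, 256)                # 256 = first id beyond raw bytes
```

Decoding reverses the merges back to bytes, which is why the tokenizer's vocabulary and its training corpus are entirely separate from the language model itself.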

Let's reproduce GPT-2 (124M)

Andrej Karpathy

Jun 9, 11:31 PM · 1,048,747 views · YouTube

Highlights: This video provides a comprehensive, hands-on walkthrough of reproducing the GPT-2 (124M) model from scratch, covering network architecture, training optimization, and hyperparameter tuning based on original papers. It demonstrates the full training pipeline with practical implementation details and concludes with generated text samples to evaluate model performance.

Worth watching: Valuable for understanding transformer-based language model implementation and training optimization, presented with clear, practical demonstrations by a renowned AI educator.

Deep Dive into LLMs like ChatGPT

Andrej Karpathy

Feb 5, 06:23 PM · 6,058,917 views · YouTube

Highlights: This video provides a comprehensive overview of how Large Language Models like ChatGPT are developed, covering the full training stack from data collection to deployment. It also offers practical mental models for understanding their 'psychology' and optimizing their use in real-world applications.

Worth watching: Andrej Karpathy, a leading AI researcher, delivers an accessible yet thorough explanation that bridges technical depth with practical application insights, making complex LLM concepts understandable for general audiences.

How I use LLMs

Andrej Karpathy

Feb 27, 10:29 PM · 2,358,015 views · YouTube

Highlights: The video provides a practical, example-driven walkthrough of how to effectively use Large Language Models in daily life, covering everything from basic interactions to understanding pricing tiers and model selection. It demystifies the growing LLM ecosystem by showing concrete applications and explaining when to use different models.

Worth watching: Andrej Karpathy's expertise and clear teaching style make complex AI concepts accessible, offering actionable insights for both beginners and experienced users looking to optimize their LLM usage.

LLM Knowledge Bases. Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating...

Highlights: Karpathy is shifting focus from coding to using LLMs to build and manage personal knowledge bases for research, indicating a move towards knowledge compounding and organization.

Worth reading: It reveals a practical, high-level workflow shift for an AI expert, moving from pure code generation to structured knowledge management using LLMs.

LLM · RAG · Tooling
LLMs are emerging as a new kind of intelligence, simultaneously a lot smarter than I expected and a lot dumber than I expected.

Highlights: Karpathy expresses the dual-nature surprise of LLM capabilities, acknowledging both their advanced and surprisingly limited aspects.

Worth reading: It captures a nuanced, expert perspective on the current state and paradoxical nature of LLM intelligence.

LLM · Evaluation
Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model.

Highlights: Karpathy is experimenting with automated research and fine-tuning processes for a smaller model (nanochat), indicating hands-on work in model optimization.

Worth reading: It shows direct, technical experimentation with automated fine-tuning workflows on specific model architectures.

Fine-tuning · Agent · Tooling
The hottest new programming language is English

Highlights: English is becoming the primary interface for programming and interacting with AI systems, suggesting a shift toward natural language as a programming paradigm.

Worth reading: It highlights the fundamental shift in how humans will interact with and instruct computational systems.

LLM · Tooling
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is...

Highlights: Public understanding of AI capabilities is lagging, partly because many formed opinions based on outdated or limited (free-tier) experiences with models like ChatGPT.

Worth reading: It addresses the perception gap in AI progress, which is crucial for realistic public and professional discourse.

LLM · Evaluation
Very interested in what the coming era of highly bespoke software might look like. Example from this morning - I've become a bit loosy goosy with my cardio recently so I decided to do a more srs, regimented experiment to try to lower my Resting Heart Rate from 50 -> 45, over https://t.co/EDULdIpWmE

Highlights: Karpathy is experimenting with personalized software for health tracking, specifically aiming to lower his resting heart rate through a structured approach.

Worth reading: It illustrates the trend towards highly customized, personal software applications driven by individual needs.

Tooling
2025 LLM Year in Review

Highlights: Karpathy published a review article summarizing key developments and trends in the LLM field for the year 2025.

Worth reading: Provides an expert retrospective on the state of LLM technology from a leading AI researcher.

LLM · Evaluation
A few random notes from claude coding quite a bit last few weeks. Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in

Highlights: Karpathy's coding workflow has dramatically shifted from mostly manual coding to predominantly using AI agents for code generation, with human input reduced to editing and touch-ups.

Worth reading: It demonstrates a significant, rapid shift in developer productivity and workflow due to advances in LLM coding assistants.

Agent · LLM · Tooling