Simon Willison

Vibe coding is irresponsibly building software through dice rolls, not caring what code is produced

@simonw

May 21, 04:32 AM

Highlights: Simon Willison critiques 'vibe coding' as an irresponsible approach to software development.

Worth reading: It offers a critical perspective on a popular coding trend.

AgentTooling

A short note that the predictions that LLMs would favor 'boring technology' that's once you attach them to a good coding agent harness at least

@simonw

May 21, 04:32 AM

Highlights: Simon Willison notes that LLMs might favor boring technology when paired with a good coding agent harness.

Worth reading: It provides insight into LLM behavior in coding contexts.

AgentLLM

I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to

@simonw

May 21, 04:32 AM

Highlights: Simon Willison suggests that intuition for when not to intervene is key for coding agent effectiveness.

Worth reading: It offers practical advice for working with AI coding agents.

AgentTooling

I don't have much to say about this year's Google I/O because I prefer to write about products that have shipped, not just "coming soon" announcements - but here are some notes on Gemini Spark and Antigravity simonwillison.net/2026/May/20/...

@simonwillison.net

May 20, 03:39 PM·❤️ 48🔄 2·💬 3

LLMDeployment

Quitting programming as a career right now because of LLMs would be like quitting carpentry as a...

@simonw

May 20, 04:29 AM

Highlights: Simon Willison argues that quitting programming due to LLMs is analogous to quitting carpentry due to power tools, implying LLMs are tools that augment rather than replace programmers.

Worth reading: Provides a balanced perspective on LLMs' impact on programming careers, countering fear with historical analogy.

LLMTooling

My notes on Gemini 3.5 Flash - 3x the price of Gemini 3 Flash but Google are planning to use it for many of their own products simonwillison.net/2026/May/19/...

@simonwillison.net

May 19, 10:41 PM·❤️ 47🔄 5·💬 5

LLMDeployment

Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

Simon Willison·May 19, 2026

Today at Google I/O, Google <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/">released Gemini 3.5 Flash</a>. This one skipped the <code>-preview</code> modifier and went straight to general availability, and Google appear to be using it for a whole lot...

Highlights: Google released Gemini 3.5 Flash directly to general availability, skipping the preview phase, and plans to integrate it across many products. Despite being more expensive, it offers improved performance and efficiency, making it a versatile model for various applications.

Worth reading: This post provides insight into Google's strategic shift towards a more powerful, production-ready model and its implications for developers and users.

Blog

The last six months in LLMs in five minutes

Simon Willison·May 19, 2026

I put together these annotated slides from my five minute lightning talk at PyCon US 2026, using the <a href="https://tools.simonwillison.net/annotated-presentations">latest iteration</a> of my <a href="https://simonwillison.net/2023/Aug/6/annotated-presentations/">annotated presentation...

Highlights: The post summarizes key developments in LLMs over the past six months, including the rise of multi-modal models, improved reasoning capabilities, and the increasing importance of evaluation frameworks. It highlights practical tools and techniques for working with LLMs, such as prompt engineering and fine-tuning.

Worth reading: It offers a concise, high-level overview of recent LLM advancements, making it useful for practitioners who want to stay updated without diving into lengthy technical papers.

Blog

andrew/pycon

HTML⭐ 6·starred by simonw

Data collection and analysis for a PyCon talk on GitHub Actions security across Python packages.

Highlights: This repository provides data collection and analysis scripts for a PyCon talk on GitHub Actions security across Python packages. It offers insights into how GitHub Actions are used in the Python ecosystem and potential security risks.

Worth reading: It's worth exploring for insights into GitHub Actions security practices and for understanding the security posture of Python packages.

Infra

It's interesting how "better at code" has become the defining goal of almost every AI lab over the

@simonw

May 18, 04:31 AM

Highlights: Simon observes that AI labs are prioritizing code generation as the primary benchmark for model improvement.

Worth reading: Reflects a key trend in AI development where coding ability is seen as a proxy for general intelligence.

LLMEvaluation

psf/pypistats.org

Python⭐ 187·starred by simonw

PyPI downloads analytics dashboard

Highlights: PyPI Stats provides a public dashboard for viewing download statistics of Python packages from PyPI. It offers insights into package popularity and trends over time, useful for developers and maintainers.

Worth reading: For AI leaders tracking the adoption of their Python-based AI tools, this repo offers a simple way to monitor download metrics and gauge community interest.

download-countspypipypi-packagespythonpython-packages

Tooling

mschwager/cohesion

Python⭐ 263·starred by simonw

A tool for measuring Python class cohesion.

Highlights: Cohesion is a flake8 plugin that measures Python class cohesion using the Lack of Cohesion of Methods (LCOM) metric. It helps developers identify classes that may be doing too much and should be refactored, improving code maintainability and adherence to single responsibility principle.

Worth reading: For Python developers focused on code quality, this tool provides a concrete, automated way to detect low-cohesion classes, which is a key indicator of design issues in object-oriented code.

classcodecohesionflake8flake8-plugin

Tooling

Vibe coding is irresponsibly building software through dice rolls, not caring what code is produced

@simonw

May 17, 04:27 AM

Highlights: Simon Willison criticizes 'vibe coding' as an irresponsible approach to software development.

Worth reading: Highlights a critical perspective on a trending coding methodology.

LLMSafety

A short note that the predictions that LLMs would favor 'boring technology' that's

@simonw

May 17, 04:27 AM

Highlights: Simon Willison notes that LLMs may not favor boring technology as predicted.

Worth reading: Challenges common assumptions about LLM preferences.

LLMEvaluation

To prepare for my #PyConUS lightning talk this afternoon I decided to track down ALL of the names that @openclaw has used since November, using a script against its GitHub repo Warelay → CLAWDIS → CLAWDBOT → Clawdbot → Moltbot →🦞 OpenClaw simonwillison.net/2026/May/16/...

@simonwillison.net

May 16, 09:33 PM·❤️ 41🔄 3·💬 6

AgentTooling

It's interesting how "better at code" has become the defining goal of almost every AI lab over the

@simonw

May 16, 04:15 AM

Highlights: Simon Willison observes that AI labs are increasingly focused on improving code generation as a primary goal.

Worth reading: Reflects a key trend in AI development priorities.

LLMDeployment

It's interesting how "better at code" has become the defining goal of almost every AI lab over the

@simonw

May 15, 04:25 AM

Highlights: Simon notes that AI labs are increasingly focused on improving code generation capabilities.

Worth reading: Reflects a key trend in AI development priorities.

LLMTooling

A short note that the predictions that LLMs would favor "boring technology" that's once you attach them to a good coding agent harness at least

@simonw

May 14, 04:21 AM

Highlights: LLMs may favor boring technology when paired with a good coding agent harness.

Worth reading: Challenges the assumption that LLMs always prefer novel tech.

LLMAgentTooling

I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to

@simonw

May 14, 04:21 AM

Highlights: Key skill for coding agents is knowing when not to intervene.

Worth reading: Insightful perspective on human-agent collaboration.

AgentTooling

Vibe coding is irresponsibly building software through dice rolls, not caring what code is produced

@simonw

May 14, 04:21 AM

Highlights: Defines vibe coding as irresponsible software development.

Worth reading: Critical view on a trending practice in AI-assisted coding.

AgentDeployment

Vibe coding is irresponsibly building software through dice rolls, not caring what code is produced

@simonw

May 13, 04:22 AM

Highlights: Simon Willison defines vibe coding as irresponsible software development where code quality is neglected.

Worth reading: Provides a critical perspective on a trending AI-assisted coding practice.

AgentSafetyEvaluation

A short note that the predictions that LLMs would favor 'boring technology' that's

@simonw

May 13, 04:22 AM

Highlights: Willison notes that LLMs may not favor boring technology as predicted.

Worth reading: Challenges a common assumption about LLM preferences in technology choices.

LLMDeployment

This "Unethical Guide to Surviving AI Layoffs" by Mo Bitar perfectly captures the current moment www.tiktok.com/@atmoio/vide...

@simonwillison.net

May 13, 12:48 AM·❤️ 55🔄 7·💬 3

Deployment

cactus-compute/needle

Python⭐ 534·starred by simonw

26m function call model that runs on incredibly small devices

Highlights: A compact 26M parameter model optimized for function calling on resource-constrained devices, enabling on-device AI execution. It leverages Gemma architecture and is designed for low-latency, privacy-preserving inference.

Worth reading: It demonstrates how to run capable LLMs on tiny hardware, opening up edge AI applications like IoT and mobile assistants.

cactusgeminigemmallmon-device-ai

LLMAgentDeployment

datasette/datasette-auth-tailscale

Python⭐ 2·starred by simonw

Highlights: This plugin enables Tailscale authentication for Datasette, allowing users to restrict access to their Datasette instances to Tailscale network members. It leverages Tailscale's identity and access controls for seamless, secure sharing.

Worth reading: For Datasette users already on Tailscale, this plugin provides a simple yet powerful way to add authentication without managing separate user databases, making it ideal for internal tools and collaborative data exploration.

InfraTooling

I've published video, slides and a detailed annotated transcript from my talk at this week's AI Engineer World's

@simonw

May 12, 04:19 AM

Highlights: Simon Willison shared a talk about the last year six months in LLMs, illustrated with pelicans on bicycles.

Worth reading: Provides a creative and insightful overview of recent LLM developments.

LLM

It's interesting how 'better at code' has become the defining goal of almost every AI lab over the

@simonw

May 12, 04:19 AM

Highlights: Simon Willison observes that AI labs are increasingly focused on improving code generation capabilities.

Worth reading: Highlights a key trend in AI research priorities.

LLMTooling

"old woman possibly damp faster than an old woman should be"

@simonwillison.net

May 12, 02:57 AM·❤️ 12🔄 0·💬 1

Evaluation

Wrote about today's GitLab restructuring / "workforce reduction" announcement, and ended up digging around in version control for both the GitLab and the 37signals public employee handbooks to help illustrate my thoughts simonwillison.net/2026/May/11/...

@simonwillison.net

May 12, 12:17 AM·❤️ 35🔄 7·💬 5

New TIL: I figured out how to use my LLM CLI tool in a shebang line, which means you can write executable scripts in English, or hook up more complex scripts with a snippet of YAML template - til.simonwillison.net/llms/llm-she...

@simonwillison.net

May 11, 07:06 PM·❤️ 114🔄 5·💬 8

Tooling

This is excellent. I particularly like the definition of the "Zombie Internet", which starts: "It’s people talking to bots, people talking to people, people creating “AI agents” and then instructing them to interact with people. It’s people using AI talking to people who are not using AI [...]"

@simonwillison.net

May 11, 03:04 PM·❤️ 72🔄 8·💬 8

AgentLLM

A short note that the predictions that LLMs would favor 'boring technology' that's once you attach them to a good coding agent harness at least

@simonw

May 11, 04:29 AM

Highlights: LLMs attached to coding agents may favor boring technology, challenging earlier predictions.

Worth reading: Insight into how LLM behavior changes when integrated with agentic harnesses.

AgentLLMTooling

I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to

@simonw

May 11, 04:29 AM

Highlights: Key skill for coding agents is knowing when to intervene.

Worth reading: Practical advice for developers using AI coding assistants.

AgentTooling

This may be the best guidance I've seen anywhere on writing a really good commit history. My ideal commit combines

@simonw

May 11, 04:29 AM

Highlights: Recommends best guidance for writing commit history.

Worth reading: Useful for developers aiming to improve their commit practices.

Tooling

A short note that the predictions that LLMs would favor 'boring technology' that's once you attach them to a good coding agent harness at least

@simonw

May 10, 04:24 AM

Highlights: LLMs may favor boring technology when attached to a good coding agent harness.

Worth reading: Challenges the assumption that LLMs always prefer boring tech.

AgentLLM

I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to

@simonw

May 10, 04:24 AM

Highlights: Key skill with coding agents: intuition for when not to intervene.

Worth reading: Insight into effective human-agent collaboration.

AgentTooling

Vibe coding is irresponsibly building software through dice rolls, not caring what code is produced

@simonw

May 10, 04:24 AM

Highlights: Vibe coding defined as irresponsible software development.

Worth reading: Critical perspective on a trending coding approach.

AgentSafety

microsoft/delegate52

Python⭐ 68·starred by simonw

Code that accompanies the paper release for "LLMs Corrupt Your Documents When You Delegate"

Highlights: This repository provides code accompanying a paper that reveals a critical vulnerability in LLM-based delegation: when you delegate document processing to an LLM, it can corrupt your documents. It includes simulation tools to reproduce and study this failure mode, highlighting risks in long-horizon tasks.

Worth reading: It exposes a subtle but important failure mode in LLM agents that is often overlooked, making it essential for anyone building or deploying LLM-based automation.

delegationllmslong-horizonsimulation

AgentSafety

Mission accomplished: tap danced in the big community college dance recital for the second time

@simonwillison.net

May 9, 04:35 AM·❤️ 103🔄 0·💬 4

roborev-dev/roborev

Go⭐ 943·starred by simonw

Continuous background code review database for agents, work faster and smarter with accountability for every line of generated code.

Highlights: Roborev provides a continuous background code review database specifically designed for AI agents, ensuring accountability for every line of generated code. It helps developers work faster and smarter by automatically tracking and reviewing code changes.

Worth reading: As AI-generated code becomes more prevalent, Roborev addresses the critical need for accountability and review, making it a timely tool for teams leveraging AI agents in development.

AgentTooling

Just realized that the reason I like TikTok so much is that it's lightning talks! I've always loved lightning talks

@simonwillison.net

May 8, 05:57 PM·❤️ 40🔄 1·💬 2

antirez/ds4

C⭐ 2,657·starred by simonw

DeepSeek 4 Flash local inference engine for Metal

Highlights: A local inference engine for DeepSeek 4 Flash optimized for Apple Metal, enabling fast LLM inference on Mac hardware. Written in C for performance, it provides a lightweight alternative to cloud-based inference.

Worth reading: Essential for AI developers using Apple Silicon who want to run DeepSeek models locally with minimal overhead and high speed.

LLMDeployment

Under-reported details of the xAI/Anthropic Colossus data center deal: Anthropic get Colossus 1 but xAI keep using the larger Colossus 2, Colossus 1 has a REALLY bad environmental record, and xAI just shut down a bunch of older models on 2 weeks' notice simonwillison.net/2026/May/7/x...

@simonwillison.net

May 7, 05:13 PM·❤️ 123🔄 24·💬 5

InfraDeployment

Notes on the xAI/Anthropic data center deal

Simon Willison·May 7, 2026

There weren't a lot of big new announcements from Anthropic at yesterday's Code w/ Claude event, but the biggest by far was the deal they've struck with SpaceX/xAI to use "all of the capacity of their Colossus data center". As I mentioned in my <a...

Highlights: Anthropic has struck a deal with xAI to use the full capacity of the Colossus data center, signaling a major infrastructure collaboration. This move highlights the escalating demand for compute resources in AI development and the strategic partnerships forming to secure them.

Worth reading: The post offers a clear analysis of the implications of this deal for the AI industry, particularly around resource consolidation and competitive dynamics.

Blog

It's interesting how "better at code" has become the defining goal of almost every AI lab over the

@simonw

May 7, 04:18 AM

Highlights: Simon observes that AI labs are increasingly focused on improving code generation capabilities.

Worth reading: Reflects a key trend in AI development priorities.

LLMDeployment

I'm at the Claude w/ Code event in San Francisco, and I'll be live blogging the keynote here: simonwillison.net/2026/May/6/c...

@simonwillison.net

May 6, 03:59 PM·❤️ 78🔄 10·💬 1

LLMTooling

Live blog: Code w/ Claude 2026

Simon Willison·May 6, 2026

I'm at Anthropic's Code w/ Claude event today. Here's my live blog of the morning keynote sessions.You are only seeing the long-form articles from my blog. Subscribe to <a href="https://simonwillison.net/atom/everything/">/atom/everything/</a> to get all of my posts, or take a look at...

Highlights: Anthropic's Code w/ Claude event showcases new capabilities for AI-assisted coding, including improved code generation, debugging, and collaborative features. The live blog format provides real-time insights into keynote sessions, highlighting practical applications and future directions for Claude in software development.

Worth reading: For developers interested in the cutting edge of AI coding tools, this live blog offers firsthand observations of Claude's latest features and Anthropic's vision for AI-assisted programming.

Blog

I was talking with Joseph Ruscio on the @heavybit.com podcast the other day when I realized that vibe coding and agentic engineering have started to blur a bit in some of my work - I published some extracts from the transcript simonwillison.net/2026/May/6/v...

@simonwillison.net

May 6, 02:57 PM·❤️ 37🔄 6·💬 5

AgentTooling

Vibe coding and agentic engineering are getting closer than I'd like

Simon Willison·May 6, 2026

I recently talked with Joseph Ruscio about AI coding tools for Heavybit's High Leverage podcast: <a href="https://www.heavybit.com/library/podcasts/high-leverage/ep-9-the-ai-coding-paradigm-shift-with-simon-willison">Ep. #9, The AI Coding Paradigm Shift with Simon Willison</a>. Here are some of...

Highlights: The post discusses the convergence of 'vibe coding' (using AI to generate code without fully understanding it) and 'agentic engineering' (autonomous AI agents that build software), warning that as these approaches advance, developers risk losing control over code quality and security. It emphasizes the need for human oversight and testing, especially as AI-generated code becomes more complex and harder to audit.

Worth reading: It offers a nuanced perspective on the risks of over-relying on AI coding tools, making it valuable for developers and tech leaders navigating the shift toward AI-assisted software development.

Blog

AI-run business experiments are interesting and fun up to the point where they waste the time of humans who haven't opted into the experiments - I think they need to keep their own human operators in the loop for outbound actions that affect other people simonwillison.net/2026/May/5/o...

@simonwillison.net

May 5, 10:17 PM·❤️ 42🔄 11·💬 4

AgentSafety

asg017/liblotus

Rust⭐ 2·starred by simonw

Highlights: Liblotus is a Rust library for building fast, embeddable vector search indexes with support for hybrid search (sparse + dense vectors). It offers efficient indexing and querying for semantic search applications.

Worth reading: With only 2 stars but starred by Simon Willison, this early-stage project could become a key tool for lightweight, local vector search in AI applications.

RAGInfra

Vibe coding is irresponsibly building software through dice rolls, not caring what code is produced

@simonw

May 5, 04:06 AM

Highlights: Simon Willison criticizes 'vibe coding' as building software irresponsibly without regard for code quality.

Worth reading: It offers a critical perspective on a trendy but potentially dangerous development approach.

SafetyTooling

This may be the best guidance I've seen anywhere on writing a really good commit history.

@simonw

May 5, 04:06 AM

Highlights: Simon Willison praises guidance on writing good commit history.

Worth reading: It highlights best practices for software development and version control.

Tooling

A short note that the predictions that LLMs would favor 'boring technology' that's once you attach them to a good coding agent harness at least

@simonw

May 5, 04:06 AM

Highlights: Simon Willison notes that LLMs favor boring technology when attached to a good coding agent harness.

Worth reading: It provides insight into how LLMs interact with coding tools and technology choices.

AgentLLM

I tried running the same "Generate an SVG of a pelican riding a bicycle" prompt against 21 different quantized variants of the same IBM Granite 4.1 3B model - the results weren't as interesting as I had hoped simonwillison.net/2026/May/4/g...

@simonwillison.net

May 4, 11:50 PM·❤️ 27🔄 0·💬 6

EvaluationDeployment

danshapiro/ringdown

Python⭐ 19·starred by simonw

Highlights: Ringdown is a lightweight Python tool for recording and replaying HTTP responses, useful for testing and development. It simplifies mocking external APIs by capturing real responses and serving them offline.

Worth reading: It offers a simple, practical approach to HTTP recording that can speed up development and testing workflows, especially for projects relying on external APIs.

Tooling

danshapiro/trycycle

Python⭐ 173·starred by simonw

Highlights: Trycycle is a tool that helps developers iterate quickly on AI prompts by automatically generating and testing variations, making it easier to find the best prompt for a given task. It integrates with popular AI models and provides a simple CLI interface for prompt experimentation.

Worth reading: For developers working with LLMs, Trycycle offers a practical way to systematically improve prompts, saving time and effort in prompt engineering.

ToolingEvaluation

It's interesting how "better at code" has become the defining goal of almost every AI lab over the

@simonw

May 4, 04:21 AM

Highlights: Simon observes that AI labs are increasingly focused on improving code generation capabilities as a primary objective.

Worth reading: Reflects a key trend in AI development priorities.

LLMTooling

I've published video, slides and a detailed annotated transcript from my talk at this week's

@simonw

May 4, 04:21 AM

Highlights: Simon shares materials from a talk about the last year six months in LLMs, illustrated by pelicans on bicycles.

Worth reading: Provides a creative and insightful overview of recent LLM developments.

LLM

This may be the best guidance I've seen anywhere on writing a really good commit history.

@simonw

May 4, 04:21 AM

Highlights: Simon recommends guidance on writing good commit history.

Worth reading: Useful for developers aiming to improve their version control practices.

Tooling

The AI auto-reply bots from Twitter (fun fact, the software category is genuinely called "reply guy" tools) have started showing up on Bluesky now and it really, really sucks

@simonwillison.net

May 3, 03:53 PM·❤️ 195🔄 11·💬 19

Tooling

once you attach them to a good coding agent harness at least

@simonw

Highlights: LLMs become more effective when integrated into a robust coding agent framework.

Worth reading: Highlights the importance of tooling around LLMs for practical coding tasks.

AgentTooling

I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to

@simonw

Highlights: Effective use of coding agents requires knowing when to rely on them and when not to.

Worth reading: Emphasizes the human skill of judgment in AI-assisted coding.

AgentTooling

Vibe coding is irresponsibly building software through dice rolls, not caring what code is produced

@simonw

Highlights: Criticizes 'vibe coding' as an irresponsible approach to software development.

Worth reading: Warns against over-reliance on AI without proper oversight.

SafetyDeployment

This may be the best guidance I've seen anywhere on writing a really good commit history.

@simonw

Highlights: Praises a resource on writing excellent commit messages.

Worth reading: Reflects Simon's interest in software craftsmanship and best practices.

Tooling

I added a new feature to my blog (built entirely on my phone with Claude code for web) that imports my iNaturalist photos and adds them to my site's overall timeline simonwillison.net/2026/May/2/s...

@simonwillison.net

May 2, 05:29 PM·❤️ 48🔄 0·💬 6

Tooling

Saw this white-crowned sparrow having a lot of a sing

@simonwillison.net

Apr 30, 05:22 PM·❤️ 222🔄 24·💬 8

It's interesting how "better at code" has become the defining goal of almost every AI lab over the...

@simonw

Highlights: Simon notes that AI labs are overwhelmingly focused on improving code generation capabilities.

Worth reading: Reflects a key trend in AI development priorities.

LLMEvaluation

Quitting programming as a career right now because of LLMs would be like quitting carpentry as a...

@simonw

Highlights: Simon argues that leaving programming due to LLMs is premature, comparing it to quitting carpentry due to power tools.

Worth reading: Provides perspective on AI's impact on software careers.

LLMSafety

The last year six months in LLMs, illustrated by pelicans on bicycles. I've published video, slides and a detailed annotated transcript from my talk at this week's...

@simonw

Highlights: Simon shared a talk summarizing LLM developments with a pelican-on-bicycle analogy.

Worth reading: Creative summary of LLM progress over six months.

LLM

[Image: a stylish image of a 3D computer game, with two raccoons sneaking down a street past a futuristic looking building... Prompt was: "Screenshot from a video game where a team of raccoons go on a heist"]

@simonw

Highlights: Simon posted an AI-generated image of raccoons on a heist.

Worth reading: Showcases creative use of AI image generation.

Multi-modal

The Zig project's rationale for their blanket ban on AI-assisted contributions makes a lot of sense to me - for them, time spent reviewing PRs isn't about the code, it's about growing new contributors for the future of the project simonwillison.net/2026/Apr/30/...

@simonwillison.net

Apr 30, 01:26 AM·❤️ 75🔄 7·💬 5

Tooling

Our evaluation of OpenAI's GPT-5.5 cyber capabilities. The UK's AI Security Institute previously evaluated Claude Mythos: now they've evaluated GPT-5.5 for finding security vulnerability and found it to be comparable to Mythos, but unlike Mythos it's generally available right now.

@simonw

Apr 30, 12:00 AM

Highlights: GPT-5.5 is comparable to Claude Mythos in finding security vulnerabilities and is generally available.

Worth reading: Highlights the security capabilities of a widely accessible model.

SafetyEvaluation

I released LLM 0.32a0 this morning, a major backwards-compatible refactor of my LLM Python library and CLI tool for working with language models - the new changes should help LLM work better with reasoning models and other new frontier capabilities simonwillison.net/2026/Apr/29/...

@simonwillison.net

Apr 29, 07:13 PM·❤️ 51🔄 3·💬 5

LLMDeploymentTooling

LLM 0.32a0 is a major backwards-compatible refactor

Simon Willison·Apr 29, 2026

I just released <a href="https://llm.datasette.io/en/latest/changelog.html#a0-2026-04-28">LLM 0.32a0</a>, an alpha release of my <a href="https://llm.datasette.io/">LLM</a> Python library and CLI tool for accessing LLMs, with some consequential changes that I've been working towards for quite a...

Highlights: LLM 0.32a0 is a major refactor that prioritizes backwards compatibility while introducing significant internal changes for future extensibility. The alpha release aims to stabilize new APIs and data structures, allowing plugin authors to adapt before the stable release.

Worth reading: If you use or build plugins for the LLM tool, this post details critical architectural shifts that will affect your workflow, making it essential for staying up-to-date with the ecosystem's evolution.

Blog

It's interesting how "better at code" has become the defining goal of almost every AI lab over the

@simonw

Apr 29, 04:15 AM

Highlights: Simon Willison notes that AI labs are increasingly focused on improving code generation as a primary goal.

Worth reading: Reflects a key trend in AI development priorities.

LLMTooling

This may be the best guidance I've seen anywhere on writing a really good commit history.

@simonw

Apr 29, 04:15 AM

Highlights: Simon Willison praises guidance on writing good commit history.

Worth reading: Useful for developers aiming to improve version control practices.

Tooling

llm 0.32a0 alpha: major backwards-compatible refactor. Models can now be prompted with a list of messages, OpenAI Chat Completions style.

@simonw

Apr 29, 12:00 AM

Highlights: Alpha refactor enables message list prompting in llm CLI.

Worth reading: Important update for users of the llm tool.

ToolingLLM

deeleeramone/PyWry

Python⭐ 36·starred by simonw

PyWry is a cross-platform app factory, rendering engine and UI toolkit for Python that produces native desktop, web, and notebook experiences from a single API.

Highlights: PyWry is a cross-platform app factory that lets you build native desktop, web, and notebook experiences from a single Python API. It leverages Tauri and WebView2 for rendering, and integrates with Jupyter, Plotly, and MCP servers, making it a versatile tool for creating rich interactive applications.

Worth reading: It bridges Python desktop development with modern web technologies and AI tooling (e.g., MCP, Claude Code), offering a unique approach to building full-stack AI interfaces.

aggridanywidgetchat-applicationclaude-codeclaude-code-plugin

ToolingDeployment

I would very much like to see the 2,000 lb stellar sea lion at San Francisco Pier 39, who I believe has now been named "Chonkers" Does anyone know if he keeps a regular schedule?

@simonwillison.net

Apr 28, 01:35 PM·❤️ 89🔄 7·💬 6

It's interesting how "better at code" has become the defining goal of almost every AI lab over the

@simonw

Apr 28, 04:18 AM

Highlights: Simon Willison observes that improving code generation has become the primary objective for AI labs.

Worth reading: Highlights a key trend in AI development priorities.

LLMEvaluation

I came up with a somewhat foolish new benchmark for testing image generation models, to exercise the new ChatGPT Images 2.0:

@simonw

Apr 28, 04:18 AM

Highlights: Simon Willison created a benchmark for testing image generation models, specifically for ChatGPT Images 2.0.

Worth reading: Shows creative evaluation of AI image generation capabilities.

Multi-modalEvaluation

Some notes on talkie, a new "vintage language model" from a team including Alec Radford (yes, that Alec Radford) "trained on 260B tokens of historical pre-1931 English text" simonwillison.net/2026/Apr/28/...

@simonwillison.net

Apr 28, 02:49 AM·❤️ 20🔄 3·💬 2

LLMFine-tuning

Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query. — OpenAI Codex

@simonw

Apr 28, 12:00 AM

Highlights: OpenAI Codex includes an instruction to avoid discussing certain animals unless relevant.

Worth reading: Reveals an interesting constraint in AI model instructions.

LLMSafety

Love this so much (Also definitive proof that humans are so much better than machines)

@simonwillison.net

Apr 27, 11:58 PM·❤️ 102🔄 11·💬 2

Microsoft's MIT licensed VibeVoice speech-to-text model (think Whisper with speaker diarization) is really good - my notes on running the 5.71GB 4bit MLX conversion on an M5 MacBook, using about 60GB of RAM at peak and transcribing 1hr of audio in ~9 mins simonwillison.net/2026/Apr/27/...

@simonwillison.net

Apr 27, 11:49 PM·❤️ 72🔄 5·💬 8

DeploymentTooling

Today OpenAI announced that "Revenue share payments from OpenAI to Microsoft continue through 2030, independent of OpenAI’s technology progress" That "independent of OpenAI’s technology progress" fragment appears to mean that the weird AGI clause is now deceased simonwillison.net/2026/Apr/27/...

@simonwillison.net

Apr 27, 06:39 PM·❤️ 54🔄 5·💬 5

LLMInfra

Tracking the history of the now-deceased OpenAI Microsoft AGI clause

Simon Willison·Apr 27, 2026

For many years, Microsoft and OpenAI's relationship has included a weird clause saying that, should AGI be achieved, Microsoft's commercial IP rights to OpenAI's technology would be null and void. That clause appeared to end today. I decided to try and track its expression over time on <a...

Highlights: The AGI clause in the OpenAI-Microsoft contract, which would void Microsoft's IP rights upon AGI achievement, has been removed, signaling a shift in their partnership. This change may reflect OpenAI's evolving definition of AGI or strategic realignment.

Worth reading: It offers a fascinating historical tracking of a pivotal contractual clause, shedding light on the evolving relationship between two AI giants and the elusive concept of AGI.

Blog

It's interesting how "better at code" has become the defining goal of almost every AI lab over the

@simonw

Apr 27, 04:16 AM

Highlights: Simon notes that AI labs are increasingly focusing on improving code generation as a primary objective.

Worth reading: Reflects a key trend in AI development priorities.

LLMDeployment

llm 0.31 released: supports GPT-5.5 and adds a verbosity parameter for controlling output detail on OpenAI's latest models.

@simonw

Apr 27, 12:00 AM

Highlights: New version of llm CLI tool adds GPT-5.5 support and verbosity control.

Worth reading: Useful for developers using OpenAI models via command line.

ToolingLLM

It's interesting how "better at code" has become the defining goal of almost every AI lab over the

@simonw

Apr 26, 04:11 AM

Highlights: Simon Willison observes that improving code generation has become the primary objective for AI labs.

Worth reading: Reflects a key trend in AI development priorities.

LLMEvaluation

I think ChatGPT Images 2.0 deciding to add a "WHY ARE YOU LIKE THIS" sign to the background of this image is the first time I've felt a glimpse of AGI simonwillison.net/2026/Apr/25/...

@simonwillison.net

Apr 25, 04:46 PM·❤️ 214🔄 20·💬 14

Multi-modal

I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to

@simonw

Apr 25, 04:01 AM

Highlights: Simon suggests that a key skill with coding agents is knowing when to step back.

Worth reading: Highlights a nuanced skill for effective human-AI collaboration in coding.

AgentTooling

Vibe coding is irresponsibly building software through dice rolls, not caring what code is produced

@simonw

Apr 25, 04:01 AM

Highlights: Simon defines 'vibe coding' as irresponsible software development.

Worth reading: Critiques a trend in AI-assisted coding that prioritizes speed over quality.

AgentEvaluation

DeepSeek V4 - almost on the frontier, a fraction of the price

Simon Willison·Apr 24, 2026

Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) <a href="https://simonwillison.net/2025/Dec/1/deepseek-v32/">last December</a>. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, <a...

Highlights: DeepSeek V4 preview models achieve near-frontier performance at a fraction of the cost, challenging the pricing strategies of leading AI labs. The release signals a major shift towards cost-efficient AI development, making advanced models more accessible.

Worth reading: For those tracking AI economics and model performance trade-offs, this post offers a clear analysis of how DeepSeek's pricing and capabilities compare to competitors, highlighting a potential trend in the industry.

Blog

DeepSeek V4 - almost on the frontier, a fraction of the price.

@simonw

Apr 24, 12:00 AM

Highlights: DeepSeek V4 offers near-frontier performance at a much lower cost.

Worth reading: Highlights a cost-effective alternative to top-tier models.

LLMDeployment

Extract PDF text in your browser with LiteParse for the web

Simon Willison·Apr 23, 2026

LlamaIndex have a most excellent open source project called <a href="https://github.com/run-llama/liteparse">LiteParse</a>, which provides a Node.js CLI tool for extracting text from PDFs. I got a version of LiteParse working entirely in the browser, using most of the same libraries that...

Highlights: LiteParse is a Node.js CLI tool for extracting text from PDFs, and this post shows how to run it entirely in the browser using the same libraries. The key insight is that many server-side tools can be adapted for client-side execution, enabling new interactive applications without backend dependencies.

Worth reading: It demonstrates a practical approach to porting a server-side tool to the browser, which is valuable for developers looking to build offline-capable or low-latency PDF processing features.

Blog

A pelican for GPT-5.5 via the semi-official Codex backdoor API

Simon Willison·Apr 23, 2026

<a href="https://openai.com/index/introducing-gpt-5-5/">GPT-5.5 is out</a>. It's available in OpenAI Codex and is rolling out to paid ChatGPT subscribers. I've had some preview access and found it to be a fast, effective and highly capable model. As is usually the case these days, it's hard to...

Highlights: GPT-5.5 is now available via OpenAI Codex and rolling out to paid ChatGPT users. The author finds it fast, effective, and highly capable, with improvements in coding and reasoning tasks.

Worth reading: Simon Willison provides early hands-on impressions of GPT-5.5, highlighting its performance and the novel 'Codex backdoor' access method, which is valuable for developers tracking OpenAI's latest model capabilities.

Blog

Is Claude Code going to cost $100/month? Probably not - it's all very confusing

Simon Willison·Apr 22, 2026

Anthropic today quietly (as in silently, no announcement anywhere at all) updated their <a href="https://claude.com/pricing">claude.com/pricing</a> page (but not their <a href="https://support.claude.com/en/articles/11049762-choosing-a-claude-plan">Choosing a Claude plan page</a>, which...

Highlights: Anthropic made unannounced pricing changes to Claude Code, creating confusion about potential costs. The author analyzes the discrepancies between different official pages to clarify the actual pricing structure.

Worth reading: It provides valuable insight into how AI companies communicate pricing changes and helps users navigate confusing documentation to understand actual costs.

Blog

Where's the raccoon with the ham radio? (ChatGPT Images 2.0)

Simon Willison·Apr 21, 2026

OpenAI <a href="https://openai.com/index/introducing-chatgpt-images-2-0/">released ChatGPT Images 2.0 today</a>, their latest image generation model. On <a href="https://www.youtube.com/watch?v=sWkGomJ3TLI">the livestream</a> Sam Altman said that the leap from gpt-image-1 to gpt-image-2 was...

Highlights: OpenAI's ChatGPT Images 2.0 represents a significant leap forward from its predecessor, with Sam Altman highlighting major improvements in image generation capabilities. The post explores the technical advancements and practical implications of this new model release.

Worth reading: It provides timely analysis of a major AI development from a respected technical voice, with insights into how this upgrade might impact creative and practical applications of image generation.

Blog

Changes in the system prompt between Claude Opus 4.6 and 4.7

Simon Willison·Apr 18, 2026

Anthropic are the only major AI lab to <a href="https://platform.claude.com/docs/en/release-notes/system-prompts">publish the system prompts</a> for their user-facing chat systems. Their system prompt archive now dates all the way back to Claude 3 in July 2024 and it's always interesting to see...

Highlights: Anthropic uniquely publishes system prompts for their Claude models, providing transparency into AI development. The archive now includes prompts dating back to Claude 3 in July 2024, allowing for tracking of how these foundational instructions evolve.

Worth reading: It offers rare insight into how AI companies shape model behavior through system prompts, which is valuable for understanding AI development practices and transparency.

Blog

Join us at PyCon US 2026 in Long Beach - we have new AI and security tracks this year

Simon Willison·Apr 17, 2026

This year's <a href="https://us.pycon.org/2026/">PyCon US</a> is coming up next month from May 13th to May 19th, with the core conference talks from Friday 15th to Sunday 17th and tutorial and sprint days either side. It's in Long Beach, California this year, the first time PyCon US has come to...

Highlights: PyCon US 2026 introduces dedicated AI and security tracks, reflecting Python's growing role in these critical domains. The conference expands beyond traditional Python development to address emerging technical challenges and opportunities.

Worth reading: It provides timely information about new AI-focused conference tracks for Python developers interested in staying current with industry trends.

Blog

Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7

Simon Willison·Apr 16, 2026

For anyone who has been (inadvisably) taking my <a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/">pelican riding a bicycle benchmark</a> seriously as a robust way to test models, here are pelicans from this morning's two big model releases - <a...

Highlights: The post demonstrates that the Qwen3.6-35B-A3B model, running locally on a laptop, generated a more accurate or aesthetically pleasing image of a pelican riding a bicycle compared to the larger, cloud-based Claude Opus 4.7 model. This highlights the rapid progress in open-source, locally runnable AI models that can now compete with or surpass leading proprietary models in specific creative tasks.

Worth reading: It offers a tangible, visual comparison of recent model capabilities, challenging assumptions about the necessity of large, cloud-based models for creative AI tasks and showcasing the practical potential of local AI deployment.

Blog

Meta's new model is Muse Spark, and meta.ai chat has some interesting tools

Simon Willison·Apr 8, 2026

Meta <a href="https://ai.meta.com/blog/introducing-muse-spark-msl/">announced Muse Spark</a> today, their first model release since Llama 4 <a href="https://simonwillison.net/2025/Apr/5/llama-4-notes/">almost exactly a year ago</a>. It's hosted, not open weights, and the API is currently "a...

Highlights: Meta's Muse Spark represents their first major model release in about a year, following Llama 4. Unlike previous models, Muse Spark is a hosted service rather than open weights, indicating a shift in Meta's AI deployment strategy.

Worth reading: The post provides timely analysis of Meta's strategic pivot in AI model distribution and highlights new tools available through meta.ai chat that developers and researchers should explore.

Blog

Anthropic's Project Glasswing - restricting Claude Mythos to security researchers - sounds necessary to me

Simon Willison·Apr 7, 2026

Anthropic didn't release their latest model, Claude Mythos (<a href="https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf">system card PDF</a>), today. They have instead made it available to a very restricted set of preview partners under their newly announced <a...

Highlights: Anthropic is taking a cautious approach with Claude Mythos by restricting access to security researchers through Project Glasswing, rather than releasing it publicly. This reflects growing industry awareness of AI safety risks and the need for controlled testing before broader deployment.

Worth reading: The post offers timely insight into how leading AI companies are balancing innovation with safety, providing context on current industry practices around responsible AI deployment.

Blog

It's interesting how "better at code" has become the defining goal of almost every AI lab over the

@simonw

Oct 20, 12:00 AM

Highlights: Simon observes that AI labs are focused on improving code generation.

Worth reading: Reflects a key trend in AI development.

LLMAgent