2026-04-20
Self-healing browser harness that enables LLMs to complete any task.
A visualization grammar.
Original Apollo 11 Guidance Computer (AGC) source code for the command and lunar modules.
New TIL on fetching data from a Datasette instance into Google Sheets using importdata(), named custom functions or Google Apps Script til.simonwillison.net/google-sheet...
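A rough sketch of the importdata() route, assuming the Datasette instance serves SQL query results as CSV at `/<database>.csv?sql=...`. The host, database name, and query are hypothetical placeholders, not taken from the TIL.

```python
from urllib.parse import urlencode


def datasette_csv_url(base_url: str, database: str, sql: str) -> str:
    """Build a CSV export URL for a SQL query against a Datasette database."""
    # urlencode percent-encodes the query so it is safe inside a URL.
    query = urlencode({"sql": sql})
    return f"{base_url.rstrip('/')}/{database}.csv?{query}"


def importdata_formula(url: str) -> str:
    """Wrap a CSV URL in a Google Sheets IMPORTDATA formula."""
    return f'=IMPORTDATA("{url}")'


if __name__ == "__main__":
    # Hypothetical instance, database, and table names.
    url = datasette_csv_url(
        "https://datasette.example.com",
        "mydb",
        "select id, name from items",
    )
    # Paste the printed formula into a Google Sheets cell.
    print(importdata_formula(url))
```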
I upgraded my Claude token counter tool to compare different models and Opus 4.7 appears to use 1.46x the tokens for text and up to 3x the tokens for images - it's priced the same as Opus 4.6 on a per-token basis so this is actually a pretty big price bump simonwillison.net/2026/Apr/20/...
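The arithmetic behind the price bump, as a small sketch: if the per-token price is unchanged, the effective cost of the same input scales with the token count. The function name is illustrative; the 1.46x and 3x multipliers are the figures quoted in the post.

```python
def effective_price_increase(token_multiplier: float, price_ratio: float = 1.0) -> float:
    """Fractional cost increase for the same input.

    token_multiplier: new token count divided by old token count.
    price_ratio: new per-token price divided by old (1.0 means unchanged).
    """
    return token_multiplier * price_ratio - 1.0


if __name__ == "__main__":
    # Multipliers quoted in the post; per-token price assumed identical.
    for label, multiplier in [("text", 1.46), ("images (upper bound)", 3.0)]:
        increase = effective_price_increase(multiplier)
        print(f"{label}: about {increase:.0%} more per request")
```

Under that assumption, text costs roughly 46% more and images up to 200% more for the same content.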
@canadiens.com win game 1 against Tampa Bay!! 🤩🙌 Slafkovsky with the game-winning goal in OT and a hat trick! He had a great game! Anderson too! What an exciting game overall! #gohabsgo #montreal #canadiens youtu.be/m4ic4oYfapY?...
The imaginary optimal selfish scenario for OpenAI, in retrospect, was to keep Reasoners a secret, skip releasing o1 and o1-preview, and release o3 as GPT-5. There would have been no DeepSeek moment, other labs may not have discovered Reasoners quickly, and OpenAI's lead would have been hard to beat.
Congratulations to @upolehsan.bsky.social, who won the Georgia Tech College of Computing Doctoral Dissertation Award. The impact of his work on human-centered explainable AI (XAI) cannot be overstated. The last chapter of Upol's dissertation also just won an Honorable Mention at CHI. #ProudAdvisor
A TLDR is that unless the training dynamics of leading LLMs change or open model builders run out of money, this ~6 month performance gap from closed to open models is here to stay. www.interconnects.ai/p/reading-to...
Getting LLMs to simulate “true” randomness or generate diverse outputs is surprisingly difficult. We found a simple prompting trick that solves this by having the model generate and manipulate a random string. To be presented at #ICLR2026 this week! Blog: pub.sakana.ai/ssot
I am very proud of our team for releasing EDINET-Bench, and it is fantastic to see a Japanese financial dataset recognized at #ICLR2026 this week. We need more diverse, non-English datasets to evaluate models in the real world. Paper: openreview.net/forum?id=Dxn...
I appeared on Découverte on @cbcradiocanada.bsky.social to discuss the risks of AI, the scientific reasons behind some of the models' worrying behaviors, and the technical solutions we are working on at @law-zero.bsky.social for safer AI.
Classic study gave 146 economist teams the same dataset & got wildly different answers. New paper reruns it with agentic AI. Claude Code & Codex land near the human median but with far tighter dispersion & no extremes. This suggests that agentic AI is now useful for doing scalable economics research.
Wait, how did this get into my feed without @hypervisible.blacksky.app already quoting it
If your ostensibly critical paper talks about "recent advances in AI" I have a hard time taking it seriously. Advances towards what? Measured how?
Identifying the elements of a theory of engineering architecture.
The complex factors that determine the single evaluation number so many focus on. Plus, how this changes in the future.