Intelligence.Log

2026-04-20

Extracted: 69 items. Sources: GitHub, Bluesky, X, Blogs.

++ AI OVERVIEW ++

Today's items center on model cost and the open-versus-closed race. Simon Willison's upgraded token counter shows Opus 4.7 appearing to use about 1.46x the tokens for text and up to 3x for images; because it is priced the same as Opus 4.6 per token, this amounts to a real price increase for developers. He also published a TIL on pulling data from a Datasette instance into Google Sheets. Meanwhile, Ethan Mollick argues that, in retrospect, OpenAI's selfishly optimal move would have been to keep Reasoners secret and release o3 as GPT-5, since shipping o1 and o1-preview let other labs discover the approach and led to the DeepSeek moment.

browser-use/browser-harness ★ 3.7k ▲ 7/10

Self-healing browser harness that enables LLMs to complete any task.

Starred by thomwolf|[Agent][Tooling]

“A self-healing browser harness that enables LLMs to autonomously complete web-based tasks with built-in error recovery. It provides robust automation capabilities for AI agents interacting with dynamic web environments.”

vega/vega ★ 11.8k ▲ 5/10

A visualization grammar.

Starred by minimaxir|[Tooling]

“Vega provides a declarative visualization grammar for creating interactive graphics through JSON specifications. It enables data-driven visualizations that work across multiple rendering targets (SVG, Canvas) with a consistent API.”

chrislgarry/Apollo-11 ★ 67.4k ▲ 3/10

Original Apollo 11 Guidance Computer (AGC) source code for the command and lunar modules.

Starred by simonw|

“This repository contains the original Apollo 11 Guidance Computer source code, providing authentic historical documentation of the software that landed humans on the moon. It offers a rare look at 1960s-era assembly programming for mission-critical aerospace systems.”

BSKY

Simon Willison · Apr 20, 02:53 AM

New TIL on fetching data from a Datasette instance into Google Sheets using importdata(), named custom functions or Google Apps Script til.simonwillison.net/google-sheet...

❤️ 16 Likes|[Tooling][Deployment]
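
A minimal sketch of the importdata() route mentioned in the post, assuming the Datasette instance serves a table as CSV when `.csv` is appended to the table URL. The host `datasette.example.com`, the database `mydb`, the table `mytable`, and the helper name `importdata_formula` are hypothetical placeholders, not taken from the TIL.

```python
from urllib.parse import quote


def importdata_formula(base_url: str, database: str, table: str) -> str:
    """Build a Google Sheets IMPORTDATA formula for a Datasette table's CSV export."""
    # Datasette exposes a CSV view of a table at /<database>/<table>.csv
    csv_url = f"{base_url.rstrip('/')}/{quote(database)}/{quote(table)}.csv"
    return f'=IMPORTDATA("{csv_url}")'


# Hypothetical instance, database, and table names, for illustration only.
formula = importdata_formula("https://datasette.example.com/", "mydb", "mytable")
print(formula)
```

Pasting the printed formula into a cell would ask Sheets to fetch the CSV itself, so this route presumably only works for data the instance serves publicly.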

BSKY

Simon Willison · Apr 20, 12:55 AM

I upgraded my Claude token counter tool to compare different models and Opus 4.7 appears to use 1.46x times the tokens for text and up to 3x the tokens for images - it's priced the same as Opus 4.6 on a per-token basis so this is actually a pretty big price bump simonwillison.net/2026/Apr/20/...

❤️ 91 Likes|[LLM][Evaluation]
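
The arithmetic behind "a pretty big price bump" can be sketched as follows, assuming cost scales linearly with token count and that the per-token price ratio between the two models is exactly 1.0, as the post states. The function name is a hypothetical helper; the 1.46 and 3.0 multipliers are the figures quoted in the post.

```python
def effective_cost_multiplier(token_multiplier: float, price_ratio: float = 1.0) -> float:
    """Cost of the same input on the new model relative to the old one."""
    # Cost = tokens * price per token, so the two ratios multiply.
    return token_multiplier * price_ratio


text_multiplier = effective_cost_multiplier(1.46)   # text figure from the post
image_multiplier = effective_cost_multiplier(3.0)   # upper bound for images from the post

print(f"text: about {(text_multiplier - 1) * 100:.0f}% more per request")
print(f"images: up to {(image_multiplier - 1) * 100:.0f}% more per request")
```

Under those assumptions, an unchanged per-token price still means roughly 46% higher spend on text and up to 200% higher spend on images for the same inputs.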

BSKY

Marc Lanctot · Apr 20, 01:26 AM

@canadiens.com win game 1 against Tampa Bay!! 🤩🙌 Slafkovsky with the game-winning goal in OT and a hat trick! He had a great game! Anderson too! What an exciting game overall! #gohabsgo #montreal #canadiens youtu.be/m4ic4oYfapY?...

❤️ 3 Likes|

BSKY

Ethan Mollick · Apr 20, 02:37 AM

The imaginary optimal selfish scenario for OpenAI, in retrospect, was to keep Reasoners a secret, skip releasing o1 and o1-preview, and release o3 as GPT-5 There would have been no Deep Seek moment, other labs may not have discovered Reasoners quickly, and OpenAI's lead would have been hard to beat

❤️ 14 Likes|[LLM][Deployment]

BSKY

Mark Riedl · Apr 20, 06:57 PM

You're welcome

❤️ 5 Likes|

BSKY

Mark Riedl · Apr 20, 06:11 PM

Congratulations to @upolehsan.bsky.social, who won the Georgia Tech College of Computing Doctoral Dissertation Award. The impact of his work on human-centered explainable AI (XAI) cannot be understated. The last chapter of Upol's dissertation also just won an Honorable Mention at CHI. #ProudAdvisor

❤️ 14 Likes|[Safety]

BSKY

Marc Lanctot · Apr 20, 02:52 PM

@timhortonsofficial.bsky.social gets it! 🏒🇨🇦☕️😁 #gohabsgo

❤️ 4 Likes|

BSKY

Nathan Lambert · Apr 20, 07:43 PM

A TLDR is that unless the training dynamics of leading LLMs change or open model builders run out of money, this ~6 month performance gap from closed to open models is here to stay. www.interconnects.ai/p/reading-to...

❤️ 15 Likes|[LLM][Evaluation]

BSKY

hardmaru · Apr 20, 03:20 PM

Getting LLMs to simulate “true” randomness or generate diverse outputs is surprisingly difficult. We found a simple prompting trick that solves this by having the model generate and manipulate a random string. To be presented at #ICLR2026 this week! Blog: pub.sakana.ai/ssot

❤️ 20 Likes|[LLM][Evaluation]

BSKY

hardmaru · Apr 20, 01:15 PM

I am very proud of our team for releasing EDINET-Bench, and it is fantastic to see a Japanese financial dataset recognized at #ICLR2026 this week. We need more diverse, non-English datasets to evaluate models in the real world. Paper: openreview.net/forum?id=Dxn...

❤️ 15 Likes|[Evaluation]

BSKY

Yoshua Bengio · Apr 20, 09:54 PM

I appeared on Découverte on @cbcradiocanada.bsky.social to discuss the risks of AI, the scientific reasons behind some of the models' worrying behaviors, and the technical solutions we are working on at @law-zero.bsky.social for safer AI.

❤️ 10 Likes|[Safety]

BSKY

Ethan Mollick · Apr 20, 10:55 PM

Classic study gave 146 economist teams the same dataset & got wildly different answers New paper reruns it with agentic AI. Claude Code & Codex land near the human median but with far tighter dispersion & no extremes This suggests that agentic AI is now useful for doing scalable economics research

❤️ 54 Likes|[Agent][Evaluation]

BSKY

Emily M. Bender · Apr 20, 01:51 PM

Wait, how did this get into my feed without @hypervisible.blacksky.app already quoting it

❤️ 16 Likes|

BSKY

Emily M. Bender · Apr 20, 01:22 PM

If your ostensibly critical paper talks about "recent advances in AI" I have a hard time taking it seriously. Advances towards what? Measured how?

❤️ 67 Likes|[Evaluation]

BSKY

Emily M. Bender · Apr 20, 01:01 PM

Today!

❤️ 11 Likes|[Safety]

BSKY

Ben Recht · Apr 20, 02:38 PM

Identifying the elements of a theory of engineering architecture.

❤️ 5 Likes|[Infra][Deployment]

Andrej Karpathy @karpathy

Very interested in what the coming era of highly bespoke software might look like. Example from this morning - I've become a bit loosy goosy with my cardio recently so I decided to do a more srs, regimented experiment to try to lower my Resting Heart Rate from 50 -> 45, over https://t.co/EDULdIpWmE

[Tooling]

“DeepSeek Summary: Karpathy is experimenting with personalized software for health tracking, specifically aiming to lower his resting heart rate through a structured approach.”

Andrej Karpathy @karpathy

2025 LLM Year in Review

[LLM][Evaluation]

“DeepSeek Summary: Karpathy published a review article summarizing key developments and trends in the LLM field for the year 2025.”

Andrej Karpathy @karpathy

A few random notes from claude coding quite a bit last few weeks. Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in

[Agent][LLM][Tooling]

“DeepSeek Summary: Karpathy's coding workflow has dramatically shifted from mostly manual coding to predominantly using AI agents for code generation, with human input reduced to editing and touch-ups.”

Simon Willison @simonw

I built two new tools to help coding agents demonstrate their work beyond just running

[Agent][Tooling]

“DeepSeek Summary: Simon Willison developed new tools to improve how coding agents showcase their work processes and outputs.”

Harrison Chase @hwchase17

This means that operations you would do on code in the software world, you now do on traces in the agent world. Debugging, testing, profiling

[Agent][Evaluation][Tooling]

“DeepSeek Summary: Drawing parallels between software engineering practices and agent operations, emphasizing trace-based debugging, testing, and profiling.”

Harrison Chase @hwchase17

TL;DR: More and more agents need a workspace: a computer where they can run code, install packages, and access files. Sandboxes provide this

[Agent][Infra][Deployment]

“DeepSeek Summary: Advocating for agent workspaces or sandboxes as essential infrastructure for running code, managing dependencies, and accessing files.”

Harrison Chase @hwchase17

RT @samecrowder: as always, it's an exciting time to be working at LangChain!

[Agent][Tooling]

“DeepSeek Summary: Retweeting a positive sentiment about working at LangChain, indicating endorsement of the message.”

Jim Fan @DrJimFan

In this context, I define world modeling as predicting the next plausible world state (or a longer duration of states) conditioned on an action.

[Agent]

“DeepSeek Summary: Defines world modeling as predicting future world states based on actions. Focuses on the relationship between actions and state transitions in AI systems.”

Jim Fan @DrJimFan

I've been a bit quiet on X recently. The past year has been a transformational experience.

“DeepSeek Summary: Acknowledges a period of reduced public posting, attributing it to a significant personal or professional transformation over the past year.”

Jim Fan @DrJimFan

We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive - truly

“DeepSeek Summary: Observes that a non-US company is now upholding OpenAI's founding mission, suggesting a shift in the AI leadership landscape.”

Jim Fan @DrJimFan

It gives me a lot of comfort knowing that we are the last generation without advanced robots everywhere.

“DeepSeek Summary: Expresses a personal sense of comfort from being part of the final human generation before ubiquitous advanced robotics.”

Jeremy Howard @jeremyphoward

I replicated this result, that Grok focuses nearly entirely on finding out what Elon thinks in

[Agent][LLM][Evaluation]

“DeepSeek Summary: Jeremy Howard replicated a finding that Grok AI heavily prioritizes discovering Elon Musk's opinions.”

Jeremy Howard @jeremyphoward

Here's a complete unedited video of asking Grok for its views on the Israel/Palestine situation. It first searches twitter for what Elon thinks.

[Agent][LLM][Evaluation]

“DeepSeek Summary: Howard demonstrates that when asked about a complex geopolitical issue, Grok's first action is to search for Elon Musk's perspective on Twitter.”

Jeremy Howard @jeremyphoward

Something that drives me to distraction in discussion of AI alignment: someone will say 'Oh, it's crucial we build systems with properties X'

[Safety][LLM]

“DeepSeek Summary: Jeremy Howard expresses frustration with vague, non-actionable statements in AI alignment debates.”

Soumith Chintala @soumithchintala

reading 'AI News' (previously Smol Talk) is probably the highest-leverage 45 mins

[LLM][Tooling]

“DeepSeek Summary: Soumith Chintala recommends 'AI News' (formerly Smol Talk) as a highly valuable 45-minute activity for staying informed about AI developments.”

Soumith Chintala @soumithchintala

Sometimes we forget that NVIDIA wins because it's a software company.

[Infra][Deployment]

“DeepSeek Summary: Soumith Chintala emphasizes that NVIDIA's success stems from its software capabilities, not just hardware, challenging common perceptions.”

Soumith Chintala @soumithchintala

MacStudio you ask? Apple Engineering's **actual** time spent on PyTorch support

[Infra][Tooling]

“DeepSeek Summary: Soumith Chintala comments on Apple's engineering investment in PyTorch support for MacStudio, highlighting platform-specific development efforts.”

Francois Chollet @fchollet

Back in 2023 everybody was telling me 'no one uses Google search anymore, it's over'

[Evaluation]

“DeepSeek Summary: Challenges the narrative that Google search was obsolete in 2023, suggesting it remained relevant despite claims to the contrary.”

Francois Chollet @fchollet

I think it's clear that for many smaller companies that invested in deep learning, it turned out

[Deployment][Evaluation]

“DeepSeek Summary: Suggests that deep learning investments didn't pay off as expected for many smaller companies, implying practical limitations or implementation challenges.”

Yann LeCun @ylecun

The emergence of superintelligence is not going to be an event. We don't have anything close to a

[Safety][Evaluation]

“DeepSeek Summary: Yann LeCun argues that superintelligence will emerge gradually rather than as a sudden event, suggesting current AI systems are far from achieving it.”

Fei-Fei Li @drfeifei

I often tell my students not to be misled by the name 'artificial intelligence' — there is nothing artificial about it. A.I. is made by humans, intended to be used by humans, and impacts humans.

[Safety]

“DeepSeek Summary: Fei-Fei Li emphasizes that AI is fundamentally human-centric—created by humans, for humans, and affecting human society, challenging the notion of it being purely 'artificial'.”

Max Woolf @minimaxir

LOL

“DeepSeek Summary: A short, humorous tweet expressing amusement.”

Max Woolf @minimaxir

“DeepSeek Summary: A tweet with no visible text content, only view metrics.”

Max Woolf @minimaxir

“DeepSeek Summary: A tweet with no visible text content, only like metrics.”

Sasha Rush @srush_io

Wager established. Jonathan Frankle (@jefrankle) stepped up to my Transformer long bet.

[LLM][Evaluation]

“DeepSeek Summary: Sasha Rush reports that Jonathan Frankle has accepted his long bet about Transformers, so the wager is now established.”

Sasha Rush @srush_io

Some personal news: I recently joined Cursor. Cursor is a small, ambitious team, and they've created

[Tooling][Deployment]

“DeepSeek Summary: Sasha Rush announces joining Cursor, highlighting it as a small, ambitious team working on innovative projects, likely related to AI or developer tools.”

Stas Bekman @stas00

I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to...

[LLM][Fine-tuning]

“DeepSeek Summary: Stas Bekman is compiling comprehensive training logbooks for LLM/VLM models, which serve as valuable reference materials.”

Stas Bekman @stas00

Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can...

[Tooling][Deployment]

“DeepSeek Summary: Collaborative improvement of the Machine Learning Engineering Open book through community contributions.”

Stas Bekman @stas00

To remind - this is the memory saving you get when enabling TiledMLP :) Left: normal memory...

[Infra][Deployment]

“DeepSeek Summary: Demonstrates significant memory savings achieved by enabling TiledMLP in ML systems.”

Stas Bekman @stas00

Modern art. Artist: PyTorch memory profiler Model: Llama-8B The piece on the left is the...

[LLM][Infra][Tooling]

“DeepSeek Summary: Visual representation of PyTorch memory profiling for Llama-8B model, presented as 'modern art'.”

Sayak Paul @sayakpaul

Working at Hugging Face over the past 3.5+ years has allowed me to identify what technical areas truly interest me! In turn, that has allowed me to directly...

[Infra][Tooling]

“DeepSeek Summary: Reflection on career growth at Hugging Face, identifying personal technical interests through hands-on experience.”

Philipp Schmid @philschmid

I read three technical reports from Moonshot AI's Kimi K2.5 paper, Cursor's Composer 2 report and blog post, and Chroma's Context-1 write-up

[LLM][Evaluation]

“DeepSeek Summary: Philipp Schmid read three recent technical reports: Moonshot AI's Kimi K2.5 paper, Cursor's Composer 2 report and blog post, and Chroma's Context-1 write-up.”

Philipp Schmid @philschmid

Random thought. We are going to be so much faster at creating and building.

[Tooling]

“DeepSeek Summary: A forward-looking, optimistic statement about the accelerating pace of development and creation, likely in the context of AI and software.”

Ethan Mollick @emollick

On the plus side with Opus 4.7, if it does decide to think it produces BY FAR the best

[LLM][Evaluation]

“DeepSeek Summary: Ethan Mollick comments on the performance of Opus 4.7, suggesting that when it engages in reasoning, it significantly outperforms other models.”

Ethan Mollick @emollick

As stories about AI increasingly become stories of either catastrophe or salvation,

[Safety][Deployment]

“DeepSeek Summary: Mollick observes the polarized narrative framing of AI in public discourse, often reduced to extremes of doom or utopia.”

Ethan Mollick @emollick

Teaching an experimental class for MBAs on 'vibefounding,' the students have four days to come up and

[Deployment]

“DeepSeek Summary: Mollick is conducting an experimental MBA course on 'vibefounding,' a rapid, vibe-based startup ideation process with a tight deadline.”

Ethan Mollick @emollick

I pointed Claude Cowork at a set of 107 documents (PPTs, Word docs, Excel) that were initially

[Agent][RAG][Tooling]

“DeepSeek Summary: Mollick is testing Claude Cowork's capability to process and analyze a large, mixed-format document set (107 files).”

Emily M. Bender @emilymbender

EMILY M. BENDER: Yeah. And so passive, like, oops, the moon, the moon went further away. It's like no, actually, you made some decisions.

[Safety][Evaluation]

“DeepSeek Summary: Critiques the passive framing of AI outcomes, emphasizing that human decisions drive technological consequences rather than inevitable natural processes.”

Emily M. Bender @emilymbender

Image is of the 1990s Microsoft writing assistant character Clippy with its eyebrows raised positioned in.

[Tooling][LLM]

“DeepSeek Summary: Uses the nostalgic Clippy character to comment on contemporary AI assistants, suggesting parallels in hype or limitations.”

Emily M. Bender @emilymbender

Facebook (sorry: Meta) AI: Check out our 'AI' that lets you access all of humanity's knowledge.

[LLM][Evaluation]

“DeepSeek Summary: Satirizes Meta's grandiose AI claims by mocking the idea that a single system can encapsulate all human knowledge.”

Naomi Saphra @NaomiSaphra

Waiting on a robot body. All opinions are universal and held by both employers and family. Now a dedicated grok hate account. Accepting ML/NLP PhD students.

[LLM][Agent]

“DeepSeek Summary: Naomi Saphra's X profile bio indicates she's accepting ML/NLP PhD students and describes herself as a 'dedicated grok hate account'.”

Naomi Saphra @NaomiSaphra

I'll meet you at this button.

“DeepSeek Summary: A short, possibly metaphorical tweet about meeting at a button.”

Naomi Saphra @NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU has SCHEMES for LM interpretability & analysis, so I couldn't be more pumped to join a

[LLM][Evaluation]

“DeepSeek Summary: Announces she will join Boston University as faculty in 2026 to work on language model interpretability and analysis.”

Naomi Saphra @NaomiSaphra

This book starts like it's gonna be a fun microhistory of TB (it gave us the Stetson!

“DeepSeek Summary: A tweet commenting on a book about tuberculosis history, noting its surprising connection to the Stetson hat.”

Angela Zhou @angelamczhou

It's uncanny right?

“DeepSeek Summary: A brief, possibly humorous or observational tweet expressing a sense of strangeness or coincidence.”

Angela Zhou @angelamczhou

#throwback coz it's finally the day again!!! #HellOnWheels back on AMC 9/8c tonight!

“DeepSeek Summary: An excited post celebrating the return of the TV show 'Hell on Wheels', using hashtags for throwback and the show.”

Ben Recht @beenwrekt

For the first time in almost a decade, I'm teaching a class on learning and control.

[Agent]

“DeepSeek Summary: Ben Recht is teaching a class on learning and control for the first time in almost a decade.”

Ben Recht @beenwrekt

With more equations than usual, I explain how policy gradient gives you a framework to randomly search for

[Agent]

“DeepSeek Summary: Explains policy gradient methods as a structured approach to random search in reinforcement learning.”
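
A toy sketch of the "policy gradient as random search" idea, not Recht's own derivation (the post is truncated and its equations are not reproduced here). It assumes a one-dimensional Gaussian sampling distribution with fixed standard deviation and uses the standard score-function (REINFORCE) gradient estimate with a mean-reward baseline. The reward function, step sizes, and all names are hypothetical choices made for illustration.

```python
import random


def reward(x: float) -> float:
    """Hypothetical black-box reward, maximized at x = 3."""
    return -(x - 3.0) ** 2


def policy_gradient_search(steps: int = 2000, batch: int = 32,
                           sigma: float = 0.5, lr: float = 0.05,
                           seed: int = 0) -> float:
    """Move the mean of a Gaussian sampler toward higher reward.

    Uses the score-function estimate
        grad_mu E[r(x)] ~= mean((r(x) - baseline) * (x - mu) / sigma**2)
    so only reward evaluations are needed, never reward gradients.
    """
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(steps):
        samples = [rng.gauss(mu, sigma) for _ in range(batch)]
        rewards = [reward(x) for x in samples]
        baseline = sum(rewards) / batch  # reduces variance of the estimate
        grad = sum((r - baseline) * (x - mu) / sigma ** 2
                   for x, r in zip(samples, rewards)) / batch
        mu += lr * grad  # gradient ascent on expected reward
    return mu


estimate = policy_gradient_search()
print(f"estimated maximizer: {estimate:.2f}")
```

The sampler only ever perturbs its current guess at random and reweights the perturbations by how well they scored, which is the sense in which this behaves like a structured random search.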

Ben Recht @beenwrekt

Fully open machine learning requires not only GPU access but a community commitment to openness.

[Infra]

“DeepSeek Summary: Argues that true openness in ML depends on both hardware access and collective dedication to transparency.”

BLOG

Reading today's open-closed performance gap

The complex factors that determine the single evaluation number so many focus on. Plus, how this changes in the future.

By Nathan Lambert

“The post critiques the oversimplification of AI performance metrics, particularly the 'open-closed performance gap' often reduced to a single number. It argues this gap is shaped by complex, interdependent factors beyond simple comparisons, and explores how these dynamics might evolve with future AI advancements.”

-- END OF LOG --

[STATS] 69 items · Filter applied