Intelligence.Log

2026-04-20

Extracted: 69 items. Sources: GitHub, Bluesky, X, Blogs.
++ AI OVERVIEW ++
Today's items center on AI model cost and release strategy. Simon Willison's upgraded token counter suggests Anthropic's Opus 4.7 uses about 1.46x the tokens for text and up to 3x for images while keeping Opus 4.6's per-token price, which amounts to an effective price increase for developers. He also published a TIL on pulling data from a Datasette instance into Google Sheets. Meanwhile, Ethan Mollick argues that, in retrospect, OpenAI's selfishly optimal move would have been to keep Reasoners secret, skip o1 and o1-preview, and release o3 as GPT-5, which would have avoided the DeepSeek moment.
GH

Self-healing browser harness that enables LLMs to complete any task.

Starred by thomwolf|[Agent][Tooling]
A self-healing browser harness that enables LLMs to autonomously complete web-based tasks with built-in error recovery. It provides robust automation capabilities for AI agents interacting with dynamic web environments.
GH
vega/vega · 11.8k · 5/10

A visualization grammar.

Starred by minimaxir|[Tooling]
Vega provides a declarative visualization grammar for creating interactive graphics through JSON specifications. It enables data-driven visualizations that work across multiple rendering targets (SVG, Canvas) with a consistent API.
GH
chrislgarry/Apollo-11 · 67.4k · 3/10

Original Apollo 11 Guidance Computer (AGC) source code for the command and lunar modules.

Starred by simonw|
This repository contains the original Apollo 11 Guidance Computer source code, providing authentic historical documentation of the software that landed humans on the moon. It offers a rare look at 1960s-era assembly programming for mission-critical aerospace systems.
BSKY
simonwillison.net Simon Willison

New TIL on fetching data from a Datasette instance into Google Sheets using importdata(), named custom functions or Google Apps Script til.simonwillison.net/google-sheet...

❤️ 16 Likes|[Tooling][Deployment]
BSKY
simonwillison.net Simon Willison

I upgraded my Claude token counter tool to compare different models and Opus 4.7 appears to use 1.46x times the tokens for text and up to 3x the tokens for images - it's priced the same as Opus 4.6 on a per-token basis so this is actually a pretty big price bump simonwillison.net/2026/Apr/20/...

❤️ 91 Likes|[LLM][Evaluation]
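
A rough sketch of the arithmetic behind the post above, assuming the per-token price is unchanged and using only the 1.46x and 3x ratios quoted there. The function name and the $100 baseline bill are hypothetical, chosen purely for illustration.

```python
def effective_cost_multiplier(token_ratio: float, price_ratio: float = 1.0) -> float:
    """Return how much the bill for the same input changes when the token
    count scales by token_ratio and the per-token price by price_ratio."""
    return token_ratio * price_ratio


# Ratios quoted in the post; price_ratio stays 1.0 because the
# per-token price is reported as unchanged.
TEXT_TOKEN_RATIO = 1.46
IMAGE_TOKEN_RATIO_MAX = 3.0

# Hypothetical baseline: a workload that previously cost $100.
baseline_bill = 100.0

text_bill = baseline_bill * effective_cost_multiplier(TEXT_TOKEN_RATIO)
image_bill_max = baseline_bill * effective_cost_multiplier(IMAGE_TOKEN_RATIO_MAX)

print(f"text workload: ${text_bill:.2f}")
print(f"image workload, worst case: ${image_bill_max:.2f}")
```

With an unchanged per-token price, the token ratio is the cost ratio, so the same text workload would cost roughly 46% more under these figures.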
BSKY
sharky6000.bsky.social Marc Lanctot

@canadiens.com win game 1 against Tampa Bay!! 🤩🙌 Slafkovsky with the game-winning goal in OT and a hat trick! He had a great game! Anderson too! What an exciting game overall! #gohabsgo #montreal #canadiens youtu.be/m4ic4oYfapY?...

❤️ 3 Likes|
BSKY
emollick.bsky.social Ethan Mollick

The imaginary optimal selfish scenario for OpenAI, in retrospect, was to keep Reasoners a secret, skip releasing o1 and o1-preview, and release o3 as GPT-5 There would have been no Deep Seek moment, other labs may not have discovered Reasoners quickly, and OpenAI's lead would have been hard to beat

❤️ 14 Likes|[LLM][Deployment]
BSKY
markriedl.bsky.social Mark Riedl

You're welcome

❤️ 5 Likes|
BSKY
markriedl.bsky.social Mark Riedl

Congratulations to @upolehsan.bsky.social, who won the Georgia Tech College of Computing Doctoral Dissertation Award. The impact of his work on human-centered explainable AI (XAI) cannot be understated. The last chapter of Upol's dissertation also just won an Honorable Mention at CHI. #ProudAdvisor

❤️ 14 Likes|[Safety]
BSKY
sharky6000.bsky.social Marc Lanctot

@timhortonsofficial.bsky.social gets it! 🏒🇨🇦☕️😁 #gohabsgo

❤️ 4 Likes|
BSKY
natolambert.bsky.social Nathan Lambert

A TLDR is that unless the training dynamics of leading LLMs change or open model builders run out of money, this ~6 month performance gap from closed to open models is here to stay. www.interconnects.ai/p/reading-to...

❤️ 15 Likes|[LLM][Evaluation]
BSKY
hardmaru.bsky.social hardmaru

Getting LLMs to simulate “true” randomness or generate diverse outputs is surprisingly difficult. We found a simple prompting trick that solves this by having the model generate and manipulate a random string. To be presented at #ICLR2026 this week! Blog: pub.sakana.ai/ssot

❤️ 20 Likes|[LLM][Evaluation]
BSKY
hardmaru.bsky.social hardmaru

I am very proud of our team for releasing EDINET-Bench, and it is fantastic to see a Japanese financial dataset recognized at #ICLR2026 this week. We need more diverse, non-English datasets to evaluate models in the real world. Paper: openreview.net/forum?id=Dxn...

❤️ 15 Likes|[Evaluation]
BSKY
yoshuabengio.bsky.social Yoshua Bengio

I appeared on Découverte on @cbcradiocanada.bsky.social to discuss the risks of AI, the scientific reasons behind some of the models' worrying behaviors, and the technical solutions we are working on at @law-zero.bsky.social for safer AI.

❤️ 10 Likes|[Safety]
BSKY
emollick.bsky.social Ethan Mollick

Classic study gave 146 economist teams the same dataset & got wildly different answers New paper reruns it with agentic AI. Claude Code & Codex land near the human median but with far tighter dispersion & no extremes This suggests that agentic AI is now useful for doing scalable economics research

❤️ 54 Likes|[Agent][Evaluation]
BSKY
emilymbender.bsky.social Emily M. Bender

Wait, how did this get into my feed without @hypervisible.blacksky.app already quoting it

❤️ 16 Likes|
BSKY
emilymbender.bsky.social Emily M. Bender

If your ostensibly critical paper talks about "recent advances in AI" I have a hard time taking it seriously. Advances towards what? Measured how?

❤️ 67 Likes|[Evaluation]
BSKY
emilymbender.bsky.social Emily M. Bender

Today!

❤️ 11 Likes|[Safety]
BSKY
beenwrekt.bsky.social Ben Recht

Identifying the elements of a theory of engineering architecture.

❤️ 5 Likes|[Infra][Deployment]
X
2025 LLM Year in Review
[LLM][Evaluation]
“DeepSeek Summary: Karpathy published a review article summarizing key developments and trends in the LLM field for the year 2025.
X
A few random notes from claude coding quite a bit last few weeks. Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in
[Agent][LLM][Tooling]
“DeepSeek Summary: Karpathy's coding workflow has dramatically shifted from mostly manual coding to predominantly using AI agents for code generation, with human input reduced to editing and touch-ups.
X
I built two new tools to help coding agents demonstrate their work beyond just running
[Agent][Tooling]
“DeepSeek Summary: Simon Willison developed new tools to improve how coding agents showcase their work processes and outputs.
X
hwchase17 Harrison Chase
This means that operations you would do on code in the software world, you now do on traces in the agent world. Debugging, testing, profiling
[Agent][Evaluation][Tooling]
“DeepSeek Summary: Drawing parallels between software engineering practices and agent operations, emphasizing trace-based debugging, testing, and profiling.
X
hwchase17 Harrison Chase
TL;DR: More and more agents need a workspace: a computer where they can run code, install packages, and access files. Sandboxes provide this
[Agent][Infra][Deployment]
“DeepSeek Summary: Advocating for agent workspaces or sandboxes as essential infrastructure for running code, managing dependencies, and accessing files.
X
hwchase17 Harrison Chase
RT @samecrowder: as always, it's an exciting time to be working at LangChain!
[Agent][Tooling]
“DeepSeek Summary: Retweeting a positive sentiment about working at LangChain, indicating endorsement of the message.
X
DrJimFan Jim Fan
In this context, I define world modeling as predicting the next plausible world state (or a longer duration of states) conditioned on an action.
[Agent]
“DeepSeek Summary: Defines world modeling as predicting future world states based on actions. Focuses on the relationship between actions and state transitions in AI systems.
X
DrJimFan Jim Fan
I've been a bit quiet on X recently. The past year has been a transformational experience.
“DeepSeek Summary: Acknowledges a period of reduced public posting, attributing it to a significant personal or professional transformation over the past year.
X
DrJimFan Jim Fan
We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive - truly
“DeepSeek Summary: Observes that a non-US company is now upholding OpenAI's founding mission, suggesting a shift in the AI leadership landscape.
X
DrJimFan Jim Fan
It gives me a lot of comfort knowing that we are the last generation without advanced robots everywhere.
“DeepSeek Summary: Expresses a personal sense of comfort from being part of the final human generation before ubiquitous advanced robotics.
X
jeremyphoward Jeremy Howard
I replicated this result, that Grok focuses nearly entirely on finding out what Elon thinks in
[Agent][LLM][Evaluation]
“DeepSeek Summary: Jeremy Howard replicated a finding that Grok AI heavily prioritizes discovering Elon Musk's opinions.
X
jeremyphoward Jeremy Howard
Here's a complete unedited video of asking Grok for its views on the Israel/Palestine situation. It first searches twitter for what Elon thinks.
[Agent][LLM][Evaluation]
“DeepSeek Summary: Howard demonstrates that when asked about a complex geopolitical issue, Grok's first action is to search for Elon Musk's perspective on Twitter.
X
jeremyphoward Jeremy Howard
Something that drives me to distraction in discussion of AI alignment: someone will say 'Oh, it's crucial we build systems with properties X'
[Safety][LLM]
“DeepSeek Summary: Jeremy Howard expresses frustration with vague, non-actionable statements in AI alignment debates.
X
soumithchintala Soumith Chintala
reading 'AI News' (previously Smol Talk) is probably the highest-leverage 45 mins
[LLM][Tooling]
“DeepSeek Summary: Soumith Chintala recommends 'AI News' (formerly Smol Talk) as a highly valuable 45-minute activity for staying informed about AI developments.
X
soumithchintala Soumith Chintala
Sometimes we forget that NVIDIA wins because it's a software company.
[Infra][Deployment]
“DeepSeek Summary: Soumith Chintala emphasizes that NVIDIA's success stems from its software capabilities, not just hardware, challenging common perceptions.
X
soumithchintala Soumith Chintala
MacStudio you ask? Apple Engineering's **actual** time spent on PyTorch support
[Infra][Tooling]
“DeepSeek Summary: Soumith Chintala comments on Apple's engineering investment in PyTorch support for MacStudio, highlighting platform-specific development efforts.
X
Back in 2023 everybody was telling me 'no one uses Google search anymore, it's over'
[Evaluation]
“DeepSeek Summary: Challenges the narrative that Google search was obsolete in 2023, suggesting it remained relevant despite claims to the contrary.
X
I think it's clear that for many smaller companies that invested in deep learning, it turned out
[Deployment][Evaluation]
“DeepSeek Summary: Suggests that deep learning investments didn't pay off as expected for many smaller companies, implying practical limitations or implementation challenges.
X
Yann LeCun
The emergence of superintelligence is not going to be an event. We don't have anything close to a
[Safety][Evaluation]
“DeepSeek Summary: Yann LeCun argues that superintelligence will emerge gradually rather than as a sudden event, suggesting current AI systems are far from achieving it.
X
Fei-Fei Li
I often tell my students not to be misled by the name 'artificial intelligence' — there is nothing artificial about it. A.I. is made by humans, intended to be used by humans, and impacts humans.
[Safety]
“DeepSeek Summary: Fei-Fei Li emphasizes that AI is fundamentally human-centric—created by humans, for humans, and affecting human society, challenging the notion of it being purely 'artificial'.
X
minimaxir Max Woolf
LOL
“DeepSeek Summary: A short, humorous tweet expressing amusement.
X
minimaxir Max Woolf
“DeepSeek Summary: A tweet with no visible text content, only view metrics.
X
minimaxir Max Woolf
“DeepSeek Summary: A tweet with no visible text content, only like metrics.
X
srush_io Sasha Rush
Wager established. Jonathan Frankle (@jefrankle) stepped up to my Transformer long bet.
[LLM][Evaluation]
“DeepSeek Summary: Sasha Rush reports that Jonathan Frankle has accepted his long bet about Transformers, so the wager is now established.
X
srush_io Sasha Rush
Some personal news: I recently joined Cursor. Cursor is a small, ambitious team, and they've created
[Tooling][Deployment]
“DeepSeek Summary: Sasha Rush announces joining Cursor, highlighting it as a small, ambitious team working on innovative projects, likely related to AI or developer tools.
X
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to...
[LLM][Fine-tuning]
“DeepSeek Summary: Stas Bekman is compiling comprehensive training logbooks for LLM/VLM models, which serve as valuable reference materials.
X
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can...
[Tooling][Deployment]
“DeepSeek Summary: Collaborative improvement of the Machine Learning Engineering Open book through community contributions.
X
To remind - this is the memory saving you get when enabling TiledMLP :) Left: normal memory...
[Infra][Deployment]
“DeepSeek Summary: Demonstrates significant memory savings achieved by enabling TiledMLP in ML systems.
X
Modern art. Artist: PyTorch memory profiler Model: Llama-8B The piece on the left is the...
[LLM][Infra][Tooling]
“DeepSeek Summary: Visual representation of PyTorch memory profiling for Llama-8B model, presented as 'modern art'.
X
sayakpaul Sayak Paul
Working at Hugging Face over the past 3.5+ years has allowed me to identify what technical areas truly interest me! In turn, that has allowed me to directly...
[Infra][Tooling]
“DeepSeek Summary: Reflection on career growth at Hugging Face, identifying personal technical interests through hands-on experience.
X
philschmid Philipp Schmid
I read three technical reports from Moonshot AI's Kimi K2.5 paper, Cursor's Composer 2 report and blog post, and Chroma's Context-1 write-up
[LLM][Evaluation]
“DeepSeek Summary: Philipp Schmid read three technical write-ups: Moonshot AI's Kimi K2.5 paper, Cursor's Composer 2 report and blog post, and Chroma's Context-1 write-up.
X
philschmid Philipp Schmid
Random thought. We are going to be so much faster at creating and building.
[Tooling]
“DeepSeek Summary: A forward-looking, optimistic statement about the accelerating pace of development and creation, likely in the context of AI and software.
X
Ethan Mollick
On the plus side with Opus 4.7, if it does decide to think it produces BY FAR the best
[LLM][Evaluation]
“DeepSeek Summary: Ethan Mollick comments on the performance of Opus 4.7, suggesting that when it engages in reasoning, it significantly outperforms other models.
X
Ethan Mollick
As stories about AI increasingly become stories of either catastrophe or salvation,
[Safety][Deployment]
“DeepSeek Summary: Mollick observes the polarized narrative framing of AI in public discourse, often reduced to extremes of doom or utopia.
X
Ethan Mollick
Teaching an experimental class for MBAs on 'vibefounding,' the students have four days to come up and
[Deployment]
“DeepSeek Summary: Mollick is teaching an experimental MBA class on 'vibefounding' in which students work against a four-day deadline.
X
Ethan Mollick
I pointed Claude Cowork at a set of 107 documents (PPTs, Word docs, Excel) that were initially
[Agent][RAG][Tooling]
“DeepSeek Summary: Mollick is testing Claude Cowork's capability to process and analyze a large, mixed-format document set (107 files).
X
Emily M. Bender
EMILY M. BENDER: Yeah. And so passive, like, oops, the moon, the moon went further away. It's like no, actually, you made some decisions.
[Safety][Evaluation]
“DeepSeek Summary: Critiques the passive framing of AI outcomes, emphasizing that human decisions drive technological consequences rather than inevitable natural processes.
X
Emily M. Bender
Image is of the 1990s Microsoft writing assistant character Clippy with its eyebrows raised positioned in.
[Tooling][LLM]
“DeepSeek Summary: Describes an image of Clippy, the 1990s Microsoft writing assistant character, with its eyebrows raised; the tweet text cuts off before any commentary.
X
Emily M. Bender
Facebook (sorry: Meta) AI: Check out our 'AI' that lets you access all of humanity's knowledge.
[LLM][Evaluation]
“DeepSeek Summary: Satirizes Meta's grandiose AI claims by mocking the idea that a single system can encapsulate all human knowledge.
X
Naomi Saphra
Waiting on a robot body. All opinions are universal and held by both employers and family. Now a dedicated grok hate account. Accepting ML/NLP PhD students.
[LLM][Agent]
“DeepSeek Summary: Naomi Saphra's X profile bio indicates she's accepting ML/NLP PhD students and describes herself as a 'dedicated grok hate account'.
X
Naomi Saphra
I'll meet you at this button.
“DeepSeek Summary: A short, possibly metaphorical tweet about meeting at a button.
X
N
Life update: I'm starting as faculty at Boston University in 2026! BU has SCHEMES for LM interpretability & analysis, so I couldn't be more pumped to join a
[LLM][Evaluation]
“DeepSeek Summary: Announces she will join Boston University as faculty in 2026 to work on language model interpretability and analysis.
X
N
This book starts like it's gonna be a fun microhistory of TB (it gave us the Stetson!
“DeepSeek Summary: A tweet commenting on a book about tuberculosis history, noting its surprising connection to the Stetson hat.
X
Angela Zhou
It's uncanny right?
“DeepSeek Summary: A brief, possibly humorous or observational tweet expressing a sense of strangeness or coincidence.
X
Angela Zhou
#throwback coz it's finally the day again!!! #HellOnWheels back on AMC 9/8c tonight!
“DeepSeek Summary: An excited post celebrating the return of the TV show 'Hell on Wheels', using hashtags for throwback and the show.
X
Ben Recht
For the first time in almost a decade, I'm teaching a class on learning and control.
[Agent]
“DeepSeek Summary: Ben Recht is teaching a class on learning and control for the first time in almost a decade.
X
Ben Recht
With more equations than usual, I explain how policy gradient gives you a framework to randomly search for
[Agent]
“DeepSeek Summary: Explains policy gradient methods as a structured approach to random search in reinforcement learning.
X
Ben Recht
Fully open machine learning requires not only GPU access but a community commitment to openness.
[Infra]
“DeepSeek Summary: Argues that true openness in ML depends on both hardware access and collective dedication to transparency.
BLOG

The complex factors that determine the single evaluation number so many focus on. Plus, how this changes in the future.

The post critiques the oversimplification of AI performance metrics, particularly the 'open-closed performance gap' often reduced to a single number. It argues this gap is shaped by complex, interdependent factors beyond simple comparisons, and explores how these dynamics might evolve with future AI advancements.
-- END OF LOG --
[STATS] 69 items · Filter applied
Powered by Horizon + DeepSeek