Intelligence.Log

2026-04-29

Extracted: 62 items. Sources: GitHub, Bluesky, X, Blogs.

++ AI OVERVIEW ++

Today's feed points to a shift from human-led prompt engineering toward automated optimization: hardmaru shares ICLR 2026 research that trains an AI to coax the best performance out of other LLMs, a meta-approach that could change how we interact with models. Meanwhile, Marc Lanctot offers a nostalgic counterpoint, documenting his complete playthrough of the classic *Dungeons of Dr. Creep* with a level-by-level Reddit thread and videos of the final level. Together the posts highlight a broader tension: as the field races toward self-improving systems, there is a parallel appreciation for the human craft of understanding complex systems, whether dungeon puzzles or transformer architectures.

GliaX/Stethoscope ★ 0.9k ▲ 4/10

A research-validated stethoscope whose plans are available freely and openly. The entire stethoscope costs between $2.5 and $5 to produce.

Starred by lucidrains

“This repository provides open-source plans for a low-cost, research-validated stethoscope that can be produced for $2.5-$5. It aims to democratize medical diagnostics by making hardware freely available.”

BSKY

Marc Lanctot · Apr 29, 01:51 AM

Hello all, I finished another old game: Dungeons of Dr. Creep! Just like last year, I documented each level in a reddit thread and made a few videos of the last level Color Castle. Also discovered one cool fact! 👇 See replies below 👇 #commodore64 #retrogaming www.youtube.com/shorts/wzeTr...

❤️ 2 Likes

BSKY

hardmaru · Apr 29, 03:05 AM

For the past few years, humans have been doing “prompt engineering” to coax the best performance out of different LLMs. In this work, we explored what happens if we train an AI to do that job instead. Link to our #ICLR2026 paper: arxiv.org/abs/2512.04388 Thread:

❤️ 12 Likes|[Agent][LLM][Fine-tuning]

BSKY

Simon Willison · Apr 29, 07:13 PM

I released LLM 0.32a0 this morning, a major backwards-compatible refactor of my LLM Python library and CLI tool for working with language models - the new changes should help LLM work better with reasoning models and other new frontier capabilities simonwillison.net/2026/Apr/29/...

❤️ 51 Likes|[LLM][Deployment][Tooling]

BSKY

Mark Riedl · Apr 29, 07:54 PM

Congratulations to Dr. Gennie Mansi, for successfully defending her PhD thesis. Dr. Mansi's work investigates AI in healthcare: how AI impacts legal liability of doctors, how AI design can interfere with delivery of care, and also how we might improve the design of AI systems to account for liability

❤️ 16 Likes|[Safety][Tooling]

BSKY

Nathan Lambert · Apr 29, 10:36 AM

Let’s goooooooooo we are capybara’d up, thanks Qwen, keep the models coming

❤️ 55 Likes|[LLM]

BSKY

Ethan Mollick · Apr 29, 09:44 PM

Gemini now can create documents, and it is a nice start, but not up to the frontier yet, as you can see from my "evil buyout of Hogwarts" test. PowerPoints are substantially worse than NotebookLM, spreadsheets are primitive, still no thinking trace, it doesn't think hard enough, either.

❤️ 37 Likes|[LLM][Evaluation]

BSKY

Ethan Mollick · Apr 29, 04:47 PM

One reason I don’t think “judgment” is going to be a distinctly human role in working with AI is that the most recent agentic models have gotten quite good at some types of judgment. You can’t do the kind of high complexity, long-run tasks that current AIs can do without it. From GPT-5.5 guide👇

❤️ 48 Likes|[Agent]

BSKY

Emily M. Bender · Apr 29, 06:05 PM

Usually, when I get interviewed for a piece on something like "AI consciousness" I am relegated to the skeptics box --- some short paragraph near the end. So it is a nice change to see this piece by @hollybaxter.bsky.social Short 🧵>> www.the-independent.com/tech/ai-news...

❤️ 97 Likes|[Safety]

BSKY

Emily M. Bender · Apr 29, 03:32 PM

Mystery AI Hype Theater 3000 Ep 76: www.buzzsprout.com/admin/212641... Carmen Maria Machado joins @alexhanna.bsky.social and me to get into the why and how of writing and to soundly ridicule the idea that any of that could or should be automated.

❤️ 16 Likes|[LLM]

BSKY

Emily M. Bender · Apr 29, 01:31 PM

Also available as video on Peertube: peertube.dair-institute.org/w/tgEjwXf8ST...

❤️ 10 Likes|[Safety]

BSKY

Emily M. Bender · Apr 29, 01:30 PM

Mystery AI Hype Theater 3000 Ep 76: www.buzzsprout.com/2126417/epis... Carmen Maria Machado joins @alexhanna.bsky.social and me to get into the why and how of writing and to soundly ridicule the idea that any of that could or should be automated.

❤️ 13 Likes

Andrej Karpathy @karpathy

+1 for "context engineering" over "prompt engineering". In every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window

[LLM][Deployment][Tooling]

“DeepSeek Summary: Advocates for 'context engineering' as a more accurate term than 'prompt engineering' for industrial LLM applications.”

Andrej Karpathy @karpathy

LLM Knowledge Bases Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating

[LLM][RAG][Tooling]

“DeepSeek Summary: Using LLMs to create personal knowledge bases for research, shifting focus from code to knowledge manipulation.”

Andrej Karpathy @karpathy

2025 LLM Year in Review By training LLMs against automatically verifiable rewards across a number of environments (e.g. think math/code puzzles), the LLMs spontaneously develop strategies that look like "reasoning" to humans - they learn to break down problem solving into intermediate calculations and they learn a number of probl

[LLM][Evaluation][Fine-tuning]

“DeepSeek Summary: LLMs trained with verifiable rewards develop human-like reasoning strategies, breaking down problems into intermediate steps.”

Simon Willison @simonw

It's interesting how "better at code" has become the defining goal of almost every AI lab over the

[LLM][Tooling]

“DeepSeek Summary: Simon Willison notes that AI labs are increasingly focused on improving code generation as a primary goal.”

Simon Willison @simonw

This may be the best guidance I've seen anywhere on writing a really good commit history.

[Tooling]

“DeepSeek Summary: Simon Willison praises guidance on writing good commit history.”

Harrison Chase @hwchase17

In the hot path as the agent is running. The agent can decide to (or the user can prompt it to) update its memory as it is working on the core

[Agent]

“DeepSeek Summary: Agent can update memory during execution based on user prompt or self-decision.”

Harrison Chase @hwchase17

TL;DR: More and more agents need a workspace: a computer where they can run code, install packages, and access files. Sandboxes provide this

[Agent][Infra]

“DeepSeek Summary: Agents require sandboxed environments to execute code and access files.”

Harrison Chase @hwchase17

I am not excited about visual workflow builders 1. Not simple enough for the average user

[Tooling][Agent]

“DeepSeek Summary: Visual workflow builders are not simple enough for average users.”

Harrison Chase @hwchase17

Memory actually allows for a better agent building experience. Agent building is very iterative - in large part because you don't know what the

[Agent]

“DeepSeek Summary: Memory improves the iterative process of building agents.”

Harrison Chase @hwchase17

When building agents, you need to iterate on production data much more than when building traditional software. You need to iterate on how

[Agent][Evaluation]

“DeepSeek Summary: Agent development requires more iteration on production data than traditional software.”

Jim Fan @DrJimFan

The first time I met Jensen was also the first time I met @elonmusk. I was interning at OpenAI that day and

[Multi-modal]

“DeepSeek Summary: Jim Fan recalls meeting both Jensen Huang and Elon Musk on the same day during his internship at OpenAI.”

Jim Fan @DrJimFan

Resource constraints are a beautiful thing. Survival instinct in a cut-throat AI competitive land

[Agent]

“DeepSeek Summary: Jim Fan reflects on how resource constraints drive innovation in the competitive AI landscape.”

Jim Fan @DrJimFan

I've been a bit quiet on X recently. The past year has been a transformational experience.

[Agent]

“DeepSeek Summary: Jim Fan explains his recent silence on X due to a transformative year.”

Jim Fan @DrJimFan

It gives me a lot of comfort knowing that we are the last generation without advanced robots everywhere.

[Agent]

“DeepSeek Summary: Jim Fan expresses comfort in being part of the last generation before ubiquitous advanced robots.”

Jim Fan @DrJimFan

Everyone's freaking out about vibe coding. In the holiday spirit, allow me to share my anxiety on the wild

[LLM]

“DeepSeek Summary: Jim Fan comments on the hype around 'vibe coding' and shares his own anxiety.”

Jeremy Howard @jeremyphoward

I replicated this result, that Grok focuses nearly entirely on finding out what Elon thinks in

[LLM][Safety]

“DeepSeek Summary: Jeremy Howard replicated a finding that Grok AI focuses heavily on determining Elon Musk's opinions.”

Jeremy Howard @jeremyphoward

Here's a complete unedited video of asking Grok for its views on the Israel/Palestine situation. It first searches twitter for what Elon thinks.

[LLM][Safety][Evaluation]

“DeepSeek Summary: Jeremy Howard shared a video showing Grok searching for Elon Musk's opinion on Israel/Palestine before answering.”

Soumith Chintala @soumithchintala

reading "AI News" (previously Smol Talk) is probably the highest-leverage 45 mins

[LLM]

“DeepSeek Summary: Recommends reading 'AI News' as a high-leverage use of time.”

Soumith Chintala @soumithchintala

Open LLMs need to get organized and co-ordinated about sharing human feedback.

[LLM][Fine-tuning]

“DeepSeek Summary: Calls for open LLM community to coordinate on sharing human feedback.”

Soumith Chintala @soumithchintala

MacStudio you ask? Apple Engineering's **actual** time spent on PyTorch support

[Infra]

“DeepSeek Summary: Comments on Apple Engineering's time spent on PyTorch support, likely in response to a question about MacStudio.”

Soumith Chintala @soumithchintala

Sometimes we forget that NVIDIA wins because it's a software company.

[Infra]

“DeepSeek Summary: Points out that NVIDIA's success is due to its software, not just hardware.”

Francois Chollet @fchollet

I think it's clear that for many smaller companies that invested in deep learning, it turned out

[LLM][Deployment]

“DeepSeek Summary: Chollet notes that deep learning investments didn't pay off for many smaller companies.”

Francois Chollet @fchollet

Folks who work in AI or software engineering feel like the world is changing exponentially fast.

[Agent]

“DeepSeek Summary: Chollet observes that AI/software engineers perceive rapid exponential change in the world.”

Fei-Fei Li @drfeifei

Very excited to share @theworldlabs 's latest research work RTFM!! It's a real-time, ...

[Multi-modal][Agent]

“DeepSeek Summary: Fei-Fei Li announces a real-time research work from World Labs called RTFM.”

Clem Delangue @ClementDelangue

https://t.co/4CQthIKm8F

“DeepSeek Summary: Tweet contains only a link with no additional text.”

Clem Delangue @ClementDelangue

Great research on open-source: - $4.15B invested in open-source generates $8.8T of value for companies (aka $1 invested in open-source = $2,000 of value created) - Companies would need to spend 3.5 times more on software than they currently do

[Infra]

“DeepSeek Summary: Highlights the massive ROI of open-source investment: $1 yields $2,000 in value.”

Max Woolf @minimaxir

Max Woolf (@minimaxir). 19 likes.

“DeepSeek Summary: Tweet received 19 likes.”

Max Woolf @minimaxir

congrats to OpenAI on winning the Turing Test

[Evaluation]

“DeepSeek Summary: Max Woolf congratulates OpenAI on winning the Turing Test.”

Stas Bekman @stas00

If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

[Infra][Fine-tuning]

“DeepSeek Summary: Encourages trying DeepSpeed ZeRO++ as it may now work on master branch.”

Stas Bekman @stas00

Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

[Infra][Evaluation]

“DeepSeek Summary: Introduces a new performance metric for matrix multiplication.”

Stas Bekman @stas00

If you're trying out FA4, you're likely to run into not being able to load cutlass.cute

[Infra][Tooling]

“DeepSeek Summary: Warns about a common issue with FA4 and cutlass.cute loading.”

Stas Bekman @stas00

Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

[Tooling]

“DeepSeek Summary: Acknowledges contribution to the Machine Learning Engineering Open Book.”

Sayak Paul @sayakpaul

Had a nice time chatting about the state of diffusion models and some text-to-image data shenanigans at

[Multi-modal]

“DeepSeek Summary: Sayak discussed diffusion models and text-to-image data issues.”

Sayak Paul @sayakpaul

For me, it was Keras among other things that inspired me to take up deep learning as a potential

[Tooling]

“DeepSeek Summary: Sayak credits Keras for inspiring his deep learning journey.”

Sayak Paul @sayakpaul

Working at Hugging Face over the past 3.5+ years has allowed me to identify what technical areas truly interest me! In turn, that has allowed me to directly

[Infra]

“DeepSeek Summary: Sayak reflects on how his role at Hugging Face helped clarify his technical interests.”

Philipp Schmid @philschmid

Gemini Embedding 2 now GA! One embedding model that understands text, images, video, audio, and PDFs!

[Multi-modal][Infra][RAG]

“DeepSeek Summary: Gemini Embedding 2 is now generally available, supporting text, image, video, audio, and PDF embeddings in a single model.”

Philipp Schmid @philschmid

Excited to introduce the Gemini Interactions API, a unified interface for Gemini models and agents. Starting today with Gemini Deep Research Agent. - Unifies access to models and agents via a single RESTful endpoint. - Access Gemini Deep Research agent via API.

[Agent][Infra][LLM]

“DeepSeek Summary: Introducing the Gemini Interactions API, a unified RESTful endpoint for models and agents, starting with the Gemini Deep Research Agent.”

Ethan Mollick @emollick

I pointed Claude Cowork at a set of 107 documents (PPTs, Word docs, Excel) that were initially

[Agent][LLM][Deployment]

“DeepSeek Summary: Ethan tested Claude Cowork on a large set of documents and shared initial results.”

Ethan Mollick @emollick

On the plus side with Opus 4.7, if it does decide to think it produces BY FAR the best

[LLM][Evaluation]

“DeepSeek Summary: Ethan notes that Opus 4.7 produces the best results when it decides to think.”

Ethan Mollick @emollick

We are starting to see some nuanced discussions of what it means to work with advanced AI In this

[Safety][Deployment]

“DeepSeek Summary: Ethan observes emerging nuanced discussions about working with advanced AI.”

Ethan Mollick @emollick

Very cool analysis of the submissions to a major management journal that shows how much the

[Evaluation][LLM]

“DeepSeek Summary: Ethan shares analysis of submissions to a management journal, showing AI's impact.”

Emily M. Bender @emilymbender

EMILY M. BENDER: Yeah. And so passive, like, oops, the moon, the moon went further away. It's like no, actually, you made some decisions.

[Safety]

“DeepSeek Summary: Bender critiques the passive language used to describe AI outcomes, emphasizing that decisions were made by people, not inevitable.”

Emily M. Bender @emilymbender

Image is of the 1990s Microsoft writing assistant character Clippy with its eyebrows raised positioned in.

[LLM]

“DeepSeek Summary: Bender shares an image of Clippy, likely to critique or satirize AI assistants.”

Emily M. Bender @emilymbender

Facebook (sorry: Meta) AI: Check out our "AI" that lets you access all of humanity's knowledge.

[LLM]

“DeepSeek Summary: Bender sarcastically quotes Meta's claim about AI accessing all human knowledge, implying skepticism.”

Naomi Saphra @NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

[Safety]

“DeepSeek Summary: Sarcastic comment about using images of oneself for scientific discourse.”

Naomi Saphra @NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU ...

[LLM]

“DeepSeek Summary: Announcement of starting as faculty at Boston University in 2026.”

Ben Recht @beenwrekt

And awesome to see many Berkeley alums thriving here. @LaurentLessard, @DimitrisPapail, and Shivaram

[Evaluation]

“DeepSeek Summary: Ben Recht acknowledges Berkeley alumni success at an event.”

Ben Recht @beenwrekt

For the first time in almost a decade, I'm teaching a class on learning and control.

[Evaluation]

“DeepSeek Summary: Ben Recht announces teaching a class on learning and control after a long hiatus.”

Ben Recht @beenwrekt

Everyone knows actions are fundamentally different than predictions, but it's hard to write this

[Evaluation]

“DeepSeek Summary: Ben Recht reflects on the distinction between actions and predictions in machine learning.”

BLOG

LLM 0.32a0 is a major backwards-compatible refactor

I just released LLM 0.32a0 (https://llm.datasette.io/en/latest/changelog.html#a0-2026-04-28), an alpha release of my LLM (https://llm.datasette.io/) Python library and CLI tool for accessing LLMs, with some consequential changes that I've been working towards for quite a...

By Simon Willison

“LLM 0.32a0 is a major refactor that prioritizes backwards compatibility while introducing significant internal changes for future extensibility. The alpha release aims to stabilize new APIs and data structures, allowing plugin authors to adapt before the stable release.”

-- END OF LOG --

[STATS] 62 items · Filter applied