Intelligence.Log

2026-04-25

Extracted: 51 items. Sources: GitHub, Bluesky, X.

++ AI OVERVIEW ++

Today’s discourse centers on the operational challenges of multi-agent AI systems, with Ethan Mollick pinpointing organizational design and collaborative benchmarking as the next "critical frontier" for enterprise value. Meanwhile, Emily M. Bender introduces the term "demythifying" from a review of *The AI Con*, signaling a continued pushback against AI hype. On GitHub, repositories focused on agent orchestration frameworks and evaluation toolkits saw a surge in stars, reflecting the community’s pivot from single-model capabilities to managing agent swarms at scale. The tension between scaling agentic systems and maintaining rigorous, myth-busting critique remains the dominant theme of the day.

◆ Signal

Co-Starred · Last 7 days

Repos independently starred by multiple AI leaders in the week ending 2026-04-25. Stronger signal = more overlap.

huggingface/ml-intern

×2 starrers▲ 7/10★ 688

🤗 ml-intern: an open-source ML engineer that reads papers, trains models, and ships ML models

by:cfahlgren1 pcuenca

[Agent][LLM][Tooling]

grep TOPIC=

grep SOURCE=

sort --by=

ROCm/FlyDSL★ 0.2k▲ 4/10

FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.

Starred bytridao|[Tooling]

“FlyDSL is a Python front-end for a flexible layout DSL, enabling dynamic and customizable UI layouts. It focuses on simplifying layout design for Python applications.”

BSKY

Ethan MollickApr 25, 12:06 AM

Organizational design for agents is hard, benchmarking agents working in concert is hard. Together, this is the next critical frontier for making AI matter in large-scale valuable tasks, and we really don’t know very much about it. www.strangeloopcanon.com/p/when-align...

❤️ 20 Likes|[Agent][Evaluation]

BSKY

Emily M. BenderApr 25, 03:22 AM

Favorite new to me word, from a review of The AI Con: demythifying.

❤️ 21 Likes|[Safety]

BSKY

Simon WillisonApr 25, 04:46 PM

I think ChatGPT Images 2.0 deciding to add a "WHY ARE YOU LIKE THIS" sign to the background of this image is the first time I've felt a glimpse of AGI simonwillison.net/2026/Apr/25/...

❤️ 214 Likes|[Multi-modal]

BSKY

Ethan MollickApr 25, 07:28 PM

If you believe that AI is going to have a big impact on work and life, the only real tool for mitigating bad impacts and channeling usage for good will be government policy And that policy will necessarily be very complicated: AI will impact employment & healthcare & education & etc. differently

❤️ 65 Likes|[Safety]

BSKY

Ethan MollickApr 25, 03:14 PM

I think that academia has not absorbed the fact that AI agents are now good enough to independently reconstruct complex papers without access to code or the papers themselves; just the methods & data. They aren’t perfect but the errors are often in the human paper, not the AI making a mistake.

❤️ 88 Likes|[Agent][Evaluation]

BSKY

angela zhouApr 25, 09:05 PM

im just a rat that types and writes papers (cough revises) at verve coffee

❤️ 4 Likes|

Andrej Karpathy@karpathy

Bought a new Mac mini to properly tinker with claws over the weekend. The apple store person told me they are selling like hotcakes and everyone is confused :) I'm definitely a bit sus'd to run OpenClaw specifically - giving my private data/keys to 400K lines of vibe coded

[Agent][Tooling]

“DeepSeek Summary: Karpathy bought a Mac mini to experiment with 'claws' (likely a typo for 'Claude' or 'Claw' agent), noting that Apple Store staff said they are selling well and customers are confused. He is cautious about running OpenClaw due to security concerns with vibe-coded code.”

Andrej Karpathy@karpathy

2025 LLM Year in Review

[LLM]

“DeepSeek Summary: Karpathy posted a summary of LLM developments in 2025, likely reflecting on key trends and milestones.”

Andrej Karpathy@karpathy

Very interested in what the coming era of highly bespoke software ... Example from this morning - I've become a bit loosy goosy with my cardio recently so I decided to do a more srs, regimented experiment to try to lower my Resting Heart Rate from 50 -> 45, over https://t.co/EDULdIpWmE

[Tooling]

“DeepSeek Summary: Karpathy expresses interest in bespoke software and shares a personal experiment to lower his resting heart rate using a structured approach.”

Simon Willison@simonw

I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to

[Agent][Tooling]

“DeepSeek Summary: Simon suggests that a key skill with coding agents is knowing when to step back.”

Simon Willison@simonw

Vibe coding is irresponsibly building software through dice rolls, not caring what code is produced

[Agent][Evaluation]

“DeepSeek Summary: Simon defines 'vibe coding' as irresponsible software development.”

Harrison Chase@hwchase17

im excited about agent harnesses because i think are the first stable agent abstractions we can build on top (which is why we're investing so much in deepagents) we always wanted to run llms in a loop and have them call tools (remember autoGPT? that's all that was) but the

[Agent][Infra][Tooling]

“DeepSeek Summary: Agent harnesses provide stable abstractions for building agent loops with tool calling, a key evolution from early attempts like AutoGPT.”

Harrison Chase@hwchase17

This means that operations you would do on code in the software world, you now do on traces in the agent world. Debugging, testing, profiling

[Evaluation][Deployment][Agent]

“DeepSeek Summary: Traces in agent systems replace code as the primary artifact for debugging, testing, and profiling.”

Harrison Chase@hwchase17

TL;DR: More and more agents need a workspace: a computer where they can run code, install packages, and access files. Sandboxes provide this

[Infra][Safety][Agent]

“DeepSeek Summary: Agents require sandboxed workspaces to execute code and access resources safely.”

Harrison Chase@hwchase17

When you ship traditional software to production, you have a good sense of what to expect. Users click buttons, fill out forms,

[Deployment][Agent]

“DeepSeek Summary: Traditional software deployment has predictable user interactions, unlike agent systems.”

Jim Fan@DrJimFan

I've been a bit quiet on X recently. The past year has been a transformational experience.

[Agent]

“DeepSeek Summary: Jim Fan acknowledges his recent silence on X and describes the past year as transformational.”

Jeremy Howard@jeremyphoward

I replicated this result, that Grok focuses nearly entirely on finding out what Elon thinks in

[Safety][Evaluation]

“DeepSeek Summary: Jeremy Howard replicated a finding that Grok focuses almost entirely on determining Elon Musk's thoughts.”

Jeremy Howard@jeremyphoward

Absolutely any time I try to explore something even slightly against commonly accepted beliefs,

[Safety][Evaluation]

“DeepSeek Summary: Jeremy Howard notes that exploring ideas against commonly accepted beliefs is met with resistance.”

Soumith Chintala@soumithchintala

reading "AI News" (previously Smol Talk) is probably the highest-leverage 45 mins

[LLM]

“DeepSeek Summary: Soumith recommends reading 'AI News' as a high-leverage activity.”

Francois Chollet@fchollet

I think it's clear that for many smaller companies that invested in deep learning, it turned out

[Evaluation]

“DeepSeek Summary: Deep learning investments may not have paid off for smaller companies.”

Francois Chollet@fchollet

Folks who work in AI or software engineering feel like the world is changing exponential fast.

[Agent]

“DeepSeek Summary: AI and software engineers perceive rapid exponential change in the world.”

David Ha@hardmaru

Don't miss David Ha @hardmaru's keynote at @ALifeConf #ALIFE2021 on "World Models and Attention for Reinforcement Learning"!

[Agent][Multi-modal]

“DeepSeek Summary: David Ha is giving a keynote on world models and attention for reinforcement learning at ALIFE 2021.”

David Ha@hardmaru

It's spectacular to have followed David Ha's (@hardmaru) incredible career arc —MD of Fixed Income at Goldman Sachs —restarted his career

[Agent]

“DeepSeek Summary: David Ha transitioned from a managing director at Goldman Sachs to a career in AI research.”

Yann LeCun@ylecun

It seems to me that before "urgently figuring out how to control AI systems much smarter than us" we need

[Safety]

“DeepSeek Summary: LeCun questions the urgency of controlling superintelligent AI, implying such systems don't exist yet.”

Yann LeCun@ylecun

An A.I. Pioneer Warns the Tech 'Herd' Is Marching Into a Dead End. www.nytimes.com.

[LLM]

“DeepSeek Summary: LeCun shares a NYT article warning that the AI field is heading in the wrong direction.”

Yann LeCun@ylecun

The emergence of superintelligence is not going to be an event. We don't have anything close to a

[Safety]

“DeepSeek Summary: LeCun argues superintelligence will not appear suddenly and we are far from it.”

Fei-Fei Li@drfeifei

Very excited to share @theworldlabs 's latest research work RTFM!! It's a real-time, ...

[Multi-modal]

“DeepSeek Summary: Fei-Fei Li announces World Labs' RTFM research, a real-time 3D world generation model.”

Max Woolf@minimaxir

me irl

[Tooling]

“DeepSeek Summary: Max Woolf posted a self-referential meme 'me irl'.”

Sasha Rush@srush_io

On the infra side, composer 2 uses CP. This is (i think?) the first real detail from using CP on MLA. My understanding is that each rank first computes the compressed KVs, all gather this compressed latents. while the all gather is in flight, compute the Q proj

[Infra][LLM]

“DeepSeek Summary: Sasha discusses infrastructure details of composer 2 using CP (context parallelism) on MLA, describing the process of computing compressed KVs and all-gathering latents.”

Sasha Rush@srush_io

⛏️

[LLM]

“DeepSeek Summary: A single pickaxe emoji, possibly indicating a mining or digging metaphor.”

Sasha Rush@srush_io

“DeepSeek Summary: No text content available from search snippet.”

Stas Bekman@stas00

If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

[Infra][Fine-tuning]

“DeepSeek Summary: Stas Bekman notes that DeepSpeed ZeRO++ is now available on master branch, encouraging users to try it.”

Stas Bekman@stas00

Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

[Evaluation][Infra]

“DeepSeek Summary: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul for evaluating matrix multiplication efficiency.”

Stas Bekman@stas00

If you're trying out FA4, you're likely to run into not being able to load cutlass.cute

[Tooling][Infra]

“DeepSeek Summary: Stas Bekman warns about a common issue with FA4 (Flash Attention 4) involving cutlass.cute loading.”

Stas Bekman@stas00

Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

[Tooling]

“DeepSeek Summary: Stas Bekman thanks a contributor for improving the Machine Learning Engineering Open Book.”

Sayak Paul@sayakpaul

Install `diffusers` from source and start using Kontext from @bfl_ml 🧨 Use your favorite optims, too :) Training is also supported (@linoy_tsaban and yours truly) 🤗

[Multi-modal][Fine-tuning][Deployment]

“DeepSeek Summary: Announces support for Kontext from Black Forest Labs in diffusers, with training support.”

Sayak Paul@sayakpaul

Release notes: Release Diffusers 0.34.0: New Image and Video Models, Better torch.

[Multi-modal][Deployment][Infra]

“DeepSeek Summary: Announces Diffusers 0.34.0 release with new image and video models and torch improvements.”

Philipp Schmid@philschmid

Guide: ReAct agent from scratch with Gemini 2.5 and LangGraph | Gemini API | Google AI for Developers. ai.google.dev.

[Agent][LLM]

“DeepSeek Summary: Philipp Schmid published a guide on building a ReAct agent from scratch using Gemini 2.5 and LangGraph.”

Ethan Mollick@emollick

AI is actually pretty good at ideas as well.

[LLM]

“DeepSeek Summary: Ethan Mollick notes that AI can generate good ideas, challenging the notion that creativity is exclusively human.”

Ethan Mollick@emollick

[Evaluation]

“DeepSeek Summary: Mollick reflects on viral AI content, noting that fabricated graphs gained high engagement.”

Ethan Mollick@emollick

So much work is going into faking continual learning and memory for AIs,

[LLM][Fine-tuning]

“DeepSeek Summary: Mollick criticizes efforts to simulate continuous learning and memory in AI models.”

Ethan Mollick@emollick

If it helps, I teach at a business school & many of my smartest students are hired by funds because they can reliably turn their only-human

[Deployment]

“DeepSeek Summary: Mollick notes that human judgment remains valuable, as his students are hired for their unique human skills.”

Emily M. Bender@emilymbender

@kohntom A synthetic text extruding machine is not well-matched to any application where the accuracy of the content matters. This is clearly one such application.

[LLM][Safety][Evaluation]

“DeepSeek Summary: Bender criticizes LLMs as 'synthetic text extruding machines' unsuitable for accuracy-critical applications.”

Naomi Saphra@NaomiSaphra

This book starts like it's gonna be a fun microhistory of TB (it gave us the Stetson!

“DeepSeek Summary: Naomi Saphra comments on a book about tuberculosis, noting its engaging start.”

Naomi Saphra@NaomiSaphra

New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

[Evaluation]

“DeepSeek Summary: Announces a new preprint on causal interpretation, emphasizing its coherence and testability.”

Naomi Saphra@NaomiSaphra

New preprint! Phase transitions! We love to see them during LM training.

[LLM]

“DeepSeek Summary: Announces a new preprint about phase transitions in language model training.”

Angela Zhou@angelamczhou

#throwback to the beginnings of a beautiful friendship =D @ansonmount @HellOnWheelsAMC #HellonWheels #onlocation.

[Deployment]

“DeepSeek Summary: Angela Zhou shares a throwback post about her friendship with co-stars on the set of Hell on Wheels.”

Ben Recht@beenwrekt

I weigh in on the Trump administration’s newfound obsession with Gold Standard Science and reproducibility. Though it’s not all in bad faith, it’s likely to backfire.

[Evaluation]

“DeepSeek Summary: Critique of the Trump administration's focus on reproducibility in science, warning it may backfire despite some good faith.”

Ben Recht@beenwrekt

For the first time in almost a decade, I'm teaching a class on learning and control.

[Infra]

“DeepSeek Summary: Announcement of teaching a class on learning and control after a long hiatus.”

Ben Recht@beenwrekt

Revisiting Sutton's Bitter Lesson in the wake of GPT-5.

[LLM]

“DeepSeek Summary: Revisiting a classic AI lesson in context of latest GPT advancements.”

-- END OF LOG --

[STATS] 51 items · Filter applied