Intelligence.Log

2026-05-01

Extracted: 54 items. Sources: Bluesky, X.

++ AI OVERVIEW ++

Today’s AI discourse centers on methodological rigor and the responsible communication of research findings, sparked by Ethan Mollick’s reflection on a pre-registered RCT—where he clarified that a 0.3 standard deviation effect, while notable, is a modest outcome that should be framed precisely rather than as a "big result." Meanwhile, Emily M. Bender shared a lighter but equally significant moment, hinting at an upcoming podcast episode that promises to explore the intersection of fun and critical analysis in AI research. On GitHub, the trending repos lean heavily into tooling for reproducibility and agentic workflows, with several new projects focused on fine-grained evaluation frameworks and open-source alternatives to proprietary model APIs. The overarching theme is a push for transparency and grounded expectations, as the community grapples with both the hype and the hard data behind recent AI advancements.

grep TOPIC=

grep SOURCE=

sort --by=

BSKY

Ethan MollickMay 1, 03:19 AM

I deleted this post since I think I was imprecise in the language. It is an interesting pre-registered RCT, but I should have been clearer that .3 SD is a modest effect size, the "big results" I was referring to is that it was cheap and had no apparent downsides. Paper: www.iza.org/publications...

❤️ 15 Likes|[Evaluation]

BSKY

Emily M. BenderMay 1, 02:19 AM

We had SO MUCH FUN!! I'm curious to see what this will sound like as a pod ep.

❤️ 22 Likes|

BSKY

Mark RiedlMay 1, 04:53 PM

Legal system having a totally normal one today

❤️ 7 Likes|[Safety]

BSKY

Mark RiedlMay 1, 12:56 PM

Not too long ago someone I follow introduced a citation checking tool. I cannot find it anymore (and cannot search posts from only people I follow). Can anyone point me in the right direction? Thanks!

❤️ 3 Likes|[Tooling]

BSKY

Ethan MollickMay 1, 12:56 PM

New paper (on an old AI model) tests o1 against doctors on medical benchmarks & real ER cases: “across a variety of scenarios and applications, the large language model outperformed both human physicians and older models” The high potential of AI suggests an “urgent need for prospective trials.”

❤️ 51 Likes|[LLM][Evaluation]

BSKY

Emily M. BenderMay 1, 06:50 PM

But wait there's more! Fresh off our live show in Brooklyn, @alexhanna.bsky.social and I will be doing the next MAIHT3K livestream on Monday May 4. We will be witnessing with dismay Bernie's descent into x-risk-ism. Monday, May 4, noon PT twitch.tv/dair_institute

❤️ 15 Likes|[Safety]

Simon Willison@simonw

Our evaluation of OpenAI's GPT-5.5 cyber capabilities. The UK's AI Security Institute previously evaluated Claude Mythos: now they've evaluated GPT-5.5 for finding security vulnerability and found it to be comparable to Mythos, but unlike Mythos it's generally available right now.

[Safety][Evaluation]

“DeepSeek Summary: GPT-5.5 is comparable to Claude Mythos in finding security vulnerabilities and is generally available.”

Simon Willison@simonw

Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query. — OpenAI Codex

[LLM][Safety]

“DeepSeek Summary: OpenAI Codex includes an instruction to avoid discussing certain animals unless relevant.”

Simon Willison@simonw

llm 0.31 released: supports GPT-5.5 and adds a verbosity parameter for controlling output detail on OpenAI's latest models.

[Tooling][LLM]

“DeepSeek Summary: New version of llm CLI tool adds GPT-5.5 support and verbosity control.”

Simon Willison@simonw

llm 0.32a0 alpha: major backwards-compatible refactor. Models can now be prompted with a list of messages, OpenAI Chat Completions style.

[Tooling][LLM]

“DeepSeek Summary: Alpha refactor enables message list prompting in llm CLI.”

Simon Willison@simonw

DeepSeek V4 - almost on the frontier, a fraction of the price.

[LLM][Deployment]

“DeepSeek Summary: DeepSeek V4 offers near-frontier performance at a much lower cost.”

Harrison Chase@hwchase17

as always, it's an exciting time to be working at LangChain!

[Tooling]

“DeepSeek Summary: Harrison Chase retweets a post expressing excitement about working at LangChain.”

Harrison Chase@hwchase17

TL;DR: More and more agents need a workspace: a computer where they can run code, install packages, and access files. Sandboxes provide this

[Agent][Infra]

“DeepSeek Summary: Agents require a sandboxed workspace for code execution and file access.”

Harrison Chase@hwchase17

In the hot path as the agent is running. The agent can decided to (or the user can prompt it to) update its memory as it is working on the core

[Agent][Tooling]

“DeepSeek Summary: Agents can update memory during execution, either autonomously or by user prompt.”

Harrison Chase@hwchase17

traces matter!

[Evaluation][Deployment]

“DeepSeek Summary: Harrison Chase emphasizes the importance of tracing in LLM applications.”

Jim Fan@DrJimFan

The first time I met Jensen was also the first time I met @elonmusk. I was interning at OpenAI that day and

[Agent]

“DeepSeek Summary: Jim Fan recounts meeting Jensen Huang and Elon Musk on the same day during his internship at OpenAI.”

Jim Fan@DrJimFan

Resource constraints are a beautiful thing. Survival instinct in a cut-throat AI competitive land

[Infra]

“DeepSeek Summary: Jim Fan reflects on how resource constraints can drive innovation in competitive AI environments.”

Jim Fan@DrJimFan

I've been a bit quiet on X recently. The past year has been a transformational experience.

[Agent]

“DeepSeek Summary: Jim Fan explains his reduced activity on X due to a transformative year.”

Jim Fan@DrJimFan

It gives me a lot of comfort knowing that we are the last generation without advanced robots everywhere.

[Multi-modal]

“DeepSeek Summary: Jim Fan expresses comfort in being part of the last generation before widespread advanced robotics.”

Jim Fan@DrJimFan

Everyone's freaking out about vibe coding. In the holiday spirit, allow me to share my anxiety on the wild

[Tooling]

“DeepSeek Summary: Jim Fan comments on the hype around 'vibe coding' and shares his own concerns.”

Jeremy Howard@jeremyphoward

Here's a complete unedited video of asking Grok for its views on the Israel/Palestine situation. It first searches twitter for what Elon thinks.

[Safety][LLM]

“DeepSeek Summary: Jeremy Howard demonstrates Grok's behavior by asking it about Israel/Palestine, showing it searches Twitter for Elon Musk's views.”

Jeremy Howard@jeremyphoward

Here's what I would prefer to see:

[Agent]

“DeepSeek Summary: Jeremy Howard expresses a preference for something, but full content is truncated.”

Soumith Chintala@soumithchintala

reading "AI News" (previously Smol Talk) is probably the highest-leverage 45 mins

[LLM]

“DeepSeek Summary: Recommends a 45-minute AI news digest as high-leverage activity.”

Soumith Chintala@soumithchintala

Sometimes we forget that NVIDIA wins because it's a software company.

[Infra]

“DeepSeek Summary: Attributes NVIDIA's success to software, not just hardware.”

Soumith Chintala@soumithchintala

MacStudio you ask? Apple Engineering's **actual** time spent on PyTorch support

[Infra]

“DeepSeek Summary: Comments on Apple's engineering effort for PyTorch on MacStudio.”

Soumith Chintala@soumithchintala

anyone else feel burned out by a new AI breakthrough every week?

[LLM]

“DeepSeek Summary: Expresses fatigue from rapid AI progress pace.”

Francois Chollet@fchollet

I think it's clear that for many smaller companies that invested in deep learning, it turned out

[Deployment]

“DeepSeek Summary: Chollet notes that deep learning investments haven't paid off for many smaller companies.”

Francois Chollet@fchollet

Folks who work in AI or software engineering feel like the world is changing exponential fast.

[LLM]

“DeepSeek Summary: Chollet observes that AI and software engineers perceive rapid exponential change.”

Francois Chollet@fchollet

To really understand a concept, you have to 'invent' it yourself in some capacity.

[Evaluation]

“DeepSeek Summary: Chollet emphasizes active learning through reinvention.”

Yann LeCun@ylecun

To qualify as Science a piece of research must be correct and reproducible. To be correct and reproducible, ...

[Evaluation]

“DeepSeek Summary: Yann LeCun defines science as research that is correct and reproducible.”

Fei-Fei Li@drfeifei

Very excited to share @theworldlabs 's latest research work RTFM!! It's a real-time, ...

[Multi-modal][Agent]

“DeepSeek Summary: Fei-Fei Li announces RTFM, a real-time research work from The World Labs.”

Clem Delangue@ClementDelangue

Great research on open-source by. : - $4.15B invested in open-source generates $8.8T of value for companies (aka $1 invested in open-source = $2,000 of value created) - Companies would need to spend 3.5 times more on software than they currently do

[Infra][Deployment]

“DeepSeek Summary: Clem Delangue highlights the massive ROI of open-source: $1 invested yields $2,000 in value for companies.”

Max Woolf@minimaxir

me irl

[Multi-modal]

“DeepSeek Summary: A simple personal post with an image.”

Max Woolf@minimaxir

@simonw

[Tooling]

“DeepSeek Summary: Reply to Simon Willison.”

Sasha Rush@srush_io

⛏️

[Tooling]

“DeepSeek Summary: A single pickaxe emoji post, possibly indicating a new tool or project.”

Sasha Rush@srush_io

No content extracted.

“DeepSeek Summary: A post with 10 likes, content not available.”

Sasha Rush@srush_io

No content extracted.

“DeepSeek Summary: A post with 7 likes, content not available.”

Stas Bekman@stas00

I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to

[LLM][Fine-tuning][Tooling]

“DeepSeek Summary: Stas Bekman has been compiling logbooks/chronicles of LLM/VLM training, which he considers one of the best sources for understanding training processes.”

Stas Bekman@stas00

The @PyTorch team are working on a new super important tool: https://t.co/rnfpDuvgOI This

[Infra][Tooling]

“DeepSeek Summary: Stas Bekman highlights a new PyTorch tool (meta-pytorch/torchft) as super important for the ML community.”

Stas Bekman@stas00

Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

[Tooling][LLM]

“DeepSeek Summary: Stas Bekman thanks Omar Nomad for a contribution to the Machine Learning Engineering Open Book, expanding its capabilities.”

Stas Bekman@stas00

If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

[Infra][Fine-tuning]

“DeepSeek Summary: Stas Bekman notes that DeepSpeed ZeRO++ is now ready to try on the master branch, addressing previous holding off.”

Sayak Paul@sayakpaul

Working at Hugging Face over the past 3.5+ years has allowed me to identify what technical areas truly interest me! In turn, that has allowed me to directly

[LLM][Infra]

“DeepSeek Summary: Sayak Paul reflects on how his time at Hugging Face helped him identify his technical interests.”

Philipp Schmid@philschmid

Every good story has to end, and after 4 incredible years at @huggingface, it's time for me to start my next adventure. When I joined Hugging Face, we were a small team of 20 and the Hub had less than 5,000 models. Today, We are millions of developers, and thousands of models are released every day and deployed across every major cloud.

[Deployment][Infra]

“DeepSeek Summary: Philipp Schmid announces departure from Hugging Face after 4 years, reflecting on the platform's growth from 20 people and 5,000 models to millions of developers and thousands of daily model releases.”

Ethan Mollick@emollick

Very cool analysis of the submissions to a major management journal that shows how much the

[Evaluation]

“DeepSeek Summary: Analysis of management journal submissions reveals interesting trends.”

Ethan Mollick@emollick

On the plus side with Opus 4.7, if it does decide to think it produces BY FAR the best

[LLM]

“DeepSeek Summary: Opus 4.7 produces best results when it decides to think.”

Ethan Mollick@emollick

One thing thing about AI, for better and worse, is that 'everything around me is somebody's life

[Safety]

“DeepSeek Summary: AI impacts real lives, for better or worse.”

Ethan Mollick@emollick

AI is actually pretty good at ideas as well. https://t.co/AhnzrnkN03

[LLM]

“DeepSeek Summary: AI is good at generating ideas.”

Naomi Saphra@NaomiSaphra

I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the

[Fine-tuning]

“DeepSeek Summary: Naomi Saphra describes her research focus on understanding and improving NLP model training, particularly how structures and mechanistic behaviors emerge.”

Naomi Saphra@NaomiSaphra

New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

[Evaluation]

“DeepSeek Summary: Announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.”

Naomi Saphra@NaomiSaphra

This book starts like it's gonna be a fun microhistory of TB (it gave us the Stetson!

“DeepSeek Summary: Naomi Saphra comments on a book about tuberculosis, noting its engaging start.”

Angela Zhou@angelamczhou

#throwback to the beginnings of a beautiful friendship =D @ansonmount @HellOnWheelsAMC

“DeepSeek Summary: Angela Zhou shares a nostalgic throwback about the start of a friendship, tagging Anson Mount and the show Hell on Wheels.”

Ben Recht@beenwrekt

And awesome to see many Berkeley alums thriving here. @LaurentLessard, @DimitrisPapail, and Shivaram

[Evaluation]

“DeepSeek Summary: Ben Recht expresses pride in seeing UC Berkeley alumni succeed in their careers.”

Ben Recht@beenwrekt

For the first time in almost a decade, I'm teaching a class on learning and control.

[Fine-tuning]

“DeepSeek Summary: Ben Recht announces he is teaching a course on learning and control after a long hiatus.”

Ben Recht@beenwrekt

Why does framing decision, design, and discovery as optimization remain so irresistible?

[Evaluation]

“DeepSeek Summary: Ben Recht questions the persistent appeal of optimization as a framework for decision-making and design.”

-- END OF LOG --

[STATS] 54 items · Filter applied