Intelligence.Log

2026-05-01

Extracted: 54 items. Sources: Bluesky, X.
++ AI OVERVIEW ++
Today’s AI discourse centers on methodological rigor and the responsible communication of research findings, sparked by Ethan Mollick’s reflection on a pre-registered RCT. He deleted an earlier post and clarified that a 0.3 standard deviation effect is modest, and that the "big results" he had in mind were that the intervention was cheap and had no apparent downsides. Meanwhile, Emily M. Bender struck a lighter note, saying "We had SO MUCH FUN" and wondering how it will sound as a podcast episode. On GitHub, the trending repos lean heavily into tooling for reproducibility and agentic workflows, with several new projects focused on fine-grained evaluation frameworks and open-source alternatives to proprietary model APIs. The overarching theme is a push for transparency and grounded expectations, as the community grapples with both the hype and the hard data behind recent AI advancements.
BSKY
emollick.bsky.social Ethan Mollick

I deleted this post since I think I was imprecise in the language. It is an interesting pre-registered RCT, but I should have been clearer that .3 SD is a modest effect size, the "big results" I was referring to is that it was cheap and had no apparent downsides. Paper: www.iza.org/publications...

❤️ 15 Likes|[Evaluation]
BSKY
emilymbender.bsky.social Emily M. Bender

We had SO MUCH FUN!! I'm curious to see what this will sound like as a pod ep.

❤️ 22 Likes|
BSKY
markriedl.bsky.social Mark Riedl

Legal system having a totally normal one today

❤️ 7 Likes|[Safety]
BSKY
markriedl.bsky.social Mark Riedl

Not too long ago someone I follow introduced a citation checking tool. I cannot find it anymore (and cannot search posts from only people I follow). Can anyone point me in the right direction? Thanks!

❤️ 3 Likes|[Tooling]
BSKY
emollick.bsky.social Ethan Mollick

New paper (on an old AI model) tests o1 against doctors on medical benchmarks & real ER cases: “across a variety of scenarios and applications, the large language model outperformed both human physicians and older models” The high potential of AI suggests an “urgent need for prospective trials.”

❤️ 51 Likes|[LLM][Evaluation]
BSKY
emilymbender.bsky.social Emily M. Bender

But wait there's more! Fresh off our live show in Brooklyn, @alexhanna.bsky.social and I will be doing the next MAIHT3K livestream on Monday May 4. We will be witnessing with dismay Bernie's descent into x-risk-ism. Monday, May 4, noon PT twitch.tv/dair_institute

❤️ 15 Likes|[Safety]
X
llm 0.31 released: supports GPT-5.5 and adds a verbosity parameter for controlling output detail on OpenAI's latest models.
[Tooling][LLM]
DeepSeek Summary: New version of llm CLI tool adds GPT-5.5 support and verbosity control.
X
llm 0.32a0 alpha: major backwards-compatible refactor. Models can now be prompted with a list of messages, OpenAI Chat Completions style.
[Tooling][LLM]
DeepSeek Summary: Alpha refactor enables message list prompting in llm CLI.
X
DeepSeek V4 - almost on the frontier, a fraction of the price.
[LLM][Deployment]
DeepSeek Summary: DeepSeek V4 offers near-frontier performance at a much lower cost.
X
hwchase17 Harrison Chase
as always, it's an exciting time to be working at LangChain!
[Tooling]
DeepSeek Summary: Harrison Chase retweets a post expressing excitement about working at LangChain.
X
hwchase17 Harrison Chase
TL;DR: More and more agents need a workspace: a computer where they can run code, install packages, and access files. Sandboxes provide this
[Agent][Infra]
DeepSeek Summary: Agents require a sandboxed workspace for code execution and file access.
X
hwchase17 Harrison Chase
In the hot path as the agent is running. The agent can decided to (or the user can prompt it to) update its memory as it is working on the core
[Agent][Tooling]
DeepSeek Summary: Agents can update memory during execution, either autonomously or by user prompt.
X
hwchase17 Harrison Chase
traces matter!
[Evaluation][Deployment]
DeepSeek Summary: Harrison Chase emphasizes the importance of tracing in LLM applications.
X
DrJimFan Jim Fan
The first time I met Jensen was also the first time I met @elonmusk. I was interning at OpenAI that day and
[Agent]
DeepSeek Summary: Jim Fan recounts meeting Jensen Huang and Elon Musk on the same day during his internship at OpenAI.
X
DrJimFan Jim Fan
Resource constraints are a beautiful thing. Survival instinct in a cut-throat AI competitive land
[Infra]
DeepSeek Summary: Jim Fan reflects on how resource constraints can drive innovation in competitive AI environments.
X
DrJimFan Jim Fan
I've been a bit quiet on X recently. The past year has been a transformational experience.
[Agent]
DeepSeek Summary: Jim Fan explains his reduced activity on X due to a transformative year.
X
DrJimFan Jim Fan
It gives me a lot of comfort knowing that we are the last generation without advanced robots everywhere.
[Multi-modal]
DeepSeek Summary: Jim Fan expresses comfort in being part of the last generation before widespread advanced robotics.
X
DrJimFan Jim Fan
Everyone's freaking out about vibe coding. In the holiday spirit, allow me to share my anxiety on the wild
[Tooling]
DeepSeek Summary: Jim Fan comments on the hype around 'vibe coding' and shares his own concerns.
X
jeremyphoward Jeremy Howard
Here's a complete unedited video of asking Grok for its views on the Israel/Palestine situation. It first searches twitter for what Elon thinks.
[Safety][LLM]
DeepSeek Summary: Jeremy Howard demonstrates Grok's behavior by asking it about Israel/Palestine, showing it searches Twitter for Elon Musk's views.
X
jeremyphoward Jeremy Howard
Here's what I would prefer to see:
[Agent]
DeepSeek Summary: Jeremy Howard expresses a preference for something, but full content is truncated.
X
soumithchintala Soumith Chintala
reading "AI News" (previously Smol Talk) is probably the highest-leverage 45 mins
[LLM]
DeepSeek Summary: Recommends reading the "AI News" digest (previously Smol Talk) as a high-leverage 45 minutes.
X
soumithchintala Soumith Chintala
Sometimes we forget that NVIDIA wins because it's a software company.
[Infra]
DeepSeek Summary: Attributes NVIDIA's success to software, not just hardware.
X
soumithchintala Soumith Chintala
MacStudio you ask? Apple Engineering's **actual** time spent on PyTorch support
[Infra]
DeepSeek Summary: Comments on Apple's engineering effort for PyTorch on MacStudio.
X
soumithchintala Soumith Chintala
anyone else feel burned out by a new AI breakthrough every week?
[LLM]
DeepSeek Summary: Expresses fatigue at the rapid pace of AI progress.
X
I think it's clear that for many smaller companies that invested in deep learning, it turned out
[Deployment]
DeepSeek Summary: Chollet notes that deep learning investments haven't paid off for many smaller companies.
X
Folks who work in AI or software engineering feel like the world is changing exponential fast.
[LLM]
DeepSeek Summary: Chollet observes that AI and software engineers perceive rapid exponential change.
X
To really understand a concept, you have to 'invent' it yourself in some capacity.
[Evaluation]
DeepSeek Summary: Chollet emphasizes active learning through reinvention.
X
Yann LeCun
To qualify as Science a piece of research must be correct and reproducible. To be correct and reproducible, ...
[Evaluation]
DeepSeek Summary: Yann LeCun defines science as research that is correct and reproducible.
X
Fei-Fei Li
Very excited to share @theworldlabs 's latest research work RTFM!! It's a real-time, ...
[Multi-modal][Agent]
DeepSeek Summary: Fei-Fei Li announces RTFM, a real-time research work from The World Labs.
X
minimaxir Max Woolf
me irl
[Multi-modal]
DeepSeek Summary: A simple personal post with an image.
X
minimaxir Max Woolf
@simonw
[Tooling]
DeepSeek Summary: Reply to Simon Willison.
X
srush_io Sasha Rush
⛏️
[Tooling]
DeepSeek Summary: A single pickaxe emoji post, possibly indicating a new tool or project.
X
srush_io Sasha Rush
No content extracted.
DeepSeek Summary: A post with 10 likes, content not available.
X
srush_io Sasha Rush
No content extracted.
DeepSeek Summary: A post with 7 likes, content not available.
X
I have been compiling LLM/VLM training logbooks/chronicles. This is the one of the best sources to
[LLM][Fine-tuning][Tooling]
DeepSeek Summary: Stas Bekman has been compiling logbooks/chronicles of LLM/VLM training, which he considers one of the best sources for understanding training processes.
X
The @PyTorch team are working on a new super important tool: https://t.co/rnfpDuvgOI This
[Infra][Tooling]
DeepSeek Summary: Stas Bekman highlights a new PyTorch tool (meta-pytorch/torchft) as super important for the ML community.
X
Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can
[Tooling][LLM]
DeepSeek Summary: Stas Bekman thanks Omar Nomad for a contribution to the Machine Learning Engineering Open Book, expanding its capabilities.
X
If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should
[Infra][Fine-tuning]
DeepSeek Summary: Stas Bekman tells those who had been holding off on DeepSpeed ZeRO++ that the master branch now looks ready to try.
X
sayakpaul Sayak Paul
Working at Hugging Face over the past 3.5+ years has allowed me to identify what technical areas truly interest me! In turn, that has allowed me to directly
[LLM][Infra]
DeepSeek Summary: Sayak Paul reflects on how his time at Hugging Face helped him identify his technical interests.
X
philschmid Philipp Schmid
Every good story has to end, and after 4 incredible years at @huggingface, it's time for me to start my next adventure. When I joined Hugging Face, we were a small team of 20 and the Hub had less than 5,000 models. Today, We are millions of developers, and thousands of models are released every day and deployed across every major cloud.
[Deployment][Infra]
DeepSeek Summary: Philipp Schmid announces departure from Hugging Face after 4 years, reflecting on the platform's growth from 20 people and 5,000 models to millions of developers and thousands of daily model releases.
X
Ethan Mollick
Very cool analysis of the submissions to a major management journal that shows how much the
[Evaluation]
DeepSeek Summary: Analysis of management journal submissions reveals interesting trends.
X
Ethan Mollick
On the plus side with Opus 4.7, if it does decide to think it produces BY FAR the best
[LLM]
DeepSeek Summary: Opus 4.7 produces by far its best output when it does decide to think.
X
Ethan Mollick
One thing thing about AI, for better and worse, is that 'everything around me is somebody's life
[Safety]
DeepSeek Summary: AI impacts real lives, for better or worse.
X
Ethan Mollick
AI is actually pretty good at ideas as well. https://t.co/AhnzrnkN03
[LLM]
DeepSeek Summary: AI is good at generating ideas.
X
Naomi Saphra
I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the
[Fine-tuning]
DeepSeek Summary: Naomi Saphra describes her research focus on understanding and improving NLP model training, particularly how structures and mechanistic behaviors emerge.
X
Naomi Saphra
New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions
[Evaluation]
DeepSeek Summary: Announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.
X
Naomi Saphra
This book starts like it's gonna be a fun microhistory of TB (it gave us the Stetson!
DeepSeek Summary: Naomi Saphra comments on a book about tuberculosis, noting its engaging start.
X
Angela Zhou
#throwback to the beginnings of a beautiful friendship =D @ansonmount @HellOnWheelsAMC
DeepSeek Summary: Angela Zhou shares a nostalgic throwback about the start of a friendship, tagging Anson Mount and the show Hell on Wheels.
X
Ben Recht
And awesome to see many Berkeley alums thriving here. @LaurentLessard, @DimitrisPapail, and Shivaram
[Evaluation]
DeepSeek Summary: Ben Recht expresses pride in seeing UC Berkeley alumni succeed in their careers.
X
Ben Recht
For the first time in almost a decade, I'm teaching a class on learning and control.
[Fine-tuning]
DeepSeek Summary: Ben Recht announces he is teaching a course on learning and control after a long hiatus.
X
Ben Recht
Why does framing decision, design, and discovery as optimization remain so irresistible?
[Evaluation]
DeepSeek Summary: Ben Recht questions the persistent appeal of optimization as a framework for decision-making and design.
-- END OF LOG --
[STATS] 54 items · Filter applied
Powered by Horizon + DeepSeek