Intelligence.Log

2026-05-04

Extracted: 64 items. Sources: GitHub, Bluesky, X, Blogs.

++ AI OVERVIEW ++

Today’s discourse centered on research integrity in AI. Mark Riedl shared a reference checker he wrote to catch hallucinated citations in papers he is reviewing. He noted that PDF-to-structured-text conversion is still an open problem, that reference formats vary and some are hard to parse, and that even correct references can be sloppy. Riedl also pointed out that a viral paper on the benefits of ChatGPT in education relied on unsound meta-review methodology. Separately, Nathan Lambert argued that the API attacks by some Chinese labs need a name other than “distillation”, so that the technique itself is not tarnished. On a lighter note, Marc Lanctot celebrated the Tampa Bay Lightning’s playoff elimination by the Montreal Canadiens, a team he jokingly described as a Costco employee, a European, a rookie goalie, and a bunch of irrelevant players. Research reliability remains the more pressing thread for developers and researchers.

danshapiro/trycycle ★ 0.2k ▲ 6/10

Starred by simonw | [Tooling][Evaluation]

“Trycycle is a tool that helps developers iterate quickly on AI prompts by automatically generating and testing variations, making it easier to find the best prompt for a given task. It integrates with popular AI models and provides a simple CLI interface for prompt experimentation.”

danshapiro/ringdown ★ 0.0k ▲ 4/10

Starred by simonw | [Tooling]

“Ringdown is a lightweight Python tool for recording and replaying HTTP responses, useful for testing and development. It simplifies mocking external APIs by capturing real responses and serving them offline.”
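A minimal sketch of the record-and-replay idea described above, in Python. It is not Ringdown's API: the `Cassette` class, its `record`/`replay` modes, the JSON file format, and the simplified `(status, body)` response are all hypothetical. The real network call is abstracted as an injected `fetch` function, so the example runs offline.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Simplified response model (hypothetical): (status code, body text).
Response = Tuple[int, str]
Fetch = Callable[[str, str], Response]


class Cassette:
    """Hypothetical record/replay store; NOT Ringdown's actual API."""

    def __init__(self, path: Path, mode: str = "replay") -> None:
        if mode not in ("record", "replay"):
            raise ValueError("mode must be 'record' or 'replay'")
        self.path = Path(path)
        self.mode = mode
        # Load previously recorded responses, if any.
        self.entries: Dict[str, List] = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    @staticmethod
    def _key(method: str, url: str) -> str:
        return f"{method.upper()} {url}"

    def request(self, method: str, url: str, fetch: Fetch) -> Response:
        key = self._key(method, url)
        if self.mode == "replay":
            # Offline: serve the stored response and never touch the network.
            if key not in self.entries:
                raise KeyError(f"no recorded response for {key}")
            status, body = self.entries[key]
            return status, body
        # Record mode: call the real API once and persist its response.
        status, body = fetch(method, url)
        self.entries[key] = [status, body]
        self.path.write_text(json.dumps(self.entries, indent=2))
        return status, body


# Usage: a fake fetcher stands in for a real HTTP client.
calls: List[str] = []


def fake_fetch(method: str, url: str) -> Response:
    calls.append(url)
    return 200, '{"ok": true}'


def no_network(method: str, url: str) -> Response:
    raise RuntimeError("network access attempted during replay")


with tempfile.TemporaryDirectory() as tmp:
    cassette_file = Path(tmp) / "api.json"
    recorded = Cassette(cassette_file, mode="record").request(
        "get", "https://api.example.com/items", fake_fetch
    )
    replayed = Cassette(cassette_file, mode="replay").request(
        "GET", "https://api.example.com/items", no_network
    )
```

Keying on the upper-cased method plus URL is the simplest possible matching rule; a real tool would likely also consider headers, query ordering, or request bodies.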

Fusion/pngsource ★ 0.1k ▲ 3/10

Embed source code in png files

Starred by minimaxir | [Tooling]

“This repository allows embedding source code into PNG images, enabling a novel way to distribute code alongside visual assets. It provides a simple tool to encode and decode code within image files, useful for sharing or obfuscation.”
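One generic way to carry source code inside a PNG, sketched in Python with only the standard library. The entry does not say how pngsource actually encodes code. This sketch assumes a standard uncompressed `iTXt` text chunk under a hypothetical `source` keyword, inserted just before `IEND`, so the image still decodes normally.

```python
import struct
import zlib
from typing import Iterator, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
KEYWORD = b"source"  # hypothetical keyword; pngsource may use something else


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """PNG chunk = 4-byte length + 4-byte type + data + CRC-32(type + data)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes) -> Iterator[Tuple[bytes, bytes, int]]:
    """Yield (type, data, offset) for every chunk after the signature."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length], pos
        pos += 12 + length  # length + type + data + CRC


def embed_source(png: bytes, source: str) -> bytes:
    """Insert an uncompressed iTXt chunk with UTF-8 source just before IEND."""
    # iTXt layout: keyword\0, compression flag, compression method,
    # language tag\0, translated keyword\0, text (all empty except the text).
    payload = KEYWORD + b"\x00" + b"\x00\x00" + b"\x00" + b"\x00" + source.encode("utf-8")
    for chunk_type, _, offset in iter_chunks(png):
        if chunk_type == b"IEND":
            return png[:offset] + make_chunk(b"iTXt", payload) + png[offset:]
    raise ValueError("PNG has no IEND chunk")


def extract_source(png: bytes) -> Optional[str]:
    """Return the embedded source, or None if no matching iTXt chunk exists."""
    prefix = KEYWORD + b"\x00"
    for chunk_type, data, _ in iter_chunks(png):
        if chunk_type != b"iTXt" or not data.startswith(prefix):
            continue
        rest = data[len(prefix):]
        if rest[0] != 0:  # this sketch only handles uncompressed text
            raise ValueError("compressed iTXt is not supported in this sketch")
        rest = rest[2:]  # skip compression flag and method
        _, _, rest = rest.partition(b"\x00")  # skip language tag
        _, _, rest = rest.partition(b"\x00")  # skip translated keyword
        return rest.decode("utf-8")
    return None


def tiny_png() -> bytes:
    """A valid 1x1 8-bit grayscale PNG, built here so the example is self-contained."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte 0 + one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


image = embed_source(tiny_png(), "print('hello')\n")
```

`iTXt` is used instead of `tEXt` because `tEXt` is limited to Latin-1, while source code is usually UTF-8.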

BSKY

Mark Riedl · May 4, 01:29 AM

I wrote a reference checker to see if papers I am reviewing have hallucinated references. It's a ghastly problem. PDF-to-structured-text is still an open problem. Reference formats can vary and some are hard to parse. Even when references are correct, there can be sloppiness.

❤️ 30 Likes | [Evaluation][Tooling]
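A minimal sketch, in Python, of one step such a reference checker might perform: fuzzy-matching cited titles against a set of known-good titles and flagging any that have no close match. The post does not describe Riedl's implementation. The `flag_suspect` helper, the 0.9 threshold, and the in-memory `known` list are hypothetical. A real checker would query a bibliographic database and would first need the PDF-to-structured-text step that the post calls an open problem.

```python
import difflib
import re
from typing import Iterable, List, Tuple


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    title = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(title.split())


def best_match(title: str, known: Iterable[str]) -> Tuple[str, float]:
    """Return the closest known title and its similarity ratio in [0, 1]."""
    target = normalize(title)
    best, score = "", 0.0
    for candidate in known:
        ratio = difflib.SequenceMatcher(None, target, normalize(candidate)).ratio()
        if ratio > score:
            best, score = candidate, ratio
    return best, score


def flag_suspect(cited: Iterable[str], known: List[str], threshold: float = 0.9) -> List[str]:
    """Cited titles whose best match falls below the (hypothetical) threshold."""
    return [t for t in cited if best_match(t, known)[1] < threshold]


# Toy data: two real paper titles and one made-up citation.
known = ["Attention Is All You Need", "Deep Residual Learning for Image Recognition"]
cited = [
    "Attention is all you need.",
    "Deep residual learning for image recognition",
    "Neural Cheese Synthesis at Scale",  # invented title standing in for a hallucination
]
suspects = flag_suspect(cited, known)
```

The threshold trades off the two failure modes the post hints at. Set it too high, and sloppy but genuine references get flagged. Set it too low, and plausible-sounding fabrications slip through.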

BSKY

Marc Lanctot · May 4, 12:57 AM

The Tampa Bay Lightning literally just got eliminated by a Costco employee, a European, a rookie goalie, and an bunch of irrelevant players 🤣🤣🤣 Oh and with just 9 shots on net! 😁 Na na na na 🎵, na na na na 🎶, eyyaayyy goodbye 👋👋👋 #gohabsgo round two bring on the Sabres and see you in Buffalo!! 🥳

❤️ 9 Likes

BSKY

Simon Willison · May 4, 11:50 PM

I tried running the same "Generate an SVG of a pelican riding a bicycle" prompt against 21 different quantized variants of the same IBM Granite 4.1 3B model - the results weren't as interesting as I had hoped simonwillison.net/2026/May/4/g...

❤️ 27 Likes | [Evaluation][Deployment]

BSKY

Mark Riedl · May 4, 08:46 PM

It's going to be a pin, or a pen, or earbuds, or a phone...

❤️ 0 Likes | [Deployment]

BSKY

Mark Riedl · May 4, 06:54 PM

oof

❤️ 9 Likes

BSKY

Mark Riedl · May 4, 06:39 PM

On this May the Fourth, let us step back for a moment to think about how, very soon, "The Mandalorian & Grogu" will supplant "Attack of the Clones" for the Star Wars movie with the cringiest title.

❤️ 2 Likes

BSKY

Mark Riedl · May 4, 06:34 PM

That viral paper on the benefits of ChatGPT in education was using unsound meta-review methodologies. This does not mean that there are no benefits or anti-benefits of AI, only that the conclusions drawn in the paper cannot be drawn www.404media.co/nature-retra...

❤️ 34 Likes | [Evaluation]

BSKY

Nathan Lambert · May 4, 04:44 PM

We need to create a new term for the attacks some Chinese labs are doing on APIs that is different than distillation or else we risk tarnishing a crucial technique that is crucial to AI diffusion, academic research & the open-source ecosystem. www.interconnects.ai/p/the-distil...

❤️ 18 Likes | [Safety]

BSKY

Ethan Mollick · May 4, 05:51 PM

It is somewhat comforting that now, whenever I see a post about “here’s the thing that keeps me up at night” I know that there is absolutely no chance that this is being written by a human who is staying up all night.

❤️ 49 Likes | [Safety]

BSKY

Ethan Mollick · May 4, 04:54 PM

This is from the co-founder of Anthropic, interesting that he refers to public sources when he is also obviously privy to lots of internal sources that he cannot discuss. I assume he sees the same thing at Anthropic. importai.substack.com/p/import-ai-...

❤️ 69 Likes | [LLM][Safety]

BSKY

Ethan Mollick · May 4, 04:35 AM

Poems that ChatGPT, Claude, and Gemini all seem to "like" or suggest when you ask for poetry related to being/making LLMs: Rilke's "Archaic Torso of Apollo" Stevens' "Idea of Order at Key West" Borges's "The Golem" (or "The Other Tiger") Pessoa's "Autopsychography" Pretty apt choices!

❤️ 43 Likes | [LLM]

BSKY

Emily M. Bender · May 4, 01:01 PM

Today!

❤️ 2 Likes

BSKY

Ben Recht · May 4, 09:25 PM

Easy Bay Friends: Tomorrow at Berkeley, the Social Science Matrix is hosting a conversation between Marion Fourcade and me about The Irrational Decision. More info and registration link here: matrix.berkeley.edu/events/the-i...

❤️ 5 Likes

BSKY

Ben Recht · May 4, 09:16 PM

5/4 for 5/4

❤️ 4 Likes

Andrej Karpathy @karpathy

The hottest new programming language is English

[LLM][Tooling]

“DeepSeek Summary: Karpathy suggests that natural language is becoming the dominant way to program, thanks to AI.”

Andrej Karpathy @karpathy

LLM Knowledge Bases Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest.

[LLM][RAG]

“DeepSeek Summary: Karpathy advocates using LLMs to create personal knowledge bases for research.”

Simon Willison @simonw

It's interesting how "better at code" has become the defining goal of almost every AI lab over the

[LLM][Tooling]

“DeepSeek Summary: Simon observes that AI labs are increasingly focused on improving code generation capabilities as a primary objective.”

Simon Willison @simonw

I've published video, slides and a detailed annotated transcript from my talk at this week's

[LLM]

“DeepSeek Summary: Simon shares video, slides, and an annotated transcript from his talk on the last six months in LLMs, illustrated by pelicans on bicycles.”

Simon Willison @simonw

This may be the best guidance I've seen anywhere on writing a really good commit history.

[Tooling]

“DeepSeek Summary: Simon highlights what he calls possibly the best guidance he has seen on writing a good commit history.”

Harrison Chase @hwchase17

A brilliant surgeon without instruments, nurses, or an operating room is almost useless. The skill is real. But without the system around them, it goes nowhere.

[Infra][Agent][Tooling]

“DeepSeek Summary: Skill alone is insufficient without supporting infrastructure.”

Harrison Chase @hwchase17

RT @samecrowder: as always, it's an exciting time to be working at LangChain!

[LLM]

“DeepSeek Summary: Retweet expressing excitement about working at LangChain.”

Harrison Chase @hwchase17

Christian was a big part of the idea of middleware! He's going to help make langchain and langgraph agents more

[Agent][Infra]

“DeepSeek Summary: Acknowledges contribution to middleware concept for LangChain agents.”

Harrison Chase @hwchase17

TL;DR: More and more agents need a workspace: a computer where they can run code, install packages, and access files. Sandboxes provide this

[Agent][Infra][Tooling]

“DeepSeek Summary: Agents require sandboxed workspaces for code execution.”

Jim Fan @DrJimFan

Resource constraints are a beautiful thing. Survival instinct in a cut-throat AI competitive land

[Agent]

“DeepSeek Summary: Resource constraints drive innovation and survival in competitive AI landscape.”

Jim Fan @DrJimFan

I've been a bit quiet on X recently. The past year has been a transformational experience.

[Multi-modal]

“DeepSeek Summary: Jim Fan reflects on a transformative year and his reduced activity on X.”

Jim Fan @DrJimFan

It gives me a lot of comfort knowing that we are the last generation without advanced robots everywhere.

[Multi-modal]

“DeepSeek Summary: Perspective on the imminent ubiquity of advanced robotics.”

Jim Fan @DrJimFan

Everyone's freaking out about vibe coding. In the holiday spirit, allow me to share my anxiety on the wild

[Agent]

“DeepSeek Summary: Commentary on the 'vibe coding' trend and its implications.”

Jeremy Howard @jeremyphoward

Here's a complete unedited video of asking Grok for its views on the Israel/Palestine situation. It first searches twitter for what Elon thinks.

[Safety][LLM]

“DeepSeek Summary: Jeremy Howard posted a video of asking Grok about Israel/Palestine, noting it first searches Twitter for Elon Musk's views.”

Soumith Chintala @soumithchintala

reading "AI News" (previously Smol Talk) is probably the highest-leverage 45 mins

[LLM]

“DeepSeek Summary: Recommends a newsletter as high-leverage reading.”

Soumith Chintala @soumithchintala

Sometimes we forget that NVIDIA wins because it's a software company.

[Infra]

“DeepSeek Summary: Attributes NVIDIA's success to software, not just hardware.”

Soumith Chintala @soumithchintala

Open LLMs need to get organized and co-ordinated about sharing human feedback.

[LLM][Safety]

“DeepSeek Summary: Calls for coordination among open LLM developers on human feedback.”

Soumith Chintala @soumithchintala

MacStudio you ask? Apple Engineering's **actual** time spent on PyTorch support

[Infra]

“DeepSeek Summary: Comments on Apple's engineering effort for PyTorch on Mac Studio.”

Francois Chollet @fchollet

I think it's clear that for many smaller companies that invested in deep learning, it turned out

[Deployment]

“DeepSeek Summary: Smaller companies that invested in deep learning faced challenges.”

Francois Chollet @fchollet

Folks who work in AI or software engineering feel like the world is changing exponential fast.

[Evaluation]

“DeepSeek Summary: Chollet observes that people working in AI or software engineering feel the world is changing exponentially fast.”

Yann LeCun @ylecun

Yann LeCun's $1B Bet Against LLMs

[LLM][Agent]

“DeepSeek Summary: Yann LeCun is taking a $1 billion bet against large language models, promoting alternative AI approaches.”

Fei-Fei Li @drfeifei

Very excited to share @theworldlabs 's latest research work RTFM!! It's a real-time, ...

[Multi-modal]

“DeepSeek Summary: Fei-Fei Li announces RTFM research from World Labs, focusing on real-time spatial intelligence.”

Max Woolf @minimaxir

LOL

“DeepSeek Summary: Max Woolf posted a simple reaction 'LOL'.”

Sasha Rush @srush_io

#acl2020nlp Lot of threads online about likes and dislikes for the conference. Twitter is fleeting, github is forever. Send issues or PRs: https://github.com/Mini-Conf/Mini-Conf/issues… It's early days, we're making up virtual conferences as we go along.

[Infra]

“DeepSeek Summary: Sasha Rush advocates for using GitHub over Twitter for lasting conference feedback, and acknowledges the experimental nature of virtual conferences.”

Sasha Rush @srush_io

(My last chance to tweet about Yoon Kim as he leaves the lab 😢. Part of an amazing group of students.) Congrats to Yoon on winning this year's HarvardCS thesis award! And since its public, Yoon is heading next to MIT. Highly recommend sending an app🍎 https://seas.

[Evaluation]

“DeepSeek Summary: Sasha Rush congratulates student Yoon Kim on winning a thesis award and announces his move to MIT.”

Sasha Rush @srush_io

Congrats to Dr. Yoon Kim 🍾 who zoom defended his dissertation "Deep Latent Variable Model of Natural Language". Yoon's research is wonderful, he's also such a thoughtful teacher and dedicated collaborator. Very curious what he decides to do next

[Evaluation]

“DeepSeek Summary: Sasha Rush celebrates Yoon Kim's PhD defense and praises his research and character.”

Sasha Rush @srush_io

Composer is a new model we built at Cursor. We used RL to train a big MoE model to be really good at real-world coding, and also very fast. https://cursor.com/blog/composer Excited for the potential of building specialized models to help in critical domains.

[LLM][Fine-tuning][Tooling]

“DeepSeek Summary: Sasha Rush announces Composer, a new RL-trained MoE coding model from Cursor, emphasizing speed and real-world coding performance.”

Stas Bekman @stas00

If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

[Infra][Deployment]

“DeepSeek Summary: Stas Bekman suggests that DeepSpeed ZeRO++ should now be usable from the master branch and encourages users who were holding off to try it.”

Stas Bekman @stas00

Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

[Infra][Tooling]

“DeepSeek Summary: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul for evaluating ML hardware.”

Stas Bekman @stas00

If you're trying out FA4, you're likely to run into not being able to load cutlass.cute

[Infra][Tooling]

“DeepSeek Summary: Stas Bekman warns about a common issue with FlashAttention-4 where cutlass.cute fails to load.”

Stas Bekman @stas00

Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

[Tooling]

“DeepSeek Summary: Stas Bekman acknowledges a contribution to the Machine Learning Engineering Open Book, adding new content.”

Sayak Paul @sayakpaul

Working at Hugging Face over the past 3.5+ years has allowed me to identify what technical areas truly interest me! In turn, that has allowed me to directly

[LLM][Deployment][Tooling]

“DeepSeek Summary: Reflects on how working at Hugging Face helped identify technical interests.”

Philipp Schmid @philschmid

I read three technical reports from Moonshot AI's Kimi K2.5 paper, Cursor's Composer 2 report and blog post, and Chroma's Context-1 write-up

[Agent][LLM][Tooling]

“DeepSeek Summary: Philipp Schmid read three technical reports: Moonshot AI's Kimi K2.5 paper, Cursor's Composer 2 report, and Chroma's Context-1 write-up.”

Philipp Schmid @philschmid

Random thought. We are going to be so much faster at creating and building.

[Agent][Deployment]

“DeepSeek Summary: He reflects on the accelerating pace of creation and building.”

Philipp Schmid @philschmid

Skills have become one of the most used extension points in agents. They're flexible, easy to make, and simple to distribute.

[Agent][Tooling]

“DeepSeek Summary: He notes that Skills are a key extension point for agents.”

Ethan Mollick @emollick

Here is a full implementation of the Chinese Room using a printed copy of GPT-1, in case you have a few spare years and want to actually run

[LLM][Safety]

“DeepSeek Summary: Ethan Mollick humorously describes a thought experiment implementation of the Chinese Room using a printed GPT-1, highlighting the impracticality of running it manually.”

Ethan Mollick @emollick

The fact that no current AI models, often including GPT-5, believe in the existence of

[LLM][Evaluation]

“DeepSeek Summary: Mollick points out that no current AI models, often including GPT-5, believe in the existence of something; the excerpt is cut off before saying what.”

Ethan Mollick @emollick

So much work is going into faking continual learning and memory for AIs,

[LLM][Fine-tuning]

“DeepSeek Summary: Mollick criticizes the focus on simulating continual learning and memory in AI rather than achieving genuine capabilities.”

Ethan Mollick @emollick

Talking about the ethics of AI companies or personalities, or discussing the potential of

[Safety][Deployment]

“DeepSeek Summary: Mollick engages in discussions about AI ethics and the potential of AI technologies.”

Naomi Saphra @NaomiSaphra

I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the

[LLM][Fine-tuning][Evaluation]

“DeepSeek Summary: Naomi Saphra describes her research focus on understanding and improving NLP model training, specifically how structures and mechanistic behaviors emerge.”

Naomi Saphra @NaomiSaphra

Naomi Saphra (@nsaphra). 237 likes. New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

[Safety][Evaluation][LLM]

“DeepSeek Summary: Announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.”

Naomi Saphra @NaomiSaphra

Just got a desk reject, post-rebuttals, for a paper being submitted to arxiv <30 min late for

[Evaluation][Fine-tuning]

“DeepSeek Summary: Naomi Saphra shares an experience of receiving a desk reject after rebuttals due to a paper being submitted to arXiv less than 30 minutes late.”

Angela Zhou @angelamczhou

#throwback to the beginnings of a beautiful friendship =D @ansonmount @HellOnWheelsAMC

[Agent]

“DeepSeek Summary: Angela Zhou shares a throwback post about the start of a friendship, tagging @ansonmount and @HellOnWheelsAMC.”

Ben Recht @beenwrekt

For the first time in almost a decade, I'm teaching a class on learning and control.

[Evaluation]

“DeepSeek Summary: Ben Recht announces teaching a class on learning and control after a long hiatus.”

Ben Recht @beenwrekt

Building a theory of the architecture of organizing machines and people.

[Infra]

“DeepSeek Summary: He is working on a theory for organizing machines and people.”

Ben Recht @beenwrekt

Fully open machine learning requires not only GPU access but a community commitment to openness.

[Infra][Safety]

“DeepSeek Summary: He argues that open ML needs both GPU access and community commitment.”

BLOG

The distillation panic

‘Distillation attacks’ is a horrible term for what is happening right now.

By Nathan Lambert

“The post criticizes the term 'distillation attacks' as misleading and argues that the current trend of smaller models learning from larger ones is a natural and beneficial progression in AI development.”

-- END OF LOG --

[STATS] 64 items · Filter applied