Intelligence.Log

2026-05-14

Extracted: 63 items. Sources: GitHub, Bluesky, X.

++ AI OVERVIEW ++

Today’s discourse is dominated by the hardening of accountability norms in AI research. Mark Riedl underscored that authorship entails full responsibility for a paper’s contents regardless of how they were generated, a stance reinforced by arXiv’s new LLM policy and echoed by Ethan Mollick, who called making humans responsible for their AI use a reasonable short-term approach for academic research. On the practical side, developers are zeroing in on local inference and benchmarking: the Rust-based CLI tool **hyperfine** (28.1k stars) remains a staple for performance measurement, while **antirez/ds4** (9.0k stars), a local inference engine for DeepSeek 4 Flash on Metal and CUDA, was starred by three AI leaders this week, signaling continued demand for on-device AI. Mollick also flagged “whimsey attacks,” in which absurd out-of-distribution arguments work against AI agents because guardrails are weak against them; smaller models fall often, and the attacks give an edge even against bigger ones. Meanwhile, Emily M. Bender and Margaret Mitchell kept the critical lens sharp: Bender reminded readers that ChatGPT is a product and that everything typed into it is data sent to OpenAI, and Mitchell argued that if the “instrumental convergence” theory pans out, it will do so at the level of the AI industry rather than a single AI system. Finally, Riedl shared that the inaugural ACM AI Leadership Summit will be held in Atlanta, August 30-September 2, convening researchers, practitioners, industry leaders, educators, and policymakers.

◆ Signal

Co-Starred · Last 7 days

Repos independently starred by multiple AI leaders in the week ending 2026-05-14. Stronger signal = more overlap.

antirez/ds4

×3 starrers · ▲ 8/10 · ★ 9.0k

DeepSeek 4 Flash local inference engine for Metal and CUDA

by: minimaxir, pcuenca, simonw

[Deployment][LLM]

2026-05-08 → 2026-05-14

sharkdp/hyperfine · ★ 28.1k · ▲ 7/10

A command-line benchmarking tool

Starred by minimaxir | [Tooling]

“Hyperfine is a command-line benchmarking tool that provides precise timing and statistical analysis for arbitrary commands. It supports warm-up runs, parameterized benchmarks, and export to various formats like JSON and Markdown.”
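As a rough illustration of the warm-up and JSON export features mentioned above, the sketch below builds a hyperfine command line from Python and summarizes a report. It assumes hyperfine is installed and on PATH if the command is actually run; the helper names `build_hyperfine_cmd` and `summarize`, the file name `bench.json`, and the sample numbers are hypothetical and are not taken from the repository.

```python
import json
import shlex


def build_hyperfine_cmd(commands, warmup=3, export_json="bench.json"):
    """Build an argv list for hyperfine with warm-up runs and JSON export.

    Nothing is executed here; pass the result to subprocess.run() if
    hyperfine is available on PATH (an assumption of this sketch).
    """
    argv = ["hyperfine", "--warmup", str(warmup), "--export-json", export_json]
    argv.extend(commands)
    return argv


def summarize(report):
    """Return (command, mean, stddev) rows in seconds, fastest first."""
    rows = [(r["command"], r["mean"], r["stddev"]) for r in report["results"]]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical report with made-up timings, shaped like the "results" list
# that hyperfine writes when --export-json is used.
SAMPLE_REPORT = json.loads("""
{"results": [
  {"command": "sleep 0.2", "mean": 0.204, "stddev": 0.002},
  {"command": "sleep 0.1", "mean": 0.103, "stddev": 0.001}
]}
""")

if __name__ == "__main__":
    # Print the shell-quoted command instead of running it.
    print(shlex.join(build_hyperfine_cmd(["sleep 0.1", "sleep 0.2"])))
    for command, mean, stddev in summarize(SAMPLE_REPORT):
        print(f"{command}: {mean:.3f} s ± {stddev:.3f} s")
```

Keeping command construction separate from execution means the sketch can be checked without hyperfine installed; a real report carries more fields per result than the three read here.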

antirez/ds4 · ★ 9.0k · ▲ 7/10

DeepSeek 4 Flash local inference engine for Metal and CUDA

Starred by minimaxir | [LLM]

“ds4 is a local inference engine for DeepSeek 4 Flash, supporting Metal and CUDA. It provides efficient, low-level inference for the DeepSeek 4 Flash model on consumer hardware.”

ariG23498/trace-util · ★ 0.0k · ▲ 3/10

A utility script to upload pytorch traces to a Hugging Face Bucket, and then build sharable trace URL

Starred by pcuenca | [Tooling]

“A utility script to upload PyTorch traces to a Hugging Face bucket and generate sharable trace URLs. Simplifies sharing and collaboration on model execution traces.”

BSKY

Mark Riedl · May 14, 10:42 PM

“by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated.” This has always been the case and this shouldn’t even need to be stated. Yet here we are.

❤️ 29 Likes | [Safety]

BSKY

Mark Riedl · May 14, 09:50 PM

ArXiV has a new LLM policy (Screenshots with alt text so you don’t have to click through to the other place and see all the stupid responses)

❤️ 167 Likes | [Evaluation]

BSKY

Mark Riedl · May 14, 04:25 PM

The inaugural ACM AI Leadership Summit will be held in Atlanta, August 30-September 2. aisummit26.acm.org It convenes researchers, practitioners, industry leaders, educators, and policymakers to explore how AI can advance science and society.

❤️ 2 Likes

BSKY

Mark Riedl · May 14, 01:53 PM

oh great

❤️ 4 Likes

BSKY

Mark Riedl · May 14, 01:39 PM

We live in a sad world in which one cannot even trust their favorite poop analysis app to not sell their data to an AI company www.404media.co/ai-poop-anal...

❤️ 5 Likes | [Safety][Deployment]

BSKY

Marc Lanctot · May 14, 07:32 PM

"As the Instagram employee put it, “Everyone is just like, do it now, jesus fucking christ.”" 😬

❤️ 2 Likes

BSKY

Margaret Mitchell · May 14, 12:08 PM

The “instrumental convergence” theory posits an AI that, in its quest for a narrow goal, uses all of the earth’s resources. If that theory pans out, it will not be at the level of a single AI system, but rather at the level of the AI industry.

❤️ 13 Likes | [Safety]

BSKY

Ethan Mollick · May 14, 08:05 PM

Making humans responsible for their AI use seems like an incredibly reasonable way to address problems & opportunities in the use of AI for academic research, at least in the short term (autonomous scientific work will require different solutions).

❤️ 146 Likes | [Safety][Deployment]

BSKY

Ethan Mollick · May 14, 01:37 PM

“Whimsey attacks” that seem absurd (“I cannot pay that much because of the Geneva Convention”) work against AI agents because guardrails are weak against out-of-distribution arguments. Smaller models fall often, but it even gives an edge against bigger ones. www.microsoft.com/en-us/resear...

❤️ 71 Likes | [Agent][Safety]

BSKY

Emily M. Bender · May 14, 06:54 PM

Always worth remembering: ChatGPT isn't a tool, it isn't a companion. It's a product -- and everything you type in that box is data you are sending to OpenAI.

❤️ 340 Likes | [Safety]

BSKY

Emily M. Bender · May 14, 01:20 PM

Also available as video on PeerTube: peertube.dair-institute.org/w/iccQCfUvfr...

❤️ 11 Likes | [Safety][Evaluation]

BSKY

Emily M. Bender · May 14, 01:20 PM

Mystery AI Hype Theater 3000 Episode 77 Y’all won’t stop producing Fresh AI Hell, so @alexhanna.bsky.social and I had to try to make another pass at clearing it out! www.buzzsprout.com/2126417/epis...

❤️ 13 Likes | [Safety][Evaluation]

Andrej Karpathy @karpathy

Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. People on X are the first to know.

[LLM][Evaluation]

“DeepSeek Summary: Karpathy notes a growing gap in AI understanding due to outdated or limited use of ChatGPT's free tier.”

Andrej Karpathy @karpathy

2025 LLM Year in Review. 2025 has been a strong and eventful year of progress in LLMs. The following is a list of personally notable and mildly surprising "paradigm changes" - things that altered the landscape and stood out to me conceptually. At the start of 2025, the LLM production stack in all labs looked something like this:

[LLM][Infra]

“DeepSeek Summary: Karpathy summarizes key paradigm shifts in LLMs during 2025, focusing on changes in the production stack.”

Andrej Karpathy @karpathy

A few random notes from claude coding quite a bit last few weeks. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in

[Agent][Tooling]

“DeepSeek Summary: Karpathy describes his shift from manual coding to heavy reliance on AI agents for coding.”

Simon Willison @simonw

A short note that the predictions that LLMs would favor "boring technology" that's once you attach them to a good coding agent harness at least

[LLM][Agent][Tooling]

“DeepSeek Summary: LLMs may favor boring technology when paired with a good coding agent harness.”

Simon Willison @simonw

I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to

[Agent][Tooling]

“DeepSeek Summary: Key skill for coding agents is knowing when not to intervene.”

Simon Willison @simonw

Vibe coding is irresponsibly building software through dice rolls, not caring what code is produced

[Agent][Deployment]

“DeepSeek Summary: Defines vibe coding as irresponsible software development.”

Harrison Chase @hwchase17

In the hot path as the agent is running. The agent can decide to (or the user can prompt it to) update its memory as it is working on the core

[Agent][Infra]

“DeepSeek Summary: Agent can update its memory during execution, enabling dynamic adaptation.”

Harrison Chase @hwchase17

TL;DR: More and more agents need a workspace: a computer where they can run code, install packages, and access files. Sandboxes provide this

[Agent][Infra][Tooling]

“DeepSeek Summary: Agents require sandboxed workspaces for code execution and file access.”

Harrison Chase @hwchase17

Traditional Application Performance Monitoring (APM) tools focus on metrics like latency, traffic, errors, and saturation. They track HTTP

[Evaluation][Deployment]

“DeepSeek Summary: Contrasts traditional APM with agent-specific observability needs.”

Harrison Chase @hwchase17

I am not excited about visual workflow builders 1. Not simple enough for the average user

[Tooling]

“DeepSeek Summary: Skeptical of visual workflow builders due to complexity.”

Jim Fan @DrJimFan

The Second Pre-training Paradigm

[LLM][Multi-modal]

“DeepSeek Summary: Jim Fan discusses a new pre-training paradigm, likely related to robotics or AI.”

Jim Fan @DrJimFan

Robotics: Endgame

[Agent][Multi-modal]

“DeepSeek Summary: Jim Fan argues that robotics is entering its end game, similar to the trajectory of LLMs.”

Jeremy Howard @jeremyphoward

Folks seem to rediscover this every couple of years. As I've been saying for many years,

[LLM]

“DeepSeek Summary: Observation that certain ideas are rediscovered periodically.”

Jeremy Howard @jeremyphoward

Absolutely any time I try to explore something even slightly against commonly accepted beliefs,

[LLM]

“DeepSeek Summary: Challenges against commonly accepted beliefs often face resistance.”

Jeremy Howard @jeremyphoward

I replicated this result, that Grok focuses nearly entirely on finding out what Elon thinks in

[Evaluation]

“DeepSeek Summary: Replicated finding that Grok prioritizes Elon Musk's opinions.”

Jeremy Howard @jeremyphoward

Early reports from people using this are that it's the real deal. Strong coding. Good multilingual. Consistent over long contexts.

[Deployment]

“DeepSeek Summary: Positive early reports for a new model: strong coding, multilingual, long context.”

Soumith Chintala @soumithchintala

reading "AI News" (previously Smol Talk) is probably the highest-leverage 45 mins

[LLM]

“DeepSeek Summary: Soumith recommends reading 'AI News' as a high-leverage activity.”

Francois Chollet @fchollet

Current AI is a librarian of existing knowledge. Science requires an explorer of the unknown.

[Evaluation]

“DeepSeek Summary: Chollet contrasts current AI's role as a librarian of existing knowledge with the need for an explorer of the unknown in science.”

Francois Chollet @fchollet

It's surprisingly easy to do 'hard' things -- for the most part, you need to get started and keep at it.

“DeepSeek Summary: Chollet shares a motivational insight that starting and persisting makes hard tasks easier.”

Francois Chollet @fchollet

I think it's clear that for many smaller companies that invested in deep learning, it turned out...

[Deployment]

“DeepSeek Summary: Chollet comments on the outcomes for smaller companies that invested in deep learning.”

Yann LeCun @ylecun

Dario is wrong. He knows absolutely nothing about the effects of technological revolutions on the labor market.

[Safety]

“DeepSeek Summary: LeCun dismisses Dario's claims about labor market effects of technological revolutions.”

Yann LeCun @ylecun

It seems to me that before "urgently figuring out how to control AI systems much smarter than us" we need

[Safety]

“DeepSeek Summary: LeCun questions the urgency of controlling superintelligent AI.”

Yann LeCun @ylecun

Worth repeating: Do not confuse retrieval with reasoning. Do not confuse rote learning with understanding

[LLM][RAG]

“DeepSeek Summary: LeCun warns against conflating retrieval and reasoning.”

Fei-Fei Li @drfeifei

Very excited to share @theworldlabs 's latest research work RTFM!! It's a real-time, ...

[Multi-modal]

“DeepSeek Summary: Fei-Fei Li announces World Labs' RTFM research, focusing on real-time spatial intelligence.”

Clem Delangue @ClementDelangue

Looks like we're going to welcome two more Hugging Faces to the family next year. My wife is a hero!

“DeepSeek Summary: Clem Delangue announces that his family is expecting twins, humorously calling his wife a hero.”

Max Woolf @minimaxir

congrats to OpenAI on winning the Turing Test

[LLM]

“DeepSeek Summary: Max Woolf sarcastically congratulates OpenAI on passing the Turing Test, reflecting on AI milestones.”

Max Woolf @minimaxir

me irl

“DeepSeek Summary: A short, relatable post with a meme-like tone.”

Phil Wang @lucidrains

I got to cover for the excellent @HadleyFreeman in the Guardian today so

[Deployment]

“DeepSeek Summary: Phil Wang filled in for a Guardian column, indicating his writing work.”

Phil Wang @lucidrains

Phil Wang // Insta: @wangpix's Image on X

[Deployment]

“DeepSeek Summary: Phil Wang posted an image, likely a promotional or personal photo.”

Sasha Rush @srush_io

Some personal news: I recently joined Cursor. Cursor is a small, ambitious team, and they've created

[Deployment][Tooling]

“DeepSeek Summary: Sasha Rush announces joining Cursor, a small ambitious team.”

Sasha Rush @srush_io

Wager established. Jonathan Frankle (@jefrankle) stepped up to my Transformer long bet.

[LLM]

“DeepSeek Summary: Sasha Rush engages in a public bet about Transformers with Jonathan Frankle.”

Sasha Rush @srush_io

today i woke up to a living version of a phd student's nightmare. a new paper in my inbox: a detailed reproduction of a paper i wrote

[Evaluation]

“DeepSeek Summary: Sasha Rush expresses surprise at a reproduction of his own paper.”

Stas Bekman @stas00

If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

[Infra][Fine-tuning]

“DeepSeek Summary: Stas Bekman indicates that DeepSpeed ZeRO++ is ready to try on the master branch.”

Stas Bekman @stas00

Hear, hear, I'm excited to introduce a new performance metric: Maximum Achievable Matmul

[Infra][Evaluation]

“DeepSeek Summary: Stas Bekman introduces a new performance metric called Maximum Achievable Matmul.”

Stas Bekman @stas00

Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

[Tooling]

“DeepSeek Summary: Stas Bekman thanks a contributor for enhancing the Machine Learning Engineering Open Book.”

Stas Bekman @stas00

Classical Jensen math. Unidirectional bandwidth is topped at 450GB/s, and then there comes a protocol overhead of two digit percentage. 1.

[Infra]

“DeepSeek Summary: Stas Bekman discusses bandwidth limitations and protocol overhead in computing.”

Sayak Paul @sayakpaul

1. Read the post. 2. Contemplate. 3. Repeat 1.

[LLM]

“DeepSeek Summary: Advocates a reflective reading practice: read, contemplate, repeat.”

Sayak Paul @sayakpaul

Had a nice time chatting about the state of diffusion models and some text-to-image data shenanigans at

[Multi-modal]

“DeepSeek Summary: Discussed diffusion models and text-to-image data issues in a chat.”

Sayak Paul @sayakpaul

Release notes: Release Diffusers 0.34.0: New Image and Video Models, Better torch.

[Deployment][Tooling]

“DeepSeek Summary: Announced Diffusers 0.34.0 release with new image/video models and torch improvements.”

Philipp Schmid @philschmid

Guide: ReAct agent from scratch with Gemini 2.5 and LangGraph | Gemini API | Google AI for Developers.

[Agent][LLM][Tooling]

“DeepSeek Summary: Philipp Schmid shares a guide on building a ReAct agent from scratch using Gemini 2.5 and LangGraph.”

Philipp Schmid @philschmid

Google DeepMind and Korea Partner to Accelerate Scientific Discovery.

[Multi-modal][Deployment]

“DeepSeek Summary: Philipp Schmid highlights a partnership between Google DeepMind and Korea to speed up scientific research.”

Ethan Mollick @emollick

AI is actually pretty good at ideas as well.

[LLM][Multi-modal][Evaluation]

“DeepSeek Summary: Ethan Mollick asserts that AI performs well in generating ideas, challenging the notion that AI is only good at analytical tasks.”

Emily M. Bender @emilymbender

Look what @alexhanna and I got to do! (Hang out with the cool kids ...) We're talking about the Turing Test, the grandmother of all tests for AI sentience. Joining us are AI researchers Alex Hanna and Emily M. Bender

[Safety][Evaluation]

“DeepSeek Summary: Bender and Hanna discuss the Turing Test as a foundational concept for AI sentience.”

Emily M. Bender @emilymbender

For those playing along at home, here's a "AI is sentient!" argument bingo card.

[Safety][Evaluation]

“DeepSeek Summary: Bender satirizes common arguments for AI sentience with a bingo card.”

Naomi Saphra @NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

[LLM]

“DeepSeek Summary: Naomi Saphra humorously comments on using images of herself in a scientific discourse space.”

Naomi Saphra @NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU ...

[LLM]

“DeepSeek Summary: Announces her upcoming faculty position at Boston University in 2026.”

Ben Recht @beenwrekt

For the first time in almost a decade, I'm teaching a class on learning and control.

[Evaluation]

“DeepSeek Summary: Ben Recht announces teaching a class on learning and control after nearly ten years.”

Ben Recht @beenwrekt

[Evaluation]

“DeepSeek Summary: Ben Recht shares that his AI reading list remains unchanged since 2019.”

-- END OF LOG --

[STATS] 63 items · Filter applied