Intelligence.Log

2026-05-19

Extracted: 55 items. Sources: GitHub, Bluesky, X, Blogs.

++ AI OVERVIEW ++

Today's trending signals point to two themes: practical security tooling and the evolution of AI post-training methods. On the security front, simonw's star of the `andrew/pycon` repo highlights interest in auditing GitHub Actions security across Python packages, a timely concern as supply chain attacks become more sophisticated. Meanwhile, Nathan Lambert's Bluesky post argues that on-policy distillation (OPD) is on track to be a lasting post-training method, joining instruction tuning, RLHF, DPO, and RLVR on a short list of core techniques; he notes that new classes of methods are rare. Together, the two items reflect a maturing ecosystem in which infrastructure hardening and training methodology are both getting attention.

sapientinc/HRM-Text ★ 0.4k ▲ 7/10

HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning.

Starred by lucidrains|[LLM][Fine-tuning]

“HRM-Text is a 1B parameter text generation model that introduces hierarchical reasoning and latent space reasoning to improve task completion. It offers a novel architecture that could enhance reasoning capabilities in LLMs.”

andrew/pycon ★ 0.0k ▲ 4/10

Data collection and analysis for a PyCon talk on GitHub Actions security across Python packages.

Starred by simonw|[Infra]

“This repository provides data collection and analysis scripts for a PyCon talk on GitHub Actions security across Python packages. It offers insights into how GitHub Actions are used in the Python ecosystem and potential security risks.”

elcritch/sarcophagus ★ 0.0k ▲ 3/10

auth and other api helpers for mummy

Starred by lucidrains|[Infra]

“Sarcophagus provides authentication and API helper utilities for the Mummy web framework in Nim. It simplifies common backend tasks like user auth, session management, and request handling.”

BSKY

Nathan Lambert · May 19, 12:48 AM

On-policy distillation is on track to be a lasting method in post-training. The list of areas would be: Instruction tuning (SFT/IFT), RLHF, Direct Preference Optimization (DPO et al), RLVR, On-policy Distillation (OPD). New classes of methods are rare! Excited to play.

❤️ 14 Likes|[Fine-tuning]

BSKY

Simon Willison · May 19, 10:41 PM

My notes on Gemini 3.5 Flash - 3x the price of Gemini 3 Flash but Google are planning to use it for many of their own products simonwillison.net/2026/May/19/...

❤️ 47 Likes|[LLM][Deployment]

BSKY

Margaret Mitchell · May 19, 08:09 PM

Against the constant pressure of *genAI, genAI, genAI*, I am really appreciating @ai2.bsky.social 's work on creating tools for critical needs -- like crop maps and forest loss analysis. They just did a nice release on @hf.co. huggingface.co/blog/allenai...

❤️ 41 Likes|

BSKY

Margaret Mitchell · May 19, 07:59 PM

Gmail's automatically generated responses (which can appear whether or not you ask for them) cement human anchoring bias: The tendency for people to heavily rely on what they have already seen. The effects are insidious, subconsciously influencing what we believe.

❤️ 15 Likes|[Safety]

BSKY

Thomas Dietterich · May 19, 07:42 PM

Yet another sobering post from @noahpinion.blogsky.venki.dev open.substack.com/pub/noahpini...

❤️ 3 Likes|[Evaluation][Safety]

BSKY

Ethan Mollick · May 19, 09:06 PM

🚨Our paper is out in PNAS: we found classic human persuasion techniques worked on AIs in a "parahuman" way, making them agree to objectionable requests (increasing compliance from 35% to 51%) It worked on a range of major recent LLMs though newer models do resist more www.pnas.org/doi/10.1073/...

❤️ 36 Likes|[Safety]

BSKY

Ethan Mollick · May 19, 06:07 PM

Also had some early access to Gemini 3.5 Flash. Very fast for a flash model and very capable, though not as powerful as a full frontier model. I added it to the gallery of procedurally generated one-shot towns (it made one error that it corrected): hg-20f7d1a3ce.netlify.app#gemini-3-5-f...

❤️ 33 Likes|[LLM][Evaluation]

BSKY

Ethan Mollick · May 19, 05:52 PM

Gemini Omni is quite good at instruction following: "sea otter in a pilot's uniform explains why Spirit Airlines went bankrupt to a river otter who is distracted by their laptop while they are in a hot air balloon over NYC. in the next balloon over, william shakespeare fights a robot made of pizza"

❤️ 77 Likes|[LLM]

BSKY

Ethan Mollick · May 19, 05:38 PM

Had early access to Gemini Omni: "a dramatic reading of Death by Water from the Wasteland by a man eating garlic bread while balanced on a unicycle on a small platform over a churning sea of tomato sauce in which, at the center, sits a meatball with bright blue eyes wearing a top hat"

❤️ 59 Likes|[Multi-modal]

BSKY

Emily M. Bender · May 19, 07:57 PM

Wow some terrible reporting about Google's latest horrible ideas about how to distort information access in the name of "convenience" (or something): techcrunch.com/2026/05/19/g... A short thread 🧵>>

❤️ 267 Likes|[Evaluation][Safety]

BSKY

Emily M. Bender · May 19, 05:04 PM

We gotta find the guy that did this!!

❤️ 84 Likes|

BSKY

angela zhou · May 19, 04:53 PM

Excited to share our paper! Due Process on Hold: A Queueing Framework for Improving Access in SNAP arxiv.org/abs/2605.15165 Millions of Americans interface with the social safety net via call centers that are too congested. In Holmes v. Knodell, bad operations = procedural due process violation.

❤️ 23 Likes|

Andrej Karpathy@karpathy

Drafted a blog post - Used an LLM to meticulously improve the argument over 4 hours.

[LLM]

“DeepSeek Summary: Karpathy used an LLM to refine a blog post argument over 4 hours.”

Andrej Karpathy@karpathy

Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around

[Safety]

“DeepSeek Summary: Karpathy notes a growing gap in understanding AI capability.”

Andrej Karpathy@karpathy

LLMs are emerging as a new kind of intelligence, simultaneously a lot smarter than I expected and a lot dumber than I expected. In any case they

[LLM]

“DeepSeek Summary: Karpathy observes LLMs are both smarter and dumber than expected.”

Andrej Karpathy@karpathy

I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits

[Tooling]

“DeepSeek Summary: Karpathy feels behind as a programmer due to AI-driven refactoring.”

Simon Willison@simonw

Quitting programming as a career right now because of LLMs would be like quitting carpentry as a career because of power tools.

[LLM][Tooling]

“DeepSeek Summary: Analogizes LLMs in programming to power tools in carpentry, suggesting they augment rather than replace.”

Harrison Chase@hwchase17

I am not excited about visual workflow builders 1. Not simple enough for the average user

[Tooling][Evaluation]

“DeepSeek Summary: Harrison Chase expresses skepticism about visual workflow builders, citing lack of simplicity for average users.”

Harrison Chase@hwchase17

We launched LangSmith Agent Builder this week as a no-code way to build agents. A key part of Agent Builder is its memory system.

[Agent][Tooling]

“DeepSeek Summary: Announcement of LangSmith Agent Builder, a no-code agent builder with a focus on memory systems.”

Harrison Chase@hwchase17

In the hot path as the agent is running. The agent can decide to (or the user can prompt it to) update its memory as it is working on the core

[Agent][LLM]

“DeepSeek Summary: Describes how agents can update memory during execution, either autonomously or via user prompt.”

Jim Fan@DrJimFan

In this context, I define world modeling as predicting the next plausible world state (or a longer duration of states) conditioned on an action.

[Agent][Multi-modal]

“DeepSeek Summary: Jim Fan defines world modeling as predicting future world states given actions, a key concept in robotics and embodied AI.”

Jeremy Howard@jeremyphoward

Here's what I would prefer to see:

[LLM]

“DeepSeek Summary: Jeremy Howard expresses a preference for an unspecified topic.”

Jeremy Howard@jeremyphoward

hi, i'm a sole proprietor/founder in Austria and i earn many many multiples of what i'd earn as an employee, despite 'predatory income tax'. in fact, i opt out

[Agent]

“DeepSeek Summary: Jeremy Howard discusses his income as a sole proprietor in Austria, noting high earnings despite taxes.”

Soumith Chintala@soumithchintala

reading "AI News" (previously Smol Talk) is probably the highest-leverage 45 mins

[LLM]

“DeepSeek Summary: Recommends a newsletter called AI News as a high-leverage way to stay informed.”

Soumith Chintala@soumithchintala

MacStudio you ask? Apple Engineering's **actual** time spent on PyTorch support

[Infra][Deployment]

“DeepSeek Summary: Comments on Apple's engineering effort for PyTorch support on Mac Studio.”

Soumith Chintala@soumithchintala

Open LLMs need to get organized and co-ordinated about sharing human feedback.

[Fine-tuning][LLM]

“DeepSeek Summary: Advocates for coordination among open LLM projects to share human feedback data.”

Francois Chollet@fchollet

Folks who work in AI or software engineering feel like the world is changing exponentially fast.

[Deployment]

“DeepSeek Summary: Chollet observes that AI/software engineers perceive rapid exponential change in the world.”

Francois Chollet@fchollet

The 3rd edition of my book Deep Learning with Python is being printed right now, and will be in bookstores within 2 weeks. The problem with Facebook is not *just* the loss of your privacy and the fact that it can be used as a totalitarian panopticon.

[Fine-tuning]

“DeepSeek Summary: Chollet announces the 3rd edition of his book and criticizes Facebook beyond privacy issues.”

Fei-Fei Li@drfeifei

We are beyond thrilled to congratulate Dr. Fei-Fei Li for being ranked #9 in the Top 100 Women in #AI by AI Magazine!

[Safety]

“DeepSeek Summary: Fei-Fei Li ranked #9 in Top 100 Women in AI.”

Max Woolf@minimaxir

LOL. Remove the code in the algorithm that boosts the tweets of Elon by elvodqa · Pull Request #160 ·... github.com.

[Deployment][Tooling]

“DeepSeek Summary: Max Woolf finds humor in a GitHub pull request that aims to remove code boosting Elon Musk's tweets.”

Max Woolf@minimaxir

me irl

“DeepSeek Summary: A short, relatable post expressing a personal sentiment.”

Phil Wang@lucidrains

Having a wonderful time hanging out with my uncle James Wong at the Chelsea Flower show!

“DeepSeek Summary: Phil Wang posts about spending time with his uncle James Wong at the Chelsea Flower Show.”

Sasha Rush@srush_io

today i woke up to a living version of a phd student's nightmare. a new paper in my inbox: a detailed reproduction of a paper i wrote

[Evaluation]

“DeepSeek Summary: Sasha Rush woke up to a detailed reproduction of his own paper, a common PhD student nightmare.”

Sasha Rush@srush_io

Some news: moving this fall from Harvard -> Cornell Tech. Sad to leave such an incredible ...

[Deployment]

“DeepSeek Summary: Sasha Rush announced his move from Harvard to Cornell Tech.”

Stas Bekman@stas00

I have been compiling LLM/VLM training logbooks/chronicles. This is one of the best sources to

[LLM][Fine-tuning][Infra]

“DeepSeek Summary: Compiling logbooks/chronicles for LLM/VLM training, sharing a valuable resource.”

Stas Bekman@stas00

Thanks to an awesome contribution from @omarnomad The Machine Learning Engineering Open book now can

[Tooling][Infra]

“DeepSeek Summary: Announces a contribution to the Machine Learning Engineering Open Book.”

Stas Bekman@stas00

If you were holding off to try @MSFTDeepSpeed ZeRO++ it looks like deepspeed@master should

[Infra][Fine-tuning]

“DeepSeek Summary: Encourages trying DeepSpeed ZeRO++ as it should be functional on master.”

Sayak Paul@sayakpaul

Had a nice time chatting about the state of diffusion models and some text-to-image data shenanigans at

[Multi-modal]

“DeepSeek Summary: Sayak Paul discussed diffusion models and text-to-image data issues.”

Sayak Paul@sayakpaul

Release notes: Release Diffusers 0.34.0: New Image and Video Models, Better torch.

[Deployment][Tooling]

“DeepSeek Summary: Announcement of Diffusers 0.34.0 release with new models and improvements.”

Philipp Schmid@philschmid

I read three technical reports from Moonshot AI's Kimi K2.5 paper, Cursor's Composer 2 report and blog post, and Chroma's Context-1 write-up

[LLM][RAG][Tooling]

“DeepSeek Summary: Philipp Schmid read three technical reports: Kimi K2.5, Cursor Composer 2, and Chroma Context-1.”

Philipp Schmid@philschmid

Random thought. We are going to be so much faster at creating and building.

[Agent][Infra]

“DeepSeek Summary: Philipp Schmid predicts accelerated creation and building speed.”

Ethan Mollick@emollick

From a pure "do good for the world" mission perspective, having the acting like a solid personalized tutor is one of the better uses of AI. If OpenAI cares about the mission of making the world a better place, tutoring should be an area of investment, not one to silently remove.

[LLM][Safety]

“DeepSeek Summary: AI as a personalized tutor is a high-impact use case; OpenAI should invest in tutoring.”

Ethan Mollick@emollick

In 1980, the philosopher John Searle proposed a thought experiment: a person locked in a room, manipulating Chinese characters according to a

[LLM]

“DeepSeek Summary: References Searle's Chinese Room argument to discuss AI understanding.”

Ethan Mollick@emollick

This is going to get even worse as people realize that careful tuning in their prompts can make AI writing seem not like AI writing to readers. We expect word counts to align, in some way, with thinking & value. Writing took effort. We are not mentally ready for the alternative.

[LLM][Evaluation]

“DeepSeek Summary: Prompt tuning can make AI writing indistinguishable from human writing, challenging our assumptions about effort and value.”

Naomi Saphra@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

[Evaluation]

“DeepSeek Summary: Saphra humorously comments on a space for scientific discourse with self-deprecating tone.”

Naomi Saphra@NaomiSaphra

Perfect cute light very short read for a break in a deadline crunch.

[LLM]

“DeepSeek Summary: Saphra recommends a short, light read for a break during intense work.”

Ben Recht@beenwrekt

For the first time in almost a decade, I'm teaching a class on learning and control.

[Deployment]

“DeepSeek Summary: Ben Recht announces teaching a class on learning and control after nearly a decade.”

Ben Recht@beenwrekt

Building a theory of the architecture of organizing machines and people.

[Agent]

“DeepSeek Summary: Recht discusses developing a theory for organizing machines and people.”

Ben Recht@beenwrekt

On unquantifiable costs and inherent tradeoffs in decision theory.

[Safety]

“DeepSeek Summary: Recht addresses unquantifiable costs and tradeoffs in decision theory.”

Ben Recht@beenwrekt

With more equations than usual, I explain how policy gradient gives you a framework to randomly search for

[Fine-tuning]

“DeepSeek Summary: Recht explains policy gradient as a framework for random search.”

BLOG

The last six months in LLMs in five minutes

I put together these annotated slides from my five minute lightning talk at PyCon US 2026, using the latest iteration (https://tools.simonwillison.net/annotated-presentations) of my annotated presentation... (https://simonwillison.net/2023/Aug/6/annotated-presentations/)

By Simon Willison

“The post summarizes key developments in LLMs over the past six months, including the rise of multi-modal models, improved reasoning capabilities, and the increasing importance of evaluation frameworks. It highlights practical tools and techniques for working with LLMs, such as prompt engineering and fine-tuning.”

BLOG

Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

Today at Google I/O, Google released Gemini 3.5 Flash (https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/). This one skipped the `-preview` modifier and went straight to general availability, and Google appear to be using it for a whole lot...

By Simon Willison

“Google released Gemini 3.5 Flash directly to general availability, skipping the preview phase, and plans to integrate it across many products. Despite being more expensive, it offers improved performance and efficiency, making it a versatile model for various applications.”

-- END OF LOG --

[STATS] 55 items · Filter applied