Intelligence.Log

Wednesday, May 20, 2026

Extracted: 76 items. Sources: 38. Filter: Score >= 5.0

++ Daily.Brief ++

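Major developments in AI today: **Karpathy announced he has joined Anthropic** ([#item-twitter-com-karpathy-status-2056753169888334312]), while **Google released Gemini 3.5** ([#item-blog-google-innovation-and-ai-models-and-research-gemini-mod]) and declared the start of the agentic Gemini era ([#item-blog-google-innovation-and-ai-sundar-pichai-io-2026]). In research, the **ICRL framework** ([#item-arxiv-org-abs-2605-15224]) uses reinforcement learning to internalize self-critique, and **CHI-Bench** ([#item-arxiv-org-abs-2605-16679]) evaluates how well AI agents can automate healthcare workflows. Tooling highlights include **CLI-Anything** ([#item-github-com-HKUDS-CLI-Anything]), which makes software agent-native, and **codegraph** ([#item-github-com-colbymchenry-codegraph]), which provides a pre-indexed code knowledge graph. In commentary, **Gemini 3.5 Flash** ([#item-techcrunch-com-2026-05-19-with-gemini-3-5-flash-google-bets-]) is described as Google's bet on the next wave of AI agents, and Google says it can cut enterprise costs by more than $1 billion a year ([#item-venturebeat-com-technology-google-says-gemini-3-5-flash-can-]).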

> Headlines & Launches

9.6 I’ve joined Anthropic

Karpathy announces he has joined Anthropic, drawing industry attention.

HN (1166) #anthropic #hiring #karpathy

9.5 Gemini 3.5: frontier intelligence with action

Google releases Gemini 3.5, pairing frontier intelligence with the ability to act.

Google AI Blog #gemini #model-release #frontier [Model Release]

9.0 I/O 2026: Welcome to the agentic Gemini era

Google I/O 2026 keynote: entering the agentic Gemini era.

Google AI Blog #gemini #agentic #google-io [Agent Harness]

9.0 Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

Google launches Gemini 3.5 Flash at I/O and plans to adopt it across the board.

Simon Willison #gemini #model-release #google-io [Model Release]

9.0Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think. | VentureBeat

Google redesigns its search box for the first time in 25 years, integrating AI-generated answers.

venturebeat.com #search #ai-integration #google

8.5 A new era for AI Search

Google Search enters a new AI era with a major update.

Google AI Blog #search #ai #google

8.5 Google isn't releasing its next big AI model yet, drawing groans at its I/O conference - Business Insider

Google announces at I/O that the release of Gemini 3.5 Pro is delayed, drawing discontent.

businessinsider.com #gemini #google-io #model-release [Model Release]

8.0 Google Revamps Search for AI Era, Debuts Coding Tools - Bloomberg

Google rolls out AI tools for Search, YouTube, and Docs.

bloomberg.com #google #search #ai-tools

7.0 OpenAI says it’s getting serious about AI detection and labeling

OpenAI steps up AI detection and labeling and expands C2PA credentials.

theverge.com #openai #ai-detection #content-credentials

6.4 Mistral AI acquires Emmi AI

Mistral AI acquires Emmi AI to strengthen its industrial AI capabilities.

HN (163) #acquisition #mistral #industrial-ai

> Research & Innovation

7.5 ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

ICRL: internalizing self-critique through reinforcement learning to improve agents' error correction.

ArXiv cs.AI #reinforcement-learning #self-critique #agent [Post-Training]

7.5 The Scaling Laws of Skills in LLM Agent Systems

A study of scaling laws for skills in LLM agent systems.

ArXiv cs.CL #scaling-laws #agent-skills #llm [Agent Harness]

7.5 CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

CHI-Bench evaluates AI agents' ability to automate healthcare workflows.

ArXiv cs.CL #benchmark #healthcare #ai-agent [Evals] [Agent Harness]

7.5 Here are my KV cache quantization benchmarks: TurboQuant is overrated but saved by TCQ, q5 deserves more attention, and symmetric q8 might be a waste of VRAM

KV cache quantization benchmarks evaluating the performance of TurboQuant, TCQ, and other methods.

Reddit r/LocalLLaMA #kv-cache #quantization #benchmark [Context Engineering]

7.0 SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch

SDOF reduces the alignment tax in multi-agent orchestration through state-constrained dispatch.

ArXiv cs.AI #multi-agent #orchestration #alignment [Agent Harness]

7.0 SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces

SkillSmith compiles agent skills into boundary-guided runtime interfaces.

ArXiv cs.AI #agent-skills #llm-agent #interface [Agent Harness]

7.0 NOVA: Fundamental Limits of Knowledge Discovery Through AI

NOVA examines the fundamental limits of discovering new knowledge through iterative AI self-improvement.

ArXiv cs.AI #knowledge-discovery #self-improvement #limits

7.0 Nemotron-Labs-Diffusion from NVIDIA

NVIDIA releases Nemotron-Labs-Diffusion, with support for autoregressive (AR) and diffusion-based parallel decoding.

Reddit r/LocalLLaMA #diffusion #decoding #nvidia [Model Release]

7.0 I built a tool that shows you what GPT-2 is "thinking" in real-time as it generates 3D graph of concept activations per token [R]

A tool that visualizes GPT-2's concept activations as a 3D graph in real time during generation, exploring mechanistic interpretability.

Reddit r/MachineLearning #mechanistic-interpretability #gpt-2 #visualization

6.8 NVlabs/Sana

SANA: efficient high-resolution image synthesis with a linear diffusion Transformer.

GitHub trending:python (+575★) #image-generation #diffusion-transformer #efficient

6.5 Does Theory of Mind Improvement Really Benefit Human-AI Interactions? Empirical Findings from Interactive Evaluations

An empirical study of whether improving LLMs' theory-of-mind ability actually improves human-AI interaction.

ArXiv cs.AI #theory-of-mind #human-ai-interaction #llm

6.5 CAX-Agent: A Lightweight Agent Harness for Reliable APDL Automation

CAX-Agent: a lightweight agent harness for reliable APDL automation.

ArXiv cs.AI #agent-harness #automation #simulation [Agent Harness]

6.5 NIMO Controller: a self-driving laboratory orchestrator based on the Model Context Protocol

NIMO Controller: a self-driving laboratory orchestrator built on the Model Context Protocol (MCP).

ArXiv cs.AI #mcp #self-driving-lab #orchestrator [Agent Harness]

6.5 PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures

The PQR framework generates diverse, realistic queries to elicit QA agent failures.

ArXiv cs.CL #qa-agent #evaluation #failure-detection [Evals]

6.5 Backprop-free Pong: PC + distributional Hebbian plasticity vs. PPO: 57% vs. 59%, ~1500 lines from scratch [P]

Backprop-free Pong: distributional Hebbian plasticity comes close to PPO performance.

Reddit r/MachineLearning #reinforcement-learning #hebbian-learning #bio-plausible

6.0 Fair outputs, Biased Internals: Causal Potency and Asymmetry of Latent Bias in LLMs for High-Stakes Decisions

LLMs produce fair outputs in high-stakes decisions while harboring latent bias internally.

ArXiv cs.AI #fairness #bias #llm

6.0 Verifiable Agentic Infrastructure: Proof-Derived Authorization for Sovereign AI Systems

Verifiable agentic infrastructure: proof-derived authorization for sovereign AI systems.

ArXiv cs.AI #authorization #agent-infrastructure #security

6.0 SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs

SKG-Eval: multi-turn dialogue evaluation based on incremental semantic knowledge graphs.

ArXiv cs.CL #dialogue-evaluation #knowledge-graph #multi-turn [Evals]

6.0 Language Acquisition Device in Large Language Models

A study of a language acquisition device in LLMs, exploring data efficiency.

ArXiv cs.CL #llm #language-acquisition

6.0 New SOTA 1B model? HRM-text

HRM-text is claimed to be a new SOTA 1B model, but its benchmark results are questionable.

Reddit r/LocalLLaMA #small-model #benchmark [Model Release]

5.5 DeepSlide: From Artifacts to Presentation Delivery

Proposes DeepSlide, which uses AI to generate presentations from artifacts and optimizes slide generation.

ArXiv cs.AI #ai-slides #presentation #generation

5.0 Beyond Sentiment Classification: A Generative Framework for Emotion Intensity Evaluation in Text

Beyond sentiment classification: a generative framework for evaluating emotion intensity in text.

ArXiv cs.CL #emotion #nlp #generative

5.0 Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free

A retrieval-based method for multi-label legal annotation that reduces hallucination.

ArXiv cs.CL #legal-ai #retrieval #annotation

> Engineering & Resources

8.5 With Gemini 3.5 Flash, Google bets its next AI wave on agents, not chatbots

Google bets on Gemini 3.5 Flash to drive AI agents rather than chatbots.

techcrunch.com #gemini #agents #google-io [Agent Harness]

8.3 HKUDS/CLI-Anything

CLI-Anything: making all software agent-native.

GitHub trending:all (+1038★) #agent-native #cli #tool-use [Tool Use]

8.3 rohitg00/agentmemory

A persistent memory system for AI coding agents, based on real-world benchmarks.

GitHub trending:all (+1609★) #agent-memory #persistent-memory #coding-agents [Context Engineering]

8.3 colbymchenry/codegraph

A pre-indexed code knowledge graph that reduces tokens and tool calls; 100% local.

GitHub trending:all (+1850★) #code-knowledge-graph #claude-code #cursor [Context Engineering]

8.0 Gemini Omni

Google releases the Gemini Omni multimodal model.

HN (264) #multimodal #gemini #google-deepmind [Model Release]

8.0 Anthropic adds self-hosted sandboxes and MCP tunnels to Claude Managed Agents - the-decoder.com

Anthropic adds self-hosted sandboxes and MCP tunnels to Claude Managed Agents.

the-decoder.com #anthropic #agents #mcp [Agent Harness]

8.0 bytedance released an open source model that attempts to do just about anything with only 3b parameters

ByteDance releases Lance, a unified multimodal model with 3B parameters.

Reddit r/LocalLLaMA #multimodal #open-source #small-model [Model Release]

7.9 Imbad0202/academic-research-skills

A set of academic research skills for Claude Code: research → writing → review → revision → finalization.

GitHub trending:all (+3164★) #ai-coding #academic #claude-code [Coding Agents]

7.9 obra/superpowers

An agent skills framework and software development methodology.

GitHub trending:all (+1623★) #agent-framework #skills #methodology [Agent Harness]

7.9 multica-ai/andrej-karpathy-skills

Configuration for improving Claude Code behavior, based on Karpathy's observations.

GitHub trending:all (+1955★) #claude-code #coding-pitfalls #configuration [Coding Agents]

7.5 The 13 biggest announcements at Google I/O 2026 - The Verge

A roundup of the 13 biggest AI announcements at Google I/O 2026.

theverge.com #google-io #gemini #ai-announcements

7.5 Google says Gemini 3.5 Flash can slash enterprise AI costs by more than $1 billion a year

Gemini 3.5 Flash could save enterprises more than $1 billion a year in AI costs.

venturebeat.com #gemini #enterprise #cost-saving

7.5 Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s!

Running DeepSeek-V4 locally on four RTX 2080 Ti cards, reaching 255 tok/s prefill.

Reddit r/LocalLLaMA #deepseek #local-inference #quantization

7.5 antirez/ds4

antirez releases a local inference engine for DeepSeek 4 Flash, with Metal support.

Co-Starred #deepseek #local-inference #metal [Model Release]

7.2 Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

Forge, an open-source tool, raises an 8B model's accuracy on agentic tasks from 53% to 99%.

HN (267) #agent #guardrails #open-source [Agent Harness] [Tool Use]

7.1 humanlayer/12-factor-agents

Proposes 12 principles for building production-grade LLM applications.

GitHub trending:typescript (+736★) #llm #best-practices #production [Agent Harness]

7.0 rtk-ai/rtk

A CLI proxy that cuts LLM token consumption by 60-90%; a single Rust binary.

GitHub trending:all (+704★) #token-efficiency #cli #rust [Context Engineering]

7.0 How AI Mode is changing the way people search in the U.S.

A one-year look back at how AI Mode is used in U.S. search.

Google AI Blog #search #ai-mode #google

7.0 New ways to create and get things done in Google Workspace

Google Workspace gains AI updates, including voice features.

Google AI Blog #workspace #voice #productivity

7.0 Everything new in our Google AI subscriptions, fresh from I/O 2026

Updates to Google AI subscriptions, including a new $100 plan.

Google AI Blog #subscription #google-one #ai

7.0 OlmoEarth v1.1: A more efficient family of models

OlmoEarth v1.1 released: a more efficient model family.

Hugging Face #open-source #model #efficiency [Model Release]

7.0 Google AI Edge Gallery v1.0.13 & v1.0.14 updates: Gemma 4 Multi-Token Prediction, Pixel TPU support, experimental MCP, new skills, now saves chat history

Google AI Edge Gallery updates add Gemma 4 multi-token prediction, Pixel TPU support, MCP, and more.

Reddit r/LocalLLaMA #edge-ai #gemma #mobile [Model Release]

7.0 huggingface/ml-intern

Hugging Face open-sources an ML engineer project that automatically reads papers and trains models.

Co-Starred #open-source #ml-engineer #automation [Agent Harness]

6.7 HKUDS/ViMax

Agentic video generation: director, screenwriter, producer, and video generator in one.

GitHub trending:python (+503★) #video-generation #agentic #multimodal

6.6 msitarzewski/agency-agents

A multi-agent AI agency platform with specialized agents for frontend, Reddit, and more.

GitHub trending:all (+1120★) #multi-agent #agency #automation [Agent Harness]

6.5 OpenAI Adopts Google's SynthID Watermark for AI Images with Verification Tool

OpenAI adopts Google's SynthID watermarking technology to verify AI images.

HN (192) #watermark #ai-images #provenance

6.5 How to use Google’s new AI agents to go beyond your standard searches | TechCrunch

How to use Google's new AI agents for advanced searches.

techcrunch.com #google #agents #search [Agent Harness]

6.5 From teen hacker to Iron Dome researcher, this founder raised $28M to fight AI phishing | TechCrunch

A founder raises $28 million to fight AI phishing.

techcrunch.com #ai-security #phishing #funding

6.5 A tool I built to generate 3D objects with functional, articulated parts. It's on github, and is mostly LLM-agnostic.

An open-source tool that uses LLMs to generate 3D objects with functional parts.

Reddit r/LocalLLaMA #3d-generation #llm #open-source [Tool Use]

6.5Time to update llama.cpp to get som MTP improvements!

A llama.cpp update brings multi-token prediction (MTP) improvements.

Reddit r/LocalLLaMA #llama.cpp #inference [Context Engineering]

6.0 heygen-com/hyperframes

HeyGen releases Hyperframes, which generates video from HTML and is aimed at AI agents.

GitHub trending:typescript (+344★) #video-generation #html #agent [Tool Use]

6.0 [AINews] How to land a job at a frontier lab (on Pretraining)

A discussion of how to land a job at a frontier lab (pretraining track).

Latent Space #career #pretraining #advice

6.0 Google I/O 2026 Live Blog: All the Gemini and Smart Glasses Updates as They Happen | WIRED

Google I/O 2026 live blog covering Gemini and smart glasses updates.

wired.com #google-io #gemini #live-blog

6.0 Why founders are moving from prompting to systems | VentureBeat

Founders are moving from prompt engineering to building AI systems.

venturebeat.com #ai-systems #prompt-engineering #startups

5.9 Alishahryar1/free-claude-code

A tool for using Claude Code for free from the terminal, VSCode, or Discord.

GitHub trending:python (+563★) #claude-code #free #vscode [Coding Agents]

5.8 anthropics/claude-plugins-official

Anthropic's official Claude Code plugin directory.

GitHub trending:all (+171★) #claude-code #plugins #official [Coding Agents]

5.8 unslothai/unsloth

Unsloth Studio: a web UI for training and running open-source models locally.

GitHub trending:python (+156★) #fine-tuning #web-ui #open-models [Post-Training]

5.5 llm-gemini 0.32

llm-gemini 0.32 released, with support for new models.

Simon Willison #llm #gemini #cli

5.5 What do you think about Tabular Foundation Models [D]

A discussion of the performance and prospects of tabular foundation models such as TabPFN-3.

Reddit r/MachineLearning #tabular-data #foundation-model

5.4 Intro to TLA+ for the LLM Era: Prompt Your Way to Victory

An introduction to learning TLA+ formal verification through LLM prompting.

HN (109) #llm #formal-verification #tla+

5.0 Firefox expands ‘Shake to Summarize’ to Android. | The Verge

Firefox expands its "Shake to Summarize" feature to Android.

theverge.com #firefox #summarization #mobile

5.0 got my first "rm -rf /" today

A community discussion of an AI agent executing a dangerous command.

Reddit r/LocalLLaMA #agent-safety #reddit [Agent Harness]
