Intelligence.Log

Wednesday, May 13, 2026

Extracted: 54 items. Sources: 19. Filter: Score >= 5.0

++ Daily.Brief ++

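**Today's AI Brief**: DeepMind spinout Isomorphic Labs raised $2.1 billion for AI drug design, while OpenAI released a cybersecurity model to compete with Anthropic. In research, TabPFN-3 launched as a pre-trained tabular model supporting up to one million rows, and the Auto-Rubric method learns explicit criteria from implicit preferences. In tooling, Thinking Machines released its native interaction model TML-Interaction-Small, reaching SOTA in real-time voice. In commentary, a developer got a Transformer model running on a Game Boy Color, and AI tensions are weighing on the Trump-Xi meeting.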

> Headlines & Launches

9.0 DeepMind Spinout Isomorphic Labs Raises $2.1 Billion to Design Drugs With AI - Bloomberg

DeepMind spinout Isomorphic Labs raises $2.1 billion for AI drug design.

bloomberg.com #funding #drug-discovery #deepmind

8.0 OpenAI launches cybersecurity model to rival Anthropic’s Mythos | Semafor

OpenAI releases a cybersecurity model to compete with Anthropic's Mythos.

semafor.com #cybersecurity #model-release #openai [Model Release]

7.5 SAP Invests in AI Startup N8n, Doubling Valuation to $5.2 Billion - Bloomberg

SAP invests in AI automation startup n8n, doubling its valuation to $5.2 billion.

bloomberg.com #investment #automation #enterprise

7.0 AI Dictation Startup Wispr in Funding Talks at $2 Billion Value - Bloomberg

AI dictation startup Wispr is in funding talks at a $2 billion valuation.

bloomberg.com #voice-recognition #funding #startup

6.5 AI Life Science Firm Metis TechBio Set for HK Debut After IPO - Bloomberg

AI life science company Metis TechBio is set to debut in Hong Kong after its IPO.

bloomberg.com #ipo #biotech #investment

6.0 Waymo Recalls 3,791 Robotaxis to Fix Software After Flooded Road Incident - Bloomberg

Waymo recalls 3,791 robotaxis for a software fix after a flooded-road incident.

bloomberg.com #autonomous-driving #safety #recall

> Research & Innovation

8.5 TabPFN-3 just released: a pre-trained tabular foundation model for up to 1M rows [R][N]

TabPFN-3 released: a pre-trained tabular foundation model supporting up to one million rows.

Reddit r/MachineLearning #tabular-data #foundation-model #pretrained [Model Release]

7.5 Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

Proposes Auto-Rubric, a method that learns explicit multimodal generative criteria from implicit preferences.

ArXiv cs.AI #multimodal #reward-modeling #alignment [Post-Training]

7.5 MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs

Integrates Q-learning into self-evolving memory agents built on provenance DAGs.

ArXiv cs.AI #llm-agents #memory #reinforcement-learning [Agent Harness]

7.5 Domain-level metacognitive monitoring in frontier LLMs: A 33-model atlas

Analyzes domain-level metacognitive monitoring across 33 frontier LLMs.

ArXiv cs.CL #llm #metacognition #benchmark [Evals]

7.5 Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries

Proposes a group-structured skill retrieval method for agent skill libraries.

ArXiv cs.CL #agent #skill-retrieval #llm [Agent Harness]

7.0 Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits

Studies the mechanisms behind reliability in VLMs through attention, hidden states, and causal circuits.

ArXiv cs.AI #vlm #mechanistic-interpretability #reliability

7.0 On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective

Distinguishes capability elicitation from capability creation in post-training, from a free-energy perspective.

ArXiv cs.AI #post-training #sft #free-energy [Post-Training]

7.0 SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents

Proposes an adaptive multi-granularity skill reuse method to reduce LLM agent costs.

ArXiv cs.AI #llm-agents #skill-library #cost-efficiency [Agent Harness]

7.0 CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents

Proposes a co-evolving compositional DAG approach for tool-augmented agents.

ArXiv cs.AI #tool-augmented #dag #agents [Tool Use]

7.0 Belief or Circuitry? Causal Evidence for In-Context Graph Learning

Uses causal evidence to study graph structure inference in LLM in-context learning.

ArXiv cs.AI #llm #in-context-learning #causality

7.0 Can LLMs Take Retrieved Information with a Grain of Salt?

Studies whether LLMs can treat retrieved information critically in retrieval-augmented settings.

ArXiv cs.CL #rag #llm #reasoning [Context Engineering]

6.5 Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction

Proposes a grid-based spatial priming method to improve LLM accuracy on chart data extraction.

ArXiv cs.AI #llm #chart-extraction #spatial-priming

6.5 VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing

Proposes an expressive spoken language model for role-playing and singing.

ArXiv cs.CL #spoken-language-model #role-playing #singing

6.5 IntentGrasp: A Comprehensive Benchmark for Intent Understanding

Introduces IntentGrasp, a comprehensive benchmark for intent understanding.

ArXiv cs.CL #benchmark #intent-understanding [Evals]

6.5 MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text

Proposes a multi-task equilibrated learning detector for identifying AI-generated text.

ArXiv cs.CL #ai-detection #llm #benchmark [Evals]

6.4 Reimagining the mouse pointer for the AI era

DeepMind reimagines the mouse pointer for the AI era.

HN (134) #ai #hci #deepmind

6.0 Embeddings for Preferences, Not Semantics

Proposes preference-oriented embeddings for collective decision-making.

ArXiv cs.AI #embeddings #preferences #collective-decision

6.0 PLACO: A Multi-Stage Framework for Cost-Effective Performance in Human-AI Teams

Proposes a multi-stage framework to optimize cost-effectiveness in human-AI collaboration.

ArXiv cs.AI #human-ai-collaboration #cost-efficiency

6.0 MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes

Proposes multimodal, interactive, speech-based tool-calling conversational assistants for smart homes.

ArXiv cs.CL #smart-home #voice-assistant #tool-calling [Tool Use]

6.0 Reflections and New Directions for Human-Centered Large Language Models

Discusses research directions and the future outlook for human-centered LLMs.

ArXiv cs.CL #llm #human-centered #survey

6.0 I Found a Hidden Ratio in Transformers That Predicts Geometric Stability [R]

Finds that the ratio of MLP to attention spectral norms in Transformers predicts geometric stability.

Reddit r/MachineLearning #transformer #stability #spectral-analysis

5.5 MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media

Proposes a benchmark for instruction-induced label collapse in the annotation of Bengali social media.

ArXiv cs.CL #llm #benchmark #social-media [Evals]

5.0 TajPersLexon: A Tajik-Persian Lexical Resource and Hybrid Model for Cross-Script Low-Resource NLP

Builds a Tajik-Persian lexical resource and a hybrid model for cross-script low-resource NLP.

ArXiv cs.CL #low-resource #nlp #lexical-resource

> Engineering & Resources

8.3 rohitg00/agentmemory

A persistent memory library for AI coding agents, grounded in benchmarks.

GitHub trending:all (+1048★) #ai-agents #memory #coding [Context Engineering] [Coding Agents]

8.0 [AINews] Thinking Machines' Native Interaction Models - TML-Interaction-Small 276B-A12B - advances SOTA Realtime Voice and kills standard VAD

Thinking Machines releases its native interaction model TML-Interaction-Small, achieving SOTA in real-time voice.

Latent Space #multimodal #voice #model-release [Model Release]

8.0 I got a real transformer language model running locally on a stock Game Boy Color!

A real Transformer language model running locally on a Game Boy Color.

Reddit r/LocalLLaMA #edge-ai #transformer #game-boy

7.9 mattpocock/skills

A collection of skills for engineers, drawn from a Claude configuration directory.

GitHub trending:all (+3867★) #skills #claude #developer-tools [Coding Agents]

7.6 Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

Needle, an open-source 26M-parameter tool-calling model that runs very fast.

HN (272) #tool-use #open-source #small-model [Tool Use]

7.5 Luce DFlash + PFlash on AMD Strix Halo: Qwen3.6-27B at 2.23x decode and 3.05x prefill vs llama.cpp HIP

Luce DFlash/PFlash supports AMD Strix Halo, running Qwen3.6-27B at 2.23x decode and 3.05x prefill speed versus llama.cpp HIP.

Reddit r/LocalLLaMA #inference #amd #optimization

7.5 huggingface/ml-intern

Hugging Face open-sources an ML engineer project that can read papers and train models.

Co-Starred #open-source #ml-engineer #huggingface [Agent Harness]

7.5 tinyhumansai/openhuman

An open-source personal AI superintelligence focused on privacy and capability.

GitHub trending:all (+1014★) #open-source #personal-ai [Model Release]

7.0 AI tensions loom over Trump-Xi meeting | Semafor

AI tensions loom over the meeting between Trump and Xi Jinping.

semafor.com #geopolitics #policy

7.0 MagicQuant (v2.0) - Hybrid Mixed GGUF Models + Unsloth Dynamic Learned Quant Configurations + Benchmark table with collapsed winners and more

MagicQuant v2.0 released: hybrid mixed GGUF quantization configurations.

Reddit r/LocalLLaMA #quantization #gguf #open-source

7.0 examples : add llama-eval by ggerganov · Pull Request #21152 · ggml-org/llama.cpp

llama.cpp adds the llama-eval evaluation tool.

Reddit r/LocalLLaMA #llm #evaluation #open-source [Evals]

7.0 antirez/ds4

antirez releases a local inference engine for DeepSeek 4 Flash, with Metal support.

Co-Starred #deepseek #local-inference #metal [Model Release]

6.5 How open model ecosystems compound

Analyzes how China's high-participation, open-first AI ecosystem compounds over time.

Interconnects #open-source #china #ecosystem

6.5 I created a minimal one-file implementations (160loc) of JEPA family (ijepa, vjepa, vjepa2, cjepa) for educational purposes [P]

Minimal single-file implementations of the JEPA family of algorithms, for educational purposes.

Reddit r/MachineLearning #jepa #self-supervised-learning #implementation

6.3 millionco/react-doctor

An AI tool that detects problems in React code.

GitHub trending:all (+788★) #react #code-quality #ai-tool [Coding Agents]

6.0 AI Companies Need More Than Nvidia Chips to Power Their Massive Data Centers - Bloomberg

AI companies' data centers need more components beyond Nvidia chips.

bloomberg.com #infrastructure #hardware #datacenter

6.0 AI spending likely higher than suggested | Semafor

Analysts say actual AI spending may be higher than public figures suggest.

semafor.com #industry-analysis #spending

6.0 Let's build claude code from scratch!

Tutorial video: building Claude Code from scratch.

Reddit r/LocalLLaMA #ai-coding #tutorial [Coding Agents]

6.0 Follow-up on the TranslateGemma subtitle benchmark: human review of segments rated "clean" by MetricX-24 and COMETKiwi [D]

A human re-check of the scored results from the TranslateGemma subtitle benchmark.

Reddit r/MachineLearning #translation #benchmark #llm [Evals]

5.7 Show HN: Statewright – Visual state machines that make AI agents reliable

Statewright: visual state machines that make AI agents more reliable.

HN (72) #ai-agents #state-machines #reliability [Agent Harness]

5.3 Show HN: Agentic interface for mainframes and COBOL

Hypercubic provides an AI agent interface for mainframes and COBOL.

HN (54) #ai-agents #cobol #mainframe [Agent Harness]

5.2 Launch HN: Voker (YC S24) – Analytics for AI Agents

An analytics platform for AI agents, used by product teams for tracking.

HN (37) #analytics #ai-agents [Agent Harness]

5.0 Interaction Models from Thinking Machines Lab [P]

Thinking Machines Lab releases interaction models.

Reddit r/MachineLearning #interaction-model #thinking-machines-lab

5.0 Cache-testing software for LLM-provider-style tiered ephemeral caches? [D]

Looking for software to test LLM-provider-style tiered ephemeral caches.

Reddit r/MachineLearning #cache #llm #benchmark [Context Engineering]
