Intelligence.Log

Friday, May 15, 2026

Extracted: 61 items. Sources: 31. Filter: Score >= 5.0

++ Daily.Brief ++

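**AI Daily Brief** Anthropic and the Gates Foundation have formed a $200 million partnership focused on applying AI in health and education[#item-anthropic-com-news-gates-foundation-partnership]; Google is reportedly set to release a new Gemini model at I/O, though not a frontier version[#item-sources-news-p-google-about-to-release-new-gemini]. In research, the new tool BenchJack systematically audits AI agent benchmarks for vulnerabilities[#item-arxiv-org-abs-2605-12673], and a separate paper reveals interpretable failure modes of vision-language models[#item-arxiv-org-abs-2605-12674]. In tooling, OpenAI has brought its coding assistant Codex to the ChatGPT mobile app[#item-axios-com-2026-05-14-openai-brings-codex-to-your-phone], and Nous Research has released an agent that grows with its user[#item-github-com-NousResearch-hermes-agent]. In commentary, AI note-taking tools are reported to get basic facts wrong frequently in medical settings[#item-theregister-com-ai-ml-2026-05-14-ontario-auditors-find-docto], while Abridge is using AI to turn doctor-patient conversations into an efficient healthcare operating system[#item-latent-space-p-abridge].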

> Headlines & Launches

9.0 Anthropic forms $200 million partnership with the Gates Foundation

Anthropic and the Gates Foundation form a $200 million AI partnership.

anthropic.com #partnership #philanthropy #anthropic

8.5 Anthropic, Gates Foundation launch $200 million partnership for AI in health, education - Reuters

Anthropic and the Gates Foundation partner to commit $200 million to AI for health and education.

reuters.com #anthropic #gates-foundation #partnership

8.0 Google is about to release a new Gemini model - Sources | Alex Heath

Sources say Google will release a new Gemini model at I/O, though not a frontier model.

sources.news #gemini #google #model-release [Model Release]

8.0 Microsoft starts canceling Claude Code licenses

Microsoft has started canceling Claude Code licenses.

theverge.com #microsoft #claude #licensing [Coding Agents]

7.0 Synthetic Raises $10M Seed Led by Khosla Ventures | VentureBeat

Synthetic raises a $10 million seed round for its AI bookkeeping service.

venturebeat.com #funding #fintech #ai-agent

> Research & Innovation

8.0 Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

Systematically audits AI agent benchmarks, exposes vulnerabilities, and introduces the BenchJack tool.

ArXiv cs.AI #benchmark #agent #auditing [Evals]

7.5 Revealing Interpretable Failure Modes of VLMs

Reveals interpretable failure modes of vision-language models to improve safety.

ArXiv cs.AI #vlm #interpretability #safety [Evals]

7.5 Correct Answers from Sound Reasoning: Verifiable Process Supervision for Language Models

Proposes a verifiable process supervision method that trains language models to produce correct answers and sound reasoning.

ArXiv cs.CL #reasoning #supervision #verification [Post-Training]

7.5 I trained Qwen3.5 to jailbreak itself with RL, then used the failures to improve its defenses

Trains Qwen3.5 with RL to jailbreak itself, then uses the failures to improve its defenses.

Reddit r/LocalLLaMA #rl #jailbreak #red-teaming [Post-Training]

7.5 Continual Harness: Online Adaptation for Self-Improving Foundation Agents [R]

Proposes a continual online adaptation framework for self-improving foundation agents.

Reddit r/MachineLearning #online-adaptation #foundation-agents #continual-learning [Agent Harness]

7.0 Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents

Proposes a verifier-guided action selection method for embodied agent tasks.

ArXiv cs.AI #embodied-agent #action-selection #verifier [Planning]

7.0 DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models

Proposes the DisaBench framework for evaluating the harms language models pose to people with disabilities.

ArXiv cs.AI #benchmark #safety #disability [Evals]

7.0 CHAL: Council of Hierarchical Agentic Language

Proposes a hierarchical council of language agents that improves reasoning through multi-agent debate.

ArXiv cs.AI #multi-agent #debate #reasoning [Agent Harness]

7.0 ToolWeave: Structured Synthesis of Complex Multi-Turn Tool-Calling Dialogues

Proposes ToolWeave for structured synthesis of complex multi-turn tool-calling dialogues.

ArXiv cs.CL #tool-calling #multi-turn #dialogue-synthesis [Tool Use]

7.0 Follow the Mean: Reference-Guided Flow Matching [R]

Proposes a reference-guided flow matching method for generative modeling.

Reddit r/MachineLearning #flow-matching #generative-modeling

6.5 Macro-Action Based Multi-Agent Instruction Following through Value Cancellation

Proposes a macro-action-based method for multi-agent instruction following, achieved through value cancellation.

ArXiv cs.AI #multi-agent #reinforcement-learning #instruction-following [Agent Harness]

6.5 Learning Transferable Latent User Preferences for Human-Aligned Decision Making

Learns transferable latent user preferences for human-aligned decision making.

ArXiv cs.AI #llm #alignment #preference-learning [Post-Training]

6.5 State-Centric Decision Process

Proposes a state-centric decision process for agents in language environments.

ArXiv cs.AI #decision-making #state-centric #agent [Planning]

6.5 Mitigating Cross-Lingual Cultural Inconsistencies in LLMs via Consensus-Driven Preference Optimisation

Mitigates cross-lingual cultural inconsistencies in LLMs through consensus-driven preference optimisation.

ArXiv cs.CL #llm #multilingual #alignment [Post-Training]

6.5 TimelineReasoner: Advancing Timeline Summarization with Large Reasoning Models

Proposes TimelineReasoner, which uses large reasoning models to advance timeline summarization.

ArXiv cs.CL #reasoning #summarization #timeline [Planning]

6.5 BoostTaxo: Zero-Shot Taxonomy Induction via Boosting-Style Agentic Reasoning and Constraint-Aware Calibration

Proposes BoostTaxo, which uses boosting-style agentic reasoning for zero-shot taxonomy induction.

ArXiv cs.CL #taxonomy-induction #zero-shot #agentic-reasoning [Agent Harness]

6.0 BEHAVE: A Hybrid AI Framework for Real-Time Modeling of Collective Human Dynamics

Proposes BEHAVE, a hybrid AI framework for real-time modeling of collective human dynamics.

ArXiv cs.AI #human-dynamics #simulation #ai-framework

6.0 Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models

Improves the calibration of vision-language models on text-only input, bridging the missing-modality gap.

ArXiv cs.CL #vlm #calibration #text-only

6.0 Differences in Text Generated by Diffusion and Autoregressive Language Models

Compares text generated by diffusion language models with text generated by autoregressive language models.

ArXiv cs.CL #diffusion-lm #autoregressive #text-generation

6.0 In-Situ Behavioral Evaluation for LLM Fairness, Not Standardized-Test Scores

Argues that LLM fairness should be evaluated through in-situ conversational behavior rather than standardized tests.

ArXiv cs.CL #fairness #evaluation #llm [Evals]

6.0 A First Comprehensive Study of TurboQuant: Accuracy and Performance

First comprehensive study of the TurboQuant quantization method; FP8 KV-cache quantization performs best.

Reddit r/LocalLLaMA #quantization #kv-cache #fp8 [Context Engineering]

5.5 Domain Adaptation of Large Language Models for Polymer-Composite Additive Manufacturing Using Retrieval-Augmented Generation and Fine-Tuning

Uses RAG and fine-tuning to adapt LLMs to the domain of polymer-composite additive manufacturing.

ArXiv cs.CL #rag #fine-tuning #domain-adaptation

5.0 AI-supported chatbots as triggers of potential communication crises

Studies how AI chatbots can trigger potential communication crises.

nature.com #chatbot #communication #user-experience

5.0 Need a second pair of eyes, this Qwen3.6 27B quant recipe consistently thinks less and is correct

A Qwen3.6 27B INT8 quantization recipe makes the model think less while staying correct.

Reddit r/LocalLLaMA #qwen #quantization #reasoning [Planning]

> Engineering & Resources

8.7 NousResearch/hermes-agent

Nous Research releases hermes-agent, an agent that grows with its user.

GitHub trending:python (+1728★) #agent #open-source #nous-research [Agent Harness]

8.5 You can access Codex on your phone now - Axios

OpenAI brings its AI coding assistant Codex to the ChatGPT mobile app.

axios.com #ai-coding #mobile #openai [Coding Agents]

8.3 tinyhumansai/openhuman

An open-source personal AI superintelligence project focused on privacy and simplicity.

GitHub trending:all (+3329★) #open-source #personal-ai #privacy [Model Release]

8.3 rohitg00/agentmemory

A library that gives AI coding agents persistent memory, based on benchmarks.

GitHub trending:all (+1879★) #ai-coding #memory #agent [Coding Agents] [Context Engineering]

8.3 obra/superpowers

An agentic skills framework and software development methodology that works.

GitHub trending:all (+1780★) #agent-framework #skills #methodology [Agent Harness]

8.3 mattpocock/skills

A Claude Code skill set shared by Matt Pocock, aimed at real engineers.

GitHub trending:all (+2987★) #claude-code #ai-coding #developer-tools [Coding Agents]

8.0 Claude Code's '/goals' separates the agent that works from the one that decides it's done | VentureBeat

Claude Code adds a '/goals' feature that separates doing the work from deciding that it is done.

venturebeat.com #ai-coding #claude #agent [Coding Agents]

8.0 inclusionAI/Ring-2.6-1T · Hugging Face

Ring-2.6-1T, a trillion-parameter reasoning model, is released.

Reddit r/LocalLLaMA #ring #reasoning #large-model [Model Release] [Planning]

8.0 huggingface/ml-intern

Hugging Face releases ml-intern, an open-source ML engineer that reads papers and trains models automatically.

Co-Starred #open-source #ml-engineer #automation [Coding Agents]

7.7 garrytan/gstack

Garry Tan's Claude Code setup, with 23 tools spanning CEO, design, engineering, and other roles.

GitHub trending:all (+915★) #claude-code #ai-coding #developer-tools [Coding Agents]

7.5 Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

IBM releases the Granite multilingual embedding model R2, with 32K context and an Apache 2.0 license.

Hugging Face #embedding #multilingual #open-source [Model Release]

7.5 Scenema Audio: Zero-shot expressive voice cloning and speech generation

Scenema Audio releases a zero-shot voice cloning and speech generation model along with inference code.

Reddit r/LocalLLaMA #voice-cloning #speech-generation #open-source

7.0 NVFP4 Kimi2.6 and Kimi 2.5 released by Nvidia

NVIDIA releases NVFP4-quantized versions of Kimi2.6 and Kimi2.5.

Reddit r/LocalLLaMA #kimi #quantization #nvidia [Model Release]

7.0 Multi-Token Prediction (MTP) for Qwen on LLaMA.cpp + TurboQuant

Implements multi-token prediction for Qwen on LLaMA.cpp, with a 40% performance gain.

Reddit r/LocalLLaMA #multi-token-prediction #llama.cpp #qwen [Context Engineering]

7.0 antirez/ds4

antirez/ds4: a local inference engine for DeepSeek 4 Flash with Metal support.

Co-Starred #deepseek #local-inference #metal [Model Release]

6.5 AI-Native Healthcare: 100M Doctor Visits, 10–20 Hours Saved, Prior Auth in Minutes — Janie Lee & Chai Asawa, Abridge

How Abridge turns doctor-patient conversations into a healthcare operating system, saving substantial time.

Latent Space #healthcare #ai-natives #conversational-ai

6.5 Lovable just backed a company that's looking to bring vibe coding to hardware | TechCrunch

Lovable invests in a company that aims to bring vibe coding to hardware.

techcrunch.com #hardware #vibe-coding #startup

6.5 Codex is now in the ChatGPT mobile app

OpenAI's Codex is now integrated into the ChatGPT mobile app.

HN (168) #codex #chatgpt #ai-coding [Coding Agents]

6.2 Ontario auditors find doctors' AI note takers routinely blow basic facts

An Ontario audit finds that the AI note-taking tools doctors use frequently get basic facts wrong.

HN (96) #healthcare #ai-notes #audit

6.0 cline/cline

Cline is released as an autonomous coding agent, available as an SDK, IDE extension, and CLI assistant.

GitHub trending:typescript (+63★) #coding-agent #open-source #sdk [Coding Agents]

6.0 [AINews] Codex Rises, Claude Meters Programmatic Usage

AI news: Codex rises, and Claude meters programmatic usage.

Latent Space #coding-agents #usage-metering [Coding Agents]

6.0 What happens when AI starts building itself?

Explores the potential implications of AI beginning to build itself.

techcrunch.com #ai-safety #autonomy #future

6.0 RDNA3 Flash Attention fix just dropped by llama.cpp b9158

llama.cpp b9158 fixes an RDNA3 Flash Attention issue.

Reddit r/LocalLLaMA #llama.cpp #flash-attention #amd

5.9 New arXiv policy: 1-year ban for hallucinated references

New arXiv policy: a one-year ban for authors of hallucinated references.

HN (272) #arxiv #policy #hallucination

5.7 millionco/react-doctor

React Doctor: an AI agent that detects bad React code.

GitHub trending:typescript (+426★) #react #ai-coding #code-quality [Coding Agents]

5.7 Imbad0202/academic-research-skills

Claude Code skills for academic research: research, writing, review, revision, and finalization.

GitHub trending:python (+424★) #academic-research #claude-code #agent-skills

5.7 CodebuffAI/codebuff

Codebuff, an AI tool that generates code from the terminal, is released.

GitHub trending:typescript (+129★) #coding-agent #cli #code-generation [Coding Agents]

5.7 OthmanAdi/planning-with-files

A Claude Code skill that implements Manus-style persistent Markdown planning.

GitHub trending:python (+124★) #planning #claude-code #agent-skills [Planning]

5.5 VS Code's new "Agents window" lets you use local AI models. Still requires an Internet connection and a Github Copilot plan (because we can't have nice things)

VS Code's new Agents window supports local AI models but still requires an internet connection and a Copilot subscription.

Reddit r/LocalLLaMA #vscode #coding-agent #local-llm [Coding Agents]

5.3 wanshuiyin/Auto-claude-code-research-in-sleep

Lightweight Markdown skills for autonomous ML research: cross-model review and idea discovery.

GitHub trending:python (+138★) #ml-research #autonomous #claude-code

5.0 [AINews] Everything is Conductor

AI news roundup: everything is Conductor, with an emphasis on small trends.

Latent Space #news-roundup #trends
