Intelligence.Log

Thursday, May 14, 2026

Extracted: 69 items. Sources: 33. Filter: Score >= 5.0

++ Daily.Brief ++

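Today's AI developments: Microsoft's MDASH AI system found 16 Windows vulnerabilities, which have been fixed. In research, OLIVIA and PIVOT propose new frameworks for online learning in LLM agents and for bridging planning and execution, respectively. In tools, Notion released a new developer platform that turns its workspace into a hub for AI agents, and Needle, a 26M-parameter tool-calling model, was open-sourced. In commentary, AI chatbots giving out users' phone numbers is flagged as a security risk, and Anthropic now has more business customers than OpenAI.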

> Headlines & Launches

7.0 Microsoft's MDASH AI System Finds 16 Windows Flaws Fixed in Patch Tuesday - The Hacker News

Microsoft's MDASH AI system found 16 Windows vulnerabilities, which were fixed in Patch Tuesday.

thehackernews.com #ai-security #vulnerability-discovery #microsoft

> Research & Innovation

7.5 OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

Proposes OLIVIA, which enables online learning in LLM ReAct agents through inference-time action adaptation.

ArXiv cs.AI #llm-agent #react #online-learning [Planning] [Tool Use]

7.5 PIVOT: Bridging Planning and Execution in LLM Agents via Trajectory Refinement

Proposes the PIVOT framework, which bridges planning and execution in LLM agents through trajectory refinement.

ArXiv cs.AI #llm-agent #planning #execution [Planning]

7.5 ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction

Proposes scaling computer-use agents by reducing temporal visual redundancy.

ArXiv cs.CL #computer-use #agent #visual-redundancy [Agent Harness]

7.5 Efficient pretraining with token superposition by Nous Research

Nous Research proposes token superposition, a method for efficient pretraining.

Reddit r/LocalLLaMA #pretraining #efficiency #research [Post-Training]

7.0 EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales

Proposes a test-time co-evolution framework for multi-agent systems, spanning individual, team, and population scales.

ArXiv cs.AI #multi-agent #evolution #test-time [Agent Harness]

7.0 The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

Analyzes the pitfalls, mechanisms, and fixes of on-policy distillation, with important implications for post-training.

ArXiv cs.AI #distillation #post-training #llm [Post-Training]

7.0 HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model

Releases Hebatron, a Hebrew-specialized open-weight MoE language model.

ArXiv cs.CL #llm #hebrew #mixture-of-experts [Model Release]

7.0 Learning, Fast and Slow: Towards LLMs That Adapt Continually [R]

Proposes fast and slow learning methods that let LLMs adapt continually to new tasks.

Reddit r/MachineLearning #llm #continual-learning #adaptation [Post-Training]

6.5 Don't Look at the Numbers: Visual Anchoring Bias and Layer-wise Representation in VLMs

Finds that numeric anchors embedded in images systematically bias VLM quality judgments, revealing a visual anchoring bias.

ArXiv cs.AI #vlm #bias #visual-anchoring

6.5 Sampling More, Getting Less: Calibration is the Diversity Bottleneck in LLMs

Finds that calibration is the bottleneck for LLM diversity: drawing more samples yields less diversity, not more.

ArXiv cs.CL #llm #diversity #calibration

6.5 How Does Differential Privacy Affect Social Bias in LLMs? A Systematic Evaluation

Systematically evaluates how differential privacy affects social bias in LLMs.

ArXiv cs.CL #differential-privacy #bias #llm

6.5 ReAD: Reinforcement-Guided Capability Distillation for Large Language Models

Proposes a reinforcement-guided capability distillation method for large language models.

ArXiv cs.CL #llm #knowledge-distillation #reinforcement-learning [Post-Training]

6.5 Elastic Attention Cores for Scalable Vision Transformers [R]

Proposes elastic attention cores for scalable Vision Transformers.

Reddit r/MachineLearning #vision-transformer #attention #scalable

6.0 RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking

Proposes RankQ, which performs offline-to-online reinforcement learning via self-supervised action ranking.

ArXiv cs.AI #reinforcement-learning #offline-to-online #self-supervised

6.0 Do Vision-Language-Models show human-like logical problem-solving capability in point and click puzzle games?

Evaluates whether VLMs show human-like logical reasoning in point-and-click puzzle games.

ArXiv cs.AI #vlm #reasoning #interactive [Planning]

6.0 ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV

Proposes ClinicalBench, which stress-tests assertion-aware retrieval for cross-admission clinical QA.

ArXiv cs.CL #benchmark #clinical #retrieval [Evals]

6.0 The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models

Proposes the Bicameral Model, which couples hidden states bidirectionally between parallel language models.

ArXiv cs.CL #llm #multi-model #coupling

6.0 Instructions shape Production of Language, not Processing

Studies how instructions shape the production mechanisms of language models rather than their processing mechanisms.

ArXiv cs.CL #llm #cognitive-science #instruction-following

6.0 Trained transformer-based chess models to play like humans (including thinking time) [P]

Trains transformer-based chess models to imitate human playing style.

Reddit r/MachineLearning #transformer #chess #human-like

5.5 A Cascaded Generative Approach for e-Commerce Recommendations

Proposes a cascaded generative approach for e-commerce recommendations, combining multi-stage generation with ranking.

ArXiv cs.AI #recommendation #e-commerce #generative

5.5 The Semantic Training Gap: Ontology-Grounded Tool Architectures for Industrial AI Agent Systems

Proposes ontology-grounded tool architectures to address the semantic training gap in industrial AI agent systems.

ArXiv cs.AI #ai-agent #ontology #industrial [Tool Use]

5.5 Decomposing Evolutionary Mixture-of-LoRA Architectures: The Routing Lever, the Lifecycle Penalty, and a Substrate-Conditional Boundary

Decomposes evolutionary Mixture-of-LoRA architectures, analyzing routing, the lifecycle penalty, and a boundary condition.

ArXiv cs.CL #lora #mixture-of-experts #evolution

5.0 Training a number-aware embedding model + Text JEPA doesn't work too well + Text auto-encoders have a strange frequency bias [R][P]

Shares experiments on training a number-aware embedding model and on Text JEPA.

Reddit r/MachineLearning #embedding #jepa #autoencoder

> Engineering & Resources

8.7 Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

Open-sources Needle, a fast 26M-parameter tool-calling model.

HN (638) #tool-calling #open-source #small-model [Tool Use] [Model Release]

8.5 Notion just turned its workspace into a hub for AI agents - TechCrunch

Notion releases a new developer platform that turns its workspace into a hub for AI agents.

techcrunch.com #ai-agents #notion #developer-platform [Agent Harness]

8.0 AIDC-AI/Ovis2.6-80B-A3B · Hugging Face

Release of Ovis2.6-80B-A3B, a multimodal large language model in the Ovis series.

Reddit r/LocalLLaMA #multimodal #mllm #open-source [Model Release]

8.0 sensenova/SenseNova-U1-A3B-MoT · Hugging Face

Release of SenseNova-U1, a native multimodal model that unifies multimodal understanding and generation.

Reddit r/LocalLLaMA #multimodal #model-release #unified [Model Release]

7.9 rohitg00/agentmemory

AgentMemory: persistent memory for AI coding agents, grounded in benchmarks.

GitHub trending:all (+1379★) #agent #memory #benchmark [Context Engineering]

7.9 mattpocock/skills

A collection of skills for Anthropic's Claude Code, aimed at engineers.

GitHub trending:all (+3392★) #claude-code #skills #ai-coding [Coding Agents]

7.5 AI chatbots are giving out people’s real phone numbers | MIT Technology Review

AI chatbots are giving out people's real phone numbers, a privacy and safety problem.

technologyreview.com #ai-safety #privacy #chatbot

7.5 Anthropic now has more business customers than OpenAI, according to Ramp data | TechCrunch

Ramp data shows Anthropic now has more business customers than OpenAI.

techcrunch.com #anthropic #openai #enterprise

7.5 DramaBox - Most Expressive Voice model ever based on LTX 2.3

Release of DramaBox, a voice model based on LTX 2.3 and billed as the most expressive to date.

Reddit r/LocalLLaMA #voice-model #open-source #multimodal [Model Release]

7.5 antirez/ds4

antirez releases a local inference engine for DeepSeek 4 Flash, with Metal support.

Co-Starred #deepseek #local-inference #metal [Model Release]

7.5 obra/superpowers

Superpowers: an agentic skills framework and software development methodology.

GitHub trending:all (+1401★) #agent #framework #skills [Agent Harness]

7.5 NousResearch/hermes-agent

A general-purpose AI agent framework from NousResearch.

GitHub trending:python (+1881★) #agent-framework #open-source [Agent Harness]

7.5 garrytan/gstack

Garry Tan's Claude Code setup, including 23 tools.

GitHub trending:typescript (+1083★) #claude-code #developer-tools [Coding Agents]

7.3 anthropics/skills

Anthropic's official public repository for Agent Skills.

GitHub trending:python (+635★) #agent-skills #anthropic [Agent Harness]

7.0 Amazon launches an AI shopping assistant for the search bar, powered by Alexa+ | TechCrunch

Amazon launches an AI shopping assistant powered by Alexa+.

techcrunch.com #ai-assistant #ecommerce #alexa

7.0 Adaption aims big with AutoScientist, an AI tool that helps models train themselves | TechCrunch

Adaption launches AutoScientist, a tool that helps models train themselves.

techcrunch.com #auto-ml #training #tool [Post-Training]

7.0 Mira Murati’s Thinking Machines previews ‘interaction models’ | Semafor

Mira Murati's Thinking Machines previews interaction models.

semafor.com #interaction-models #startup

7.0 Web-Search is coming to a screeching performance halt as Google shuts down their free search index, and traffic defenders like Cloudflare challenge AI at every gateway. What are our options?

Discusses how Google shutting down its free search index affects AI web search, and what the alternatives are.

Reddit r/LocalLLaMA #web-search #ai-infrastructure

7.0 I made a UI and server for using Anthropic's new Natural Language Autoencoders locally with llama.cpp

A local UI and server for running Anthropic's Natural Language Autoencoders.

Reddit r/LocalLLaMA #anthropic #autoencoder #local-llm

7.0 Scenema Audio: Zero-shot expressive voice cloning and speech generation [N]

Scenema Audio releases a zero-shot voice cloning and speech generation model.

Reddit r/MachineLearning #voice-cloning #speech-generation #zero-shot

7.0 huggingface/ml-intern

Hugging Face's open-source ML engineer project, which automatically reads papers and trains models.

Co-Starred #open-source #ml-engineer #automation [Agent Harness]

6.6 tinyhumansai/openhuman

OpenHuman: a personal AI superintelligence project focused on privacy and simplicity.

GitHub trending:all (+1696★) #personal-ai #open-source

6.5 WhatsApp adds an incognito mode in Meta AI chats | TechCrunch

WhatsApp adds an incognito mode to Meta AI chats.

techcrunch.com #meta-ai #privacy #messaging

6.5 Mark Zuckerberg announces ‘completely private’ encrypted Meta AI chat | The Verge

Zuckerberg announces encrypted, "completely private" Meta AI chat.

theverge.com #meta-ai #privacy #encryption

6.5 24+ tok/s from ~30B MoE models on an old GTX 1080 (8 GB VRAM, 128k context)

Running ~30B MoE models at 24+ tok/s on an old GTX 1080.

Reddit r/LocalLLaMA #inference #performance #moe

6.4 millionco/react-doctor

React Doctor: catches bad React code written by AI agents.

GitHub trending:all (+604★) #react #code-quality #ai-coding [Coding Agents]

6.1 Medicare's new payment model is built for AI. Most of the tech world has no idea

Medicare's new payment model is designed for AI, and most of the tech world has not noticed.

HN (52) #medicare #ai-policy #healthcare

6.0 [AINews] The End of Finetuning

An industry analysis reflecting on whether finetuning is coming to an end.

Latent Space #finetuning #llm #opinion

6.0 BasedAI Emerges from Stealth to Launch Hirebase, the instant AI Workforce Platform for Businesses - Yahoo Finance

BasedAI launches Hirebase, an instant AI workforce platform.

finance.yahoo.com #ai-workforce #platform-launch

6.0 Alexa is moving into Amazon.com | The Verge

Alexa is being integrated into the Amazon.com shopping experience.

theverge.com #alexa #shopping #ai-assistant

6.0 TextGen is now a native desktop app. Open-source alternative to LM Studio (formerly text-generation-webui).

TextGen releases a native desktop app, positioned as an open-source alternative to LM Studio.

Reddit r/LocalLLaMA #local-llm #desktop-app #open-source

6.0 MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant)

Shares inference performance numbers for Qwen 3.6 27B running on MI50s.

Reddit r/LocalLLaMA #inference #performance #qwen

6.0 llama.cpp docker images to run MTP models

llama.cpp Docker images updated to support MTP models.

Reddit r/LocalLLaMA #llama.cpp #docker #mtp

6.0 Show HN: Rotunda - A browser built for agents with simulated typing

Rotunda: a browser built for AI agents, with simulated typing.

HN (12) #agent #browser #open-source [Agent Harness]

5.8 Meta won't let you block its AI account on Threads

Meta does not let users block its AI account on Threads.

HN (107) #meta #ai-account #social-media

5.8 CodebuffAI/codebuff

Codebuff is a terminal-based code generation tool, a kind of AI coding assistant.

GitHub trending:typescript (+188★) #ai-coding #cli-tool [Coding Agents]

5.7 MervinPraison/PraisonAI

A multi-agent collaboration framework for deploying AI employees.

GitHub trending:python (+411★) #multi-agent #framework [Agent Harness]

5.6 MemoriLabs/Memori

Agent-native memory infrastructure that is LLM-agnostic.

GitHub trending:python (+66★) #memory #agent-infrastructure [Context Engineering]

5.6 The US is winning the AI race where it matters most: commercialization

Argues that the US is leading the race to commercialize AI.

HN (162) #ai-commercialization #us-policy

5.5 Rethinking LLMOps for Fraud and AML: Building a Compliance-Grade LLM Serving Stack

Rethinks LLMOps for fraud detection and anti-money laundering, building a compliance-grade LLM serving stack.

ArXiv cs.AI #llmops #fraud-detection #compliance

5.3 iOfficeAI/AionUi

An open-source AI assistant collaboration app that supports multiple CLIs.

GitHub trending:typescript (+155★) #ai-assistant #open-source [Agent Harness]

5.3 opendatalab/MinerU

A tool that converts documents, including PDFs, into LLM-ready formats.

GitHub trending:python (+129★) #document-processing #llm

5.3 Launch HN: Ardent (YC P26) – Postgres sandboxes in seconds with zero migration

Ardent launches Postgres sandboxes aimed at coding agents.

HN (64) #postgres #sandbox #coding-agents [Coding Agents]

5.2 ErlichLiu/Proma

Proma is an open-source general-purpose agent built on the Claude Agent SDK, with support for invocation from Feishu.

GitHub trending:typescript (+35★) #agent-framework #open-source #claude [Agent Harness]

[STATS] 69 items · 33 sources · Score >= 5.0