Intelligence.Log

Saturday, May 16, 2026

Extracted: 56 items. Sources: 28. Filter: Score >= 5.0

++ Daily.Brief ++

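Today brought several major releases in AI: OpenAI launched a personal finance feature for ChatGPT that lets users connect bank accounts to manage their finances[#item-techcrunch-com-2026-05-15-openai-launches-chatgpt-for-person]; Intercom renamed itself Fin and launched an AI agent dedicated to managing another AI agent[#item-venturebeat-com-technology-intercom-now-called-fin-launches-]; and arXiv introduced a 1-year ban for papers containing LLM-generated errors such as hallucinated references[#item-reddit-com-r-MachineLearning-comments-1tdje2d-arxiv-implemen]. In research, Orthrus-Qwen3-8B reaches up to 7.8× tokens per forward pass with an unchanged output distribution[#item-reddit-com-r-LocalLLaMA-comments-1te5xpu-orthrusqwen38b-up-t], while other work systematically audits AI agent benchmarks and reveals interpretable failure modes of vision-language models[#item-arxiv-org-abs-2605-12673][#item-arxiv-org-abs-2605-12674]. On the tooling side, several frameworks and skill collections for AI agents appeared[#item-github-com-mattpocock-skills][#item-github-com-obra-superpowers][#item-github-com-garrytan-gstack]. In opinion and industry news, the legal dispute between Musk and Altman continues to escalate[#item-technologyreview-com-2026-05-15-1137357-musk-v-altman-week-3], the community is debating "AI psychosis"[#item-twitter-com-mitchellh-status-2055380239711457578], and OpenAI is reshuffling its executives in an all-out push for the AI agent market[#item-theverge-com-ai-artificial-intelligence-931544-openai-keeps-].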

> Headlines & Launches

8.0 OpenAI launches ChatGPT for personal finance, will let you connect bank accounts | TechCrunch

OpenAI launches a personal finance feature for ChatGPT that can connect to bank accounts.

techcrunch.com #chatgpt #finance #product-launch [Tool Use]

8.0 Intercom, now called Fin, launches an AI agent whose only job is managing another AI agent

Intercom renames itself Fin and launches an AI agent that manages another AI agent.

venturebeat.com #ai-agent #management #customer-service [Agent Harness]

8.0 arXiv implements 1-year ban for papers containing incontrovertible evidence of unchecked LLM-generated errors, such as hallucinated references or results. [N]

arXiv implements a 1-year ban for papers containing unchecked LLM-generated errors, such as hallucinated references.

Reddit r/MachineLearning #arxiv #policy #llm-errors [Evals]

7.5 Greg Brockman Officially Takes Control of OpenAI's Products in ...

Greg Brockman officially takes over OpenAI's product organization as part of a leadership reorganization.

wired.com #openai #leadership #reorganization

> Research & Innovation

8.0 Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

Systematically audits AI agent benchmarks, exposes common vulnerabilities, and introduces the BenchJack tool.

ArXiv cs.AI #benchmark #agent #auditing [Evals]

8.0 Orthrus-Qwen3-8B : up to 7.8×tokens/forward on Qwen3-8B, frozen backbone, provably identical output distribution

Orthrus-Qwen3-8B reaches up to 7.8× tokens per forward pass while keeping the output distribution unchanged.

Reddit r/LocalLLaMA #llm #inference #optimization [Model Release]

7.5 Revealing Interpretable Failure Modes of VLMs

Reveals interpretable failure modes of vision-language models, with the aim of improving safety.

ArXiv cs.AI #vlm #interpretability #safety

7.5 DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models

Proposes DisaBench, a participatory evaluation framework for measuring the harms language models cause to people with disabilities.

ArXiv cs.AI #benchmark #safety #disability [Evals]

7.5 Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning

Audits the evaluation pipeline for multimodal physics reasoning and presents the Physics-R1 corpus and reasoning recipe.

ArXiv cs.CL #physics #reasoning #multimodal [Evals]

7.0 Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents

Proposes verifier-guided action selection to make embodied agents more reliable at task execution.

ArXiv cs.AI #embodied-agent #verifier #action-selection [Agent Harness]

7.0 Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding

Proposes Mistletoe, a stealthy acceleration-collapse attack on speculative decoding.

ArXiv cs.CL #speculative-decoding #security #attack

7.0 ByteDance-Seed/Cola-DLM · Hugging Face

ByteDance releases Cola-DLM, a diffusion language model that operates in a continuous latent space.

Reddit r/LocalLLaMA #diffusion #language-model #latent-space [Model Release]

7.0 Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion [R]

Orthrus proposes dual-view diffusion for memory-efficient parallel token generation.

Reddit r/MachineLearning #diffusion #parallel-generation #memory-efficient

6.5 Macro-Action Based Multi-Agent Instruction Following through Value Cancellation

Proposes a macro-action-based method for multi-agent instruction following that uses value cancellation.

ArXiv cs.AI #multi-agent #reinforcement-learning #instruction-following [Agent Harness]

6.5 CHAL: Council of Hierarchical Agentic Language

Proposes CHAL, a Council of Hierarchical Agentic Language framework, to improve reasoning through multi-agent debate.

ArXiv cs.AI #multi-agent #debate #reasoning [Agent Harness]

6.5 Derivation Prompting: A Logic-Based Method for Improving Retrieval-Augmented Generation

Proposes Derivation Prompting, a logic-based method for improving retrieval-augmented generation.

ArXiv cs.CL #rag #prompting #logic [Context Engineering]

6.5 Distribution Corrected Offline Data Distillation for Large Language Models

Proposes a distribution-corrected offline data distillation method to improve the reasoning ability of small models.

ArXiv cs.CL #knowledge-distillation #reasoning #data-augmentation [Post-Training]

6.5 When Evidence Conflicts: Uncertainty and Order Effects in Retrieval-Augmented Biomedical Question Answering

Studies uncertainty and order effects when evidence conflicts in retrieval-augmented biomedical question answering.

ArXiv cs.CL #rag #biomedical #uncertainty [Context Engineering]

6.0 Learning Transferable Latent User Preferences for Human-Aligned Decision Making

Learns transferable latent user preferences for human-aligned decision making.

ArXiv cs.AI #llm #alignment #preference-learning [Post-Training]

6.0 State-Centric Decision Process

Proposes a state-centric decision process for agents operating in language environments.

ArXiv cs.AI #agent #decision-making #state-centric [Planning]

6.0 Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey

An empirical study of merging methods for multilingual knowledge editing, aimed at cross-lingual interference.

ArXiv cs.CL #knowledge-editing #multilingual #llm

6.0 PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts

Proposes PEML, a parameter-efficient multi-task learning framework that optimizes continuous prompts.

ArXiv cs.CL #peft #multi-task-learning #prompt-tuning

6.0 Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study

A comprehensive replication study of methods for measuring and mitigating toxicity in LLMs.

ArXiv cs.CL #toxicity #safety #llm

5.5 BEHAVE: A Hybrid AI Framework for Real-Time Modeling of Collective Human Dynamics

Proposes BEHAVE, a hybrid AI framework for real-time modeling of collective human dynamics.

ArXiv cs.AI #human-dynamics #simulation #ai-framework

5.5 VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use

Releases a 42M-parameter Spanish cybersecurity language model trained with curriculum learning and supporting native tool use.

ArXiv cs.CL #spanish #cybersecurity #small-language-model [Model Release]

5.5 Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents

Proposes dual hierarchical dialogue policy learning for legal inquisitive conversational agents.

ArXiv cs.CL #dialogue-system #legal-ai #policy-learning

> Engineering & Resources

8.7 mattpocock/skills

A collection of Claude skills for real engineers, drawn from a .claude directory.

GitHub trending:all (+3132★) #agent-skills #claude [Agent Harness]

8.3 obra/superpowers

An agent skills framework and software development methodology.

GitHub trending:all (+1648★) #agent-framework #skills [Agent Harness]

8.3 garrytan/gstack

Garry Tan's Claude Code setup, including 23 AI agent tools.

GitHub trending:typescript (+1005★) #ai-coding #claude #agent [Coding Agents]

7.8 anthropics/skills

Anthropic's official public repository for Agent Skills.

GitHub trending:all (+689★) #agent-skills #anthropic [Agent Harness]

7.5 YouTube is expanding its AI deepfake detection tool to all adult users

YouTube is expanding its AI deepfake detection tool to all adult users.

theverge.com #deepfake #detection #youtube

7.5 Musk v. Altman week 3: Musk and Altman traded blows over each ...

Week 3 of the Musk v. Altman legal dispute, with both sides trading attacks.

technologyreview.com #openai #musk #legal

7.5 internlm/Intern-S2-Preview · Hugging Face

Intern-S2-Preview released, a 35B scientific multimodal foundation model.

Reddit r/LocalLLaMA #multimodal #science #open-source [Model Release]

7.5 huggingface/ml-intern

Hugging Face releases an open-source ML engineer that can read papers and train models.

Co-Starred #open-source #ml-engineer #huggingface [Agent Harness]

7.5 rohitg00/agentmemory

An open-source library that gives AI coding agents persistent memory.

GitHub trending:typescript (+721★) #memory #ai-coding #agent [Context Engineering]

7.0 I believe there are entire companies right now under AI psychosis

MitchellH argues that entire companies are currently under AI psychosis, prompting community discussion.

HN (781) #ai-criticism #industry-opinion

7.0 OpenAI keeps shuffling its executives in bid to win AI agent battle

OpenAI keeps reshuffling its executives in a bid to win the AI agent competition.

theverge.com #openai #executive #ai-agents [Agent Harness]

7.0 Codex untethers AI from the laptop | Semafor

Codex untethers AI from the laptop, making it usable on the go.

semafor.com #codex #mobile-ai #ai-agents [Agent Harness]

7.0 Anthropic Calls for Tighter U.S. Chip Restrictions on China — The Information

Anthropic calls on the U.S. to tighten chip export restrictions on China, with implications for the AI hardware supply chain.

theinformation.com #ai-policy #chip-restrictions #anthropic

7.0 Built a fully offline suitcase robot around a Jetson Orin NX SUPER 16GB. Gemma 4 E4B, ~200ms cached TTFT, 30+ sensors, no WiFi/BT/cellular. He has opinions.

A fully offline suitcase robot built around a Jetson Orin NX, running Gemma 4 E4B.

Reddit r/LocalLLaMA #edge-ai #robot #gemma

7.0 I built a self-hosted open-source MCP server that gives any local LLM real financial data — SEC filings, 13F, insider & congressional trades, short data, FRED

A self-hosted, open-source MCP server that gives local LLMs real financial data.

Reddit r/LocalLLaMA #mcp #finance #open-source [Tool Use]

7.0 antirez/ds4

antirez releases a local inference engine for DeepSeek 4 Flash, with Metal support.

Co-Starred #deepseek #local-inference #metal [Model Release]

6.9 K-Dense-AI/scientific-agent-skills

A collection of agent skills for research, engineering, finance, and other domains.

GitHub trending:all (+646★) #agent-skills #research [Agent Harness]

6.6 tinyhumansai/openhuman

An open-source personal AI superintelligence focused on privacy and simplicity.

GitHub trending:all (+1271★) #open-source #personal-ai

6.5 ‘The new era is here’: Fears rise over AI hacking | Semafor

Reports on rising fears over AI-driven hacking.

semafor.com #ai-safety #hacking #cybersecurity

6.5 Dynamically allocating compute budget to hard set of problems and evolving the sections with Qwen-35B-A3B gets you near GPT-5.4-xHigh on HLE

Dynamically allocating the compute budget to hard problems brings Qwen-35B-A3B close to GPT-5.4-xHigh on HLE.

Reddit r/LocalLLaMA #llm #reasoning #benchmark [Planning]

6.5 AllenAI has been iterating on their MolmoAct2 models for robotics

AllenAI continues to iterate on its MolmoAct2 family, vision-language-action models for robot control.

Reddit r/LocalLLaMA #robotics #vision-language-action [Model Release]

6.4 NVIDIA-AI-Blueprints/video-search-and-summarization

NVIDIA reference architecture for GPU-accelerated video search and summarization.

GitHub trending:all (+308★) #video-analysis #vision-agent

6.1 jingyaogong/minimind

An open-source project for training a 64M-parameter LLM from scratch in 2 hours.

GitHub trending:python (+99★) #llm #training #open-source [Post-Training]

5.9 Show HN: Watch a neural net learn to play Snake

An in-browser demo of PPO training in which a neural network learns to play Snake.

HN (118) #reinforcement-learning #webgpu #demo

5.8 joeseesun/qiaomu-anything-to-notebooklm

A multi-source content processor that turns WeChat articles and other sources into podcasts or slide decks.

GitHub trending:all (+438★) #content-processing #notebooklm

5.7 opendatalab/MinerU

MinerU converts PDFs and other documents into LLM-ready Markdown/JSON.

GitHub trending:python (+143★) #document-processing #llm #open-source

5.2 CodebuffAI/codebuff

Codebuff, an AI tool that generates code in the terminal.

GitHub trending:typescript (+91★) #ai-coding #terminal #open-source [Coding Agents]

5.2 czlonkowski/n8n-mcp

An MCP that lets Claude and other AI tools build n8n workflows.

GitHub trending:all (+68★) #mcp #workflow [Tool Use]

5.1 awslabs/agent-plugins

AWS plugins for AI coding agents that assist with architecture and deployment.

GitHub trending:python (+21★) #aws #agent-plugins [Coding Agents]

5.0 Can a 5090 with qwen3.6 achieve > 3,000 tok/s ? bring your pitchforks (open-dllm)

A discussion of whether Qwen3.6 on a 5090 can exceed 3,000 tok/s.

Reddit r/LocalLLaMA #llm #inference #performance
