Intelligence.Log

Thursday, May 7, 2026

Extracted: 66 items. Sources: 35. Filter: Score >= 5.0

++ Daily.Brief ++

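A busy day in AI: Anthropic and SpaceX announced a major partnership, a compute agreement meant to boost capacity for Claude, and Microsoft and OpenAI's definition of AGI was made public for the first time. In research, a model developed automatically by an AI agent ranked in the top 5.7% of a Kaggle challenge, and another paper shows that perplexity differencing can reveal fine-tuning objectives. In tooling, ruflo released a multi-agent orchestration platform and DeepSeek-TUI introduced a terminal coding agent. In commentary, Qwen 3.6 27B reached 2.5x faster inference with MTP, and OpenAI let ChatGPT plan the GPT-5.5 launch party itself.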

> Headlines & Launches

8.5 Anthropic and SpaceX announce major partnership as AI arms race continues - NBC News

Anthropic and SpaceX announce a major partnership to boost compute capacity for Claude.

nbcnews.com #anthropic #spacex #compute [Model Release]

8.5 Anthropic, SpaceX Sign Deal to Boost AI Computing Power for Claude Software - Bloomberg

Anthropic and SpaceX sign a compute agreement to increase computing power for Claude.

bloomberg.com #anthropic #spacex #compute [Model Release]

8.5 Microsoft and OpenAI’s definition of AGI was just revealed. | The Verge

Microsoft and OpenAI's definition of AGI has been made public for the first time, drawing industry attention.

theverge.com #agi #definition #microsoft

8.0 Meta-Backed Scale AI Wins $500 Million Defense Department Deal - Bloomberg

Meta-backed Scale AI wins a $500 million Defense Department contract.

bloomberg.com #scale-ai #defense #contract

7.5 Apple to pay $250M to settle lawsuit over Siri's delayed AI features | TechCrunch

Apple will pay $250 million to settle a lawsuit over the delayed AI features for Siri.

techcrunch.com #apple #siri #lawsuit

7.0 Google shuts down Project Mariner | The Verge

Google shuts down Project Mariner, ending its AI browser agent project.

theverge.com #google #project-mariner #ai-agent [Agent Harness]

> Research & Innovation

7.5 Model automatically developed by the AIBuildAI Agent ranked among top 5.7% out of 3,219 human teams in the Kaggle TGS Salt Identification Challenge [P]

A model developed automatically by an AI agent ranked in the top 5.7% of a Kaggle challenge.

Reddit r/MachineLearning #automl #kaggle #agent [Coding Agents]

7.0 Understanding Emergent Misalignment via Feature Superposition Geometry

Explains emergent misalignment caused by fine-tuning through the geometry of feature superposition.

ArXiv cs.AI #alignment #fine-tuning #interpretability [Post-Training]

7.0 Model Organisms Are Leaky: Perplexity Differencing Often Reveals Finetuning Objectives

Perplexity differencing often reveals fine-tuning objectives: models leak what they were trained to do.

ArXiv cs.CL #fine-tuning #privacy #perplexity [Post-Training]

7.0 Qwen3.6 27B NVFP4 + MTP on a single RTX 5090: 200k context working in vLLM

Runs Qwen3.6 27B with NVFP4 + MTP on a single RTX 5090, with 200k context supported.

Reddit r/LocalLLaMA #llm #inference #hardware

6.7 Learning the Integral of a Diffusion Model

Learns the integral of a diffusion model and proposes a flow-map approach.

HN (92) #diffusion-models #flow-maps #generative-ai

6.5 ClinicBot: A Guideline-Grounded Clinical Chatbot with Prioritized Evidence RAG and Verifiable Citations

A guideline-grounded clinical chatbot with prioritized-evidence RAG and verifiable citations.

ArXiv cs.AI #rag #clinical #chatbot [Context Engineering]

6.5 H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models

A probing method that extracts hierarchical structures from the latent representations of language models.

ArXiv cs.CL #interpretability #hierarchy #probing

6.5 Can AI Debias the News? LLM Interventions Improve Cross-Partisan Receptivity but LLMs Overestimate Their Own Effectiveness

LLM interventions can debias news, but LLMs overestimate their own effectiveness.

ArXiv cs.CL #bias #news #llm-intervention

6.0 Towards Multi-Agent Autonomous Reasoning in Hydrodynamics

Applies multi-agent autonomous reasoning to hydrodynamics.

ArXiv cs.AI #multi-agent #scientific-reasoning [Agent Harness]

6.0 CLEAR: Revealing How Noise and Ambiguity Degrade Reliability in LLMs for Medicine

Reveals how noise and ambiguity degrade the reliability of medical LLMs.

ArXiv cs.CL #medical #reliability #noise [Evals]

6.0 Psychologically Potent, Computationally Invisible: LLMs Generate Social-Comparison Triggers They Fail to Detect

Introduces the XHS-SCoRE benchmark, which evaluates LLMs' ability to generate social-comparison triggers.

ArXiv cs.CL #llm #benchmark #social-comparison [Evals]

6.0 Quality comparison between Qwen 3.6 27B quantizations (BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS,...)

A quality comparison across different quantizations of Qwen 3.6 27B.

Reddit r/LocalLLaMA #llm #quantization #benchmark

6.0 Transformers with Selective Access to Early Representations [R]

A new paper proposes a Transformer variant with selective access to early representations.

Reddit r/MachineLearning #transformer #architecture #attention

5.5 Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy

A clinician-in-the-loop AI speech therapy agent for personalized, supervised therapy.

ArXiv cs.AI #healthcare #speech-therapy #agent

5.5 DIAGRAMS: A Review Framework for Reasoning-Level Attribution in Diagram QA

A review framework for reasoning-level attribution in diagram question answering.

ArXiv cs.CL #diagram-qa #evaluation #reasoning [Evals]

5.5 A Theoretical Game of Attacks via Compositional Skills

A theoretical, game-based analysis of attacks on LLMs via compositional skills.

ArXiv cs.CL #llm #safety #adversarial

5.5 Compared to What? Baselines and Metrics for Counterfactual Prompting

Studies baselines and metrics for counterfactual prompting.

ArXiv cs.CL #llm #prompting #evaluation [Evals]

5.5 A Theory of Deep Learning

An article on the theory of deep learning that examines its basic principles.

HN (123) #deep-learning #theory #foundations

5.0 A Systematic Exploration of Text Decomposition and Budget Distribution in Differentially Private Text Obfuscation

Explores text decomposition and budget distribution in differentially private text obfuscation.

ArXiv cs.CL #llm #privacy #differential-privacy

5.0 Controlled Paraphrase Geometry in Sentence Embedding Space: Local Manifold Modeling and Latent Probing

Studies the local geometric structure of controlled paraphrases in sentence embedding space.

ArXiv cs.CL #llm #embeddings #paraphrase

> Engineering & Resources

9.1 ruvnet/ruflo

ruflo: a leading agent orchestration platform for Claude, with support for multi-agent swarms.

GitHub trending:all (+2192★) #agent-orchestration #claude #multi-agent [Agent Harness]

8.4 addyosmani/agent-skills

agent-skills: a library of production-grade engineering skills for AI coding agents.

GitHub trending:all (+800★) #ai-coding #agent-skills #open-source [Coding Agents]

8.3 Hmbown/DeepSeek-TUI

DeepSeek-TUI: a coding agent for DeepSeek models that runs in the terminal.

GitHub trending:all (+6175★) #coding-agent #deepseek #tui [Coding Agents]

8.3 AIDC-AI/Pixelle-Video

Pixelle-Video, a fully automated AI short-video engine, is released as open source.

GitHub trending:python (+1239★) #video-generation #open-source #ai-agent

8.3 mksglu/context-mode

Context Mode optimizes the context window for AI coding agents, cutting output by 98%.

GitHub trending:typescript (+711★) #context-optimization #ai-coding #agent [Context Engineering] [Coding Agents]

8.0 Live blog: Code w/ Claude 2026

A live blog of Anthropic's Code w/ Claude 2026 event.

Simon Willison #claude #ai-coding #live-blog [Coding Agents]

8.0 2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints

Qwen 3.6 27B gets 2.5x faster inference with MTP, making local agentic coding viable.

Reddit r/LocalLLaMA #inference-speed #qwen #local-llm [Coding Agents]

7.6 LearningCircuit/local-deep-research

local-deep-research: a local deep research tool that supports multiple LLMs and search engines.

GitHub trending:all (+532★) #research #local-llm #search [Evals]

7.5 Higher usage limits for Claude and a compute deal with SpaceX

Anthropic raises Claude usage limits and strikes a compute deal with SpaceX.

HN (383) #claude #usage-limits #spacex [Coding Agents]

7.5 Inside OpenAI's quirky launch party that ChatGPT 5.5 planned itself - Business Insider

A report on the GPT-5.5 launch party that OpenAI had ChatGPT plan itself.

businessinsider.com #openai #chatgpt #event

7.5 Coder Sets a New Standard for AI Coding with Self-Hosted, AI Model Agnostic Coder Agents - GlobeNewswire

Coder releases self-hosted, model-agnostic AI coding agents, billed as a new standard.

globenewswire.com #ai-coding #self-hosted #coder-agents [Coding Agents]

7.5 Anthropic raises Claude Code usage limits, credits new deal with SpaceX - Ars Technica

Anthropic raises Claude Code usage limits and credits its new compute deal with SpaceX.

arstechnica.com #claude-code #usage-limits #compute-deal [Coding Agents]

7.5 The GB10 Solution Atlas is now open source, the inference engine made for the community with breakneck inference speeds (Qwen3.6-35B-FP8 100+ tok/s)

The GB10 Solution Atlas inference engine goes open source, reaching 100+ tok/s on Qwen3.6-35B-FP8.

Reddit r/LocalLLaMA #llm #inference #open-source

7.4 virattt/dexter

dexter: an autonomous agent for deep financial research.

GitHub trending:all (+666★) #autonomous-agent #finance #research [Agent Harness]

7.0 vLLM V0 to V1: Correctness Before Corrections in RL

On the vLLM V0 to V1 update, stressing that correctness comes before corrections in RL.

Hugging Face #vllm #reinforcement-learning #open-source [Post-Training]

7.0 Vibe coding and agentic engineering are getting closer than I'd like

Discusses the convergence of vibe coding and agentic engineering in AI programming tools.

Simon Willison #ai-coding #vibe-coding #agentic-engineering [Coding Agents]

7.0 Anthropic Gets in Bed With SpaceX as the AI Race Turns Weird | WIRED

Anthropic strikes a compute deal with SpaceX as the AI race takes a strange turn.

wired.com #anthropic #spacex #compute-deal

7.0 ZAYA1-8B: Frontier intelligence density, trained on AMD

ZAYA1-8B model released: trained on AMD, with frontier intelligence density.

Reddit r/LocalLLaMA #model-release #amd #8b [Model Release]

7.0 Qwen3.6-27B with MTP grafted on Unsloth UD XL: 2.5x throughput via unmerged llama.cpp PR

Qwen3.6-27B reaches 2.5x throughput with MTP and Unsloth UD XL.

Reddit r/LocalLLaMA #llm #inference #optimization

6.9 bytedance/deer-flow

ByteDance open-sources a long-horizon super-agent framework with support for sandboxes, memory, tools, and more.

GitHub trending:all (+337★) #agent-framework #open-source #long-horizon [Agent Harness]

6.6 vercel-labs/open-agents

Vercel open-sources Open Agents, a template for building cloud agents.

GitHub trending:typescript (+406★) #agent-template #open-source #cloud [Agent Harness]

6.5 [AINews] Silicon Valley gets Serious about Services

A news roundup and analysis of Silicon Valley AI companies' shift toward services.

Latent Space #ai-services #industry-trends

6.5 Anthropic is programming Claude to “dream.” | The Verge

Anthropic is programming Claude to “dream,” exploring new AI capabilities.

theverge.com #claude #dreaming #ai-research

6.5 How Elon Musk left OpenAI, according to Greg Brockman | TechCrunch

Greg Brockman recounts the inside story of Elon Musk's departure from OpenAI.

techcrunch.com #elon-musk #openai #history

6.5 Uploaded Unsloth Qwen3.6-35B-A3B UD XL models with MTP grafted, here are the results

Unsloth Qwen3.6-35B-A3B UD XL models with MTP grafted on have been uploaded.

Reddit r/LocalLLaMA #llm #open-source #quantization

6.5 HOT TAKE: local models + agent harnesses are now capable enough to hand off junior-level IT professional tasks to [human written]

Opinion: local models plus agent harnesses can now handle junior-level IT tasks.

Reddit r/LocalLLaMA #agent #local-llm #automation [Agent Harness]

6.5 Running scope enforcement on every agent action in production - what I'm seeing after launch [P]

Practical observations from enforcing scope on every agent action in production.

Reddit r/MachineLearning #agent #production #safety [Agent Harness]

6.5 cocoindex-io/cocoindex

CocoIndex, an incremental engine for long-horizon agents, is open source.

GitHub trending:python (+364★) #agent-framework #incremental #open-source [Agent Harness]

6.3 Show HN: Tilde.run – Agent sandbox with a transactional, versioned filesystem

Tilde.run: an agent sandbox with a transactional, versioned filesystem.

HN (126) #agent-sandbox #versioned-filesystem #tool-use [Agent Harness]

6.3 InsForge/InsForge

InsForge: a Postgres-based backend built for coding agents.

GitHub trending:all (+230★) #backend #coding-agent #postgres [Coding Agents]

6.2 The bottleneck was never the code

Reflects that the bottleneck for coding agents is not the code but other factors.

HN (507) #coding-agents #bottleneck [Coding Agents]

6.1 anthropics/financial-services

Tools or resources from Anthropic related to financial services.

GitHub trending:all (+641★) #anthropic #financial-services

6.0 SoundHound AI Introduces OASYS: The World’s First Self-Learning Orchestrated Agentic AI Platform Where AI Builds AI - AiThority

SoundHound introduces OASYS, a self-learning orchestrated agentic AI platform.

aithority.com #agentic-ai #self-learning #orchestration [Agent Harness]

6.0 Google updates AI search to include quotes from Reddit and other sources | TechCrunch

Google updates AI search to incorporate expert advice from Reddit and other forums.

techcrunch.com #google-search #ai-search #reddit

6.0 Get faster qwen 3.6 27b

Shares experience running Qwen3.6 27B with an MTP GGUF on a 3090 at 100k context, reaching 50 t/s.

Reddit r/LocalLLaMA #llm #inference #optimization

6.0 Most people seem obsessed with token generation speed, but isn’t prefill the real bottleneck? Am I missing something?

Discusses token generation speed versus prefill, arguing that prefill is the real bottleneck.

Reddit r/LocalLLaMA #llm #inference #performance

6.0 Great results with Qwen3.6-35B-A3B-UD-Q5_K_XL + VS Code and Copilot

Qwen3.6-35B-A3B-UD-Q5_K_XL works well with VS Code and Copilot.

Reddit r/LocalLLaMA #llm #coding #local-llm [Coding Agents]

5.7 Q00/ouroboros

An Agent OS project that drives agents through specification rather than prompting.

GitHub trending:python (+143★) #agent-os #specification #ai-agent [Agent Harness]

5.0 Stop letting LLMs edit your .bib [D]

Warning: do not let LLMs edit .bib files; doing so leads to hallucinated citations.

Reddit r/MachineLearning #llm #hallucination #academic

5.0 huggingface/ml-intern

ml-intern: Hugging Face's open-source ML engineer project.

Co-Starred #open-source #ml-engineer
