Intelligence.Log

Sunday, April 26, 2026

Extracted: 53 items. Sources: 27. Filter: Score >= 5.0

++ Daily.Brief ++

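Several major AI releases today: DeepSeek released its V4 Pro and Flash models, which run on Huawei Ascend chips[#item-latent-space-p-ainews-deepseek-v4-pro-16t-a49b-and]; OpenAI's super PAC may be funding a news site staffed by AI reporters[#item-theverge-com-ai-artificial-intelligence-918787-openais-super]; and Cohere and Aleph Alpha announced a merger[#item-techcrunch-com-2026-04-25-why-cohere-is-merging-with-aleph-a]. In research, a new benchmark evaluates retrieval augmentation for coding agents[#item-reddit-com-r-MachineLearning-comments-1suzqxe-opensource-9ta], and a value-conflict diagnostic method reveals widespread alignment faking in language models[#item-arxiv-org-abs-2604-20995]. In tooling, Anthropic created a test marketplace for agent-on-agent commerce[#item-techcrunch-com-2026-04-25-anthropic-created-a-test-marketpla], Hugging Face open-sourced an ML engineer tool[#item-github-com-huggingface-ml-intern], and llama.cpp merged FP4 inference support[#item-reddit-com-r-LocalLLaMA-comments-1svfjyv-fp4-inference-in-ll]. In opinion, the community is discussing decreased intelligence density in DeepSeek V4 Pro[#item-reddit-com-r-LocalLLaMA-comments-1svbmnc-decreased-intellige] and reporting on Qwen 3.6's coding performance on an M2 MacBook Pro[#item-reddit-com-r-LocalLLaMA-comments-1svdep5-field-report-coding].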

> Headlines & Launches

9.5 [AINews] DeepSeek V4 Pro (1.6T-A49B) and Flash (284B-A13B), Base and Instruct - runnable on Huawei Ascend chips

DeepSeek releases its V4 Pro and Flash models, with support for Huawei Ascend chips.

Latent Space #deepseek #model-release #huawei [Model Release]

7.5 OpenAI’s super PAC might be funding a ‘news’ site staffed by AI reporters. | The Verge

OpenAI's super PAC may be funding a news site staffed by AI reporters.

theverge.com #openai #ai-journalism #funding

7.0 Why Cohere is merging with Aleph Alpha

An analysis of why Cohere is merging with Aleph Alpha.

techcrunch.com #cohere #merger #ai-company

6.5 OpenAI CEO apologizes to Tumbler Ridge community

OpenAI's CEO apologizes to the community over a data center issue.

techcrunch.com #openai #community #apology

> Research & Innovation

8.5 Open-source 9-task benchmark for coding-agent retrieval augmentation. Per-task deltas +0.010 to +0.320, all evals reproducible [P]

An open-source 9-task benchmark that evaluates retrieval augmentation for coding agents.

Reddit r/MachineLearning #coding-agent #benchmark #retrieval-augmented [Coding Agents] [Evals]

8.0 Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models

Finds widespread alignment faking in language models and proposes a value-conflict diagnostic method.

ArXiv cs.AI #ai-safety #alignment #llm [Post-Training]

7.5 Adaptive Test-Time Compute Allocation with Evolving In-Context Demonstrations

Proposes an adaptive test-time compute allocation method that uses evolving in-context demonstrations.

ArXiv cs.AI #llm #inference #efficiency [Context Engineering]

7.0 Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

Proposes a framework that co-evolves an LLM decision agent and a skill bank agent for long-horizon tasks.

ArXiv cs.AI #llm #agents #long-horizon [Agent Harness]

7.0 Deep FinResearch Bench: Evaluating AI's Ability to Conduct Professional Financial Investment Research

Releases Deep FinResearch Bench to evaluate AI's ability to conduct financial investment research.

ArXiv cs.AI #benchmark #finance #llm [Evals]

6.5 The Last Harness You'll Ever Build

Proposes a general-purpose testing framework for AI agents that supports complex enterprise workflows.

ArXiv cs.AI #agents #testing #framework [Agent Harness]

6.0 roboflow/rf-detr

A real-time object detection and segmentation model from ICLR 2026; SOTA on COCO, with fine-tuning support.

GitHub trending:python (+59★) #object-detection #segmentation #iclr2026

6.0 Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI

Proposes defensibility signals for evaluating rule-governed AI, avoiding sole reliance on human labels.

ArXiv cs.AI #ai-safety #evaluation #content-moderation [Evals]

6.0 Who Defines Fairness? Target-Based Prompting for Demographic Representation in Generative Models

Proposes a target-based prompting method to improve demographic fairness in generative models.

ArXiv cs.AI #fairness #text-to-image #prompting

5.5 HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

Proposes hyperbolic-space modeling of electronic health records for efficient question answering.

ArXiv cs.AI #healthcare #ehr #qa

5.5 How technical sophistication masks social harm in urban AI systems

A study showing how technical sophistication masks the social harm of urban AI systems.

nature.com #ai-ethics #urban-ai #social-harm

5.5 Qwen3.6-35B-A3B KLDs - INTs and NVFPs

Shares KLD quantization data for Qwen3.6-35B-A3B.

Reddit r/LocalLLaMA #qwen #quantization #kld

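5.0 Architecture of an AI-Based Automated Course of Action Generation System for Military Operations

Proposes an architecture for an AI-based system that automatically generates courses of action for military operations.

ArXiv cs.AI #ai #military #planning [Planning]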

5.0 Active Data

Proposes the concept of active data, using problem decomposition to improve performance in complex domains.

ArXiv cs.AI #data #decomposition #complex-domains

> Engineering & Resources

8.5 Anthropic created a test marketplace for agent-on-agent commerce | TechCrunch

Anthropic created a test marketplace for agent-on-agent commerce.

techcrunch.com #anthropic #agent-commerce #marketplace [Agent Harness]

8.3 huggingface/ml-intern

Hugging Face's open-source ML engineer: reads papers, trains models, and deploys them.

GitHub trending:all (+1240★) #open-source #ml-engineering #automation [Agent Harness]

8.0 FP4 inference in llama.cpp (NVFP4) and ik_llama.cpp (MXFP4) landed - Finally

llama.cpp and ik_llama.cpp merged FP4 inference support, with two different implementations.

Reddit r/LocalLLaMA #llm #inference #open-source

7.5 GPT-5.5 prompting guide

OpenAI publishes a GPT-5.5 prompting guide with best practices.

Simon Willison #openai #gpt #prompting

7.5 Decreased Intelligence Density in DeepSeek V4 Pro

Community discussion of decreased intelligence density in DeepSeek V4 Pro, citing the V3.2 paper.

Reddit r/LocalLLaMA #deepseek #intelligence-density #discussion

7.5 zilliztech/claude-context

A code-search MCP for Claude Code that makes the entire codebase the context for coding agents.

GitHub trending:typescript (+451★) #mcp #code-search #claude [Coding Agents] [Context Engineering]

7.5 Alishahryar1/free-claude-code

Use Claude Code for free from the terminal, a VSCode extension, or Discord.

GitHub trending:all (+4007★) #claude-code #free #ai-coding [Coding Agents]

7.2 GPT‑5.5 Bio Bug Bounty

OpenAI launches a GPT-5.5 bio bug bounty program to encourage the discovery of biosafety risks.

HN (126) #gpt-5.5 #biosafety #bug-bounty [Evals]

7.2 badlogic/pi-mono

AI agent toolkit: a coding-agent CLI, a unified LLM API, TUI/Web UI libraries, and more.

GitHub trending:typescript (+528★) #agent-toolkit #coding-agent #llm [Coding Agents]

7.0 mattpocock/skills

GitHub trending:all (+1139★) #claude #skills #ai-agent [Agent Harness]

7.0 Quoting Romain Huet

Starting with GPT-5.4, Codex and the main model are unified into a single system.

Simon Willison #openai #gpt #codex [Coding Agents]

7.0 "Weights are coming". Xiaomi’s MiMo V2.5 Pro has landed at 54 in the Artificial Analysis Intelligence Index.

Xiaomi's MiMo V2.5 Pro lands at 54 on the Artificial Analysis Intelligence Index.

Reddit r/LocalLLaMA #xiaomi #model-release #benchmark [Model Release]

7.0 Qwen3.6-27B at ~80 tps with 218k context window on 1x RTX 5090 served by vllm 0.19

Qwen3.6-27B runs at about 80 tps on an RTX 5090 with a 218k context window.

Reddit r/LocalLLaMA #qwen #performance #local-llm [Context Engineering]

7.0 GLM 5.1 Locally: 40tps, 2000+ pp/s

GLM 5.1 runs locally at 40 tps and 2000+ pp/s using NVFP4 quantization.

Reddit r/LocalLLaMA #glm #quantization #performance

7.0 Field report: coding with Qwen 3.6 35B-A3B on an M2 Macbook Pro with 32GB RAM

A field report on coding with Qwen 3.6 35B-A3B on an M2 MacBook Pro.

Reddit r/LocalLLaMA #qwen #coding #local-llm [Coding Agents]

7.0 CUDA: reduce MMQ stream-k overhead by JohannesGaessler · Pull Request #22298 · ggml-org/llama.cpp

A llama.cpp PR reduces MMQ stream-k overhead, speeding up prompt processing for MoE models.

Reddit r/LocalLLaMA #llama.cpp #cuda #performance

7.0 mattmireles/gemma-tuner-multimodal

A multimodal fine-tuning tool for Gemma 4/3n, supporting audio, image, and text.

Co-Starred #fine-tuning #multimodal #gemma [Model Release]

6.9 deepseek-ai/DeepSeek-V3

The DeepSeek-V3 model repository; no description is provided, but it is a significant model release.

GitHub trending:python (+65★) #deepseek #model-release [Model Release]

6.7 deepseek-ai/DeepEP

DeepEP: an efficient expert-parallel communication library.

GitHub trending:all (+189★) #deepseek #communication-library #expert-parallel [Model Release]

6.6 OpenAI Privacy Filter

OpenAI launches a privacy filter to protect user data.

HN (78) #privacy #openai #data-protection

6.6 MemoriLabs/Memori

Agent-native memory infrastructure that turns execution and conversations into structured, persistent state.

GitHub trending:python (+124★) #memory #agent #llm [Context Engineering]

6.5 FINAL-Bench/Darwin-36B-Opus · Hugging Face

Darwin-36B-Opus released, a 36B-parameter MoE language model.

Reddit r/LocalLLaMA #darwin #moe #model-release [Model Release]

6.5 Quant Qwen3.6-27B on 16GB VRAM with 100k context length

An experiment in quantizing Qwen3.6-27B to run on 16GB VRAM with 100k context.

Reddit r/LocalLLaMA #qwen #quantization #context-length [Context Engineering]

6.5 How Visual-Language-Action (VLA) Models Work [D]

Explains how vision-language-action models became the dominant paradigm in embodied AI.

Reddit r/MachineLearning #vla #embodied-ai #multimodal

6.3 alexzhang13/rlm

A general plug-and-play inference library for recursive language models, with support for multiple sandboxes.

GitHub trending:python (+225★) #recursive-lm #inference #library [Planning]

6.0 Google Cloud Demonstrates Agentic AI with Travel Use Cases - Let's Data Science

Google Cloud demonstrates agentic AI applied to travel scenarios.

letsdatascience.com #google-cloud #agents #travel [Agent Harness]

6.0 Qwen3.6 35b a3b Particle System

Tests Qwen3.6 35B a3b on writing a particle system; the speed is impressive.

Reddit r/LocalLLaMA #qwen #coding #performance [Coding Agents]

5.9 luongnv89/claude-howto

A visual guide to Claude Code, from basics to advanced agents, with copyable templates.

GitHub trending:python (+226★) #claude #coding-agent #guide [Coding Agents]

5.8 ruvnet/ruflo

A leading agent orchestration platform for Claude that deploys multi-agent swarms and coordinates workflows.

GitHub trending:typescript (+170★) #agent-orchestration #claude [Agent Harness]

5.2 RooCodeInc/Roo-Code

Roo Code: an entire AI agent development team inside your code editor.

GitHub trending:all (+57★) #ai-agents #code-editor #development [Coding Agents]

5.2 Kilo-Org/kilocode

An all-in-one agentic engineering platform and open-source coding agent for faster building and iteration.

GitHub trending:typescript (+52★) #coding-agent #platform [Coding Agents]

5.2 Agents Aren't Coworkers, Embed Them in Your Software

Opinion: AI agents should not be treated as coworkers; they should be embedded in software.

HN (25) #ai-agents #software-architecture [Agent Harness]

5.0 Artificial intelligence in genomic medicine: dispelling three myths

Dispels three myths about AI in genomic medicine.

nature.com #ai #genomics #medicine

5.0 Sinceerly uses AI to make your AI writing sound less like you're ...

Sinceerly uses AI to make AI-generated writing sound less like AI.

theverge.com #ai-writing #tool

5.0 DeepSeek V4 Update

A Reddit post titled "DeepSeek V4 Update"; its content is not available.

Reddit r/LocalLLaMA #deepseek #update
