Intelligence.Log

Friday, April 24, 2026

Extracted: 73 items. Sources: 36. Filter: Score >= 5.0

++ Daily.Brief ++

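Today brought a major AI release: OpenAI officially launched GPT-5.5, which narrowly beats Anthropic's Claude Mythos on Terminal-Bench 2.0 and moves the company closer to a superapp. In research, Qwen 3.6 27B ties Sonnet 4.6 on the Agentic Index, and a paper proposes TTKV, a tiered KV cache method for long-context inference. In tooling, GPT-5.5 is available through the Codex API, and a new code-search MCP tool lets Claude Code work with an entire codebase. In commentary, analysis points to runaway AI token spending, Anthropic's Mythos breach is prompting industry reflection, and a US government memo on adversarial distillation may signal tighter controls on open models.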

> Headlines & Launches

9.6 GPT-5.5

OpenAI officially releases the GPT-5.5 model.

HN (1074) #gpt-5.5 #openai #model-release [Model Release]

9.5 OpenAI releases GPT-5.5, bringing company one step closer to an AI 'superapp'

OpenAI releases GPT-5.5, moving closer to an AI superapp.

techcrunch.com #gpt-5.5 #openai #model-release [Model Release]

9.0 OpenAI's GPT-5.5 is here, and it's no potato: narrowly beats Anthropic's Claude Mythos Preview on Terminal-Bench 2.0

OpenAI releases GPT-5.5, which narrowly beats Claude Mythos on Terminal-Bench 2.0.

venturebeat.com #gpt-5.5 #benchmark #model-release [Model Release][Evals]

> Research & Innovation

8.5 Qwen 3.6 27B Makes Huge Gains in Agency on Artificial Analysis - Ties with Sonnet 4.6

Qwen 3.6 27B ties Sonnet 4.6 on the Agentic Index.

Reddit r/LocalLLaMA #qwen #benchmark #agentic [Evals][Model Release]

8.0 TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference

Proposes TTKV, a tiered KV cache method for long-context LLM inference.

ArXiv cs.CL #llm #kv-cache #long-context [Context Engineering]

8.0 We benchmarked 18 LLMs on OCR (7k+ calls) — cheaper/old models oftentimes win. Full dataset + framework open-sourced. [R]

Benchmarks the OCR ability of 18 LLMs and finds that cheaper or older models often win; the dataset and framework are open-sourced.

Reddit r/MachineLearning #benchmark #ocr #open-source [Evals]

7.5 ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

Proposes a framework for adaptive red-teaming and end-to-end repair of policy-reward systems.

ArXiv cs.AI #rlhf #alignment #red-teaming [Post-Training]

7.5 Human-Guided Harm Recovery for Computer Use Agents

Proposes a human-AI collaborative recovery mechanism to keep computer-use agents from causing harm.

ArXiv cs.AI #agent #safety #human-in-the-loop [Agent Harness]

7.5 Do Hallucination Neurons Generalize? Evidence from Cross-Domain Transfer in LLMs

Finds that hallucination neurons generalize in cross-domain transfer.

ArXiv cs.CL #llm #hallucination #interpretability

7.5 OThink-SRR1: Search, Refine and Reasoning with Reinforced Learning for Large Language Models

Proposes a RAG reasoning framework that combines search, refinement, and reinforcement learning.

ArXiv cs.CL #rag #reinforcement-learning #reasoning [Planning][Post-Training]

7.5 Cognis: Context-Aware Memory for Conversational AI Agents

Cognis: a context-aware memory system for conversational AI agents.

ArXiv cs.CL #llm #memory #conversational-ai [Context Engineering]

7.5 Researchers Uncover 10 In-the-Wild Indirect Prompt Injection Attacks - Infosecurity Magazine

Researchers uncover 10 indirect prompt injection attacks targeting AI agents.

infosecurity-magazine.com #prompt-injection #security #ai-agents [Agent Harness]

7.0 AI scientists produce results without reasoning scientifically

A study finds that AI scientists produce results but lack a scientific reasoning process.

ArXiv cs.AI #llm #scientific-reasoning

7.0 Can We Locate and Prevent Stereotypes in LLMs?

Studies methods for locating and preventing stereotypes in LLMs.

ArXiv cs.CL #llm #bias #fairness [Post-Training]

7.0 Towards a societal AI alignment benchmark for evaluating human ...

Proposes a society-oriented AI alignment benchmark for evaluating alignment with human values.

nature.com #ai-alignment #benchmark #safety [Evals]

6.5 Beyond One Output: Visualizing and Comparing Distributions of Language Model Generations

Proposes a method for visualizing the distribution of language model generations, going beyond single-output evaluation.

ArXiv cs.AI #llm #visualization #evaluation [Evals]

6.5 How Adversarial Environments Mislead Agentic AI?

Studies how adversarial environments mislead tool-integrated AI agents.

ArXiv cs.AI #agent #adversarial #tool-use [Tool Use]

6.5 Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models

A framework for quantifying epistemic-rhetorical miscalibration in LLMs.

ArXiv cs.CL #llm #calibration #rhetoric

6.0 From Natural Language to Executable Narsese: A Neuro-Symbolic Benchmark and Pipeline for Reasoning with NARS

Builds a neuro-symbolic benchmark and pipeline that converts natural language into executable logic.

ArXiv cs.AI #neuro-symbolic #reasoning #benchmark [Planning]

6.0 CoAuthorAI: A Human in the Loop System For Scientific Book Writing

CoAuthorAI: a human-in-the-loop system for writing scientific books.

ArXiv cs.CL #llm #writing #human-in-the-loop

6.0 New Conversational AI Leverages Trusted Medical Protocols to Guide Users on When to Seek Care - Bioengineer.org

A new conversational AI uses medical protocols to guide users on when to seek care.

bioengineer.org #healthcare #conversational-ai

6.0 8 inputs → 58 body params: putting a body-model forward pass inside the training loss [P]

A small MLP predicts 58 body parameters from 8 inputs, with the forward pass embedded in the training loss.

Reddit r/MachineLearning #mlp #body-model #computer-vision

5.5 Formally Verified Patent Analysis via Dependent Type Theory: Machine-Checkable Certificates from a Hybrid AI + Lean 4 Pipeline

A formally verified patent analysis framework that combines AI with Lean 4.

ArXiv cs.AI #formal-verification #patent-analysis

5.5 PR-CAD: Progressive Refinement for Unified Controllable and Faithful Text-to-CAD Generation with Large Language Models

PR-CAD: progressive, controllable text-to-CAD generation with LLMs.

ArXiv cs.CL #llm #cad #generation

5.0 On Solving the Multiple Variable Gapped Longest Common Subsequence Problem

Studies methods for solving the multiple variable gapped longest common subsequence problem.

ArXiv cs.AI #algorithm #sequence-analysis

> Engineering & Resources

9.0 A pelican for GPT-5.5 via the semi-official Codex backdoor API

OpenAI releases GPT-5.5, available through the Codex API.

Simon Willison #openai #gpt-5.5 #model-release [Model Release]

9.0 OpenAI says its new GPT-5.5 model is more efficient and better at coding

OpenAI says GPT-5.5 is more efficient and better at coding.

theverge.com #gpt-5.5 #coding #efficiency [Model Release]

8.7 zilliztech/claude-context

A code-search MCP tool that lets Claude Code use an entire codebase as context.

GitHub trending:all (+1011★) #mcp #code-search #context-engineering [Context Engineering][Coding Agents]

8.5 Tencent Releases Hy3 preview - Open Source 295B 21B Active MoE

Tencent releases Hy3 preview, an open-source MoE model with 295B parameters and 21B active.

Reddit r/LocalLLaMA #open-source #moe #tencent [Model Release]

8.0 OpenAI Debuts ChatGPT for Clinicians Free Access - Let's Data Science

OpenAI launches free access to ChatGPT for Clinicians and releases the HealthBench Professional benchmark.

letsdatascience.com #llm #healthcare #benchmark [Evals]

8.0 Ling-2.6-1T Will Be Open Weights

The Ling-2.6-1T model will be open weights: 1 trillion parameters with 50B active, and a commitment to open-source it.

Reddit r/LocalLLaMA #open-source #large-model #moe [Model Release]
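
For a sense of scale, the two MoE releases above can be compared by the share of parameters active per token. This is a minimal sketch using only the sizes quoted in the items (295B/21B for Hy3 preview, 1T/50B for Ling-2.6-1T); the helper name `active_fraction` is hypothetical, and the figures are the headline numbers rather than measured values.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters active per token, given sizes in billions."""
    return active_b / total_b


# (total, active) parameter counts in billions, as quoted in the items above.
models = {
    "Hy3 preview": (295, 21),
    "Ling-2.6-1T": (1000, 50),
}

for name, (total_b, active_b) in models.items():
    print(f"{name}: {active_fraction(total_b, active_b):.1%} active per token")
```

Under these figures both models activate well under a tenth of their weights per token, which is why total size alone says little about per-token compute.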

7.8 anomalyco/opencode

OpenCode is an open-source coding agent that supports autonomous programming.

GitHub trending:typescript (+660★) #coding-agent #open-source [Coding Agents]

7.5 An update on recent Claude Code quality reports

Anthropic publishes an update on Claude Code quality reports.

Simon Willison #claude #code #quality [Coding Agents]

7.5 Deepseek has released DeepEP V2 and TileKernels.

DeepSeek releases DeepEP V2 and TileKernels, optimizing MoE communication and kernels.

Reddit r/LocalLLaMA #deepseek #moe #open-source

7.5 huggingface/ml-intern

Hugging Face's open-source ML engineer, which can read papers, train models, and deploy them.

GitHub trending:all (+720★) #open-source #mlops #automation [Agent Harness]

7.5 Alishahryar1/free-claude-code

A tool for using Claude Code for free, with support for the terminal, VSCode, and Discord.

GitHub trending:all (+1962★) #claude-code #free #coding-agent [Coding Agents]

7.4 AIDC-AI/Pixelle-Video

A fully automated AI short-video generation engine for producing video content quickly.

GitHub trending:python (+992★) #video-generation #automation #ai

7.2 vercel-labs/skills

Vercel releases Skills, an open agent skills tool.

GitHub trending:typescript (+580★) #agent-skills #developer-tools [Agent Harness]

7.0 KeygraphHQ/shannon

Shannon Lite is an autonomous AI penetration testing tool for web security.

GitHub trending:typescript (+711★) #ai-security #pentesting #autonomous

7.0 Google Introduces Unique AI Agent Identities in New Gemini Enterprise Platform - Infosecurity Magazine

Google introduces unique AI agent identities in Gemini Enterprise.

infosecurity-magazine.com #ai-agents #enterprise #gemini [Agent Harness]

7.0 Anthropic’s Mythos breach was humiliating | The Verge

The Verge calls Anthropic's Mythos breach humiliating.

theverge.com #anthropic #security #breach

7.0 US gov memo on “adversarial distillation” - are we heading toward tighter controls on open models?

A US government memo focuses on adversarial distillation and may lead to tighter controls on open models.

Reddit r/LocalLLaMA #policy #open-source #distillation [Post-Training]

7.0 Been using PI Coding Agent with local Qwen3.6 35b for a while now and its actually insane

A user shares hands-on experience running PI Coding Agent with a local Qwen3.6 35b model; the results were unexpectedly good.

Reddit r/LocalLLaMA #local-llm #coding-agent #qwen [Coding Agents]

7.0 mattmireles/gemma-tuner-multimodal

Gemma Tuner Multimodal: fine-tune Gemma multimodal models on Apple Silicon.

Co-Starred #fine-tuning #gemma #multimodal [Model Release]

6.9 TorchTPU: Running PyTorch Natively on TPUs at Google Scale

Google releases TorchTPU for running PyTorch natively on TPUs.

HN (49) #pytorch #tpu #google [Model Release]

6.7 mksglu/context-mode

A context-window optimization tool that cuts AI coding agents' token consumption by 98%.

GitHub trending:all (+238★) #context-window #optimization #coding-agent [Context Engineering]

6.5 SAP and Google Cloud Join Forces to Revolutionize AI-Driven Marketing - wwd.com

SAP and Google Cloud partner to power AI marketing with Gemini Enterprise.

wwd.com #enterprise #multi-agent #gemini [Agent Harness]

6.5 OpenSimula — open implementation of Simula-style mechanism design for synthetic data (in AfterImage) [P]

OpenSimula: an open-source implementation of Simula-style mechanism design for synthetic data generation.

Reddit r/MachineLearning #synthetic-data #open-source #python

6.4 HKUDS/RAG-Anything

An all-in-one RAG framework that simplifies building retrieval-augmented generation applications.

GitHub trending:all (+590★) #rag #framework #open-source [Context Engineering]

6.2 badlogic/pi-mono

Pi-mono is an AI agent toolkit that includes a coding agent CLI.

GitHub trending:typescript (+444★) #agent-toolkit #cli [Coding Agents]

6.1 cline/cline

An autonomous coding agent that can create files, run commands, and use a browser inside the IDE.

GitHub trending:all (+123★) #coding-agent #autonomous #ide [Coding Agents]

6.0 [AINews] Tasteful Tokenmaxxing

A summary of AI leaders' discussion of token usage.

Latent Space #llm #token #discussion

6.0 Era computer raises $11M to build a software platform for AI gadgets - TechCrunch

Era Computer raises $11 million to build a software platform for AI gadgets.

techcrunch.com #ai-gadgets #funding

6.0 Bret Taylor's Sierra buys YC-backed AI startup Fragment | TechCrunch

Bret Taylor's Sierra acquires the YC-backed AI startup Fragment.

techcrunch.com #acquisition #ai-startup

6.0 An Overnight Stack for Qwen3.6–27B: 85 TPS, 125K Context, Vision — on One RTX 3090 | by Wasif Basharat | Apr, 2026

A tutorial shows how to run Qwen 3.6 27B on a single RTX 3090.

Reddit r/LocalLLaMA #qwen #deployment #tutorial

6.0 Why are we actually sampling reasoning and output the same way?

Discusses whether reasoning and output should be sampled differently, touching on multilingual reasoning issues.

Reddit r/LocalLLaMA #reasoning #sampling #multilingual [Planning]

6.0 Built a normalizer so WER stops penalizing formatting differences in STT evals! [P]

Builds a normalizer so that WER no longer penalizes formatting differences in STT evaluations.

Reddit r/MachineLearning #stt #wer #normalization
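
To illustrate why normalization matters for WER, here is a minimal sketch, not the linked project's implementation. The `normalize` rules (lowercasing, stripping ASCII punctuation, collapsing whitespace) and the sample sentences are assumptions chosen for illustration; real STT normalizers typically also handle numerals, abbreviations, and similar cases.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace (illustrative rules)."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcript pair that differs only in casing and punctuation.
reference = "Hello, world! It's nine o'clock."
hypothesis = "hello world its nine oclock"

raw_wer = wer(reference, hypothesis)
normalized_wer = wer(normalize(reference), normalize(hypothesis))
print(f"raw WER: {raw_wer:.2f}, normalized WER: {normalized_wer:.2f}")
```

In this example the raw score counts four of five words as errors even though the transcript is spoken-word identical, while the normalized score is zero.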

5.7 crewAIInc/crewAI

A framework for orchestrating role-playing autonomous AI agents, fostering collaborative intelligence.

GitHub trending:python (+148★) #multi-agent #framework #collaboration [Agent Harness]

5.7 Show HN: Agent Vault – Open-source credential proxy and vault for agents

Agent Vault: an open-source credential proxy and vault for AI agents.

HN (68) #agent #security #open-source [Agent Harness]

5.5 Rilian Raises $17.5 Million for AI-Native Security Orchestration - SecurityWeek

Rilian raises $17.5 million for AI-native security orchestration.

securityweek.com #security #funding #ai-agents

5.5 Meet Noscroll, an AI bot that does your doomscrolling for you | TechCrunch

Noscroll, an AI bot, scrolls social media for you in place of mindless doomscrolling.

techcrunch.com #ai-agent #social-media #automation

5.5 Qwen 3.6 27B is a BEAST

A user praises Qwen 3.6 27B's performance on a laptop.

Reddit r/LocalLLaMA #qwen #local-llm #experience

5.4 microsoft/ai-agents-for-beginners

Microsoft's introductory AI agent course, 12 lessons in total.

GitHub trending:all (+208★) #tutorial #ai-agents #beginners [Agent Harness]

5.1 phodal/routa

Routa is a workspace-first multi-agent coordination platform.

GitHub trending:typescript (+19★) #multi-agent #coordination [Agent Harness]

5.1 coreyhaines31/marketingskills

Marketing skills for Claude Code and AI agents, including CRO, SEO, and more.

GitHub trending:all (+285★) #marketing #claude-code #skills

5.0 AIE Europe Debrief + Agent Labs Thesis: Unsupervised Learning x Latent Space Crossover Special (2026)

An AIE Europe debrief and the Agent Labs thesis; not today's news.

Latent Space #agents #conference [Agent Harness]

5.0 Extract PDF text in your browser with LiteParse for the web

LlamaIndex releases LiteParse for in-browser PDF text extraction.

Simon Willison #pdf #llamaindex #open-source

5.0 Fere AI Raises $1.3M to Put a Self-Improving Trading Agent in Everyone's Hands - GlobeNewswire

Fere AI raises $1.3 million to develop a self-improving trading agent.

globenewswire.com #trading-agent #funding

5.0 Compared QWEN 3.6 35B with QWEN 3.6 27B for coding primitives

A user compares the coding performance of Qwen 3.6 35B and 27B.

Reddit r/LocalLLaMA #qwen #coding #comparison

5.0 Are there actually people here that get real productivity out of models fitting in 32-64GB RAM, or is that just playing around with little genuine usefulness?

Community discussion of the real productivity value of models that run in 32-64GB of RAM, with users sharing their use cases.

Reddit r/LocalLLaMA #local-llm #hardware #productivity

5.0 Qwen-3.6-27B, llamacpp, speculative decoding - appreciation post

A user experiments with the performance gains from pairing Qwen-3.6-27B with speculative decoding.

Reddit r/LocalLLaMA #qwen #speculative-decoding #local-llm

5.0 Optimizing Transformer model size & inference beyond FP16 + ONNX (pruning/graph opt didn’t help much) [P]

A user seeks advice on optimizing Transformer model size and inference after pruning and other attempts yielded little.

Reddit r/MachineLearning #transformer #optimization #inference

[STATS] 73 items · 36 sources · Score >= 5.0