Pipeline.Observability

AI Daily Metrics

38 runs recorded

Last 7 Runs

Avg items/run

70.4

Avg score

6.3

Failed batches

195

Halve-retries

Keyword fallbacks

101

Runs in window

Items + Score Trend

Legend: solid line = item count; dashed line = avg score (0-10)

Source Contribution

Legend: rss, search, social, horizon, github, reddit, co-starred

Topic Health (last 30 days)

Hit counts per controlled-vocabulary focusTopic. Used to decide which anchors to keep, rename, or retire in the next iteration of FOCUS_TOPICS.
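The 7d / 14d / 30d figures shown for each topic can be read as trailing-window hit counts. A minimal sketch of that computation follows, assuming each digest item carries an ISO date and a `focusTopics` array. The `DigestItem` shape and the `topicHitCounts` helper are hypothetical names for illustration, not the pipeline's actual code.

```typescript
// Hypothetical item shape; only `focusTopics` is named on the dashboard.
interface DigestItem {
  date: string;          // ISO date, e.g. "2026-05-23"
  focusTopics: string[]; // controlled-vocabulary topics assigned to the item
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Count items tagged with `topic` in each trailing window ending at `asOf`.
// An item dated on `asOf` itself has age 0 and falls in every window.
function topicHitCounts(
  items: DigestItem[],
  topic: string,
  asOf: string,
  windows: number[] = [7, 14, 30],
): Record<string, number> {
  const end = Date.parse(asOf);
  const counts: Record<string, number> = {};
  for (const days of windows) {
    counts[`${days}d`] = items.filter((item) => {
      const ageDays = (end - Date.parse(item.date)) / DAY_MS;
      const inWindow = ageDays >= 0 && ageDays < days;
      return inWindow && item.focusTopics.some((t) => t === topic);
    }).length;
  }
  return counts;
}
```

Because the windows are nested, the counts for a topic never decrease from 7d to 14d to 30d, which matches every row on this page.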

agent-harness

healthy

7d 72 / 14d 150 / 30d 314

2026-05-23  6.0  McKinsey & Company partners with AppliedAI to drive agentic AI in regulated sectors - Consultancy-me.com
2026-05-23  7.5  SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation
2026-05-23  7.5  AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows

coding-agents

healthy

7d 48 / 14d 90 / 30d 225

2026-05-23  8.7  colbymchenry/codegraph
2026-05-23  8.3  anthropics/claude-plugins-official
2026-05-23  7.5  can1357/oh-my-pi

model-release

healthy

7d 36 / 14d 79 / 30d 173

2026-05-23  9.5  Anthropic to Close Over $30 Billion Round as Soon as Next Week - Bloomberg
2026-05-23  9.5  DeepSeek is pushing forward with $10.29 billion financing round, with Liang Wenfeng committing to continue developing open-source AI models rather than pursuing short-term commercialization goals
2026-05-23  9.0  DeepSeek Founder Declares AGI Goal as $10 Billion Round Advances - Bloomberg

evals

healthy

7d 33 / 14d 59 / 30d 134

2026-05-23  7.5  Open-World Evaluations for Measuring Frontier AI Capabilities
2026-05-23  7.5  AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
2026-05-23  7.0  OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

context-engineering

healthy

7d 24 / 14d 48 / 30d 99

2026-05-23  6.0  MemTensor/MemOS
2026-05-23  6.0  Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps
2026-05-23  5.7  plastic-labs/honcho

tool-use

healthy

7d 18 / 14d 36 / 30d 73

2026-05-23  7.0  Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration
2026-05-23  6.5  Reflective Prompt Tuning through Language Model Function-Calling
2026-05-23  7.1  ChromeDevTools/chrome-devtools-mcp

post-training

healthy

7d 9 / 14d 25 / 30d 58

2026-05-22  6.5  I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]
2026-05-22  6.0  Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency
2026-05-21  6.9  PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play

planning

healthy

7d 7 / 14d 23 / 30d 48

2026-05-22  8.0  Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning
2026-05-22  5.5  Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction
2026-05-22  5.5  Pseudo-Siamese Network for Planning in Target-Oriented Proactive Dialogues

── retired anchors (v1, kept for historical display) ──

memory

legacy

7d 0 / 14d 0 / 30d 0

no examples in last 30 days

self-evolution

legacy

7d 0 / 14d 0 / 30d 0

no examples in last 30 days

multi-agent

legacy

7d 0 / 14d 0 / 30d 0

no examples in last 30 days

reflection

legacy

7d 0 / 14d 0 / 30d 0

no examples in last 30 days

Topic Discovery (v3) — last 30 days

Unsupervised frequency scan of free-form item.tags[] (not focusTopics). Entries already in the controlled vocabulary or in the entity blacklist (openai / cursor / meta / …) are excluded. Candidates listed below are signal for the next FOCUS_TOPICS update — review at weekly cadence, promote manually in scripts/ai-daily/config.ts.
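The scan described above can be sketched as a per-item tag count with two exclusion sets. This is a rough illustration under stated assumptions, not the code in scripts/ai-daily: the `TaggedItem` shape, the `discoverTags` helper, the lower-casing of tags, and counting each tag at most once per item are all hypothetical choices. The blacklist in the usage note contains only the three entries named on this page; the real list is longer.

```typescript
// Hypothetical item shape; the page only names `item.tags[]`.
interface TaggedItem {
  tags: string[]; // free-form tags, distinct from controlled focusTopics
}

// Count how many items carry each free-form tag, skipping tags that are
// already in the controlled vocabulary or in the entity blacklist.
// Returns [tag, count] pairs, highest count first, ties in alphabetical order.
function discoverTags(
  items: TaggedItem[],
  controlledVocabulary: Set<string>,
  entityBlacklist: Set<string>,
  minCount: number = 1,
): Array<[string, number]> {
  const frequency = new Map<string, number>();
  items.forEach((item) => {
    // Assumption: normalise case and count a tag once per item.
    const uniqueTags = Array.from(new Set(item.tags.map((t) => t.toLowerCase())));
    uniqueTags.forEach((tag) => {
      if (controlledVocabulary.has(tag) || entityBlacklist.has(tag)) return;
      frequency.set(tag, (frequency.get(tag) ?? 0) + 1);
    });
  });
  return Array.from(frequency.entries())
    .filter(([, count]) => count >= minCount)
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}
```

A call might look like `discoverTags(items, new Set(["evals", "tool-use"]), new Set(["openai", "cursor", "meta"]), 9)`, where the `minCount` of 9 is an arbitrary illustrative cut-off. Running the same scan over 7-, 14-, and 30-day slices would give per-tag triples comparable to the `9/18/22` style counts listed below.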

🚀Rising

heavy recent-week signal

agent-skills
9/18/22
2026-05-23  5.6  google-labs-code/stitch-skills
video-generation
9/12/21
2026-05-23  6.3  HKUDS/ViMax
metal
6/13/15
2026-05-23  7.0  antirez/ds4
diffusion
5/8/12
2026-05-23  7.5  Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models
local-inference
5/11/12
2026-05-23  7.0  antirez/ds4
gpu
5/7/12
2026-05-22  5.1  Was my $48K GPU server worth it?
knowledge-graph
5/5/10
2026-05-23  8.7  colbymchenry/codegraph
methodology
5/6/9
2026-05-23  6.0  LQS v3.1 — an open methodology for rating AI training data (multi-oracle consensus + signed certificates) [P]
llamacpp
6/7/9
2026-05-22  5.0  110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp
search
7/7/9
2026-05-21  5.9  Google's AI is being manipulated. The search giant is quietly fighting back

📈Persistent

steady across 30 days

llm
51/113/288
2026-05-23  7.5  SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation
open-source
40/106/217
2026-05-23  9.5  DeepSeek is pushing forward with $10.29 billion financing round, with Liang Wenfeng committing to continue developing open-source AI models rather than pursuing short-term commercialization goals
agent
31/77/156
2026-05-23  7.0  A framework for longitudinal health AI agents - Nature
benchmark
27/52/103
2026-05-23  7.5  Open-World Evaluations for Measuring Frontier AI Capabilities
ai-coding
14/36/79
2026-05-23  8.7  colbymchenry/codegraph
qwen
17/32/69
2026-05-23  6.0  Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps
local-llm
5/20/61
2026-05-22  7.2  Indexing a year of video locally on a 2021 MacBook with Gemma4-31B (50GB swap)
inference
11/24/60
2026-05-23  7.5  BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.
reasoning
8/24/60
2026-05-22  8.5  [AINews] OpenAI GPT-next disproves 80 year old Erdős planar unit distance problem for under $1000
coding-agent
12/19/59
2026-05-23  5.1  awslabs/aidlc-workflows

💫Sporadic

low-medium frequency, watch

small-model
4/7/12
2026-05-22  7.0  Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.
multilingual
3/6/11
2026-05-22  6.5  Tencent Hy 30B/7B/1.8B
gpt-5.5
0/1/11
2026-05-12  8.5  OpenAI just released its answer to Claude Mythos - The Verge
partnership
1/4/10
2026-05-23  6.0  McKinsey & Company partners with AppliedAI to drive agentic AI in regulated sectors - Consultancy-me.com
llm-agent
2/5/10
2026-05-23  7.5  AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
api
2/2/10
2026-05-23  7.3  DeepSeek makes the V4 Pro price discount permanent
hallucination
1/3/10
2026-05-21  7.0  HalBench: I built a custom sycophancy and hallucination benchmark and tested 4 frontier models (Sonnet 4.6, Grok 4.3, GPT 5.4 and Gemini 3.1 Pro), looking for input on what OSS models to run next!
transformer
1/4/10
2026-05-21  5.4  NVlabs/Sana
codex
0/2/10
2026-05-16  7.0  Codex untethers AI from the laptop | Semafor
retrieval
3/5/9
2026-05-23  7.5  AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows

Recent Anomalies

2026-05-24  195 failed batches, 101 keyword fallbacks

Run Log (38)

date	items	avg	batches	fail	retry	kwFb	anchors	dur
2026-05-24	53	5.0	195	195	94	101	4	127s
2026-05-23	82	6.5	8	0	0	0	4	64s
2026-05-22	79	6.4	8	0	0	0	4	67s
2026-05-21	70	6.8	8	0	0	0	4	67s
2026-05-20	81	6.8	9	0	0	0	4	75s
2026-05-19	83	6.4	9	0	0	0	4	78s
2026-05-18	45	6.2	6	0	0	0	4	53s
2026-05-17	68	6.4	8	0	0	0	4	63s
2026-05-16	60	6.6	8	0	0	0	4	64s
2026-05-15	67	6.5	8	0	0	0	4	67s
2026-05-14	76	6.4	9	0	0	0	4	74s
2026-05-13	58	6.6	8	0	0	0	4	82s
2026-05-12	79	6.4	8	0	0	0	4	90s
2026-05-11	47	6.1	6	0	0	0	4	114s
2026-05-10	54	6.8	7	0	0	0	4	129s
2026-05-09	78	6.2	8	0	0	0	4	138s
2026-05-08	80	6.5	9	0	0	0	4	141s
2026-05-07	67	6.7	8	0	0	0	4	130s
2026-05-06	81	6.7	8	0	0	0	4	129s
2026-05-05	71	6.4	9	0	0	0	4	128s
2026-05-04	52	5.7	6	0	0	0	4	91s
2026-05-03	48	6.3	6	0	0	0	4	88s
2026-05-02	65	6.5	8	0	0	0	4	114s
2026-05-01	70	6.5	8	0	0	0	4	120s
2026-04-30	71	6.4	8	0	0	0	4	122s
2026-04-29	72	6.2	8	0	0	0	4	115s
2026-04-28	68	6.8	8	0	0	0	4	129s
2026-04-27	49	6.0	6	0	0	0	4	89s
2026-04-26	59	6.4	7	0	0	0	4	101s
2026-04-25	83	6.4	8	0	0	0	4	120s
2026-04-24	80	6.5	9	0	0	0	4	129s
2026-04-23	61	6.9	6	0	0	0	4	259s
2026-04-22	62	7.1	6	0	0	0	4	275s
2026-04-21	55	7.0	6	0	0	0	4	268s
2026-04-20	31	6.7	6	1	1	0	4	179s
2026-04-19	35	6.7	5	0	0	0	4	162s
2026-04-18	53	7.1	6	0	0	0	4	260s
2026-04-17	55	7.0	7	1	1	0	4	311s
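The "Last 7 Runs" summary figures can be cross-checked against the seven most recent rows of the run log. The sketch below assumes the panel uses plain (unweighted) means over runs and that "Halve-retries" corresponds to the `retry` column. Both are inferences from the numbers, not documented behaviour, and the `RunRow` type and `summarize` helper are hypothetical names.

```typescript
// One row of the run log; field names mirror the table's column headers.
interface RunRow {
  date: string;
  items: number;
  avg: number;   // average score for the run
  fail: number;  // failed batches
  retry: number; // assumed to be the "Halve-retries" count
  kwFb: number;  // keyword fallbacks
}

// The seven most recent rows, copied from the run log above.
const lastSevenRuns: RunRow[] = [
  { date: "2026-05-24", items: 53, avg: 5.0, fail: 195, retry: 94, kwFb: 101 },
  { date: "2026-05-23", items: 82, avg: 6.5, fail: 0, retry: 0, kwFb: 0 },
  { date: "2026-05-22", items: 79, avg: 6.4, fail: 0, retry: 0, kwFb: 0 },
  { date: "2026-05-21", items: 70, avg: 6.8, fail: 0, retry: 0, kwFb: 0 },
  { date: "2026-05-20", items: 81, avg: 6.8, fail: 0, retry: 0, kwFb: 0 },
  { date: "2026-05-19", items: 83, avg: 6.4, fail: 0, retry: 0, kwFb: 0 },
  { date: "2026-05-18", items: 45, avg: 6.2, fail: 0, retry: 0, kwFb: 0 },
];

// Aggregate a window of runs into the summary-card figures.
function summarize(runs: RunRow[]) {
  const sum = (pick: (r: RunRow) => number): number =>
    runs.reduce((acc, r) => acc + pick(r), 0);
  const oneDecimal = (x: number): number => Number(x.toFixed(1));
  return {
    runsInWindow: runs.length,
    avgItemsPerRun: oneDecimal(sum((r) => r.items) / runs.length),
    avgScore: oneDecimal(sum((r) => r.avg) / runs.length), // unweighted mean of run averages
    failedBatches: sum((r) => r.fail),
    halveRetries: sum((r) => r.retry),
    keywordFallbacks: sum((r) => r.kwFb),
  };
}

// avgItemsPerRun 70.4, avgScore 6.3, failedBatches 195,
// halveRetries 94, keywordFallbacks 101, runsInWindow 7
console.log(summarize(lastSevenRuns));
```

The unweighted mean reproduces the 6.3 shown in the summary, whereas weighting each run's score by its item count would round to 6.4, which suggests the panel averages per run. All failures, retries, and fallbacks in the window come from the single 2026-05-24 run.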