Pipeline.Observability

AI Daily Metrics

38 runs recorded

Last 7 Runs

Avg items/run: 70.4
Avg score: 6.3
Failed batches: 195
Halve-retries: 94
Keyword fallbacks: 101
Runs in window: 7

Items + Score Trend

[Chart: item count (solid line) and avg score, 0-10 (dashed line), per run, 04-17 to 05-24]

Source Contribution

[Chart: contribution by source per run, 04-17 to 05-24. Series: rss, search, social, horizon, github, reddit, co-starred]

Topic Health (last 30 days)

Hit counts per controlled-vocabulary focusTopic, used to decide which anchors to keep, rename, or retire in the next iteration of FOCUS_TOPICS.

agent-harness
healthy
7d 72 · 14d 150 · 30d 314
  • 2026-05-23 · 6.0 · McKinsey & Company partners with AppliedAI to drive agentic AI in regulated sectors - Consultancy-me.com
  • 2026-05-23 · 7.5 · SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation
  • 2026-05-23 · 7.5 · AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows
coding-agents
healthy
7d 48 · 14d 90 · 30d 225
model-release
healthy
7d 36 · 14d 79 · 30d 173
  • 2026-05-23 · 9.5 · Anthropic to Close Over $30 Billion Round as Soon as Next Week - Bloomberg
  • 2026-05-23 · 9.5 · DeepSeek is pushing forward with $10.29 billion financing round, with Liang Wenfeng committing to continue developing open-source AI models rather than pursuing short-term commercialization goals
  • 2026-05-23 · 9.0 · DeepSeek Founder Declares AGI Goal as $10 Billion Round Advances - Bloomberg
evals
healthy
7d 33 · 14d 59 · 30d 134
  • 2026-05-23 · 7.5 · Open-World Evaluations for Measuring Frontier AI Capabilities
  • 2026-05-23 · 7.5 · AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
  • 2026-05-23 · 7.0 · OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind
context-engineering
healthy
7d 24 · 14d 48 · 30d 99
tool-use
healthy
7d 18 · 14d 36 · 30d 73
  • 2026-05-23 · 7.0 · Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration
  • 2026-05-23 · 6.5 · Reflective Prompt Tuning through Language Model Function-Calling
  • 2026-05-23 · 7.1 · ChromeDevTools/chrome-devtools-mcp
post-training
healthy
7d 9 · 14d 25 · 30d 58
  • 2026-05-22 · 6.5 · I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]
  • 2026-05-22 · 6.0 · Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency
  • 2026-05-21 · 6.9 · PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play
planning
healthy
7d 7 · 14d 23 · 30d 48
  • 2026-05-22 · 8.0 · Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning
  • 2026-05-22 · 5.5 · Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction
  • 2026-05-22 · 5.5 · Pseudo-Siamese Network for Planning in Target-Oriented Proactive Dialogues
── retired anchors (v1, kept for historical display) ──
memory
legacy
7d 0 · 14d 0 · 30d 0
no examples in last 30 days
self-evolution
legacy
7d 0 · 14d 0 · 30d 0
no examples in last 30 days
multi-agent
legacy
7d 0 · 14d 0 · 30d 0
no examples in last 30 days
reflection
legacy
7d 0 · 14d 0 · 30d 0
no examples in last 30 days
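
The 7d/14d/30d hit counts above can be approximated with a small windowed counter. This is a minimal sketch, not the pipeline's actual code: the `Item` shape, the `topicHealth` function name, and the choice of trailing windows ending at an `asOf` day are all assumptions made for illustration. Only `focusTopics` and the 7/14/30-day windows come from the page itself.

```typescript
// Hypothetical item shape; only `focusTopics` is named on the dashboard.
interface Item {
  date: string;          // ISO day, e.g. "2026-05-23" (parsed as UTC)
  focusTopics: string[]; // controlled-vocabulary anchors assigned to the item
}

type WindowCounts = { d7: number; d14: number; d30: number };

const DAY_MS = 24 * 60 * 60 * 1000;

// Count hits per focus topic in trailing 7/14/30-day windows ending at `asOf`.
function topicHealth(
  items: Item[],
  topics: string[],
  asOf: string,
): Map<string, WindowCounts> {
  const end = Date.parse(asOf);
  const counts = new Map<string, WindowCounts>();
  for (const t of topics) counts.set(t, { d7: 0, d14: 0, d30: 0 });

  for (const item of items) {
    const ageDays = Math.round((end - Date.parse(item.date)) / DAY_MS);
    if (ageDays < 0 || ageDays >= 30) continue; // outside the 30-day window
    for (const t of item.focusTopics) {
      const c = counts.get(t);
      if (!c) continue; // ignore topics outside the supplied vocabulary
      c.d30 += 1;
      if (ageDays < 14) c.d14 += 1;
      if (ageDays < 7) c.d7 += 1;
    }
  }
  return counts;
}

// Illustrative data only, not taken from the pipeline.
const sample: Item[] = [
  { date: "2026-05-23", focusTopics: ["evals", "tool-use"] },
  { date: "2026-05-15", focusTopics: ["evals"] },
  { date: "2026-04-30", focusTopics: ["evals"] },
];
const health = topicHealth(sample, ["evals", "tool-use", "memory"], "2026-05-24");
```

With this sample, `evals` ends up at 1/2/3 for 7d/14d/30d and `memory` stays at 0/0/0, the same shape as the retired anchors listed above. Because each window contains the shorter ones, the counts never decrease from 7d to 30d.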

Topic Discovery (v3) — last 30 days

Unsupervised frequency scan of free-form item.tags[] (not focusTopics). Entries already in the controlled vocabulary or in the entity blacklist (openai / cursor / meta / …) are excluded. The candidates listed below are signals for the next FOCUS_TOPICS update: review them weekly and promote manually in scripts/ai-daily/config.ts.

🚀 Rising
heavy recent-week signal
  • agent-skills
    9/18/22
    2026-05-23 · 5.6 · google-labs-code/stitch-skills
  • video-generation
    9/12/21
    2026-05-23 · 6.3 · HKUDS/ViMax
  • metal
    6/13/15
    2026-05-23 · 7.0 · antirez/ds4
  • diffusion
    5/8/12
    2026-05-23 · 7.5 · Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models
  • local-inference
    5/11/12
    2026-05-23 · 7.0 · antirez/ds4
  • gpu
    5/7/12
    2026-05-22 · 5.1 · Was my $48K GPU server worth it?
  • knowledge-graph
    5/5/10
    2026-05-23 · 8.7 · colbymchenry/codegraph
  • methodology
    5/6/9
    2026-05-23 · 6.0 · LQS v3.1 — an open methodology for rating AI training data (multi-oracle consensus + signed certificates) [P]
  • llamacpp
    6/7/9
    2026-05-22 · 5.0 · 110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp
  • search
    7/7/9
    2026-05-21 · 5.9 · Google's AI is being manipulated. The search giant is quietly fighting back
📈 Persistent
steady across 30 days
  • llm
    51/113/288
    2026-05-23 · 7.5 · SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation
  • open-source
    40/106/217
    2026-05-23 · 9.5 · DeepSeek is pushing forward with $10.29 billion financing round, with Liang Wenfeng committing to continue developing open-source AI models rather than pursuing short-term commercialization goals
  • agent
    31/77/156
    2026-05-23 · 7.0 · A framework for longitudinal health AI agents - Nature
  • benchmark
    27/52/103
    2026-05-23 · 7.5 · Open-World Evaluations for Measuring Frontier AI Capabilities
  • ai-coding
    14/36/79
    2026-05-23 · 8.7 · colbymchenry/codegraph
  • qwen
    17/32/69
    2026-05-23 · 6.0 · Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps
  • local-llm
    5/20/61
    2026-05-22 · 7.2 · Indexing a year of video locally on a 2021 MacBook with Gemma4-31B (50GB swap)
  • inference
    11/24/60
    2026-05-23 · 7.5 · BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.
  • reasoning
    8/24/60
    2026-05-22 · 8.5 · [AINews] OpenAI GPT-next disproves 80 year old Erdős planar unit distance problem for under $1000
  • coding-agent
    12/19/59
    2026-05-23 · 5.1 · awslabs/aidlc-workflows
💫 Sporadic
low-medium frequency, watch
  • small-model
    4/7/12
    2026-05-22 · 7.0 · Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.
  • multilingual
    3/6/11
    2026-05-22 · 6.5 · Tencent Hy 30B/7B/1.8B
  • gpt-5.5
    0/1/11
    2026-05-12 · 8.5 · OpenAI just released its answer to Claude Mythos - The Verge
  • partnership
    1/4/10
    2026-05-23 · 6.0 · McKinsey & Company partners with AppliedAI to drive agentic AI in regulated sectors - Consultancy-me.com
  • llm-agent
    2/5/10
    2026-05-23 · 7.5 · AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
  • api
    2/2/10
    2026-05-23 · 7.3 · DeepSeek makes the V4 Pro price discount permanent
  • hallucination
    1/3/10
    2026-05-21 · 7.0 · HalBench: I built a custom sycophancy and hallucination benchmark and tested 4 frontier models (Sonnet 4.6, Grok 4.3, GPT 5.4 and Gemini 3.1 Pro), looking for input on what OSS models to run next!
  • transformer
    1/4/10
    2026-05-21 · 5.4 · NVlabs/Sana
  • codex
    0/2/10
    2026-05-16 · 7.0 · Codex untethers AI from the laptop | Semafor
  • retrieval
    3/5/9
    2026-05-23 · 7.5 · AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows
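
The frequency scan described at the top of this section can be sketched as follows. This is only an illustration under stated assumptions: `TaggedItem`, `discoverTags`, the lowercase normalization, and the per-item deduplication are hypothetical choices, and the example vocabulary and blacklist are placeholders. How the dashboard actually assigns candidates to the Rising, Persistent, and Sporadic buckets is not stated on the page, so that step is left out.

```typescript
// Hypothetical shape: only `item.tags[]` is named on the dashboard.
interface TaggedItem {
  tags: string[];
}

// Count free-form tags, skipping controlled-vocabulary entries and
// blacklisted entity names. Returns [tag, count] pairs, most frequent first.
function discoverTags(
  items: TaggedItem[],
  focusTopics: ReadonlySet<string>,
  entityBlacklist: ReadonlySet<string>,
): Array<[string, number]> {
  const freq = new Map<string, number>();
  for (const item of items) {
    // Assumption: normalize case and count each tag at most once per item.
    const unique = Array.from(new Set(item.tags.map((t) => t.trim().toLowerCase())));
    for (const tag of unique) {
      if (tag === "" || focusTopics.has(tag) || entityBlacklist.has(tag)) continue;
      freq.set(tag, (freq.get(tag) ?? 0) + 1);
    }
  }
  // Sort by count descending, then alphabetically for stable output.
  return Array.from(freq.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
}

// Illustrative inputs only.
const candidates = discoverTags(
  [
    { tags: ["agent-skills", "OpenAI", "tool-use"] },
    { tags: ["agent-skills", "metal"] },
    { tags: ["metal", "agent-skills", "agent-skills"] },
  ],
  new Set(["tool-use", "evals"]),
  new Set(["openai", "cursor", "meta"]),
);
```

Here `candidates` contains `agent-skills` with a count of 3 and `metal` with a count of 2. `tool-use` is dropped because it is already in the vocabulary, and `OpenAI` is dropped because it matches the blacklist after lowercasing.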

Recent Anomalies

  • 2026-05-24: 195 failed batches, 101 keyword fallbacks

Run Log (38)

date        items  avg  batches  fail  retry  kwFb  anchors  dur
2026-05-24     53  5.0      195   195     94   101        4  127s
2026-05-23     82  6.5        8     0      0     0        4   64s
2026-05-22     79  6.4        8     0      0     0        4   67s
2026-05-21     70  6.8        8     0      0     0        4   67s
2026-05-20     81  6.8        9     0      0     0        4   75s
2026-05-19     83  6.4        9     0      0     0        4   78s
2026-05-18     45  6.2        6     0      0     0        4   53s
2026-05-17     68  6.4        8     0      0     0        4   63s
2026-05-16     60  6.6        8     0      0     0        4   64s
2026-05-15     67  6.5        8     0      0     0        4   67s
2026-05-14     76  6.4        9     0      0     0        4   74s
2026-05-13     58  6.6        8     0      0     0        4   82s
2026-05-12     79  6.4        8     0      0     0        4   90s
2026-05-11     47  6.1        6     0      0     0        4  114s
2026-05-10     54  6.8        7     0      0     0        4  129s
2026-05-09     78  6.2        8     0      0     0        4  138s
2026-05-08     80  6.5        9     0      0     0        4  141s
2026-05-07     67  6.7        8     0      0     0        4  130s
2026-05-06     81  6.7        8     0      0     0        4  129s
2026-05-05     71  6.4        9     0      0     0        4  128s
2026-05-04     52  5.7        6     0      0     0        4   91s
2026-05-03     48  6.3        6     0      0     0        4   88s
2026-05-02     65  6.5        8     0      0     0        4  114s
2026-05-01     70  6.5        8     0      0     0        4  120s
2026-04-30     71  6.4        8     0      0     0        4  122s
2026-04-29     72  6.2        8     0      0     0        4  115s
2026-04-28     68  6.8        8     0      0     0        4  129s
2026-04-27     49  6.0        6     0      0     0        4   89s
2026-04-26     59  6.4        7     0      0     0        4  101s
2026-04-25     83  6.4        8     0      0     0        4  120s
2026-04-24     80  6.5        9     0      0     0        4  129s
2026-04-23     61  6.9        6     0      0     0        4  259s
2026-04-22     62  7.1        6     0      0     0        4  275s
2026-04-21     55  7.0        6     0      0     0        4  268s
2026-04-20     31  6.7        6     1      1     0        4  179s
2026-04-19     35  6.7        5     0      0     0        4  162s
2026-04-18     53  7.1        6     0      0     0        4  260s
2026-04-17     55  7.0        7     1      1     0        4  311s
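
The "Last 7 Runs" figures at the top of the page can be cross-checked against the seven most recent run log rows. The sketch below is an assumption about how such a summary might be computed, not the dashboard's own code: `RunRow` and `summarize` are hypothetical names, and the average score is taken as the unweighted mean of the per-run averages.

```typescript
// Hypothetical row shape mirroring a subset of the run log columns.
interface RunRow {
  date: string;
  items: number;
  avg: number;   // per-run average score
  fail: number;  // failed batches
  retry: number; // halve-retries
  kwFb: number;  // keyword fallbacks
}

// Aggregate the most recent `n` runs; rows are expected newest first.
function summarize(rows: RunRow[], n: number) {
  const recent = rows.slice(0, n);
  const k = Math.max(recent.length, 1); // avoid dividing by zero on empty input
  const sum = (f: (r: RunRow) => number) => recent.reduce((acc, r) => acc + f(r), 0);
  const round1 = (x: number) => Math.round(x * 10) / 10;
  return {
    runs: recent.length,
    avgItems: round1(sum((r) => r.items) / k),
    avgScore: round1(sum((r) => r.avg) / k), // unweighted mean (assumption)
    failedBatches: sum((r) => r.fail),
    halveRetries: sum((r) => r.retry),
    keywordFallbacks: sum((r) => r.kwFb),
  };
}

// Values copied from the seven most recent run log rows.
const last7: RunRow[] = [
  { date: "2026-05-24", items: 53, avg: 5.0, fail: 195, retry: 94, kwFb: 101 },
  { date: "2026-05-23", items: 82, avg: 6.5, fail: 0, retry: 0, kwFb: 0 },
  { date: "2026-05-22", items: 79, avg: 6.4, fail: 0, retry: 0, kwFb: 0 },
  { date: "2026-05-21", items: 70, avg: 6.8, fail: 0, retry: 0, kwFb: 0 },
  { date: "2026-05-20", items: 81, avg: 6.8, fail: 0, retry: 0, kwFb: 0 },
  { date: "2026-05-19", items: 83, avg: 6.4, fail: 0, retry: 0, kwFb: 0 },
  { date: "2026-05-18", items: 45, avg: 6.2, fail: 0, retry: 0, kwFb: 0 },
];
const summary = summarize(last7, 7);
```

With these rows, `summary` gives 70.4 items per run (493 / 7), an average score of 6.3, 195 failed batches, 94 halve-retries, and 101 keyword fallbacks, which matches the figures at the top of the page. All of the failures come from the 2026-05-24 run.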