All People
SuperClaude (Mythos) still seems irreducibly Claude-y given the transcripts in the system card.
Here, two versions of Mythos are forced to talk to each other across multiple rounds. They are less philosophical than Opus 4.6 and less spiritual than Opus 4.1, but still very Claude-like in personality.
I was told about the Mythos release, but have no personal experience to add.
Two points, in brief:
1) It is not built for IT software security; it is simply a model good enough that it is good at that too
2) This is the first, not last, model to raise security risks red.anthropic.com/2026/mythos-...
I suspect that popularity of AI is going to start looking like surveys where people trust their own doctors but are distrustful of the medical establishment
People will increasingly like and rely on “their AI” but will increasingly be anxious about “AI” as a category. Some odd implications result.
In different hands, Mythos would be an unprecedented cyberweapon
I am not sure how we deal with this, except to note that there is a narrow window during which only 3 companies are known to be at this level of capability. But Chinese models (maybe open weights ones?) may get there in 9 months.
I think the story that was shared in the Mythos System Card still has the signs of flawed LLM writing (which looks like good writing at a glance): A story that doesn't really hold together logically, but sounds like it should. The back-and-forth banter. Lack of characters.
Writing fiction is hard!
Our Lab just posted a new research report from Zimran Ahmed about how the game industry is adapting to AI. He spoke to people at 20 different studios and found a wide range of approaches to adapting (or failing to adapt) to AI at the organizational level. gail.wharton.upenn.edu/research-and...
AI finally lets us see Raphael's The School of Athens the way Raphael obviously intended it, illustrating the delicate dance and subtle conflicts between Plato and Aristotle.
So we now have a pretty good picture of the state of the frontier AI model makers. 1/
US closed source models continue to lead. Google, OpenAI, and Anthropic stand well ahead of every other lab, and may have some form of recursive self-improvement operating.
All is not lost. Duckerton is still possible.
Here is Seedance 2.0 with the same prompt.
A lot of our education on writing well focuses on logic, clarity, and argument. AI will force us to think more about style. The boredom that comes from everything on the internet reading Claude-y now, no matter how good the substance is, should make us appreciate (and make us want to develop) style.
Neat experiment finds AI fact checks on Twitter are rated as more helpful & less ideological than human ones.
"LLM-generated Community Notes can achieve broader cross-ideological acceptance than human-written notes, receiving more positive ratings from raters across the political spectrum"
Currently, ChatGPT has the best way of viewing thinking traces, a short summary of steps in the main window, and a detailed audit in the sidebar if you want it
Claude does almost as well, but more summarized and harder to see calculations and code
It's a big weak spot for Gemini in comparison
It is notable that the hot debate in AI engineering is exactly which markdown files are most important to feed AI (skills, memory, tool instructions) and in which order to feed them to get the best output. It feels like this is a temporary state of affairs in the development of agents
So the concern over Claude Mythos and cybersecurity seems warranted based on this independent assessment from the UK government. It was capable of the equivalent of 20 hours of expert human work autonomously.
It is not an unexpected jump in capability, but it is big. www.aisi.gov.uk/blog/our-eva...
The growing trend of treating all of AI as One Big Thing that always includes data centers & job changes & education changes & power & accelerating science & misinformation & national security & corporate control & medical use & etc is going to inevitably lead to some bad policy on all sides.
Six months ago, there was a lot of focus on the idea that there would be a massive glut of unused computing power which could cause a recession as AI use plateaued. The "compute bubble" belief was absolutely everywhere.
The degree to which this turned out wrong deserves some notice.
Soon, at each release of AI along the current capability curve, you will start to see large discrete jumps in ability in economically important areas, because the previous AI ability level in some aspect of the job bottlenecked progress. When a bottleneck is removed, it looks like a leap forward.
AI keeps getting better but the last time the shape of the jagged frontier changed radically was o1 & the Reasoner.
A good mental model of the coming months is that models get extremely good at the things they are already quite good at (coding), but weaknesses will be similar (long form fiction)
Interesting: "Currently, 38% of Americans live within 5 miles of at least one operational data center... Living near a data center doesn’t have much of an effect on public opinion about the facilities."
From now on, it looks like most DCs will be rural, though. www.pewresearch.org/short-reads/...
Instead of the gold standard, we can, as a thought experiment, imagine an inference standard of exchange, the FLOP. (As opposed to tokens, this accounts for AI ability)
With some AI help, I figure $1 buys roughly 10^17 managed-LLM inference FLOPs
So that $4 coffee would cost half an exaFLOP, choom
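The arithmetic behind the coffee quip can be sketched in a few lines. This is a minimal illustration assuming the post's rough figure of 10^17 inference FLOPs per dollar; the helper name and constants are mine, not an established standard.

```python
# Sketch of the post's dollars-to-compute arithmetic.
# Assumption (from the post): ~1e17 managed-LLM inference FLOPs per dollar.
FLOPS_PER_DOLLAR = 1e17
EXAFLOP = 1e18  # one exaFLOP = 10^18 floating-point operations


def dollars_to_exaflops(dollars: float) -> float:
    """Convert a dollar amount into exaFLOPs of inference compute."""
    return dollars * FLOPS_PER_DOLLAR / EXAFLOP


# A $4 coffee comes out to about 0.4 exaFLOP, i.e. "half an exaFLOP"
# with generous rounding.
print(dollars_to_exaflops(4))
```

At these rates the conversion is just a power-of-ten shift, which is part of what makes the FLOP appealing as a thought-experiment unit of account.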
This is becoming a pattern in AI that makes talking about capabilities challenging.
First, there are overstated claims (like the flubbed Erdos problems that were announced last year), then minor wins (AI helps with discovery) then breakthroughs.
The first stage feels like (& often is) hype, but…
Recent interests: AI
Ethan Mollick is closely analyzing the capabilities and risks of advanced AI models like Claude Mythos, particularly regarding cybersecurity threats and their societal impact. He is also tracking the evolving landscape of frontier AI models and their implications for education and creative industries.
Recent activity: 22 posts
Ethan Mollick
@emollick.bsky.social