All People
SuperClaude (Mythos) still seems irreducibly Claude-y given the transcripts in the system card.
Here, two versions of Mythos are forced to talk to each other across multiple rounds. They are less philosophical than Opus 4.6 and less spiritual than Opus 4.1, but still very Claude-like in personality.
I was told about the Mythos release, but have no personal experience to add.
Two points, in brief:
1) It is not built for IT software security; it is simply a model good enough that it is good at that too
2) This is the first, not last, model to raise security risks red.anthropic.com/2026/mythos-...
I suspect that popularity of AI is going to start looking like surveys where people trust their own doctors but are distrustful of the medical establishment
People will increasingly like and rely on “their AI” but will increasingly be anxious about “AI” as a category. Some odd implications result.
In different hands, Mythos would be an unprecedented cyberweapon
I am not sure how we deal with this, except to note that there is a narrow window during which only 3 companies are known to be at this level of capability. But Chinese models (maybe open weights ones?) may get there in 9 months.
I think the story that was shared in the Mythos System Card still has the signs of flawed LLM writing (which looks like good writing at a glance): A story that doesn't really hold together logically, but sounds like it should. The back-and-forth banter. Lack of characters.
Writing fiction is hard!
Our Lab just posted a new research report from Zimran Ahmed about how the game industry is adapting to AI. He spoke to people at 20 different studios and found a wide range of approaches to adapting (or failing to adapt) to AI at the organizational level. gail.wharton.upenn.edu/research-and...
AI finally lets us see Raphael's The School of Athens the way Raphael obviously intended it, illustrating the delicate dance and subtle conflicts between Plato and Aristotle.
So we now have a pretty good picture of the state of the frontier AI model makers. 1/
US closed source models continue to lead. Google, OpenAI, and Anthropic stand well ahead of every other lab, and may have some form of recursive self-improvement operating.
All is not lost. Duckerton is still possible.
Here is Seedance 2.0 with the same prompt.
A lot of our education on writing well focuses on logic, clarity, and argument. AI will force us to think more about style. The boredom that comes from everything on the internet reading Claude-y now, no matter how good the substance is, should make us appreciate (and make us want to develop) style.
Neat experiment finds AI fact checks on Twitter are rated as more helpful & less ideological than human ones.
"LLM-generated Community Notes can achieve broader cross-ideological acceptance than human-written notes, receiving more positive ratings from raters across the political spectrum"
Currently, ChatGPT has the best way of viewing thinking traces, a short summary of steps in the main window, and a detailed audit in the sidebar if you want it
Claude does almost as well, but more summarized and harder to see calculations and code
It's a big weak spot for Gemini in comparison
It is notable that the hot debate in AI engineering is exactly which markdown files are most important to feed AI (skills, memory, tool instructions) and in which order to feed them to get the best output. It feels like this is a temporary state of affairs in the development of agents
So the concern over Claude Mythos and cybersecurity seems warranted based on this independent assessment from the UK government. It was capable of the equivalent of 20 hours of expert human work autonomously.
It is not an unexpected jump in capability, but it is big. www.aisi.gov.uk/blog/our-eva...
The growing trend of treating all of AI as One Big Thing that always includes data centers & job changes & education changes & power & accelerating science & misinformation & national security & corporate control & medical use & etc is going to inevitably lead to some bad policy on all sides.
Six months ago, there was a lot of focus on the idea that there would be a massive glut of unused computing power which could cause a recession as AI use plateaued. The "compute bubble" belief was absolutely everywhere.
The degree to which this turned out wrong deserves some notice.
Soon, at each release of AI along the current capability curve, you will start to see large discrete jumps in ability in economically important areas, because the previous AI ability level in some aspect of the job bottlenecked progress. When a bottleneck is removed, it looks like a leap forward.
AI keeps getting better but the last time the shape of the jagged frontier changed radically was o1 & the Reasoner.
A good mental model of the coming months is that models get extremely good at the things they are already quite good at (coding), but weaknesses will be similar (long form fiction)
Interesting: "Currently, 38% of Americans live within 5 miles of at least one operational data center... Living near a data center doesn’t have much of an effect on public opinion about the facilities."
From now on, it looks like most DCs will be rural, though. www.pewresearch.org/short-reads/...
Instead of the gold standard, we can, as a thought experiment, imagine an inference standard of exchange, the FLOP. (As opposed to tokens, this accounts for AI ability)
With some AI help, I figure $1 buys roughly 10^17 managed-LLM inference FLOPs
So that $4 coffee would cost half an exaFLOP, choom
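The arithmetic behind the coffee quip can be sketched in a few lines. This is a minimal illustration assuming the post's rough figure of 10^17 inference FLOPs per dollar; the helper name and constants are mine, not an established standard.

```python
# Sketch of the post's dollars-to-compute arithmetic.
# Assumption (from the post): ~1e17 managed-LLM inference FLOPs per dollar.
FLOPS_PER_DOLLAR = 1e17
EXAFLOP = 1e18  # one exaFLOP = 10^18 floating-point operations


def dollars_to_exaflops(dollars: float) -> float:
    """Convert a dollar amount into exaFLOPs of inference compute."""
    return dollars * FLOPS_PER_DOLLAR / EXAFLOP


# A $4 coffee comes out to about 0.4 exaFLOP, i.e. "half an exaFLOP"
# with generous rounding.
print(dollars_to_exaflops(4))
```

At these rates the conversion is just a power-of-ten shift, which is part of what makes the FLOP appealing as a thought-experiment unit of account.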
This is becoming a pattern in AI that makes talking about capabilities challenging.
First, there are overstated claims (like the flubbed Erdos problems that were announced last year), then minor wins (AI helps with discovery) then breakthroughs.
The first stage feels like (& often is) hype, but…
Recent interests: AI
Ethan Mollick is closely analyzing the capabilities and risks of advanced AI models like Claude Mythos, particularly regarding cybersecurity threats and their societal impact. He is also tracking the evolving landscape of frontier AI models and their implications for education and creative industries.
Recent activity: 22 posts
Ethan Mollick
@emollick.bsky.social