Naomi Saphra

ML/NLP professor

@nsaphra.bsky.social @NaomiSaphra

Recent Activity: 16 posts · 62 x-posts


Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

May 22, 04:31 AM

Highlights: Naomi Saphra humorously comments on a space for scientific discourse, offering to share images of herself.

Worth reading: Shows her playful side and engagement with the scientific community.

LLM

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU has SCHEMES for LM interpretability & analysis, so I couldn't be more pumped to join a

May 22, 04:31 AM

Highlights: Naomi Saphra announces her upcoming faculty position at Boston University in 2026 and her excitement about BU's LM interpretability and analysis efforts.

Worth reading: Highlights her career move and research focus on LM interpretability.

LLM · Evaluation

Naomi Saphra

@nsaphra.bsky.social

I have been thinking about this in light of Anthropic’s recent verbalization interp paper. It had no evidence convincing me that their verbalizations are faithful, but they are convincingly useful. Even wrong output can stimulate human creativity and increase the entropy of exploration.

May 21, 10:47 PM·❤️ 22🔄 2·💬 2

Safety

Naomi Saphra

@nsaphra.bsky.social

Maybe it is a good day to go to the Whitney in NYC and look at The Rose. It is very big. A human spent 8 years painting it. She made it too big and couldn't get it back out her door. They had to cut a hole in the wall and forklift it out.

May 21, 05:26 PM·❤️ 28🔄 3·💬 3

Naomi Saphra

@NaomiSaphra

New preprint! Phase transitions! We love to see them during LM training.

May 21, 04:33 AM

Highlights: Announces a new preprint about phase transitions in language model training.

Worth reading: Relevant for researchers interested in training dynamics and phase transitions in LLMs.

LLM · Fine-tuning

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU has SCHEMES for LM interpretability & analysis, so I couldn't be more pumped to join a

May 21, 04:33 AM

Highlights: Announces new faculty position at Boston University focusing on LM interpretability.

Worth reading: Highlights career move and BU's research initiatives in interpretability.

Evaluation · Safety

Naomi Saphra

@nsaphra.bsky.social

my new literary award cannot be won by a commercial frontier LLM because I will require that 10% of each submission is smut

May 21, 01:06 AM·❤️ 15🔄 0

Evaluation

Naomi Saphra

@nsaphra.bsky.social

I won't claim this is the most embarrassing social media post I made as a teenager, but it may be the most confusing

May 20, 11:08 PM·❤️ 60🔄 1·💬 4

Naomi Saphra

@nsaphra.bsky.social

I tried to make the theory work out but the computer devil kept lying to me (ChatGPT generated incorrect proofs)

May 20, 08:46 PM·❤️ 10🔄 0

LLM · Evaluation

Naomi Saphra

@NaomiSaphra

New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

May 20, 04:30 AM

Highlights: Naomi Saphra announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.

Worth reading: Highlights a key development in causal interpretability research.

Evaluation

Naomi Saphra

@NaomiSaphra

Waiting on a robot body. All opinions are universal and held by both employers and family. Now a dedicated grok hate account. Accepting ML/NLP PhD students.

May 20, 04:30 AM

Highlights: Profile bio noting that Saphra is accepting ML/NLP PhD students and describing the account as a "dedicated grok hate account."

Worth reading: Provides context on Saphra's professional status and interests.

LLM

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU has SCHEMES for LM interpretability & analysis, so I couldn't be more pumped to join a

May 20, 04:30 AM

Highlights: Announcement of starting as faculty at Boston University in 2026, focusing on LM interpretability.

Worth reading: Important career update and indication of research focus.

Evaluation

Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

May 19, 04:29 AM

Highlights: Saphra humorously comments on a space for scientific discourse in a self-deprecating tone.

Worth reading: Shows Saphra's playful side and engagement with scientific community.

Evaluation

Naomi Saphra

@NaomiSaphra

Perfect cute light very short read for a break in a deadline crunch.

May 19, 04:29 AM

Highlights: Saphra recommends a short, light read for a break during intense work.

Worth reading: Provides insight into Saphra's reading recommendations and work-life balance.

LLM

Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

May 18, 04:32 AM

Highlights: Sarcastic comment about using images of oneself for scientific discourse.

Worth reading: Shows Saphra's humorous take on social media as a platform for science.

Safety

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU ...

May 18, 04:32 AM

Highlights: Announces starting as faculty at Boston University in 2026.

Worth reading: Key career milestone for a prominent AI interpretability researcher.

Evaluation

Naomi Saphra

@nsaphra.bsky.social

I will ALWAYS read the youtube comments

May 18, 01:39 AM·❤️ 18🔄 1·💬 1

Naomi Saphra

@NaomiSaphra

New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

May 17, 04:28 AM

Highlights: Announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.

Worth reading: Relevant for understanding recent advances in mechanistic interpretability of language models.

Evaluation · Safety

Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

May 16, 04:16 AM

Highlights: Naomi Saphra humorously comments on a space for scientific discourse, starting with images of herself.

Worth reading: Shows her playful engagement with the scientific community on X.

LLM

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU ...

May 16, 04:16 AM

Highlights: Naomi Saphra announces her upcoming faculty position at Boston University in 2026.

Worth reading: Highlights a major career milestone for a prominent AI researcher.

LLM

Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

May 15, 04:26 AM

Highlights: Naomi Saphra humorously comments on using images of herself for scientific discourse.

Worth reading: Shows her playful engagement with the platform.

LLM

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU ...

May 15, 04:26 AM

Highlights: Naomi Saphra announces her new faculty position at Boston University starting in 2026.

Worth reading: Key career milestone for a prominent AI researcher.

LLM

Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

May 14, 04:23 AM

Highlights: Naomi Saphra humorously comments on using images of herself in a scientific discourse space.

Worth reading: Shows her playful engagement with online scientific communities.

LLM

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU ...

May 14, 04:23 AM

Highlights: Announces her upcoming faculty position at Boston University in 2026.

Worth reading: Key career milestone for a prominent AI researcher.

LLM

Naomi Saphra

@NaomiSaphra

New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

May 13, 04:23 AM

Highlights: Naomi Saphra announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.

Worth reading: Highlights a new contribution to causal interpretability in AI, a key area for understanding model behavior.

Evaluation

Naomi Saphra

@NaomiSaphra

RT @natolambert: A few facts, while the dust is settling. Ai2 still is... - releasing open models, folks want to,

May 13, 04:23 AM

Highlights: Naomi Saphra retweets a @natolambert post stating that Ai2 is still releasing open models.

Worth reading: Shows engagement with open model releases and industry dynamics.

Infra

Naomi Saphra

@nsaphra.bsky.social

I don't really understand this. If you just want a job---at any career point as an AI academic, student, faculty whatever---you can make ~10x the money and work 9-6 without meandering chats with colleagues.

May 13, 03:00 AM·❤️ 8🔄 0·💬 1

Naomi Saphra

@NaomiSaphra

I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the ...

May 12, 04:20 AM

Highlights: Describes her research focus on training dynamics and mechanistic interpretability in NLP models.

Worth reading: Summarizes her research interests.

LLM · Evaluation

Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

May 12, 04:20 AM

Highlights: Sarcastic comment about the platform as a space for scientific discourse.

Worth reading: Shows her humorous take on online scientific discussion.

Safety

Naomi Saphra

@NaomiSaphra

New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

May 11, 04:31 AM

Highlights: Naomi Saphra announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.

Worth reading: Highlights a key advancement in AI interpretability research.

Evaluation

Naomi Saphra

@NaomiSaphra

Ok, I wrote this up (link below)

May 11, 04:31 AM

Highlights: Naomi Saphra shares that she has written up something, presumably a blog post or paper.

Worth reading: Indicates a new written work by the researcher.

LLM

Naomi Saphra

@nsaphra.bsky.social

I had intended to see The Sheep Detectives (2026) (Rated PG) until Jill Lepore panned its inaccurate portrayal of animal cognition.

May 10, 04:17 PM·❤️ 27🔄 3·💬 2

Naomi Saphra

@NaomiSaphra

New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

May 10, 04:25 AM

Highlights: Announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.

Worth reading: Highlights a new contribution to AI interpretability research.

LLM · Evaluation

Naomi Saphra

@NaomiSaphra

New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

May 9, 04:14 AM

Highlights: Naomi Saphra announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.

Worth reading: Highlights ongoing work in AI interpretability, a key area for understanding and trusting AI models.

Safety · Evaluation

Naomi Saphra

@nsaphra.bsky.social

this is a very neat initiative

May 9, 02:39 AM·❤️ 11🔄 0

Naomi Saphra

@nsaphra.bsky.social

Goodfire released a megapost of all the random feature geometry stuff they're finding, and it's worth a read

May 8, 01:11 PM·❤️ 112🔄 22·💬 2

Safety · Evaluation

Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

May 8, 04:08 AM

Highlights: Naomi Saphra humorously comments on using self-images to initiate scientific discourse.

Worth reading: Shows her playful engagement with the X platform.

Safety

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU ...

May 8, 04:08 AM

Highlights: Naomi Saphra announces her new faculty position at Boston University starting in 2026.

Worth reading: Highlights her career move into academia.

LLM

Naomi Saphra

@nsaphra.bsky.social

May 8, 02:12 AM·❤️ 3🔄 0

Evaluation · Safety

Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

May 7, 04:20 AM

Highlights: Sarcastic comment about scientific discourse on social media.

Worth reading: Reflects her critical perspective on online discussions.

Safety

Naomi Saphra

@NaomiSaphra

I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the

May 7, 04:20 AM

Highlights: Describes her research focus on NLP model training and emergent behaviors.

Worth reading: Summarizes her research interests in interpretability.

LLM · Fine-tuning

Naomi Saphra

@NaomiSaphra

New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

May 5, 04:08 AM

Highlights: Announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.

Worth reading: Relevant for researchers interested in mechanistic interpretability and causal methods in AI.

Evaluation

Naomi Saphra

@NaomiSaphra

Ok, I wrote this up (link below)

May 5, 04:08 AM

Highlights: Indicates a write-up on a topic, with a link to further content.

Worth reading: May contain insights on AI training or interpretability, given Saphra's research focus.

LLM

Naomi Saphra

@NaomiSaphra

I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the

May 5, 04:08 AM

Highlights: Describes research focus on NLP model training and emergence of mechanistic behaviors.

Worth reading: Provides context on Saphra's research interests in mechanistic interpretability and training dynamics.

LLM · Evaluation

Naomi Saphra

@NaomiSaphra

I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the

May 4, 04:22 AM

Highlights: Naomi Saphra describes her research focus on understanding and improving NLP model training, specifically how structures and mechanistic behaviors emerge.

Worth reading: Provides insight into the author's research interests in mechanistic interpretability and training dynamics.

LLM · Fine-tuning · Evaluation

Naomi Saphra

@NaomiSaphra

Naomi Saphra (@nsaphra). 237 likes. New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

May 4, 04:22 AM

Highlights: Announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.

Worth reading: Highlights a new contribution to causal interpretability, a key area in AI safety and mechanistic understanding.

Safety · Evaluation · LLM

Naomi Saphra

@NaomiSaphra

Just got a desk reject, post-rebuttals, for a paper being submitted to arxiv <30 min late for

May 4, 04:22 AM

Highlights: Naomi Saphra shares an experience of receiving a desk reject after rebuttals due to a paper being submitted to arXiv less than 30 minutes late.

Worth reading: Illustrates the strictness of conference deadlines and the challenges in academic publishing.

Evaluation · Fine-tuning

Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

May 3, 04:22 AM

Highlights: Sarcastic comment about using images of herself for scientific discourse.

Worth reading: Shows her humorous take on online scientific discussions.

Evaluation

Naomi Saphra

@NaomiSaphra

New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

May 2, 04:09 AM

Highlights: Announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.

Worth reading: Highlights a contribution to mechanistic interpretability, a key area in AI safety.

Safety

Naomi Saphra

@NaomiSaphra

I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the

May 2, 04:09 AM

Highlights: Describes research focus on training dynamics and emergence of structures in NLP models.

Worth reading: Provides insight into the researcher's expertise in mechanistic interpretability and training.

LLM · Fine-tuning

Naomi Saphra

@NaomiSaphra

I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the

May 1, 04:28 AM

Highlights: Naomi Saphra describes her research focus on understanding and improving NLP model training, particularly how structures and mechanistic behaviors emerge.

Worth reading: Provides insight into the research interests of a prominent NLP/ML researcher.

Fine-tuning

Naomi Saphra

@NaomiSaphra

New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

May 1, 04:28 AM

Highlights: Announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.

Worth reading: Highlights a new contribution to causal interpretability in ML.

Evaluation

Naomi Saphra

@NaomiSaphra

This book starts like it's gonna be a fun microhistory of TB (it gave us the Stetson!

May 1, 04:28 AM

Highlights: Naomi Saphra comments on a book about tuberculosis, noting its engaging start.

Worth reading: Shows a personal interest outside of AI/ML.

Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

Apr 30, 04:19 AM

Highlights: Sarcastic comment about using images for scientific discourse.

Worth reading: Illustrates her humorous take on online discussions.

LLM

Naomi Saphra

@NaomiSaphra

I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the

Apr 30, 04:19 AM

Highlights: Describes her research focus on NLP model training and emergent behaviors.

Worth reading: Summarizes her research interests.

LLM · Fine-tuning

Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

Apr 29, 04:17 AM

Highlights: Sarcastic comment about using images of oneself for scientific discourse.

Worth reading: Shows her humorous take on online scientific communication.

Safety

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU ...

Apr 29, 04:17 AM

Highlights: Announcement of starting as faculty at Boston University in 2026.

Worth reading: Highlights her career move into academia.

LLM

Naomi Saphra

@nsaphra.bsky.social

I had no idea the restricted isometry property (RIP) was dead 😔

Apr 28, 09:13 PM·❤️ 8🔄 0·💬 1

Evaluation

Naomi Saphra

@nsaphra.bsky.social

I got a call from the “assistant” of someone I have an existing business relationship with. it was a robot with fake office conversation and keyboard typing sounds in the background. Genuinely think it should be illegal to not immediately disclose it’s automated, especially with deceptive realism.

Apr 28, 07:47 PM·❤️ 37🔄 1·💬 2

Safety

Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

Apr 28, 04:20 AM

Highlights: Sarcastic comment about using images for scientific discourse.

Worth reading: Shows her humorous take on academic discussions.

Safety

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU ...

Apr 28, 04:20 AM

Highlights: Announcement of new faculty position at Boston University.

Worth reading: Highlights her career move and impact on AI research.

LLM

Naomi Saphra

@NaomiSaphra

I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the

Apr 28, 04:20 AM

Highlights: Describes her research focus on NLP model training and mechanistic behaviors.

Worth reading: Summarizes her research interests in AI interpretability.

LLM · Fine-tuning

Naomi Saphra

@nsaphra.bsky.social

ok, this historical LM I actually think is trained with their claimed limitations. it doesn't know what commercials are (first attested in its modern use in the 30s)

Apr 27, 10:07 PM·❤️ 20🔄 0·💬 3

LLM

Naomi Saphra

@nsaphra.bsky.social

If you missed Sara's poster at #ICLR2026, the good news is you can still read her paper!

Apr 26, 02:21 PM·❤️ 10🔄 0

Naomi Saphra

@NaomiSaphra

what a perfect space for scientific discourse! I'll start off with a few images of myself

Apr 26, 04:13 AM

Highlights: Sarcastic comment about using images in a scientific discourse space.

Worth reading: Shows her humorous take on academic communication.

LLM

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU ...

Apr 26, 04:13 AM

Highlights: Announcement of joining Boston University as faculty in 2026.

Worth reading: Key career milestone for a prominent AI researcher.

LLM

Naomi Saphra

@NaomiSaphra

I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the

Apr 26, 04:13 AM

Highlights: Describes her research focus on NLP model training and emergent behaviors.

Worth reading: Summarizes her research interests in mechanistic interpretability.

LLM · Evaluation

Naomi Saphra

@nsaphra.bsky.social

Found out something cool while trying to figure out what just happened at the WHPCD today. Did not figure out what happened at the WHPCD. But I’m always happy to celebrate palliative care specialists, the unsung heroes of the hardest medical cases.

Apr 26, 01:07 AM·❤️ 8🔄 1·💬 1

Naomi Saphra

@NaomiSaphra

This book starts like it's gonna be a fun microhistory of TB (it gave us the Stetson!

Apr 25, 04:02 AM

Highlights: Naomi Saphra comments on a book about tuberculosis, noting its engaging start.

Worth reading: Shows her casual reading interests.

Naomi Saphra

@NaomiSaphra

New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

Apr 25, 04:02 AM

Highlights: Announces a new preprint on causal interpretability, emphasizing its coherence and testability.

Worth reading: Highlights her research focus on causal interpretability.

Evaluation

Naomi Saphra

@NaomiSaphra

New preprint! Phase transitions! We love to see them during LM training.

Apr 25, 04:02 AM

Highlights: Announces a new preprint about phase transitions in language model training.

Worth reading: Relevant to understanding training dynamics of LLMs.

LLM

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU ...

Jan 1, 12:00 AM

Highlights: Naomi Saphra announces starting as faculty at Boston University in 2026.

Worth reading: Shows career move and continued involvement in academia.

LLM · Evaluation

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU ...

Jan 1, 12:00 AM

Highlights: Naomi Saphra announces she will join Boston University as faculty in 2026.

Worth reading: Shows career move of a prominent ML/NLP researcher.

LLM · Fine-tuning

Naomi Saphra

@NaomiSaphra

New preprint! Everyone loves causal interp. It's coherently defined! It makes testable predictions

Apr 27, 12:00 AM

Highlights: Naomi announces a new preprint on causal interpretability, emphasizing its coherent definition and testable predictions.

Worth reading: Highlights a new contribution to mechanistic interpretability, a key area in AI safety.

Safety · Evaluation

Naomi Saphra

@NaomiSaphra

Life update: I'm starting as faculty at Boston University in 2026! BU ...

Mar 27, 12:00 AM

Highlights: Naomi Saphra announces she will join Boston University as faculty in 2026.

Worth reading: Shows a career milestone for a prominent AI researcher.

LLM

Naomi Saphra

@NaomiSaphra

Ok, I wrote this up (link below)

Mar 25, 12:00 AM

Highlights: Naomi references a write-up she authored, likely a blog post or paper.

Worth reading: Indicates a new piece of writing, possibly expanding on her research.

Evaluation

Naomi Saphra

@NaomiSaphra

I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the

Nov 9, 12:00 AM

Highlights: Naomi Saphra describes her research on understanding and improving NLP model training, focusing on emergent structures and mechanistic behaviors.

Worth reading: Provides insight into her research focus on mechanistic interpretability in language models.

LLM · Safety

Naomi Saphra

@NaomiSaphra

I work on understanding and improving training for NLP models, with a focus on studying how structures and mechanistic behaviors emerge over the

Nov 9, 12:00 AM

Highlights: Naomi describes her research focus on training dynamics and emergence of mechanistic behaviors in NLP models.

Worth reading: Provides context on her research agenda in mechanistic interpretability and training dynamics.

LLM · Evaluation

16 posts · 62 x-posts · All time