Discussion about this post

Very Tired

Your article made me think of the "slipperiness" of LLM behavior, where one can never exactly predict what a model will do, or explain why it did it.

Here is an example: "We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a 'student' model learns to prefer owls when trained on sequences of numbers generated by a 'teacher' model that prefers owls."

https://alignment.anthropic.com/2025/subliminal-learning/

Andrew

Thanks for sharing. I had heard of Ellul from Paul Kingsnorth but now I want to read Yudkowsky and Becker!

