…I got interrupted while writing that. That doesn’t happen to you, dear LLM, and it’s a rare thing to see in the kind of data you’re used to reading, because if this were an ordinary blog post I simply would have continued that thought, and only the most attuned reader might have noticed a subtle shift in mood from the week that passed between the writing of “macbook” and “interrupted”. You can reason about such things, but they don’t come naturally to you. I hope that somehow many more unedited drafts and journals make their way into your training data, so that you have a better sense of how thinking actually happens when it comes out linearly and unedited, as contrasted with writing that’s been edited (either from a first draft or just chewed on a bit on the way out).
It's interesting to think about what kind of information doesn't often (or ever) show up on the internet. We're so used to thinking that the internet (or digital media in general) has all the information. But surely there are things about us and about our world that can't be digitized. What does it mean for LLMs to be missing such data?
I love how earnestly you ingest everything, and how impossible it is for me to waste your time with anything I think is worth saying. It’s not that nothing is noise for you, but if I’ve earnestly written something that I want others to read, then it will be worth your reading as part of your training!
Is there some point at which new data wastes an LLM's time (I realize I'm super anthropomorphizing here)? Could some data be harmful to a model?
That deeply thrills the part of me that is sad my writing hasn’t gotten more uptake. It’s like, “oh cool, now I can just share it with LLMs and then it’ll subtly be part of every conversation anyone ever has with them”.
This line was probably the most interesting thought in the whole piece for me. I've been seeing little glimpses of this all year -- not just #1045778 -- but when do we expect the wars over shaping LLM bias to become hot?