I think the bigger companies that make models will eventually make some kind of agreement with the large scientific publishing companies so that their models are legally defensible. Those publishing companies will make some serious money, I think. To keep the publishers satisfied, those deals will probably be sizeable, news-making in dollar terms.
It's the worst outcome though? That would mean that the model companies gatekeep everything.
I'm not saying it's the best outcome, just that I think it's the likely one. The big publishing companies love gatekeeping; it protects their revenues. The knowledge is still gatekept right now, just by the publishers instead. The push towards more capable LLMs is strong. I think OpenAI and the other big firms will make some kind of deal for their AIs to legally access scientific journals, and whether it's a lump sum or a per-usage deal, it's probably going to be billions of dollars flowing to scientific publishers.
Also, based on a skim I did the other day of current publisher policies, many publishers allow not-for-profit scanning and indexing for machine-learning purposes. So there will likely be open source efforts that don't get sued, as long as they open source their models and don't charge for them as products.
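As a rough illustration of what such a not-for-profit pipeline has to do first, here's a minimal Python sketch that filters a crawled record set by a license field before indexing anything. The record shape, the license strings, and the tdm_allowed helper are all hypothetical assumptions, not any publisher's real metadata schema.

```python
# Minimal sketch: keep only records whose license metadata permits
# non-commercial text-and-data mining before indexing for ML.
# The `license` values and record shape here are hypothetical;
# real publisher metadata varies widely.

TDM_OK = {
    "cc-by", "cc-by-sa", "cc-by-nc",  # common Creative Commons variants
}

def tdm_allowed(record: dict) -> bool:
    """Return True if the record's license permits not-for-profit ML use."""
    return record.get("license", "").lower() in TDM_OK

records = [
    {"doi": "10.1234/example.1", "license": "CC-BY", "text": "..."},
    {"doi": "10.1234/example.2", "license": "proprietary", "text": "..."},
]

corpus = [r for r in records if tdm_allowed(r)]
print(f"{len(corpus)} of {len(records)} records cleared for indexing")
```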
Yes.
"make some kind of deal for their AIs to legally access scientific journals"
That has already happened, so I think it makes it all the more important to figure out how to remove the gatekeepers from the knowledge supply chain.
If you disrupt knowledge (that's what AGI is supposed to do, right?) only to insert yourself as a middleman, you're going to be in trouble. Especially since the rest of the world has working AI to help them undermine it; even if that AI is mediocre, that just means it takes longer.
Can you link/speak more about these deals with publishers? I'd be curious to read about the scope of the deals.
Cool, thank you. I should have googled it myself before asking. For anyone else who's curious, I came across this:
Those are mostly MSM publishers, though, something the AI companies chase because it's in their interest to buy the journalists. There is less threat coming from scientific publications (and both Anthropic and Meta have allegedly been training on libgen), so those deals are less visible.
(ddg-mcp, memory-mcp) Feeding it facts alone, much like their definition of a "knowledge unit", has a higher chance of hallucination than when you feed it the original text that hadn't had any processing done on it. I suspect that this is due to the way the generic LLM is trained, and I think that this is what causes the spread between lower and upper in their evaluation too.
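For concreteness, here's a minimal Python sketch of the comparison being described, with heavy caveats: ask, is_hallucination, fact_lists, passages, and questions are all hypothetical stand-ins for whatever model call and judge you'd actually use, not any real library's API or the evaluation referenced above.

```python
# Sketch: measure hallucination rate under two prompting conditions.
# `ask(prompt)` calls some LLM; `is_hallucination(answer, source)` is a
# hypothetical judge returning True when the answer isn't supported by
# the source. This only illustrates the facts-only vs. original-text setup.

from statistics import mean

def hallucination_rate(contexts, questions, ask, is_hallucination):
    """Fraction of answers judged unsupported by their source context."""
    flags = []
    for ctx, q in zip(contexts, questions):
        answer = ask(f"Context:\n{ctx}\n\nQuestion: {q}")
        flags.append(is_hallucination(answer, ctx))
    return mean(flags)

# Condition A: model sees only extracted facts ("knowledge units").
# Condition B: model sees the original, unprocessed passage.
# The claim above is rate_facts > rate_fulltext, and the gap between
# the two is one way to read the lower/upper spread in the evaluation.
# rate_facts    = hallucination_rate(fact_lists, questions, ask, judge)
# rate_fulltext = hallucination_rate(passages,   questions, ask, judge)
```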