So it seems that the world is changing and the need for quality data for AI models is increasing all the time. If large platforms keep restricting access to their APIs, AI creators will look for other data sources.
In one sense we are currently creating this data source for them, completely free to use, and it is called "nostr". Just imagine the amount of data accessible to anyone if nostr had the same volume as Twitter. I see lots of benefits with nostr, don't get me wrong. However, the fact that nostr completely removes all forms of control over who can access content could be bad in the long run. The protocol is open by design and impossible to lock down.
What do you think? Am I overplaying the risk or is this actually something that has to be thought about?
I believe Nostr already has a way to deal with that: paid relays. Perhaps if scraping increases the cost of running a relay too much, more of them will require a few sats from their users. And they can require more than a few sats from those scrapers.
reply
This. They should charge in the first place anyways.
The main problem is the lack of support for auth-based reads. We see writes being locked down on paid relays, but the scraping problem is a read problem.
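To make "auth-based reads" concrete, here is a minimal sketch of a relay refusing REQ messages from connections that have not authenticated. It assumes the NIP-42 flow as I understand it (the relay answers an unauthenticated REQ with a CLOSED message prefixed `auth-required:`). The `Connection` class, `handle_req` and the `paid_pubkeys` set are made-up names for illustration, and signature verification of the AUTH event is left out entirely.

```python
import json


class Connection:
    """Hypothetical per-connection state kept by a relay."""

    def __init__(self):
        # Set to the client's pubkey once it has proven key ownership
        # (e.g. via a NIP-42 AUTH event; verification is not shown here).
        self.authed_pubkey = None


def handle_req(conn, message, paid_pubkeys):
    """Return the relay's reply to a REQ, refusing reads from unknown clients."""
    kind, sub_id, *filters = json.loads(message)
    if kind != "REQ":
        raise ValueError("only REQ messages are handled in this sketch")
    if conn.authed_pubkey is None:
        # Reads are gated: ask the client to authenticate first.
        return json.dumps(["CLOSED", sub_id, "auth-required: reads need authentication"])
    if conn.authed_pubkey not in paid_pubkeys:
        # Authenticated, but not on the list of paying users.
        return json.dumps(["CLOSED", sub_id, "restricted: reads are for paying users"])
    # A real relay would stream the matching events before EOSE.
    return json.dumps(["EOSE", sub_id])
```

With a gate like this the relay knows which key is reading, so a scraper would have to pay or get whitelisted like everyone else. Whether clients handle the extra round trip well is a separate question.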
reply
It's a problem that can only be solved with relaying systems and session accounting. And it needs to be designed to be low latency as well to minimise propagation delay.
reply
I am thinking that there can be a positive side to it: if AI models are based on data from Nostr rather than data from Twitter, then the cultural values from the Nostr community will be embedded into the AI that is later used by the mainstream. It can be an opportunity rather than a risk.
reply
It's even more than this. It breaks all their assumptions about the "intellectual property rights" they usually claim ownership of in their TOS. The one that everyone noticed when Impervious left it in the boilerplate TOS.
Promiscuous systems fill vacuums, and nostr is very promiscuous. Like bitcoin in this way too. The less it tells you how to do things the more you can do with it and the more people find value in it. Censorship is an invisible tourniquet on the blood flow to the head, to use a medical metaphor.
reply
I personally think Nostr does not have much data yet that companies would want to train on. It would take way too long for it to have a data set comparable to Twitter/Tumblr/Reddit.
But I have a doubt here:
You referred to scraping as a problem. Assuming nostr is scraped through Damus, then Damus as a website might slow down. Would other clients be affected in some way too? Would Plebstr and Amethyst also be affected?
A centralized system has its problems. A decentralized system solves many of them but brings other problems of its own. IMO this is how life and systems work, but in the current situation decentralized would be much, much better, as long as it does not become some sort of bot-house.
reply
In your Damus example you mean relay rather than website, and an overloaded relay would affect any client accessing it. Of course priority can be given to paid users and there can be other fixes. You're right though, decentralized systems have their own issues. It's nice to be here before scaling has become the biggest one.
reply
“… that nostr completely removes all forms of control over who can access content could be bad in the long run”
But WHY do you think this? You’re saying it could be bad, but I don’t see an argument. Encryption may solve privacy, otherwise why shouldn’t the data be accessible?
reply
An AI trained on nostr would be an interesting beast. Pro-Bitcoin, pro-Austrian economics, anarchist. I wouldn't mind that being unleashed on the world.
reply
haha, that is totally true, didn't think of that xDDDDDDDdd
reply
nostr is not for dick pics to your girlfriend. A nostr relay is a web scraper by design. So no problem. Just don't document your crack and hooker habit on it with photos.
can't find it, but SethForPrivacy has a good article on nostr security
reply
𝐇𝗼𝐰𝐝𝐲 𝐝𝗼 ? 🤠 👋
reply
Nostr data is stored in the relays. If a relay is public, its data can be scraped by anyone. If the relay has restrictions, the data won't be publicly available. Paid relays are just one way of restricting access to data/resources. There could also be relays that grant access to data not just for a payment, but based on a configured list of trusted keys. A relay that allows scraping on demand for a fee is how we can finally sell our own data.
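To show how little it takes to read from a public relay, here is a small sketch that builds the NIP-01 REQ message a client (or scraper) would send over the websocket. The `build_req` helper and the subscription id are made up for illustration; the filter fields `kinds`, `limit` and `until` are the ones defined in NIP-01. No network code is included.

```python
import json


def build_req(sub_id, kinds, limit, until=None):
    """Build a NIP-01 REQ message asking a relay for stored events."""
    flt = {"kinds": list(kinds), "limit": limit}
    if until is not None:
        # Only return events created at or before this unix timestamp,
        # which is how a reader could page backwards through history.
        flt["until"] = until
    return json.dumps(["REQ", sub_id, flt])


# Ask for up to 500 text notes (kind 1); "scrape-1" is an arbitrary id.
message = build_req("scrape-1", [1], 500)
```

A public relay answers this for anyone who connects. A restricted relay is one that declines to answer it unless the requester has paid or is on its trusted list.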
reply
I think completely open is the best way to go. Have we ever even seen that before? What are the risks?
reply
What problem could it bring? We are all anon on nostr, so what would be the "dangers" of opening this database to AI training?
reply
We are not entirely anon. We identify ourselves with public keys. And public keys can be linked to identities, like emails or domains.
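As an example of how a key gets tied to a domain, here is a rough sketch of a NIP-05 check. It assumes the scheme as I understand it: an identifier like `bob@example.com` is looked up at `/.well-known/nostr.json` on that domain, and the returned `names` map must contain the hex public key. The helper names are made up, and the HTTP fetch is left out, so the JSON document is passed in as a string.

```python
import json
from urllib.parse import quote


def nip05_url(identifier):
    """Return the well-known URL where a NIP-05 identifier is published."""
    name, _, domain = identifier.partition("@")
    if not name or not domain:
        raise ValueError("expected an identifier of the form name@domain")
    return f"https://{domain}/.well-known/nostr.json?name={quote(name)}"


def nip05_matches(identifier, pubkey_hex, document):
    """Check whether the fetched JSON document maps the name to this pubkey."""
    name = identifier.partition("@")[0]
    names = json.loads(document).get("names", {})
    return names.get(name) == pubkey_hex
```

Anyone who scrapes profiles can run the same lookup, so a key with a verified identifier is only as anonymous as the domain behind it.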
It is not just about being anon. It is that all this data owned by us is being used for free to build services that will cost us money. It sounds better to me that we can restrict access to our data and, if we want, sell it to AIs and scrapers.
reply
So far the internet has been almost free for all of us. Think, and think again, of all the mind-blowing services we've got access to for no money (GPS, countless apps, storage, social media, etc.). We have to be aware of the fact that these web companies succeeded in monetizing things that would be worthless otherwise (our personal data), thus creating a massive amount of wealth. Honestly, I'm not sure that a paid internet would have had so much success, and I'm not sure it would be so big today without this kind of monetization (which would be a shame for poor developing countries).
Without big brother authoritarian governments spying on us through these big tech companies, I wouldn't care so much about sharing my data. Big corporations are not the problem; governments gathering this data are, IMO.
reply