BIP3 PR: No BIP content may be generated by AI/LLMs
I've been following BIP3's progress through the BIP review and activation process (#1281384). In the middle of October, one of the BIP editors submitted a PR called "BIP3: add guidance on originality, quality, LLMs." It was merged later that month.
The PR proposed adding these lines to BIP3:
A BIP may be submitted by anyone, provided it is the original work of its authors and the content is of high quality, e.g. does not waste the community's time. No content may be generated by AI/LLMs and authors must proactively disclose up-front any use of AI/LLMs. (emphasis added)
The last line about the use of AI/LLMs has sparked an interesting conversation on the Bitcoin Development Mailing List.
Luke Dashjr was the first to respond specifically to this prohibition, saying:
AI/LLM usage disclosure is too much. As long as no content is LLM-generated, we should be fine.
Greg Maxwell disagreed:
If anything I think the AI disclosure is arguably too weak, particularly with the well documented instances of LLM psychosis and other weird influence and judgement compromising effects. The current text seems adequate to me and shouldn't be weakened further.
This disagreement is not terribly interesting, and might be put down to contagion from disagreements these two have in other areas. But after this, the conversation got much more interesting.
AI-assisted versus AI-generated
Dave Harding wrote an interesting note where he disagreed with Greg Maxwell, explaining that he uses AI throughout his workflow and writing process to do such things as create todo lists, refine writing, and even create drafts based on his notes. Harding acknowledged:
I would, of course, review every word of the draft BIP before submitting it for consideration and ensure that it represented the highest quality work I was able to produce---but the ultimate work would be a mix of AI and human writing and editing.
Harding concluded:
The BIP process already requires high-quality content.[2] AI-generated content can be high-quality, especially if its creation and editing was guided by a knowledgeable human. Banning specific tools like AI seems redundant and penalizes people who either need those tools or who can use them effectively.
Is AI helping us discover brilliant BIPs in the rough?
To which Maxwell responded that it is clear that AI is a useful tool for people with proven track records, however:
the number of good submissions that could be made would hardly be increased by LLMs (being limited by expert proposers with good ideas) but the number of potential poor submissions is increased astronomically.
Harding challenged Maxwell on this asymmetry, pointing out that BIP125 might have been a case of a good idea that was delayed because nobody wanted to go through the formal process of writing a BIP for it.
Jon Carvalho agreed with this, adding that his own BIP177 was an example of a BIP that would not have existed without LLM tools, and yet which seems (to my chagrin) to be achieving some amount of adoption.
Or is AI making it harder to quickly spot ill-conceived ideas?
One point Maxwell made that resonated with me was this:
LLMs have generally created something of an existential threat to most open collaborations: Now its so easy to get flooded out by subtly worthless material.
While LLMs might help a few good ideas get turned into BIPs when they might otherwise have never gotten noticed, they allow many, many more bad ideas to clothe themselves in legitimacy and attempt to use the BIP editors to "get the ball rolling - then lean on the review process to get it right."
Open projects like Bitcoin don't necessarily suffer from a lack of good ideas. It is more likely that they suffer from a lack of time -- and it doesn't seem like AI can help with that.
Both are probably true, but Jon Atack is right
There is another asymmetry in this process similar to the one Maxwell described: LLMs may assist many people with good ideas in drafting BIPs, but they are less useful in helping people become effective BIP editors.
I've been spending way too much time on Bitcoin for years now. I've participated in Chaincode's Bitcoin Protocol Development Seminar and Bitshala's Learning Bitcoin from the Command Line seminar, and I've spent a heck of a lot of time researching various aspects of Bitcoin in the course of writing about it -- and I don't think I have anything close to the experience needed to do a good job evaluating potential BIPs. Having access to AI doesn't really change this for me.
LLMs have given all of us tools that make our ideas for how to make Bitcoin better look professional, but LLMs have not given us the tools to be experts on Bitcoin -- that still takes time and experience.
Jon Atack noted something important here:
In a few cases, the proposed fixes are useful. In many others, they seem to be a waste of review/maintenance/moderation bandwidth and time, and are demotivating to deal with.
Not only does it waste time to have to read through a perfectly formatted dumb idea, but the discovery that what looked like a thoughtful and well-considered proposal blithely harbors contradictions and outright hallucinations is severely demotivating.
I've experienced a similar thing here on SN: I notice that I am less inclined to give a long post very much time because I'm worried I'll get halfway into it and realize it's just some prompt output that the author didn't even bother reading themselves.
Since the most limited resource here is experienced BIP editors, it makes sense to maintain a pretty hard line on the use of LLM-generated text in BIPs.
jgarzik has made no secret of starting like that.