
Someone in a bitcoin founder group chat, who wouldn't mind me naming them but whom I won't, to demonstrate my capacity for prudence 💅, asked:
To those of you allowing open code contributions: How are you balancing newcomer training with drive-by LLM solution filtering? Some of these contributions serve as little more than a distraction.
My response, which sort of evades the practical part of the question:
We incentivize contributions, which should make it worse, and we try to make it really easy to get started (clone and run a command), yet real contributions still outnumber slop ones by a good margin. (We probably also benefit from having a frivolous product, and the incentives make FOSS PRs part of regular dev life, which might amortize any single distraction.)
I have a nearly positive take on AI contributions after getting quite a few: they're a source of development diversity that a closed source team might struggle to find, i.e. maybe we're underutilizing AI dev tools internally, and cyborg contribs might fill gaps that would otherwise be left unfilled.
That said, we have one contributor who stretches AI way beyond their ability to self-review, and it's annoying af. Yet, again, they occasionally surface an issue (one-in-ten hit rate or so) that we hadn't considered, which makes wading through all their slop worth it. They are annoying and unskilled but earnest. Human, cyborg, or machine, earnestness is where I draw the line between distraction and not.
My answer to the practical part might be that we don't really balance it and instead just lower priority based on historical slop and earnestness ratios.
Would you answer differently?
177 sats \ 6 replies \ @optimism 20h
Overall a good answer, I think, though it really depends on what you're developing and in what language. I've accepted exactly one LLM contribution, but it was on a Python repo, after the author cleaned up all the effing emojis.
More annoying to me still are people who feed issues to grok or gpt and then try to reap credit in issue comments. It's been happening more often since recent releases, and people apparently believe that maintainers don't notice the slop, or that they went zero to hero in 10 minutes using terminology they don't understand. People also get really upset if I say something about it, or when I even dare to ask "did you test this?". I'm getting even less liked than I already was, but whatevs.
What I don't really see anymore in my own projects is this part:
newcomer training
I've had zero newcomers since April or so, whereas normally I'd have 2-3 a month.
reply
It was surprising to me to see how many people submit PRs but say they haven't tested them. I try to test everything (as much as I can), even single-line code changes, and wouldn't imagine submitting something for review without having done so.
reply
144 sats \ 0 replies \ @optimism 18h
Took me years to get rid of "LGTM" culture in reviews, which ultimately heightened review quality. Now the pressure is on the submitter, because no one wants the fight that comes when the untested PR they approved breaks shit.
It sucks that FOSS development brings so much stress and isn't always newcomer-friendly, but if you have a massive installed base you have to deliver quality work, and avoidable debt on your main branch is going to hurt.
reply
100 sats \ 3 replies \ @sox 10h
I'm okay with AI contributions; to me, the most important part is how your human brain arrived at the solution. I default to this:
how many people submit PRs but say they haven't tested them
But the problem is when they say they tested it when they didn't, which automatically invalidates any assumptions I could make about whether human logic was applied. And I also get a little frustrated, maybe because it seems they don't value what they publish, while, like you, I get anxious even on single-line code changes.
reply
100 sats \ 2 replies \ @optimism 9h
I get anxious even on single-line code changes.
Unit tests often help with anxiety. "This was broken": see test. "Now it's no longer broken": see test. "It has no negative impact on other features": see all the other tests.
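As a minimal sketch of that pattern (pytest-style, with a made-up parse_amount helper rather than anything from a real repo):

```python
# Regression-test sketch: pin the bug, the fix, and the unaffected behavior.
import pytest


def parse_amount(text: str) -> int:
    """Hypothetical helper: parse a sat amount like '1,000' into an int."""
    return int(text.replace(",", ""))


def test_parse_amount_with_commas():
    # "This was broken": commas used to blow up; this test pins the fix.
    assert parse_amount("1,000") == 1000


def test_parse_amount_plain_digits():
    # "No negative impact on other features": the old behavior still holds.
    assert parse_amount("42") == 42


def test_parse_amount_rejects_garbage():
    # Document what should still fail, so future changes don't silently accept it.
    with pytest.raises(ValueError):
        parse_amount("abc")
```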
reply
100 sats \ 1 reply \ @sox 9h
I gotta step up my test game; I only write them temporarily and arbitrarily.
This is another lesson in discipline I have to give myself ngl
reply
If you have decent coverage, it helps with spotting regressions in the future. The worst thing that ever happened, imho, with some code I wrote was that I wrote tests, then someone else commented them out and I missed it in a PR review. Then, of course, there was a regression and there was drama.
reply
I don't know how I'd answer since I've never maintained a FOSS project.
But what stands out to me is that bots clearly aren't at the point yet where they can independently contribute to projects without human supervision.
reply
I think SN has a standout system for FOSS PRs that most other projects lack:
  • Disciplined maintenance of the issue tracker: this takes a lot of time to do right, and OSS devs want to write code, not "be a PM", so on most projects the maintainer's priorities become illegible to outsiders looking to get involved.
  • Comprehensive dev environment: "works on my machine" friction adds a lot of cost to collaboration. NB: I think this is what's holding back a lot of drive-by contributions; I don't think there's any AI coding system that can handle the Docker environment without a lot of expert customization.
  • Consistent response time and courteousness: on other projects you'll often see hostility toward even earnest contributions, or it's a side project that got big while the maintainer has a full-time job, so PRs go unreviewed for weeks or get rejected for petty reasons.
  • Not-too-large, not-too-small rewards and tasks: ONE BIG REWARD type bounties (or hackathons) are good for marketing and attract lots of contributors, but there the median quality will be quite low.
  • The contributors are almost always users of the product: whereas for many other projects it's specialized software that each person uses differently, in private, on their own computer, e.g. the MoviePy package.
Overall this seems to hit the sweet spot of being welcoming and soliciting useful things. But the cost to duplicate these features for other projects also seems high.
Some other incentivized contributions I've come across that I think could be relatively AI-resistant are perf-based challenges like tinygrad bounties and Kaggle competitions, where a test harness can be applied in a semi-automated way to see if the contribution boosts accuracy or perf on some metric, and only then look into the code quality.
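For concreteness, a rough sketch of what such a semi-automated gate might look like; baseline_solution, candidate_solution, and the 1.1x cutoff are all made up for illustration, not anyone's actual harness:

```python
# Semi-automated perf gate: check correctness and speed before a human reviews code quality.
import random
import time


def baseline_solution(data):
    # Stand-in for the current reference implementation.
    return sorted(data)


def candidate_solution(data):
    # Stand-in for the contributed implementation under review.
    return sorted(data)


def measure(fn, data, runs=5):
    """Return the best wall-clock time over a few runs."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn(list(data))
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    data = [random.random() for _ in range(100_000)]
    # Correctness first: the candidate must match the baseline's output.
    assert candidate_solution(list(data)) == baseline_solution(list(data))
    base, cand = measure(baseline_solution, data), measure(candidate_solution, data)
    speedup = base / cand
    print(f"baseline {base:.4f}s, candidate {cand:.4f}s, speedup {speedup:.2f}x")
    # Only contributions that clear the bar consume human review time.
    print("worth a human review" if speedup >= 1.1 else "no measurable win, skip")
```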
reply
44 sats \ 0 replies \ @k00b OP 21h
(Upon second reading, I'm realizing how important context is. High levels of self-deprecation and frustration with third parties are even more unseemly outside of close quarters.)
reply
Annoying and unskilled but earnest contributors seem like a big problem.
reply
0 sats \ 1 reply \ @k00b OP 5h
This post explicitly says they aren't a big problem. Nuance exists.
reply
Got it, no biggie
reply
My answer to the practical part might be that we don't really balance it and instead just lower priority based on historical slop and earnestness ratios.
This may not be like a valve that you can throttle and open up again:
Crowding Out after Incentives Are Removed: Incentives for Prosocial Behavior
i.e. if you lower the incentives, motivation can drop and remain permanently lower than it was before any incentives were introduced.
reply
0 sats \ 0 replies \ @sox 10h
How are you balancing newcomer training with drive-by LLM solution filtering?
I like the difficulty system of the bounties. It's like starting a new game and progressing through levels; imo you can't do a medium (or an easy!) if you can't do your good-first-issues first.
In my first days it was invaluable how this progression helped me understand the codebase. So I vote for difficulties; that was really good newcomer training for me.
reply