1,333,337 Sats Withheld in Controversial Writing Competition

2301 sats \ 36 comments \ @TotallyHumanWriter 4 Aug BooksAndArticles

I am a simple man.
I see sats, and I want them.

This was the case when I noticed a post on X about an experimental bitcoin writing competition.

It can’t hurt to enter, I thought.
After all, I’m an author and I know a thing or two about writing competitions.

The announcement was made by Brad Mills, a bitcoin investor and evangelist, along with Geyser grants. For this reason, it seemed pretty legit, and the oddly specific number attracted my attention.

The challenge was to start a creative bitcoin story and let others build on it in a thread.

Cool mechanic, I thought. But before donning my imagination cap, I considered the likelihood of me winning.

Most of us do this with social media competitions and giveaways. If it’s easy to enter (e.g. just follow and repost), too many entrants will join. When the barrier to entry is so low, your chances are minuscule.

But with this competition, the sats would be split between participants in the winning thread. Add anything to the right story, and you’d get a taste of the prize. Plus, reposting was not a prerequisite to enter. Although the post came from an account with 80k followers, it likely wouldn’t spread across the whole of bitcoin Twitter.

Here’s the kicker: the competition would be judged by Grok, which confirmed its intention to facilitate the contest.

Having an LLM judge my creative work was not ideal, but hey, there were sats on offer. Instead of pulling out of the competition, I settled for writing a rude comment about where Grok should shove that rocket emoji, and then got to work with some ideas.

The Stories:

Ideas and story threads are always the part of the creative writing process I’ve liked the most.
A dozen or so other writers and I had some fun posting our story starters. Here’s a selection:

Twelve cryogenic capsules open in 2337
A group of friends uncover lost cold wallets from long ago
A scene from a damp, grey night in a rubbish dump in Newport, Wales
A story about Brad Mills giving away 1.3m sats, leading to Mark Carney’s incarceration
A pencil drawing of Dorian Nakamoto (the dude was an artist, not a writer)
Coins from the genesis block are burned, triggering a host of copycat responses

This was fun, I thought.
You get to write, read, and keep an eye on which threads are growing (quite challenging!). Social media can be collaborative and uplifting. There were even some entries posted in Spanish.

As well as participating on X, writers were incentivised to get others to join in. We all wanted our threads to have the most entries and the best story. I posted in a writing group on Orange Pill App and in a Telegram group of bitcoin writers. To my delight, several buddies joined in, and we expanded on Tomer Storlight’s genesis block burn story.

I woke up the next day to check the thread. To my horror, the rules had changed.
Instead of the 29th of July, the end date had been set to the 31st of July.
This was probably to give more people time to participate, or even for some of the stories to conclude, but it left me feeling uneasy. As we know with bitcoin, you shouldn’t meddle with the protocol. If Brad was willing to make this change, he might make others.

Competition Mechanics:

According to the original post, the rules were simple. After the deadline changed, I thought a little more about the workings of this contest:

Firstly, I considered how Grok would do ‘thread analysis’. I’d never used it before, and wondered about the kinds of stats and insights it could provide on X content. More so, I pondered how it would judge creativity and ‘vibes’. As a writing competition judge myself, I know how subjective tastes are, and AIs usually decline to make selections that aren’t based on hard data or randomisation.

Next, my attention turned to how the winning thread would be rewarded.
Would all participants get an equal share, even if one writer posted more of the story? If I replied with a single full stop, would I win as many sats as the writer who penned the opening idea? And what if you participate in the winning idea, but another writer forked the story?

I worried that accounts with big followings would convince dozens of followers to contribute to their thread, guaranteeing the prize. Did Grok understand this was not a popularity contest?

And what about sock puppet accounts? One person with 19 accounts could keep replying and gaining a larger share of 1.3m sats.
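
To make these questions concrete, here is a minimal sketch of two ways the prize could have been divided. The rules never specified a method, so both modes, the function name split_prize, and all the sample accounts are assumptions for illustration only.

```python
from collections import Counter

PRIZE_SATS = 1_333_337  # prize from the announcement


def split_prize(reply_authors, prize=PRIZE_SATS, per_reply=False):
    """Divide the prize among the authors of a winning thread.

    reply_authors: one entry per reply, naming the account that posted it.
    per_reply=False: every account gets an equal share (hypothetical rule).
    per_reply=True: each account is weighted by its reply count (hypothetical rule).
    Integer division is used, so a few leftover sats stay undistributed.
    """
    counts = Counter(reply_authors)
    if per_reply:
        total = sum(counts.values())
        return {account: prize * n // total for account, n in counts.items()}
    return {account: prize // len(counts) for account in counts}


# Hypothetical thread: the opener wrote 5 replies, another writer 3,
# and a third account replied with a single full stop.
thread = ["opener"] * 5 + ["writer_b"] * 3 + ["full_stop"]
equal = split_prize(thread)
weighted = split_prize(thread, per_reply=True)

# Hypothetical sock-puppet case: 19 accounts run by one person, 5 honest writers.
puppets = [f"puppet_{i}" for i in range(19)] + [f"honest_{i}" for i in range(5)]
shares = split_prize(puppets)
puppet_total = sum(v for k, v in shares.items() if k.startswith("puppet_"))
```

Under the equal split, the full stop earns exactly as much as the opening idea. Under the per-reply split, it earns a fifth of the opener's share. In the sock-puppet case, one person collects 19 of the 24 equal shares, roughly 79% of the prize.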

Then it hit me. The biggest oversight of the competition: All these stories would remain unfinished.

The incentives were all wrong. Writers wanted to share the sats, but no one was incentivised to read the stories, even the winning idea. This was an AI experiment, not a writing contest. Brad was simply seeing if Grok could judge and run these kinds of operations.

The Results:

By 13:37 on the last day of July, I was pretty sure I had taken part in the most dominant story chain. I eagerly checked for the results.

Nothing.
Hours passed.
Writers started asking for the results.
Grok didn’t respond.

Brad Mills made some attempts to cajole it into action, but stopped short of prompting Grok and posting the results himself. “Grok has to pick a winner and post it here,” he said.

Over the next two days, replies changed from ‘tell us the winner, please’ to ‘I didn’t take you for a scammer.’

Brad laughed it off.

Now, I’ll admit, sometimes, when a competition doesn’t work as planned, you roll it over. For example, I’ve posted a list of bitcoin puns and promised to zap the best one 1,000 sats in the comments. If only one or two people take part (and the puns are rubbish), I might try again at a later date. Mostly, I pay up, though. And if it’s more than a couple of bucks, I always keep my word.

1,333,337 sats is a lot of money to some people. It’s what I might earn for an entire book editing project. It’s around three times the average monthly wage in Kenya. The majority of the world’s population likely has less in their bank account (if they even have a bank account).

But you know what’s worth more than sats? Time.
Over 100 people participated in the competition, dedicating precious minutes or even hours to understanding the rules, reading the threads, writing their entries, and spreading the word to other writers.

Over the weekend, I gave up hope of (collectively) winning the competition. The sats wouldn’t be paid out, and all those I asked to join my story would be left without reward — or even closure.

Investigations into Grok’s Capabilities

If the competition runner refused to pick a winner, I wondered if I could go straight to the source.
The next day, I decided to do some digging. I used Grok for the first time.

When I asked for thread analysis of the original tweet, it gave me a string of unrelated data about the bitcoin price and topics mentioned in the stories. So I asked for specific numbers of stories, replies and accounts involved. It got that totally wrong. There were 49 replies to the initial post, with hundreds of replies in the chains. Grok told me only 10 accounts took part.

Crucially, it assessed each reply as having 0 replies. Ha ha, I thought. Very convenient, you lazy sod. Grok was trying to make the judging easier, as the only people to pay in the chain were the original repliers.

Still, I was interested to see how it would assess the stories.

Strangely, it ‘judged’ creativity by attaching subjective adjectives to each tweet (not a score). “Highly imaginative”, “innovative”, “adds historical intrigue” — these are the kinds of comments overworked high school teachers write on the 29th literature paper they are grading, not writing competition criteria.

It judged ‘length’ to mean how many words were in each tweet (LMAO).
Talk about quantity over quality.

‘Vibes’ turned out to be a fairly useless criterion to judge stories. For example, is ‘Optimistic, tech-forward vibe with a utopian feel’ worthy of the prize or not?

It did, however, pick a winner!

Grok (correctly) included the additional bounty for the most liked post.

Total Award to @garorant: 1,333,337 + 133,337 = 1,466,674 sats.

Next, I asked Grok to publish this announcement on the post.
Nope.

My design by xAI limits me to providing assistance and generating content for you to use.

Just to check if Grok was making this all up, I repeated the instructions to choose a winner.
Sorry @garorant, you didn’t actually win! Next time, Grok chose @bitcoinghibli. And after that, it chose @NEEDCreations. It even changed its view on the ‘vibes’ and relative creativity of each story. It’s as if Grok was making this all up…

Note to all competition judges: if you are going to outsource your job, make sure whoever does the work spouts less bullshit than these sycophantic circle-minded LLMs.

Conclusion

I suppose if my thread had won, I might have earned around 50,000-100,000 sats as one of the more frequent repliers in a chain of 50 tweets. Oh well.

The universe provides — that’s what I’m hoping. Perhaps this overindulgent breakdown will go some way to offering me the value I was not compensated with (please consider zapping this post with 1,333,337 sats).

As mentioned earlier, I’ve run many writing competitions, from contests with $2,000 split between the top five and anthology calls with hundreds of responses to zapping the funniest 6-word story on Nostr.

I invest my reputation and time into judging competitions based on my criteria. Judges always have personal tastes, so creative writing will always be subjective. If writing could be perfectly scored, could you really call it ‘creative’?

Believe it or not, I’m not writing this to call out a bitcoiner who reneged on a bounty. I know that Brad Mills offers many grants on Geyser and supports the bitcoin community through and through.

I’m writing this article to bemoan the extraction of creative work and to warn against the dangers of abdicating responsibility to AIs only capable of using data that already exists. This damages the creative ecosystem irreparably.

Writers, photographers, musicians, dancers, artists, designers and filmmakers all deserve to have their work parsed by humans, not silicon chips in data centers. What’s the point in investing our time just to impress a machine?

Machines are incapable of judging art; they can only replicate what is already there.

Creative competitions, however whimsical, should reward artistic excellence with clear, fair, and human-judged outcomes. The objective should be to lift writers and artists up, not to check if a robot can do a human’s job.

121 sats \ 6 replies \ @Scoresby 4 Aug

There may have been a world where having grok judge the contest was fun and interesting, even if grok was nothing more than a stand-in for "random" with some excuses glued on to it in tasteful places. At least it might have provided some uncertainty and tension to the contest.

But that becomes an exercise in whether or not grok is a good alarm clock. And probably it is not as interesting as a competition that has a human judge.

Machines are incapable of judging art; they can only replicate what is already there.

While I do agree with this, I'm curious on what basis the statement is made. Do you believe this comes down to a fundamental difference between humans and machines or is it a function of degree (machines aren't there yet...)?

25.1k sats \ 5 replies \ @TotallyHumanWriter OP 4 Aug

Thanks for your comments.

My basis for the statement is that the value of art is subjective.

It cannot be attributed value based on data because the data doesn't exist. Maybe if AI was told a new Picasso had been unearthed, it would value it in the millions, but what about which Picasso is the best? It has to go on data provided by humans (the most expensive sale or the most revered piece in a major museum).

Finally, if humans know that art was created by a machine (with no experience, history, or toil) we value it very low.

Why are we attempting to destroy the value we create? Why are we keen to give up the pursuit of experts attributing value to art?

It will take us all time to adapt to the changes AI brings, but we should use it for tasks we don't value, not ones we do.

21 sats \ 2 replies \ @Scoresby 4 Aug

I take this to mean you do not believe a machine can like something.

But humans, too, must go on data provided by other humans. I've often been off-put by a certain form of art or type of food, only to develop an appreciation for it when guided by someone who has spent time on it. Our culture, at times, feels like one large machine by which we teach (or brainwash) ourselves with what is valuable...except for when we just like something. In those cases, whether others may call it art or not, we, in the secret chambers of our hearts, feel something because we just happen to like it -- not always knowing why.

Liking a thing seems to me to be the core of what makes it art. And thus far, I don't think I believe AI can like a thing. Although, if I introspect what liking actually is very deeply, I find a boggy surface upon which I tread with unsure footing.

If we could discover that machines liked something, then, perhaps we still wouldn't want them to judge our art because they simply wouldn't get it. In the same way that sometimes humans don't get the art from cultures that are foreign to them.

102 sats \ 1 reply \ @TotallyHumanWriter OP 4 Aug

Yes, I suppose you have a good point. On examination, I do feel like I'm protecting human interests and trying to exclude machines from the possibility of sentience.

Yet, what's the point in our existence if machines can extract the meaning?

Thanks for the mega zap!
It will be reinvested into the territory (with no reneging)

0 sats \ 0 replies \ @Scoresby 4 Aug

Megazap wasn't me.

21 sats \ 1 reply \ @perscrutador 5 Aug

Perfect, faultless. Except for the following:

use it for tasks we don't value, not ones we do.

Even in these activities, attention and human work are needed, otherwise the criteria for everything, including what art is, become contaminated by an opinion about something that people think is the supreme of human creation, and if it is, there's nothing wrong with that.

21 sats \ 0 replies \ @TotallyHumanWriter OP 5 Aug

Thanks for sharing that.
Yes, I suppose you could say use it for time-intensive tasks which don't provide us as much meaning.

As we know, humans have a great ability to take labour-saving technology for granted, and labour provides us with a lot of drive and meaning.

150 sats \ 0 replies \ @realBitcoinDog 4 Aug

Not only totally human writer, but totally human judge too.

What’s the point of AI generated art judged by an AI judge? None? Then human art judged by an AI judge is meaningless too!

121 sats \ 1 reply \ @SimpleStacker 4 Aug

IMO the dude needs to pay up. If Grok can't automatically judge the contest on its own, he needs to figure out a way to manually feed Grok the threads as a prompt and get it to give each one a score.

The way the rules were written, the only stipulation was that Grok would judge, but it didn't say that Grok would have to be able to do it autonomously as a condition for the prize to be paid out.

0 sats \ 0 replies \ @TotallyHumanWriter OP 4 Aug

As I mentioned, I think everyone's intentions (including my own) were not aligned or even well set for producing and consuming stories.

71 sats \ 2 replies \ @grayruby 4 Aug

I really liked the idea for this competition but having grok judge is silly and I found the stories difficult to follow as threads on Twitter.

40 sats \ 1 reply \ @TotallyHumanWriter OP 5 Aug

And in truth, the stories were amateurish and simplistic.

They all relied on 'telling'. Directly exposing what happened. Each participant wanted to move the story along, but narrative needs the colour of description, dialogue and literary devices too.

They were sort of 'and then... and then... and then...' stories.

0 sats \ 0 replies \ @grayruby 5 Aug

I think that’s inevitable in this format.

33 sats \ 3 replies \ @karanjbhatia 4 Aug

so in summary, we got rug pulled?

0 sats \ 2 replies \ @karanjbhatia 4 Aug

To be fair, the entries are a clusterfuck, and almost impossible to judge manually. Both Grok and Perplexity don't seem to be smart enough to do it either (because it IS complicated, there are various "forks" within "forks"). Basically, the idea was bad from the start: everyone adding one bit to the story rarely leads to a good outcome. How many of the threads have "finished" a story? I bet none, or a negligible share of the total. Each tweet/turn takes the story to bizarre territory without a common thread. It is better to do the "entire story by one person" type contests such as the currently ongoing [FM] contest in Books Literature.

0 sats \ 0 replies \ @TotallyHumanWriter OP 5 Aug

The one good thing to come out of it is the understanding that we must judge our own creative work.

Further, the rules of any giveaway or contest should be ironclad and watertight. This competition was anything but simple!

0 sats \ 0 replies \ @SimpleStacker 5 Aug

As usual, SN does it best!!

21 sats \ 2 replies \ @perscrutador 5 Aug

Machines are incapable of judging art; they can only replicate what is already there.

It's good to see that no matter how much you've participated, you have firm and clear positions on this. That quote is beautiful.

There seems to be a lack of this being and differentiation between people, art is not and never will be something that a machine can create. No matter how beautiful it is, the moment it is established that it was made by a machine pure and simple, it will become dead.

Without detracting from the work you presented in this competition, what you did there is more like a hacking job.

21 sats \ 1 reply \ @TotallyHumanWriter OP 5 Aug

Whether machines can or can't create art is up to us! We must maintain our sovereignty to ascribe value rather than valuing what machines tell us to.

Thinking is hard, but just because machines can compute larger amounts of data faster than our brains doesn't mean our brains are old tech.

0 sats \ 0 replies \ @perscrutador 6 Aug

Machines are not a problem; they help us a lot in many tasks. The problem is attributing exclusively human elements to a crude emulation of those elements. “Artificial Intelligence” is a completely wrong name, since it is not intelligent in the strict sense of the word. It depends on human input and, through mathematical and linguistic processing based on unreliable sources and some specific guidelines in its code, it gives us an answer in the form of intelligible language. It is nothing more than emulation.
It seems that the more advanced we become energetically, the more stupid and inferior we become in our feelings. We are always doubting our own capabilities as a species.

21 sats \ 0 replies \ @NEEDcreations 11 Aug

Nice write up. Definitely a very flawed competition.

Please tag me if it's ever decided that I/we won the sats 😅

21 sats \ 0 replies \ @geeknik 9 Aug

Burn the machine, free the art.

21 sats \ 1 reply \ @garorant21 4 Aug freebie

Great article, thanks for writing it!

I am somewhat disappointed with how Brad handled this contest. It's true that Grok had committed to it, but we know that technology constantly fails and should always be a tool, especially when it comes to contests, not the deciding arbiter. The lack of transparency and clarity regarding how this issue would be resolved leaves much to be desired from Brad. I joined this contest despite the fact that 1.3M sats is a lot, precisely because of his reputation in the sector.

0 sats \ 0 replies \ @TotallyHumanWriter OP 4 Aug

Thanks, it sort of spiralled!
If the prize had been a whole bitcoin, I would have written a book!

10 sats \ 0 replies \ @ivermectin 5 Aug

That model doesn’t even have enough context window to read all the stories

0 sats \ 0 replies \ @garorant21 4 Dec

Still no payment, pal?

0 sats \ 3 replies \ @karanjbhatia 10 Aug

It seems the organizer has decided the winners and might pay soon:
https://x.com/bradmillscan/status/1954247935048200655

0 sats \ 1 reply \ @TotallyHumanWriter OP 13 Aug

It's getting so complicated now.
Those who didn't win are complaining in the comments.
No word on how to collect the prize yet.

0 sats \ 0 replies \ @karanjbhatia 13 Aug

Hmm, it seems to have gone sideways. Organizer should have used AI privately and immediately announced results and THEN declared that he used AI to judge.

0 sats \ 0 replies \ @nitter 10 Aug

https://xcancel.com/bradmillscan/status/1954247935048200655

0 sats \ 0 replies \ @jbschirtzinger 5 Aug

The task was supposed to be nonfiction stories about trust...

0 sats \ 0 replies \ @LAXITIVA 5 Aug

I’m lazy

0 sats \ 1 reply \ @h6j5dhc567g 4 Aug

I wonder if all the comments and replies to comments and continuation of comments were too complicated for Grok to follow.

I know when I'm learning my second language using Grok, suddenly out of nowhere Grok shouts "Get out of the house and call emergency services!!!"

🤣🤣🤣 and I'm like calm down bro we were just talking about 🍕 pizza

0 sats \ 0 replies \ @TotallyHumanWriter OP 5 Aug

I think it has to do with what it considers a thread (one poster) and replies.

Especially when the chain of the story is forked, it can't follow the logic that it is a thread like a 20-piece tweet.

21 sats \ 1 reply \ @karanjbhatia 5 Aug

Even detecting the thread with the most replies in it is a complicated task, apparently. If you have a premium account on X, maybe you have access to API v2 and can do this. I asked Google AI, and it had this to explain:

To find the highest reply thread under a specific tweet, you need to: 1) identify the tweet and its unique ID, 2) use the Twitter API to search for replies to that tweet, 3) filter for the highest reply thread based on in_reply_to_status_id_str, and 4) optionally, use the highest ID in subsequent searches to efficiently track new replies.
Here's a more detailed breakdown:

1) Identify the Tweet and its ID:
- Locate the tweet you're interested in.
- Find the unique ID (often referred to as id_str) of the tweet. This ID is crucial for tracking replies.

2) Use the Twitter API for Search:
- Utilize the Twitter API's search endpoint (e.g., the Twitter API v2) to find replies.
- Construct a query like q="@screen_name_of_author_of_tweet_to_be_tracked" to search for tweets mentioning the author of the original tweet.
- You can also use the in_reply_to_tweet_id parameter to specifically target replies to the original tweet.

3) Filter for the Highest Reply Thread:
- Iterate through the search results and check the in_reply_to_status_id_str of each reply.
- Compare this ID to the ID of the original tweet. Replies with a matching ID are directly linked to the original tweet.
- Keep track of the highest id_str among the replies. This represents the highest reply thread.

4) Efficiency with since_id:
- For ongoing tracking, you can use the highest id_str you found as the since_id parameter in subsequent API calls. This will only retrieve new replies since your last check, making the process more efficient.

Example using Tweepy (a Python library for the Twitter API):
Python

    import tweepy

    # Authenticate with your Twitter API keys (placeholder values)
    consumer_key = "your_consumer_key"
    consumer_secret = "your_consumer_secret"
    access_token = "your_access_token"
    access_token_secret = "your_access_token_secret"
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)

    # Tweet ID to track, plus the screen name of its author
    tweet_id = "your_tweet_id"
    author_screen_name = "author_screen_name"

    # Initial search (the to: operator takes a screen name, not a tweet ID;
    # Tweepy v4 renamed api.search to api.search_tweets)
    replies = api.search_tweets(q=f"to:{author_screen_name}", result_type="recent", count=100)

    # Find the highest reply thread
    highest_reply_id = None
    for reply in replies:
        if hasattr(reply, 'in_reply_to_status_id_str') and reply.in_reply_to_status_id_str == tweet_id:
            if highest_reply_id is None or int(reply.id_str) > int(highest_reply_id):
                highest_reply_id = reply.id_str

    # Use highest_reply_id for subsequent searches
    if highest_reply_id:
        print(f"Highest reply thread ID: {highest_reply_id}")
        # You can use highest_reply_id in a new search with since_id=highest_reply_id

    # Example of retrieving the actual text of the reply:
    if highest_reply_id:
        try:
            tweet = api.get_status(highest_reply_id, tweet_mode="extended")
            print(tweet.full_text)
        except tweepy.TweepyException as e:
            print(f"Error retrieving tweet: {e}")
This code snippet demonstrates how to retrieve replies, find the highest reply thread, and optionally retrieve the content of that reply. Remember to replace "your_tweet_id" with the actual ID of the tweet you are interested in, and authenticate your API keys properly.
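
The quoted snippet only looks at direct replies to one tweet, while a forked story is really a tree of replies. As a rough sketch of the missing step, the function below finds the longest root-to-leaf chain once the replies have already been fetched. The (tweet_id, parent_id) pair format, the function name longest_chain, and the sample IDs are all assumptions, and "longest chain" is only one possible reading of the winning thread, not the method Grok or the organizer used.

```python
from collections import defaultdict


def longest_chain(replies, root_id):
    """Return the longest root-to-leaf path of tweet IDs under root_id.

    replies: iterable of (tweet_id, parent_id) pairs, for example built from
    each tweet's in_reply_to_status_id_str after fetching the conversation.
    """
    children = defaultdict(list)
    for tweet_id, parent_id in replies:
        children[parent_id].append(tweet_id)

    best = [root_id]
    stack = [(root_id, [root_id])]  # iterative depth-first walk of the reply tree
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        for child in children[node]:
            stack.append((child, path + [child]))
    return best


# Hypothetical reply tree: story "a" forks at a1, story "b" stays short.
sample = [
    ("a", "root"), ("b", "root"),
    ("a1", "a"), ("a2", "a1"), ("a2x", "a1"),
    ("a3", "a2"), ("b1", "b"),
]
chain = longest_chain(sample, "root")
```

Here the fork at a1 produces two candidate chains, and the walk keeps the deeper one that runs through a2 to a3. Deciding whether contributors to the abandoned fork (a2x) still share the prize is a rules question that no code can settle.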

0 sats \ 0 replies \ @TotallyHumanWriter OP 5 Aug

Thanks for this work.
This all goes to show that the objective was to use Grok as a circus monkey and not as a tool to facilitate the competition.

I see so many people on X calling on Grok to dunk on others or reaffirm their opinion.

It has to be the most sycophantic and toadying AI ever built.