bwen-14b - benthecarman model trained on benthecarman \ stacker news

A voice + opinion clone of @benthecarman, finetuned from Qwen3-14B on his own tweets — with no synthetic / AI-written training text. Every completion is a real tweet; every prompt was hand-written by the author. The result is a model that answers in his blunt, opinionated, no-hedging register instead of a generic assistant tone.

@benthecarman trained a model to act like his twitter self. You can download it from the above link, but here I am quoting his post on how he did it at length because you could do it for yourself if you liked (from an X post describing the process):

It's a finetune of qwen3:14b (now bwen:14b). Getting there took a lot of trial and error, because you can't just feed it all your tweets and expect it to work. I tried that first and most of what came out was garbage.

My first real attempt was a project that would read all my tweets and ask an LLM to generate a prompt for each one to train on. Slight improvement, but it still didn't work. The generated prompts were either too specific or didn't carry enough context to prompt accurately, and it did this for every tweet, which produced a ton of slop.

@anthonyronning gave me the idea that I needed to trim the dataset down and only feed it tweets that reflect what I actually want the model to reflect. But I wanted this to stay a generic process, not just me cherry picking my greatest hits and shoving them in.

So instead I embedded all my tweets into a vector database and clustered them into subjects and themes. I took the biggest clusters, fed the highest-ranked tweets from each one to an LLM, and had it name each grouping. That gave me a giant list of everything I've tweeted about. However, this still wasn't usable.

I ended up with over 500 subjects, and there was a ton of overlap. I'd get separate groupings for things like Bitcoin fees and Bitcoin transactions, which over-saturated those topics and would've forced me to label way more tweets than I wanted to.

So I did a second round. I ran embeddings on the list of subjects (with a few sample tweets each) and collapsed them into a much smaller aggregate list: 554 → 51. That finally gave me a clean set of themes to pull training tweets from.

The next step was the longest. To actually train the model on these themes I needed prompts and I needed to generate them. The system pulls the highest-ranked tweets per theme and hands them to me one at a time to write a prompt. After about 300 of these, I had a big enough dataset to actually train on. It was important that I trimmed down the themes list beforehand as this gave me on average 6 tweets per theme to create prompts for. If I kept the original 554 this would have made me generate prompts for over 3,000 tweets to get the same distribution over all the themes.

From there it builds a LoRA from my prompt→tweet pairs. The same training run also mixes in a few thousand of my raw tweets with no prompt at all, so it learns to predict text the way I say things on twitter, so it actually reflects my voice, not just my answers.

This training process however can take a while on larger models. So at first iterated on using qwen3:1.7b until I had a good selection of themes, tweets, prompts, etc and felt the little model was close enough. Once I had that, I reran the training on qwen3:14b for the final output.

After all this I have some evals. Some of these I created myself just asking its opinion on bitcoin and things it should have clear answers on, but I also have it feed back in the prompts I created earlier to make sure the responses are similar. Everything model I trained always passed the evals so maybe they aren't the best but it was a good spot check that the process was working.

After all that, I've got a working model that talks and kinda thinks like me. And it gets even better when you pair it with that original tweet database, it uses your prompt to retrieve your most relevant real tweets, feeds them into the model's context, and answers from them. This makes it more accurate with your opinions while still keeping your voice.

Want to try it yourself? Run my model or build your own. The whole process is open source and should work for anyone.

repo

dataset

117 sats \ 3 replies \ @benthecarman fwd 21 Jun

thanks for sharing!

117 sats \ 1 reply \ @k00b 21 Jun

Do you have any planned application of it? Like putting a benthebotman/botthecarman on your website for folks to chat with?

129 sats \ 0 replies \ @benthecarman fwd 21 Jun

Not really, I had the idea and took me awhile to figure out how to do it properly. Now that I have a working thing I have no idea what to do with it lol

15 sats \ 0 replies \ @Scoresby OP 21 Jun

I hope I didn't steel your thunder! it's a fun project!

15 sats \ 0 replies \ @nitter 21 Jun

https://twiiit.com/benthecarman/status/2068796180339851429