I wrote about my experiment with jan-nano last week and its (alleged) optimization for deep research through MCP. I thought it was cool that people were just training specialized models like that.
But then @Scoresby posed a whole new challenge when it turned out that reasoning models, such as jan-nano (which is derived from qwen3), aren't really good at following explicit instructions, especially not when those instructions contain long encoded strings, such as a BOLT11 invoice. Even qwen3-235b, a model so large I can only run it on leased AWS compute because I don't own hardware it fits in, hallucinated the invoice, as did the smaller models.
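One nice property of this particular failure mode: BOLT11 reuses bech32's checksum (from BIP-173, minus the 90-character length limit), so a model-mangled invoice will almost certainly fail a checksum verification. Here's a minimal sketch of such a check, something you can run on any string a model hands back before trusting it; the function names are mine, not from any library:

```python
# Bech32 checksum verification (BIP-173). BOLT11 invoices use the same
# checksum, so a hallucinated invoice string should fail this check.

BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def _polymod(values):
    # BCH checksum polynomial from BIP-173
    gen = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
    chk = 1
    for v in values:
        b = chk >> 25
        chk = (chk & 0x1FFFFFF) << 5 ^ v
        for i in range(5):
            chk ^= gen[i] if ((b >> i) & 1) else 0
    return chk

def _hrp_expand(hrp):
    # Human-readable part feeds into the checksum as high and low bits
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]

def checksum_ok(bech: str) -> bool:
    """True if the bech32 checksum of `bech` verifies.

    Works on BOLT11 invoices too, since they use the same checksum
    (BOLT11 just drops the bech32 length limit). Sketch only: it
    lowercases the input instead of rejecting mixed case.
    """
    bech = bech.lower()
    pos = bech.rfind("1")          # separator between HRP and data
    if pos < 1 or pos + 7 > len(bech):
        return False
    hrp, data = bech[:pos], bech[pos + 1:]
    try:
        values = [BECH32_CHARSET.index(c) for c in data]
    except ValueError:
        return False               # character outside the bech32 alphabet
    return _polymod(_hrp_expand(hrp) + values) == 1
```

The checksum detects any single-character error by design, so one hallucinated character anywhere in the invoice is enough to make `checksum_ok` return False.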
Then @carter posted a paper from NVIDIA this morning, which basically says something I've felt for a long time: smaller, more finely tuned models are much more useful for specific, repetitive use cases - i.e., the ones we REALLY want to automate - than the know-it-all chatbot LLMs with hundreds of billions of parameters that are the current hype and the primary driver of bigger=better.
So, while recovering from my jetlag, I downloaded some more obscure small models, and it turned out that Salesforce's xLAM2-8b (LAM stands for Large Action Model, a term I suspect someone at Sf invented on the spot) is the most successful open model at dealing with encoded strings, even though it's small and old (yes, 3 months is old on HF). It successfully presented me a BOLT11 invoice to fund it, and it successfully bought me CCs. here
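Since the failure mode with the bigger models was silently rewriting the invoice between tool calls, the cheap defense I'd pair with any of these models is an exact-match guard: only let the agent "pay" a string that is byte-for-byte one a trusted wallet tool actually returned. A minimal sketch, with all names hypothetical:

```python
# Hypothetical guard for an agent's tool loop: the model may only pass
# along invoices that a trusted tool returned verbatim, so a hallucinated
# or "helpfully corrected" invoice is rejected before any payment attempt.

class InvoiceGuard:
    def __init__(self):
        self._issued: set[str] = set()

    def record(self, invoice: str) -> str:
        # Call this on every BOLT11 string a trusted wallet tool returns.
        self._issued.add(invoice)
        return invoice

    def check(self, invoice: str) -> bool:
        # Call this on the model's pay-tool argument: exact match only,
        # so a single mangled character fails.
        return invoice in self._issued

guard = InvoiceGuard()
real = guard.record("lnbc20m1...")      # placeholder, not a real invoice
guard.check(real)                       # True: verbatim copy passes
guard.check("lnbc20m1...x")             # False: model-mangled copy fails
```

This doesn't need the model to be good at copying long strings; it just makes the failure loud instead of expensive.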
I just wanted to share how pleased I am with this in a separate post, because wow, it's nice to have stuff that actually works. Which is not the bigger=better stuff, but the thoughtfully engineered stuff.
Take that, Larry & Sam.
proof-of-it-works-without-rentseekers
I ran this on a 5-year-old M1 Mac...