I wrote about my experiment with jan-nano last week and its (alleged) optimization for deep research through MCP. I thought it was cool that people were just training specialized models like that.

But then @Scoresby posed a whole new challenge when it turned out that reasoning models, such as jan-nano (which is derived from qwen3), aren't really good at following explicit instructions, especially when those contain long encoded strings, such as a BOLT11 invoice. Even qwen3-235b,
which is a very large model that I can only run on leased AWS compute because I don't have the hardware to fit it, hallucinated the invoice, as did smaller models.

Then, @carter posted a paper from NVIDIA this morning that basically said something I've felt for a while now: smaller, more finely tuned models are much more useful for specific, repetitive use cases — the ones we REALLY want to automate — than the know-it-all chatbot LLMs with hundreds of billions of parameters that are the current hype and the primary driver of
bigger=better.

So, while I was recovering from my jet lag, I downloaded some more obscure small models, and it turned out that Salesforce's xLAM2-8b (LAM stands for Large Action Model, a term I suspect someone at Salesforce invented on the spot) is the most successful open model at dealing with encoded stuff, even though it's small and old (yes, 3 months is old on HF). It successfully presented me a BOLT11 invoice to fund it, and it successfully bought me CCs here.
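One cheap guard against this failure mode, if you're orchestrating the model yourself, is to never trust the invoice a model echoes back and instead compare it against the string you actually supplied. Here's a minimal sketch of that check — the invoice string and helper name are made up for illustration, not real BOLT11 data or anyone's actual API:

```python
# Hypothetical sketch: catch a model that has mangled a BOLT11 invoice
# it was asked to relay verbatim. The invoice below is fake example data.

def invoice_matches(original: str, relayed: str) -> bool:
    """Return True only if the relayed invoice matches the original.

    BOLT11 strings are bech32-encoded, so they're case-insensitive as a
    whole (mixed case is invalid), which makes lowercased comparison a
    safe check short of full bech32 checksum verification.
    """
    orig = original.strip().lower()
    if not orig.startswith("ln"):
        raise ValueError("original does not look like a BOLT11 invoice")
    return orig == relayed.strip().lower()

# A single flipped character -- typical of LLM "hallucination" of long
# encoded strings -- is enough to fail the check.
original = "lnbc2500u1pvjluezpp5qqqsyqcyq5rqwzqfqqqsyqcyq5rqwzqfqypq"
mangled = original[:-1] + ("p" if original[-1] != "p" else "q")

print(invoice_matches(original, original))  # True
print(invoice_matches(original, mangled))   # False
```

In practice you'd pass the invoice to the model as opaque tool-call arguments rather than inline text, so it never has to reproduce the string token by token at all.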
I just wanted to share in a separate post how pleased I am with this, because wow, it's nice to have stuff that actually works. Not the bigger=better stuff, but the thoughtfully engineered stuff.

Take that, Larry & Sam.