This is one of the metrics to track!
That seems like a stat worth keeping an eye on — it really shows how AI’s being used. Each token is like a certain number of words, right?
I looked into that a bit, but I must be missing something.
Google says it now processes more than 1.3 quadrillion tokens every month with its AI models. But this headline number mostly reflects computing effort, not real usage or practical value, and it raises questions about Google's own environmental claims.
reply
Each token is like a certain number of words, right?
Roughly a word, often a bit less (tokenizers split text into sub-word pieces), but like the article you linked suggests: "thinking mode" produces a ton more tokens, because it does all the reasoning in the output! So you pay for these - let's call them "magical" - thoughts.
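For a rough feel of the tokens-per-word ratio, here's a quick sketch using OpenAI's tiktoken tokenizer (my assumption for illustration; Gemini uses its own tokenizer, so treat the number as a ballpark):

```python
# Rough tokens-per-word check with OpenAI's tiktoken tokenizer.
# Assumption: Google's Gemini tokenizer is different, so this is
# only a ballpark, not an exact match for their reported figure.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Google says it now processes more than 1.3 quadrillion tokens every month."
tokens = enc.encode(text)
words = text.split()

print(f"{len(words)} words -> {len(tokens)} tokens "
      f"(~{len(tokens) / len(words):.2f} tokens per word)")
# Typical English prose lands around 1.3 tokens per word.
```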
The token number is mostly a measure of backend computing load and infrastructure scaling, not a direct indicator of user activity or actual benefit.
But the reason why this is interesting is that datacenter usage at inference time isn't growing as quickly - at least for Google - as before, so I ask the universe: what are all these planned datacenters for?
reply
Also, my somewhat naive understanding goes like this: if a model produces lots of tokens, but does to require lots of compute to produce them, either:
  1. the model has become highly tuned to your prompts, or
  2. the model is giving you kind of garbage answers that look more like recitation than reasoning.
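To see why a raw token count doesn't pin down compute, here's a back-of-envelope sketch using the common estimate of roughly 2 × parameter-count FLOPs per generated token for a dense transformer (the parameter counts below are made up for illustration, not anything Google has disclosed):

```python
# Back-of-envelope: the same monthly token count implies wildly different
# compute depending on model size. Uses the common ~2 * N_params
# FLOPs-per-generated-token estimate for dense transformer inference.
# The parameter counts are illustrative only.
MONTHLY_TOKENS = 1.3e15  # Google's reported monthly figure

for n_params in (1e9, 70e9, 1e12):  # hypothetical model sizes
    flops = MONTHLY_TOKENS * 2 * n_params
    print(f"{n_params:9.0e} params -> ~{flops:.1e} FLOPs/month")
```

So "lots of tokens, little compute" only adds up if the tokens are coming from smaller (or heavily optimized) models.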
reply
if a model produces lots of tokens, but does to require lots of compute to produce them
does or does not?
reply
Fat fingers on my part. Should read:
if a model produces lots of tokens, but does not require lots of compute to produce them
reply
Thanks, I think I get it now.
I'm not entirely sure how this pertains to reasoning output, though! We do know that the bot performs better when there is more relevant context (just like a human), so if you pass a sparse prompt, it will extend it on the output side (to the bot there isn't really a difference!) with a whole lot of "reasoning", and then, by self-extending the context through "autocomplete", settle into a pattern where the answer resolves better.
Tuning a bunch of common reasoning patterns to be as cheap as possible is good though? The most asked question to an LLM is probably "@grok is this true?" lol. Might as well optimize for going through the motions of that.
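A toy sketch of what "optimizing for going through the motions" could look like: if millions of prompts share a prefix, the expensive forward pass over that prefix can be computed once and reused (prefix/KV caching). This is just an illustration with a memoized stand-in, not anything from an actual serving stack:

```python
# Toy illustration of prefix caching: the expensive pass over a shared
# prompt prefix runs once, then gets reused for every question carrying
# that prefix. A real serving stack caches attention key/value states;
# this stand-in just memoizes a fake "encode" step.
from functools import lru_cache

@lru_cache(maxsize=1024)
def encode_prefix(prefix: str) -> tuple:
    print(f"expensive pass over prefix: {prefix!r}")
    return tuple(ord(c) for c in prefix)  # placeholder for cached states

for suffix in ("is the sky blue?", "is water wet?"):
    states = encode_prefix("@grok is this true? ")  # cached after first call
    print(f"{suffix!r} -> reused {len(states)} cached prefix states")
```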
reply
Like so:
reply