I mean, it just seems like it lets the model generate some more tokens at inference time, so if the first generation failed, the extra tokens might be enough for it to get the correct result.