pull down to refresh

240 sats \ 0 replies \ @zuspotirko 1h
Solve, don't ask me any questions
I told you
This "adverserial" style of evaluating an LLM is interesting, but not the best evaluation. The best evaluation for how good an LLM is is if you give your best prompt instead of antagonizing the chatbot. That's the best way how we find out what its maximum capabilities are.
reply
90 sats \ 6 replies \ @optimism 3h
The image on the second one doesn't show for me. A bug perhaps?
reply
Yep, they no longer store it in the app history. This was the screen:
reply
Right. So I'd guess it's either a bug, or a filter to specifically stop people from using gpt to solve Duolingo.
reply
Unlikely it's a filter. There is no benefit whatsoever to cheat on Duolingo. You pay to learn.
It's a new model that prefers to engage in conversations rather than do what it is told. They also reduced the number of picture uploads per day from 3 to 1 on a free account. Push people to pay for this crap.
reply
0 sats \ 1 reply \ @optimism 1h
gpt5-main (the non-thinking model) has (still unsolved, I guess they don't wanna) instruction following regressions.
Just out of interest I ran your image with the same instruction through a small gemma3 distill:
ggml-org/gemma-3-4b-it-GGUF:Q4_K_M using llama.cpp server:
I don't know if the answer is in any way correct, but this is all runnable with minimum memory (this particular one should run with 4GB memory), locally.
reply
It simply did the ORC of the japanese symbols
reply
It could be cheating on tests in general
reply
I noticed this pattern recently a lot
Me: Solve problem X ChatGPT: I did Y, would you me to also do Z (which Z is obviously the reasonable thing to do in the first place as part of solution to X) Me: Don't ask me. Just do complete the task and Z is part of solution.
reply
It seems they just churn the paying customers to spend API calls
reply
So true
reply