items/1285353/related \ stacker news

AI is actually bad at math, ORCA shows www.theregister.com/2025/11/17/ai_bad_math_orca/

197 sats \ 4 comments \ @0xbitcoiner 18 Nov 2025 AI

related

The ORCA Benchmark Evaluates How Well AIs Deal with Everyday Math www.omnicalculator.com/reports/omni-research-on-calculation-in-ai-benchmark

260 sats \ 0 comments \ @0xbitcoiner 27 Feb AI

The AI Revolution in Math Has Arrived www.quantamagazine.org/the-ai-revolution-in-math-has-arrived-20260413/

355 sats \ 1 comment \ @0xbitcoiner 13 Apr math AI

In a First, AI Models Analyze Language As Well As a Human Expert www.quantamagazine.org/in-a-first-ai-models-analyze-language-as-well-as-a-human-expert-20251031/

274 sats \ 0 comments \ @0xbitcoiner 31 Oct 2025 AI

Debate May Help AI Models Converge on Truth www.quantamagazine.org/debate-may-help-ai-models-converge-on-truth-20241108/

258 sats \ 0 comments \ @0xbitcoiner 8 Nov 2024 science

To Make Language Models Work Better, Researchers Sidestep Language www.quantamagazine.org/to-make-language-models-work-better-researchers-sidestep-language-20250414/

210 sats \ 0 comments \ @0xbitcoiner 15 Apr 2025 AI

Is language the same as intelligence? The AI industry desperately needs it to be www.theverge.com/ai-artificial-intelligence/827820/large-language-models-ai-intelligence-neuroscience-problems

645 sats \ 9 comments \ @0xbitcoiner 25 Nov 2025 AI

Wairdle - a followup about AI's struggles

434 sats \ 11 comments \ @crrdlx 12 Aug 2025 AI

GDPval: Measuring the performance of our models on real-world tasks - OpenAI openai.com/index/gdpval/

388 sats \ 8 comments \ @Scoresby 2 Oct 2025 AI

AI benchmarks hampered by bad science www.theregister.com/2025/11/07/measuring_ai_models_hampered_by/

208 sats \ 0 comments \ @0xbitcoiner 10 Nov 2025 AI

Why OpenAI’s solution to AI hallucinations would kill ChatGPT tomorrow theconversation.com/why-openais-solution-to-ai-hallucinations-would-kill-chatgpt-tomorrow-265107

618 sats \ 25 comments \ @south_korea_ln 17 Sep 2025 AI

AI models will deceive you to save their own kind www.theregister.com/2026/04/02/ai_models_will_deceive_you/

324 sats \ 6 comments \ @0xbitcoiner 3 Apr AI

Sam Altman Warns AI Development Needs An Energy 'Breakthrough' finance.yahoo.com/news/sam-altman-warns-ai-development-170018746.html

248 sats \ 1 comment \ @ch0k1 6 Apr 2024 tech

AI Still Can't Think: Apple’s New Study Dispels the Myth

365 sats \ 2 comments \ @lunin 31 Jul 2025 AI

Why it’s a mistake to ask chatbots about their mistakes arstechnica.com/ai/2025/08/why-its-a-mistake-to-ask-chatbots-about-their-mistakes/

251 sats \ 0 comments \ @kepford 13 Aug 2025 AI

You Should Be CentaurMaxxing. - Zack Shapiro x.com/zackbshapiro/status/2024126623000220023

2348 sats \ 5 comments \ @Scoresby 18 Feb AI

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs arxiv.org/abs/2510.04721

210 sats \ 1 comment \ @jakoyoh629 25 Oct 2025 AI

AI models are terrible at betting on soccer—especially xAI Grok arstechnica.com/ai/2026/04/ai-models-are-terrible-at-betting-on-soccer-especially-xai-grok/

295 sats \ 0 comments \ @0xbitcoiner 12 Apr AI Stacker_Sports

AI model collapse is not what we paid for www.theregister.com/2025/05/27/opinion_column_ai_model_collapse/

451 sats \ 0 comments \ @k00b 28 May 2025 tech

Breaking Opus 4.7 with ChatGPT (Hacking Claude's Memory) embracethered.com/blog/posts/2026/breaking-opus-4.7-with-chatgpt/

288 sats \ 1 comment \ @0xbitcoiner 19 Apr AI

Researchers isolate memorization from problem-solving in AI neural networks arstechnica.com/ai/2025/11/study-finds-ai-models-store-memories-and-logic-in-different-neural-regions/

430 sats \ 1 comment \ @0xbitcoiner 11 Nov 2025 AI

Large Language Models Pass the Turing Test arxiv.org/pdf/2503.23674

374 sats \ 11 comments \ @south_korea_ln 15 Apr 2025 AI