FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
epochai.org/frontiermath/the-benchmark
203 sats \ 0 comments \ @Rsync25 10 Nov 2024 tech
related
Testing AI systems on hard math problems shows they still perform very poorly
phys.org/news/2024-11-ai-hard-math-problems-poorly.html
153 sats \ 4 comments \ @south_korea_ln 13 Nov 2024 science
AI is actually bad at math, ORCA shows
www.theregister.com/2025/11/17/ai_bad_math_orca/
167 sats \ 4 comments \ @0xbitcoiner 18 Nov AI
DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning
arxiv.org/abs/2508.05405
182 sats \ 0 comments \ @optimism 10 Aug AI
AI benchmarks hampered by bad science
www.theregister.com/2025/11/07/measuring_ai_models_hampered_by/
178 sats \ 0 comments \ @0xbitcoiner 10 Nov AI
AI mimics neocortex computations with 'winner-take-all' approach
techxplore.com/news/2024-10-ai-mimics-neocortex-winner-approach.html
49 sats \ 0 comments \ @ch0k1 26 Oct 2024 tech
Scaling up: how increasing inputs has made artificial intelligence more capable
ourworldindata.org/scaling-up-ai
279 sats \ 0 comments \ @0xbitcoiner 20 Jan charts_and_numbers
The AI Industry's Scaling Obsession Is Headed for a Cliff
arxiv.org/abs/2507.07931
321 sats \ 9 comments \ @0xbitcoiner 15 Oct AI
OpenAI o3 beats FrontierMath - because OpenAI funded the test and cheated
pivot-to-ai.com/2025/01/20/openai-o3-beats-frontiermath-because-openai-funded-the-test-and-had-access-to-questions/
366 sats \ 1 comment \ @StillStackinAfterAllTheseYears 21 Jan tech
Debate May Help AI Models Converge on Truth
www.quantamagazine.org/debate-may-help-ai-models-converge-on-truth-20241108/
237 sats \ 0 comments \ @0xbitcoiner 8 Nov 2024 science
Scaling up: how increasing inputs has made artificial intelligence more capable
215 sats \ 0 comments \ @zuspotirko 29 Jan tech
Olympiad-level formal mathematical reasoning with reinforcement learning
www.nature.com/articles/s41586-025-09833-y
175 sats \ 2 comments \ @0xbitcoiner 20 Nov AI
AI achieves silver-medal standard solving International Mathematical Olympiad
deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/
284 sats \ 3 comments \ @sancristrader 25 Jul 2024 tech
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open LLMs
arxiv.org/abs/2402.03300
662 sats \ 0 comments \ @zuspotirko 6 Feb 2024 science
Inside the Secret Meeting Where Mathematicians Struggled to Outsmart AI
www.scientificamerican.com/article/inside-the-secret-meeting-where-mathematicians-struggled-to-outsmart-ai/
421 sats \ 5 comments \ @k00b 9 Jun science
New MIT Research Proves AGI Was Achieved
www.geeky-gadgets.com/artificial-general-intelligence-advancements/
65 sats \ 1 comment \ @ch0k1 16 Nov 2024 news
Hermes 4 outperforms OpenAI models with minimal content restrictions
121 sats \ 0 comments \ @lunin 1 Sep AI
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
arxiv.org/abs/2510.04721
180 sats \ 1 comment \ @jakoyoh629 25 Oct AI
Quantitative AI progress needs accurate and transparent evaluation
mathstodon.xyz/@tao/114910028356641733
100 sats \ 0 comments \ @hn 25 Jul tech
Google's DeepMind AI decodes age-old math equation, stumping humans
interestingengineering.com/innovation/googles-deepmind-decodes-old-math-equation
100 sats \ 0 comments \ @zuspotirko 16 Dec 2023 AI
Chinese researchers just built an open-source rival to ChatGPT in 2 months
www.livescience.com/technology/artificial-intelligence/china-releases-a-cheap-open-rival-to-chatgpt-thrilling-some-scientists-and-panicking-silicon-valley
21 sats \ 0 comments \ @ch0k1 25 Jan news
Researchers isolate memorization from problem-solving in AI neural networks
arstechnica.com/ai/2025/11/study-finds-ai-models-store-memories-and-logic-in-different-neural-regions/
400 sats \ 1 comment \ @0xbitcoiner 11 Nov AI