Testing AI systems on hard math problems shows they still perform very poorly
phys.org/news/2024-11-ai-hard-math-problems-poorly.html
153 sats \ 4 comments \ @south_korea_ln 13 Nov 2024 science
related
Comparing athletes across generations
3870 sats \ 22 comments \ @Voldemort 8 Apr 2024 mostly_harmless
Highly available LND clusters (etcd, PostgreSQL, Ceph) with benchmarks
github.com/Filiprogrammer/lnd-ha-guide
2685 sats \ 0 comments \ @Filiprogrammer 12 Oct 2024 lightning
MCP-Bench: Benchmarking Tool-Using LLM Agents
arxiv.org/abs/2508.20453
239 sats \ 0 comments \ @optimism 30 Aug AI
"Benchwashing" - how do you defend against this?
1648 sats \ 10 comments \ @optimism 9 Aug AskSN
Why smart people believe stupid things
4715 sats \ 25 comments \ @frostdragon 21 Mar 2024 mostly_harmless freebie
SN release: independent territory and comment ranking, fix reply costs
4594 sats \ 41 comments \ @k00b 15 Mar meta
I'm Jason Maier, author of A Progressive's Case for Bitcoin, AMA!
12.6k sats \ 50 comments \ @cjasonmaier 4 Jun AMA
FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
epochai.org/frontiermath/the-benchmark
203 sats \ 0 comments \ @Rsync25 10 Nov 2024 tech
AI Agent Benchmarks are Broken
ddkang.substack.com/p/ai-agent-benchmarks-are-broken
110 sats \ 0 comments \ @carter 11 Jul AI
A geometry masterpiece: Yale prof solves part of math’s ‘Rosetta Stone’
news.yale.edu/2024/11/01/geometry-masterpiece-yale-prof-solves-part-maths-rosetta-stone
430 sats \ 10 comments \ @south_korea_ln 3 Nov 2024 science
Gaming Stacker News is Self-Defeating in the Long-Run
3206 sats \ 51 comments \ @Undisciplined 21 May 2024 meta
SPHINCS+ - explained by nifty
insider.btcpp.dev/p/sphincs
1535 sats \ 3 comments \ @Scoresby 4 Aug crypto
PlebDevs: Landing Your First Dev Job
www.youtube.com/watch?v=WaPLVUS81EM
1442 sats \ 2 comments \ @bitcoinplebdev 10 May 2024 devs
OpenAI Secretly Funded Benchmarking Dataset Linked To o3 Model
www.searchenginejournal.com/openai-secretly-funded-frontiermath-benchmarking-dataset/537760/
341 sats \ 0 comments \ @frostdragon 21 Jan tech
Nutri-score system in Europe for "grading" foods - complete fiat food fiasco
2062 sats \ 26 comments \ @Signal312 2 Jul 2024 conspiracy
Voting on chat.lmsys.org might be super influential for humankinds future
511 sats \ 5 comments \ @zuspotirko 22 Jun 2024 tech
What kind of Stacker are you?
970 sats \ 63 comments \ @Undisciplined 12 Sep 2024 meta
Can you outspend the federal government?
1817 sats \ 17 comments \ @jasonb 21 Feb 2024 econ
Bank of Canada cuts benchmark rate to 3.75%
641 sats \ 13 comments \ @grayruby 23 Oct 2024 econ
My name is Gregor, I was born in 2122. AMA
3174 sats \ 116 comments \ @Gregor 16 Dec 2022 bitcoin freebie
My bad experiences using AI as a physicist
11.7k sats \ 32 comments \ @south_korea_ln 16 Jun science