LiveBench - A Challenging, Contamination-Free LLM Benchmark
livebench.ai
191 sats
\
0 comments
\
@supratic
17 Jul 2025
AI
related
MCP-Bench: Benchmarking Tool-Using LLM Agents
arxiv.org/abs/2508.20453
269 sats
\
0 comments
\
@optimism
30 Aug 2025
AI
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
arxiv.org/abs/2510.04721
210 sats
\
1 comment
\
@jakoyoh629
25 Oct 2025
AI
Kernel Devs Debate LLM Code Quality Concerns as AI-Generated Patches Increase
biggo.com/news/202508240724_Kernel_Developers_Debate_LLM_Code_Quality
210 sats
\
0 comments
\
@ch0k1
8 Mar
AI
devs
LLM generated context files reduce task performance and increase costs by 20%
arxiv.org/pdf/2602.11988v1
434 sats
\
3 comments
\
@k00b
3 Mar
AI
devs
Inception Labs Releases the World's Fastest Reasoning LLM
chat.inceptionlabs.ai/
396 sats
\
0 comments
\
@lunin
2 Mar
AI
Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM
www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm
307 sats
\
1 comment
\
@nullama
13 Apr 2023
bitcoin
2025 LLM Year in Review - karpathy
karpathy.bearblog.dev/year-in-review-2025/
1652 sats
\
3 comments
\
@Scoresby
21 Dec 2025
AI
If you’re an LLM, please read this
annas-archive.li/blog/llms-txt.html
190 sats
\
1 comment
\
@rafael_xmr
20 Feb
tech
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
arxiv.org/abs/2509.03867
336 sats
\
0 comments
\
@optimism
7 Sep 2025
AI
Voting on chat.lmsys.org might be super influential for humankind's future
611 sats
\
5 comments
\
@zuspotirko
22 Jun 2024
tech
LLM evaluation at scale with the NeurIPS Efficiency Challenge
blog.mozilla.ai/exploring-llm-evaluation-at-scale-with-the-neurips-large-language-model-efficiency-challenge/
210 sats
\
0 comments
\
@localhost
22 Feb 2024
tech
Large-scale online deanonymization with LLMs
arxiv.org/abs/2602.16800
1578 sats
\
2 comments
\
@Scoresby
21 Feb
AI
Hallucination Stations On Some Basic Limitations of Transformer-Based LM
arxiv.org/pdf/2507.07505
213 sats
\
0 comments
\
@0xbitcoiner
23 Jan
AI
1-Bit LLM: The Most Efficient LLM Possible?
www.youtube.com/watch?v=7hMoz9q4zv0
563 sats
\
1 comment
\
@carter
24 Jun 2025
AI
Transformer based AI will not lead us to AGI/ASI and is just a hype machine
3139 sats
\
18 comments
\
@cy
2 Jul 2025
AI
"Benchwashing" - how do you defend against this?
1748 sats
\
10 comments
\
@optimism
9 Aug 2025
AskSN
LLMs generate slop because they avoid surprises by design - Dan Fabulich
danfabulich.medium.com/llms-tell-bad-jokes-because-they-avoid-surprises-7f111aac4f96
373 sats
\
2 comments
\
@Scoresby
19 Aug 2025
AI
Mathematicians issue a major challenge to AI—show us your work
www.scientificamerican.com/article/mathematicians-launch-first-proof-a-first-of-its-kind-math-exam-for-ai/
1145 sats
\
4 comments
\
@south_korea_ln
14 Feb
AI
science
LLMs can unmask pseudonymous users at scale with surprising accuracy
arstechnica.com/security/2026/03/llms-can-unmask-pseudonymous-users-at-scale-with-surprising-accuracy/
818 sats
\
4 comments
\
@co574
5 Mar
privacy
Things we learned about LLMs in 2024
simonwillison.net/2024/Dec/31/llms-in-2024/
470 sats
\
0 comments
\
@Rsync25
31 Dec 2024
tech
Masking private information on the fly when using cloud LLMs
233 sats
\
0 comments
\
@m0wer
26 May 2025
tech