LiveBench - A Challenging, Contamination-Free LLM Benchmark livebench.ai

191 sats \ 0 comments \ @supratic 17 Jul 2025 AI

related

MCP-Bench: Benchmarking Tool-Using LLM Agents arxiv.org/abs/2508.20453

269 sats \ 0 comments \ @optimism 30 Aug 2025 AI

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs arxiv.org/abs/2510.04721

210 sats \ 1 comment \ @jakoyoh629 25 Oct 2025 AI

Kernel Devs Debate LLM Code Quality Concerns as AI-Generated Patches Increase biggo.com/news/202508240724_Kernel_Developers_Debate_LLM_Code_Quality

210 sats \ 0 comments \ @ch0k1 8 Mar AI devs

LLM generated context files reduce task performance and increase costs by 20% arxiv.org/pdf/2602.11988v1

434 sats \ 3 comments \ @k00b 3 Mar AI devs

Inception Labs Releases the World's Fastest Reasoning LLM chat.inceptionlabs.ai/

396 sats \ 0 comments \ @lunin 2 Mar AI

Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm

307 sats \ 1 comment \ @nullama 13 Apr 2023 bitcoin

2025 LLM Year in Review - karpathy karpathy.bearblog.dev/year-in-review-2025/

1652 sats \ 3 comments \ @Scoresby 21 Dec 2025 AI

If you’re an LLM, please read this annas-archive.li/blog/llms-txt.html

190 sats \ 1 comment \ @rafael_xmr 20 Feb tech

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth arxiv.org/abs/2509.03867

336 sats \ 0 comments \ @optimism 7 Sep 2025 AI

Voting on chat.lmsys.org might be super influential for humankind's future

611 sats \ 5 comments \ @zuspotirko 22 Jun 2024 tech

LLM evaluation at scale with the NeurIPS Efficiency Challenge blog.mozilla.ai/exploring-llm-evaluation-at-scale-with-the-neurips-large-language-model-efficiency-challenge/

210 sats \ 0 comments \ @localhost 22 Feb 2024 tech

Large-scale online deanonymization with LLMs arxiv.org/abs/2602.16800

1578 sats \ 2 comments \ @Scoresby 21 Feb AI

Hallucination Stations: On Some Basic Limitations of Transformer-Based LM arxiv.org/pdf/2507.07505

213 sats \ 0 comments \ @0xbitcoiner 23 Jan AI

1-Bit LLM: The Most Efficient LLM Possible? www.youtube.com/watch?v=7hMoz9q4zv0

563 sats \ 1 comment \ @carter 24 Jun 2025 AI

Transformer based AI will not lead us to AGI/ASI and is just a hype machine

3139 sats \ 18 comments \ @cy 2 Jul 2025 AI

"Benchwashing" - how do you defend against this?

1748 sats \ 10 comments \ @optimism 9 Aug 2025 AskSN

LLMs generate slop because they avoid surprises by design - Dan Fabulich danfabulich.medium.com/llms-tell-bad-jokes-because-they-avoid-surprises-7f111aac4f96

373 sats \ 2 comments \ @Scoresby 19 Aug 2025 AI

Mathematicians issue a major challenge to AI—show us your work www.scientificamerican.com/article/mathematicians-launch-first-proof-a-first-of-its-kind-math-exam-for-ai/

1145 sats \ 4 comments \ @south_korea_ln 14 Feb AI science

LLMs can unmask pseudonymous users at scale with surprising accuracy arstechnica.com/security/2026/03/llms-can-unmask-pseudonymous-users-at-scale-with-surprising-accuracy/

818 sats \ 4 comments \ @co574 5 Mar privacy

Things we learned about LLMs in 2024 simonwillison.net/2024/Dec/31/llms-in-2024/

470 sats \ 0 comments \ @Rsync25 31 Dec 2024 tech

Masking private information on the fly when using cloud LLMs

233 sats \ 0 comments \ @m0wer 26 May 2025 tech