items/1039304/related \ stacker news

Context Rot: How Increasing Input Tokens Impacts LLM Performance research.trychroma.com/context-rot

334 sats \ 2 comments \ @Scoresby 14 Jul 2025 AI

related

Should you use "you" in your LLM prompts? x.com/BrianRoemmele/status/1998068828295877011

827 sats \ 10 comments \ @Scoresby 9 Dec 2025 AI

In a First, AI Models Analyze Language As Well As a Human Expert www.quantamagazine.org/in-a-first-ai-models-analyze-language-as-well-as-a-human-expert-20251031/

274 sats \ 0 comments \ @0xbitcoiner 31 Oct 2025 AI

How to turn LLM Pinocchio into a real boy

12.7k sats \ 10 comments \ @Scoresby 7 Oct 2025 AI

The simulation of judgment in LLMs - PNAS www.pnas.org/doi/10.1073/pnas.2518443122

244 sats \ 5 comments \ @Scoresby 15 Oct 2025 AI

Why language models hallucinate - OpenAI openai.com/index/why-language-models-hallucinate/

438 sats \ 4 comments \ @Scoresby 6 Sep 2025 AI

Giving models more compute time might make them worse at reasoning - Anthropic arxiv.org/abs/2507.14417

343 sats \ 2 comments \ @Scoresby 31 Jul 2025 AI

Agentic Reinforced Policy Optimization arxiv.org/abs/2507.19849

171 sats \ 0 comments \ @optimism 29 Jul 2025 AI

Why Do Researchers Care About Small Language Models? www.quantamagazine.org/why-do-researchers-care-about-small-language-models-20250310/

40 sats \ 3 comments \ @0xbitcoiner 10 Mar 2025 AI

GDPval: Measuring the performance of our models on real-world tasks - OpenAI openai.com/index/gdpval/

388 sats \ 8 comments \ @Scoresby 2 Oct 2025 AI

DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning arxiv.org/abs/2508.05405

212 sats \ 0 comments \ @optimism 10 Aug 2025 AI

To Make Language Models Work Better, Researchers Sidestep Language www.quantamagazine.org/to-make-language-models-work-better-researchers-sidestep-language-20250414/

210 sats \ 0 comments \ @0xbitcoiner 15 Apr 2025 AI

Understanding Strengths & Limitations of Reasoning Models via Problem Complexity machinelearning.apple.com/research/illusion-of-thinking

71 sats \ 1 comment \ @supratic 10 Jun 2025 tech

Prime Fields, Text Manglers and Progress Report on Indra

6263 sats \ 0 comments \ @l0k18 1 May 2023 bitcoin

Tongyi-DeepResearch

301 sats \ 0 comments \ @optimism 29 Oct 2025 AI

LLM Daydreaming gwern.net/ai-daydreaming

349 sats \ 2 comments \ @k00b 16 Jul 2025 AI

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs arxiv.org/abs/2510.04721

210 sats \ 1 comment \ @jakoyoh629 25 Oct 2025 AI

AI Still Can't Think: Apple’s New Study Dispels the Myth

365 sats \ 2 comments \ @lunin 31 Jul 2025 AI

Inside Miron Construction’s AI Journey: Tools, Challenges, and Wins www.autodesk.com/blogs/construction/inside-miron-constructions-ai/

50 sats \ 2 comments \ @BlokchainB 8 Aug 2025 Construction_and_Engineering

How Attention Sinks Keep Language Models Stable hanlab.mit.edu/blog/streamingllm

140 sats \ 0 comments \ @carter 8 Aug 2025 AI

Humanity's Last Exam lastexam.ai/

327 sats \ 0 comments \ @StillStackinAfterAllTheseYears 4 Feb AI tech

Self-Adapting Language Models arxiv.org/abs/2506.10943

447 sats \ 0 comments \ @carter 13 Oct 2025 AI