items/1200297/related \ stacker news

MCP-Bench: Benchmarking Tool-Using LLM Agents arxiv.org/abs/2508.20453

269 sats \ 0 comments \ @optimism 30 Aug 2025 AI

related

Jan v3 4B: great in instruction following huggingface.co/janhq/Jan-v3-4B-base-instruct

519 sats \ 0 comments \ @optimism 2 Feb AI

mesh-llm — Decentralised LLM Inference docs.anarchai.org/

1702 sats \ 2 comments \ @Scoresby 3 Apr AI

"Benchwashing" - how do you defend against this?

1748 sats \ 10 comments \ @optimism 9 Aug 2025 AskSN

pylint MCP provider

2428 sats \ 6 comments \ @optimism 4 Jun 2025 builders

The week in AI, August 11-17, 2025

1667 sats \ 4 comments \ @optimism 21 Aug 2025 AI

Gemini 3.1 Pro: A smarter model for your most complex tasks blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/

347 sats \ 0 comments \ @0xbitcoiner 19 Feb AI

GDPval: Measuring the performance of our models on real-world tasks - OpenAI openai.com/index/gdpval/

388 sats \ 8 comments \ @Scoresby 2 Oct 2025 AI

The ORCA Benchmark Evaluates How Well AIs Deal with Everyday Math www.omnicalculator.com/reports/omni-research-on-calculation-in-ai-benchmark

260 sats \ 0 comments \ @0xbitcoiner 27 Feb AI

The week in ~ai, June 16-23 2025

1941 sats \ 9 comments \ @optimism 24 Jun 2025 AI

The week in AI, October 6-12, 2025

991 sats \ 2 comments \ @optimism 13 Oct 2025 AI

The week in AI, August 4-10, 2025

2353 sats \ 12 comments \ @optimism 11 Aug 2025 AI

Evidence of performative chain-of-thought (CoT) in reasoning models arxiv.org/abs/2603.05488

459 sats \ 3 comments \ @k00b 15 Mar AI

The week in AI, July 28 - August 3, 2025

1505 sats \ 3 comments \ @optimism 4 Aug 2025 AI

Should you use "you" in your LLM prompts? x.com/BrianRoemmele/status/1998068828295877011

827 sats \ 10 comments \ @Scoresby 9 Dec 2025 AI

The week in AI, June 24-29, 2025

766 sats \ 7 comments \ @optimism 2 Jul 2025 AI

The week in AI: July 7-13, 2025

1528 sats \ 1 comment \ @optimism 14 Jul 2025 AI

AI agents can't teach themselves new tricks – people can www.theregister.com/2026/02/19/ai_agents_cant_teach_themselves/

221 sats \ 1 comment \ @0xbitcoiner 19 Feb AI

The week in AI, September 22-28, 2025

1288 sats \ 2 comments \ @optimism 29 Sep 2025 AI

The week in AI, August 25-31, 2025

1238 sats \ 4 comments \ @optimism 1 Sep 2025 AI

The week in AI, October 20-26, 2025

412 sats \ 5 comments \ @optimism 27 Oct 2025 AI

The week in AI, September 8-14, 2025

1461 sats \ 0 comments \ @optimism 16 Sep 2025 AI