MCP-Bench: Benchmarking Tool-Using LLM Agents
arxiv.org/abs/2508.20453
269 sats
\
0 comments
\
@optimism
30 Aug 2025
AI
related
Gemini 3 and Antigravity: Why Google's latest AI releases are a big deal
fortune.com/2025/11/19/google-gemini-3-antigravity-ai-explained/?utm_source=flipboard&utm_content=fortune/magazine/Personal+finance
161 sats
\
1 comment
\
@DrBrader99
19 Nov 2025
AI
"Benchwashing" - how do you defend against this?
1748 sats
\
10 comments
\
@optimism
9 Aug 2025
AskSN
Jan v3 4B: great in instruction following
huggingface.co/janhq/Jan-v3-4B-base-instruct
519 sats
\
0 comments
\
@optimism
2 Feb
AI
The week in AI, August 4-10, 2025
2353 sats
\
12 comments
\
@optimism
11 Aug 2025
AI
LLM Rankings: programming | OpenRouter
openrouter.ai/rankings/programming
126 sats
\
0 comments
\
@m0wer
28 May 2025
tech
GDPval: Measuring the performance of our models on real-world tasks - OpenAI
openai.com/index/gdpval/
388 sats
\
8 comments
\
@Scoresby
2 Oct 2025
AI
The ORCA Benchmark Evaluates How Well AIs Deal with Everyday Math
www.omnicalculator.com/reports/omni-research-on-calculation-in-ai-benchmark
260 sats
\
0 comments
\
@0xbitcoiner
27 Feb
AI
Alibaba has released its flagship Qwen3-Max model with a trillion parameters
chat.qwen.ai/
197 sats
\
0 comments
\
@lunin
25 Sep 2025
AI
The week in AI, August 11-17, 2025
1667 sats
\
4 comments
\
@optimism
21 Aug 2025
AI
AI is actually bad at math, ORCA shows
www.theregister.com/2025/11/17/ai_bad_math_orca/
197 sats
\
4 comments
\
@0xbitcoiner
18 Nov 2025
AI
GPT-5.4 will feature an extreme reasoning mode.
www.theinformation.com/newsletters/ai-agenda/openais-next-ai-model-will-extreme-reasoning
340 sats
\
1 comment
\
@lunin
5 Mar
AI
ChatGPT 4o vs Gemini 1.5 Pro: It's Not Even Close
beebom.com/chatgpt-4o-vs-gemini-1-5-pro/
142 sats
\
1 comment
\
@ch0k1
17 May 2024
news
Compact versions of Qwen 3.5 have been released, open source
huggingface.co/collections/Qwen/qwen35
400 sats
\
1 comment
\
@lunin
5 Mar
AI
Google and OpenAI Release New Fast Models
blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-lite/
340 sats
\
0 comments
\
@lunin
4 Mar
AI
Claude 3 beats GPT-4 on Aider's code editing benchmark
aider.chat/2024/03/08/claude-3.html
377 sats
\
2 comments
\
@hn
31 Mar 2024
tech
Qwen3 30B A3B Hits 13 token/s on 4xRaspberry Pi 5
github.com/b4rtaz/distributed-llama/discussions/255
130 sats
\
0 comments
\
@hn
6 Sep 2025
tech
Are LLMs Racist?
491 sats
\
11 comments
\
@Tony
23 Oct 2025
AI
Alibaba has released the open-source Qwen3.5
qwen.ai/blog?id=qwen3.5
328 sats
\
1 comment
\
@lunin
17 Feb
AI
Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs
www.baseten.co/blog/sota-performance-for-gpt-oss-120b-on-nvidia-gpus/
151 sats
\
0 comments
\
@hn
7 Aug 2025
tech
Vals AI — Finance Agent Benchmark
www.vals.ai/benchmarks/finance_agent-04-22-2025?utm_campaign=wp_the_technology_202&utm_medium=email&utm_source=newsletter
64 sats
\
3 comments
\
@BlokchainB
24 Apr 2025
AI
Gemma3 – The current strongest model that fits on a single GPU
ollama.com/library/gemma3
96 sats
\
0 comments
\
@hn
12 Mar 2025
tech