Mercury 2 is based on a diffusion architecture. Simply put, instead of producing text one token at a time, it refines all tokens of the output in parallel over a small number of denoising steps. This removes the bottleneck of traditional LLMs, where text is generated sequentially, token by token.
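The difference in cost can be sketched with a toy comparison (the token choices are random placeholders, not a real model; the point is how the number of forward passes scales):

```python
import random

def autoregressive_generate(vocab, length):
    """Sequential decoding: one model call per generated token."""
    tokens, calls = [], 0
    for _ in range(length):
        calls += 1  # each new token costs a full forward pass
        tokens.append(random.choice(vocab))
    return tokens, calls

def diffusion_generate(vocab, length, steps=4):
    """Parallel denoising: each step updates every position at once,
    so the call count depends on the step count, not the sequence length."""
    tokens = ["[MASK]"] * length
    calls = 0
    for step in range(steps):
        calls += 1  # one forward pass refines all positions in parallel
        # reveal a growing fraction of positions each step (coarse-to-fine)
        reveal = int(length * (step + 1) / steps)
        for i in range(reveal):
            if tokens[i] == "[MASK]":
                tokens[i] = random.choice(vocab)
    return tokens, calls

vocab = ["the", "cat", "sat", "on", "mat"]
_, ar_calls = autoregressive_generate(vocab, 32)
_, diff_calls = diffusion_generate(vocab, 32, steps=4)
print(ar_calls, diff_calls)  # 32 sequential calls vs 4 parallel steps
```

In practice a diffusion model's denoising steps are heavier than a single autoregressive step, but because the step count stays small and fixed, throughput can be far higher for long outputs.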
Inception claims that Mercury 2 is up to five times faster than comparable models, while remaining competitive in quality with other fast models such as Haiku 4.5 and GPT-5 Mini.
You can try it for free in the chat interface; for API access, submit a request on the website.