Qwen3-ASR can transcribe speech, songs, and even rap, and it handles low-quality recordings and noisy audio.
It supports 11 languages, including English, Chinese, Arabic, Spanish, Korean, and Russian. You can also supply a list of names and keywords so that the model recognizes them correctly.
API access costs $0.000032 per second of audio.
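To put the per-second rate in more familiar units, here is a small cost estimate. It is a minimal sketch that assumes flat per-second billing with no minimum charge, rounding, or volume tiers. The post does not describe those details, so treat the numbers as rough estimates. The function name `transcription_cost` is hypothetical.

```python
# Rate quoted above; assumes a flat per-second price (no minimums, rounding, or tiers).
PRICE_PER_SECOND_USD = 0.000032


def transcription_cost(seconds: float) -> float:
    """Hypothetical helper: estimated API cost in USD for `seconds` of audio."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    return seconds * PRICE_PER_SECOND_USD


if __name__ == "__main__":
    print(f"1 minute:    ${transcription_cost(60):.5f}")           # $0.00192
    print(f"1 hour:      ${transcription_cost(3600):.4f}")         # $0.1152
    print(f"1,000 hours: ${transcription_cost(3_600_000):,.2f}")   # $115.20
```

Under these assumptions, one hour of audio costs about $0.12, and 1,000 hours cost about $115.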