
The training of Large Language Models (LLMs) has long been constrained by subword tokenization, a method that, while effective to a degree, offers only modest text compression and therefore demands considerable computational resources. This has not only capped the potential for model scaling but also made training on expansive datasets prohibitively expensive. The challenge is twofold: compress text far more aggressively so that models can be trained efficiently, while maintaining, or even improving, their performance.
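
To make the baseline concrete, the sketch below shows how a standard subword tokenizer shortens a text sequence relative to its raw character count. It is only an illustration, assuming the open-source `tiktoken` library and its `cl100k_base` vocabulary (neither is mentioned in the original article); the point is that the compression subword tokenization delivers is real but modest.

```python
# Minimal sketch: how much a common BPE subword tokenizer compresses text.
# Assumes `pip install tiktoken`; the library and vocabulary are illustrative
# choices, not something prescribed by the article.
import tiktoken

text = "The training of Large Language Models has been shaped by tokenization."

enc = tiktoken.get_encoding("cl100k_base")  # a widely used BPE subword vocabulary
tokens = enc.encode(text)

# A subword tokenizer maps several characters to one token, so the sequence a
# model must process is shorter than the raw character stream, but the ratio
# is far from what dedicated text compressors achieve.
print(f"characters:      {len(text)}")
print(f"subword tokens:  {len(tokens)}")
print(f"chars per token: {len(text) / len(tokens):.2f}")
```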