Some clever networking hacks open the door

AI search provider Perplexity's research wing has developed a new set of software optimizations that allows trillion-parameter and other very large models to run efficiently across older, cheaper hardware using a variety of existing network technologies, including Amazon's proprietary Elastic Fabric Adapter.

These innovations, detailed in a paper published this week and released on GitHub for further scrutiny, present a novel approach to one of the biggest challenges in serving large-scale mixture-of-experts (MoE) models: memory and network latency.

Mo parameters, mo problems
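To see why network latency dominates MoE serving, consider a hypothetical sketch (not Perplexity's actual implementation) of expert routing: each token is dispatched to its top-k experts, and because experts are sharded across machines, every MoE layer triggers an all-to-all exchange of token activations over the fabric. All names and numbers below are illustrative assumptions.

```python
import random

NUM_EXPERTS = 8   # assume 8 experts, each hosted on a different node
TOP_K = 2         # each token is routed to its 2 best-scoring experts
NUM_TOKENS = 1000

random.seed(0)

# Stand-in for a learned gating network: pick TOP_K distinct experts per token.
assignments = [random.sample(range(NUM_EXPERTS), TOP_K)
               for _ in range(NUM_TOKENS)]

# Count how many token activations each node must receive per MoE layer;
# this is the payload of the all-to-all step carried by fabrics like EFA.
recv_counts = [0] * NUM_EXPERTS
for experts in assignments:
    for e in experts:
        recv_counts[e] += 1

total_transfers = sum(recv_counts)
print(f"tokens: {NUM_TOKENS}, activations shipped per layer: {total_transfers}")
```

With top-2 routing, cross-node traffic is twice the token count per MoE layer, and a trillion-parameter model repeats this exchange at every one of its MoE layers, which is why the interconnect, not just GPU memory, becomes the bottleneck.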