Ok, so this simply explained how some of the MoE models work and why they need so many cards. Each card holds a few experts, and the router is good at picking the right card to send the query to; then the card selects one of the experts hosted on it to produce the response. In the model described in the video, they made it more like one model per card, so the router was better at selecting which particular expert it should forward to.
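Here is a minimal Python sketch of the routing idea. Everything in it is hypothetical: the card and expert counts, the toy experts, and the names `route` and `moe_layer`. It also assumes flat top-k gating, a common design (Switch Transformer uses top-1 and Mixtral uses top-2). In that design the router scores every expert directly for each token, and the card is simply wherever the chosen expert happens to be placed. There is no separate card-then-expert step.

```python
import math
import random

NUM_CARDS = 4          # hypothetical: number of GPUs ("cards")
EXPERTS_PER_CARD = 2   # hypothetical: experts hosted on each card
NUM_EXPERTS = NUM_CARDS * EXPERTS_PER_CARD
DIM = 3                # toy token-embedding size


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(token, router_weights, k=2):
    """Return the top-k (expert_id, card_id, gate_weight) choices for one token."""
    # Router logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * t for w, t in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    top = sorted(range(NUM_EXPERTS), key=lambda e: probs[e], reverse=True)[:k]
    # Renormalize the gate weights over the chosen experts only.
    norm = sum(probs[e] for e in top)
    # Assumed placement: expert e lives on card e // EXPERTS_PER_CARD.
    return [(e, e // EXPERTS_PER_CARD, probs[e] / norm) for e in top]


def make_expert(scale):
    # Toy "expert" that just scales the token; real experts are feed-forward networks.
    return lambda token: [scale * t for t in token]


experts = [make_expert(e + 1) for e in range(NUM_EXPERTS)]


def moe_layer(token, router_weights, k=2):
    """Weighted sum of the chosen experts' outputs."""
    out = [0.0] * DIM
    for expert_id, card_id, gate in route(token, router_weights, k):
        # In a real multi-GPU setup, the token would be sent to card_id here.
        y = experts[expert_id](token)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


random.seed(0)
router_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
token = [0.5, -1.0, 2.0]
for expert_id, card_id, gate in route(token, router_weights):
    print(f"expert {expert_id} on card {card_id}, gate {gate:.2f}")
print(moe_layer(token, router_weights))
```

Only k experts run per token, so most of the model's parameters sit idle for any given token. However, all of them still have to be loaded somewhere, and that is why the experts get spread across many cards.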