I wish people would add some context to their links so we know whether it's worth watching...
per Gemini:
The video demonstrates how Apple's Mac Studio M3 Ultra machines, when combined with Exo 1.0 software and RDMA over Thunderbolt, can form a powerful and efficient AI supercluster (1:41, 1:55).
This setup allows users to run massive AI models locally, achieving significant speed increases (e.g., Llama 3.3 going from 5 to 15.5 tokens/second (12:58-13:19)) at a fraction of the cost and power consumption of traditional enterprise solutions (1:27, 18:10). The speaker highlights this as a "class of its own" given its capabilities and price (1:35).
It is explained in the first 2 minutes. That's not too much to ask for imo.
Background:
An architecture with so-called "shared memory" (Apple calls it unified memory) means the normal system RAM and the GPU VRAM are one and the same pool.
The effect is that prosumer machines, which have always had a lot of RAM, now have a lot of VRAM available too. The happy accident is that you can run surprisingly large AI models on these computers.
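To make the "surprisingly large" part concrete, here is a back-of-envelope sketch of how much memory a model's weights need at a given quantization. The 20% overhead factor is my own rough allowance for KV cache and activations, not a figure from the video:

```python
def model_memory_gb(n_params_billion: float, bits_per_weight: int,
                    overhead: float = 1.2) -> float:
    """Rough memory footprint of an LLM: weights at the given
    quantization, plus a ~20% allowance for KV cache/activations.
    An illustrative rule of thumb, not an exact figure."""
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 70B model at 4-bit quantization needs roughly 42 GB:
# too big for a typical 24 GB discrete GPU, but comfortable in the
# unified memory of a 64 GB+ prosumer machine.
print(round(model_memory_gb(70, 4)))  # → 42
```

The same arithmetic explains why a 512 GB Mac Studio can host models that would otherwise require multiple enterprise GPUs.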
What's new:
Now (macOS 26.2) Macs can enable RDMA over Thunderbolt 5. This makes data transfer between multiple such computers (a cluster) much faster. With software such as https://github.com/exo-explore/exo, this makes for a ridiculously capable setup.
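Why the interconnect matters: when a model is split across machines, every generated token pays a fixed communication cost on top of compute, and RDMA mostly attacks that communication term. A deliberately simple throughput model (all numbers below are made up for illustration, not measurements from the video):

```python
def tokens_per_s(compute_ms: float, comm_ms: float) -> float:
    """Throughput of a naive pipeline where each token pays a fixed
    compute cost plus a fixed inter-machine communication cost,
    with no overlap between the two. Deliberately simplistic."""
    return 1000.0 / (compute_ms + comm_ms)

# Illustrative (made-up) numbers: same compute budget, but a
# low-latency RDMA-style transport shrinks the per-token
# communication overhead dramatically.
print(tokens_per_s(60, 140))  # high-overhead transport → 5.0 tok/s
print(tokens_per_s(60, 4))    # low-overhead transport  → ~15.6 tok/s
```

This is why cutting per-hop overhead can triple tokens/second even though the GPUs themselves got no faster.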
What did they do?