Elon Musk has lots of money, so if it tanks it’s no big deal.
"The press release says Applied Digital's data center design allows it to accommodate almost 50,000 of NVIDIA’s H100 SXM class graphics processing units in a single parallel compute cluster."
Those H100s cost roughly $30k each, I think. That’s some significant processing power. Colossus (Grok / xAI) has 200k H100s, and it sounds like they expect to add another 100k H200s in the near future. It’s already the most powerful AI training array on the planet. It appears Elon is gonna dominate the AI market; no other country has anywhere close to the processing power the US has at this point.
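For rough scale, here’s a back-of-envelope estimate of the GPU hardware cost for a cluster that size, assuming the ~$30k-per-H100 figure above (an assumption, not a quoted price; volume deals will differ):

```python
# Back-of-envelope GPU cost for a 200k-H100 cluster.
# The $30,000 per-unit price is the ballpark figure mentioned above,
# not a quoted list price.
gpus = 200_000
price_per_gpu = 30_000  # USD, assumed
total_usd = gpus * price_per_gpu
print(f"~${total_usd / 1e9:.0f}B in H100s alone")  # ~$6B
```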
More info:
Cost Comparison:
- H100: The cost of an H100 GPU is typically around $25,000 to $40,000 per unit, depending on the exact configuration and supply-demand dynamics.
- H200: The H200, being an upgrade, is estimated to cost roughly 20% more per hour when rented as a virtual machine instance from cloud providers. Applying that same ~20% premium to the H100's purchase price range puts an H200 at roughly $30,000 to $48,000 per unit (quick check below).
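As a quick sanity check, applying that assumed ~20% premium to the H100's quoted price range reproduces the $30,000 to $48,000 estimate (ballpark numbers, not list prices):

```python
# Applying the assumed ~20% H200 premium to the H100 price range.
h100_low, h100_high = 25_000, 40_000  # USD, from the range above
premium = 1.20  # assumed markup, carried over from cloud hourly pricing
print(f"H200 estimate: ${h100_low * premium:,.0f} to ${h100_high * premium:,.0f}")
# -> H200 estimate: $30,000 to $48,000
```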
Performance Comparison:
- Memory and Bandwidth: The H200 offers significant advancements over the H100, including about 1.76x the memory capacity (141 GB vs. 80 GB) and roughly 43% higher memory bandwidth (4.8 TB/s vs. 3.35 TB/s); the arithmetic is spelled out after this list. This makes the H200 more suitable for handling larger models and datasets, which is critical for AI training and inference tasks.
- AI and HPC Performance:
  - Training: Benchmarks show the H200 can offer up to 45% more performance in specific generative AI and high-performance computing (HPC) tasks due to its increased memory and bandwidth.
  - Inference: For large language models (LLMs) like Llama2 70B, the H200 doubles the inference performance compared to the H100, making it notably faster for real-time AI applications.
- Energy Efficiency: The H200 delivers more performance at the same power draw as the H100, so it does more work per watt; NVIDIA claims this cuts energy use and total cost of ownership (TCO) for LLM workloads by up to 50%.
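For reference, here is how the headline memory and bandwidth ratios fall out of the spec numbers quoted above (spec-sheet figures only; real-world gains depend on the workload):

```python
# Deriving the memory and bandwidth ratios from the quoted specs.
h100 = {"memory_gb": 80, "bandwidth_tb_s": 3.35}
h200 = {"memory_gb": 141, "bandwidth_tb_s": 4.8}

mem_ratio = h200["memory_gb"] / h100["memory_gb"]  # ~1.76x the memory
bw_gain_pct = (h200["bandwidth_tb_s"] / h100["bandwidth_tb_s"] - 1) * 100  # ~43% more bandwidth
print(f"Memory: {mem_ratio:.2f}x, Bandwidth: +{bw_gain_pct:.0f}%")
# -> Memory: 1.76x, Bandwidth: +43%
```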
Which is Better?
- The H200 is considered better for:
  - Applications requiring large memory capacity and high bandwidth, like training very large AI models or handling memory-intensive HPC workloads.
  - Scenarios where energy efficiency and reduced TCO are critical considerations.
  - Future-proofing, given its advancements over the H100.
- The H100 remains valuable for:
  - Situations where the cost is a significant factor, and the performance of the H100 is still adequate for the workload.
  - Existing systems where upgrading to the H200 might not be justified if the H100 meets current needs.
In summary, if budget isn't the primary concern and you're looking for top-of-the-line performance, especially for the most demanding AI and HPC tasks, the H200 is the better choice. However, the H100 is still a powerful GPU, offering a good balance of performance and cost for many current applications.