AI Data Center Network Design and Technologies, 1st edition

Published by Addison-Wesley Professional (February 9, 2026) © 2026

  • Mahesh Subramaniam
  • Michal Styszynski
  • Himanshu Tambakuwala

Title overview

Artificial intelligence is redefining the scale, architecture, and performance expectations of modern data centres.

Training large ML models demands infrastructure capable of moving massive data sets through highly parallel, compute-intensive environments, where traditional data centre designs simply can't keep up.

AI Data Center Network Design and Technologies is a comprehensive, vendor-agnostic guide to the design principles, architectures, and technologies that power AI training and inference clusters. Written by leading experts in AI data centre design, this book helps engineers, architects, and technology leaders understand how to design and scale networks purpose-built for the AI era.

You'll learn how to:

  • Architect scalable, high-radix network fabrics to support xPU (GPU, TPU)-based AI clusters
  • Integrate lossless Ethernet/IP fabrics for high-throughput, low-latency data movement
  • Align network design with AI/ML workload characteristics and server architectures
  • Address challenges in cooling, power, and interconnect design for AI-scale computing
  • Evaluate emerging technologies from the Ultra Ethernet Consortium (UEC) and their effect on future AI data centres
  • Apply best practices for deployment, validation, and performance measurement in AI/ML environments

With broad coverage of both foundational concepts and emerging innovations, this book bridges the gap between network engineering and AI infrastructure design. It empowers readers to understand not only how AI data centres work, but why they must evolve.

Table of contents

  • Part 1: AI/ML Data Center Design Workloads and Requirements
  • Chapter 1 Wonders in the Workload
  • Chapter 2 'The Common-Man View' of AI Data Center Fabrics
  • Part 2: AI/ML Data Center Design Concepts
  • Chapter 3 Network Design Considerations
  • Chapter 4 Optics and Cables Management
  • Chapter 5 Thermal and Power Efficiency Considerations
  • Part 3: AI/ML Data Center Technology Requirements
  • Chapter 6 Efficient Load Balancing
  • Chapter 7 RoCEv2 Transport and Congestion Management
  • Chapter 8 IP Routing for AI/ML Fabrics
  • Chapter 9 Storage Network Design and Technologies
  • Part 4: KPIs and Performance Monitoring
  • Chapter 10 AI Network Performance KPIs
  • Chapter 11 Monitoring and Telemetry
  • Part 5: UEC - Ultra Ethernet Consortium
  • Chapter 12 Ultra Ethernet Consortium (UEC)
  • CONCLUSION
  • Chapter 13 Scale-Up Systems
  • Chapter 14 Conclusion
  • Appendix A: Questions and Answers
  • Appendix B: Acronyms
