Workload profiles (inference)
We treat inference as a family of workloads, not a single workload. Each modality shifts the bottleneck among compute, network, and operations.
Text generation
- Dominant constraints: latency (p95/p99), burst handling, steady-state throughput.
- Facility implications: predictable power delivery, resilient cooling, low-latency connectivity, strong ops posture for spikes.
- Integration notes: request routing, rate limiting, and clear incident comms expectations.
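As an illustration of the p95/p99 constraint above, the following minimal sketch computes nearest-rank percentiles over a set of request latencies. The `percentile` helper and the sample values are hypothetical and chosen only to show how a couple of slow requests dominate the tail while leaving the median untouched; they are not measurements.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical request latencies in milliseconds (illustrative only).
latencies_ms = [120, 135, 128, 140, 131, 125, 900, 133, 127, 138,
                129, 136, 132, 126, 134, 1500, 130, 137, 124, 139]

p50 = percentile(latencies_ms, 50)   # 132
p95 = percentile(latencies_ms, 95)   # 900
p99 = percentile(latencies_ms, 99)   # 1500
print(p50, p95, p99)
```

In this made-up sample, two slow requests out of twenty leave the median at 132 ms but push p95 and p99 to 900 ms and 1500 ms, which is why the profile lists tail percentiles rather than averages.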
Embeddings + RAG (retrieval-augmented generation)
- Dominant constraints: network + storage locality, tail latency, and data access patterns.
- Facility implications: bandwidth headroom, predictable east-west throughput, and connectivity options that match data gravity.
- Integration notes: private connectivity may matter more than raw compute.
Video inference
- Dominant constraints: bandwidth and egress cost/shape; the network can become the bottleneck quickly.
- Facility implications: carrier diversity, high-throughput connectivity, and careful capacity planning for peak events.
- Integration notes: traffic shaping and caching strategies become part of “design inputs.”
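To make the "network becomes the bottleneck quickly" point concrete, here is a rough back-of-the-envelope sketch. Every input (stream count, per-stream bitrate, peak multiplier, link size) is an assumption picked for easy arithmetic, not a figure from this page, and the helper names are hypothetical.

```python
def aggregate_gbps(concurrent_streams, mbps_per_stream):
    """Total bandwidth in Gbit/s for a number of equal-bitrate streams."""
    return concurrent_streams * mbps_per_stream / 1000

def tb_per_day(gbps):
    """Data volume in decimal terabytes if `gbps` is sustained for 24 hours."""
    gigabytes_per_second = gbps / 8          # 8 bits per byte
    return gigabytes_per_second * 86_400 / 1000

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
steady_gbps = aggregate_gbps(concurrent_streams=2000, mbps_per_stream=4)  # 8.0
peak_gbps = steady_gbps * 1.5      # assumed peak-event multiplier -> 12.0
link_capacity_gbps = 10            # assumed uplink size

daily_tb = tb_per_day(steady_gbps)             # about 86.4 TB/day
saturated = peak_gbps > link_capacity_gbps     # True: peak exceeds the link

print(steady_gbps, peak_gbps, round(daily_tb, 1), saturated)
```

Under these assumed numbers the steady state fits within the link, but a 1.5x peak event does not, which is the kind of case that capacity planning for peak events is meant to catch.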
Agentic workflows
- Dominant constraints: burstiness, variable compute, and long-tail latency driven by tool calls and multi-step chains.
- Facility implications: headroom policies and operational readiness for unpredictable spikes.
- Integration notes: observability and incident response expectations should be agreed upfront.
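One way to see why multi-step chains lengthen the tail: if each step independently has a small chance of being slow, the chance that a whole chain contains at least one slow step grows with chain length. The sketch below assumes independent steps and a hypothetical 1% per-step slow rate; real tool calls are often correlated, so treat it as intuition rather than a model.

```python
def prob_any_slow(steps, p_slow_per_step):
    """Probability that at least one of `steps` independent steps is slow."""
    return 1 - (1 - p_slow_per_step) ** steps

# Hypothetical: each step lands in its own slow tail 1% of the time.
for steps in (1, 10, 50):
    print(steps, round(prob_any_slow(steps, 0.01), 3))
```

With these assumptions, a single call is slow 1% of the time, a 10-step chain about 9.6% of the time, and a 50-step chain about 39.5% of the time.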
How we use these profiles
We don’t publish capacity claims. We publish designed-for targets and commissioning validation milestones that map to the profiles above.
- Latency + bandwidth: /thesis/latency-bandwidth/
- Commissioning gates: /procurement/commissioning-gates/
- Services: /services/build-to-suit/ and /services/wholesale-colocation/