This example shows what memory bandwidth might look like when the entire system is running under a typical load. These numbers are predictions, not measurements.
The example assumes a typical CPU load and a maximum GPU load:
- Three display planes are enabled at 1080p resolution.
- Display write-back is writing a 1080p image at 60 FPS.
- Move engines are idle.
- Read bandwidth of the command buffer and index buffer is 4 GB/s.
- Regular GPU rendering consumes the rest of the available bandwidth.
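To give a sense of scale for the display clients in this workload, the sketch below estimates their traffic from resolution and frame rate. Only the 1080p resolution, the three planes, and the 60 FPS write-back rate come from the text; the 4 bytes per pixel and the 60 Hz scan-out rate for each display plane are assumptions, and real display formats and compression could change the result.

```python
# Rough estimate of display traffic for the example workload.
# Assumptions (not from the source): 4 bytes per pixel, and every
# display plane is scanned out at 60 Hz.

WIDTH, HEIGHT = 1920, 1080  # 1080p, from the text
BYTES_PER_PIXEL = 4         # assumed 32-bit pixel format
GB = 1e9                    # decimal gigabytes, to match GB/s figures


def surface_bandwidth_gbps(width, height, bytes_per_pixel, fps):
    """Bytes moved per second for one full-frame surface, in GB/s."""
    return width * height * bytes_per_pixel * fps / GB


plane_read = 3 * surface_bandwidth_gbps(WIDTH, HEIGHT, BYTES_PER_PIXEL, 60)
writeback = surface_bandwidth_gbps(WIDTH, HEIGHT, BYTES_PER_PIXEL, 60)

print(f"display plane reads: {plane_read:.2f} GB/s")
print(f"write-back writes:   {writeback:.2f} GB/s")
```

Under these assumptions the display planes and write-back together move roughly 2 GB/s.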
This diagram shows our prediction of the typical bandwidth used by the north bridge clients and the bandwidth typically available to the GPU clients (shown in blue).
Let’s start with the CPU. Although each CPU module can request up to 20.8 GB/s of bandwidth for reads and for writes, the typical bandwidth you should expect is 4 GB/s per CPU module per direction, or about 16 GB/s altogether.
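The 16 GB/s total follows from the per-module figure when there are two CPU modules, which is the module count the text's arithmetic implies. Applying the same multiplication to the peak request rate shows how far below peak the typical load sits (16 / 83.2 is roughly 19%):

```latex
\begin{aligned}
\text{typical} &= 4\,\text{GB/s} \times 2\ \text{modules} \times 2\ \text{directions} = 16\ \text{GB/s} \\
\text{peak request} &= 20.8\,\text{GB/s} \times 2\ \text{modules} \times 2\ \text{directions} = 83.2\ \text{GB/s}
\end{aligned}
```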
You can expect typical bandwidth to be around 3 GB/s per direction for the audio, HDD, camera, and USB clients.
The Kinect sensor is the main consumer of this bandwidth. By comparison, peak bandwidth to and from the HDD is only about 50 MB/s, so the HDD is not a major bandwidth consumer.
Because the GPU is usually pushed to the maximum, you can expect typical coherent bandwidth to be about 25 GB/s. However, this amount depends on how many resources are made snoopable.
Currently, we cannot tell exactly how much of that access will hit the CPU’s caches and how much will go to DRAM. So, as noted above, this figure is highly speculative at the moment.
The estimated 25 GB/s of bandwidth for coherent memory access does not account for the non-coherent memory access of the GPU.
The coherent bandwidth that can flow through the north bridge is limited to 30 GB/s. Under typical conditions, this limit shouldn’t cause you problems. But under heavy coherent memory traffic, the north bridge might become saturated, and once it does, you may notice increased memory access latency.
Neither write-combined CPU memory access nor non-coherent GPU memory access is subject to this limit.
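One way to reason about the 30 GB/s limit is to tag each traffic stream as coherent or not and count only the coherent ones. The sketch below does that; the `Stream` type and function names are hypothetical, the 25 GB/s coherent estimate and the 30 GB/s limit come from the text, the 3 GB/s write-combined CPU stream reuses the non-coherent CPU figure given later, and the 40 GB/s non-coherent GPU stream is purely illustrative.

```python
from dataclasses import dataclass

# Coherent north bridge limit stated in the text, in GB/s.
NB_COHERENT_LIMIT_GBPS = 30.0


@dataclass
class Stream:
    name: str
    gbps: float
    coherent: bool  # False for write-combined CPU or non-coherent GPU access


def coherent_load(streams):
    """Sum only the traffic that is subject to the north bridge limit."""
    return sum(s.gbps for s in streams if s.coherent)


def nb_headroom(streams, limit=NB_COHERENT_LIMIT_GBPS):
    """Remaining coherent capacity; zero or negative means saturated."""
    return limit - coherent_load(streams)


def is_saturated(streams, limit=NB_COHERENT_LIMIT_GBPS):
    return coherent_load(streams) >= limit


example = [
    Stream("coherent traffic (estimate from the text)", 25.0, True),
    Stream("write-combined CPU (assumed 3 GB/s)", 3.0, False),
    Stream("non-coherent GPU (hypothetical)", 40.0, False),
]

print(f"coherent load: {coherent_load(example):.1f} GB/s")
print(f"headroom:      {nb_headroom(example):.1f} GB/s")
print(f"saturated:     {is_saturated(example)}")
```

Because the non-coherent streams are excluded, raising them does not eat into the 5 GB/s of coherent headroom; only more snoopable traffic does.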
Finally, let’s compute how much bandwidth is left for non-coherent GPU access to consume. Let’s assume that:
- The sum of bandwidth from the north bridge to DRAM is 25 GB/s.
- Some portion of the GPU coherent bandwidth misses the CPU’s L2 caches.
- Non-coherent CPU bandwidth is 3 GB/s.
This leaves 42 GB/s of DRAM bandwidth available to the GPU clients.
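The calculation above is a subtraction from the total DRAM bandwidth, which this section does not state. The sketch below therefore takes the total as a parameter; the function name is hypothetical, and it assumes the 25 GB/s north-bridge-to-DRAM figure already includes the coherent GPU requests that miss the CPU caches, so they are not subtracted twice. The 60 GB/s total in the usage line is illustrative only, not the real DRAM bandwidth.

```python
def gpu_dram_available(total_dram_gbps, nb_to_dram_gbps, noncoherent_cpu_gbps):
    """DRAM bandwidth left for non-coherent GPU clients, in GB/s.

    total_dram_gbps is a hypothetical input: the section does not state it.
    nb_to_dram_gbps is assumed to already include coherent GPU requests
    that miss the CPU caches, so those are not subtracted a second time.
    """
    remaining = total_dram_gbps - nb_to_dram_gbps - noncoherent_cpu_gbps
    return max(remaining, 0.0)  # clamp: clients cannot receive negative bandwidth


# 25 and 3 GB/s come from the text; the 60 GB/s total is purely illustrative.
print(gpu_dram_available(60.0, 25.0, 3.0))
```

Keeping the total as an input makes it easy to rerun the estimate once a measured figure for the north bridge traffic replaces the speculative one.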