Subject: cache sizes
Previously, we built tests that read and wrote memory within a single cache line, because the only goal was to find the theoretical maximum speed. Now we also want to see how throughput changes once the data no longer fits in a given cache level.
The test follows the methodology suggested in the Performance-Aware Programming course. As before, we process 1 GB of data in total, but this time the reads cycle over a fixed-size chunk: for example 64 bytes first, then 4 KB, 16 KB, 32 KB, and progressively larger sizes. The L1 data cache on Broadwell CPUs is 32 KB per core, so chunks up to that size should fit entirely in L1. As the chunk grows, the data spills into L2, then L3, and eventually main memory, and the measured bandwidth should drop at each of those boundaries.
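To make that expectation concrete, here is a minimal C sketch that maps a chunk size to the cache level it should fit in. The type `cache_sizes` and the function `expected_level` are hypothetical names for illustration. Only the 32 KB L1 figure comes from the text; the L2 and L3 capacities are parameters you would set for your own CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a cache hierarchy, capacities in bytes.
   Only the 32 KB L1 figure comes from the text; L2 and L3 must be
   filled in for the CPU under test. */
typedef struct {
    size_t l1;
    size_t l2;
    size_t l3;
} cache_sizes;

enum { LEVEL_L1 = 1, LEVEL_L2 = 2, LEVEL_L3 = 3, LEVEL_RAM = 4 };

/* Returns the smallest level whose capacity can hold the whole chunk.
   This is only the naive expectation: it ignores associativity, other
   data competing for the cache, and hardware prefetching. */
static int expected_level(size_t chunk_bytes, cache_sizes c)
{
    assert(chunk_bytes > 0);
    if (chunk_bytes <= c.l1) return LEVEL_L1;
    if (chunk_bytes <= c.l2) return LEVEL_L2;
    if (chunk_bytes <= c.l3) return LEVEL_L3;
    return LEVEL_RAM;
}
```

With assumed capacities of 32 KB, 256 KB, and 4 MB, a 4 KB chunk maps to `LEVEL_L1` and a 16 MB chunk to `LEVEL_RAM`. Real measurements usually degrade a little before each nominal boundary, because the chunk shares the cache with everything else the process touches.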
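; Assumed arguments (Windows x64 calling convention, inferred from register use):
;   rcx = total bytes to read (nonzero multiple of 256)
;   rdx = base address of the buffer
;   r8  = offset mask (presumably chunk size - 1, chunk size a power of two)
align 64
asm_cache_sizes:
    mov rax, rdx            ; rax = current read address
    xor r9, r9              ; r9 = offset within the chunk
.loop:
    ; Read 256 bytes with eight 32-byte loads
    vmovdqu ymm0, [rax+0]
    vmovdqu ymm0, [rax+32]
    vmovdqu ymm0, [rax+64]
    vmovdqu ymm0, [rax+96]
    vmovdqu ymm0, [rax+128]
    vmovdqu ymm0, [rax+160]
    vmovdqu ymm0, [rax+192]
    vmovdqu ymm0, [rax+224]
    ; Advance the offset and wrap it so it stays inside the chunk
    add r9, 256
    and r9, r8
    mov rax, rdx
    add rax, r9
    ; Count down the remaining bytes and loop until none are left
    sub rcx, 256
    jnz .loop
    ret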
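The address arithmetic in the loop is easier to check in a higher-level model. The C sketch below mirrors how `r9` and `rcx` are updated and reports how many distinct bytes the loop actually touches. It assumes the mask passed in `r8` is the chunk size minus one with a power-of-two chunk size, which the listing does not state. The name `model_footprint` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the offset arithmetic of asm_cache_sizes.
   Assumption: mask == chunk_bytes - 1, chunk_bytes a power of two.
   Returns the number of distinct bytes the loop reads (its footprint).
   Offsets start at 0 and grow in contiguous 256-byte steps until they
   wrap, so the footprint is one past the highest byte read. */
static uint64_t model_footprint(uint64_t total_bytes, uint64_t chunk_bytes)
{
    assert(chunk_bytes != 0 && (chunk_bytes & (chunk_bytes - 1)) == 0);
    assert(total_bytes != 0 && total_bytes % 256 == 0);

    uint64_t mask = chunk_bytes - 1;   /* r8 */
    uint64_t offset = 0;               /* r9 */
    uint64_t highest_end = 0;

    for (uint64_t left = total_bytes; left != 0; left -= 256) {  /* rcx */
        /* The eight 32-byte loads cover [offset, offset + 256). */
        if (offset + 256 > highest_end) {
            highest_end = offset + 256;
        }
        offset = (offset + 256) & mask;  /* add r9, 256 / and r9, r8 */
    }
    return highest_end;
}
```

Under that assumption, a 4 KB chunk gives a 4 KB footprint, but a 64-byte chunk still reads 256 bytes per iteration, because the eight loads run before the mask is applied. Chunk sizes below 256 bytes therefore all behave like a 256-byte test, which still fits comfortably in L1.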
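Results (Min and Max appear to refer to the fastest and slowest run, so the Min line shows the higher bandwidth):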
64 bytes: Total time 5.9568 ms.
Min: 1,024 MB at 171.860 GB/s (page faults: 0).
Max: 1,024 MB at 145.850 GB/s.
4 KB: Total time 6.0118 ms.
Min: 171.886 GB/s.
Max: 139.120 GB/s.
32 KB: Total time 6.4872 ms.
Min: 161.549 GB/s.
Max: 94.927 GB/s.
64 KB: Total time 13.0000 ms.
Min: 84.212 GB/s.
Max: 66.529 GB/s.
128 KB: Total time 14.3033 ms.
Min: 76.010 GB/s.
Max: 62.602 GB/s.
256 KB: Total time 16.4507 ms.
Min: 61.058 GB/s.
Max: 43.922 GB/s.
512 KB: Total time 34.1319 ms.
Min: 30.194 GB/s.
Max: 26.114 GB/s.
1 MB: Total time 33.6975 ms.
Min: 30.178 GB/s.
Max: 25.013 GB/s.
2 MB: Total time 36.3865 ms.
Min: 29.990 GB/s.
Max: 23.638 GB/s.
4 MB: Total time 68.4680 ms.
Min: 15.313 GB/s.
Max: 12.097 GB/s.
8 MB: Total time 89.1476 ms.
Min: 11.731 GB/s.
Max: 9.399 GB/s.
16 MB: Total time 88.3279 ms.
Min: 11.443 GB/s.
Max: 9.617 GB/s.
512 MB: Total time 96.4798 ms.
Min: 11.317 GB/s.
Max: 5.715 GB/s.
1 GB: Total time 89.8340 ms.
Min: 11.307 GB/s.
Max: 5.483 GB/s.
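One way to read these numbers is to look for the chunk sizes after which bandwidth falls sharply. The C sketch below scans the Min figures listed above and records every entry followed by a drop of more than a chosen ratio. The function name `find_drops` and the 1.5x threshold used in the note below are arbitrary choices for illustration, not part of the original test.

```c
#include <assert.h>
#include <stddef.h>

/* "Min" bandwidth figures from the results above, in GB/s, ordered by
   chunk size: 64 B, 4 KB, 32 KB, 64 KB, 128 KB, 256 KB, 512 KB,
   1 MB, 2 MB, 4 MB, 8 MB, 16 MB, 512 MB, 1 GB. */
static const double min_gbps[] = {
    171.860, 171.886, 161.549, 84.212, 76.010, 61.058, 30.194,
    30.178, 29.990, 15.313, 11.731, 11.443, 11.317, 11.307
};

/* Stores the index of every entry whose successor is slower by more
   than `ratio`. Returns the number of drops found, which may exceed
   out_cap; only the first out_cap indices are written to `out`. */
static size_t find_drops(const double *gbps, size_t count, double ratio,
                         size_t *out, size_t out_cap)
{
    assert(ratio > 1.0);
    size_t found = 0;
    for (size_t i = 0; i + 1 < count; ++i) {
        if (gbps[i] > gbps[i + 1] * ratio) {
            if (found < out_cap) {
                out[found] = i;
            }
            ++found;
        }
    }
    return found;
}
```

With a ratio of 1.5, the scan flags three entries: 32 KB, 256 KB, and 2 MB. That fits a 32 KB L1 and a 256 KB L2, and suggests the L3 on this machine lies somewhere between 2 MB and 4 MB, although the exact capacity cannot be read off these measurements alone.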