Subject: Profiling the game and renderer
Key Goals: Profile the current performance of the renderer.
Measured Components:
- The entire game loop (one frame);
- Render Function: Reads sprites from a .bmp atlas and writes to the screen buffer (passed to the OS for display);
- Inner Loop: The core pixel-processing logic within the render function.
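To make the terms used in the results concrete (an ID per region, a cycle total, and a hit count), here is a minimal sketch of how such counters could be accumulated. It is not the profiler used for these measurements: the names Zone, Profiler and run_frame are hypothetical, and time.perf_counter_ns stands in for a hardware cycle counter. Timing is inclusive, so the outer region's total contains the time spent in the regions nested inside it.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Zone:
    """Accumulated counters for one profiled region (one ID in the results)."""
    label: str
    elapsed_ns: int = 0       # stand-in for the cycle total
    hits: int = 0             # number of times the region was entered
    bytes_processed: int = 0  # optional payload size attributed to the region


class Profiler:
    """Hypothetical profiler: accumulates inclusive time and hits per zone ID."""

    def __init__(self):
        self.zones = {}

    @contextmanager
    def zone(self, zone_id, label, byte_count=0):
        z = self.zones.setdefault(zone_id, Zone(label))
        start = time.perf_counter_ns()
        try:
            yield z
        finally:
            z.elapsed_ns += time.perf_counter_ns() - start
            z.hits += 1
            z.bytes_processed += byte_count

    def report(self):
        lines = []
        for zone_id, z in sorted(self.zones.items()):
            per_hit = z.elapsed_ns // z.hits if z.hits else 0
            lines.append(
                f"ID {zone_id} ({z.label}): {z.elapsed_ns} ns "
                f"({z.hits} hits, {per_hit} ns/hit)"
            )
        return lines


def run_frame(profiler, sprites):
    """Hypothetical frame: one render call per sprite, one inner-loop hit per pixel.

    `sprites` is a list of (pixel_count, byte_count) pairs.
    """
    with profiler.zone(1, "Game loop"):
        for pixel_count, byte_count in sprites:
            with profiler.zone(2, "Render function", byte_count=byte_count):
                for _ in range(pixel_count):
                    with profiler.zone(3, "Inner loop"):
                        pass  # per-pixel work would go here
```

Calling run_frame(profiler, [(16, 2), (4, 1)]) records 1 hit for ID 1, 2 hits for ID 2 and 20 hits for ID 3, which is the same shape as the 1 / 7 / 274,688 hit counts reported in the results.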
Results:
First Frame:
- ID 1 (The entire game loop, one frame): 300,253,484 cycles (1 hit, 1,982 page faults);
- ID 2 (Render Function): 17,176,904 cycles (7 hits, 34,336 bytes processed);
- ID 3 (Inner Loop): 8,133,871 cycles (274,688 hits, 29 cycles/hit).
Nth Frame:
- ID 1 (The entire game loop, one frame): 17,155,770 cycles (1 hit);
- ID 2 (Render Function): 17,034,994 cycles (7 hits, 34,336 bytes processed);
- ID 3 (Inner Loop): 8,477,079 cycles (274,688 hits, 30 cycles/hit).
There is a significant disparity between the first frame and subsequent frames in the game loop: the first frame took 300,253,484 cycles, while later (Nth) frames took only 17,155,770 cycles, roughly 17.5 times fewer. We will explore the reason for that later. Notably, in later frames the render function (ID 2) accounts for nearly all of the frame time: about 17.0 million of the 17.2 million cycles per frame, or roughly 99%.
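The ratios behind these observations can be recomputed directly from the reported counters. The sketch below only restates the numbers from the results; the dictionary layout and function names are illustrative choices, not part of the profiler.

```python
# Cycle and hit counts copied from the results above.
FIRST_FRAME = {
    "game_loop": {"cycles": 300_253_484, "hits": 1},
    "render": {"cycles": 17_176_904, "hits": 7},
    "inner_loop": {"cycles": 8_133_871, "hits": 274_688},
}
NTH_FRAME = {
    "game_loop": {"cycles": 17_155_770, "hits": 1},
    "render": {"cycles": 17_034_994, "hits": 7},
    "inner_loop": {"cycles": 8_477_079, "hits": 274_688},
}


def cycles_per_hit(zone):
    """Average cost of one entry into a profiled region."""
    return zone["cycles"] / zone["hits"]


def share(frame, part, whole):
    """Fraction of `whole`'s cycles that were spent inside `part`."""
    return frame[part]["cycles"] / frame[whole]["cycles"]


def first_to_nth_ratio(name):
    """How many times more expensive a region was in the first frame."""
    return FIRST_FRAME[name]["cycles"] / NTH_FRAME[name]["cycles"]


if __name__ == "__main__":
    print(f"inner loop, first frame: {cycles_per_hit(FIRST_FRAME['inner_loop']):.2f} cycles/hit")
    print(f"inner loop, Nth frame:   {cycles_per_hit(NTH_FRAME['inner_loop']):.2f} cycles/hit")
    print(f"render share of Nth frame:   {share(NTH_FRAME, 'render', 'game_loop'):.1%}")
    print(f"render share of first frame: {share(FIRST_FRAME, 'render', 'game_loop'):.1%}")
    print(f"inner loop share of render (Nth): {share(NTH_FRAME, 'inner_loop', 'render'):.1%}")
    print(f"game loop, first / Nth: {first_to_nth_ratio('game_loop'):.1f}x")
```

Two things stand out. The reported 29 and 30 cycles/hit are truncated values (the averages are about 29.6 and 30.9). And while the render function is about 99% of an Nth frame, it is under 6% of the first frame, so the extra cost of the first frame lies outside the renderer. The inner loop itself accounts for roughly half of the render function's cycles.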
Sprite Memory Breakdown (First Frame):
A typical frame renders multiple sprites. A sprite is an image composed of pixels, where each pixel is a 32-bit value: 24 bits for color (8 bits each for red, green, and blue) and 8 bits for the alpha channel (transparency).
- Background: (512 * 512) / 8 = 32,768 bytes;
- Main Character: (64 * 32) / 8 = 256 bytes;
- Main Button: (64 * 64) / 8 = 512 bytes;
- Blank Position Button: (64 * 64) / 8 = 512 bytes;
- Position Icon: (32 * 32) / 8 = 128 bytes;
- Arrow Icon: (16 * 16) / 8 = 32 bytes;
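- Total: 34,336 bytes per frame (the six sprites listed above sum to 34,208 bytes; the remaining 128 bytes presumably belong to the seventh render call, which is not itemized here).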
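As a cross-check, the breakdown can be recomputed from the sprite dimensions and compared against the profiler counters. The sketch below uses the same pixels / 8 convention as the list above; the sprite table and function names are illustrative.

```python
# Sprite dimensions (width, height) in pixels, copied from the breakdown above.
SPRITES = {
    "Background": (512, 512),
    "Main Character": (64, 32),
    "Main Button": (64, 64),
    "Blank Position Button": (64, 64),
    "Position Icon": (32, 32),
    "Arrow Icon": (16, 16),
}

# Counters reported by the profiler for one frame.
REPORTED_BYTES = 34_336
REPORTED_INNER_LOOP_HITS = 274_688


def pixel_count(size):
    width, height = size
    return width * height


def listed_pixels():
    """Total pixels over the sprites that are itemized in the breakdown."""
    return sum(pixel_count(size) for size in SPRITES.values())


def listed_bytes():
    """Total bytes using the breakdown's own convention (pixels / 8)."""
    return sum(pixel_count(size) // 8 for size in SPRITES.values())


if __name__ == "__main__":
    print("listed pixels:", listed_pixels())
    print("listed bytes: ", listed_bytes())
    print("unlisted bytes: ", REPORTED_BYTES - listed_bytes())
    print("unlisted pixels:", REPORTED_INNER_LOOP_HITS - listed_pixels())
```

The six listed sprites come to 273,664 pixels and 34,208 bytes, which is 1,024 pixels and 128 bytes short of the reported counters. That gap matches the render function's 7 hits against 6 listed sprites; one possible reading (an assumption, not something stated in the measurements) is an additional sprite of 1,024 pixels, e.g. 32 * 32. Note also that the inner-loop hit count equals the reported byte count times 8, so the inner loop appears to run once per pixel.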
Memory Throughput Analysis:
At 30 FPS, the renderer processes 34,336 bytes * 30 = 1,030,080 bytes/sec (~0.00096 GB/s). This is orders of magnitude below the memory bandwidth of modern hardware (typically tens of GB/s), which suggests the renderer is compute-bound rather than memory-bound. However, before drawing any conclusions, let's run tests to validate system-level performance.
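The throughput figure can be reproduced with a few lines of arithmetic. The sketch below also computes a second estimate under an explicit assumption that is not confirmed by the measurements: that each of the 274,688 inner-loop hits reads one 32-bit (4-byte) pixel. Function names are illustrative.

```python
GIB = 1024 ** 3  # the ~0.00096 figure corresponds to bytes / 2**30


def throughput_bytes_per_sec(bytes_per_frame, fps=30):
    """Bytes processed per second at a fixed frame rate."""
    return bytes_per_frame * fps


def to_gib_per_sec(bytes_per_sec):
    return bytes_per_sec / GIB


# Estimate from the profiler's reported byte count.
reported = throughput_bytes_per_sec(34_336)

# ASSUMPTION: one 4-byte pixel read per inner-loop hit (274,688 hits per frame).
pixel_based = throughput_bytes_per_sec(274_688 * 4)

if __name__ == "__main__":
    print(f"reported:    {reported:,} bytes/sec = {to_gib_per_sec(reported):.5f} GiB/s")
    print(f"pixel-based: {pixel_based:,} bytes/sec = {to_gib_per_sec(pixel_based):.5f} GiB/s")
```

Even under the larger pixel-based estimate (about 33 MB/sec of sprite reads, ignoring writes to the screen buffer), the traffic stays far below typical memory bandwidth, so the qualitative conclusion does not depend on which byte count is used.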
- Overview;
- Profiling the game code;
- File I/O and page faults;
- Measuring memory bandwidth;
- Instruction decoding;
- Testing branch prediction;
- Execution ports and schedulers;
- Cache sizes and bandwidth;
- Introducing SIMD;
- Multithreading.
In progress