DEV Community

dima853


Branch prediction (all processors)

Document - https://www.agner.org/optimize/microarchitecture.pdf

The pipeline in a modern microprocessor contains many stages, including instruction fetch, decoding, register allocation and renaming, µop reordering, execution, and retirement.

Handling instructions in a pipelined manner allows the microprocessor to do many things at the same time.

Branch prediction (in processors) is a technique used by modern processors to improve performance by guessing which branch in a conditional statement (e.g., if-else) will be executed.

📝 Summary: Pipeline and Branch Prediction in the CPU

🔹 1. Pipelining

Modern CPUs break instruction execution down into stages:

  1. Fetch – loading instructions from memory.
  2. Decode – decoding into micro-operations (µops).
  3. Rename/Allocate – renaming registers (for out-of-order execution).
  4. Execute – execution (ALU, memory access).
  5. Retire – committing the result (writing to registers/memory).

🔻 Problem: Branches interfere with the pipeline, since the CPU does not know which instruction comes next until the condition has been evaluated.


🔹 2. Speculative Execution

Speculative execution is an optimization technique where a computer system performs some task that may not be needed.

The CPU predicts which code branch will execute and starts executing it before the condition is checked.

🔸 How does it work?

  • If the prediction is correct → the results are committed (retired).
  • If it is incorrect → the pipeline is flushed and the correct branch is executed.

🔸 What is dangerous?

  • Spectre attacks use speculative execution to leak data.


Spectre (security vulnerability)

Spectre is one of the speculative execution CPU vulnerabilities that involve microarchitectural side-channel attacks. These affect modern microprocessors that perform branch prediction and other forms of speculation.

Spectre: CPU vulnerability analysis, operation mechanism and protection


1. What is Spectre?

Spectre (CVE-2017-5753, CVE-2017-5715) is a class of side-channel vulnerabilities that abuses speculative execution in modern processors to read protected data from memory.

Key Features:

  • Almost all modern CPUs are affected (Intel, AMD, ARM, IBM)
  • Cannot be fully fixed by a microcode patch alone (software changes are also required)
  • Can allow an attacker to read the memory of the kernel, other processes, and the hypervisor

2. How does Spectre work?

🔹 The main components of the attack

  1. Speculative Execution
    • The CPU predicts branches and executes code before the condition has been checked
    • The results are not committed, but they leave traces in the cache
  2. Timing Attack
    • Measuring data access time reveals whether secret-dependent data was speculatively loaded into the cache
  3. Cache Side-Channel Attack
    • Using Flush+Reload or Prime+Probe to extract the data


🔹 Detailed mechanism (using Spectre v1 as an example)

Step 1: Mistraining the Branch Predictor

if (x < array1_size) { // The CPU may speculatively execute this block even if x >= array1_size
    value = array2[array1[x] * 256]; // Reading outside the array!
}
  • The attacker chooses x so that the bounds check is speculatively bypassed
  • The CPU speculatively loads array1[x]

Step 2: Impact on Cache

  • The processor pulls array2[array1[x] * 256] into the cache
  • Even after the mispredicted branch is rolled back, the cache state remains changed

Step 3: Reading through Timing

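// Attacker measures access time to array2 elements.
// measure_access_time() and CACHE_HIT_THRESHOLD are placeholders (assumed, not defined here).
for (int i = 0; i < 256; i++) {
    if (measure_access_time(&array2[i * 256]) < CACHE_HIT_THRESHOLD) { // Quick access = was in cache
        // i is the guessed value of array1[x]
    }
}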
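The three steps above can be walked through without touching real hardware. What follows is a toy simulation, not a working exploit: the cache is modelled as a plain array of flags, `speculative_victim` stands in for the mispredicted branch body, and `probe_leaked_byte` plays the role of the attacker's timing loop. All names (`line_cached`, `memory`, `flush_probe_lines`, and so on) are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_LINES 256   /* one probe line per possible byte value */

/* Toy cache model (assumption): true means that line of array2 is cached. */
static bool line_cached[NUM_LINES];

/* Victim memory: array1 occupies the first 4 bytes, a secret sits right behind it. */
static const unsigned char memory[] = { 1, 2, 3, 4, 'S' };
static const size_t array1_size = 4;

/* Attacker, step 0: flush every probe line from the simulated cache. */
static void flush_probe_lines(void) {
    memset(line_cached, 0, sizeof line_cached);
}

/* Architectural behaviour: the bounds check holds, so an
 * out-of-range x touches nothing. */
static void architectural_victim(size_t x) {
    if (x < array1_size) {
        line_cached[memory[x]] = true;
    }
}

/* Models the mispredicted branch body: the bounds check is skipped,
 * the byte behind array1 is read, and the dependent access
 * array2[byte * 256] pulls one probe line into the cache.
 * The architectural result is discarded; only the cache state survives. */
static void speculative_victim(size_t x) {
    unsigned char byte = memory[x];   /* may lie outside array1 */
    line_cached[byte] = true;         /* side effect left in the cache */
}

/* Attacker, step 3: a "fast" (cached) line reveals the byte value.
 * Returns -1 if no probe line is cached. */
static int probe_leaked_byte(void) {
    for (int i = 0; i < NUM_LINES; i++) {
        if (line_cached[i]) {
            return i;
        }
    }
    return -1;
}
```

In this model, calling `flush_probe_lines()`, then `speculative_victim(4)`, then `probe_leaked_byte()` returns the byte stored just past `array1`. On real hardware the "is this line cached" flag is not visible, which is why the attacker has to infer it from access timing.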

3. Spectre Variants

| Version | CVE | Description |
|---|---|---|
| Spectre v1 | CVE-2017-5753 | Bounds Check Bypass |
| Spectre v2 | CVE-2017-5715 | Injection into the Branch Target Buffer (BTB) → arbitrary speculative code |
| Spectre v4 | CVE-2018-3639 | Speculative Store Bypass (read after write) |

4. Protection methods

🔹 Hardware (microcode/architecture)

  1. IBRS/STIBP (Intel)
    • BTB isolation between privilege levels and sibling hardware threads
  2. Enhanced IBRS (Ice Lake+)
    • Hardware restriction of indirect branch speculation

🔹 Software (compilers/OS)

  1. Retpoline (Google)
    • Replacing indirect calls with protected sequences
    • Example: -mretpoline (Clang) or -mindirect-branch=thunk (GCC)
  2. LFENCE barriers
  3. Elimination of branches
  4. KPTI (Kernel Page Table Isolation)
    • Separation of the kernel and user page tables (primarily a Meltdown mitigation)
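As one illustration of a software mitigation for Spectre v1, the bounds check can be backed by a branchless index mask, so that even a mispredicted branch cannot form an attacker-chosen out-of-bounds address. This is a minimal sketch in the spirit of the Linux kernel's `array_index_nospec()`, not a copy of it; the helper names `index_mask` and `read_clamped` are hypothetical. A production version also has to stop the compiler from optimizing the mask away (the kernel uses inline assembly for that), which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns all ones if index < size, otherwise zero, without branching.
 * Assumes size is at most 2^63. */
static uint64_t index_mask(uint64_t index, uint64_t size) {
    /* The top bit of (index | (size - 1 - index)) is set when index is
     * huge or when size - 1 - index wrapped around, i.e. index >= size. */
    uint64_t out_of_bounds = (index | (size - 1 - index)) >> 63;  /* 0 or 1 */
    return out_of_bounds - 1;   /* 0 - 1 = all ones, 1 - 1 = 0 */
}

/* Hypothetical bounds-checked read: returns 0 for out-of-range indices. */
static unsigned char read_clamped(const unsigned char *array, size_t size, size_t x) {
    if (x < size) {
        /* Under misprediction x may be out of range here; the mask then
         * forces the index to 0 instead of an attacker-chosen offset.
         * NOTE: an optimizing compiler may remove this line, see above. */
        x &= (size_t)index_mask(x, size);
        return array[x];
    }
    return 0;
}
```

The design choice here is to turn a control dependency (the branch) into a data dependency (the mask), because the CPU can guess a branch outcome but still has to wait for the masked index before it can issue the load.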

5. Current status

  • Spectre is still relevant (new variants appear regularly)
  • Mitigations reduce performance (up to 30% for some workloads)
  • Best prevention:
    • CPU microcode updates
    • Compiling with retpolines enabled (for example, -mretpoline in Clang)
    • Disabling hyper-threading on critical systems

📚 Sources

  1. Original Spectre paper (P. Kocher et al.)
  2. Intel Analysis of Speculative Execution
  3. ARM Mitigations Guide

🔹 3. Branch Prediction

The CPU has to answer two questions:

  1. Will the branch be taken?
    • Predicted from the history of previous jumps (for example, a 1-bit or 2-bit predictor).
    • If a branch is frequently taken → the CPU predicts "taken".
  2. Where will it jump?
    • BTB (Branch Target Buffer) – a cache of jump target addresses.
    • Stores the target address for each jump (jump/call).
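The 2-bit predictor mentioned above can be sketched as a saturating counter. This is a simplified textbook model, not the predictor of any particular CPU; the type and function names (`predictor2`, `predictor2_predict`, `predictor2_update`, `count_mispredictions`) are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* 2-bit saturating counter: states 0 and 1 predict "not taken",
 * states 2 and 3 predict "taken". */
typedef struct {
    unsigned state;   /* 0..3 */
} predictor2;

static bool predictor2_predict(const predictor2 *p) {
    return p->state >= 2;
}

/* Moves the counter one step toward the actual outcome,
 * saturating at 0 and 3. */
static void predictor2_update(predictor2 *p, bool taken) {
    if (taken) {
        if (p->state < 3) p->state++;
    } else {
        if (p->state > 0) p->state--;
    }
}

/* Replays a branch history and counts how often the prediction was wrong. */
static size_t count_mispredictions(predictor2 *p, const bool *outcomes, size_t n) {
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        if (predictor2_predict(p) != outcomes[i]) {
            misses++;
        }
        predictor2_update(p, outcomes[i]);
    }
    return misses;
}
```

For a loop branch that is taken nine times and then falls through, repeated twice and starting from state 0, this model mispredicts four times: twice while warming up and once at each loop exit. The single "not taken" at the exit only moves the counter from 3 to 2, so the next entry into the loop is still predicted correctly, which is the advantage over a 1-bit scheme.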

🔸 Problems:

  • The BTB overflows → different jumps evict each other → prediction errors.
  • Conditional jumps (JZ, JNZ) are predicted worse than unconditional ones (JMP).

The Branch Target Buffer (BTB) is a cache in the processor that remembers where branch instructions jumped to (for example, the jump generated for an if). It predicts where the program will go next time, so that the processor does not wait but immediately starts fetching and executing the predicted code. This speeds things up, but when the prediction is wrong, the speculative work has to be discarded and redone.
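To make the "jumps evict each other" problem concrete, here is a small sketch of a direct-mapped BTB. The size, the indexing by `branch_addr % BTB_ENTRIES`, and all names (`btb`, `btb_lookup`, `btb_update`) are assumptions chosen for readability; real BTBs are much larger, usually set-associative, and index on selected address bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTB_ENTRIES 16   /* illustrative size only */

typedef struct {
    bool     valid;
    uint64_t branch_addr;   /* tag: address of the branch instruction */
    uint64_t target_addr;   /* last observed target */
} btb_entry;

typedef struct {
    btb_entry entries[BTB_ENTRIES];
} btb;

static unsigned btb_index(uint64_t branch_addr) {
    return (unsigned)(branch_addr % BTB_ENTRIES);
}

/* Looks up a predicted target. Returns true and writes *target on a hit. */
static bool btb_lookup(const btb *b, uint64_t branch_addr, uint64_t *target) {
    const btb_entry *e = &b->entries[btb_index(branch_addr)];
    if (e->valid && e->branch_addr == branch_addr) {
        *target = e->target_addr;
        return true;
    }
    return false;
}

/* Records the resolved target, evicting whatever occupied the slot. */
static void btb_update(btb *b, uint64_t branch_addr, uint64_t target_addr) {
    btb_entry *e = &b->entries[btb_index(branch_addr)];
    e->valid = true;
    e->branch_addr = branch_addr;
    e->target_addr = target_addr;
}
```

In this model the branches at 0x1000 and 0x1010 map to the same slot, so recording one evicts the other and the next lookup for the evicted branch misses. Code with more hot branches than the BTB can hold shows the same effect on a larger scale.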


🔹 4. Prediction Error (Branch Misprediction)

🔸 What happens?

  • The pipeline is flushed.
  • The CPU loses roughly N clock cycles, where N is on the order of the pipeline length (for example, about 14 clock cycles on Intel Skylake).

🔸 How to minimize it?

  • Branch reduction (replacing if with arithmetic or bit operations).
  • Compiler hints (__builtin_expect in GCC/Clang for C/C++).
// Better (fewer branches):

x = (a > b) * c;

// Worse (branch misprediction):
if (a > b) x = c; else x = 0;
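Both techniques can be combined in one small sketch. `__builtin_expect` is a GCC/Clang builtin that tells the compiler which outcome is expected so it can lay out the hot path favorably; it does not program the hardware predictor directly. The `LIKELY`/`UNLIKELY` macros and the functions `sum_checked` and `select_if_greater` are illustrative names, not part of any library.

```c
#include <stddef.h>

/* GCC/Clang builtin; other compilers fall back to the plain condition. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

/* Sums non-negative values; a negative value is treated as a rare error. */
static long sum_checked(const int *values, size_t n, int *error) {
    long total = 0;
    *error = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(values[i] < 0)) {   /* cold path, expected to be rare */
            *error = 1;
            return -1;
        }
        total += values[i];
    }
    return total;
}

/* Branchless select from the example above: c if a > b, otherwise 0. */
static int select_if_greater(int a, int b, int c) {
    return (a > b) * c;
}
```

Whether either form is faster depends on the compiler and the data: modern compilers often turn the simple if/else into a conditional move on their own, and a well-predicted branch can beat branchless code, so it is worth measuring before rewriting.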

📌 Conclusion: Key Points

✅ Pipeline – execution of instructions in stages for parallelism.

✅ Speculative execution – the CPU guesses the branch and executes it in advance.

✅ BTB – a cache of jump addresses for fast prediction.

✅ Prediction error → pipeline flush → loss of performance.

Coming next (in detail):
3.1 Prediction methods for conditional jumps

See you later :)
