Move OpenMP call(s) outside the loop body

OpenMP runtime calls inside a loop body prevent automatic vectorization when the compiler cannot move them outside the loop, for example when the calls are not loop-invariant. To fix:
  1. Split the OpenMP parallel loop directive into two directives.
       Target   Directive
       Outer    #pragma omp parallel [clause, clause, ...]
       Inner    #pragma omp for [clause, clause, ...]
  2. Move the OpenMP calls outside the loop when possible.

Example (original code)

#pragma omp parallel for private(tid, nthreads)
for (int k = 0; k < N; k++)
{
    tid = omp_get_thread_num(); // this call inside loop prevents vectorization
    nthreads = omp_get_num_threads(); // this call inside loop prevents vectorization
    ...
}

Example (revised code)

#pragma omp parallel private(tid, nthreads)
{
    // Move OpenMP calls here
    tid = omp_get_thread_num();
    nthreads = omp_get_num_threads();

    #pragma omp for nowait
    for (int k = 0; k < N; k++)
    {
        ...
    }
}
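
Example (self-contained sketch)

The revised pattern can also be written as a complete function. This is only a minimal sketch, not part of the original recommendation: the function name scale_array, its parameters, and the scaling operation are hypothetical, and the serial fallback stubs are an assumption added so that the code still builds when OpenMP is not enabled.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch builds without OpenMP support. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: writes in[k] * factor to out[k] for k in [0, n)
   and returns the number of threads in the parallel team. */
int scale_array(const float *in, float *out, int n, float factor)
{
    assert(in != NULL && out != NULL && n >= 0);
    int team_size = 1;

    #pragma omp parallel
    {
        /* OpenMP calls are made once per thread, outside the loop body. */
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();

        /* Only thread 0 writes the shared variable, so there is no race. */
        if (tid == 0)
            team_size = nthreads;

        /* The loop body contains no OpenMP calls. */
        #pragma omp for
        for (int k = 0; k < n; k++)
            out[k] = in[k] * factor;
    }

    return team_size;
}
```

Declaring tid and nthreads inside the parallel region makes them private to each thread without a private clause. Whether the compiler then vectorizes the inner loop still depends on the loop body and the compiler options in use.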
