OpenMP* runtime calls inside a loop body prevent automatic vectorization when the compiler cannot move them outside the loop, for example when the calls are not loop-invariant. To fix:
- Split the combined OpenMP parallel loop directive into two directives:

  | Target | Directive |
  |--------|-----------|
  | Outer  | `#pragma omp parallel [clause, clause, ...]` |
  | Inner  | `#pragma omp for [clause, clause, ...]` |

- When possible, move the OpenMP calls out of the loop and into the region between the outer and inner directives.
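The two steps above can be combined into a compilable sketch. This is illustrative only. The function name `add_with_owner`, its parameters, and the loop body are hypothetical: the body does an element-wise add and also records which thread handled each element. The `#ifdef _OPENMP` fallbacks exist only so the sketch also builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: only so the sketch builds without -fopenmp). */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical example: c[k] = a[k] + b[k], and owner[k] records the
 * thread that processed element k. The team size is written to *nthreads_out. */
void add_with_owner(const float *a, const float *b, float *c,
                    int *owner, int n, int *nthreads_out)
{
    int tid, nthreads;

    /* Outer directive: creates the thread team once. */
    #pragma omp parallel private(tid, nthreads)
    {
        /* OpenMP calls hoisted out of the loop: each thread makes them once. */
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();

        /* A single thread reports the team size. */
        #pragma omp single
        *nthreads_out = nthreads;

        /* Inner directive: work-sharing loop with no OpenMP calls in its body.
         * Inside the loop, tid is a loop-invariant value. */
        #pragma omp for nowait
        for (int k = 0; k < n; k++) {
            c[k] = a[k] + b[k];
            owner[k] = tid;
        }
    }
}
```

Each thread reads `tid` once, before the loop starts, so the `omp for` body contains only arithmetic and a loop-invariant store. That leaves the compiler free to vectorize it. Whether it actually does so still depends on the compiler and its options.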
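For example, the calls in this loop are executed on every iteration and block vectorization:

```c
int tid, nthreads;

#pragma omp parallel for private(tid, nthreads)
for (int k = 0; k < N; k++)
{
    tid = omp_get_thread_num();       // this call inside the loop prevents vectorization
    nthreads = omp_get_num_threads(); // this call inside the loop prevents vectorization
    // ... rest of the loop body
}
```

After the directive is split, each thread makes the calls once, outside the loop:

```c
int tid, nthreads;

#pragma omp parallel private(tid, nthreads) // outer directive: creates the thread team
{
    // Move OpenMP calls here: they run once per thread, not once per iteration
    tid = omp_get_thread_num();
    nthreads = omp_get_num_threads();

    #pragma omp for nowait // inner directive: shares the loop iterations among threads
    for (int k = 0; k < N; k++)
    {
        // ... rest of the loop body (no OpenMP calls)
    }
}
```

For more information, see:
- omp for, omp parallel recommendations in OpenMP* Pragmas Summary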
- Getting Started with Intel Compiler Pragmas and Directives and Vectorization Resources for Intel® Advisor Users