Change the chunk size

The loop is threaded and vectorized using the #pragma omp parallel for simd directive, which parallelizes the loop across threads and vectorizes it with SIMD instructions. Specifically, the directive divides the loop iterations into chunks (subsets) and distributes the chunks among threads; the iterations within each chunk then execute concurrently using SIMD instructions. In this case, the chunk size (number of iterations per chunk) is not a multiple of the vector length, which leaves remainder iterations that cannot fill a full vector. To fix: Add a schedule(simd: [kind]) modifier to the #pragma omp parallel for simd directive.

Example

void f(int a[], int b[], int c[], int n)
{
    // Guarantee a multiple of vector length.
    #pragma omp parallel for simd schedule(simd: static)
    for (int i = 0; i < n; i++)
    {
        a[i] = b[i] + c[i];
    }
}
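
The effect of the simd modifier on chunk size can be sketched with a little arithmetic. According to the OpenMP specification, when the simd modifier is present, the chunk size for all chunks except the first and last is rounded up to ceil(chunk_size / simd_width) * simd_width, where simd_width is an implementation-defined value. The helper names below (simd_rounded_chunk, remainder_iterations) are hypothetical and only illustrate that rule; they are not part of OpenMP or of any compiler runtime, and the simd_width you pass in is an assumption about your target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round a requested chunk size up to the next
   multiple of the SIMD width, mirroring the rule the OpenMP simd
   schedule modifier applies to all chunks except the first and last.
   simd_width is implementation-defined in OpenMP; here the caller
   supplies an assumed value. */
static size_t simd_rounded_chunk(size_t chunk_size, size_t simd_width)
{
    assert(simd_width > 0);
    return (chunk_size + simd_width - 1) / simd_width * simd_width;
}

/* Hypothetical helper: number of iterations in a chunk that do not
   fill a complete vector when the chunk size is used as-is. */
static size_t remainder_iterations(size_t chunk_size, size_t simd_width)
{
    assert(simd_width > 0);
    return chunk_size % simd_width;
}
```

For example, assuming a vector length of 8 elements, a chunk of 10 iterations leaves 2 iterations outside a full vector, while the rounded chunk size of 16 leaves none.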
