Parallelize the loop with both threads and SIMD instructions

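The loop is threaded and auto-vectorized; however, the trip count is not a multiple of the vector length. To fix, do all of the following: use the #pragma omp parallel for simd directive so the loop is parallelized with both threads and SIMD instructions, and add the schedule(simd:static) modifier, as in the revised code, so each thread's chunk size is a multiple of the vector length.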

Example (original code)

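/* n: number of elements in a, b, and c */
void f(int n, int a[], int b[], int c[])
{
    /* Threads only: chunk sizes may not be a multiple of the vector length */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
    {
        a[i] = b[i] + c[i];
    }
}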
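
To make the problem concrete, here is a small sketch of the arithmetic behind the remainder iterations. It assumes a ceil(n / threads) split for schedule(static); the actual split is implementation-defined, and the helper names static_chunk and remainder_iterations are hypothetical, not part of OpenMP.

```c
#include <assert.h>

/* Iterations per thread under schedule(static) with no chunk size,
   assuming a ceil(n / threads) split (an assumption: the exact
   split is implementation-defined). */
static int static_chunk(int n, int threads)
{
    assert(n >= 0 && threads > 0);
    return (n + threads - 1) / threads;
}

/* Iterations of one chunk that do not fill a whole vector and
   therefore end up in a scalar or masked remainder loop. */
static int remainder_iterations(int chunk, int vector_length)
{
    assert(chunk >= 0 && vector_length > 0);
    return chunk % vector_length;
}
```

For example, with n = 1000, 6 threads, and an assumed vector length of 8 ints, static_chunk(1000, 6) gives 167 and remainder_iterations(167, 8) gives 7, so a thread can run 7 leftover iterations even though 1000 itself is a multiple of 8.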

Example (revised code)

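/* n: number of elements in a, b, and c */
void f(int n, int a[], int b[], int c[])
{
    /* Threads and SIMD: the simd modifier rounds chunk sizes up to a
       multiple of the vector length */
    #pragma omp parallel for simd schedule(simd:static)
    for (int i = 0; i < n; i++)
    {
        a[i] = b[i] + c[i];
    }
}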
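
The effect of the simd schedule modifier can be sketched as a rounding step. The OpenMP specification describes the adjusted chunk size as ceil(chunk_size / simd_width) * simd_width, where simd_width is implementation-defined. The helper name simd_rounded_chunk below is hypothetical and only illustrates that formula; the first and last chunks may be sized differently.

```c
#include <assert.h>

/* Round a chunk size up to the next multiple of the SIMD width,
   mirroring ceil(chunk_size / simd_width) * simd_width.
   simd_width is chosen by the implementation; any value passed
   here is an assumption made for illustration. */
static int simd_rounded_chunk(int chunk_size, int simd_width)
{
    assert(chunk_size > 0 && simd_width > 0);
    return (chunk_size + simd_width - 1) / simd_width * simd_width;
}
```

With an assumed SIMD width of 8, a chunk of 167 iterations becomes 168, so every chunk except possibly the last one runs entirely in full vectors.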
