Split loop into smaller loops

Possible register spilling, combined with high vector register pressure, is preventing effective vectorization. To fix: Use the directive #pragma distribute_point, or rewrite your code, to distribute (split) the source loop into smaller loops. Each smaller loop keeps fewer values live at once, which can decrease register pressure, enable software pipelining, and improve both instruction and data cache use.

Example

...
for (i=0; i< NUM; i++)
{
    ...
    c[i] = c[i] +i;
    #pragma distribute_point
    x[i] = x[i] +i;
    ...
}
...

#define NUM 1024
void loop_distribution_pragma2(
       double a[NUM], double b[NUM], double c[NUM],
       double x[NUM], double y[NUM], double z[NUM] )
{
    int i;
    // The compiler attempts to split (distribute) this loop at the pragma below.
    for (i=0; i< NUM; i++)
    {
        a[i] = a[i] +i;
        b[i] = b[i] +i;
        c[i] = c[i] +i;
        #pragma distribute_point
        x[i] = x[i] +i;
        y[i] = y[i] +i;
        z[i] = z[i] +i;
    }
}
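
The alternative named above, rewriting the code to distribute the loop by hand, might look like the following minimal sketch. The function name loop_distribution_manual is hypothetical and not part of the original example, and the sketch assumes the six arrays do not overlap in memory, so running the two statement groups in separate loops gives the same results as the single loop.

```c
#define NUM 1024

/* Hypothetical hand-distributed counterpart of loop_distribution_pragma2.
 * Assumption: the six arrays do not overlap in memory, so executing the
 * two statement groups in separate loops leaves the results unchanged. */
void loop_distribution_manual(
       double a[NUM], double b[NUM], double c[NUM],
       double x[NUM], double y[NUM], double z[NUM] )
{
    int i;

    /* First loop: only a, b, and c are touched in each iteration. */
    for (i = 0; i < NUM; i++)
    {
        a[i] = a[i] + i;
        b[i] = b[i] + i;
        c[i] = c[i] + i;
    }

    /* Second loop: only x, y, and z are touched in each iteration. */
    for (i = 0; i < NUM; i++)
    {
        x[i] = x[i] + i;
        y[i] = y[i] + i;
        z[i] = z[i] + i;
    }
}
```

Each loop now works on three arrays instead of six, which is the effect the pragma asks the compiler to produce. Whether the split actually helps depends on the real loop body and the target, so it is worth re-checking the vectorization report after the change.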
