The compiler generated a masked, vectorized remainder loop that has too few iterations for efficient vector processing, so a scalar remainder loop may perform better. To fix: force scalar remainder generation with the directive #pragma vector novecremainder, placed immediately before the loop.
...
// Force the compiler to not vectorize the remainder loop
#pragma vector novecremainder
for (i=0; i<n; i++)
...
void add_floats(float *a, float *b, float *c, float *d, float *e, int n)
{
    int i;
    // Force the compiler to not vectorize the remainder loop
    #pragma vector novecremainder
    for (i=0; i<n; i++)
    {
        a[i] = a[i] + b[i] + c[i] + d[i] + e[i];
    }
}
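
To make the term "remainder loop" concrete, the following is a hand-written sketch of how a vectorized loop is conceptually split into a main body and a remainder. It is an illustration only, not the code the compiler actually emits. The function name add_floats_split, the two-array signature, and the vector length VLEN of 8 floats are assumptions chosen for the example; the real vector length depends on the target instruction set.

```c
#include <assert.h>

/* Assumed vector length: 8 floats (for example, one 256-bit register).
   The compiler chooses the real value based on the target. */
#define VLEN 8

/* Hypothetical helper: adds b[i] to a[i] for i in [0, n) and returns the
   number of iterations that fell into the remainder loop. */
int add_floats_split(float *a, const float *b, int n)
{
    int i, j;
    int main_end;

    assert(n >= 0);
    main_end = n - (n % VLEN);   /* last index covered by whole chunks */

    /* Main body: whole chunks of VLEN iterations. Each chunk corresponds
       to one full-width vector operation in the vectorized loop. */
    for (i = 0; i < main_end; i += VLEN)
    {
        for (j = 0; j < VLEN; j++)
        {
            a[i + j] = a[i + j] + b[i + j];
        }
    }

    /* Remainder: at most VLEN - 1 leftover iterations, handled one element
       at a time. This is the part that novecremainder keeps scalar. */
    for (i = main_end; i < n; i++)
    {
        a[i] = a[i] + b[i];
    }

    return n - main_end;
}
```

With n = 11 and the assumed VLEN of 8, the main body covers indices 0 through 7 and the remainder covers the last 3 iterations. Because the remainder never exceeds VLEN - 1 iterations, the setup cost of a masked vector version of it may outweigh its benefit, which is the situation the directive addresses.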