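The compiler did not vectorize the remainder loop, even though doing so could improve performance. The remainder loop handles the iterations left over when the trip count is not a multiple of the vector length. To fix: Force vectorization of the remainder loop by placing the directive #pragma vector vecremainder immediately before the loop.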
...
// Force the compiler to vectorize the remainder loop
#pragma vector vecremainder
for (i=0; i<n; i++)
...
void add_floats(float *a, float *b, float *c, float *d, float *e, int n)
{
    int i;
    // Force the compiler to vectorize the remainder loop
    #pragma vector vecremainder
    for (i=0; i<n; i++)
    {
        a[i] = a[i] + b[i] + c[i] + d[i] + e[i];
    }
}
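
To make the term "remainder loop" concrete, the following is a minimal sketch of how a vectorized loop is conceptually split into a main loop and a remainder loop. It is an illustration only, not the code the compiler actually generates: the names add_floats_split and remainder_count and the vector width VLEN are hypothetical, and the inner scalar loop merely stands in for a single vector operation.

```c
#include <assert.h>

/* Hypothetical vector width in floats (for example, 4 for a 128-bit
   register). This value is an assumption made for illustration. */
#define VLEN 4

/* Number of iterations that fall into the remainder loop. */
int remainder_count(int n)
{
    assert(n >= 0);
    return n % VLEN;
}

/* Same computation as add_floats, written with the main/remainder
   split made explicit. */
void add_floats_split(float *a, const float *b, const float *c,
                      const float *d, const float *e, int n)
{
    int i;
    int main_end;

    assert(n >= 0);
    main_end = n - (n % VLEN); /* largest multiple of VLEN that is <= n */

    /* Main loop: each trip covers VLEN elements, standing in for one
       vector operation. */
    for (i = 0; i < main_end; i += VLEN) {
        for (int k = 0; k < VLEN; k++) {
            a[i + k] = a[i + k] + b[i + k] + c[i + k] + d[i + k] + e[i + k];
        }
    }

    /* Remainder loop: the n % VLEN leftover elements. This is the part
       that #pragma vector vecremainder asks the compiler to vectorize
       as well. */
    for (; i < n; i++) {
        a[i] = a[i] + b[i] + c[i] + d[i] + e[i];
    }
}
```

With n = 7 and VLEN = 4, the main loop covers elements 0 to 3 and the remainder loop covers elements 4 to 6. When n is small or the remainder is a large fraction of the trip count, a scalar remainder loop can account for a noticeable share of the run time, which is the case this recommendation targets.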