Use #pragma vector nontemporal to enable non-temporal stores. The nontemporal clause instructs the compiler to use non-temporal (that is, streaming) stores on all supported architectures, unless specified otherwise. The clause optionally takes a comma-separated list of variables.
When you specify this pragma, you are responsible for inserting any fences required to ensure correct memory ordering within a thread or across threads. One typical way to do this is to call the _mm_sfence intrinsic just after each loop (such as an initialization loop) in which the compiler may generate streaming store instructions.
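The fence placement described above can be sketched as follows. This is a minimal illustration, assuming an x86 target where _mm_sfence is declared in <xmmintrin.h>; the names buf, BUF_LEN, and fill_then_fence are hypothetical and do not come from the compiler documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_sfence */

#define BUF_LEN 1000    /* hypothetical buffer size */

float buf[BUF_LEN];

/* Fill the first n elements of buf, then fence.
   Hypothetical helper for illustration only. */
void fill_then_fence(size_t n, float value)
{
    size_t i;

    assert(n <= BUF_LEN);  /* guard against writing past the array */

    /* Compilers that do not know this pragma ignore it,
       possibly with a warning. */
#pragma vector nontemporal
    for (i = 0; i < n; i++) {
        buf[i] = value;
    }

    /* Order the (possibly streaming) stores above before any later
       store, for example a "data ready" flag read by another thread. */
    _mm_sfence();
}
```

A caller would publish the data only after the function returns, for example by calling fill_then_fence(BUF_LEN, 0.0f) and then setting a shared flag.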
Streaming stores may significantly improve performance over non-streaming stores on certain processors when a loop writes a large amount of data. However, misuse of streaming stores can significantly degrade performance.
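To make the trade-off concrete, the sketch below writes the streaming stores by hand with SSE intrinsics instead of relying on the pragma, and falls back to ordinary stores for small buffers. It is an illustration only: the function name fill_floats and the cutoff STREAM_MIN_FLOATS are hypothetical, and the cutoff value is an arbitrary placeholder that would have to be chosen by measurement on the target processor.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_set1_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical cutoff (in floats); not a recommended value. */
#define STREAM_MIN_FLOATS (1u << 20)

/* Fill dst[0..n-1] with value. Streaming stores are used only when the
   buffer is large and 16-byte aligned, as _mm_stream_ps requires. */
void fill_floats(float *dst, size_t n, float value)
{
    size_t i = 0;

    if (n >= STREAM_MIN_FLOATS && ((uintptr_t)dst % 16) == 0) {
        __m128 v = _mm_set1_ps(value);
        for (; i + 4 <= n; i += 4) {
            _mm_stream_ps(dst + i, v);  /* non-temporal store of 4 floats */
        }
        _mm_sfence();  /* order the streaming stores before later stores */
    }

    /* Ordinary stores: the whole buffer when it is small or unaligned,
       otherwise only the tail left over by the loop above. */
    for (; i < n; i++) {
        dst[i] = value;
    }
}
```

Small buffers that are read again soon stay on the ordinary path, because bypassing the cache for such data is one way streaming stores can hurt performance.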
float a[1000];
void foo(int N)
{
    int i;
    /* Request streaming stores for the loop below; assumes N <= 1000. */
    #pragma vector nontemporal
    for (i = 0; i < N; i++)
    {
        a[i] = 1;
    }
}
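The example above does not use the optional variable list. The following sketch shows one way it might look, assuming the list is written in parentheses after the clause, as in nontemporal(var1, var2), and that it limits streaming stores to the named variables. Both points should be checked against the compiler's pragma reference. The array names, LEN, and scale_and_accumulate are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LEN 1000       /* hypothetical array length */

double big_out[LEN];   /* written once, not read again soon */
double running[LEN];   /* read and updated in place */

/* Hypothetical helper: only stores to big_out are requested as
   streaming stores; running keeps ordinary (cached) stores. */
void scale_and_accumulate(size_t n, double factor)
{
    size_t i;

    assert(n <= LEN);  /* guard against writing past the arrays */

    /* Assumed syntax for the variable list; compilers that do not
       know this pragma ignore it, possibly with a warning. */
#pragma vector nontemporal(big_out)
    for (i = 0; i < n; i++) {
        big_out[i] = factor * (double)i;
        running[i] += factor;
    }
}
```

Naming only big_out reflects the idea that running is read back on every iteration, so its stores are better left in the cache.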