Enable non-temporal store

Use #pragma vector nontemporal to enable non-temporal stores. The nontemporal clause instructs the compiler to use non-temporal (that is, streaming) stores on all supported architectures, unless specified otherwise. The clause optionally takes a comma-separated list of variables.
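The optional variable list can be sketched as follows. This is an illustration only: it assumes the list is written in parentheses after the clause, as nontemporal(var1, var2), and that it limits the streaming-store request to the listed variables. The names src, dst, sum, LEN, and scale_and_accumulate are hypothetical.

```c
#include <assert.h>

#define LEN 1000   /* hypothetical array length */

static float src[LEN];
static float dst[LEN];
static float sum[LEN];

/* Write src[i] * factor to dst and accumulate src into sum. */
void scale_and_accumulate(int n, float factor)
{
    int i;
    assert(n >= 0 && n <= LEN);   /* stay inside the arrays */

    /* Assumed list form: request streaming stores for dst only.
       sum is read and written in the same iteration, so it is left
       out of the list. Compilers that do not recognize this pragma
       ignore it, possibly with a warning. */
    #pragma vector nontemporal(dst)
    for (i = 0; i < n; i++) {
        dst[i] = src[i] * factor;
        sum[i] += src[i];
    }
}
```

Listing only dst reflects one plausible design choice: data that is written once and not read again soon is the usual candidate for streaming stores, while data that is reused is better left to regular stores.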

When this pragma is specified, it is your responsibility to also insert any fences as required to ensure correct memory ordering within a thread or across threads. One typical way to do this is to insert a _mm_sfence intrinsic call just after the loops (such as the initialization loop) where the compiler may insert streaming store instructions.
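A minimal sketch of the fence placement described above, assuming an x86 target where _mm_sfence is declared in immintrin.h. The names STORE_FENCE, LEN, and init_with_fence are hypothetical, and the fallback for non-x86 targets is an assumption made only so that the sketch compiles elsewhere.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>                 /* declares _mm_sfence */
#define STORE_FENCE() _mm_sfence()
#else
#define STORE_FENCE() ((void)0)        /* assumption: no x86 store fence here */
#endif

#define LEN 1000   /* hypothetical array length */

static float a[LEN];

/* Initialize the first n elements of a, then fence. */
void init_with_fence(int n, float value)
{
    int i;
    assert(n >= 0 && n <= LEN);   /* stay inside the array */

    /* The compiler may emit streaming stores for this loop. Compilers
       that do not recognize the pragma ignore it, possibly with a warning. */
    #pragma vector nontemporal
    for (i = 0; i < n; i++) {
        a[i] = value;
    }

    /* Order the preceding stores before any later stores, so that other
       code observes the initialized data first. */
    STORE_FENCE();
}
```

Calling init_with_fence(LEN, 1.0f) fills the whole array and issues one fence after the loop rather than one per iteration.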

Streaming stores can significantly outperform non-streaming stores on certain processors when a loop writes a large amount of data. However, misusing streaming stores can significantly degrade performance.

Example

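float a[1000];

/* Set the first N elements of a to 1; N must not exceed 1000. */
void foo(int N)
{
  int i;
  /* Ask the compiler to use streaming stores for the stores in this loop. */
  #pragma vector nontemporal
  for (i = 0; i < N; i++)
  {
    a[i] = 1;
  }
}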
