Refactor code with detected regular stride access patterns

The Memory Access Patterns Report shows the following regular stride access(es):
Variable | Pattern

See details in the Memory Access Patterns Report Source Details view.

To improve memory access: Refactor your code so that the compiler can recognize the regular stride access. In some cases, it also helps to enable interprocedural optimization (IPO) between files with the ipo (Linux) or Qipo (Windows) compiler option.

An array is the most common type of data structure: a contiguous collection of data items that can be accessed by an ordinal index. You can organize this data as an array of structures (AoS) or as a structure of arrays (SoA). A detected constant stride might be the result of an AoS implementation. While AoS is excellent for encapsulation, it can hinder effective vector processing, because the values of one field in consecutive elements are not adjacent in memory. To fix: Rewrite code to organize data using SoA instead of AoS.
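
The difference between the two layouts can be sketched as follows. This is a minimal illustration, not code from the report: the names PointAoS, PointsSoA, sum_x_aos, sum_x_soa, and to_soa are hypothetical. Reading only the x field from the AoS layout produces a constant stride of three floats, while the SoA layout gives unit stride.

```cpp
#include <cstddef>
#include <vector>

// Array of structures (hypothetical type): x, y, z of one point are adjacent,
// so a loop that reads only x steps through memory sizeof(PointAoS) bytes at a time.
struct PointAoS {
    float x, y, z;
};

// Structure of arrays (hypothetical type): each field is stored contiguously,
// so a loop that reads only x has unit stride.
struct PointsSoA {
    std::vector<float> x, y, z;
    explicit PointsSoA(std::size_t n) : x(n), y(n), z(n) {}
};

// Constant-stride access: consecutive iterations read floats 3 elements apart.
float sum_x_aos(const std::vector<PointAoS>& pts) {
    float sum = 0.0f;
    const PointAoS* p = pts.data();
    const std::size_t n = pts.size();
#pragma omp simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i)
        sum += p[i].x;
    return sum;
}

// Unit-stride access: consecutive iterations read adjacent floats.
float sum_x_soa(const PointsSoA& pts) {
    float sum = 0.0f;
    const float* x = pts.x.data();
    const std::size_t n = pts.x.size();
#pragma omp simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum;
}

// One-time conversion from the AoS layout to the SoA layout.
PointsSoA to_soa(const std::vector<PointAoS>& pts) {
    PointsSoA out(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i) {
        out.x[i] = pts[i].x;
        out.y[i] = pts[i].y;
        out.z[i] = pts[i].z;
    }
    return out;
}
```

Both functions compute the same result; only the memory layout they traverse differs. Whether the conversion pays off depends on how often the data is traversed field by field.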

However, the cost of rewriting code to organize data using SoA instead of AoS may outweigh the benefit. To fix: Use Intel SIMD Data Layout Templates (Intel SDLT), introduced in version 16.1 of the Intel compiler, to mitigate the cost. Intel SDLT is a C++11 template library that may reduce code rewrites to just a few lines.

Example: Refactor for Vertical Invariant pattern

// main.cpp
int a[8] = {1,0,5,7,4,2,6,3};

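// gather.cpp
// INNER_COUNT is assumed to be defined elsewhere as a multiple of 8.
void test_gather(int* a, int* b, int* c, int* d)
{
    int i, k;

    // Inefficient access: b is read through the index table a,
    // which repeats with a period of 8 iterations.
#pragma omp simd
    for (i = 0; i < INNER_COUNT; i++)
        d[i] = b[a[i%8]] + c[i];

    // Resolve the indirection once, outside the hot loop.
    int b_alt[8];
    for (k = 0; k < 8; ++k)
        b_alt[k] = b[a[k]];

    // More effective version: the inner loop reads b_alt and c with unit stride.
    for (i = 0; i < INNER_COUNT/8; i++)
    {
#pragma omp simd
        for (k = 0; k < 8; ++k)
            d[i*8+k] = b_alt[k] + c[i*8+k];
    }
}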
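
To check that the refactoring preserves results, the two loops can be split into separate functions and compared. The sketch below is an illustration under assumptions: the function names gather_indexed, gather_precomputed, and versions_agree are hypothetical, and the parameter n stands in for INNER_COUNT, which the example above does not define.

```cpp
#include <cassert>
#include <vector>

// Original form: b is read through the index table a (period 8).
void gather_indexed(const int* a, const int* b, const int* c, int* d, int n) {
#pragma omp simd
    for (int i = 0; i < n; i++)
        d[i] = b[a[i % 8]] + c[i];
}

// Refactored form: the indirection is resolved once into b_alt,
// then the inner loop runs with unit-stride accesses only.
void gather_precomputed(const int* a, const int* b, const int* c, int* d, int n) {
    // The blocked loop covers every element only if n is a multiple of 8.
    assert(n % 8 == 0);

    int b_alt[8];
    for (int k = 0; k < 8; ++k)
        b_alt[k] = b[a[k]];

    for (int i = 0; i < n / 8; i++) {
#pragma omp simd
        for (int k = 0; k < 8; ++k)
            d[i * 8 + k] = b_alt[k] + c[i * 8 + k];
    }
}

// Returns true when both versions produce identical output for n elements.
bool versions_agree(int n) {
    const int a[8] = {1, 0, 5, 7, 4, 2, 6, 3};  // same table as main.cpp above
    std::vector<int> b(8), c(n), d1(n), d2(n);
    for (int k = 0; k < 8; ++k)
        b[k] = 10 * k;  // arbitrary test data
    for (int i = 0; i < n; ++i)
        c[i] = i;       // arbitrary test data

    gather_indexed(a, b.data(), c.data(), d1.data(), n);
    gather_precomputed(a, b.data(), c.data(), d2.data(), n);
    return d1 == d2;
}
```

If the trip count is not a multiple of 8, the blocked version needs a remainder loop for the last n % 8 elements; the sketch guards against that case with an assertion instead.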

Also make sure that the clauses on vector function declarations match the arguments passed in the calls within the loop (if any). Note: You may use several #pragma omp declare simd directives to tell the compiler to generate several vector variants of a function.

Example: Compare function calls with their declarations

// functions.cpp
#pragma omp declare simd
int foo1(int* arr, int idx) { return 2 * arr[idx]; }

#pragma omp declare simd uniform(arr) linear(idx)
int foo2(int* arr, int idx) { return 2 * arr[idx]; }

#pragma omp declare simd linear(arr) uniform(idx)
int foo3(int* arr, int idx) { return 2 * arr[idx]; }

// gather.cpp
void test_gather(int* a, int* b, int* c)
{
    int i;

// No clauses: both arguments are treated as varying per lane.
// Loop will be vectorized; for complex access patterns, gathers could be used for the function call.
#pragma omp simd
    for (i = 0; i < INNER_COUNT; i++) a[i] = b[i] + foo1(c,i);

// Clauses match the call (c is uniform, i is linear).
// Loop will be vectorized with a vectorized call.
#pragma omp simd
    for (i = 0; i < INNER_COUNT; i++) a[i] = b[i] + foo2(c,i);

// Clauses do not match the call (c is not linear, i is not uniform).
// Loop will be vectorized with a serialized function call.
#pragma omp simd
    for (i = 0; i < INNER_COUNT; i++) a[i] = b[i] + foo3(c,i);
}
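
The note about several directives can be illustrated with one function that carries two declare simd directives. This is a sketch under assumptions: twice_at, add_linear, and add_indexed are hypothetical names, and which variant a compiler actually selects at each call site is implementation-dependent. Without OpenMP SIMD support enabled, the pragmas are ignored and the code runs as ordinary scalar code.

```cpp
// Two vector variants are requested for the same function:
//  - first directive: arr is identical across lanes, idx advances by 1 per lane
//  - second directive: no assumptions, both arguments may vary per lane
#pragma omp declare simd uniform(arr) linear(idx:1)
#pragma omp declare simd
int twice_at(const int* arr, int idx) { return 2 * arr[idx]; }

// Call site that fits the uniform/linear variant:
// c is loop-invariant and i is the loop index.
void add_linear(int* a, const int* b, const int* c, int n) {
#pragma omp simd
    for (int i = 0; i < n; i++)
        a[i] = b[i] + twice_at(c, i);
}

// Call site where the index is loaded from memory, so it is not linear;
// only the variant without clauses fits here.
void add_indexed(int* a, const int* b, const int* c, const int* idx, int n) {
#pragma omp simd
    for (int i = 0; i < n; i++)
        a[i] = b[i] + twice_at(c, idx[i]);
}
```

Declaring both variants lets each call site use a vector version instead of falling back to a serialized call, at the cost of some extra generated code.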
