| Variable | Pattern |
|---|---|
See details in the Memory Access Patterns Report Source Details view.
To improve memory access: Refactor your code to alert the compiler to a regular stride access. Sometimes, it might be beneficial to use the -ipo (Linux) or /Qipo (Windows) compiler option to enable interprocedural optimization (IPO) between files.
An array is the most common type of data structure containing a contiguous collection of data items that can be accessed by an ordinal index. You can organize this data as an array of structures (AoS) or as a structure of arrays (SoA). A detected constant stride might be the result of an AoS implementation. While this organization is excellent for encapsulation, it can hinder effective vector processing. To fix: Rewrite code to organize data using SoA instead of AoS.
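The difference between the two layouts can be sketched as follows. This is a minimal illustration, not code from the report: the types PointAoS and PointsSoA and the functions sum_x_aos and sum_x_soa are hypothetical names chosen for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 3D point type, used only for illustration.
struct PointAoS { float x, y, z; };

// Array of structures: consecutive x values are sizeof(PointAoS) bytes apart,
// so a loop that reads only x sees a constant, non-unit stride.
float sum_x_aos(const std::vector<PointAoS>& pts) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < pts.size(); ++i)
        sum += pts[i].x;
    return sum;
}

// Structure of arrays: each field is stored contiguously,
// so a loop that reads only x has unit-stride access.
struct PointsSoA {
    std::vector<float> x, y, z;
};

float sum_x_soa(const PointsSoA& pts) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < pts.x.size(); ++i)
        sum += pts.x[i];
    return sum;
}
```

Both functions compute the same result; only the memory layout differs. Whether the SoA version is actually faster depends on the compiler, the target, and how the other fields are used.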
However, the cost of rewriting code to organize data using SoA instead of AoS may outweigh the benefit. To fix: Use Intel SIMD Data Layout Templates (Intel SDLT), introduced in version 16.1 of the Intel compiler, to mitigate the cost. Intel SDLT is a C++11 template library that may reduce code rewrites to just a few lines.
Example of replacing an indirect (gather) access with a unit-stride access:

    // main.cpp
    int a[8] = {1,0,5,7,4,2,6,3};

    // gather.cpp
    // INNER_COUNT is assumed to be defined elsewhere as a multiple of 8.
    void test_gather(int* a, int* b, int* c, int* d)
    {
        int i, k;

        // Inefficient access: b is indexed through a[], which may require a gather
        #pragma omp simd
        for (i = 0; i < INNER_COUNT; i++)
            d[i] = b[a[i%8]] + c[i];

        // Resolve the indirection once, outside the main loop
        int b_alt[8];
        for (k = 0; k < 8; ++k)
            b_alt[k] = b[a[k]];

        // More effective version: unit-stride access to b_alt, c, and d
        for (i = 0; i < INNER_COUNT/8; i++)
        {
            #pragma omp simd
            for (k = 0; k < 8; ++k)
                d[i*8+k] = b_alt[k] + c[i*8+k];
        }
    }

Also make sure vector function clauses match arguments in the calls within the loop (if any). Note: You may use several #pragma omp declare simd directives to tell the compiler to generate several vector variants of a function.
    // functions.cpp
    // No clauses: every argument is treated as varying per lane
    #pragma omp declare simd
    int foo1(int* arr, int idx) { return 2 * arr[idx]; }

    // arr is loop-invariant and idx advances linearly: matches calls like foo2(c, i)
    #pragma omp declare simd uniform(arr) linear(idx)
    int foo2(int* arr, int idx) { return 2 * arr[idx]; }

    // Clauses do not match calls like foo3(c, i), where c is invariant and i is linear
    #pragma omp declare simd linear(arr) uniform(idx)
    int foo3(int* arr, int idx) { return 2 * arr[idx]; }

    // gather.cpp
    void test_gather(int* a, int* b, int* c)
    {
        int i;

        // Loop will be vectorized; for complex access patterns, gathers could be used for the function call
        #pragma omp simd
        for (i = 0; i < INNER_COUNT; i++) a[i] = b[i] + foo1(c,i);

        // Loop will be vectorized with a vectorized function call
        #pragma omp simd
        for (i = 0; i < INNER_COUNT; i++) a[i] = b[i] + foo2(c,i);

        // Loop will be vectorized with a serialized function call
        #pragma omp simd
        for (i = 0; i < INNER_COUNT; i++) a[i] = b[i] + foo3(c,i);
    }
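The note about using several declare simd directives on one function can be sketched as follows. This is an illustrative example under stated assumptions, not code from the report: the names scaled_load, apply, and perm are hypothetical, and which variant a compiler actually selects for each call is implementation-dependent.

```cpp
// Hypothetical helper with two requested vector variants.
// Variant 1: arr is the same for all lanes, idx advances by 1 per lane.
#pragma omp declare simd uniform(arr) linear(idx:1)
// Variant 2: no assumptions; every argument may vary per lane.
#pragma omp declare simd
int scaled_load(const int* arr, int idx) { return 2 * arr[idx]; }

void apply(int* out, const int* in, const int* perm, int n) {
    // This call fits variant 1: in is loop-invariant and i is linear.
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        out[i] = scaled_load(in, i);

    // This call fits only the generic variant: perm[i] is not linear in i.
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        out[i] += scaled_load(in, perm[i]);
}
```

The pragmas take effect only when OpenMP SIMD support is enabled, for example with -qopenmp-simd for the Intel compiler or -fopenmp-simd for GCC and Clang; without it the code still compiles and runs as ordinary scalar code.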