Align data

One of the memory accesses in the source loop does not start on an optimally aligned address boundary. To fix: align the data, then tell the compiler the data is aligned.
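An address is aligned to an N-byte boundary when it is an exact multiple of N. The following is a minimal sketch of that check in portable C11. The helper name `is_aligned` is hypothetical and is not part of any Intel API. It assumes N is a power of two, as the usual SIMD boundaries (16, 32, 64) are.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if p lies on an n-byte boundary.
   Assumes n is a power of two, so (n - 1) masks the low address bits. */
static bool is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p & (uintptr_t)(n - 1)) == 0;
}
```

For example, with a 64-byte-aligned `float` buffer, `buf` and `buf + 16` (64 bytes further on) pass the check, but `buf + 1` (4 bytes further on) does not.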

Dynamic Data:

To align dynamic data, replace malloc() and free() with _mm_malloc() and _mm_free(). To tell the compiler the data is aligned, use __assume_aligned() before the source loop. Also consider using #include <aligned_new> to enable automatic allocation of aligned data.

Static Data:

To align static data, use __declspec(align()). To tell the compiler the data is aligned, use __assume_aligned() before the source loop.

Example - Dynamic Data

Align dynamic data using a 64-byte boundary and tell the compiler the data is aligned:
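#include <immintrin.h>  // for _mm_malloc() and _mm_free()
float *array;
// Allocate ARRAY_SIZE floats starting on a 64-byte boundary
array = (float *)_mm_malloc(ARRAY_SIZE * sizeof(float), 64);
// Somewhere else, before the source loop:
__assume_aligned(array, 64);  // promise the compiler that array is 64-byte aligned
// Use array in loop
_mm_free(array);  // memory from _mm_malloc() must be released with _mm_free()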
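`_mm_malloc()` comes from the x86 intrinsics headers, and `__assume_aligned()` is an Intel compiler extension. On other toolchains, a possible substitute is to swap in standard C11 `aligned_alloc()` for the allocation and the GCC/Clang builtin `__builtin_assume_aligned()` for the hint. The sketch below assumes a C11 library that provides `aligned_alloc()` (MSVC does not). The names `alloc_floats_64` and `scale` are hypothetical.

```c
#include <stdlib.h>

#define ALIGN 64  /* boundary in bytes, as in the example above */

/* Hypothetical helper: allocate n floats starting on a 64-byte boundary.
   C11 aligned_alloc() expects the size to be a multiple of the
   alignment, so the byte count is rounded up. Release with free(). */
float *alloc_floats_64(size_t n)
{
    size_t bytes = n * sizeof(float);
    bytes = (bytes + ALIGN - 1) / ALIGN * ALIGN;
    return aligned_alloc(ALIGN, bytes);
}

/* Hypothetical source loop: on GCC/Clang, tell the compiler that a
   is 64-byte aligned, playing the role of __assume_aligned(). */
void scale(float *a, size_t n, float k)
{
#if defined(__GNUC__)
    a = __builtin_assume_aligned(a, ALIGN);
#endif
    for (size_t i = 0; i < n; i++)
        a[i] *= k;
}
```

Like `__assume_aligned()`, the builtin is a promise rather than a check. If the pointer is not actually aligned, the behavior is undefined. Memory from `aligned_alloc()` is released with `free()`, not `_mm_free()`.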

Example - Static Data

Align static data using a 64-byte boundary:
__declspec(align(64)) float array[ARRAY_SIZE];
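`__declspec(align())` is Microsoft-style syntax. For compilers that do not accept it, a portable equivalent is the C11 alignment specifier `alignas` (from `<stdalign.h>`). GCC and Clang also accept `__attribute__((aligned(64)))`. The sketch below assumes a C11 compiler. `ARRAY_SIZE` is given a placeholder value, and `sum_array` is a hypothetical name for the source loop.

```c
#include <stdalign.h>
#include <stddef.h>

#define ARRAY_SIZE 1024  /* placeholder size for this sketch */

/* C11 counterpart of: __declspec(align(64)) float array[ARRAY_SIZE]; */
static alignas(64) float array[ARRAY_SIZE];

/* Hypothetical source loop over the aligned static array. */
float sum_array(void)
{
    float s = 0.0f;
    for (size_t i = 0; i < ARRAY_SIZE; i++)
        s += array[i];
    return s;
}
```

Because the declaration carries the alignment, code in the same translation unit that indexes `array` directly can already see the 64-byte boundary.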
