One of the memory accesses in the source loop does not start at an optimally aligned address boundary. To fix this, align the data and tell the compiler that the data is aligned. To align the data, declare it with __declspec(align()). To tell the compiler that the data is aligned, place __assume_aligned() immediately before the source loop.
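The two steps can be sketched as follows. This is a minimal illustration, not the tool's own sample: the 64-byte alignment, the array size, and the names input_data, output_data, scale_array, and is_aligned are all assumptions made for the example. The ALIGNED_DECL and ASSUME_ALIGNED macros are also hypothetical wrappers; they expand to __declspec(align()) and __assume_aligned() under the Intel classic compiler and fall back to the GCC/Clang equivalents elsewhere, so the sketch builds with other compilers too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte alignment and 1024 elements, chosen for illustration. */
#define ALIGNMENT 64
#define N 1024

#if defined(__INTEL_COMPILER)
  /* Intel classic compiler spellings named in the recommendation. */
  #define ALIGNED_DECL(n)       __declspec(align(n))
  #define ASSUME_ALIGNED(p, n)  __assume_aligned((p), (n))
#elif defined(__GNUC__) || defined(__clang__)
  /* Fallback for GCC/Clang: equivalent attribute and builtin. */
  #define ALIGNED_DECL(n)       __attribute__((aligned(n)))
  #define ASSUME_ALIGNED(p, n)  ((p) = (__typeof__(p))__builtin_assume_aligned((p), (n)))
#else
  /* Unknown compiler: no alignment request and no hint. */
  #define ALIGNED_DECL(n)
  #define ASSUME_ALIGNED(p, n)  ((void)0)
#endif

/* Step 1: align the data at its declaration. */
ALIGNED_DECL(ALIGNMENT) float input_data[N];
ALIGNED_DECL(ALIGNMENT) float output_data[N];

/* Returns nonzero when p is a multiple of n bytes. */
int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p % n) == 0;
}

/* Multiplies each element of in by k and stores the result in out. */
void scale_array(float *out, const float *in, size_t n, float k)
{
    /* Check the promise in debug builds before making it to the compiler. */
    assert(is_aligned(out, ALIGNMENT));
    assert(is_aligned(in, ALIGNMENT));

    /* Step 2: tell the compiler the data is aligned, right before the loop. */
    ASSUME_ALIGNED(out, ALIGNMENT);
    ASSUME_ALIGNED(in, ALIGNMENT);

    for (size_t i = 0; i < n; ++i)
        out[i] = k * in[i];
}
```

A call such as scale_array(output_data, input_data, N, 2.0f) then runs the loop over aligned data. The alignment hint is a promise rather than a check: if a pointer passed to the function is not actually aligned, the behavior is undefined, which is why the sketch asserts alignment first.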