On its own, the compiler targets only innermost loops for vectorization, so it vectorized the inner loop but did not vectorize the outer loop. However, outer loop vectorization can be more profitable when the outer loop has a better memory access pattern, a higher trip count, or a better dependency profile.
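As a hypothetical illustration of the first two factors (the function `row_sums` and the array shape are assumptions, not taken from the analyzed code): Fortran stores arrays in column-major order, so in the sketch below the inner loop over `j` reads `c(i, j)` with a stride of `n` elements and runs for only a few iterations, while the outer loop over `i` reads contiguous memory and has the long trip count.

```fortran
module row_sums_demo
  implicit none
contains
  ! Hypothetical helper: b(i) = sum over j of c(i, j).
  ! Intended for a tall, narrow array, e.g. shape (n, 4).
  function row_sums(c) result(b)
    real, intent(in) :: c(:, :)
    real :: b(size(c, 1))
    integer :: i, j

    do i = 1, size(c, 1)        ! outer loop: long trip count, unit stride in c
      b(i) = 0.0
      do j = 1, size(c, 2)      ! inner loop: short trip count, stride n in c
        b(i) = b(i) + c(i, j)
      end do
    end do
  end function row_sums
end module row_sums_demo
```

Whether vectorizing over `i` actually pays off for a given loop nest still has to be measured; the sketch only shows why the outer loop can be the better candidate.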
To enforce outer loop vectorization:
| Target | Directive |
|---|---|
| Outer loop | !$OMP SIMD |
| Inner loop | !DIR$ NOVECTOR |
This recommendation only identifies an opportunity to vectorize the outer loop. To confirm that doing so is profitable, run a deeper analysis: Memory Access Patterns (MAP), Trip Counts, and Dependencies.

    !$OMP SIMD
    DO I=1,N
      !DIR$ NOVECTOR
      DO J=1,N
        SUM = SUM + A(I)*A(J)
      ENDDO
    ENDDO
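The loop nest above can be wrapped into a compilable unit roughly as follows. This is a minimal sketch under stated assumptions: the module name `outer_simd_demo`, the function `pair_sum`, and the kind parameter `dp` are hypothetical names introduced for illustration. The `REDUCTION(+:total)` clause is also an addition that the snippet above does not show; it is included because the accumulator is updated in every iteration of the SIMD loop. `NOVECTOR` is written with the Intel `!DIR$` sentinel; compilers that do not recognize it treat the line as a comment.

```fortran
module outer_simd_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: dp, pair_sum
  integer, parameter :: dp = real64
contains
  ! Hypothetical helper: returns the sum over all i, j of a(i)*a(j).
  function pair_sum(a) result(total)
    real(dp), intent(in) :: a(:)
    real(dp) :: total
    integer :: i, j, n

    n = size(a)
    total = 0.0_dp

    ! Ask for SIMD execution of the outer loop. The reduction clause
    ! (an assumption, not in the original snippet) gives each SIMD lane
    ! a private partial sum that is combined after the loop.
    !$omp simd reduction(+:total)
    do i = 1, n
      ! Intel-specific hint: keep the inner loop scalar.
      !DIR$ NOVECTOR
      do j = 1, n
        total = total + a(i) * a(j)
      end do
    end do
  end function pair_sum
end module outer_simd_demo
```

A caller would `use outer_simd_demo` and evaluate, for example, `pair_sum([1.0_dp, 2.0_dp, 3.0_dp])`, which equals the square of the array sum. The `!$omp simd` line only takes effect when OpenMP SIMD support is enabled at compile time (for example `-qopenmp-simd` with the Intel compilers or `-fopenmp-simd` with gfortran); without it the directive is ignored and the code still produces the same result.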