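Possible register spilling along with high vector register pressure is preventing effective vectorization. To fix: Use the directive !DIR$ DISTRIBUTE POINT, or rewrite your code to distribute the source loop, that is, to split it into several smaller loops. Distribution can decrease register pressure, enable software pipelining, and improve both instruction and data cache use. Placed before the loop, the directive lets the compiler decide where to distribute; placed inside the loop body, it marks the point where distribution starts.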
!DIR$ DISTRIBUTE POINT
do i = 1, m
   b(i) = a(i) + 1
   ...
   c(i) = a(i) + b(i)   ! Compiler will decide
                        ! where to distribute.
                        ! Data dependencies are observed.
   ...
   d(i) = c(i) + 1
enddo
do i = 1, m
   b(i) = a(i) + 1
   ...
!DIR$ DISTRIBUTE POINT
   call sub(a, n)       ! Distribution will start here,
                        ! ignoring all loop-carried dependencies.
   c(i) = a(i) + b(i)
   ...
   d(i) = c(i) + 1
enddo
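
The other fix named in the message, rewriting the code to distribute the loop by hand, could look roughly like the sketch below. It is an illustration only, not compiler output: the array size m, the initial values of a, and the *_ref arrays are assumptions, and the elided parts of the loop are left out. Splitting is legal here because each statement reads only values that an earlier statement produced at the same index i, so there are no loop-carried dependencies to break.

```fortran
! Sketch only: manual distribution of a loop shaped like the first example.
! The size m, the initial values, and the *_ref arrays are assumptions
! made for illustration; they are not part of the original message.
program distribute_by_hand
  implicit none
  integer, parameter :: m = 1000
  real :: a(m), b(m), c(m), d(m)
  real :: b_ref(m), c_ref(m), d_ref(m)
  integer :: i

  do i = 1, m
    a(i) = real(i)
  end do

  ! Fused form: all three statements share one loop body.
  do i = 1, m
    b_ref(i) = a(i) + 1
    c_ref(i) = a(i) + b_ref(i)
    d_ref(i) = c_ref(i) + 1
  end do

  ! Distributed form: one smaller loop per statement. Each loop keeps
  ! fewer values live at once than the fused body does.
  do i = 1, m
    b(i) = a(i) + 1
  end do
  do i = 1, m
    c(i) = a(i) + b(i)
  end do
  do i = 1, m
    d(i) = c(i) + 1
  end do

  ! Check that both forms compute the same results.
  if (all(b == b_ref) .and. all(c == c_ref) .and. all(d == d_ref)) then
    print *, 'distributed loops match the fused loop'
  else
    error stop 'mismatch between fused and distributed loops'
  end if
end program distribute_by_hand
```

With a body this small there is no real register pressure, so the sketch shows only the shape of the rewrite. Whether distribution helps a real loop depends on how many values its body keeps live, and the split loops re-read a, b, and c from memory, which is a cost to weigh against the reduced pressure.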