Move OpenMP call(s) outside the loop body

OpenMP calls inside a loop body prevent automatic vectorization when the compiler cannot move them outside the loop, for example when the calls are not loop-invariant. To fix:
  1. Split the OpenMP parallel loop directive into two directives.
       Target   Directive
       Outer    !$OMP PARALLEL [clause[[,] clause] ... ]
       Inner    !$OMP DO [clause[[,] clause] ... ]
  2. Move the OpenMP calls outside the loop when possible.

Example (original code)

!$OMP PARALLEL DO PRIVATE(tid, nthreads)
do k = 1, N
    tid = omp_get_thread_num() ! this call inside loop prevents vectorization
    nthreads = omp_get_num_threads() ! this call inside loop prevents vectorization
    ...
enddo

Example (revised code)

!$OMP PARALLEL PRIVATE(tid, nthreads)
! Move OpenMP calls here
tid = omp_get_thread_num()
nthreads = omp_get_num_threads()

!$OMP DO
do k = 1, N
    ...
enddo
!$OMP END DO NOWAIT
!$OMP END PARALLEL
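
Example (complete program, illustrative sketch)

The revised pattern can be tried end to end with a small self-contained program. This is only a sketch under stated assumptions: the program name, the array a, the problem size n, and the loop body are hypothetical and are not part of the original recommendation. Each thread queries its ID and the team size once, and the loop body then works only with plain private integers.

```fortran
program omp_calls_outside_loop
    use omp_lib                      ! declares omp_get_thread_num, omp_get_num_threads
    implicit none
    integer, parameter :: n = 1000   ! hypothetical problem size
    real    :: a(n)                  ! hypothetical output array
    integer :: k, tid, nthreads

    !$OMP PARALLEL PRIVATE(tid, nthreads)
    ! OpenMP calls run once per thread, before the loop starts
    tid = omp_get_thread_num()
    nthreads = omp_get_num_threads()

    !$OMP DO
    do k = 1, n
        ! No OpenMP calls here; tid and nthreads are ordinary private integers
        a(k) = real(tid) / real(nthreads)
    enddo
    !$OMP END DO NOWAIT
    !$OMP END PARALLEL

    ! Thread IDs run from 0 to nthreads-1, so every value lies in [0, 1)
    print *, 'all values in [0,1):', all(a >= 0.0 .and. a < 1.0)
end program omp_calls_outside_loop
```

Build with OpenMP enabled, for example -fopenmp with gfortran or -qopenmp with the Intel Fortran compiler. Whether the loop is actually vectorized still depends on the compiler and optimization level, so check the compiler's vectorization report.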
