Instead of using thread-local ndarrays, a very convenient method is to create an array shaped in
(num_of_threads, <whatever the structure>)
to make the changes thread-safe. For example:
cdef np.ndarray[np.double_t, ndim=2] ret
num_threads = multiprocessing.cpu_count()
ret = np.zeros((num_threads, nbins), dtype=np.float)
and in the
prange
loop:
thread_num = openmp.omp_get_thread_num()
ret[thread_num, l]+=1
...
Afterwards,
ret.sum(axis=0)
is returned. The loop is set to use
num_threads
threads:
with nogil, parallel(num_threads=num_threads):
. By default, I set the
prange
with
schedule='dynamic'
option, on my laptop (ArchLinux), with
openmp=201511
and
gcc=9.2.0
, I found interesting outputs, the summation of
ret
is exactly
num_threads
times the desired results, i.e.
ret.sum(axis=0)[0]
should be 200, but if 4 threads are used in parallel computation, then the output becomes 800. However, this does not happen on my server (CentOS 7.6) with
gcc=4.2.0
and
openmp=201107
. All others are the same (Python, Cython, NumPy...) including .c files generated by Cython, for I use Anaconda. This wired problem is solved after setting
schedule='static'
on my laptop. I think this is some new feature of openmp, I shall read the docs to figure this out when I have some time.
No comments:
Post a Comment