Cython
several days ago, to calculate Coulomb energy of $n$ points. I wrote the code as follows:@cython.cdivision(True)
@cython.boundscheck(False)
@cython.wraparound(False)
def u(double[:,:] x):
cdef long i
cdef long j,k
cdef double ret=0,tmp
with nogil, parallel():
for i in prange(x.shape[0]-1, schedule='static'):
for j in range(i + 1, x.shape[0]):
tmp = 0
for k in range(x.shape[1]):
tmp = tmp + pow(x[i,k]-x[j,k],2)
ret = ret + 1 / sqrt(tmp)
return ret
It passed the compilation but produced an NaN result. I Googled about half an hour with "cython parallel yield nan" but no answers were found, I also tried to add variables tracing the
tmp
variable... After about 1 desperate hour of aimlessly Googling with every possible combinations of meaningless keywords, reading the sample code in Cython doc over and over or seperating the modulus part as an independent function... suddenly I found on the top of the doc, under "Thread-locality and reductions.." line it said clearly about how the variables work in/out the prange
loop. Therefore I tried to modify the code:@cython.cdivision(True)
@cython.boundscheck(False)
@cython.wraparound(False)
def u(double[:,:] x):
cdef long i
cdef long j,k
cdef double ret=0,tmp
with nogil, parallel():
for i in prange(x.shape[0]-1, schedule='static'):
for j in range(i + 1, x.shape[0]):
tmp = 0
for k in range(x.shape[1]):
tmp = tmp + pow(x[i,k]-x[j,k],2)
ret += 1 / sqrt(tmp)
return ret
Just by changing ret = ret + ...
to ret += ...
, then the problem was solved. What was the meaning of wasting all this time of aimless trials? Why not just READ THE F**KING MANNUL?
No comments:
Post a Comment