A note on Cython Parallelism

Instead of using thread-local ndarrays, a very convenient method is to create an array shaped in (num_of_threads, <whatever the structure>) to make the changes thread-safe. For example:
cdef np.ndarray[np.double_t, ndim=2] ret
num_threads = multiprocessing.cpu_count()
ret = np.zeros((num_threads, nbins), dtype=np.float)
and in the prange loop:
thread_num = openmp.omp_get_thread_num()
ret[thread_num, l]+=1
Afterwards, ret.sum(axis=0) is returned. The loop is set to use num_threads threads: with nogil, parallel(num_threads=num_threads):. By default, I set the prange with schedule='dynamic' option, on my laptop (ArchLinux), with openmp=201511 and gcc=9.2.0, I found interesting outputs, the summation of ret is exactly num_threads times the desired results, i.e. ret.sum(axis=0)[0] should be 200, but if 4 threads are used in parallel computation, then the output becomes 800. However, this does not happen on my server (CentOS 7.6) with gcc=4.2.0 and openmp=201107. All others are the same (Python, Cython, NumPy...) including .c files generated by Cython, for I use Anaconda. This wired problem is solved after setting schedule='static' on my laptop. I think this is some new feature of openmp, I shall read the docs to figure this out when I have some time.

Notes on scipy.optimize.minimize

Problem: evenly distributed points on ellipsoid surfaces.
Solution: numerically minimizing energy of charges constrained on the ellipsoid surface.
Assuming the ellipsoid satisfies equation:
the ratios of 3 axes of the ellipsoid surface are $(a,b,c)^T$. The minimization process is:
  1. let
    $$\mathbf{x}^\prime:=\frac{1}{\sqrt{\left(\frac{x}{a}\right)^2+\left(\frac{y}{b}\right)^2+\left(\frac{z}{c}\right)^2}}(x, y, z)^T$$
    then $\mathbf{x}^\prime$ satisfy the ellipsoid equation;
  2. minimize the energy function
    $$u=\sum_i \sum_{j>i} \frac{1}{\sqrt{(x^\prime_i-x^\prime_j)^2+(y^\prime_i-y^\prime_j)^2+(z^\prime_i-z^\prime_j)^2}}$$
  3. the gradient vector is, e.g. the $x$ component of ith particle:
    $$\partial u/\partial x_i=\sum_{j\ne i} - ((x^{\prime 2}_i/a^2-1)(x^\prime_j-x^\prime_i) +$$ $$(x^{\prime}_i y^{\prime}_i(y^{\prime}_j-y^{\prime}_i)+x^{\prime}_i z^{\prime}_i (z^{\prime}_j-z^{\prime}_i))/a^2)/d^3$$ with $d=\sqrt{(x^{\prime}_i -x^{\prime}_j)^2+(y^{\prime}_i -y^{\prime}_j)^2+(z^{\prime}_i -z^{\prime}_j)^2}$
    ***NOTE THE GRADIENT IS CALCULATED WITH RESPECT TO $\mathbf{x}$, $\nabla_\mathbf{x}U$, NOT $\mathbf{x}^\prime$, $\nabla_{\mathbf{x}^\prime}U$***
The scipy.optimize.minimize function takes jac=True parameter if the function provides gradient vector, this would boost up the program dramatically.

Dealing with aggregates: I don't like tedious work

I used to study nanoparticles (NPs) self-assembly in polymer matrices before 2017 by coarse-grained simulation method. Analyzing morphologies of self-assemblies is a tedious work: one needs to repeatedly watch frames from simulation trajectories, finding appropriate view points, counting number of aggregates, measure size of aggregates, etc. Obtaining integrate aggregates under periodic boundary conditions is also a challenging work. In addition, one usually needs to run plenty of simulations under different conditions to find a desired morphology; selecting desired morphologies automatically is even more challenging: most of the morphologies are ill-defined, it is hard to use some common  characterization methods, e.g., $g(r)$ or $S(q)$ of NPs give little differences amongst percolated morphologies.
Based on such demands, I designed a co-pilot which can:
  1. Automatically clustering the clusters;
  2. Remove periodic boundary conditions and make the center-of-mass at $(0,0,0)^T$;
  3. Adjust the view vector along the minor axis of the aggregate;
  4. Classify the aggregate.
In step 1, firstly, I calculate the $g(r)$ of NP centers and find $r$ where $g(r)$ reaches its 1st valley, simultaneously, the distance matrix under periodic boundary condition. Using this distance matrix and $r$, a DBSCAN or HDBSCAN algorithm is performed to cluster NPs. In simulations, usually systems with very low loading of NPs are considered, DBSCAN is efficient enough; for some co-polymer self-assembly systems, a coarse-grained method is preferred: divide simulation box into grids and DBSCAN on the grids with density as weights, then perform DBSCAN on each coarse-grained clusters. DFS or BFS + Linked Cell List is also a good choice if particles are too many. All these struggles are due to that  kd-tree is not working under periodic boundary conditions.

After obtaining clusters in step 1, we must remove periodic boundary conditions of the cluster. If in step 1, one uses BFS or DFS + Linked Cell List method, then one can remove periodic boundary condition during clustering; but this method has limitations, it does not work properly if the cluster is percolated throughout the box. Therefore, in this step, I use circular statistics to deal with the clusters. In periodic boundary condition simulation box, distance between an NP in an aggregate and the midpoint will never exceed $L/2$ in corresponding box dimension. Midpoint here is not center-of-mass, e.g., distance between a nozzle point and center-of-mass of an ear-syringe-bulb is clearly larger than its half length; midpoint is a actually a "de-duplicated" center-of-mass. Besides, circular mean also puts most points in the center in case of percolation. Therefore, in part 2, we have following steps:
  1. Choose a $r_\text{cut}$ to test whether the aggregate is percolate;
  2. If the aggregate is percolate, evaluate the circular mean of all points $r_c$;
  3. Set all coordinates $r$ as $r\to pbc(r-r_c)$;
  4. If the aggregate is not percolate, midpoint is evaluated by calculating circular mean of coordinates $r$ where $\rho(r)>0$, $\rho(r)$ is calculated using bin size that smaller than $r_\text{cut}$ used in step 1;
  5. Same as step 3, update coordinates;
  6. After step 5, the aggregates are unwrapped from the box, set $r\to r-\overline{r}$ to set center-of-mass at $(0,0,0)^T$
Circular mean $\alpha$ of samples $\lbrace\alpha_i\rbrace$ is defined as the minimization of summation of a circular metric function $d(\alpha,\beta):=1-\cos(\alpha-\beta)$
$$\alpha=\underset{\beta}{\operatorname{argmin}}\sum_i (1-\cos(\alpha_i-\beta))$$
Adjusting the view vector is simple, evaluate the eigenspace of gyration tensor as $rr^T/n$ and sort the eigenvectors by eigenvalue, i.e., $\lambda_1\ge\lambda_2\ge\lambda_3$, then the minor axis is corresponding eigenvector $v_3$, the aggregate then can be rotated by $[v_1, v_2, v_3]$ so that the minor axis is $z$-axis.

The last step is a bit more tricky, the best trail I attempted was to use SVC, a binary classification method. I used about 20 samples labeled as "desired", these 20 samples were extended to 100 samples by adding some noises to the samples, e.g., moving coordinates a little bit, adding several NPs into the aggregate or removing several NPs randomly, without "breaking" the category of the morphology. Together with 100 "undesired" samples, I trained the SVC with a Gaussian kernel. The result turned out to be pretty good. I also tried to use ANN to classify all 5 categories of morphologies obtained from simulations, but ANN model did not work very well, perhaps the reason was lack of samples or the model I built was too rough. I didn't try other multi-class methods, anyway, that part of work was done, I stopped developing this co-pilot long time ago.

Eigenvalues of circulant matrices

A circulant matrix is defined as
$$C=\begin{bmatrix}c_{0}&c_{n-1}&\dots &c_{2}&c_{1}\\c_{1}&c_{0}&c_{n-1}&&c_{2}\\\vdots &c_{1}&c_{0}&\ddots &\vdots \\c_{n-2}&&\ddots &\ddots &c_{n-1}\\c_{n-1}&c_{n-2}&\dots &c_{1}&c_{0}\\\end{bmatrix}$$
where $C_{j, k}=c_{j-k \mod n}$, the $k$-th eigenvalue $\lambda_k$ and eigenvector $x_k$ satisfy $C\cdot x_k=\lambda_k x_k$, or, equivalently, $n$ equations as:
$$\sum_{j=0}^{m-1}c_{m-j}x_j+\sum_{j=m}^{n-1}c_{n-j+m}x_j=\lambda_k x_m\quad m=0,1,\dots,n-1$$
with $c_n=c_0$, $x_m$ is the $m$-th compoent of an eigenvector $x_k$. Changing the dummy summing ($j\to m-j$ and $j\to n-j+m$) variables yields
$$\sum_{j=1}^{m}c_j x_{m-j} +\sum_{j=m+1}^{n}c_j x_{n+m-j}=\lambda_k x_m$$
with $m=0,1,2,\dots,n-1$. One can "guess" a solution that $x_j=\omega^j$, therefore the equation above turns into
$$\begin{align}&\sum_{j=1}^{m}c_j \omega^{m-j} +\sum_{j=m+1}^{n}c_j \omega^{n+m-j}=\lambda_k \omega^m\\ \leftrightarrow &\sum_{j=1}^{m}c_j \omega^{-j} +\omega^{n}\sum_{j=m+1}^{n}c_j \omega^{-j}=\lambda_k\end{align}$$
Let $\omega$ be one of the $n$-th square root of unity, i.e., $\omega = \exp(-2\pi\mathbb{i}/n)$,  then $\omega^{n}=1$; we have an eigenvalue
$$\lambda = \sum_{j=0}^{n-1}\omega^{-j} c_j$$
with corresponding eigenvector
$$x= (1, \exp(2\pi\mathbb{i}/n), \exp(2\pi\mathbb{i}/n)^2,\dots,\exp(2\pi\mathbb{i}/n)^{n-1})^T$$
The $k$-th eigenvalue is generated from $\omega^{-k} = \exp(2\pi k\mathbb{i}/n)$ with $k\in [0,n-1]$, yields the $k$-th eigenvalue and eigenvector:
$$\lambda_k = \sum_{j=0}^{n-1}c_j \exp(2\pi k\mathbb{i}/n)$$
$$x_k = (1, \exp(2\pi k\mathbb{i}/n), \exp(2\pi k\mathbb{i}/n)^2,\dots,\exp(2\pi k\mathbb{i}/n)^{n-1})^T$$
The eigenspace is just the DFT matrix, and ALL circulant matrices share same eigenspace, it is easily to verify that circulant matrices have following properties:
If $A$ and $B$ are circulant matrices, then
  1. $AB=BA=W^\ast\Gamma W$ with $W$ is the DFT matrix and $\Gamma=\Gamma_A\Gamma_B$, $\Gamma_i$ represents diagonal matrix consists of eigenvalues of $i$; $\Gamma_A = WAW^\ast$;
  2. $B+A=A+B=W^\ast\Omega W$, $\Omega=\Gamma_A+\Gamma_B$;
  3. If $\mathrm{det}(A)\ne 0$, then $A^{-1}=W^\ast \Gamma_A^{-1}W$;
The proof is straightforward:
  1. $AB=W^\ast \Gamma_AW W^\ast\Gamma_BW=$
    $W^\ast\Gamma_A\Gamma_BW=W^\ast \Gamma_B\Gamma_AW=BA$
  2. $W(A+B)W^\ast=\Gamma_A+\Gamma_B$; 
  3. $AW^\ast \Gamma_A^{-1}W=W^\ast \Gamma_AW W^\ast \Gamma_A^{-1}W=\mathbb{I}$
Back2Top ^