Based on such demands, I designed a co-pilot which can:
- Automatically clustering the clusters;
- Remove periodic boundary conditions and make the center-of-mass at $(0,0,0)^T$;
- Adjust the view vector along the minor axis of the aggregate;
- Classify the aggregate.
After obtaining clusters in step 1, we must remove periodic boundary conditions of the cluster. If in step 1, one uses BFS or DFS + Linked Cell List method, then one can remove periodic boundary condition during clustering; but this method has limitations, it does not work properly if the cluster is percolated throughout the box. Therefore, in this step, I use circular statistics to deal with the clusters. In periodic boundary condition simulation box, distance between an NP in an aggregate and the midpoint will never exceed $L/2$ in corresponding box dimension. Midpoint here is not center-of-mass, e.g., distance between a nozzle point and center-of-mass of an ear-syringe-bulb is clearly larger than its half length; midpoint is a actually a "de-duplicated" center-of-mass. Besides, circular mean also puts most points in the center in case of percolation. Therefore, in part 2, we have following steps:
- Choose a $r_\text{cut}$ to test whether the aggregate is percolate;
- If the aggregate is percolate, evaluate the circular mean of all points $r_c$;
- Set all coordinates $r$ as $r\to pbc(r-r_c)$;
- If the aggregate is not percolate, midpoint is evaluated by calculating circular mean of coordinates $r$ where $\rho(r)>0$, $\rho(r)$ is calculated using bin size that smaller than $r_\text{cut}$ used in step 1;
- Same as step 3, update coordinates;
- After step 5, the aggregates are unwrapped from the box, set $r\to r-\overline{r}$ to set center-of-mass at $(0,0,0)^T$
$$\alpha=\underset{\beta}{\operatorname{argmin}}\sum_i (1-\cos(\alpha_i-\beta))$$
Adjusting the view vector is simple, evaluate the eigenspace of gyration tensor as $rr^T/n$ and sort the eigenvectors by eigenvalue, i.e., $\lambda_1\ge\lambda_2\ge\lambda_3$, then the minor axis is corresponding eigenvector $v_3$, the aggregate then can be rotated by $[v_1, v_2, v_3]$ so that the minor axis is $z$-axis.
The last step is a bit more tricky, the best trail I attempted was to use SVC, a binary classification method. I used about 20 samples labeled as "desired", these 20 samples were extended to 100 samples by adding some noises to the samples, e.g., moving coordinates a little bit, adding several NPs into the aggregate or removing several NPs randomly, without "breaking" the category of the morphology. Together with 100 "undesired" samples, I trained the SVC with a Gaussian kernel. The result turned out to be pretty good. I also tried to use ANN to classify all 5 categories of morphologies obtained from simulations, but ANN model did not work very well, perhaps the reason was lack of samples or the model I built was too rough. I didn't try other multi-class methods, anyway, that part of work was done, I stopped developing this co-pilot long time ago.
No comments:
Post a Comment