This Blog is Untitled: January 2020

Structure factor (SQ) is a characterization quantity which is frequently calculated in molecular simulations. It is easily to code the calculation program according to the definition. In most molecular simulations, periodic boundary condition is adopted therefore

$$S(\mathbf{q}):=\mathcal{FT}\lbrace\langle\rho(\mathbf{r})\rho(\mathbf{0})\rangle\rbrace(\mathbf{q})$$

is actually a circular convolution, where FFT can give a dramatic boost in contrast to calculate directly. The steps using FFT are:

Calculate $\rho(\mathbf{r})$, which is a summation of Dirac-Delta function that can be estimated as a hisotgram;
Calculate $\hat{\rho}(\mathbf{q})$ by FFT;
$S(\mathbf{q})=|\hat{\rho}(\mathbf{q})|^2$;
Calculate mean over modulus: $S(q)=\int S(\mathbf{q})\delta(|\mathbf{q}|-q)\mathrm{d}\mathbf{q}/\int \delta(|\mathbf{q}|-q)\mathrm{d}\mathbf{q}$

Efficiency

If the simulation box is divided into $N$ (in 3D systems, $N=N_xN_yN_z$ for example) bins, the FFT gives $O(N\log(N))$ complexity and step 4 is $O(N)$. Generally, in comparison with the direct method, one needs at least loop over number of particles and $N$ bins for $\mathbf{q}$, the number of particles is obviously, way larger than $\log(N)$. For data of most simulations are real numbers, rFFT designed for real inputs could further boost the program. Here is code for a 3D example, i, j, k represent $N_x$, $N_y$, $N_z$ respectively.

fftn(a) == np.concatenate([rfftn(a), conj(rfftn(a))[-np.arange(i),-np.arange(j),np.arange(k-k//2-1,0,-1)]], axis=-1)

Accuracy

Binsize effect.
Randomness in the bin

Formulae of umbrella sampling method can be found on wikipedia. In umbrella sampling, a reaction coordinate $\xi$ is pre-defined from the atomic coordinates, a bias potential is added to the atoms of interest to keep $\xi$ of the system at a specific window $\xi_w$. The bias form is usually a harmonic potential:
$$u^b_w(\xi)=0.5k_w(\xi-\xi_w)^2$$
Therefore, the energy of biased system $A^b_w = A^{ub}_w + u^b_w(\xi)$. The superscript $ub$ is short for "un-biased". In the simulations, we can sample the reaction coordinate in each window and evaluate their distribution $P^b(\xi)$, since the free energy $A=-k_BT\ln(P)$, we have:
$$A_w^{ub}(\xi) = -k_BT\ln(P^b_w(\xi))-u^b_w(\xi)-F_w$$
with $F_w$ is a reference free energy of each window and remains an unknown constant. One method to derive $F_w$ is WHAM, in year 2005, Kästner et al. (The Journal of Chemical Physics 123, no. 14 (October 8, 2005): 144104.) have proposed a new method whose idea is to take derivative of $A^u_w$ with respect to $\xi$ to eliminate $F_w$, In this method, $P(\xi)$ is assumed to be analytic, e.g., a Gaussian distribution. The general idea is to expand $A(\xi)$ into Taylor series (at $\langle \xi \rangle_w$):
$$ A(\xi)=a_1 \xi + a_2 \xi^2 + a_3 \xi^3 +\ldots$$
if $a_i=0$ with $i\ge 3$, the Gaussian form is restored. For systems with higher order terms, the distributions, for example, generally have a non-zero skewness. The crucial step is to determine $\lbrace a_i\rbrace$. In practice, the probability with form of $\exp(-\sum_i a_i \xi^i)$ is difficult to determine, in year 2012, Kästner et al. (The Journal of Chemical Physics 136, no. 23 (June 21, 2012): 234102) studied cases with order $4$ and small $a_3, a_4$. I have tried another more "numerical" method to calculate $\lbrace a_i\rbrace$ when datasets are poorer: fit $\exp(-\sum_i a_i \xi ^i)$ with a Gaussian KDE to find $\lbrace a_i \rbrace$. For some "poor" datasets, higher-order terms of expansion of free energy are required. The normalization factor of the distribution is estimated as $n=\int_{\xi_\mathrm{min}}^{\xi_\mathrm{max}} P(\xi)\mathrm{d}\xi$, the fitting range should be carefully chosen to ensure the convergence of $n$.

This Blog is Untitled

Not SQ again...

Free energy calculation: umbrella integration

About Me

Search This Blog

Archive

Labels

Links

Pages

License