# Optical Spectra with fastBSE¶

by Benedikt Maurer for exciting neon

Purpose: In this tutorial, you will learn how calculate the absorption spectrum of diamond on using fastBSE within exciting.

### 0. Preparation¶

Read the following paragraph before starting with the rest of this tutorial!

Before running any Jupyter tutorials, please refer to the 00_before_starting.md document on how to correctly set up the environment. This only needs to be done once. After which, the venv can be (re)activated from exciting's root directory:

source tools/excitingjupyter/venv/excitingvenv/bin/activate


Important note:

• All input parameters that will appear will be given in atomic units!
• For running fastBSE calculations, you need to link exciting to HDF5 and FFTW3

### 1. Introduction¶

In this tutorial you will learn how to calculate the optical absorption spectrum of with the fastBSE algorithm. To demonstrate that the results of the new method are equivalent to the old one, you will also calculate the spectrum with the direct algorithm. If you are not familiar how this is done, begin with the tutorial Excited states from BSE.

fastBSE is low scaling, matrix-free, iterative algorithm to solve the Bethe-Salpeter equation (BSE). It scales with $\mathcal O(N_v N_c(1 + \log N_k))$. This is massive speed up compared to the direct algorithm that scales with $\mathcal O(N_v^3 N_c^3 N_k^3)$.

### 2. Theory¶

In this tutorial, we only consider the Tamm-Dancoff-approximation (TDA) and neglect the coupling between the resonant and anti-resonant components of the Bethe-Salpeter Hamiltonian (BSH). We also neglect finite momentum transitions, such that $\mathbf{q} = \mathbf{0}$.

The heavy lifting of solving the BSE involves setting up and diagonalizing the BSH:

$$H^{BSE} = D + 2V - W,$$

where $D$ represents the diagonal part, accounting for the single electron properties; $V$ is the repulsive, long-range exchange interaction; and $W$ is the attractive screened interaction. To map the problem to linear algebra, we expand the exciton wave function in the two-particle basis:

$$\Phi^\lambda(\mathbf{r}, \mathbf{r'}) = \sum_{vck} C^\lambda_{vck} \psi_{vk}(\mathbf{r}) \psi^*_{ck}(\mathbf{r'}),$$

where $\psi_{ik}(\mathbf{r})$ is a single-electron wave function of band $i$ at $\mathbf{r}$ and the $k$'th Brillouin zone (BZ) vector. The indices $v$ and $c$ refer to valence and conduction bands, respectively, and $\lambda$ is the index of the exciton. Thus, the diagonalization can be rewritten as the following eigenvalue problem:

$$H^{BSE}|\Phi_\lambda\rangle = E^\lambda |\Phi_\lambda\rangle,$$

In this expansion, the diagonal part is given by

$$D_{vck, v'c'k'} = \big(\epsilon_{ck} - \epsilon_{vk}\big) \delta_{cc'} \delta_{vv'} \delta_{kk'},$$

the exchange interaction kernel by

$$V_{vck, v'c'k'} = \int d^3r \int d^3 r' \frac{ u^*_{v'k'}(\mathbf{r'})u_{c'k'}(\mathbf{r'})u^*_{ck}(\mathbf{r})u_{vk}(\mathbf{r}) } {| \mathbf{r - r'} |},$$

and the screened interaction kernel by

$$W_{vck, v'c'k'} = \int d^3r \int d^3 r' u^*_{v'k'}(\mathbf{r'})u_{vk}(\mathbf{r'}) W(\mathbf{r, r'})_{\mathbf{k-k'}} u^*_{ck}(\mathbf{r})u_{c'k'}(\mathbf{r}).$$

Here, it is taken into account that for periodic materials, the wave functions are Bloch waves and can be written as

$$\psi_{ik}(\mathbf{r}) = u_{ik}(\mathbf{r}) \cdot e^{i\mathbf{r} \mathbf{k}},$$

where $u_{ik}(\mathbf{r})$ is the periodic part of the wave functions.

In total, this results in a matrix with $(N_v N_c N_k)^2$ elements. Solving the BSE directly means calculating the full matrix and diagonalizing it. This scales with $\mathcal{O}\big((N_v N_c N_k)^3\big)$. For many problems, a converged solution of the BSE cannot be calculated with reasonable resources.

The fastBSE algorithm avoids calculating the full matrix by employing interpolative separable density fitting (ISDF) to approximate the wavefunction products as $u_{ik}(\mathbf{r})u^*_{jk}(\mathbf{r})$ in the formulas for the integrals:

$$u_{ik}^*(\mathbf{r})u_{jk'}(\mathbf{r}) \approx \sum_{\mu=1}^{N_\mu} \zeta_\mu(\mathbf{r}) u_{ik}^*(\mathbf{r}_\mu) u_{jk}(\mathbf{r}_\mu).$$

When discretized on a real-space grid, $\{\mathbf{r}_i\}_{i=1}^{N_r}$, $u_{ik}^*(\mathbf{r})u_{jk'}(\mathbf{r})$ can be seen as a matrix $Z \in \mathbb{C}^{N_r \times N_iN_jN_k^2}$, where $N_i$ and $N_j$ represent the number of bands of $u_{ik}$ and $u_{jk'}$, respectively, to be combined. Therefore, for a given $\mathbf{r}$, $u_{ik}(\mathbf{r})^*u_{jk}(\mathbf{r})$ can be viewed as a row vector of size $N_iN_jN_k^2$. The ISDF approximates all such rows as a linear combination of matrix rows with respect to a selected set of rows or interpolation points, and then it interpolates on a sub-grid $\{\mathbf{r}_\mu\}_{\mu=1}^{N_\mu} \subseteq \{\mathbf{r}_i\}_{i=1}^{N_r}$ with typically $N_\mu \ll N_r$. Similarly, we can represent the interpolation coefficients $\zeta_\mu(\mathbf{r})$ in a matrix $\Theta \in \mathbb{C}^{N_r \times N_\mu}$ and the interpolation rows $u_{ik}(\mathbf{r})u_{jk'}(\mathbf{r})$ in a matrix $C \in \mathbb{C}^{N_\mu \times N_iN_jN_k^2}$. Using linear algebra notation, we can express the approximation as:

$$Z \approx \Theta C.$$

In terms of the interpolation coefficients $\Theta$, this equation becomes an over-determined linear system and can be solved using the least squares approximation:

$$\Theta = ZC^* (CC^*)^{-1}.$$

Due to the tensor structure of $Z$ and $C$, the operations $ZC^*$ and $CC^*$ can be computed efficiently. The overall cost of evaluating $\Theta$ is bounded by $\mathcal{O}(N_g N N_kN_\mu + N_\mu^3 + N_gN_\mu^2)$, where $N=N_i + N_j$.

The interpolation points are computed using centroidal Voronoi tessellation (CVT). This method divides the unit cell into clusters with the same weight based on the electron density and can be calculated very efficiently in $\mathcal{O}(N_rN_\mu)$.

Since we have three different wavefunction pairings, we need to perform ISDF calculations three times:

$$u^*_{ck}(\mathbf{r})u_{vk}(\mathbf{r}) \approx \sum_{\mu=1}^{N_\mu^V} \zeta^V_\mu(\mathbf{r}) u^*_{ck}(\mathbf{r}^V_\mu)u_{vk}(\mathbf{r}^V_\mu).$$$$u^*_{vk}(\mathbf{r})u_{vk'}(\mathbf{r}) \approx \sum_{\mu=1}^{N_\mu^{W_v}} \zeta^{W_v}_\mu(\mathbf{r}) u^*_{vk}(\mathbf{r}^{W_c}_\mu)u_{vk'}(\mathbf{r}^{W_c}_\mu),$$$$u^*_{ck}(\mathbf{r})u_{ck'}(\mathbf{r}) \approx \sum_{\mu=1}^{N_\mu^{W_c}} \zeta^{W_c}_\mu(\mathbf{r}) u^*_{ck}(\mathbf{r}^{W_c}_\mu)u_{ck'}(\mathbf{r}^{W_c}_\mu).$$

Replacing the wavefunction pairs in the formulas with these approximations, we obtain for $V$:

$$V_{vc\mathbf{k}, v'c'\mathbf{k}'} \approx \frac{1}{N_k^2} \sum_{\mu=1}^{N_\mu^V} \sum_{\nu=1}^{N_\mu^V} u^*_{c\mathbf{k}} (\mathbf{r}_\mu) u_{v\mathbf{k}} (\mathbf{r}_\mu) \tilde V_{\mu \nu} u^*_{c'\mathbf{k}} (\mathbf{r}_\nu) u_{v'\mathbf{k}} (\mathbf{r}_\nu),$$

with

$$\tilde V_{\mu \nu} = \int_{\Omega^l\times \Omega^l}drdr' \zeta_\mu^{*V}(\mathbf{r}) V(\mathbf{r},\mathbf{r'}) \zeta_\nu^{V}(\mathbf{r'}),$$

and for $W$:

$$W_{vc\mathbf{k}, v'c'\mathbf{k}'} \approx \frac{1}{N_k^2} \sum_{\mu=1}^{N_{\mu}^{W_c}} \sum_{\nu=1}^{N_{\mu}^{W_v}} u^*_{c\mathbf{k}} (\mathbf{r}_\mu) u_{c'\mathbf{k'}} (\mathbf{r}_\mu) \tilde W_{\mu\nu, \mathbf{k-k'}} u^*_{v'\mathbf{k'}} (\mathbf{r}_\nu) u_{v\mathbf{k}} (\mathbf{r}_\nu),$$

with

$$\tilde W_{\mu\nu, \mathbf{k-k'}} = \int_{\Omega^l\times \Omega^l}drdr' \zeta_\mu^{*W_c}(\mathbf{r}) W_{\mathbf{k}-\mathbf{k'}}(\mathbf{r},\mathbf{r'}) \zeta_\nu^{W_v}(\mathbf{r'}).$$

The first thing to note is that this drastically reduces the number of integrals to solve from $\left( N_k N_v N_c \right)^2$ to $\left( N_\mu^V \right)^2$ and $N_{\mu}^{W_v}N_{\mu}^{W_c}$ for $V$ and $W$, respectively, given that we can choose:

$$N_\mu^V \ll N_k N_v N_c \\ N_{\mu}^{W_v} \ll N_k^2N_v^2 \\ N_{\mu}^{W_c} \ll N_k^2N_c^2,$$

which is always true.

Secondly note that this approach does not reduce the number of matrix elements. To utilize the deconstructed matrix, we employ the Lanczos algorithm for diagonalization. The Lanczos algorithm constructs an approximation to the lowest eigenpairs of a matrix through iterative multiplication by a vector. Consequently, the computational complexity of diagonalization is scaled down to that of matrix-vector multiplication. As our focus is solely on the lowest eigenvalues, we do not lose any relevant information by not computing the entire solution. In the decomposed form, as illustrated above, we can efficiently apply the interaction kernels to a vector $X \in \mathbb{C}^{N_vN_cN_k}$:

For the matrix $V$, we derive:

$$[V \cdot X]_{vck} = \frac{1} {N_k} \sum_{k'} \sum_{v'} \sum_{c'} \sum_{\mu=1}^{N_\mu^V} \sum_{\nu=1}^{N_\mu^V} u_{ck}^*(\mathbf{r}_\mu) u_{vk}(\mathbf{r}_\mu) \tilde{V}_{\mu\nu} u^*_{v'k'}(\mathbf{r}_\nu) u_{c'k'}(\mathbf{r}_\nu) \cdot X_{k' v' c'}.$$

Rearranging the summations reveals an efficient evaluation of the matrix-vector product:

$$[V \cdot X]_{vck} = \frac{1} {N_k} \sum_{\mu=1}^{N_\mu^V} u^*_{ck}(\mathbf{r}_\mu) u_{vk}(\mathbf{r}_\mu) \left\{ \sum_{\nu=1}^{N_\mu^V} \tilde V_{\mu\nu} \left[ \sum_{k'} \left( \sum_{c'} u_{c'k'}(\mathbf{r}_\nu) \left[ \sum_{v'} u^*_{v'k'}(\mathbf{r}_\nu) \cdot X_{k' v' c'} \right] \right) \right] \right\},$$

which scales as $\mathcal{O}(N_\mu^V N_v N_c N_k + N_\mu N_c N_k)$, thus linear with respect to the number of $\mathbf{k}$-points and quadratic with the number of electrons ($\mathcal{O}(N_e) \approx \mathcal{O}(N_v) \approx \mathcal{O}(N_c)$).

For the matrix $W$, the expression becomes:

$$[W_A\cdot X]_{vck} = \frac{1}{N_k} \sum_{k'} \sum_{v'} \sum_{c'} \sum_{\nu=1}^{N_\mu^{W_v}} \sum_{\mu=1}^{N_\mu^{W_c}} u^*_{v'k'}(\mathbf r_\nu^{W_v}) u_{vk}(\mathbf r_\nu^{W_v}) \tilde{W}_{\mu\nu, \mathbf{k-k'}} u^*_{ck}(\mathbf r_\mu^{W_c}) u_{c'k'}(\mathbf r_\mu^{W_c}) \cdot X_{v'c'k'}.$$

Once again, reordering the summations yields an efficient matrix-vector product evaluation:

$$[W_A\cdot X]_{vck} = \frac{1} {N_k} \sum_{\nu=1}^{N_\mu^{W_v}} u_{vk}(\mathbf r_\nu^{W_v}) \left\{ \sum_{\mu=1}^{N_\mu^{W_c}} u^*_{ck}(\mathbf r_\mu^{W_c}) \left[ \sum_{\mathbf{k'}} \tilde{W}_{\mu\nu, \mathbf{k-k'}} \left( \sum_{c'} u_{c'k'}(\mathbf r_\mu^{W_c}) \left[ \sum_{v'} \cdot u^*_{\mathbf{k'}j_o}(\mathbf{\hat{r}}_\nu^W) X(\mathbf{k'} j_o j_u) \right] \right) \right] \right\},$$

which scales as $\mathcal{O}(N_\mu^{W_v}N_v N_c N_k + N_\mu^{W_v}N_\mu^{W_c}(N_c N_k + N_k \log N_k))$. The term $\mathcal{O}(N_k \log N_k)$ arises from the sum over $k'$:

$$\sum_{k'} W_{k-k'}A_{k'},$$

which is a discrete convolution that can be efficiently calculated using fast Fourier transformations.

### 3. Calculations¶

As an example, you will calculate the absorption spectrum for diamond on a $4 \times 4 \times 4$ $\mathbf k$-grid. Please note that the parameters are not converged. A converged calculation would require too many resources for a tutorial calculation.

As for all other tutorials, you can choose between executing all steps manually from your command line, or to simply execute the python cells in this Jupyter notebook.

In [1]:
%%bash
# Create working directory
mkdir -p run_fastBSE_diamond


#### 3.1 Groundstate and Screening¶

First, calculate the common properties for both the direct and the fastBSE algorithm. These are the electronic eigen energies $\epsilon_{ik}$, the corresponding momentum matrix elements and the screened coulomb interaction. In this example we use the electronic properties from a DFT groundstate calculation. As for the direct algorithm, you can also use a GW calculations as starting point for the fastBSE algorithm. The following input.xml tells exciting to calculate what we need:

<input>

<title>Diamond: Groundstate and Screening</title>

<structure speciespath="$EXCITINGROOT/species"> <crystal scale="6.74632"> <basevect> 0.5 0.5 0.0 </basevect> <basevect> 0.5 0.0 0.5 </basevect> <basevect> 0.0 0.5 0.5 </basevect> </crystal> <species speciesfile="C.xml" rmt="1.25"> <atom coord="0.00 0.00 0.00" /> <atom coord="0.25 0.25 0.25" /> </species> </structure> <groundstate do="fromscratch" ngridk="13 13 13" xctype="GGA_PBE_SOL" rgkmax="8.0" gmaxvr="18" nempty="20"> </groundstate> <xs xstype="BSE" ngridk="4 4 4" ngridq="4 4 4" vkloff="0.05 0.15 0.25" nempty="50" gqmax="3.0" broad="0.005" tappinfo="true" tevout="true"> <qpointset> <qpoint>0.0 0.0 0.0</qpoint> </qpointset> <energywindow intv="0.0 1.5" points="1500" /> <screening screentype="full" nempty="100"/> <BSE bsetype="singlet" nstlbse="1 4 1 20" solver="direct"/> <plan> <doonly task="scrgeneigvec"/> <doonly task="scrwritepmat"/> <doonly task="screen"/> <doonly task="xsgeneigvec"/> <doonly task="writepmatxs"/> </plan> </xs> </input>  In the xs element we define the general proerties of the BSE calculation. We can leave most of this as it if in the following, all important edits are explained. Note that in the plan element we define a workflow that only calculates what we need for solving the BSE but not solving it. solver="direct" inside the xs element tells exciting to use the direct solver workflow. This attribute is only important if you have not defined a costum workflow as we did in the plan element. In order to run exciting from the terminal, you need to execute the following commands. In [ ]: %%bash cd run_fastBSE_diamond # replace the string$EXCITINGROOT
python3 -m excitingscripts.setup.excitingroot
# execute exciting
python3 -m excitingscripts.execute.single
cd ..


#### 3.2 Direct BSE Calculation¶

Now calculate the absorption spectrum by solving the BSE directly. Therefore edit the input.xml as follows:

<input>

<title>Diamond: Direct BSE</title>

<structure speciespath="$EXCITINGROOT/species"> <crystal scale="6.74632"> <basevect> 0.5 0.5 0.0 </basevect> <basevect> 0.5 0.0 0.5 </basevect> <basevect> 0.0 0.5 0.5 </basevect> </crystal> <species speciesfile="C.xml" rmt="1.25"> <atom coord="0.00 0.00 0.00" /> <atom coord="0.25 0.25 0.25" /> </species> </structure> <groundstate do="skip" ngridk="13 13 13" xctype="GGA_PBE_SOL" rgkmax="8.0" gmaxvr="18" nempty="20"> </groundstate> <xs xstype="BSE" ngridk="4 4 4" ngridq="4 4 4" vkloff="0.05 0.15 0.25" nempty="50" gqmax="3.0" broad="0.005" tappinfo="true" tevout="true"> <qpointset> <qpoint>0.0 0.0 0.0</qpoint> </qpointset> <energywindow intv="0.0 1.5" points="1500" /> <screening screentype="full" nempty="100"/> <BSE bsetype="singlet" nstlbse="1 4 1 20" solver="direct"/> <plan> <doonly task="scrcoulint"/> <doonly task="exccoulint"/> <doonly task="bse"/> </plan> </xs> </input>  do="skip" inside the groundstate element. We do not need to calculate the groundstate again. The plan element defines now a workflow to setup the interaction kernels and solve the BSE. In order to run exciting from the terminal, you need to execute the following commands. In [ ]: %%bash cd run_fastBSE_diamond # replace the string$EXCITINGROOT
python3 -m excitingscripts.setup.excitingroot
# execute exciting
python3 -m excitingscripts.execute.single
cd ..


#### 3.3 fastBSE¶

Now we will calculate the same spectrum but use the fastBSE algorithm. Therefore edit the input.xml as follows:

<input>

<title>Diamond: fastBSE</title>

<structure speciespath="$EXCITINGROOT/species"> <crystal scale="6.74632"> <basevect> 0.5 0.5 0.0 </basevect> <basevect> 0.5 0.0 0.5 </basevect> <basevect> 0.0 0.5 0.5 </basevect> </crystal> <species speciesfile="C.xml" rmt="1.25"> <atom coord="0.00 0.00 0.00" /> <atom coord="0.25 0.25 0.25" /> </species> </structure> <groundstate do="skip" ngridk="13 13 13" xctype="GGA_PBE_SOL" rgkmax="8.0" gmaxvr="18" nempty="20"> </groundstate> <xs xstype="BSE" ngridk="4 4 4" ngridq="4 4 4" vkloff="0.05 0.15 0.25" nempty="50" gqmax="3.0" broad="0.005" tappinfo="true" tevout="true"> <qpointset> <qpoint>0.0 0.0 0.0</qpoint> </qpointset> <energywindow intv="0.0 2.0" points="1500" /> <screening screentype="full" nempty="100"/> <BSE bsetype="singlet" nstlbse="1 4 1 20" solver="fastBSE"/> <fastBSE ngridr="20 20 20" nisdf="80 200 200" nlanczos="200"/> <plan> <doonly task="write_screened_coulomb"/> <doonly task="fastBSE_groundstate_properties"/> <doonly task="fastBSE_isdf_cvt"/> <doonly task="fastBSE_main"/> <doonly task="fastBSE_human_readable_output"/> </plan> </xs> </input>  The plan element defines a workflow with the remaining tasks to calculate the absorption spectrum with the fastBSE algorithm. Without it, exciting will run a workflow, that automatically calculates all necessary properties. solver="fastBSE" inside the xs element tells exciting to use the fastBSE solver. Since we use here a custom plan it has no effect. The fastBSE element in the xs element defines the following parameters for fast BSE. • ngridr="20 20 20" inside the xs defines the real space grid on which$u_{ik}(\mathbf r)$is calculated. • nisdf="80 200 200" inside the xs defines the number of interpolation points for the ISDF of the$u_{vk}(\mathbf r)u_{ck}(\mathbf r)$pairs in$V$and$u_{vk}(\mathbf r)u_{v'k'}(\mathbf r)$and$u_{ck}(\mathbf r)u_{c'k'}(\mathbf r)$in$W$respectively. • nlanczos="200" inside the xs defines the maximum number of Lanczos iterations to be calculated. As you can see, the properties of the BSE calculation do not need to be edited or copied. You only need to set solver="fastBSE" and define the properties for the fastBSE algorithm. In order to run exciting from the terminal, you need to execute the following commands. In [7]: %%bash cd run_fastBSE_diamond # replace the string$EXCITINGROOT
python3 -m excitingscripts.setup.excitingroot
# execute exciting
python3 -m excitingscripts.execute.single
cd ..


### 4. Compare the results¶

By comparing the runtime, you already see that the fastBSE algorithm is significantly faster.

To convince yourself that the absorption spectra calculated with both algorithms are equivalent plot the absorption spectrum by running the following commands. The image is saved as PLOT.png.

In [8]:
%%bash
cd run_fastBSE_diamond
cp EPSILON/EPSILON_BSE-singlet-TDA-BAR_SCR-full_OC11.OUT direct
cp fastBSE_absorption_spectrum.out fastBSE
python3 -m excitingscripts.plot.files -t 'Absorption Spectrum: Diamond' -f direct fastBSE -cx 1 -cy 3 -lx '$\omega$ [eV]'  -ly 'Im $\epsilon_M$'  -x 0 40
cd ..