**Purpose**: In this tutorial, you will learn how calculate the absorption spectrum of diamond on using **fastBSE** within exciting.

**Read the following paragraph before starting with the rest of this tutorial!**

Before running any Jupyter tutorials, please refer to the ** 00_before_starting.md** document on how to correctly set up the environment. This only needs to be done once. After which, the

```
source tools/excitingjupyter/venv/excitingvenv/bin/activate
```

**Important note:**

- All input parameters that will appear will be given in
**atomic units**! - For running
**fastBSE**calculations, you need to link exciting to**HDF5**and**FFTW3**

In this tutorial you will learn how to calculate the optical absorption spectrum of with the **fastBSE** algorithm. To demonstrate that the results of the new method are equivalent to the old one, you will also calculate the spectrum with the **direct** algorithm. If you are not familiar how this is done, begin with the tutorial **Excited states from BSE**.

**fastBSE** is low scaling, matrix-free, iterative algorithm to solve the Bethe-Salpeter equation (**BSE**). It scales with $\mathcal O(N_v N_c(1 + \log N_k))$. This is massive speed up compared to the **direct** algorithm that scales with $\mathcal O(N_v^3 N_c^3 N_k^3)$.

The **fastBSE** algorithm was published by **Henneke and collegues**.

In this tutorial, we only consider the **Tamm-Dancoff-approximation** (**TDA**) and neglect the coupling between the resonant and anti-resonant components of the **Bethe-Salpeter Hamiltonian** (**BSH**). We also neglect finite momentum transitions, such that $\mathbf{q} = \mathbf{0}$.

The heavy lifting of solving the **BSE** involves setting up and diagonalizing the **BSH**:

where $D$ represents the diagonal part, accounting for the single electron properties; $V$ is the repulsive, long-range exchange interaction; and $W$ is the attractive screened interaction. To map the problem to linear algebra, we expand the exciton wave function in the two-particle basis:

$$ \Phi^\lambda(\mathbf{r}, \mathbf{r'}) = \sum_{vck} C^\lambda_{vck} \psi_{vk}(\mathbf{r}) \psi^*_{ck}(\mathbf{r'}), $$where $\psi_{ik}(\mathbf{r})$ is a single-electron wave function of band $i$ at $\mathbf{r}$ and the $k$'th Brillouin zone (BZ) vector. The indices $v$ and $c$ refer to valence and conduction bands, respectively, and $\lambda$ is the index of the exciton. Thus, the diagonalization can be rewritten as the following eigenvalue problem:

$$ H^{BSE}|\Phi_\lambda\rangle = E^\lambda |\Phi_\lambda\rangle, $$In this expansion, the diagonal part is given by

$$ D_{vck, v'c'k'} = \big(\epsilon_{ck} - \epsilon_{vk}\big) \delta_{cc'} \delta_{vv'} \delta_{kk'}, $$the exchange interaction kernel by

$$ V_{vck, v'c'k'} = \int d^3r \int d^3 r' \frac{ u^*_{v'k'}(\mathbf{r'})u_{c'k'}(\mathbf{r'})u^*_{ck}(\mathbf{r})u_{vk}(\mathbf{r}) } {| \mathbf{r - r'} |}, $$and the screened interaction kernel by

$$ W_{vck, v'c'k'} = \int d^3r \int d^3 r' u^*_{v'k'}(\mathbf{r'})u_{vk}(\mathbf{r'}) W(\mathbf{r, r'})_{\mathbf{k-k'}} u^*_{ck}(\mathbf{r})u_{c'k'}(\mathbf{r}). $$Here, it is taken into account that for periodic materials, the wave functions are Bloch waves and can be written as

$$ \psi_{ik}(\mathbf{r}) = u_{ik}(\mathbf{r}) \cdot e^{i\mathbf{r} \mathbf{k}}, $$where $u_{ik}(\mathbf{r})$ is the periodic part of the wave functions.

In total, this results in a matrix with $(N_v N_c N_k)^2$ elements. Solving the **BSE** **directly** means calculating the full matrix and diagonalizing it. This scales with $\mathcal{O}\big((N_v N_c N_k)^3\big)$. For many problems, a converged solution of the **BSE** cannot be calculated with reasonable resources.

The **fastBSE** algorithm avoids calculating the full matrix by employing **interpolative separable density fitting** (**ISDF**) to approximate the wavefunction products as $u_{ik}(\mathbf{r})u^*_{jk}(\mathbf{r})$ in the formulas for the integrals:

When discretized on a real-space grid, $\{\mathbf{r}_i\}_{i=1}^{N_r}$, $u_{ik}^*(\mathbf{r})u_{jk'}(\mathbf{r})$ can be seen as a matrix $Z \in \mathbb{C}^{N_r \times N_iN_jN_k^2}$, where $N_i$ and $N_j$ represent the number of bands of $u_{ik}$ and $u_{jk'}$, respectively, to be combined. Therefore, for a given $\mathbf{r}$, $u_{ik}(\mathbf{r})^*u_{jk}(\mathbf{r})$ can be viewed as a row vector of size $N_iN_jN_k^2$. The ISDF approximates all such rows as a linear combination of matrix rows with respect to a selected set of rows or interpolation points, and then it interpolates on a sub-grid $\{\mathbf{r}_\mu\}_{\mu=1}^{N_\mu} \subseteq \{\mathbf{r}_i\}_{i=1}^{N_r}$ with typically $N_\mu \ll N_r$. Similarly, we can represent the interpolation coefficients $\zeta_\mu(\mathbf{r})$ in a matrix $\Theta \in \mathbb{C}^{N_r \times N_\mu}$ and the interpolation rows $u_{ik}(\mathbf{r})u_{jk'}(\mathbf{r})$ in a matrix $C \in \mathbb{C}^{N_\mu \times N_iN_jN_k^2}$. Using linear algebra notation, we can express the approximation as:

$$ Z \approx \Theta C. $$In terms of the interpolation coefficients $\Theta$, this equation becomes an over-determined linear system and can be solved using the least squares approximation:

$$ \Theta = ZC^* (CC^*)^{-1}. $$Due to the tensor structure of $Z$ and $C$, the operations $ZC^*$ and $CC^*$ can be computed efficiently. The overall cost of evaluating $\Theta$ is bounded by $\mathcal{O}(N_g N N_kN_\mu + N_\mu^3 + N_gN_\mu^2)$, where $N=N_i + N_j$.

The interpolation points are computed using **centroidal Voronoi tessellation** (**CVT**). This method divides the unit cell into clusters with the same weight based on the electron density and can be calculated very efficiently in $\mathcal{O}(N_rN_\mu)$.

Since we have three different wavefunction pairings, we need to perform ISDF calculations three times:

$$ u^*_{ck}(\mathbf{r})u_{vk}(\mathbf{r}) \approx \sum_{\mu=1}^{N_\mu^V} \zeta^V_\mu(\mathbf{r}) u^*_{ck}(\mathbf{r}^V_\mu)u_{vk}(\mathbf{r}^V_\mu). $$$$ u^*_{vk}(\mathbf{r})u_{vk'}(\mathbf{r}) \approx \sum_{\mu=1}^{N_\mu^{W_v}} \zeta^{W_v}_\mu(\mathbf{r}) u^*_{vk}(\mathbf{r}^{W_c}_\mu)u_{vk'}(\mathbf{r}^{W_c}_\mu), $$$$ u^*_{ck}(\mathbf{r})u_{ck'}(\mathbf{r}) \approx \sum_{\mu=1}^{N_\mu^{W_c}} \zeta^{W_c}_\mu(\mathbf{r}) u^*_{ck}(\mathbf{r}^{W_c}_\mu)u_{ck'}(\mathbf{r}^{W_c}_\mu). $$Replacing the wavefunction pairs in the formulas with these approximations, we obtain for $V$:

$$ V_{vc\mathbf{k}, v'c'\mathbf{k}'} \approx \frac{1}{N_k^2} \sum_{\mu=1}^{N_\mu^V} \sum_{\nu=1}^{N_\mu^V} u^*_{c\mathbf{k}} (\mathbf{r}_\mu) u_{v\mathbf{k}} (\mathbf{r}_\mu) \tilde V_{\mu \nu} u^*_{c'\mathbf{k}} (\mathbf{r}_\nu) u_{v'\mathbf{k}} (\mathbf{r}_\nu), $$with

$$ \tilde V_{\mu \nu} = \int_{\Omega^l\times \Omega^l}drdr' \zeta_\mu^{*V}(\mathbf{r}) V(\mathbf{r},\mathbf{r'}) \zeta_\nu^{V}(\mathbf{r'}), $$and for $W$:

$$ W_{vc\mathbf{k}, v'c'\mathbf{k}'} \approx \frac{1}{N_k^2} \sum_{\mu=1}^{N_{\mu}^{W_c}} \sum_{\nu=1}^{N_{\mu}^{W_v}} u^*_{c\mathbf{k}} (\mathbf{r}_\mu) u_{c'\mathbf{k'}} (\mathbf{r}_\mu) \tilde W_{\mu\nu, \mathbf{k-k'}} u^*_{v'\mathbf{k'}} (\mathbf{r}_\nu) u_{v\mathbf{k}} (\mathbf{r}_\nu), $$with

$$ \tilde W_{\mu\nu, \mathbf{k-k'}} = \int_{\Omega^l\times \Omega^l}drdr' \zeta_\mu^{*W_c}(\mathbf{r}) W_{\mathbf{k}-\mathbf{k'}}(\mathbf{r},\mathbf{r'}) \zeta_\nu^{W_v}(\mathbf{r'}). $$The first thing to note is that this drastically reduces the number of integrals to solve from $\left( N_k N_v N_c \right)^2$ to $\left( N_\mu^V \right)^2$ and $N_{\mu}^{W_v}N_{\mu}^{W_c}$ for $V$ and $W$, respectively, given that we can choose:

$$ N_\mu^V \ll N_k N_v N_c \\ N_{\mu}^{W_v} \ll N_k^2N_v^2 \\ N_{\mu}^{W_c} \ll N_k^2N_c^2, $$which is always true.

Secondly note that this approach does not reduce the number of matrix elements. To utilize the deconstructed matrix, we employ the Lanczos algorithm for diagonalization. The Lanczos algorithm constructs an approximation to the lowest eigenpairs of a matrix through iterative multiplication by a vector. Consequently, the computational complexity of diagonalization is scaled down to that of matrix-vector multiplication. As our focus is solely on the lowest eigenvalues, we do not lose any relevant information by not computing the entire solution. In the decomposed form, as illustrated above, we can efficiently apply the interaction kernels to a vector $X \in \mathbb{C}^{N_vN_cN_k}$:

For the matrix $V$, we derive:

$$ [V \cdot X]_{vck} = \frac{1} {N_k} \sum_{k'} \sum_{v'} \sum_{c'} \sum_{\mu=1}^{N_\mu^V} \sum_{\nu=1}^{N_\mu^V} u_{ck}^*(\mathbf{r}_\mu) u_{vk}(\mathbf{r}_\mu) \tilde{V}_{\mu\nu} u^*_{v'k'}(\mathbf{r}_\nu) u_{c'k'}(\mathbf{r}_\nu) \cdot X_{k' v' c'}. $$Rearranging the summations reveals an efficient evaluation of the matrix-vector product:

$$ [V \cdot X]_{vck} = \frac{1} {N_k} \sum_{\mu=1}^{N_\mu^V} u^*_{ck}(\mathbf{r}_\mu) u_{vk}(\mathbf{r}_\mu) \left\{ \sum_{\nu=1}^{N_\mu^V} \tilde V_{\mu\nu} \left[ \sum_{k'} \left( \sum_{c'} u_{c'k'}(\mathbf{r}_\nu) \left[ \sum_{v'} u^*_{v'k'}(\mathbf{r}_\nu) \cdot X_{k' v' c'} \right] \right) \right] \right\}, $$which scales as $\mathcal{O}(N_\mu^V N_v N_c N_k + N_\mu N_c N_k)$, thus linear with respect to the number of $\mathbf{k}$-points and quadratic with the number of electrons ($\mathcal{O}(N_e) \approx \mathcal{O}(N_v) \approx \mathcal{O}(N_c)$).

For the matrix $W$, the expression becomes:

$$ [W_A\cdot X]_{vck} = \frac{1}{N_k} \sum_{k'} \sum_{v'} \sum_{c'} \sum_{\nu=1}^{N_\mu^{W_v}} \sum_{\mu=1}^{N_\mu^{W_c}} u^*_{v'k'}(\mathbf r_\nu^{W_v}) u_{vk}(\mathbf r_\nu^{W_v}) \tilde{W}_{\mu\nu, \mathbf{k-k'}} u^*_{ck}(\mathbf r_\mu^{W_c}) u_{c'k'}(\mathbf r_\mu^{W_c}) \cdot X_{v'c'k'}. $$Once again, reordering the summations yields an efficient matrix-vector product evaluation:

$$ [W_A\cdot X]_{vck} = \frac{1} {N_k} \sum_{\nu=1}^{N_\mu^{W_v}} u_{vk}(\mathbf r_\nu^{W_v}) \left\{ \sum_{\mu=1}^{N_\mu^{W_c}} u^*_{ck}(\mathbf r_\mu^{W_c}) \left[ \sum_{\mathbf{k'}} \tilde{W}_{\mu\nu, \mathbf{k-k'}} \left( \sum_{c'} u_{c'k'}(\mathbf r_\mu^{W_c}) \left[ \sum_{v'} \cdot u^*_{\mathbf{k'}j_o}(\mathbf{\hat{r}}_\nu^W) X(\mathbf{k'} j_o j_u) \right] \right) \right] \right\}, $$which scales as $\mathcal{O}(N_\mu^{W_v}N_v N_c N_k + N_\mu^{W_v}N_\mu^{W_c}(N_c N_k + N_k \log N_k))$. The term $\mathcal{O}(N_k \log N_k)$ arises from the sum over $k'$:

$$ \sum_{k'} W_{k-k'}A_{k'}, $$which is a discrete convolution that can be efficiently calculated using **fast Fourier transformations**.

As an example, you will calculate the absorption spectrum for diamond on a $4 \times 4 \times 4$ $\mathbf k$-grid. Please note that the parameters are not converged. A converged calculation would require too many resources for a tutorial calculation.

As for all other tutorials, you can choose between executing all steps manually from your command line, or to simply execute the python cells in this Jupyter notebook.

In [1]:

```
%%bash
# Create working directory
mkdir -p run_fastBSE_diamond
```

First, calculate the common properties for both the **direct** and the **fastBSE** algorithm. These are the **electronic eigen energies** $\epsilon_{ik}$, the corresponding **momentum matrix elements** and the **screened coulomb interaction**. In this example we use the electronic properties from a **DFT** groundstate calculation. As for the **direct** algorithm, you can also use a **GW** calculations as starting point for the **fastBSE** algorithm. The following **input.xml** tells exciting to calculate what we need:

```
<input>
<title>Diamond: Groundstate and Screening</title>
<structure speciespath="$EXCITINGROOT/species">
<crystal scale="6.74632">
<basevect> 0.5 0.5 0.0 </basevect>
<basevect> 0.5 0.0 0.5 </basevect>
<basevect> 0.0 0.5 0.5 </basevect>
</crystal>
<species speciesfile="C.xml" rmt="1.25">
<atom coord="0.00 0.00 0.00" />
<atom coord="0.25 0.25 0.25" />
</species>
</structure>
<groundstate
do="fromscratch"
ngridk="13 13 13"
xctype="GGA_PBE_SOL"
rgkmax="8.0"
gmaxvr="18"
nempty="20">
</groundstate>
<xs
xstype="BSE"
ngridk="4 4 4"
ngridq="4 4 4"
vkloff="0.05 0.15 0.25"
nempty="50"
gqmax="3.0"
broad="0.005"
tappinfo="true"
tevout="true">
<qpointset>
<qpoint>0.0 0.0 0.0</qpoint>
</qpointset>
<energywindow
intv="0.0 1.5"
points="1500" />
<screening
screentype="full"
nempty="100"/>
<BSE
bsetype="singlet"
nstlbse="1 4 1 20"
solver="direct"/>
<plan>
<doonly task="scrgeneigvec"/>
<doonly task="scrwritepmat"/>
<doonly task="screen"/>
<doonly task="xsgeneigvec"/>
<doonly task="writepmatxs"/>
</plan>
</xs>
</input>
```

In the `xs`

element we define the general proerties of the **BSE** calculation. We can leave most of this as it if in the following, all important edits are explained.

Note that in the `plan`

element we define a workflow that only calculates what we need for solving the **BSE** but not solving it.

`solver="direct"`

inside the `xs`

element tells exciting to use the **direct** solver workflow. This attribute is only important if you have not defined a costum workflow as we did in the `plan`

element.

In order to run ** exciting** from the terminal, you need to execute the following commands.

In [ ]:

```
%%bash
cd run_fastBSE_diamond
# replace the string $EXCITINGROOT
python3 -m excitingscripts.setup.excitingroot
# execute exciting
python3 -m excitingscripts.execute.single
cd ..
```

Now calculate the absorption spectrum by solving the **BSE** **directly**. Therefore edit the **input.xml** as follows:

```
<input>
<title>Diamond: Direct BSE</title>
<structure speciespath="$EXCITINGROOT/species">
<crystal scale="6.74632">
<basevect> 0.5 0.5 0.0 </basevect>
<basevect> 0.5 0.0 0.5 </basevect>
<basevect> 0.0 0.5 0.5 </basevect>
</crystal>
<species speciesfile="C.xml" rmt="1.25">
<atom coord="0.00 0.00 0.00" />
<atom coord="0.25 0.25 0.25" />
</species>
</structure>
<groundstate
do="skip"
ngridk="13 13 13"
xctype="GGA_PBE_SOL"
rgkmax="8.0"
gmaxvr="18"
nempty="20">
</groundstate>
<xs
xstype="BSE"
ngridk="4 4 4"
ngridq="4 4 4"
vkloff="0.05 0.15 0.25"
nempty="50"
gqmax="3.0"
broad="0.005"
tappinfo="true"
tevout="true">
<qpointset>
<qpoint>0.0 0.0 0.0</qpoint>
</qpointset>
<energywindow
intv="0.0 1.5"
points="1500" />
<screening
screentype="full"
nempty="100"/>
<BSE
bsetype="singlet"
nstlbse="1 4 1 20"
solver="direct"/>
<plan>
<doonly task="scrcoulint"/>
<doonly task="exccoulint"/>
<doonly task="bse"/>
</plan>
</xs>
</input>
```

`do="skip"`

inside the `groundstate`

element. We do not need to calculate the groundstate again.

The `plan`

element defines now a workflow to setup the interaction kernels and solve the **BSE**.

In order to run ** exciting** from the terminal, you need to execute the following commands.

In [ ]:

```
%%bash
cd run_fastBSE_diamond
# replace the string $EXCITINGROOT
python3 -m excitingscripts.setup.excitingroot
# execute exciting
python3 -m excitingscripts.execute.single
cd ..
```

Now we will calculate the same spectrum but use the **fastBSE** algorithm. Therefore edit the **input.xml** as follows:

```
<input>
<title>Diamond: fastBSE</title>
<structure speciespath="$EXCITINGROOT/species">
<crystal scale="6.74632">
<basevect> 0.5 0.5 0.0 </basevect>
<basevect> 0.5 0.0 0.5 </basevect>
<basevect> 0.0 0.5 0.5 </basevect>
</crystal>
<species speciesfile="C.xml" rmt="1.25">
<atom coord="0.00 0.00 0.00" />
<atom coord="0.25 0.25 0.25" />
</species>
</structure>
<groundstate
do="skip"
ngridk="13 13 13"
xctype="GGA_PBE_SOL"
rgkmax="8.0"
gmaxvr="18"
nempty="20">
</groundstate>
<xs
xstype="BSE"
ngridk="4 4 4"
ngridq="4 4 4"
vkloff="0.05 0.15 0.25"
nempty="50"
gqmax="3.0"
broad="0.005"
tappinfo="true"
tevout="true">
<qpointset>
<qpoint>0.0 0.0 0.0</qpoint>
</qpointset>
<energywindow
intv="0.0 2.0"
points="1500" />
<screening
screentype="full"
nempty="100"/>
<BSE
bsetype="singlet"
nstlbse="1 4 1 20"
solver="fastBSE"/>
<fastBSE
ngridr="20 20 20"
nisdf="80 200 200"
nlanczos="200"/>
<plan>
<doonly task="write_screened_coulomb"/>
<doonly task="fastBSE_groundstate_properties"/>
<doonly task="fastBSE_isdf_cvt"/>
<doonly task="fastBSE_main"/>
<doonly task="fastBSE_human_readable_output"/>
</plan>
</xs>
</input>
```

The `plan`

element defines a workflow with the remaining tasks to calculate the absorption spectrum with the **fastBSE** algorithm. Without it, exciting will run a workflow, that automatically calculates all necessary properties.

`solver="fastBSE"`

inside the `xs`

element tells exciting to use the **fastBSE** solver. Since we use here a custom plan it has no effect.

The `fastBSE`

element in the `xs`

element defines the following parameters for fast BSE.

`ngridr="20 20 20"`

inside the`xs`

defines the real space grid on which $u_{ik}(\mathbf r)$ is calculated.`nisdf="80 200 200"`

inside the`xs`

defines the number of interpolation points for the**ISDF**of the $u_{vk}(\mathbf r)u_{ck}(\mathbf r)$ pairs in $V$ and $u_{vk}(\mathbf r)u_{v'k'}(\mathbf r)$ and $u_{ck}(\mathbf r)u_{c'k'}(\mathbf r)$ in $W$ respectively.`nlanczos="200"`

inside the`xs`

defines the maximum number of Lanczos iterations to be calculated.

As you can see, the properties of the BSE calculation do not need to be edited or copied. You only need to set `solver="fastBSE"`

and define the properties for the **fastBSE** algorithm.

In order to run ** exciting** from the terminal, you need to execute the following commands.

In [7]:

```
%%bash
cd run_fastBSE_diamond
# replace the string $EXCITINGROOT
python3 -m excitingscripts.setup.excitingroot
# execute exciting
python3 -m excitingscripts.execute.single
cd ..
```

By comparing the runtime, you already see that the **fastBSE** algorithm is significantly faster.

To convince yourself that the absorption spectra calculated with both algorithms are equivalent plot the absorption spectrum by running the following commands. The image is saved as **PLOT.png**.

In [8]:

```
%%bash
cd run_fastBSE_diamond
cp EPSILON/EPSILON_BSE-singlet-TDA-BAR_SCR-full_OC11.OUT direct
cp fastBSE_absorption_spectrum.out fastBSE
python3 -m excitingscripts.plot.files -t 'Absorption Spectrum: Diamond' -f direct fastBSE -cx 1 -cy 3 -lx '$\omega$ [eV]' -ly 'Im $\epsilon_M$' -x 0 40
cd ..
```