Highlights from "Limitations of variational quantum algorithms: a quantum optimal transport approach"
\[ \newcommand{\one}{\mathbb{1}} \newcommand{\Id}{\mathbb{I}} \newcommand\supp{\mathrm{supp}} \newcommand\tr{\mathrm{tr}} \]
1. Definitions
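The Lipschitz constant of a self-adjoint operator \(H\) acting on a set \(V\) of qudits is defined as follows, where the minimum runs over the operators \(H_{\setminus v}\) acting on all qudits except \(v\):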
\begin{equation} \|H\|_L = 2 \max_{v\in V} \min_{H_{\setminus v}} \|H - H_{\setminus v}\otimes \Id_v\|_{\infty} \end{equation}The sandwiched Renyi divergence of order \(\alpha \in (1,+\infty)\) is defined for two quantum states \(\rho, \sigma\) with \(\supp \rho \subset \supp \sigma\) as
\begin{equation} D_{\alpha} (\rho \| \sigma) = \frac{1}{\alpha-1} \log \tr \left[\left(\sigma^{\frac{1-\alpha}{2\alpha}} \rho \sigma^{\frac{1-\alpha}{2\alpha}}\right )^\alpha\right] \end{equation}A state \(\sigma\) on the qudits of \(V\) satisfies a Gaussian concentration inequality with parameter \(c > 0\) if there is a constant \(K > 0\) such that for any \(a > 0\) and any observable \(O\):
\begin{equation} \Pr_{\sigma} (|O - \langle O\rangle_\sigma \Id| > a |V|) \leq K \exp \left( - \frac{c a^2 |V|}{\| \sigma^{-1/2} O \sigma^{1/2}\|_L^2}\right) \end{equation}Here \(|A| = \sqrt {A^\dagger A}\), and \(\Pr_\sigma(|O-\langle O\rangle_\sigma \Id| > a|V|)\) denotes \(\tr(E\sigma)\), where \(E\) is the projector onto the eigenspaces of \(|O-\langle O\rangle_{\sigma}\Id|\) with eigenvalues larger than \(a|V|\), i.e. onto the positive part of \(|O-\langle O\rangle_{\sigma}\Id|-a|V|\Id\); the inequality requires \(\tr(E\sigma)\) to be bounded by its right-hand side. Note that when \(\sigma\) and \(O\) commute, \(\sigma^{-1/2} O \sigma^{1/2} = O\), so \(\| \sigma^{-1/2} O \sigma^{1/2}\|_L^2\) reduces to \(\| O\|_L^2\).
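To make these definitions concrete, here is a small numerical sketch in Python with NumPy; it is an illustration under stated assumptions, not code from the paper. It assumes qubits (\(d = 2\)), dense matrices, a full-rank \(\sigma\) (so that its negative powers exist) and natural logarithms, and all function names are hypothetical. Because the minimization over \(H_{\setminus v}\) has no closed form in general, `lipschitz_upper_bound` only evaluates the candidate \(H_{\setminus v} = \tr_v(H)/2\), which yields an upper bound on \(\|H\|_L\). `tail_probability` computes \(\tr(E\sigma)\) with \(E\) the projector onto the eigenvalues of \(|O - \langle O\rangle_\sigma \Id|\) larger than \(a|V|\).

```python
import numpy as np
from functools import reduce

# Single-qubit Pauli basis I, X, Y, Z (assumption: qubits, d = 2).
PAULIS = [np.eye(2), np.array([[0, 1], [1, 0]]),
          np.array([[0, -1j], [1j, 0]]), np.diag([1.0, -1.0])]


def mpow(A, p):
    """A**p for a Hermitian PSD matrix, via its eigendecomposition."""
    w, V = np.linalg.eigh((A + A.conj().T) / 2)
    w = np.clip(w, 0.0, None)  # discard tiny negative round-off
    return (V * w**p) @ V.conj().T


def on_site(P, v, n):
    """Embed a single-qubit operator P on site v of an n-qubit register."""
    ops = [np.eye(2)] * n
    ops[v] = P
    return reduce(np.kron, ops)


def lipschitz_upper_bound(H, n):
    """Upper bound on ||H||_L from the candidate H_{minus v} = tr_v(H)/2.

    The Pauli twirl (1/4) * sum_P P_v H P_v equals (tr_v(H)/2) (x) I_v,
    so the minimum over H_{minus v} is at most ||H - twirl||_inf.
    """
    worst = 0.0
    for v in range(n):
        twirl = sum(on_site(P, v, n) @ H @ on_site(P, v, n) for P in PAULIS) / 4
        worst = max(worst, np.linalg.norm(H - twirl, 2))  # spectral norm
    return 2 * worst


def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi divergence D_alpha(rho||sigma), alpha > 1, natural log.

    Assumes sigma is full rank, so that its negative power exists.
    """
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    return np.log(np.trace(mpow(s @ rho @ s, alpha)).real) / (alpha - 1)


def tail_probability(O, sigma, a, n):
    """tr(E sigma), with E the projector onto eigenvalues of |O - <O> I| > a*n."""
    mean = np.trace(sigma @ O).real
    w, V = np.linalg.eigh(O - mean * np.eye(len(O)))
    cols = V[:, np.abs(w) > a * n]  # eigenvectors spanning the tail event
    return np.trace(cols.conj().T @ sigma @ cols).real
```

For example, for \(H = Z_1 + Z_2\) on two qubits the bound evaluates to 2, which here coincides with \(\|H\|_L\): removing either site leaves a residual single-qubit \(Z\) of norm 1 that no operator on the other site can cancel.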
2. Transferring inequalities
Let \(\rho, \sigma\) be two quantum states on \(\mathcal H_V\). Then for any POVM element \(0 \leq E \leq \Id\) and \(\alpha >1\),
\begin{equation} \tr[E\rho] \leq \exp\left( \frac{\alpha -1}{\alpha} (D_\alpha(\rho\|\sigma) + \log(\tr[E\sigma]))\right) \end{equation}Using the cyclicity of the trace, we have:
\begin{align} \tr[E\rho] & = \tr[\sigma^{-\frac{1-\alpha}{2\alpha}} E \sigma^{-\frac{1-\alpha}{2\alpha}} \times \sigma^{\frac{1-\alpha}{2\alpha}} \rho \sigma^{\frac{1-\alpha}{2\alpha}}] \\ & \leq \left\{\tr \left[ \left( \sigma^{-\frac{1-\alpha}{2\alpha}} E \sigma^{-\frac{1-\alpha}{2\alpha}}\right)^\beta \right ]\right\}^{\frac{1}{\beta}} \times \left\{\tr \left[ \left( \sigma^{\frac{1-\alpha}{2\alpha}} \rho \sigma^{\frac{1-\alpha}{2\alpha}} \right)^\alpha \right ]\right\}^{\frac{1}{\alpha}} \end{align}where the last line follows from the Hölder inequality with exponents \(\alpha\) and \(\beta\) satisfying \(\frac{1}{\alpha} + \frac{1}{\beta} = 1\), both operators being positive semidefinite.
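Using the Araki-Lieb-Thirring inequality with \(r = \beta \geq 1\) and \(q = 1\) (the reversed case), followed by \(E^\beta \leq E\), which holds because \(0 \leq E \leq \Id\) and \(\beta \geq 1\), we obtain: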
\begin{align} \tr \left[ \left( \sigma^{-\frac{1-\alpha}{2\alpha}} E \sigma^{-\frac{1-\alpha}{2\alpha}}\right)^\beta \right] & \leq \tr \left( \sigma^{-\frac{(1-\alpha)\beta}{2\alpha}} E^\beta \sigma^{-\frac{(1-\alpha)\beta}{2\alpha}}\right) \\ & \leq \tr \left( \sigma^{-\frac{(1-\alpha)\beta}{2\alpha}} E \sigma^{-\frac{(1-\alpha)\beta}{2\alpha}}\right). \end{align}Since \(\beta = \frac{\alpha}{\alpha -1}\), the exponent is \(-\frac{(1-\alpha)\beta}{2\alpha} = \frac{1}{2}\), so that
\begin{equation} \tr \left( \sigma^{-\frac{(1-\alpha)\beta}{2\alpha}} E \sigma^{-\frac{(1-\alpha)\beta}{2\alpha}}\right)=\tr(E\sigma). \end{equation}Additionally, note that
\begin{equation} \left \{ \tr \left[ \left( \sigma^{\frac{1-\alpha}{2\alpha}} \rho \sigma^{\frac{1-\alpha}{2\alpha}} \right)^\alpha \right ]\right\}^{\frac{1}{\alpha}} = \exp (\frac{\alpha -1}{\alpha} D_\alpha(\rho\|\sigma)). \end{equation}Combining the two gives:
\begin{align} \tr[E\rho] & \leq (\tr[E\sigma])^{\frac{\alpha-1}{\alpha}} \times \exp\left( \frac{\alpha -1}{\alpha} D_\alpha(\rho\|\sigma)\right) \\ & = \exp\left( \frac{\alpha -1}{\alpha} \left(D_\alpha(\rho\|\sigma) + \log (\tr[E\sigma])\right)\right). \end{align}The theorem thus bounds the probability of observing \(E\) on \(\rho\) by the power \((\tr[E\sigma])^{\frac{\alpha-1}{\alpha}}\) of the probability of observing \(E\) on \(\sigma\), multiplied by a factor that depends only on \(D_\alpha(\rho\|\sigma)\), a measure of how distinguishable \(\rho\) is from \(\sigma\) (though not a metric, since it is not symmetric). The bound is non-trivial, i.e. smaller than 1, exactly when \(D_\alpha(\rho\|\sigma) < -\log \tr[E\sigma]\); it is therefore useful when \(E\) is very unlikely under \(\sigma\) and \(D_\alpha(\rho\|\sigma)\) is comparatively small.
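The transfer inequality can be probed numerically on random instances. The sketch below (Python/NumPy, hypothetical helper names, natural logarithms, random full-rank states) samples \(\rho\), \(\sigma\), a POVM element \(0 \leq E \leq \Id\) and \(\alpha > 1\), and returns the largest observed ratio between \(\tr[E\rho]\) and the right-hand side of the bound; a value above 1 would signal a violation. Random sampling only illustrates the statement, it does not prove it.

```python
import numpy as np


def mpow(A, p):
    """A**p for a Hermitian PSD matrix, via its eigendecomposition."""
    w, V = np.linalg.eigh((A + A.conj().T) / 2)
    return (V * np.clip(w, 0.0, None) ** p) @ V.conj().T


def sandwiched_renyi(rho, sigma, alpha):
    """D_alpha(rho||sigma) for alpha > 1, full-rank sigma, natural log."""
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    return np.log(np.trace(mpow(s @ rho @ s, alpha)).real) / (alpha - 1)


def random_state(d, rng):
    """Random full-rank density matrix."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real


def random_povm_element(d, rng):
    """Random E with 0 <= E <= I: random eigenbasis, eigenvalues in [0, 1]."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    _, V = np.linalg.eigh(G + G.conj().T)
    return (V * rng.uniform(0.0, 1.0, size=d)) @ V.conj().T


def worst_ratio(d=4, trials=200, seed=0):
    """Largest tr[E rho] / bound over random instances; <= 1 if the bound holds."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        rho, sigma = random_state(d, rng), random_state(d, rng)
        E = random_povm_element(d, rng)
        alpha = rng.uniform(1.05, 5.0)
        lhs = np.trace(E @ rho).real
        log_p_sigma = np.log(np.trace(E @ sigma).real)
        rhs = np.exp((alpha - 1) / alpha * (sandwiched_renyi(rho, sigma, alpha) + log_p_sigma))
        worst = max(worst, lhs / rhs)
    return worst


print(worst_ratio())
```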
If \(\sigma\) satisfies a Gaussian concentration inequality
\begin{equation} \Pr_{\sigma} (|O - \langle O\rangle_\sigma \Id| > a |V|) \leq K \exp \left( - \frac{c a^2 |V|}{\| \sigma^{-1/2} O \sigma^{1/2}\|_L^2}\right) \end{equation}for some constants \(c, K > 0\), then for any \(\alpha > 1\) and \(a > 0\):
\begin{equation} \Pr_{\rho} (|O - \langle O\rangle_\sigma \Id| > a |V|) \leq \exp \left(\frac{\alpha -1}{\alpha}\left(D_\alpha(\rho\|\sigma) - \frac{c a^2 |V|}{\| \sigma^{-1/2} O \sigma^{1/2}\|_L^2} + \log(K)\right)\right) \end{equation}Consequently, for an input state \(\rho\) evolved by a noisy circuit into \(\mathcal{N}(\rho)\), and a state \(\sigma\) satisfying a Gaussian concentration inequality, whenever there are values of \(\alpha > 1\) and \(a > 0\) such that
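\begin{equation} \frac{D_{\alpha}(\mathcal N(\rho) \| \sigma)}{|V|} < \frac{c a^2}{\| \sigma^{-1/2} O \sigma^{1/2}\|_L^2} - \frac{\log(K)}{|V|}, \end{equation}then the bound above on the probability of observing an outcome of \(O\) outside the interval \(\langle O \rangle_{\sigma} \pm a|V|\) is smaller than 1. If, moreover, the right-hand side exceeds the left-hand side by at least some \(\varepsilon > 0\) independent of \(|V|\), this probability is at most \(\exp\left(-\frac{\alpha-1}{\alpha}\varepsilon|V|\right)\), i.e. it decreases exponentially with \(|V|\).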
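The condition above is arithmetic on the exponent of the transferred bound. The sketch below (Python, hypothetical function name `tail_bound`) evaluates the right-hand side of that bound; the parameter values in the usage loop are hypothetical and only show that, when \(D_\alpha/|V|\) stays a fixed amount below \(c a^2/\|\sigma^{-1/2} O \sigma^{1/2}\|_L^2\), the bound decays exponentially in \(|V|\).

```python
import math


def tail_bound(d_alpha, alpha, c, a, n_sites, lip, K):
    """exp((alpha-1)/alpha * (D_alpha - c a^2 |V| / lip^2 + log K)).

    lip stands for the Lipschitz constant ||sigma^{-1/2} O sigma^{1/2}||_L.
    """
    exponent = d_alpha - c * a**2 * n_sites / lip**2 + math.log(K)
    return math.exp((alpha - 1) / alpha * exponent)


# Hypothetical values: D_alpha grows like 0.1 |V|, below c a^2 / lip^2 = 0.25.
for n in (10, 100, 1000):
    print(n, tail_bound(0.1 * n, alpha=2.0, c=1.0, a=0.5, n_sites=n, lip=1.0, K=2.0))
```

With these values the exponent is \(\frac{1}{2}(-0.15\,|V| + \log 2)\), so each tenfold increase of \(|V|\) shrinks the bound by many orders of magnitude.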
3. Bounding \(D_{\alpha}(\mathcal N(\rho) \| \sigma)\)
Such bounds can be obtained for noisy circuits in which each layer \(\Phi_i\) is preceded by a noise channel \(\mathcal N\) whose fixed point is \(\sigma\) and which satisfies, for some \(q \in (0,1]\), a strong data processing inequality:
\begin{equation} D_\alpha(\mathcal N(\rho) \| \sigma)\leq (1-q) D_\alpha (\rho \|\sigma), \ \forall \rho. \end{equation}In such cases we obtain the following bound.
Let \(\mathcal N\) be a quantum channel with a unique fixed point \(\sigma\) that satisfies a strong data processing inequality with constant \(q\) for some \(\alpha > 1\). Then for any channels \(\{\Phi_i\}_{i\leq l}\) we have:
\begin{equation} D_\alpha\left( \mathcal P (\rho) \| \sigma \right) \leq (1-q)^l D_\alpha(\rho\|\sigma) + \sum_{i\leq l} (1-q)^{l-i} D_\infty (\Phi_i(\sigma) \| \sigma) \end{equation}where \(D_\infty(\rho \| \sigma) = \log \| \sigma^{-\frac{1}{2}} \rho \sigma^{-\frac{1}{2}}\|_\infty\) is the max-relative entropy, and \(\mathcal P = \Phi_l \circ \mathcal N \circ \cdots \circ \Phi_1 \circ \mathcal N\).
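We proceed by induction on \(l\). For \(l=1\), we use the data-processed triangle inequality \(D_\alpha(\Phi(\omega)\|\sigma) \leq D_\alpha(\omega\|\sigma) + D_\infty(\Phi(\sigma)\|\sigma)\), valid for any state \(\omega\) and channel \(\Phi\), followed by the strong data processing inequality: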
\begin{equation} D_\alpha(\Phi_1 \circ \mathcal N (\rho) \|\sigma) \leq D_\alpha(\mathcal N (\rho) \| \sigma) + D_\infty (\Phi_1 (\sigma) \| \sigma) \leq (1-q) D_\alpha(\rho \| \sigma) + D_\infty (\Phi_1 (\sigma) \| \sigma). \end{equation}The induction step is identical: assuming the bound holds for \(l\), apply the data-processed triangle inequality to \(\Phi_{l+1}\), then the strong data processing inequality to \(\mathcal N\), and finally the induction hypothesis, which multiplies every term of the previous bound by \(1-q\).
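Unrolling the induction gives a simple recursion, \(b_0 = D_\alpha(\rho\|\sigma)\) and \(b_i = (1-q)\, b_{i-1} + D_\infty(\Phi_i(\sigma)\|\sigma)\), whose last term \(b_l\) is the right-hand side of the theorem. The sketch below (Python, hypothetical function names, with the divergences passed in as plain numbers) computes both the recursion and the closed form so that they can be compared.

```python
def bound_by_recursion(d0, q, d_inf):
    """Apply b <- (1 - q) * b + D_inf(Phi_i(sigma)||sigma) once per layer."""
    b = d0
    for d in d_inf:
        b = (1 - q) * b + d
    return b


def bound_closed_form(d0, q, d_inf):
    """(1-q)^l D_alpha(rho||sigma) + sum_i (1-q)^(l-i) D_inf(Phi_i(sigma)||sigma)."""
    l = len(d_inf)
    tail = sum((1 - q) ** (l - i) * d for i, d in enumerate(d_inf, start=1))
    return (1 - q) ** l * d0 + tail
```

With a constant per-layer contribution \(D_\infty(\Phi_i(\sigma)\|\sigma) = \delta\), the geometric sum equals \(\delta\,(1-(1-q)^l)/q\), so for deep circuits the bound forgets \(D_\alpha(\rho\|\sigma)\) and saturates at \(\delta/q\).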
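Note that if the noise \(\mathcal N\) is unital, its (unique) fixed point is the maximally mixed state \(\sigma = \Id/d^{|V|}\), where \(d\) is the local dimension. If the layers \(\Phi_i\) are unital as well (for instance unitary layers), then \(\Phi_i(\sigma) = \sigma\), so every term \(D_\infty(\Phi_i(\sigma)\|\sigma)\) vanishes and \(D_\alpha\left( \mathcal P (\rho) \| \sigma \right) \leq (1-q)^l D_\alpha(\rho\|\sigma)\), which converges to 0 exponentially fast in the depth \(l\).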
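This can be illustrated numerically in a case where the fixed point is explicit. The sketch below is a toy model chosen for illustration, not the paper's setting: it takes a global depolarizing channel \(\mathcal N(\rho) = (1-p)\rho + p\,\Id/d\) (unital, with a hypothetical strength \(p\)) as the noise and Haar-random unitaries as the layers \(\Phi_i\), so that \(\Phi_i(\sigma) = \sigma\) for \(\sigma = \Id/d\). It tracks \(D_\alpha(\mathcal P(\rho)\|\sigma)\) layer by layer, starting from a pure state, for which \(D_\alpha(\rho\|\Id/d) = \log d\). It does not estimate the strong data processing constant \(q\); all function names are hypothetical.

```python
import numpy as np


def mpow(A, p):
    """A**p for a Hermitian PSD matrix, via its eigendecomposition."""
    w, V = np.linalg.eigh((A + A.conj().T) / 2)
    return (V * np.clip(w, 0.0, None) ** p) @ V.conj().T


def sandwiched_renyi(rho, sigma, alpha):
    """D_alpha(rho||sigma) for alpha > 1, full-rank sigma, natural log."""
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    return np.log(np.trace(mpow(s @ rho @ s, alpha)).real) / (alpha - 1)


def haar_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a Ginibre matrix."""
    Z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    phases = np.diag(R) / np.abs(np.diag(R))
    return Q * phases  # rescale column j by phases[j]


def depolarize(rho, p):
    """Global depolarizing channel: unital, with fixed point I/d."""
    d = len(rho)
    return (1 - p) * rho + p * np.trace(rho).real * np.eye(d) / d


def divergence_along_circuit(n_qubits=3, depth=20, p=0.1, alpha=2.0, seed=1):
    """D_alpha(P_l(rho)||I/d) for l = 0..depth, P_l = Phi_l o N o ... o Phi_1 o N."""
    rng = np.random.default_rng(seed)
    d = 2 ** n_qubits
    sigma = np.eye(d) / d
    rho = np.zeros((d, d), dtype=complex)
    rho[0, 0] = 1.0  # pure input state |0...0>
    values = [sandwiched_renyi(rho, sigma, alpha)]
    for _ in range(depth):
        U = haar_unitary(d, rng)
        rho = U @ depolarize(rho, p) @ U.conj().T  # noise first, then the layer
        values.append(sandwiched_renyi(rho, sigma, alpha))
    return values
```

Because this depolarizing channel commutes with unitary conjugation, the state after \(l\) layers is \((1-p)^l U\rho U^\dagger + (1-(1-p)^l)\Id/d\) for some unitary \(U\), so the divergence can also be computed in closed form; it decreases to 0 as \(l\) grows.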
4. Additional notes
4.1. Renyi entropies
In the limit \(\alpha \rightarrow 1\), the sandwiched Renyi divergence defined above recovers the usual relative entropy:
\begin{equation} D_1 (\rho \| \sigma) = \tr(\rho (\log \rho - \log \sigma)) \end{equation}This is expected, since sandwiched Renyi divergences were introduced as a generalization of the relative entropy. Similarly, the Renyi entropies \(S_\alpha(\rho) = \frac{1}{1-\alpha} \log \frac{\tr \rho^\alpha}{\tr \rho}\) generalize the von Neumann entropy, which they recover in the limit \(\alpha \to 1\), and they share several of its important properties:
- Continuity
- Unitary invariance
- Normalization
- Additivity
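Both the limit and the additivity property lend themselves to a quick numerical check. The sketch below (Python/NumPy, hypothetical helper names, natural logarithms, random full-rank states) compares the sandwiched divergence at \(\alpha\) close to 1 with the relative entropy \(D_1\), and checks additivity of the Renyi entropy on a tensor product \(\rho \otimes \tau\).

```python
import numpy as np


def mfun(A, f):
    """Apply the scalar function f to the eigenvalues of a Hermitian matrix A."""
    w, V = np.linalg.eigh((A + A.conj().T) / 2)
    return (V * f(w)) @ V.conj().T


def sandwiched_renyi(rho, sigma, alpha):
    """D_alpha(rho||sigma) for alpha > 1; assumes full-rank states."""
    s = mfun(sigma, lambda w: w ** ((1 - alpha) / (2 * alpha)))
    return np.log(np.trace(mfun(s @ rho @ s, lambda w: w ** alpha)).real) / (alpha - 1)


def relative_entropy(rho, sigma):
    """D_1(rho||sigma) = tr[rho (log rho - log sigma)]."""
    return np.trace(rho @ (mfun(rho, np.log) - mfun(sigma, np.log))).real


def renyi_entropy(rho, alpha):
    """S_alpha(rho) = log(tr rho^alpha / tr rho) / (1 - alpha)."""
    w = np.clip(np.linalg.eigvalsh(rho), 0.0, None)
    return np.log(np.sum(w ** alpha) / np.sum(w)) / (1 - alpha)


def random_state(d, rng):
    """Random full-rank density matrix."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real


rng = np.random.default_rng(0)
rho, sigma, tau = random_state(3, rng), random_state(3, rng), random_state(2, rng)
for alpha in (2.0, 1.1, 1.001, 1 + 1e-6):
    # Gap D_alpha - D_1 shrinks as alpha approaches 1.
    print(alpha, sandwiched_renyi(rho, sigma, alpha) - relative_entropy(rho, sigma))
```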
4.2. Relative entropy and Pinsker inequality
The Pinsker inequality relates the relative entropy to the trace distance. It therefore allows one to bound the trace distance using information-theoretic arguments:
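\begin{equation} D_1(\rho\|\sigma) \geq \frac{1}{2\ln 2} \| \rho -\sigma \|_{\tr}^2 \end{equation}where \(\|\cdot\|_{\tr}\) is the trace norm and the logarithm in \(D_1\) is taken in base 2; with natural logarithms the constant becomes \(\frac{1}{2}\).
4.3. Hölder inequality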
For operators \(A\), \(B\) on \(\mathcal H\) and \(p, q \in [1, +\infty]\),
\begin{equation} \|AB\|_1 \leq \|A\|_p \|B\|_q, \quad \frac{1}{p} + \frac{1}{q} = 1 \end{equation}which, since the Schatten norms are \(\|A\|_p = [\tr(|A|^p)]^{1/p}\), reads
\begin{equation} \tr(|AB|) \leq [\tr(|A|^{p})]^{1/p} \times [\tr(|B|^{q})]^{1/q}, \quad \frac{1}{p} + \frac{1}{q} = 1. \end{equation}
4.4. Araki-Lieb-Thirring inequality
For \(A,B \geq 0\), \(q\geq 0\) and \(0\leq r \leq 1\):
\begin{equation} \tr[(A^rB^rA^r)^q] \leq\tr[(ABA)^{rq}]. \end{equation}When \(r \geq 1\) the inequality is reversed.
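The inequality and its reversal can be probed numerically. The sketch below (Python/NumPy, hypothetical helper names) draws random positive definite \(A\), \(B\) and random exponents, and checks the direction of the inequality for \(0 \leq r \leq 1\) and for \(r \geq 1\); the relative tolerance absorbs floating-point error near the equality case \(r = 1\). As with any random test, it illustrates rather than proves the statement.

```python
import numpy as np


def mpow(A, p):
    """A**p for a Hermitian PSD matrix, via its eigendecomposition."""
    w, V = np.linalg.eigh((A + A.conj().T) / 2)
    return (V * np.clip(w, 0.0, None) ** p) @ V.conj().T


def random_psd(d, rng):
    """Random positive definite matrix normalized to trace d."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    P = G @ G.conj().T
    return d * P / np.trace(P).real


def alt_sides(A, B, r, q):
    """Return (tr[(A^r B^r A^r)^q], tr[(A B A)^(r q)])."""
    Ar = mpow(A, r)
    left = np.trace(mpow(Ar @ mpow(B, r) @ Ar, q)).real
    right = np.trace(mpow(A @ B @ A, r * q)).real
    return left, right


def check_alt(trials=200, d=3, seed=0, rtol=1e-9):
    """True if no sampled instance violates either direction of the inequality."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        A, B = random_psd(d, rng), random_psd(d, rng)
        q = rng.uniform(0.1, 3.0)
        left, right = alt_sides(A, B, rng.uniform(0.0, 1.0), q)  # 0 <= r <= 1
        if left > right * (1 + rtol):
            return False
        left, right = alt_sides(A, B, rng.uniform(1.0, 3.0), q)  # r >= 1: reversed
        if left < right * (1 - rtol):
            return False
    return True
```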