Probability 3 -- Inequality and Convergence
These notes mainly follow Probability and Statistical Inference, Second Edition, by Robert Bartoszynski and Magdalena Niewiadomska-Bugaj.
Inequalities
Here we list some inequalities without proof.
If \(V\) is a random variable with \(E(V) = 0\) and \(Var(V) = 1,\) then \[ \forall t > 0, P(|V| \ge t) \le \frac{1}{t^2}. \]
(Chebyshev Inequality) If \(X\) is a random variable such that \(E(X) = \mu, Var(X) = \sigma^2 < \infty,\) then \[ \forall \varepsilon > 0, P(|X-\mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2}. \]
(Markov Inequality) If \(X\) is a random variable such that \(X \ge 0, E(X) = \mu < \infty,\) then \[ \forall t > 0, P(X > t) \le \frac{\mu}{t}. \]
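As a quick numerical sanity check of the two bounds above (a minimal sketch; the Exponential(1) sample, the sample size, and the thresholds are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # X >= 0 with E(X) = 1, Var(X) = 1

mu, var = x.mean(), x.var()

for t in [2.0, 3.0, 5.0]:
    markov_bound = mu / t                        # P(X > t) <= mu / t
    empirical_markov = (x > t).mean()
    chebyshev_bound = var / t**2                 # P(|X - mu| >= t) <= var / t^2
    empirical_cheb = (np.abs(x - mu) >= t).mean()
    print(f"t={t}: Markov {empirical_markov:.4f} <= {markov_bound:.4f}, "
          f"Chebyshev {empirical_cheb:.4f} <= {chebyshev_bound:.4f}")
```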
(Cantelli's Inequality) If \(X\) is a random variable such that \(E(X) = \mu, Var(X) = \sigma^2 \in (0, \infty),\) then \[ \forall t > 0, P(X - \mu \ge t) \le \frac{\sigma^2}{\sigma^2 + t^2}. \]
(Kolmogorov Inequality) If \(\{X_i\}_{i=1}^{\infty}\) are independent random variables with \(E(X_i) = 0\) and \(Var(X_i) = \sigma_i^2 < \infty\) for every \(i,\) and \(S_j = \sum_{i=1}^j X_i,\) then \[ \forall t > 0, P(\max_{1\le j \le n}|S_j| \ge t) \le \frac{Var(S_n)}{t^2}. \]
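A small simulation sketch can illustrate the maximal inequality (here the increments are standard normal, so \(Var(S_n) = n\); the choices of \(n\), \(t\), and the number of paths are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, n_paths = 100, 15.0, 50_000

# Independent mean-zero increments; here sigma_i = 1 for simplicity.
steps = rng.normal(size=(n_paths, n))
partial_sums = np.cumsum(steps, axis=1)      # S_1, ..., S_n for each path
max_abs = np.abs(partial_sums).max(axis=1)   # max_{1<=j<=n} |S_j|

empirical = (max_abs >= t).mean()
bound = n / t**2                             # Var(S_n) = sum of sigma_i^2 = n
print(f"P(max |S_j| >= {t}) ~ {empirical:.4f} <= {bound:.4f}")
```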
(Paley-Zygmund Inequality) If \(X\) is a random variable such that \(X \ge 0, E(X) = \mu,\) and \(0 < E(X^2) < \infty,\) then \[ \forall t \in [0, 1], P(X > t\mu) \ge (1-t)^2\frac{\mu^2}{E(X^2)}. \]
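A quick check of this lower bound, again on an Exponential(1) sample (so \(\mu = 1\) and \(E(X^2) = 2\); these are illustrative assumptions, not part of the statement):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=1_000_000)
mu, second_moment = x.mean(), (x**2).mean()

for t in [0.1, 0.5, 0.9]:
    pz_bound = (1 - t)**2 * mu**2 / second_moment   # Paley-Zygmund lower bound
    empirical = (x > t * mu).mean()
    print(f"t={t}: P(X > t*mu) ~ {empirical:.4f} >= {pz_bound:.4f}")
```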
(Azuma's Inequality) Let \(Z_n, n \ge 1\) be a martingale with \(E(Z_n) = \mu.\) Suppose that there exist non-negative constants \(\alpha_i, \beta_i, i \ge 1,\) such that \[-\alpha_i \le Z_i - Z_{i-1} \le \beta_i.\] Then for any \(n \ge 0, a > 0,\) \[\begin{aligned} (i) & P(Z_n - \mu \ge a) \le \exp\{-2a^2/\sum_{i=1}^n(\alpha_i+\beta_i)^2\} \\ (ii) & P(Z_n - \mu \le -a) \le \exp\{-2a^2/\sum_{i=1}^n(\alpha_i+\beta_i)^2\} \end{aligned}\]
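For a concrete illustration (a sketch, assuming a symmetric \(\pm 1\) random walk, which is a martingale with \(\mu = 0\) and \(\alpha_i = \beta_i = 1\), so bound (i) reduces to \(\exp\{-a^2/(2n)\}\)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, a, n_paths = 200, 30.0, 50_000

# Z_n = sum of independent +/-1 steps: a martingale with Z_0 = 0 and mu = 0.
steps = rng.choice([-1, 1], size=(n_paths, n))
z_n = steps.sum(axis=1)

empirical = (z_n >= a).mean()
azuma_bound = np.exp(-2 * a**2 / (4 * n))  # alpha_i = beta_i = 1, so (alpha_i + beta_i)^2 = 4
print(f"P(Z_n >= {a}) ~ {empirical:.5f} <= {azuma_bound:.5f}")
```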
Convergence
Here we consider a probability space \(\{\Omega, \mathcal{F}, P\}.\) The probability \(P\) is a measure defined on the measurable space \(\{\Omega, \mathcal{F}\},\) so we can directly borrow the notions of pointwise and uniform convergence from real analysis.
Pointwise Convergence (Convergence Everywhere): Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}.\) We say \(X_n\) converges to \(X\) pointwise iff \[ \forall \omega \in \Omega, \forall \varepsilon > 0, \exists n_0 \in \mathbb{Z}^+, \forall n \ge n_0, |X_n(\omega) - X(\omega)| < \varepsilon. \] We denote it as \(X_n \overset{e}{\longrightarrow} X.\) Here \(e\) means everywhere.
Uniform Convergence: Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}.\) We say \(X_n\) converges to \(X\) uniformly iff \[ \forall \varepsilon > 0, \exists n_0 \in \mathbb{Z}^+, \forall n \ge n_0, |X_n(\omega) - X(\omega)| < \varepsilon, \forall \omega \in \Omega. \] We denote it as \(X_n \overset{u}{\longrightarrow} X.\) Here \(u\) means uniform.
However, pointwise and uniform convergence are often too strong to hold. What is more likely to hold is an "almost" version, which allows an exceptional set of probability 0.
Almost Pointwise Convergence (Convergence Almost Everywhere, Almost Sure Convergence, Convergence with Probability One, Strong Convergence):
Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}.\) We say \(X_n\) converges to \(X\) almost surely iff \[ P(\lim_{n \rightarrow \infty} X_n = X) = 1, \] equivalently \[ P(\lim_{n \rightarrow \infty} X_n \neq X) = 0. \] We denote it as \(X_n \overset{a.s.}{\longrightarrow} X.\) Here \(a.s.\) means almost surely. For example, on \(\Omega = [0,1]\) with the uniform (Lebesgue) measure, \(X_n(\omega) = \omega^n\) converges to \(0\) almost surely: the convergence fails only at \(\omega = 1,\) a set of probability 0, so it is not convergence everywhere.
Almost Uniform Convergence:
Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}.\) We say \(X_n\) converges to \(X\) almost uniformly iff \[ \forall \varepsilon > 0, \exists E \in \mathcal{F}, P(E) < \varepsilon, \forall \delta > 0, \exists n_0 \in \mathbb{Z}^+, \forall n \ge n_0, |X_n(\omega) - X(\omega)| < \delta, \forall \omega \notin E, \] equivalently \[ \forall \varepsilon > 0, \exists E \in \mathcal{F}, P(E) < \varepsilon, \lim_{n \rightarrow \infty} \sup_{\omega \notin E} |X_n(\omega) - X(\omega)| = 0. \] We denote it as \(X_n \overset{a.u.}{\longrightarrow} X.\) Here \(a.u.\) means almost uniformly.
Next come three modes of convergence that are defined through the probability measure itself rather than through pointwise behavior.
Convergence in Probability (Convergence in Measure):
Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}.\) We say \(X_n\) converges to \(X\) in probability iff \[ \forall \varepsilon > 0, \lim_{n \rightarrow \infty}P(|X_n - X| > \varepsilon) = 0, \] equivalently \[ \forall \varepsilon > 0, \lim_{n \rightarrow \infty}P(|X_n - X| \le \varepsilon) = 1. \] We denote it as \(X_n \overset{P}{\longrightarrow} X.\) Here \(P\) means in probability.
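As an illustration (a sketch, taking \(X_n\) to be the sample mean of \(n\) i.i.d. Uniform(0,1) variables, which converges in probability to \(1/2\) by Chebyshev's inequality; the sample sizes and \(\varepsilon\) are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
eps, n_reps = 0.02, 2_000

# X_n = sample mean of n i.i.d. Uniform(0,1) variables; X_n -> 1/2 in probability.
for n in [10, 100, 1000, 5000]:
    means = rng.uniform(size=(n_reps, n)).mean(axis=1)
    prob = (np.abs(means - 0.5) > eps).mean()
    print(f"n={n:>5}: P(|X_n - 1/2| > {eps}) ~ {prob:.4f}")
```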
Convergence in r-th Mean:
Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}.\) We say \(X_n\) converges to \(X\) in r-th mean (for some \(r \ge 1\)) iff \[ \lim_{n \rightarrow \infty} E(|X_n - X|^r) = 0. \] We denote it as \(X_n \overset{r-m}{\longrightarrow} X.\) Here \(r\text{-}m\) means r-th mean; the case \(r = 2\) is called convergence in mean square.
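For \(r = 2\), the same sample-mean example works: \(E(|\bar{X}_n - 1/2|^2) = Var(\bar{X}_n) = 1/(12n) \rightarrow 0.\) A short simulation can estimate this (a sketch under the same Uniform(0,1) assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
n_reps = 10_000

# E|X_n - 1/2|^2 = Var(X_n) = 1/(12n) -> 0, so X_n -> 1/2 in 2nd mean.
for n in [10, 100, 1000]:
    means = rng.uniform(size=(n_reps, n)).mean(axis=1)
    second_moment = np.mean((means - 0.5)**2)
    print(f"n={n:>5}: E|X_n - 1/2|^2 ~ {second_moment:.6f}  (theory: {1/(12*n):.6f})")
```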
Convergence in Distribution (Weak Convergence):
Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}\) with cdfs \(\{F_n\}_{n=1}^{\infty},\) and let \(X\) have cdf \(F.\) We say \(X_n\) converges to \(X\) in distribution iff \[ \lim_{n \rightarrow \infty} F_n(x) = F(x) \] at every point \(x\) at which \(F\) is continuous. We denote it as \(X_n \overset{d}{\longrightarrow} X.\) Here \(d\) means in distribution.
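The central limit theorem is the canonical example: standardized sums of i.i.d. variables converge in distribution to \(N(0,1).\) The sketch below (assuming Exponential(1) summands, so \(\mu = \sigma = 1\); the sample size and evaluation points are arbitrary) compares the empirical cdf \(F_n\) with the standard normal cdf \(\Phi\) at a few points:

```python
import numpy as np
from math import erf, sqrt

def phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + erf(x / sqrt(2)))

rng = np.random.default_rng(6)
n, n_reps = 500, 20_000

# Standardized sums of i.i.d. Exponential(1) variables (mu = sigma = 1).
sums = rng.exponential(scale=1.0, size=(n_reps, n)).sum(axis=1)
z = (sums - n) / np.sqrt(n)

for x in [-1.0, 0.0, 1.0, 2.0]:
    print(f"x={x:+.1f}: F_n(x) ~ {(z <= x).mean():.4f}, Phi(x) = {phi(x):.4f}")
```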
References:
Probability and Statistical Inference, Second Edition, Robert Bartoszynski, Magdalena Niewiadomska-Bugaj.
Stochastic Processes, Second Edition, Sheldon M. Ross.