
Probability 3 -- Inequality and Convergence

These notes are mainly based on Probability and Statistical Inference, Second Edition, by Robert Bartoszynski and Magdalena Niewiadomska-Bugaj.

Inequalities

Here we list some inequalities without proof; a numerical sanity check in Python follows the list.

  1. If \(V\) is a random variable such that \(E(V) = 0, Var(V) = 1,\) then \[ \forall t > 0, P(|V| \ge t) \le \frac{1}{t^2}. \]

  2. (Chebyshev Inequality) If \(X\) is a random variable such that \(E(X) = \mu, Var(X) = \sigma^2 < \infty,\) then \[ \forall \varepsilon > 0, P(|X-\mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2}. \]

  3. (Markov Inequality) If \(X\) is a random variable such that \(X \ge 0, E(X) = \mu < \infty,\) then \[ \forall t > 0, P(X \ge t) \le \frac{\mu}{t}. \]

  4. (Cantelli's inequality) If \(X\) is a random variable such that \(E(X) = \mu, Var(X) = \sigma^2 \in (0, \infty),\) then \[ \forall t > 0, P(X - \mu \ge t) \le \frac{\sigma^2}{\sigma^2 + t^2}. \]

  5. (Kolmogorov Inequality) If \(\{X_i\}_{i=1}^{\infty}\) are independent random variables with \(E(X_i) = 0, Var(X_i) = \sigma_i^2 < \infty\) for all \(i,\) then the partial sums \(S_j = \sum_{i=1}^j X_i\) satisfy \[ \forall t > 0, P(\max_{1\le j \le n}|S_j| \ge t) \le \frac{Var(S_n)}{t^2}. \]

  6. (Paley-Zygmund Inequality) If \(X\) is a non-negative random variable with \(E(X) = \mu\) and \(0 < E(X^2) < \infty,\) then \[ \forall t \in [0, 1], P(X > t\mu) \ge (1-t)^2\frac{\mu^2}{E(X^2)}. \]

  7. (Azuma's Inequality) Let \(Z_n, n \ge 1,\) be a martingale with mean \(E(Z_n) = \mu,\) and set \(Z_0 = \mu.\) Suppose that there exist non-negative constants \(\alpha_i, \beta_i, i \ge 1,\) such that \[-\alpha_i \le Z_i - Z_{i-1} \le \beta_i.\] Then for any \(n \ge 0, a > 0,\) \[\begin{aligned} (i) \;& P(Z_n - \mu \ge a) \le \exp\{-2a^2/\sum_{i=1}^n(\alpha_i+\beta_i)^2\} \\ (ii) \;& P(Z_n - \mu \le -a) \le \exp\{-2a^2/\sum_{i=1}^n(\alpha_i+\beta_i)^2\} \end{aligned}\]
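
As a quick illustration (not from the book), the sketch below checks the Markov, Chebyshev, and Cantelli bounds by simulation. The \(Exponential(1)\) distribution, for which \(\mu = \sigma^2 = 1,\) and the thresholds \(t\) are arbitrary choices for the demo.

```python
import numpy as np

# Monte Carlo sanity check of the Markov, Chebyshev, and Cantelli bounds
# for X ~ Exponential(1), where mu = sigma^2 = 1 (an arbitrary choice).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
mu, var = 1.0, 1.0

for t in (1.5, 2.0, 3.0):
    print(f"t = {t}:")
    print(f"  Markov:    P(X >= t)      = {np.mean(x >= t):.4f} <= {mu / t:.4f}")
    print(f"  Chebyshev: P(|X-mu| >= t) = {np.mean(np.abs(x - mu) >= t):.4f} <= {var / t**2:.4f}")
    print(f"  Cantelli:  P(X-mu >= t)   = {np.mean(x - mu >= t):.4f} <= {var / (var + t**2):.4f}")
```

Each empirical probability falls below its bound, and the one-sided Cantelli bound \(\sigma^2/(\sigma^2 + t^2)\) is visibly tighter than Chebyshev's \(\sigma^2/t^2\) for upper deviations.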

Convergence

Here we consider a probability space \(\{\Omega, \mathcal{F}, P\}.\) The probability \(P\) is a measure defined on the measurable space \(\{\Omega, \mathcal{F}\},\) so we can borrow pointwise and uniform convergence directly from real analysis.

Pointwise Convergence (Convergence Everywhere): Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}.\) We say \(X_n\) converges to \(X\) pointwise iff \[ \forall \omega \in \Omega, \forall \varepsilon > 0, \exists n_0 \in \mathbb{Z}^+, \forall n \ge n_0, |X_n(\omega) - X(\omega)| < \varepsilon. \] We denote it as \(X_n \overset{e}{\longrightarrow} X.\) Here \(e\) means everywhere.

Uniform Convergence: Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}.\) We say \(X_n\) converges to \(X\) uniformly iff \[ \forall \varepsilon > 0, \exists n_0 \in \mathbb{Z}^+, \forall n \ge n_0, |X_n(\omega) - X(\omega)| < \varepsilon, \forall \omega \in \Omega. \] We denote it as \(X_n \overset{u}{\longrightarrow} X.\) Here \(u\) means uniform.
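
A standard real-analysis example (not from the book) separates these two notions. Take \(\Omega = [0, 1)\) with Lebesgue measure and \(X_n(\omega) = \omega^n\): each fixed \(\omega\) gives \(\omega^n \rightarrow 0,\) but \(\sup_{\omega} |X_n(\omega)| = 1\) for every \(n,\) so the convergence is pointwise but not uniform. A small numerical check:

```python
# X_n(omega) = omega**n on Omega = [0, 1): the pointwise limit is 0, but
# the moving point omega_n = 0.5**(1/n) satisfies X_n(omega_n) = 0.5 for
# every n, so sup |X_n| never drops below 0.5: no uniform convergence.
for n in (10, 100, 1000):
    omega_fixed = 0.9
    omega_moving = 0.5 ** (1.0 / n)
    print(f"n={n}: X_n({omega_fixed}) = {omega_fixed**n:.3e}, "
          f"X_n({omega_moving:.6f}) = {omega_moving**n:.2f}")
```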

However, pointwise and uniform convergence are usually too strong a requirement. What typically holds instead is an "almost" version, which allows an exceptional set of measure zero.

Almost Pointwise Convergence (Convergence Almost Everywhere, Almost Sure Convergence, Convergence with Probability One, Strong Convergence):

Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}.\) We say \(X_n\) converges to \(X\) almost surely iff \[ P(\lim_{n \rightarrow \infty} X_n = X) = 1, \] equivalently \[ P(\lim_{n \rightarrow \infty} X_n \neq X) = 0. \] We denote it as \(X_n \overset{a.s.}{\longrightarrow} X.\) Here \(a.s.\) means almost surely.
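
The strong law of large numbers is the canonical instance: for i.i.d. \(X_i\) with \(E|X_i| < \infty\) and \(E(X_i) = \mu,\) the sample mean converges to \(\mu\) almost surely. The sketch below (Bernoulli(0.5) draws, seed and sample size arbitrary) follows a few independent sample paths of the running mean.

```python
import numpy as np

# SLLN demo: on (almost) every sample path, the running mean of i.i.d.
# Bernoulli(0.5) draws settles at 0.5. A.s. convergence is a statement
# about individual paths, not just about averages over paths.
rng = np.random.default_rng(1)
for path in range(3):
    x = rng.integers(0, 2, size=100_000)
    running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
    print(f"path {path}: mean at n=10^3 is {running_mean[999]:.4f}, "
          f"at n=10^5 is {running_mean[-1]:.4f}")
```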

Almost Uniform Convergence:

Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}.\) We say \(X_n\) converges to \(X\) almost uniformly iff \[ \forall \varepsilon > 0, \exists E \in \mathcal{F}, P(E) < \varepsilon, \forall \delta > 0, \exists n_0 \in \mathbb{Z}^+, \forall n \ge n_0, |X_n(\omega) - X(\omega)| < \delta, \forall \omega \notin E, \] equivalently \[ \forall \varepsilon > 0, \exists E \in \mathcal{F}, P(E) < \varepsilon, \lim_{n \rightarrow \infty} \sup_{\omega \notin E} |X_n(\omega) - X(\omega)| = 0. \] We denote it as \(X_n \overset{a.u.}{\longrightarrow} X.\) Here \(a.u.\) means almost uniformly. Almost uniform convergence always implies almost sure convergence, and on a probability space Egorov's theorem gives the converse, so the two notions coincide here.

Next come three modes of convergence that are defined through the probability measure itself rather than through pointwise behavior.

Convergence in Probability (Convergence in Measure):

Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}.\) We say \(X_n\) converges to \(X\) in probability iff \[ \forall \varepsilon > 0, \lim_{n \rightarrow \infty}P(|X_n - X| > \varepsilon) = 0, \] equivalently \[ \forall \varepsilon > 0, \lim_{n \rightarrow \infty}P(|X_n - X| \le \varepsilon) = 1. \] We denote it as \(X_n \overset{P}{\longrightarrow} X.\) Here \(P\) means in probability.
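
The "typewriter" sequence is the standard example showing that convergence in probability is strictly weaker than almost sure convergence. On \(\Omega = [0, 1)\) with Lebesgue measure, write \(n = 2^k + j\) with \(0 \le j < 2^k\) and let \(X_n\) be the indicator of \([j/2^k, (j+1)/2^k).\) Then \(P(X_n = 1) = 2^{-k} \rightarrow 0,\) so \(X_n \overset{P}{\longrightarrow} 0,\) yet every \(\omega\) lands in infinitely many of these intervals, so \(X_n(\omega)\) converges for no \(\omega.\) A small sketch (the sample point 0.3 is arbitrary):

```python
import math

# Typewriter sequence: X_n is the indicator of an interval of length
# 2**(-k) sweeping across [0, 1), where n = 2**k + j, 0 <= j < 2**k.
def typewriter(n, omega):
    k = int(math.log2(n))
    j = n - 2 ** k
    return 1 if j / 2 ** k <= omega < (j + 1) / 2 ** k else 0

omega = 0.3  # an arbitrary sample point
hits = [n for n in range(1, 4096) if typewriter(n, omega)]
print(f"X_n(0.3) = 1 at n = {hits[:6]} ... ({len(hits)} times up to 4095)")
# yet P(X_n = 1) = 2**(-k) shrinks to 0: convergence in probability holds
for k in (2, 6, 10):
    print(f"n = 2**{k}: P(X_n = 1) = {2.0**-k:.4f}")
```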

Convergence in r-th Mean:

Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}.\) We say \(X_n\) converges to \(X\) in \(r\)-th mean (\(r \ge 1\)) iff \[ \lim_{n \rightarrow \infty} E(|X_n - X|^r) = 0. \] We denote it as \(X_n \overset{r-m}{\longrightarrow} X.\) Here \(r-m\) means \(r\)-th mean.
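
Convergence in mean can fail even when convergence in probability holds. A standard example (not from the book): let \(U \sim Uniform(0, 1)\) and let \(X_n = n\) on \(\{U < 1/n\}\) and \(0\) otherwise. Then \(P(|X_n| > \varepsilon) = 1/n \rightarrow 0,\) so \(X_n \overset{P}{\longrightarrow} 0,\) yet \(E|X_n| = n \cdot (1/n) = 1\) for every \(n,\) so \(X_n\) does not converge to \(0\) in 1st mean.

```python
import numpy as np

# X_n = n on {U < 1/n}, else 0: the event shrinks (giving convergence in
# probability) but the payoff grows, keeping E|X_n| = 1 for all n.
rng = np.random.default_rng(2)
u = rng.uniform(size=1_000_000)
for n in (10, 100, 1000):
    x_n = n * (u < 1.0 / n)
    print(f"n={n}: P(|X_n| > 0.1) ~ {np.mean(x_n > 0.1):.4f}, "
          f"E|X_n| ~ {x_n.mean():.4f}")
```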

Convergence in Distribution (Weak Convergence):

Let \(\{X_n\}_{n=1}^{\infty}\) be random variables defined on \(\{\Omega, \mathcal{F}, P\}\) with cdfs \(\{F_n\}_{n=1}^{\infty},\) and let \(X\) have cdf \(F.\) We say \(X_n\) converges to \(X\) in distribution iff \[ \lim_{n \rightarrow \infty} F_n(x) = F(x) \] for every \(x\) at which \(F\) is continuous. We denote it as \(X_n \overset{d}{\longrightarrow} X.\) Here \(d\) means in distribution. (Unlike the other modes, convergence in distribution does not require the \(X_n\) to be defined on a common probability space.)
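
The central limit theorem is the prototypical convergence in distribution: if \(S_n\) is a sum of \(n\) i.i.d. \(Uniform(0, 1)\) variables (an arbitrary choice for this demo), then \((S_n - n/2)/\sqrt{n/12} \overset{d}{\longrightarrow} N(0, 1).\) The sketch compares the empirical cdf with the standard normal cdf \(\Phi\) at a few points; since \(\Phi\) is continuous everywhere, every \(x\) qualifies.

```python
import math
import numpy as np

# CLT demo: the cdf of the standardized sum approaches Phi pointwise.
def phi(x):  # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = np.random.default_rng(3)
n, reps = 30, 200_000
s = rng.uniform(size=(reps, n)).sum(axis=1)  # 200k draws of S_n
z = (s - n / 2.0) / math.sqrt(n / 12.0)      # standardize S_n
for x in (-1.0, 0.0, 1.5):
    print(f"x = {x:+.1f}: F_n(x) ~ {np.mean(z <= x):.4f}, Phi(x) = {phi(x):.4f}")
```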

References:

Probability and Statistical Inference, Second Edition, Robert Bartoszynski, Magdalena Niewiadomska-Bugaj

Stochastic Processes, Second Edition, Sheldon M. Ross

Probability Review Notes (9): Relations Among Several Modes of Convergence (概率论复习笔记(9)——几种收敛的关系)

(Probability) Modes of Convergence of Random Variables ((概率论)随机变量的收敛模式)

Inequalities in Probability and Machine Learning, Part 1 (概率论和机器学习中的不等式(一))

Inequalities in Probability and Machine Learning (概率论和机器学习中的不等式)

Convergence of random variables

Convergence in measure