Hypothesis Testing

Concepts

Null and Alternative Hypothesis

In short, the null hypothesis is the “baseline” on which we build our statistics, while the alternative hypothesis is the claim we are testing: we check whether the data provide enough evidence to support it.

Test Statistic and Rejection Region

To test the claims, we need to build a test statistic, which is a function of the observed data and is therefore itself a random variable. Say the statistic is $T=T(X_1, X_2,\dots,X_n)$.

The rejection region is a set of values chosen for $T$ so that, when $T$ falls into the region, we reject the null and conclude that there is strong evidence to support the alternative hypothesis.

The rejection region is chosen so that the probability of a type I error is kept small. The two kinds of error are:

  • Type I error: the null is true, but we reject it
  • Type II error: the alternative is true, but we fail to reject the null

Example: form a rejection region

The idea of choosing the rejection region is to constrain the probability of a type I error to at most a level $\alpha$ under the null hypothesis. The lower the $\alpha$, the stricter the rule: the rejection region becomes narrower, and it is harder for the test statistic to fall into it.

The following example shows how to form the rejection region.
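As a minimal sketch, assume the test statistic $T$ is standard normal under the null (this distributional assumption and the function name `z_rejection_region` are illustrative, not from the original notes). The rejection region is then $|T| > c$ for a two-sided test, where $c$ is chosen so that the probability of a type I error equals $\alpha$.

```python
from statistics import NormalDist


def z_rejection_region(alpha, two_sided=True):
    """Critical value c for a test statistic that is N(0, 1) under the null.

    Two-sided: reject when |T| > c. One-sided (upper tail): reject when T > c.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if two_sided:
        # Split alpha equally between the two tails.
        return std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.inv_cdf(1 - alpha)


for alpha in (0.10, 0.05, 0.01):
    c = z_rejection_region(alpha)
    print(f"alpha={alpha:.2f}: reject H0 when |T| > {c:.3f}")
```

A smaller $\alpha$ gives a larger critical value, which is the "narrower rejection region" described above.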

p-value

The p-value is the probability, under the assumption that the null hypothesis is correct, of obtaining test results at least as extreme as the result actually observed.

The p-value is a function of the test statistic (or of the observed data), so it is itself a random variable.
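To make the definition concrete, here is a small sketch that computes a p-value for an observed statistic, assuming (for illustration only) that the statistic is standard normal under the null. The helper name `z_p_value` is hypothetical.

```python
from statistics import NormalDist


def z_p_value(t_obs, two_sided=True):
    """p-value for an observed statistic that is N(0, 1) under the null."""
    std_normal = NormalDist()
    if two_sided:
        # "At least as extreme" means |T| >= |t_obs| in a two-sided test.
        return 2 * (1 - std_normal.cdf(abs(t_obs)))
    # One-sided (upper tail): "at least as extreme" means T >= t_obs.
    return 1 - std_normal.cdf(t_obs)


for t_obs in (0.5, 1.96, 2.5):
    print(f"T={t_obs}: two-sided p-value = {z_p_value(t_obs):.4f}")
```

Rejecting when the p-value is below $\alpha$ is equivalent to rejecting when the statistic falls in the level-$\alpha$ rejection region.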

Wald Test

One simple design of the test statistic is the Wald test, which is especially convenient when the parameter is estimated by maximum likelihood (MLE).

$$ T=\frac{\hat{\theta}-\theta_0}{\operatorname{SE}(\hat{\theta})} $$

If $\hat{\theta}$ is the MLE, then by the asymptotic normality of the MLE, it is not hard to derive its standard error using the Fisher information $I(\theta)$. Under the null $\theta=\theta_0$:

$$ \begin{aligned} \hat{\theta}&\overset{\text{approx}}{\sim} N\left( \theta _0,\frac{1}{nI\left( \theta _0 \right)} \right)\\ T&=\frac{\hat{\theta}-\theta_0}{\sqrt{1 / \left(n I\left(\theta_0\right)\right)}} \end{aligned} $$

Example: Bernoulli distribution
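A sketch of how this could look for $X_i \sim \text{Bernoulli}(p)$ with $H_0: p=p_0$. The MLE is $\hat{p}=\bar{X}$ and the Fisher information is $I(p)=\frac{1}{p(1-p)}$, so the Wald statistic is $T=\frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$. The function name and the data (60 successes out of 100) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wald_test_bernoulli(successes, n, p0, alpha=0.05):
    """Two-sided Wald test of H0: p = p0 for Bernoulli data."""
    p_hat = successes / n               # MLE of p
    se = sqrt(p0 * (1 - p0) / n)        # sqrt(1 / (n * I(p0)))
    t = (p_hat - p0) / se               # approximately N(0, 1) under H0
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(t)))
    return t, p_value, abs(t) > critical


# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5.
t, p_value, reject = wald_test_bernoulli(successes=60, n=100, p0=0.5)
print(f"T = {t:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Here the standard error is evaluated at $\theta_0$, matching the formula above. Another common variant plugs in $\hat{p}$ instead.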

Multiple Testing Problem

Here we look at how problematic multiple testing can be. When we run many individual tests at the same time, the probability of getting false positives is high. So we need some “corrections”, for example to control the false discovery rate (FDR). There are several procedures to do so, such as the two well-known ones listed at https://en.wikipedia.org/wiki/False_discovery_rate.

For example, the Benjamini-Hochberg procedure (BH step-up procedure) controls the FDR at level $\alpha$. It works as follows. We first sort the $m$ p-values from smallest to largest and denote them $p_{(1)}$ to $p_{(m)}$, with $H_{(i)}$ the null hypothesis corresponding to $p_{(i)}$. Then,

  1. For a given $\alpha$, find the largest $k$ such that $p_{(k)} \leq \frac{k}{m} \alpha$
  2. Reject the null hypothesis (i.e., declare discoveries) for all $H_{(i)}$ for $i=1, \ldots, k$

In Python, we can use multipletests from statsmodels (statsmodels.stats.multitest), passing method='fdr_bh' for the BH procedure. The second returned value contains the adjusted p-values.
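To see the two BH steps in isolation, here is a hand-rolled sketch in plain Python. It is not the statsmodels implementation, and the function name and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the BH procedure rejects the null."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step 1: find the largest k with p_(k) <= (k / m) * alpha.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k = rank

    # Step 2: reject the k hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Hypothetical p-values from 10 tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_values, alpha=0.05))
```

Because the procedure is "step-up", a hypothesis whose own p-value misses its threshold is still rejected if a larger p-value further down the sorted list meets its threshold.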

This multiple testing problem can arise in pairs trading, where we select pairs based on some test statistics. A great number of tests are run, and we want to keep the false positives few, since only a few pairs should be valid to trade.

Likelihood Ratio Test

We need a general framework to build test statistics, and the likelihood ratio test provides one.

The test is referred to as the likelihood ratio test because it constructs a test statistic for testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \notin \Theta_0$ via the ratio

$$ T=-2 \log \left(\frac{\max _{\theta \in \Theta_0} L(\theta)}{\max _{\theta \notin \Theta_0} L(\theta)}\right) $$

Important Result: Under “regularity conditions” and the assumption that $P(T=0)=0$, when $H_0$ is true $T$ has (asymptotically) the chi-squared distribution with degrees of freedom equal to (number of free parameters under $H_1$) $-$ (number of free parameters under $H_0$).

Intuition: Under both the null and the alternative hypothesis, we choose the parameters that maximize the likelihood of the observed data, and we use the ratio of the two maximized likelihoods as our criterion. If the maximized likelihood under the null is much lower than under the alternative, the ratio is small, its log is very negative, and $T$ is large, so $T$ is more likely to lie in the rejection region and we tend to reject the null.
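As an illustration, consider Bernoulli data with $H_0: p=p_0$ versus $H_1: p\neq p_0$. Under $H_0$ there are no free parameters and under $H_1$ there is one, so $T$ is compared with a chi-squared distribution with 1 degree of freedom. The sketch below assumes $\hat{p}\neq p_0$, so that the maximum over the alternative is attained at the unrestricted MLE $\hat{p}$. The function names and data are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def bernoulli_log_likelihood(p, successes, n):
    """Log-likelihood of p, treating 0 * log(0) as 0."""
    ll = 0.0
    if successes > 0:
        ll += successes * log(p)
    if n - successes > 0:
        ll += (n - successes) * log(1 - p)
    return ll


def lrt_bernoulli(successes, n, p0):
    """Likelihood ratio statistic and asymptotic p-value for H0: p = p0."""
    p_hat = successes / n  # maximizer of the likelihood under the alternative
    ll_null = bernoulli_log_likelihood(p0, successes, n)
    ll_alt = bernoulli_log_likelihood(p_hat, successes, n)
    t = -2 * (ll_null - ll_alt)
    # A chi-squared(1) variable is the square of a standard normal, so
    # P(chi2_1 > t) = 2 * (1 - Phi(sqrt(t))).
    p_value = 2 * (1 - NormalDist().cdf(sqrt(max(t, 0.0))))
    return t, p_value


# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5.
t, p_value = lrt_bernoulli(successes=60, n=100, p0=0.5)
print(f"T = {t:.3f}, p-value = {p_value:.4f}")
```

At the 5% level the chi-squared(1) critical value is about 3.84 (the square of 1.96), so values of $T$ above that fall in the rejection region.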

t-distribution

The $t$ distribution is symmetric and bell-shaped. Compared with the normal distribution, it has heavier tails. The following chart shows that as the degrees of freedom increase, the distribution becomes more concentrated in the middle and thinner in the tails. In particular, as the degrees of freedom approach infinity, the $t$ distribution converges to the standard normal distribution.

[Figure: Student t pdf.svg, density of the $t$ distribution for several degrees of freedom]

How Student’s distribution arises from sampling

Consider IID normal random variables $X_1,\dots,X_n$ from $\mathcal{N}\left(\mu, \sigma^2\right)$. Define the sample mean and sample variance as follows:

$$ \begin{aligned} \bar{X}&=\frac{1}{n} \sum_{i=1}^n X_i\\ S^2&=\frac{1}{n-1} \sum_{i=1}^n\left(X_i-\bar{X}\right)^2 \end{aligned} $$

Then $\frac{\bar{X}-\mu}{S / \sqrt{n}}$ follows a $t$ distribution with $n-1$ degrees of freedom.
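A quick simulation can make this tangible. The sketch below (sample size, number of simulations, and seed are arbitrary choices) draws many small normal samples, computes $\frac{\bar{X}-\mu}{S/\sqrt{n}}$ for each, and checks how often the statistic exceeds the normal threshold 1.96 in absolute value. If the statistic were standard normal this would happen about 5% of the time; with heavier $t$ tails it should happen more often.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_t_statistics(n, n_sims, mu=0.0, sigma=1.0, seed=0):
    """Simulate (X_bar - mu) / (S / sqrt(n)) for IID N(mu, sigma^2) samples."""
    rng = random.Random(seed)
    t_stats = []
    for _ in range(n_sims):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # statistics.stdev uses the n - 1 denominator, matching S above.
        t_stats.append((mean(sample) - mu) / (stdev(sample) / sqrt(n)))
    return t_stats


t_stats = simulate_t_statistics(n=5, n_sims=10000)
tail_fraction = sum(abs(t) > 1.96 for t in t_stats) / len(t_stats)
print(f"Fraction with |t| > 1.96 (n=5): {tail_fraction:.3f}")
```

Increasing `n` moves the fraction back toward 5%, in line with the $t$ distribution approaching the normal as the degrees of freedom grow.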

Questions

Suppose you run 20 independent hypothesis tests. What is the probability that at least one $t$ statistic is greater than 2? (Since no degrees of freedom or critical values of $t$ are provided, let's assume a normal approximation and use the threshold 1.96.)

The probability that a standard normal variable is smaller than $1.96$ is $97.5\%$. So the probability that all 20 independent normal variables are smaller than $1.96$ is $0.975^{20}$, and the probability that at least one exceeds this threshold is $1-0.975^{20}$.

How to estimate this quantity? Let’s use $\log(1+x)\approx x$ around $x=0$.

Then $\log(0.975^{20})=20\log(1-0.025)\approx20\times(-0.025)=-0.5$.

Then $1-0.975^{20}\approx 1-e^{-0.5}\approx0.4$.
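The approximation can be checked numerically with a few lines of Python (a sanity check only, using the same numbers as above):

```python
from math import exp

n_tests = 20
p_below = 0.975  # P(Z < 1.96) for a standard normal Z

exact = 1 - p_below ** n_tests                 # 1 - 0.975^20
approx = 1 - exp(n_tests * (p_below - 1))      # 1 - e^{-0.5}, from log(1 + x) ~ x

print(f"exact:  {exact:.4f}")
print(f"approx: {approx:.4f}")
```

Both values round to 0.4, so the first-order approximation is accurate enough here.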

Yiming Zhang
Quantitative Researcher Associate, JP Morgan