Weighted Least Squares and Heteroskedasticity
Context
One assumption of linear regression is that the error terms $\epsilon$ have constant variance. In practice, however, the magnitude of the noise often varies across observations; this is called heteroskedasticity. Two popular ways to address it:
- Transform the response $y$: for example, if the residual plot is funnel-shaped, fit $\log(y)$ instead.
- Use weighted least squares (WLS), assigning smaller weights to observations with larger variances. Ordinary least squares attaches equal importance to every observation, whereas WLS cares more about fitting well where the noise is small.
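As a quick illustration of heteroskedasticity (a sketch with simulated data, not taken from the notes): fit OLS to data whose noise scale grows with $x$ and compare the residual spread at small versus large $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1, 10, n)
# noise standard deviation grows with x -> heteroskedastic
y = 2.0 + 3.0 * x + rng.normal(0, 0.5 * x)

# ordinary least squares fit
X = np.column_stack([np.ones(n), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# residual spread is larger where x is larger -> funnel shape
low, high = resid[x < 5].std(), resid[x >= 5].std()
print(low, high)
```

The growing residual spread is exactly what a funnel-shaped residual plot shows visually.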
Weighted Least Squares
Minimize the weighted mean squared error:
$$ \frac{1}{n} \sum_{i=1}^{n} w_{i}\left(y_{i}-\mathbf{x}_{i} \cdot \mathbf{b}\right)^{2} $$

To write this in matrix form, let $\mathbf{w}$ be the diagonal matrix with $w_i$ on the diagonal. The objective can then be written as
$$ \begin{aligned} W M S E &=n^{-1}(\mathbf{y}-\mathbf{x} \mathbf{b})^{T} \mathbf{w}(\mathbf{y}-\mathbf{x} \mathbf{b}) \\ &=\frac{1}{n}\left(\mathbf{y}^{T} \mathbf{w} \mathbf{y}-\mathbf{y}^{T} \mathbf{w} \mathbf{x} \mathbf{b}-\mathbf{b}^{T} \mathbf{x}^{T} \mathbf{w} \mathbf{y}+\mathbf{b}^{T} \mathbf{x}^{T} \mathbf{w} \mathbf{x} \mathbf{b}\right) \end{aligned} $$

Differentiating with respect to $\mathbf{b}$ and setting the gradient to zero gives the optimum:
$$ \widehat{\beta}_{W L S}=\left(\mathbf{x}^{T} \mathbf{w} \mathbf{x}\right)^{-1} \mathbf{x}^{T} \mathbf{w} \mathbf{y} $$

Now suppose the true relationship is $\mathbf{y}=\mathbf{x}\beta+\epsilon$. Then,
$$ \begin{aligned} \widehat{\beta}_{W L S} &=\left(\mathbf{x}^{T} \mathbf{w} \mathbf{x}\right)^{-1} \mathbf{x}^{T} \mathbf{w} \mathbf{x} \beta+\left(\mathbf{x}^{T} \mathbf{w} \mathbf{x}\right)^{-1} \mathbf{x}^{T} \mathbf{w} \epsilon \\ &=\beta+\left(\mathbf{x}^{T} \mathbf{w} \mathbf{x}\right)^{-1} \mathbf{x}^{T} \mathbf{w} \epsilon \end{aligned} $$

Since $\mathbb{E}[\epsilon \mid \mathbf{x}]=0$, it follows that $\mathbb{E}\left[\widehat{\beta}_{W L S} \mid \mathbf{x}\right]=\beta$.
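The closed-form estimator above can be checked numerically. The sketch below (simulated data, with a noise model and weights chosen purely for illustration) averages $\widehat{\beta}_{WLS}$ over many replications and recovers $\beta$, consistent with the unbiasedness result:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 500
beta = np.array([1.0, 2.0])        # true coefficients
x = rng.uniform(0, 5, n)
X = np.column_stack([np.ones(n), x])
sigma = 0.2 + 0.3 * x              # heteroskedastic noise scale
W = np.diag(1.0 / sigma**2)        # diagonal weight matrix

est = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + rng.normal(0, sigma)
    # closed-form WLS: (X^T W X)^{-1} X^T W y
    est[r] = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

print(est.mean(axis=0))            # average over replications, close to beta
```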
Key Points
- The WLS estimator is still unbiased.
- The variance of the estimator is minimized by choosing $w_{i}=1 / \sigma_{i}^{2}$. These are the ideal weights, but in practice the true variances are unknown.
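A small simulation (a sketch with a made-up noise model) illustrates the second point: with the ideal weights $w_i = 1/\sigma_i^2$, the WLS slope estimate has a smaller sampling variance than the OLS slope, even though both are unbiased.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 1000
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
sigma = 0.3 * x                      # noise scale grows with x
W = np.diag(1.0 / sigma**2)          # ideal weights w_i = 1/sigma_i^2

slopes_ols, slopes_wls = [], []
for _ in range(reps):
    y = 1.0 + 2.0 * x + rng.normal(0, sigma)
    slopes_ols.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    slopes_wls.append(np.linalg.solve(X.T @ W @ X, X.T @ W @ y)[1])

# both are unbiased, but WLS has the smaller sampling variance
print(np.var(slopes_ols), np.var(slopes_wls))
```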
Conditional Variance
How can we estimate $\sigma_i$ so that we can choose the weights? Here we need the conditional variance: assume $\sigma_i$ is a function of $\mathbf{x}_i$.
Squared Residuals Method
- Model the mean: this may be a regression problem in its own right; the output is an estimate $\hat{m}(x)$
- Construct the squared residuals: $u_{i}=\left(y_{i}-\hat{m}\left(x_{i}\right)\right)^{2}$
- Model the conditional mean of the squared residuals as a function of $x$. This should be a non-parametric method (such as kernel regression) that describes how the squared residuals vary with $x$; call the fitted function $\widehat{q}(x)$.
- The predicted value of this function is an estimate of the conditional variance: $\widehat{\sigma}_{x}^{2}=\widehat{q}(x)$.
- Calculate the weights and fit the WLS.
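The steps above can be sketched as follows. This is a simplified illustration on simulated data: plain OLS serves as the mean model, and a quadratic fit on $\log u_i$ stands in for the non-parametric smoother (an assumption; the notes leave the smoother open). Note that scaling all weights by a constant does not change the WLS solution, so the smoother only needs to capture how the variance varies with $x$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0, 0.4 * x)   # true beta = [1, 2]

# step 1: model the mean (here: OLS) -> m_hat
b0 = np.linalg.lstsq(X, y, rcond=None)[0]
# step 2: squared residuals
u = (y - X @ b0) ** 2
# step 3: model the conditional mean of u given x; a quadratic fit on
# log(u) is used here as a crude stand-in for a nonparametric smoother
c = np.polyfit(x, np.log(u), 2)
q_hat = np.exp(np.polyval(c, x))
# step 4: conditional variance estimate sigma_hat^2 = q_hat(x)
# step 5: weights w_i = 1/sigma_hat_i^2, then refit with WLS
W = np.diag(1.0 / q_hat)
beta_fwls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_fwls)   # close to the true [1.0, 2.0]
```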
Reference
Content summarized from the following notes: