ML03 Decision Trees
Bagging
Bagging (bootstrap aggregation) is a general procedure for reducing the variance of a model by averaging model predictions over bootstrapped training data sets. Suppose we bootstrap the data to obtain $B$ training data sets; then we have $B$ model predictions for a single instance. Averaging them gives
$$ \hat{f}_{\mathrm{avg}}(x)=\frac{1}{B} \sum_{b=1}^B \hat{f}^b(x) $$
Out-of-Bag Error Estimation
Since each training data set is only a bootstrapped sample, some instances are left out of it, which creates free out-of-sample test data. On average, each bootstrapped sample uses about $\frac{2}{3}$ of the instances. From an instance's point of view, roughly $\frac{B}{3}$ of the bootstrapped samples will not include it in their training data, and each of those models therefore produces an out-of-sample prediction for it. The OOB prediction for instance $i$ is the average of these roughly $\frac{B}{3}$ predictions.
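A minimal sketch of bagged trees with OOB error estimation, assuming a regression problem with generic arrays `X`, `y` and scikit-learn's `DecisionTreeRegressor` as the base learner; the function and variable names (`bagged_trees_with_oob`, `oob_sum`, etc.) are illustrative, not a reference implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_trees_with_oob(X, y, B=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    oob_sum = np.zeros(n)      # running sum of OOB predictions per instance
    oob_count = np.zeros(n)    # number of trees that left each instance out
    trees = []
    for b in range(B):
        idx = rng.integers(0, n, size=n)          # bootstrap sample (with replacement)
        oob_mask = ~np.isin(np.arange(n), idx)    # instances not drawn are OOB for this tree
        tree = DecisionTreeRegressor().fit(X[idx], y[idx])
        trees.append(tree)
        if oob_mask.any():
            oob_sum[oob_mask] += tree.predict(X[oob_mask])
            oob_count[oob_mask] += 1

    # bagged prediction: f_avg(x) = (1/B) * sum_b f^b(x)
    predict = lambda X_new: np.mean([t.predict(X_new) for t in trees], axis=0)

    # OOB prediction for instance i: average over the ~B/3 trees that did not train on i
    seen = oob_count > 0
    oob_pred = oob_sum[seen] / oob_count[seen]
    oob_mse = np.mean((y[seen] - oob_pred) ** 2)
    return predict, oob_mse
```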
Random Forests
Random forests improve on bagged trees by decorrelating the trees. A random forest still bootstraps the instances, but it also samples the features: typically, $m \approx \sqrt{p}$ features are considered as split candidates. This prevents the trees from looking similar in their first few splits because of a few dominating features, and thus decorrelates the trees.
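As a quick illustration, scikit-learn's `RandomForestRegressor` supports this feature subsampling via `max_features="sqrt"` (corresponding to $m \approx \sqrt{p}$); the data set and parameter values below are placeholders for a sketch, not tuned settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# toy data: 500 instances, p = 16 features
X, y = make_regression(n_samples=500, n_features=16, noise=0.5, random_state=0)

rf = RandomForestRegressor(
    n_estimators=200,      # B trees
    max_features="sqrt",   # consider m ~ sqrt(p) features at each split (decorrelates trees)
    oob_score=True,        # reuse out-of-bag instances to estimate test performance
    random_state=0,
).fit(X, y)

print(rf.oob_score_)       # OOB R^2 as a free estimate of out-of-sample performance
```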
Boosting
Unlike bagging, where the trees are fit independently, in boosting the trees are grown sequentially: each tree is fit to the residuals that the previous trees failed to predict well. Whereas a random forest tends to fit the data hard and can overfit, boosting learns slowly, using a very simple structure for each learner.
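A minimal sketch of this sequential residual-fitting idea for regression with squared loss, assuming shallow `DecisionTreeRegressor` learners, a small learning rate, and generic arrays `X`, `y`; the names (`boost_trees`, `learning_rate`, etc.) are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_trees(X, y, B=500, learning_rate=0.01, max_depth=1):
    """Boosted regression trees: each tree fits the current residuals."""
    f = np.zeros(len(y))            # current ensemble prediction, starts at 0
    trees = []
    for b in range(B):
        residuals = y - f                                  # what previous trees failed to predict
        tree = DecisionTreeRegressor(max_depth=max_depth)  # very simple (shallow) learner
        tree.fit(X, residuals)
        f += learning_rate * tree.predict(X)               # learn slowly: shrink each tree's contribution
        trees.append(tree)

    def predict(X_new):
        return sum(learning_rate * t.predict(X_new) for t in trees)

    return predict
```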