import numpy as np
import pandas as pd
import statsmodels.api as sm
import pandas_datareader as pdr
import matplotlib.pyplot as plt
import statsmodels.tsa.api as tsa
PLOT_STYLE = "seaborn-colorblind"
import warnings
warnings.filterwarnings("ignore")
adjclose = pdr.get_data_yahoo("aapl", "2019-03-16", "2022-03-16")["Adj Close"]
logret = np.log(adjclose/adjclose.shift(1)).dropna()
with plt.style.context(PLOT_STYLE):
    fig, ax = plt.subplots(1, 1)
    # auto_ylims=True rescales the y-axis to the range of the PACF values
    tsa.graphics.plot_pacf(logret, method="yw", ax=ax, auto_ylims=True)
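The PACF plot is read visually, but the significant lags can also be extracted numerically. A minimal sketch, assuming the same logret series and using the usual plus/minus 1.96/sqrt(n) band (the 20-lag cutoff is an arbitrary choice for illustration):

from statsmodels.tsa.stattools import pacf

# Partial autocorrelations up to lag 20, Yule-Walker estimates
pacf_vals = pacf(logret, nlags=20, method="yw")
# Approximate 95% significance band for a white-noise series
bound = 1.96 / np.sqrt(len(logret))
signif = [lag for lag, v in enumerate(pacf_vals) if lag > 0 and abs(v) > bound]
print("Lags with |PACF| above the approximate 95% band:", signif)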
X = logret.loc['2019-03-16':'2021-03-16']
y = logret.loc['2021-03-17':'2022-03-16']
First, apply the Yule-Walker equations to estimate the AR coefficients. The method returns the coefficients as well as the residual standard deviation (sigma). By default it removes the mean before fitting, so there is no constant term in the model.
Note that the method returns $\widehat{\phi}$, not $\mathbf{R}$, even though the variable below is named rho; the autocovariance matrix $\mathbf{R}$ and vector $\mathbf{r}$ are estimated internally and used to solve the Yule-Walker system $\mathbf{R}\widehat{\phi} = \mathbf{r}$.
rho, sigma = sm.regression.yule_walker(X, order=9, method="mle")
print("The coefficients of AR(9) model, fitted by Yule-Walker.")
print(rho)
The coefficients of AR(9) model, fitted by Yule-Walker.
[-0.15326416  0.00625946  0.01964356 -0.02028379  0.02395939 -0.08346254
  0.08615506 -0.08927608  0.15616983]
sigma
0.02257395886693094
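As a sanity check, the same estimates can be reproduced by building the Yule-Walker system explicitly: compute the sample autocovariances, form the Toeplitz matrix $\mathbf{R}$ and vector $\mathbf{r}$, and solve $\mathbf{R}\widehat{\phi} = \mathbf{r}$. The sketch below assumes biased (divide-by-n) autocovariances to match method="mle"; the helper name is ours, not part of statsmodels.

from scipy.linalg import toeplitz, solve

def yule_walker_manual(x, order):
    """Solve R @ phi = r using biased sample autocovariances (MLE-style)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # demean, as yule_walker does by default
    n = len(x)
    # Biased autocovariances gamma(0..order), divisor n as in method="mle"
    gamma = np.array([np.sum(x[: n - k] * x[k:]) / n for k in range(order + 1)])
    R = toeplitz(gamma[:order])           # order x order autocovariance matrix
    r = gamma[1 : order + 1]
    phi = solve(R, r)
    sigma2 = gamma[0] - phi @ r           # innovation variance
    return phi, np.sqrt(sigma2)

phi_manual, sigma_manual = yule_walker_manual(X, order=9)
print(phi_manual)    # should be close to rho above
print(sigma_manual)  # should be close to sigma above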
Then fit the same AR(9) model with AutoReg, which uses least-squares estimates. The coefficients are very close to the Yule-Walker ones, except for the second lag, which is close to zero either way, so the difference is negligible; a side-by-side comparison is sketched after the fit. The residual standard deviations from the two fits are also close.
from statsmodels.tsa.ar_model import AutoReg
ar_ols = AutoReg(X, lags = range(1,10)).fit()
print("The coefficients of AR(9) model, fitted by Least Squares.")
ar_ols.params
The coefficients of AR(9) model, fitted by Least Squares.
const           0.002089
Adj Close.L1   -0.152338
Adj Close.L2    0.008230
Adj Close.L3    0.023569
Adj Close.L4   -0.021845
Adj Close.L5    0.024433
Adj Close.L6   -0.086268
Adj Close.L7    0.089491
Adj Close.L8   -0.092066
Adj Close.L9    0.161035
dtype: float64
np.std(ar_ols.resid)
0.022649010745273625
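To make the "very close" claim concrete, the two coefficient vectors can be compared directly. A small sketch assuming rho and ar_ols from the cells above (the lag labels are just for display):

# Side-by-side comparison of the lag coefficients (the OLS fit also has a constant)
compare = pd.DataFrame({
    "yule_walker": rho,
    "ols": ar_ols.params.iloc[1:].to_numpy(),   # drop the constant
})
compare.index = [f"lag {i}" for i in range(1, 10)]
print(compare)
print("Max absolute difference:", np.abs(compare["yule_walker"] - compare["ols"]).max())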
Generating predictions from the least-squares fit is straightforward: call predict with the integer positions of the out-of-sample period.
pred_ols = ar_ols.predict(start = len(X) + 1, end = len(y) + len(X))
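Since start and end are passed as integer positions, it is worth a quick check that the forecast has one value per out-of-sample observation; a sketch (the exact date labels depend on how AutoReg maps integer positions to the index):

# One forecast per out-of-sample observation expected
print(len(pred_ols), len(y))
# Dynamic forecasts of a stationary AR model decay toward the unconditional
# mean, so the tail of pred_ols should be nearly constant
print(pred_ols.tail())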
Predicting with the Yule-Walker coefficients takes a little more work, since yule_walker returns only the coefficients, not a fitted model with a predict method. One workaround is to deep-copy the fitted AutoReg results, set the constant to zero (Yule-Walker demeans the series, so there is no intercept), overwrite the lag coefficients with the Yule-Walker estimates, and add the training-sample mean back to the predictions.
import copy
ar_yw = copy.deepcopy(ar_ols)
ar_yw.params[0] = 0  # reset the constant to 0, since Yule-Walker demeans the series
for i in range(1, 10):
    ar_yw.params[i] = rho[i-1]  # overwrite the lag coefficients with the Yule-Walker estimates
ar_yw.params
const           0.000000
Adj Close.L1   -0.153264
Adj Close.L2    0.006259
Adj Close.L3    0.019644
Adj Close.L4   -0.020284
Adj Close.L5    0.023959
Adj Close.L6   -0.083463
Adj Close.L7    0.086155
Adj Close.L8   -0.089276
Adj Close.L9    0.156170
dtype: float64
pred_yw = ar_yw.predict(start = len(X) + 1, end = len(y) + len(X)) + np.mean(X)
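The same forecast can be produced without patching the fitted results object: demean the training data, run the AR(p) recursion with the Yule-Walker coefficients, and add the training mean back at the end. A minimal sketch (the helper name is hypothetical; the first few values should agree with pred_yw up to indexing conventions):

def yw_dynamic_forecast(x_train, phi, steps):
    """Dynamic AR forecast using Yule-Walker coefficients on demeaned data."""
    mu = np.mean(x_train)
    history = list(np.asarray(x_train) - mu)   # demeaned training values
    p = len(phi)
    preds = []
    for _ in range(steps):
        # AR(p) recursion: x_hat_t = phi_1 * x_{t-1} + ... + phi_p * x_{t-p}
        x_hat = sum(phi[k] * history[-(k + 1)] for k in range(p))
        preds.append(x_hat)
        history.append(x_hat)                  # feed the forecast back in
    return np.array(preds) + mu                # add the mean back

print(yw_dynamic_forecast(X, rho, 5))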
Finally, load the predictions obtained earlier on the same data set (Problem 2e) and compare the three sets of forecasts.
pred_2e = pd.Series(np.load("preds_2e.npy"))
pred_2e.index = pred_ols.index
fig, axs = plt.subplots(1, 3, sharex=True, sharey=True, figsize = (15,4))
axs[0].plot(pred_ols);
axs[0].set_title("OLS");
axs[1].plot(pred_yw);
axs[1].set_title("Yule-Walker");
axs[2].plot(pred_2e);
axs[2].set_title("Problem 2e");
plt.tight_layout()
from sklearn.metrics import mean_squared_error
print("OLS: ", mean_squared_error(y, pred_ols))
print("Yule-Walker: ", mean_squared_error(y, pred_yw))
print("Problem 2e: ", mean_squared_error(y, pred_2e))
OLS:  0.00023772875933569045
Yule-Walker:  0.0002375987341581026
Problem 2e:  0.00027102017846111477
Comments: the OLS and Yule-Walker forecasts achieve nearly identical mean squared errors (about 2.38e-4), which is expected given how close their coefficients are, and both do somewhat better than the Problem 2e predictions (about 2.71e-4).