Statistical Jump Model

This model is now implemented in my package quantbullet. Please refer to this doc for the test results and example usages.

Ideas

Modeling market regimes can be helpful in many ways:

  1. identify the prevailing market conditions and potential future shifts
  2. tailor strategies to best suit the current environment, enhancing potential returns and mitigating risks
  3. provide a signal for timely portfolio rebalancing, risk management, and strategic asset allocation

Traditionally, the hidden Markov model is a popular choice, while research in recent years tends to incorporate statistics and ML as well. The statistical jump model provides a good framework that considers both clustering quality and persistence in the state sequence.

The model assumes that the market has hidden regimes governed by hidden parameters (regimes are therefore distinguished by differences in their parameters). What we observe are the realizations of these parameters. Regimes evolve over time, so they form a hidden state sequence. Each hidden state has its own parameter set, and we assume that within a state the parameters stay static over time. Our goal is to identify the hidden state sequence and its parameters by classifying the observed features while taking the regime-switching cost into account. Once the model has learned the hidden parameters and the state sequence, we can use it to predict future states.

The switching cost is included because the model enforces persistence in the state sequence, that is, the state should not change frequently or abruptly.

Model Parameters

  1. State sequence: $S = \{s_1, s_2, \dots, s_T\}$, where $s_t \in \{1, 2, \dots, K\}$ is the state at time $t$.
  2. State parameters: $\Theta = \{\theta_1, \theta_2, \dots, \theta_K\}$, where $\theta_k \in \mathbb{R}^D$ is the parameter set of state $k$.
  3. Observed features: $Y = \{y_1, y_2, \dots, y_T\}$, where $y_t \in \mathbb{R}^D$ is the feature vector at time $t$.
  4. $\lambda$ is the penalty for switching states.

In summary, we have $T$ observations of $D$-dimensional features with our assumption of $K$ hidden states. The goal is to find the hidden state sequence $S$.

The cost function is based on the K-means clustering algorithm. Of course other clustering algorithms could be used, but K-means is the most straightforward. We would like to choose $K$ centroids and minimize the sum of squared distances between each observation and its nearest centroid. To keep the state sequence smooth, we add a penalty term to the cost function. The minimization problem is therefore defined as:

$$ \underset{\Theta, S}{\arg \min } \sum_{t=0}^{T-1} l\left(\boldsymbol{y}_t, \boldsymbol{\theta}_{s_t}\right)+\lambda \sum_{t=1}^{T-1} \mathbb{1}_{\left\{s_{t-1} \neq s_t\right\}} $$

This is the optimization problem we need to solve in the discrete case (a discrete state sequence). The continuous case is similar but omitted here.
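
A common choice for the per-observation loss $l$, and the one assumed in the code sketches below (consistent with the 0.5 factor in the implementation), is half the squared Euclidean distance between the observation and the state's parameter vector:

$$ l\left(\boldsymbol{y}_t, \boldsymbol{\theta}_{s_t}\right)=\frac{1}{2}\left\|\boldsymbol{y}_t-\boldsymbol{\theta}_{s_t}\right\|_2^2 $$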

Implementation Notes

Due to the non-convexity of the cost function, a coordinate descent algorithm is used to solve the optimization problem. To avoid local minima, the algorithm is run multiple times with different initial values, and the final result is the one with the lowest cost function value. Initialization is done with the K-means++ algorithm, which is not difficult to implement; a sketch is given below.
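
As an illustration, a minimal K-means++ seeding could look like the following; the function name and the random-generator handling are my own choices, not part of the package. Assigning each observation to its nearest seeded centroid then yields the initial state sequence.

import numpy as np

def kmeans_plus_plus_init(y, k, seed=None):
    """Pick k initial centroids from y (T x n_features): the first uniformly at
    random, each subsequent one with probability proportional to its squared
    distance to the nearest centroid chosen so far."""
    rng = np.random.default_rng(seed)
    T = y.shape[0]
    centroids = [y[rng.integers(T)]]
    for _ in range(1, k):
        # squared distance of every point to its nearest existing centroid
        d2 = np.min([np.sum((y - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(y[rng.choice(T, p=d2 / d2.sum())])
    return np.vstack(centroids)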

Example of Coordinate Descent

After the initialization, we obtain an initial state sequence. The optimization then alternates between two steps. First, we fix the state sequence and update the state parameters. Second, we fix the state parameters and update the state sequence. The two steps are repeated until the cost function converges, as in the sketch below.
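
Schematically, one run of the coordinate descent might look like the following sketch, where fixed_states_optimize is the routine shown next and fixed_theta_optimize stands for the dynamic-programming update sketched further down; the method names, defaults, and convergence tolerance are illustrative assumptions.

def coordinate_descent(self, y, s_init, k=2, lambda_=0.1, max_iter=100, tol=1e-8):
    """Alternate the two update steps until the objective stops improving."""
    s = s_init
    prev_obj = np.inf
    for _ in range(max_iter):
        # Step 1: state sequence fixed, update the centroids (state parameters)
        theta, _ = self.fixed_states_optimize(y, s, k)
        # Step 2: parameters fixed, update the state sequence by dynamic programming
        s, obj = self.fixed_theta_optimize(y, theta, lambda_)
        if prev_obj - obj < tol:
            break
        prev_obj = obj
    return theta, s, obj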

Fix the state sequence and update the state parameters

The first step is to fix the state sequence and update the state parameters. In the K-means algorithm, the centroids are the means of the data points assigned to each cluster. Since we start from a known (if temporary) state sequence, we know which data points are assigned to each state and can compute the centroids directly; these centroids are the state parameters. The objective value is half the sum of squared distances between each observation and the centroid of its assigned state. The implementation is as follows:

def fixed_states_optimize(self, y, s, k=2):
    """
    Optimize the parameters of a discrete jump model with the state sequence fixed.

    Args:
        y (np.ndarray): Observed data of shape (T, n_features).
        s (np.ndarray): State sequence of shape (T,), with values in {0, ..., k-1}.
        k (int): Number of states.

    Returns:
        tuple:
            - np.ndarray: Optimized parameters (centroids) of shape (k, n_features).
            - float: Optimal value of the objective function.
    """
    if not isinstance(y, np.ndarray) or not isinstance(s, np.ndarray):
        raise TypeError("y and s must be numpy arrays")

    T, n_features = y.shape
    theta = np.zeros((k, n_features))

    # The optimal centroid of each state is the mean of its assigned observations
    for state in range(k):
        assigned_data_points = y[s == state]
        if assigned_data_points.size > 0:
            theta[state] = assigned_data_points.mean(axis=0)

    # Objective: half the sum of squared distances to the assigned centroids
    objective_value = 0.5 * np.sum(
        [np.linalg.norm(y[i, :] - theta[s[i], :]) ** 2 for i in range(T)]
    )

    return theta, objective_value
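
As a quick sanity check of what this step computes, here is a self-contained toy example with two one-dimensional clusters (values chosen arbitrarily); it mirrors the logic above rather than calling the method directly:

import numpy as np

y = np.array([[0.0], [0.1], [0.05], [5.0], [5.1], [4.9]])
s = np.array([0, 0, 0, 1, 1, 1])

# Centroid of each state is the mean of its assigned observations
theta = np.vstack([y[s == state].mean(axis=0) for state in range(2)])
obj = 0.5 * sum(np.linalg.norm(y[i] - theta[s[i]]) ** 2 for i in range(len(y)))
print(theta)  # approximately [[0.05], [5.0]]
print(obj)    # half the sum of squared residuals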

Fix the state parameters and update the state sequence

The second step involves a dynamic programming algorithm. A matrix $V(t, k)$ memorizes the minimum cost of the first $t$ observations with the last state being $k$. A forward pass computes these minimum costs, and a backward pass then recovers the optimal state sequence. This is similar to the 0-1 knapsack problem, where we first compute the maximum value and then backtrack to find which items were selected. The algorithm is as follows:

DP algorithm to update the state sequence
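
A minimal sketch of this dynamic-programming update is given below. The method name fixed_theta_optimize and the reuse of the half-squared-Euclidean loss are my own assumptions; the recursion is $V(t, j) = l(y_t, \theta_j) + \min\big(V(t-1, j),\ \min_i V(t-1, i) + \lambda\big)$.

def fixed_theta_optimize(self, y, theta, lambda_):
    """Update the state sequence with the state parameters fixed, via dynamic
    programming: a forward pass for the costs, a backward pass to recover the
    optimal sequence."""
    T = y.shape[0]
    k = theta.shape[0]

    # loss[t, j]: cost of assigning observation t to state j
    loss = 0.5 * np.array(
        [[np.linalg.norm(y[t] - theta[j]) ** 2 for j in range(k)] for t in range(T)]
    )

    # V[t, j]: minimum cost of the first t+1 observations with s_t = j
    V = np.zeros((T, k))
    V[0] = loss[0]
    for t in range(1, T):
        for j in range(k):
            # either stay in state j, or pay lambda_ to switch from the best previous state
            V[t, j] = loss[t, j] + min(V[t - 1, j], V[t - 1].min() + lambda_)

    # Backward pass: recover the state sequence consistent with the recursion
    s = np.zeros(T, dtype=int)
    s[-1] = np.argmin(V[-1])
    for t in range(T - 2, -1, -1):
        s[t] = np.argmin(V[t] + lambda_ * (np.arange(k) != s[t + 1]))

    return s, V[-1].min()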

Thoughts on Practical Usage

I have used the identified states to check whether the same strategy performs differently across regimes, and the answer is yes. With only two states, the model tends to classify one state as a low-volatility, one-sided market and the other as a high-volatility state that comes with abnormal returns.

The features, which include volatilities and average returns over different time horizons, seem to capture the co-occurrence of high volatility and abnormal returns in the second state. This pattern is clearly helpful for a mean reversion strategy.

The first identified state, with low volatility, usually represents a stable market trend, and we do observe that momentum strategies work better in this state.

Regarding predictive power, this is not a simple signal, and the complexity does introduce some advantages, such as occasionally earlier detection of a volatility surge. That said, I need to check more carefully that no future information is used.

References

  1. Aydinhan, Afsar Onat, Petter N. Kolm, John M. Mulvey, and Yizhan Shu. Identifying Patterns in Financial Markets: Extending the Statistical Jump Model for Regime Identification (September 19, 2023). Available at SSRN: https://ssrn.com/abstract=4556048 or http://dx.doi.org/10.2139/ssrn.4556048
  2. Nystrup, Peter, Erik Lindström, and Henrik Madsen. Learning hidden Markov models with persistent states by penalizing jumps. Expert Systems with Applications, 2020, 150: 113307.
Yiming Zhang
Quantitative Researcher Associate, JP Morgan