Academic Article Series: Economics and Finance Investigation:

Information Demand and Stock Return Predictability (Coded in R): Part 4: Sign Forecast Frameworks

Jonathan Legrand
Developer Advocate
This article addresses well established return forecasting challenges via frameworks that focus on the sign of the change in asset index excess returns, using a family of GARCH models. It applies them to the S&P 500 index studied in the original literature to assess the predictive power of information demand, proxied by Google's Search Vector Index (SVI), and finds evidence suggesting that an efficient trading strategy can be constructed from this study. This article is aimed at academics from undergraduate level up, and thus explains all mathematical notation to ensure that there is no confusion and that anyone - no matter their expertise on the subject - can follow.
 

Introduction

Since Tinbergen (1933)'s modelling work on business cycles after World War I, many econometric methods have been developed with the aim of constructing ever better economic and financial forecasts. The subject evolved through statistical linear and non-linear relationships with Autoregressive (AR) (Yule (1927)), Moving Average (MA) (N. and Wold (1939)), Autoregressive Moving Average (ARMA) (Whittle (1951)), Autoregressive Conditional Heteroskedasticity (ARCH) (Engle (1982)), Generalized ARCH (GARCH) (Bollerslev (1986)), Exponential GARCH (EGARCH) (Nelson (1991)), threshold nonlinear ARMA (Tong (1990)), and Glosten-Jagannathan-Runkle GARCH (GJR-GARCH) (Glosten, Jagannathan, and Runkle (1993)) models, amongst others.

Stylized Facts of Financial Return (SFFR)

These models have grown to become staple tools of financial time-series analysis in academia; but many researchers came upon what are now well established asset return forecasting challenges: the Stylized Facts of Financial Return (SFFR) - a set of properties found in most asset return datasets (Booth and Gurun (2004); Harrison (1998); Mitchell, Brown, and Easton (2002)). Many SFFR exist, but only a few are of interest in this instance: asset returns' (i) non-Gaussian distribution (unconditional leptokurtosis/excess-kurtosis and negative-skewness/asymmetry), (ii) weak serial correlation (id est (i.e.): the lack of correlation between returns on different days), (iii) volatility clustering (heteroskedasticity), (iv) serial correlation in squared returns, and (v) leverage effects (the tendency for variance to rise more following large negative price changes than positive ones of similar magnitude) (Taylor (2005)).

Numerous time-series processes fail to encapsulate these SFFR, exempli gratia (e.g.): AR, MA and ARMA models may necessitate the assumption of homoscedasticity (i.e.: constant variance in error terms) - failing SFFR (iii), (iv) and (v) - even though heteroskedasticity is a useful characteristic to include, as "large changes (in asset prices) tend to be followed by large changes - of either sign - and small changes tend to be followed by small changes" (Mandelbrot (1963), p.418).
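As a quick, purely illustrative check of SFFR (i) to (iv), the sketch below computes sample skewness, excess kurtosis and autocorrelations of returns and squared returns; it assumes the 'df' object with an 'SPX_R' column built in Part 1, and uses the 'moments' package only for convenience.

library(zoo)       # time-series container used throughout this series
library(moments)   # skewness() and kurtosis()

r = as.numeric(na.omit(df[, c("SPX_R")]))      # daily SPX excess returns from Part 1

skewness(r)                                    # SFFR (i): typically negative
kurtosis(r) - 3                                # SFFR (i): excess kurtosis, usually well above 0
acf(r, lag.max = 5, plot = FALSE)$acf[-1]      # SFFR (ii): close to zero
acf(r^2, lag.max = 5, plot = FALSE)$acf[-1]    # SFFR (iii) & (iv): positive and persistent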

In this article, I address these challenges using aforementioned models' forecasts in the framework outlined in Christoffersen and Diebold (2006) (C&D hereon) to focus on asset return sign changes instead of their level changes as highlighted in Pesaran and Timmermann (2000). My investigation expands on the work of Chronopoulos, Papadimitriou, and Vlastakis (2018) (CPV hereon) - replicating their research on the United States (U.S.)' Standard & Poor's 500 Index (SPX), and studies its feasible financial impact by constructing and assessing investment strategies stemming from findings.

Evidence that an efficient trading strategy may be constructible for the SPX is found.

 

Breakdown

In this article, I primarily study the frameworks outlined in CPV to investigate the explanatory and forecasting powers of information demand on the SPX. Financial applications of my models are studied from the perspective of a single profit maximising investor, in a manner akin to CPV (Kandel and Stambaugh (1996)). In the interest of time, I only investigate the scenario where the investing agent's risk aversion implies taking a risk as soon as its expected reward is positive. Later we will look into a framework to optimise this risk aversion level to allow for a truly profit maximising strategy.

  • Part 1: Data Retrieval and Analysis: In this part, we start the R code, construct some rudimentary functions and collect data from Refinitiv, explaining their use as we go. We also go through the concepts of risk-free rates and excess returns, realised variance & realised volatility, and Google’s Search Vector Index, before lightly analysing the retrieved data.
  • Part 2: Econometric Modelling of Non-Linear Variances: In this section, there is no coding. Instead we go – step by step and including workings – through ARMA models, their parameter estimation process (via log-likelihood), the concepts of stationarity, Taylor expansions, the calculus chain rule, first order conditions, and GARCH models (including the GARCH, GJR-GARCH and EGARCH models). Finally, we outline the way in which we construct error values with which to evaluate the performance of our models.
  • Part 3: Coding Non-Linear Variance Econometric Models: In this section we encode the mathematical models and concepts of Part 2. We build our code, explaining it step-by-step, using in-series (computationally slow) and in-parallel (computationally fast) methods. Finally, we present the Diebold-Mariano test results for each pair of model performances, with and without SVIs.
  • Part 4: Sign Forecast Frameworks: Here we outline the Christoffersen and Diebold framework to estimate the probability of a positive return, then compare the predictive performance of each model via rudimentary methods, Brier Scores and Diebold and Mariano statistics.
  • Part 5: Financial significance: Finally, we put ourselves in the shoes of a profit maximising investor who reinvests – on a daily basis – in the risk-free asset or the S&P 500 if the model suggests that its excess return is more likely than not to be positive. We graph the cumulative returns of such an investor for each model-following strategy and discuss the results with the help of Sharpe ratios before concluding.
 

Development Tools & Resources

Refinitiv's DataStream Web Services for R (DatastreamDSWS2R): Access to DataStream data. A DataStream or Refinitiv Workspace IDentification (ID) will be needed to run the code below.

Part 4: Sign Forecast Frameworks

 

 

Christoffersen and Diebold Framework

Following from C&D, Christoffersen, Diebold, Mariano, Tay & Tse (2006)'s working paper outlined how the conditional probability of a positive return in the next time period is $$\begin{equation} \mathbb{P}(R_{t+1} > 0 | \Omega_{t}) = 1 - \mathbb{P} \bigg( \frac{R_{t+1} - \mu}{\sigma_{t+1|t}} \leq \frac{-\mu}{\sigma_{t+1|t}} \bigg) = 1 - F\bigg( \frac{-\mu}{\sigma_{t+1|t}} \bigg) \end{equation}$$

where $\mathbb{P}$ denotes probability, where there is no conditional mean predictability in returns: $\mu_{t+1|t} = \mu$, where $R_{t+1}\sim D(\mu,\sigma_{t+1|t})$ for a generic distribution $D$ dependent only on $\mu$ and $\sigma$ (such as the Normal distribution), and where $F$ is the distribution function of the “standardized” return $\frac{R_{t+1} - \mu}{\sigma_{t+1|t}}$. Then $R_{t+1}$'s sign is predictable even if its conditional mean isn't, provided that $\mu \neq 0$. If $F$ is asymmetric - as per SFFR (i) - then sign forecasts can still be constructed even when $\mu = 0$.
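For intuition, take a purely illustrative Normal example: with a constant daily mean of $\mu = 0.0003$ and a one-step-ahead volatility forecast of $\sigma_{t+1|t} = 0.01$, the formula above gives $\mathbb{P}(R_{t+1} > 0 | \Omega_{t}) = 1 - \Phi(-0.0003/0.01) \approx 0.512$, whereas a more volatile forecast of $\sigma_{t+1|t} = 0.03$ pulls it back towards $0.504$: forecastable volatility alone moves the sign forecast around over time, even with a constant mean. In R (all numbers are assumptions for illustration only):

mu = 0.0003                  # illustrative constant daily mean excess return
1 - pnorm(-mu / 0.01)        # ~ 0.512 with a 1% volatility forecast
1 - pnorm(-mu / 0.03)        # ~ 0.504 with a 3% volatility forecast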

$R_{t+1}$ sign forecasts are formulated as $$\begin{equation} \label{eq:PiCandD} \hat{\pi}_{C\&D_M, t+1} = \hat{\mathbb{P}}_{C\&D_M}(R_{t+1|t}>0) = 1 - \frac{1}{t} \sum_{k=1}^{t} I \bigg( \frac{R_{k} - {\hat{\mu}_{k|k-1}}}{\hat{\sigma}_{k|k-1}} \leq \frac{-\hat{\mu}_{t+1|t}}{\hat{\sigma}_{t+1|t}} \bigg) \end{equation}$$

where $M$ is the model used to compute recursive one-step-ahead out-of-sample forecasts $\hat{\mu}$ and $\hat{\sigma}$ ranging from GARCH to EGARCH-SVI (note that the model used to estimate $\hat{\mu}_{t+1|t}$ and $\hat{\sigma}_{t+1|t}$ is not specified on the right-hand side of this equation, to keep the notation uncluttered; this holds throughout the article), and where $k \in \mathbb{Z}$.
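Before the full rolling implementation below, a minimal stand-alone sketch of this estimator for a single forecast date may help. It assumes a vector R of in-sample returns together with vectors mu_hat and sigma_hat of the matching one-step-ahead forecasts (element k holding $\hat{\mu}_{k|k-1}$ and $\hat{\sigma}_{k|k-1}$), plus scalars mu_next and sigma_next for date $t+1$; all names are illustrative.

# Empirical estimate of P(R_{t+1} > 0) as per C&D, for one forecast date:
pi_hat_CandD = function(R, mu_hat, sigma_hat, mu_next, sigma_next)
{z = (R - mu_hat) / sigma_hat                  # standardised in-sample returns
 threshold = - mu_next / sigma_next            # -mu_hat_{t+1|t} / sigma_hat_{t+1|t}
 1 - mean(z <= threshold, na.rm = TRUE)}       # 1 - empirical F(threshold)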

 

Estimates of the probability of a positive return

 

Naive Model and its forecast error statistics

 

Remember our Naïve Framework. A Naïve model of sign change is used as a benchmark to construct comparisons. It is formulated as $$ \hat{\pi}_{Naive, t+1} = \hat{\mathbb{P}}_{Naive}(R_{t+1|t}>0) = \frac{1}{t} \sum_{k=1}^{t} I (R_k > 0) \text{ .}$$

    	
            # Indicator of a positive SPX return, I(R_k > 0):
Naive_I = ifelse(matrix(df[, c("SPX_R")]) > 0, 1, 0)

# Naive Model: expanding-window proportion of positive-return days
Naive = (1:(length(matrix(df[, c("SPX_R")]))))
for(t in c(1:(length(matrix(df[, c("SPX_R")])))))
{Naive[t] = (1/t) * sum(Naive_I[1:t])}
    	
            Naive_zoo = zoo(Naive, index(df[, c("SPX_R")]))
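As a design note, the loop above re-sums a growing window at every step; the same expanding-window proportion can be obtained in one vectorised line, which may be preferable on long samples:

# Vectorised equivalent of the Naive loop above (same values, no repeated summation):
Naive_vec = cumsum(Naive_I) / seq_along(Naive_I)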
        
        
    

C&D's GARCH models with and without SVI (Model independent variables' construction)

 

A few prerequisite variables (and R objects) have to be constructed first:

    	
            # mean of return up to time 'k'
R_mu = (1:length(matrix(df[, c("SPX_R")])))
for (i in c(1:length(matrix(df[, c("SPX_R")]))))
{R_mu[i] = mean(matrix(df[, c("SPX_R")])[1:i])}

# standard deviation of return up to time 'k'.
# Note that its 1st value is NA since R cannot compute the standard deviation
# of a single observation.
R_sigma = (1:length(matrix(df[, c("SPX_R")])))
for (i in c(1:length(matrix(df[, c("SPX_R")]))))
{R_sigma[i] = sd(matrix(df[, c("SPX_R")])[1:i])}
R_sigma[1] = 0 # Since the variance of a constant is 0.

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.02.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.02.RData")

GARCH11's C&D Model

 

Series

 

Computing in series the values $\hat{\pi}_{C\&D_{GARCH(1,1)}, t+1} $

    	
            CnDprocess_ser = function(regressand = df[, c("SPX_R")],
GARCH_mu_hat,
GARCH_sigma_hat)
{
# # Create our variables:

# C&D's Indicator function according to the GARCH11 Model:
GARCH_CandD_I = matrix(, nrow = roll, ncol = roll)
GARCH_CandD_I_k = matrix(, nrow = roll, ncol = roll)
GARCH_CandD_I_t = matrix( - (GARCH_mu_hat/GARCH_sigma_hat))


for(t in c(1:roll))
{for (k in c(1:t))
{GARCH_CandD_I_k[k,t] =
ifelse(((matrix(regressand)[252+k] - GARCH_mu_hat[k])/GARCH_sigma_hat[k])
<= GARCH_CandD_I_t[t+1],
1,0)

# Note that we have some missing values due to the model not always managing
# to converge to estimates of sigma and mu; since we need the t+1 model
# estimates to compute its CandD_I_k, we also lose its t'th value for
# each model's missing value.
# We also lose the last value (the roll'th value) for the same reason.
GARCH_CandD_I[k,t] = ifelse((is.na(GARCH_CandD_I_k[k,t])), NA,
(sum(GARCH_CandD_I_k[1:k,t], na.rm = TRUE)))}}

# C&D's model according to the GARCH11 Model:
GARCH_CandD = (1:(roll-1))
for(i in c(1:(roll-1)))
{GARCH_CandD[i] = 1 - ((GARCH_CandD_I[i,i])/
length(na.omit(GARCH_CandD_I[1:i,i])))}
GARCH_CandD_zoo = zoo(GARCH_CandD, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))

# This zoo object has missing values, so we can remove them:
GARCH_CandD_zoo_no_nas = na.omit(GARCH_CandD_zoo)

## Cleaning datasets
# rm(GARCH_CandD_I_t)
# rm(GARCH_CandD_I_k)
# rm(GARCH_CandD_I)

return(list(GARCH_CandD_I_k,
GARCH_CandD_I,
GARCH_CandD,
GARCH_CandD_zoo,
GARCH_CandD_zoo_no_nas
))
}
    	
            GARCH11_CandD_process_ser = CnDprocess_ser(regressand = df[, c("SPX_R")],
GARCH_mu_hat = GARCH11_mu_hat_ser,
GARCH_sigma_hat = GARCH11_sigma_hat_ser)
    	
            GARCH11_CandD_I_k_ser = GARCH11_CandD_process_ser[[1]]
GARCH11_CandD_I_ser = GARCH11_CandD_process_ser[[2]]
GARCH11_CandD_ser = GARCH11_CandD_process_ser[[3]]
GARCH11_CandD_zoo_ser = zoo(GARCH11_CandD_ser, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
GARCH11_CandD_zoo_no_nas_ser = zoo(na.omit(GARCH11_CandD_zoo_ser))

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.03.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.03.RData")

Parallel

 

Computing in parallel the values $\hat{\pi}_{C\&D_{GARCH(1,1)}, t+1} $

    	
            CnDprocess_t_par = function(
t,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = GARCH11_mu_hat_par,
pre_GARCH_sigma_hat = GARCH11_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
{
pre_GARCH_CandD_I_t = matrix( - (pre_GARCH_mu_hat/pre_GARCH_sigma_hat))

for (k in c(1:t))
{pre_GARCH_CandD_I_k[k] =
ifelse(((matrix(regressand)[252+k] -
pre_GARCH_mu_hat[k])/
pre_GARCH_sigma_hat[k])
<= pre_GARCH_CandD_I_t[t+1],
1,0)

# Note that we have some missing values due to the model not always managing
# to converge to estimates of sigma and mu; since we need the t+1 model
# estimates to compute its CandD_I_k, we also lose its t'th value for
# each model's missing value.
# We also lose the last value (the roll'th value) for the same reason.
pre_GARCH_CandD_I[k] =
ifelse((is.na(pre_GARCH_CandD_I_k[k])),
NA,
(sum(pre_GARCH_CandD_I_k[1:k],
na.rm = TRUE)))}

return(list(pre_GARCH_CandD_I_k, pre_GARCH_CandD_I))
}
    	
            no_of_clusters = detectCores(logical = FALSE)
cl = makeCluster(no_of_clusters)

GARCH11_CnDprocess_par = clusterApply(cl,
x = 1:roll, # t
fun = CnDprocess_t_par,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = GARCH11_mu_hat_par,
pre_GARCH_sigma_hat = GARCH11_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
stopCluster(cl)
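For completeness: detectCores(), makeCluster(), clusterApply() and stopCluster() come from base R's parallel package (attached in Part 3; if starting from a fresh session, load it as below). clusterApply() serialises the explicitly passed arguments - the regressand, the forecast vectors and the empty matrices - to each worker, so no separate clusterExport() call is needed for them.

library(parallel)   # detectCores(), makeCluster(), clusterApply(), stopCluster()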
    	
            # C&D's model according to the GARCH11 Model:
GARCH11_CandD_par = (1:(roll-1))
for(i in c(1:(roll-1)))
{GARCH11_CandD_par[i] = 1 - ((GARCH11_CnDprocess_par[[i]][[2]][i])/
length(na.omit(GARCH11_CnDprocess_par[[i]][[2]])))}
GARCH11_CandD_zoo_par = zoo(GARCH11_CandD_par, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))

# This zoo object has missing values, so we can remove them:
GARCH11_CandD_zoo_no_nas_par = na.omit(GARCH11_CandD_zoo_par)

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.04.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.04.RData")
 

It is interesting to see that the 'CnDprocess_t' columns are not all the same. In addition to having 'NA's at different locations, they were built off recursive modellings of $\hat{\mu}_t$ and $\hat{\sigma}_t$. This is most apparent in the second element of the last list in 'CnDprocess_t', where the summation of previous outputs emphasises this effect.
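A quick way to see this - assuming the GARCH11_CnDprocess_par list built above - is to line up, for the first few k, the cumulative indicator sums returned for two different values of t:

# Cumulative indicator sums for t = 10 versus t = roll; the columns differ because
# each t compares against its own t+1 forecast threshold (and has its own NA pattern):
cbind(t_10   = GARCH11_CnDprocess_par[[10]][[2]][1:10],
      t_roll = GARCH11_CnDprocess_par[[roll]][[2]][1:10])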

GARCH11SVI's C&D Model

 

Series

 

Computing in series the values $\hat{\pi}_{C\&D_{GARCH(1,1)-SVI}, t+1} $

    	
            GARCH11SVI_CandD_process_ser = CnDprocess_ser(regressand = df[, c("SPX_R")],
GARCH_mu_hat = GARCH11SVI_mu_hat_ser,
GARCH_sigma_hat = GARCH11SVI_sigma_hat_ser)
    	
            GARCH11SVI_CandD_I_k_ser = GARCH11SVI_CandD_process_ser[[1]]
GARCH11SVI_CandD_I_ser = GARCH11SVI_CandD_process_ser[[2]]
GARCH11SVI_CandD_ser = GARCH11SVI_CandD_process_ser[[3]]
GARCH11SVI_CandD_zoo_ser = zoo(GARCH11SVI_CandD_ser, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
GARCH11SVI_CandD_zoo_no_nas_ser = zoo(na.omit(GARCH11SVI_CandD_zoo_ser))

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.05.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.05.RData")

Parallel

 

Computing in parallel the values $\hat{\pi}_{C\&D_{GARCH(1,1)-SVI}, t+1} $

    	
            no_of_clusters = detectCores(logical = FALSE)
cl = makeCluster(no_of_clusters)

GARCH11SVI_CnDprocess_par = clusterApply(cl,
x = 1:roll, # t
fun = CnDprocess_t_par,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = GARCH11SVI_mu_hat_par,
pre_GARCH_sigma_hat = GARCH11SVI_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
stopCluster(cl)
    	
            # C&D's model according to the GARCH11SVI Model:
GARCH11SVI_CandD_par = (1:(roll-1))
for(i in c(1:(roll-1)))
{GARCH11SVI_CandD_par[i] = 1 - ((GARCH11SVI_CnDprocess_par[[i]][[2]][i])/
length(na.omit(GARCH11SVI_CnDprocess_par[[i]][[2]])))}
GARCH11SVI_CandD_zoo_par = zoo(GARCH11SVI_CandD_par, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))

# This zoo object has missing values, so we can remove them:
GARCH11SVI_CandD_zoo_no_nas_par = na.omit(GARCH11SVI_CandD_zoo_par)

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.06.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.06.RData")

GJRGARCH11's C&D Model

 

Series

 

Computing in series the values $\hat{\pi}_{C\&D_{GJRGARCH(1,1)}, t+1} $

    	
            GJRGARCH11_CandD_process_ser = CnDprocess_ser(regressand = df[, c("SPX_R")],
GARCH_mu_hat = GJRGARCH11_mu_hat_ser,
GARCH_sigma_hat = GJRGARCH11_sigma_hat_ser)
    	
            GJRGARCH11_CandD_I_k_ser = GJRGARCH11_CandD_process_ser[[1]]
GJRGARCH11_CandD_I_ser = GJRGARCH11_CandD_process_ser[[2]]
GJRGARCH11_CandD_ser = GJRGARCH11_CandD_process_ser[[3]]
GJRGARCH11_CandD_zoo_ser = zoo(GJRGARCH11_CandD_ser, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
GJRGARCH11_CandD_zoo_no_nas_ser = zoo(na.omit(GJRGARCH11_CandD_zoo_ser))

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.07.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.07.RData")

Parallel

 

Computing in parallel the values $\hat{\pi}_{C\&D_{GJRGARCH(1,1)}, t+1} $

    	
            no_of_clusters = detectCores(logical = FALSE)
cl = makeCluster(no_of_clusters)

GJRGARCH11_CnDprocess_par = clusterApply(cl,
x = 1:roll, # t
fun = CnDprocess_t_par,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = GJRGARCH11_mu_hat_par,
pre_GARCH_sigma_hat = GJRGARCH11_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
stopCluster(cl)
    	
            # C&D's model according to the GJRGARCH11 Model:
GJRGARCH11_CandD_par = (1:(roll-1))
for(i in c(1:(roll-1)))
{GJRGARCH11_CandD_par[i] = 1 - ((GJRGARCH11_CnDprocess_par[[i]][[2]][i])/
length(na.omit(GJRGARCH11_CnDprocess_par[[i]][[2]])))}
GJRGARCH11_CandD_zoo_par = zoo(GJRGARCH11_CandD_par, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))

# This zoo object has missing values, so we can remove them:
GJRGARCH11_CandD_zoo_no_nas_par = na.omit(GJRGARCH11_CandD_zoo_par)

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.08.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.08.RData")

GJRGARCH11SVI's C&D Model

 

Series

 

Computing in series the values $\hat{\pi}_{C\&D_{GJRGARCH(1,1)-SVI}, t+1} $

    	
            GJRGARCH11SVI_CandD_process_ser = CnDprocess_ser(regressand = df[, c("SPX_R")],
GARCH_mu_hat = GJRGARCH11SVI_mu_hat_ser,
GARCH_sigma_hat = GJRGARCH11SVI_sigma_hat_ser)
    	
            GJRGARCH11SVI_CandD_I_k_ser = GJRGARCH11SVI_CandD_process_ser[[1]]
GJRGARCH11SVI_CandD_I_ser = GJRGARCH11SVI_CandD_process_ser[[2]]
GJRGARCH11SVI_CandD_ser = GJRGARCH11SVI_CandD_process_ser[[3]]
GJRGARCH11SVI_CandD_zoo_ser = zoo(GJRGARCH11SVI_CandD_ser, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
GJRGARCH11SVI_CandD_zoo_no_nas_ser = zoo(na.omit(GJRGARCH11SVI_CandD_zoo_ser))

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.09.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.09.RData")

Parallel

 

Computing in parallel the values $\hat{\pi}_{C\&D_{GJRGARCH(1,1)-SVI}, t+1} $

    	
            no_of_clusters = detectCores(logical = FALSE)
cl = makeCluster(no_of_clusters)

GJRGARCH11SVI_CnDprocess_par = clusterApply(cl,
x = 1:roll, # t
fun = CnDprocess_t_par,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = GJRGARCH11SVI_mu_hat_par,
pre_GARCH_sigma_hat = GJRGARCH11SVI_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
stopCluster(cl)
    	
            # C&D's model according to the GJRGARCH11SVI Model:
GJRGARCH11SVI_CandD_par = (1:(roll-1))
for(i in c(1:(roll-1)))
{GJRGARCH11SVI_CandD_par[i] = 1 - ((GJRGARCH11SVI_CnDprocess_par[[i]][[2]][i])/
length(na.omit(GJRGARCH11SVI_CnDprocess_par[[i]][[2]])))}
GJRGARCH11SVI_CandD_zoo_par = zoo(GJRGARCH11SVI_CandD_par, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))

# This zoo object has missing values, so we can remove them:
GJRGARCH11SVI_CandD_zoo_no_nas_par = na.omit(GJRGARCH11SVI_CandD_zoo_par)

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.10.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.10.RData")

EGARCH11's C&D Model

 

Series

 

Computing in series the values $\hat{\pi}_{C\&D_{EGARCH(1,1)}, t+1} $

    	
            EGARCH11_CandD_process_ser = CnDprocess_ser(regressand = df[, c("SPX_R")],
GARCH_mu_hat = EGARCH11_mu_hat_ser,
GARCH_sigma_hat = EGARCH11_sigma_hat_ser)
    	
            EGARCH11_CandD_I_k_ser = EGARCH11_CandD_process_ser[[1]]
EGARCH11_CandD_I_ser = EGARCH11_CandD_process_ser[[2]]
EGARCH11_CandD_ser = EGARCH11_CandD_process_ser[[3]]
EGARCH11_CandD_zoo_ser = zoo(EGARCH11_CandD_ser, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
EGARCH11_CandD_zoo_no_nas_ser = zoo(na.omit(EGARCH11_CandD_zoo_ser))

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.11.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.11.RData")

Parallel

 

Computing in parallel the values $\hat{\pi}_{C\&D_{EGARCH(1,1)}, t+1} $

    	
            no_of_clusters = detectCores(logical = FALSE)
cl = makeCluster(no_of_clusters)

EGARCH11_CnDprocess_par = clusterApply(cl,
x = 1:roll, # t
fun = CnDprocess_t_par,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = EGARCH11_mu_hat_par,
pre_GARCH_sigma_hat = EGARCH11_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
stopCluster(cl)
    	
            # C&D's model according to the EGARCH11 Model:
EGARCH11_CandD_par = (1:(roll-1))
for(i in c(1:(roll-1)))
{EGARCH11_CandD_par[i] = 1 - ((EGARCH11_CnDprocess_par[[i]][[2]][i])/
length(na.omit(EGARCH11_CnDprocess_par[[i]][[2]])))}
EGARCH11_CandD_zoo_par = zoo(EGARCH11_CandD_par, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))

# This zoo object has missing values, so we can remove them:
EGARCH11_CandD_zoo_no_nas_par = na.omit(EGARCH11_CandD_zoo_par)

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.12.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.12.RData")

EGARCH11SVI's C&D Model

 

Series

 

Computing in series the values $\hat{\pi}_{C\&D_{EGARCH(1,1)-SVI}, t+1} $

    	
            EGARCH11SVI_CandD_process_ser = CnDprocess_ser(regressand = df[, c("SPX_R")],
GARCH_mu_hat = EGARCH11SVI_mu_hat_ser,
GARCH_sigma_hat = EGARCH11SVI_sigma_hat_ser)
    	
            EGARCH11SVI_CandD_I_k_ser = EGARCH11SVI_CandD_process_ser[[1]]
EGARCH11SVI_CandD_I_ser = EGARCH11SVI_CandD_process_ser[[2]]
EGARCH11SVI_CandD_ser = EGARCH11SVI_CandD_process_ser[[3]]
EGARCH11SVI_CandD_zoo_ser = zoo(EGARCH11SVI_CandD_ser, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
EGARCH11SVI_CandD_zoo_no_nas_ser = zoo(na.omit(EGARCH11SVI_CandD_zoo_ser))

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.13.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.13.RData")

Parallel

 

Computing in parallel the values $\hat{\pi}_{C\&D_{EGARCH(1,1)-SVI}, t+1} $

    	
            no_of_clusters = detectCores(logical = FALSE)
cl = makeCluster(no_of_clusters)

EGARCH11SVI_CnDprocess_par = clusterApply(cl,
x = 1:roll, # t
fun = CnDprocess_t_par,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = EGARCH11SVI_mu_hat_par,
pre_GARCH_sigma_hat = EGARCH11SVI_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
stopCluster(cl)
    	
            # C&D's model according to the EGARCH11SVI Model:
EGARCH11SVI_CandD_par = (1:(roll-1))
for(i in c(1:(roll-1)))
{EGARCH11SVI_CandD_par[i] = 1 - ((EGARCH11SVI_CnDprocess_par[[i]][[2]][i])/
length(na.omit(EGARCH11SVI_CnDprocess_par[[i]][[2]])))}
EGARCH11SVI_CandD_zoo_par = zoo(EGARCH11SVI_CandD_par, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))

# This zoo object has missing values, so we can remove them:
EGARCH11SVI_CandD_zoo_no_nas_par = na.omit(EGARCH11SVI_CandD_zoo_par)

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.14.1.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.14.1.RData")

Comparing the predictive performance of each model:

Forecast error means, standard deviations (sd), Brier scores and Diebold & Mariano statistics based on results from the parallel methods used above:

 

Mean of the probabilistic forecast errors derived from each model, letting $ \pi_{Observed,t} = I(R_{SPX,t} > 0) = \begin{cases} 1 & \text{if } R_{SPX,t} > 0 \\ 0 & \text{otherwise} \end{cases} $ and $ \mathbf{Observed\_pi} = \left[ \begin{matrix} \pi_{Observed,1} \\ \pi_{Observed,2} \\ \vdots\\ \pi_{Observed,T_{Observed\_pi}} \end{matrix} \right] = \left[ \begin{matrix} I(R_{SPX,1} > 0) \\ I(R_{SPX,2} > 0) \\ \vdots\\ I(R_{SPX,T_{Observed\_pi}} > 0) \end{matrix} \right] $

    	
            Observed_pi = ifelse(matrix(df[, c("SPX_R")])>0,1,0)
Observed_pi_zoo = zoo(Observed_pi, index(df[, c("SPX_R")]))

To assess the accuracy of the volatility and positive-sign-probability forecasts, this sub-section specifies error functions.

Variance forecasts from the models GARCH to EGARCH-SVI are simply compared against realised volatilities to compute variance forecast errors as $$ \hat{\sigma}_{t|t-1} - RV_t \text{ .}$$
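A minimal sketch of that comparison, with purely hypothetical names (a zoo series of one-step-ahead sigma forecasts from Part 3 and the realised-variance series from Part 1, aligned on the same dates), could look like:

# Hypothetical names, for illustration only:
# GARCH11_sigma_hat_zoo_par : zoo series of one-step-ahead sigma forecasts (Part 3)
# SPX_RV_zoo                : zoo series of realised variances (Part 1)
GARCH11_sigma_error = GARCH11_sigma_hat_zoo_par - SPX_RV_zoo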

Sign forecast errors are computed as

$$\begin{equation} \label{eq:SignForecastErrors} \hat{\pi}_{j, t+1} - \mathbb{P}(R_{t+1} > 0 | \Omega_{t \textbf{+1}}) = \hat{\pi}_{j, t+1} - \pi_{Observed,t+1} \end{equation}$$

where $j = (C\&D_M, Naive)$, and where $M$ can be replaced by a model name ranging from GARCH to EGARCH-SVI (Note the bold `$+1$' in $\Omega_{t \textbf{+1}}$ here to emphasise how $\mathbb{P}(R_{t+1}>0|\Omega_{t \textbf{+1}})=I(R_{t+1}>0)$.)

    	
            Naive_pi_error = Naive_zoo - Observed_pi_zoo
GARCH11_pi_error = GARCH11_CandD_zoo_no_nas_par - Observed_pi_zoo
GARCH11SVI_pi_error = GARCH11SVI_CandD_zoo_no_nas_par - Observed_pi_zoo
GJRGARCH11_pi_error = GJRGARCH11_CandD_zoo_no_nas_par - Observed_pi_zoo
GJRGARCH11SVI_pi_error = GJRGARCH11SVI_CandD_zoo_no_nas_par - Observed_pi_zoo
EGARCH11_pi_error = EGARCH11_CandD_zoo_no_nas_par - Observed_pi_zoo
EGARCH11SVI_pi_error = EGARCH11SVI_CandD_zoo_no_nas_par - Observed_pi_zoo
    	
            rbind(Statistics_Table("Naive_pi_error", na.omit(Naive_pi_error)),
Statistics_Table("GARCH11_pi_error", na.omit(GARCH11_pi_error)),
Statistics_Table("GARCH11SVI_pi_error", na.omit(GARCH11SVI_pi_error)),
Statistics_Table("GJRGARCH11_pi_error", na.omit(GJRGARCH11_pi_error)),
Statistics_Table("GJRGARCH11SVI_pi_error", na.omit(GJRGARCH11SVI_pi_error)),
Statistics_Table("EGARCH11_pi_error", na.omit(EGARCH11_pi_error)),
Statistics_Table("EGARCH11SVI_pi_error", na.omit(EGARCH11SVI_pi_error)))
 
A data.table: 7 × 9
Data_Set Mean Mean_of_Absolute_Values Standard_Deviation Median Skewness Kurtosis ACF_lag_0 ACF_lag_1
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
Naive_pi_error -0.544358428 0.5445473 0.4980632 -0.9996686 0.1790524 -1.968977 1 -0.06672948
GARCH11_pi_error 0.011770315 0.4943595 0.4984904 -0.4006568 0.1766350 -1.951251 1 -0.02481175
GARCH11SVI_pi_error 0.009742772 0.4943771 0.4983300 -0.4033970 0.1779972 -1.951288 1 -0.02831325
GJRGARCH11_pi_error 0.006156740 0.4948573 0.4982681 -0.4130435 0.1777539 -1.954735 1 -0.03268875
GJRGARCH11SVI_pi_error 0.006146078 0.4947756 0.4982166 -0.4111748 0.1777965 -1.954254 1 -0.03216385
EGARCH11_pi_error 0.006432536 0.4947300 0.4982087 -0.4115816 0.1778832 -1.954040 1 -0.03242612
EGARCH11SVI_pi_error 0.006283576 0.4946241 0.4982143 -0.4077604 0.1776422 -1.952094 1 -0.02887203
 

Only the 'Mean_of_Absolute_Values' and 'Standard_Deviation' columns are really of interest here; we are looking for the lowest mean absolute error and standard deviation. The SVI-implementing models' results perform slightly better than their SVI-free counterparts' (bar the GARCH model's). Apart from that, little can be concluded at this stage, unfortunately.

Brier scores of the probabilistic forecast errors derived from each model
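The Brier score of forecaster $j$ is simply the mean squared probabilistic forecast error, $$ BS_j = \frac{1}{T} \sum_{t=1}^{T} \big( \hat{\pi}_{j, t} - \pi_{Observed, t} \big)^2 \text{ ,}$$ so lower is better and a score of $0$ would mean perfect probability forecasts; the helper below computes it from the error series already constructed.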

    	
            Brier_Score_Table = function(name, data)
{as.data.table(list(Data_Set = name, Brier_Score = (1/length(data))*sum(data^2)))}
    	
            rbind(Brier_Score_Table("Naive_pi_error", na.omit(Naive_pi_error)),
Brier_Score_Table("GARCH11_pi_error", na.omit(GARCH11_pi_error)),
Brier_Score_Table("GARCH11SVI_pi_error", na.omit(GARCH11SVI_pi_error)),
Brier_Score_Table("GJRGARCH11_pi_error", na.omit(GJRGARCH11_pi_error)),
Brier_Score_Table("GJRGARCH11SVI_pi_error", na.omit(GJRGARCH11SVI_pi_error)),
Brier_Score_Table("EGARCH11_pi_error", na.omit(EGARCH11_pi_error)),
Brier_Score_Table("EGARCH11SVI_pi_error", na.omit(EGARCH11SVI_pi_error)))
 
A data.table: 7 × 2
Data_Set Brier_Score
<chr> <dbl>
Naive_pi_error 0.5443277
GARCH11_pi_error 0.2485609
GARCH11SVI_pi_error 0.2483573
GJRGARCH11_pi_error 0.2482388
GJRGARCH11SVI_pi_error 0.2481874
EGARCH11_pi_error 0.2481831
EGARCH11SVI_pi_error 0.2481868
 

The higher the Brier score, the worse the model performed. As expected, the naive model performed worst. Interestingly, SVI-implementing models outperformed their SVI-free counterparts in every case apart from the EGARCH one.

Diebold & Mariano statistics

Here the alternative hypothesis is that method 2 is more accurate than method 1; remember that a small p-value indicates strong evidence against the null hypothesis.
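Concretely, with squared-error loss (the 'Loss function power = 2' reported below), the test is built on the loss differential $$ d_t = e_{1,t}^2 - e_{2,t}^2 \text{ ,} \qquad DM = \frac{\bar{d}}{\sqrt{\widehat{\mathrm{Var}}(\bar{d})}} \text{ ,}$$ so a significantly positive $DM$ statistic indicates that method 2's (here: the SVI model's) squared errors are on average smaller than method 1's.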

    	
            # Multiplying by zero and subtracting aligns the two zoo error series on their
# common dates without changing the error values (zoo arithmetic intersects indices):
GARCH11_pi_error_dm = GARCH11_pi_error - (GARCH11SVI_pi_error*0)
GARCH11SVI_pi_error_dm = GARCH11SVI_pi_error - (GARCH11_pi_error*0)
# Diebold-Mariano test (forecast package); alternative "greater": method 2 (SVI) is more accurate
GARCH11_dm_test = dm.test(matrix(GARCH11_pi_error_dm), matrix(GARCH11SVI_pi_error_dm), alternative = "greater")
    	
            show(GARCH11_dm_test)
        
        
    

Diebold-Mariano Test

data: matrix(GARCH11_pi_error_dm) matrix(GARCH11SVI_pi_error_dm)
DM = 1.8532, Forecast horizon = 1, Loss function power = 2, p-value = 0.03197
alternative hypothesis: greater

    	
            GJRGARCH11_pi_error_dm = GJRGARCH11_pi_error - (GJRGARCH11SVI_pi_error*0)
GJRGARCH11SVI_pi_error_dm = GJRGARCH11SVI_pi_error - (GJRGARCH11_pi_error*0)
GJRGARCH11_dm_test = dm.test(matrix(GJRGARCH11_pi_error_dm), matrix(GJRGARCH11SVI_pi_error_dm), alternative = c("greater"))
    	
            show(GJRGARCH11_dm_test)
        
        
    

Diebold-Mariano Test

data: matrix(GJRGARCH11_pi_error_dm) matrix(GJRGARCH11SVI_pi_error_dm)
DM = 0.7234, Forecast horizon = 1, Loss function power = 2, p-value = 0.2347
alternative hypothesis: greater

    	
            EGARCH11_pi_error_dm = EGARCH11_pi_error - (EGARCH11SVI_pi_error*0)
EGARCH11SVI_pi_error_dm = EGARCH11SVI_pi_error - (EGARCH11_pi_error*0)
EGARCH11_dm_test = dm.test(matrix(EGARCH11_pi_error_dm), matrix(EGARCH11SVI_pi_error_dm), alternative = c("greater"))
    	
            show(EGARCH11_dm_test)
        
        
    

Diebold-Mariano Test

data: matrix(EGARCH11_pi_error_dm) matrix(EGARCH11SVI_pi_error_dm)
DM = -0.042896, Forecast horizon = 1, Loss function power = 2, p-value = 0.5171
alternative hypothesis: greater

The only D-M test with a small enough p-value to allow for a statistically significant conclusion is the GARCH one (at the 5% significance level, i.e. the 95% confidence level). It shows a slight preference for the SVI-implementing model.

Saving our data thus far to help with debugging if needed:

    	
            save.image(file = "IDaSRP_work_space_p4.15.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.15.RData")

 

 
