Can We Use MARS (Multivariate Adaptive Regression Splines) In Day Trading?

In a previous post, we talked about how to use neural networks to predict weekly candles. We used an Elman Recurrent Neural Network. The problem we encountered was that the neural network got stuck in a local minimum, and R took 20 minutes to do the calculations. 20 minutes is a very long time in day trading. We cannot use a machine learning algorithm that takes 20 minutes to predict the next candle. What we need is an algorithm that takes something like 10-20 seconds to make the calculations so that we can use it in day trading. In this post we will discuss whether we can use Regression Splines in day trading.

What Is Regression?

In order to understand what a Regression Spline is, we first need to understand what regression is. You have probably met regression in high school or college, so hopefully most of us know the idea. In its simplest form, regression means fitting a straight line to data. The best line is the one that minimizes the sum of squared errors. Errors are the differences between the real values and the predicted values. Errors are also known as residuals. Ideally the residuals should be white noise. If the residuals are not white noise, it means there is still some information that is not being captured by the model, so we should look for missing factors to include in the regression model.
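
To make the idea concrete, here is a minimal sketch of a least-squares fit in base R. The data is synthetic (a made-up line plus random noise), and the Ljung-Box test is just one rough way to check whether the residuals look like white noise; neither is taken from the models used later in this post.

```r
# Minimal sketch: ordinary least squares on synthetic data (all values are made up)
set.seed(42)
x <- 1:50
y <- 2 + 0.5 * x + rnorm(50, sd = 1)   # a true line plus random noise

fit <- lm(y ~ x)                        # least-squares straight-line fit
res <- residuals(fit)                   # actual values minus fitted values

coef(fit)                               # estimated intercept and slope
sum(res^2)                              # the quantity that least squares minimises

# Rough white-noise check: Ljung-Box test on the residuals.
# A large p-value means no evidence of leftover autocorrelation.
lb <- Box.test(res, lag = 10, type = "Ljung-Box")
lb$p.value
```

If the p-value were very small, that would suggest the straight line is missing some structure in the data.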

Now things get a bit more complex. As traders we only have price data. This price data can be the open, high, low and close for the 1 minute, 5 minute, 15 minute, 30 minute, 1 hour, 4 hour, daily, weekly and monthly timeframes. So we only have OHLC data in the form of a time series. A time series is formed when you measure data at a fixed time interval. This is precisely what we have when we download price data in the form of a csv file: the open, high, low and close of price measured at a fixed interval. So when we deal with price data, we are dealing with time series data.

A standard way to deal with time series data is to check the correlation of recent values with past values. This is known as autocorrelation. Autocorrelation is an important property that tells us how much information past values carry about future values. Did you notice that? Past values have information about future values. This is exactly what we assume in technical analysis when we look at charts for patterns that can predict the future. So we are trying to do the same thing using 2 different methods, one based on technical analysis and the other based on statistical analysis. If both methods confirm each other, we have a higher probability of making the right buy/sell decision.
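
As a rough illustration of autocorrelation, the sketch below compares a simulated series where past values do carry information (an AR(1) process) with pure white noise. Both series are simulated, and the coefficient 0.6 is an arbitrary choice for the example, not an estimate from any currency pair.

```r
# Minimal sketch: autocorrelation of a simulated AR(1) series versus white noise
set.seed(1)
n <- 2000

# AR(1): each value is 0.6 times the previous value plus fresh noise
ar1 <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
wn  <- rnorm(n)                          # white noise: no memory at all

# Sample autocorrelations at lags 1 to 5 (the first element is lag 0, so drop it)
acf_ar1 <- acf(ar1, lag.max = 5, plot = FALSE)$acf[-1]
acf_wn  <- acf(wn,  lag.max = 5, plot = FALSE)$acf[-1]

round(acf_ar1, 2)                        # decays slowly from roughly 0.6
round(acf_wn, 2)                         # all close to zero
```

The same `acf()` call can be run on a column of log returns to see how much linear memory the series has.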

What Are Multivariate Adaptive Regression Splines?

The problem with regression is that most of the time we are trying to fit a linear model to a problem that is nonlinear. Price does not follow a straight line. We all know this from our trading experience. So the model that fits best will be nonlinear. We achieve this by using splines instead of a single straight line in our regression model. MARS builds such a model automatically from piecewise linear pieces, choosing the breakpoints (knots) from the data.
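
MARS builds its fit from hinge functions of the form max(0, x - c), where c is a knot. The sketch below shows the idea with a single hinge on synthetic data. The knot at 4 is placed by hand here as an assumption for the illustration, whereas MARS searches for its knots itself; the `hinge` helper is a hypothetical name, not part of any package.

```r
# Minimal sketch: one hinge term versus a plain straight line (synthetic data)
set.seed(7)
x <- seq(0, 10, length.out = 200)
# The true relationship rises until x = 4 and then falls: a kink no single line can follow
y <- ifelse(x < 4, 1 + 0.5 * x, 3 - 1.0 * (x - 4)) + rnorm(200, sd = 0.2)

# Hypothetical helper: the hinge function max(0, x - knot)
hinge <- function(x, knot) pmax(0, x - knot)

linear_fit <- lm(y ~ x)                  # one straight line
hinge_fit  <- lm(y ~ x + hinge(x, 4))    # a line that may change slope at x = 4

sum(residuals(linear_fit)^2)             # large: the line misses the kink
sum(residuals(hinge_fit)^2)              # much smaller
coef(hinge_fit)                          # the hinge coefficient is the change in slope
```

The coefficient on the hinge term estimates how much the slope changes after the knot, which in this made-up example is a drop of 1.5.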

Using MARS Multivariate Adaptive Regression Splines in Day Trading

It is assumed that you are now familiar with regression splines. Splines are piecewise polynomials fitted to the data. The problem with the Elman Neural Network, as said in the beginning, is that it takes too much time to calculate the model. The predictions are also not that accurate. So we cannot use it in day trading. Now you don’t need to code the algorithm yourself. Everything has been done for you in the R mda package, which fits MARS models quickly. In the R code below we fit a model that uses lagged daily returns to predict tomorrow’s return. Once we have the return we calculate the daily close price. The return is simply the difference between the logs of the close prices of two consecutive days. This is also known as the log return. For small moves, log returns are approximately equal to the actual percentage returns. They have useful properties that help us in our model. For example, we can add 5 daily log returns to obtain the total log return for 5 days. This additive property is useful in modelling. It is assumed that you have installed R on your computer.
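
The properties of log returns mentioned above can be checked with a few lines of base R. The six closing prices below are made up for the illustration and do not come from the csv files used in this post.

```r
# Minimal sketch: log returns on made-up closing prices
close <- c(1.3000, 1.3050, 1.2990, 1.3100, 1.3080, 1.3150)

lr  <- diff(log(close))                  # 5 daily log returns
pct <- diff(close) / head(close, -1)     # 5 daily simple (percentage) returns

# Additivity: the 5 daily log returns sum to the 5-day log return
sum(lr)
log(close[6] / close[1])

# For small moves the two kinds of return are nearly equal
max(abs(lr - pct))

# Turning a log return back into a price: previous close times exp(return)
close[5] * exp(lr[5])                    # reproduces close[6]
```

The last line is the same conversion the transcript below applies to the predicted return.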

> # Import the csv file
> quotes <- read.csv("E:/MarketData/GBPUSD1440.csv", header=FALSE)
> 
> 
> 
> # load quantmod package
> library(quantmod)
> 
> # convert the data frame into a ts (time series) object
> quotes <- as.ts(quotes)
> 
> 
> #convert time series into a zoo object
> quotes1 <- as.zoo(quotes)
> 
> 
> # drop the Date and Time columns and take logs of the remaining columns
> quotes1 <- log(quotes1[, -(1:2)])
> 
> # calculate one-period log returns (differences of consecutive log values)
> lr <- diff(quotes1)
> 
> #number of rows in the dataframe
> x <-nrow(lr)
> 
> y <-nrow(quotes)
> 
> # lag the data
> x1 <- lag(lr, k=-1, na.pad=TRUE)
> x2 <- lag(lr, k=-2, na.pad=TRUE)
> x3 <- lag(lr, k=-3, na.pad=TRUE)
> x4 <- lag(lr, k=-4, na.pad=TRUE)
> x5 <- lag(lr, k=-5, na.pad=TRUE)
> x6 <- lag(lr, k=-6, na.pad=TRUE)
> x7 <- lag(lr, k=-7, na.pad=TRUE)
> x8 <- lag(lr, k=-8, na.pad=TRUE)
> x9 <- lag(lr, k=-9, na.pad=TRUE)
> x10 <- lag(lr, k=-10, na.pad=TRUE)
> 
> 
> 
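> # combine the ten lagged close returns (column 4 is Close) with the
> # current close return, which becomes column 11 (the target)
> CQuotes <- cbind (x1[ ,4], x2[ ,4], x3[ ,4], x4[ ,4], 
+                   x5[ ,4], x6[ ,4], x7[ ,4], x8[ ,4], 
+                   x9[ ,4], x10[ ,4], lr[,4])
> 
> # fit MARS on 1000 rows: columns 1-10 are the predictors, column 11 is the response
> library(mda)
> m <- mars(CQuotes[(x-1000):(x-1),1:10], CQuotes[(x-1000):(x-1),11])
> # predict the log return for rows x and x+1 of the combined matrix
> m.preds <- predict(m,CQuotes[x:(x+1),1:10])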
> m.preds
              [,1]
[1,]  0.0006352749
[2,] -0.0008504523
> 
> 
> # convert the first predicted log return into a price using the previous close
> exp(m.preds[1])*quotes[y-1,6]
      V6 
1.324801
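
The transcript above builds its ten lagged columns with `lag()` and `cbind()`. As an aside, the layout of such a lagged matrix is easy to see with base R's `embed()`. This is an alternative sketch under our own assumptions (made-up returns and only 3 lags), not the code that produced the output above.

```r
# Minimal sketch: building a lagged predictor matrix with base R's embed()
r <- c(0.0010, -0.0020, 0.0030, 0.0005, -0.0010, 0.0020, 0.0015)  # made-up returns
n_lags <- 3

# Each row of embed() is: r[t], r[t-1], r[t-2], r[t-3]
e <- embed(r, n_lags + 1)

target <- e[, 1]                         # the return we want to predict
lags   <- e[, -1, drop = FALSE]          # the previous n_lags returns, newest first
colnames(lags) <- paste0("lag", 1:n_lags)

cbind(lags, target)                      # one training example per row

# Predictor row for the next, not yet observed period:
# the most recent n_lags returns, newest first
newest <- rev(tail(r, n_lags))
newest
```

Rows with incomplete history are dropped automatically, so there are no NA values to handle before fitting.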

In the GBPUSD model above, the predicted close for Friday was 1.324801 and the actual close was 1.3013, a miss of about 235 pips. So you can see we have a bad prediction. Thursday’s close was 1.32396, so the prediction is only about 8 pips away from Thursday’s close. We need a model that gives a good prediction. A difference of 30-50 pips between the actual and the predicted price is acceptable to us. Let’s do the calculations for NZDUSD and try to predict the daily closing price.

> # Import the csv file
> quotes <- read.csv("E:/MarketData/NZDUSD1440.csv", header=FALSE)
> 
> 
> 
> # load quantmod package
> library(quantmod)
Loading required package: xts
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: TTR
Version 0.4-0 included new data defaults. See ?getSymbols.
> 
> # convert the data frame into a ts (time series) object
> quotes <- as.ts(quotes)
> 
> 
> #convert time series into a zoo object
> quotes1 <- as.zoo(quotes)
> 
> 
> # drop the Date and Time columns and take logs of the remaining columns
> quotes1 <- log(quotes1[, -(1:2)])
> 
> # calculate one-period log returns (differences of consecutive log values)
> lr <- diff(quotes1)
> 
> #number of rows in the dataframe
> x <-nrow(lr)
> 
> y <-nrow(quotes)
> 
> # lag the data
> x1 <- lag(lr, k=-1, na.pad=TRUE)
> x2 <- lag(lr, k=-2, na.pad=TRUE)
> x3 <- lag(lr, k=-3, na.pad=TRUE)
> x4 <- lag(lr, k=-4, na.pad=TRUE)
> x5 <- lag(lr, k=-5, na.pad=TRUE)
> x6 <- lag(lr, k=-6, na.pad=TRUE)
> x7 <- lag(lr, k=-7, na.pad=TRUE)
> x8 <- lag(lr, k=-8, na.pad=TRUE)
> x9 <- lag(lr, k=-9, na.pad=TRUE)
> x10 <- lag(lr, k=-10, na.pad=TRUE)
> 
> 
> 
> # combine all the above matrices into one matrix having close prices
> CQuotes <- cbind (x1[ ,4], x2[ ,4], x3[ ,4], x4[ ,4], 
+                   x5[ ,4], x6[ ,4], x7[ ,4], x8[ ,4], 
+                   x9[ ,4], x10[ ,4], lr[,4])
> 
> 
> 
> library(mda)
Loading required package: class
Loaded mda 0.4-8

> m <- mars(CQuotes[(x-1000):(x-1),1:10], CQuotes[(x-1000):(x-1),11])
> m.preds <- predict(m,CQuotes[x:(x+1),1:10])
> m.preds
              [,1]
[1,]  0.0007360136
[2,] -0.0002993448
> 
> 
> exp(m.preds[1])*quotes[y-1,6]
       V6 
0.7320886

The MARS model predicted 0.73209 and the actual close was 0.72658, a miss of about 55 pips. Can we use this model to predict the close of the weekly candle? Let’s run our MARS model and see how accurately it can predict the weekly close.

> # Import the csv file
> quotes <- read.csv("E:/MarketData/GBPUSD10080.csv", header=FALSE)
> 
> 
> 
> # load quantmod package
> library(quantmod)
> 
> # convert the data frame into a ts (time series) object
> quotes <- as.ts(quotes)
> 
> 
> #convert time series into a zoo object
> quotes1 <- as.zoo(quotes)
> 
> 
> # drop the Date and Time columns and take logs of the remaining columns
> quotes1 <- log(quotes1[, -(1:2)])
> 
> # calculate one-period log returns (differences of consecutive log values)
> lr <- diff(quotes1)
> 
> #number of rows in the dataframe
> x <-nrow(lr)
> 
> y <-nrow(quotes)
> 
> # lag the data
> x1 <- lag(lr, k=-1, na.pad=TRUE)
> x2 <- lag(lr, k=-2, na.pad=TRUE)
> x3 <- lag(lr, k=-3, na.pad=TRUE)
> x4 <- lag(lr, k=-4, na.pad=TRUE)
> x5 <- lag(lr, k=-5, na.pad=TRUE)
> x6 <- lag(lr, k=-6, na.pad=TRUE)
> x7 <- lag(lr, k=-7, na.pad=TRUE)
> x8 <- lag(lr, k=-8, na.pad=TRUE)
> x9 <- lag(lr, k=-9, na.pad=TRUE)
> x10 <- lag(lr, k=-10, na.pad=TRUE)
> 
> 
> 
> # combine all the above matrices into one matrix having close prices
> CQuotes <- cbind (x1[ ,4], x2[ ,4], x3[ ,4], x4[ ,4], 
+                   x5[ ,4], x6[ ,4], x7[ ,4], x8[ ,4], 
+                   x9[ ,4], x10[ ,4], lr[,4])
> 
> 
> 
> library(mda)
> m <- mars(CQuotes[(x-1000):(x-1),1:10], CQuotes[(x-1000):(x-1),11])
> m.preds <- predict(m,CQuotes[x:(x+1),1:10])
> m.preds
              [,1]
[1,] -2.168895e-03
[2,] -7.088972e-05
> 
> 
> exp(m.preds[1])*quotes[y-1,6]
      V6 
1.324544

You can see above that the predicted weekly closing price is 1.324544. The actual closing price for the week was 1.3013, so the prediction is off by about 232 pips. Our model is not making good predictions for the weekly close. Why? The inputs that we have used for the model are the lagged returns. Maybe with different inputs we can get a better prediction. Let’s change the inputs to technical indicators computed with the TTR functions that quantmod loads, namely RSI, MACD, the Williams Accumulation/Distribution indicator (williamsAD) and CCI, and see what prediction we get.

> #import the data
> 
> data <- read.csv("E:/MarketData/GBPUSD1440.csv", header = FALSE)
> 
> colnames(data) <- c("Date", "Time", "Open", "High",
+                     "Low", "Close", "Volume")
> 
> library(quantmod)
> 
> data2 <- as.xts(data[,-(1:2)], as.Date(paste(data[,1]),format='%Y.%m.%d'))
> 
> data2 <- data2[, -2]
> 
> data2$rsi <- RSI(data2$Close)
> data2$MACD <- MACD(data2$Close)
> data2$will <- williamsAD(data2[,3:5])
> data2$cci <-  CCI(data2[,3:5])
> data2$STOCH <- stoch(data2[,3:5])
> 
> 
> data2$Return <- diff(log((data2$Close)))
> 
> 
> data1 <- data2[,-(1:4)]
> 
> data1 <- data1[, -(6:8)]
> 
> x <- nrow(data2)
> 
> library(mda)
> m <- mars(data1[(x-1000):(x-1),1:5], data1[(x-1000):(x-1),6])
> m.preds <- predict(m,data1[x,1:5])
> m.preds
             [,1]
[1,] -0.005505432
> 
> 
> exp(m.preds[1])*data[x-1,6]
[1] 1.316691

Now the model is predicting 1.316691 as the daily close while the actual close is 1.3013. So we cannot use this model either. The predicted results are highly dependent on your inputs. By changing the inputs we have been able to get a better prediction, but we are still roughly 155 pips away from the actual close. We cannot give you the actual input model, but if you continue working you can improve the predictions. Good Luck!
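
The pip gaps quoted in this post can be reproduced with a small helper. The sketch below assumes a pip size of 0.0001, which is the usual convention for GBPUSD and NZDUSD; `pip_gap` is a hypothetical helper name. The prices are the predicted and actual closes from the transcripts above.

```r
# Minimal sketch: distance between a predicted and an actual close, in pips
pip_size <- 0.0001                       # assumed pip size for GBPUSD and NZDUSD

# Hypothetical helper: absolute prediction error expressed in pips
pip_gap <- function(predicted, actual) abs(predicted - actual) / pip_size

gaps <- c(
  gbpusd_daily_lags       = pip_gap(1.324801,  1.3013),
  nzdusd_daily_lags       = pip_gap(0.7320886, 0.72658),
  gbpusd_weekly_lags      = pip_gap(1.324544,  1.3013),
  gbpusd_daily_indicators = pip_gap(1.316691,  1.3013),
  # Naive benchmark: assume Friday closes where Thursday closed (1.32396)
  gbpusd_daily_naive      = pip_gap(1.32396,   1.3013)
)
round(gaps, 1)

# Which forecasts fall inside the 50 pip upper limit used in this post?
gaps <= 50
```

Comparing each model with the naive benchmark is a quick sanity check. On this single day, the lagged-return model for GBPUSD did no better than simply repeating Thursday’s close, although one day is far too little data to judge a model.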