Time Series and LSTM Models | Comparison | Stock Price Prediction

This project is a comparative study of time-series analysis techniques and ML techniques from the perspective of stock/index price prediction. Initial analysis was performed using time-series modeling techniques, e.g. ARMA and ARIMA, followed by the evaluation of different ML models to predict the next day's stock/index price.

After extensive theoretical study of different ANN models and based on input from a mentor, the LSTM model was finalized. Different optimization parameters and techniques were analyzed against various evaluation criteria, e.g. accuracy and precision.

Both prediction techniques, i.e. ARIMA and LSTM, gave almost similar results in terms of evaluation of the predicted price, but the ARIMA predictions were found to be more accurate, i.e. they had a lower RMSE for the predicted price.

This article is the final project submitted by the author as part of his coursework in the Executive Program in Algorithmic Trading (EPAT) at QuantInsti. Do check our Projects page and have a look at what our students are building.


About the author

With over 15 years in the IT industry, Ashish Jain is a seasoned professional specializing in Treasury and Investment Banking. Currently a Senior Manager at Adenza, Ashish oversees technical aspects of multiple Calypso Implementation and Upgrade projects. Holding a Master's Degree in Computer Science from Thapar University, Ashish blends academic excellence with practical expertise, driving success in IT solutions and project management.


Project Abstract

This project is a comparative study of time-series analysis techniques and ML techniques from the perspective of stock/index price prediction. After evaluating the performance of the predicted prices, a low-frequency trading strategy can be built using the best model or a combination of multiple models.


Introduction

I'm new to the trading world but have good experience with programming. My motivation was to build a decent trading strategy for low-frequency trading using deep learning technologies or time-series modeling. The initial idea was to use the time-series model output as input to machine learning models.

But, based on discussion with my mentor, I understood that it is better to develop a comparative study of both models, as they are altogether different techniques. Once stock price prediction is available with reasonable accuracy, the strategy can be built in a number of ways.


Data Mining

I had a major challenge in sourcing high-quality data. As a retail trader, I had a limited budget and I settled on Yahoo Finance, which gave me adjusted closing prices. Also, as per discussion with my mentor, Yahoo Finance is a very good option for this kind of comparative study and for the scope of low-frequency trading.

But for High Frequency Trading (HFT), paid data can be used to ensure quality and consistency of data.

I did consider downloading directly from the National Stock Exchange (NSE). The initial results were promising, but the Python wrapper to do this efficiently didn't work consistently, hence I elected to use Yahoo Finance.

I've used 5 years of historical data for my project. I could go back further, but given the changes in India, I didn't see the value in training a model on old data.


Data Analysis

Initially I started my analysis using time-series modeling techniques. Time-series modeling requires the series to be stationary, and the Nifty Index price that I used for my analysis is not stationary. So, I considered using the return series, as it was stationary according to the Augmented Dickey-Fuller (ADF) test and ACF/PACF analysis.

But predicting returns and using them to build a strategy brings its own complexity. So, I dropped the idea of using the return series.

Then I found that differencing by order of 1 made the Nifty price series stationary, so I considered using an ARIMA model for price forecasting. To get the best model, one needs to find the best values for the AR and MA parameters.
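To make the differencing step concrete, here is a minimal sketch in plain Python. The function names `difference` and `undifference` and the sample prices are illustrative assumptions, not taken from the project code. The sketch only shows what differencing by order of 1 does to a price series and how a forecast of the differenced series maps back to a price level.

```python
def difference(series, order=1):
    """Return the series differenced `order` times (y[t] - y[t-1])."""
    out = list(series)
    for _ in range(order):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


def undifference(last_value, diffs):
    """Rebuild price levels from first differences, starting after last_value."""
    levels = []
    level = last_value
    for d in diffs:
        level += d
        levels.append(level)
    return levels


# Hypothetical closing prices, for illustration only.
prices = [100.0, 102.0, 101.0, 105.0]
diffs = difference(prices)                 # day-over-day changes
rebuilt = undifference(prices[0], diffs)   # back to price levels
```

An ARIMA model with a differencing order of 1 performs this transformation internally, so the forecasts it returns are already on the price scale.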

There are two different approaches to do this:

  • Based on significant lags from the ACF and PACF charts
  • Using the AIC score criterion for different combinations of parameters.

As you keep increasing the number of lags, the model becomes computationally intensive and takes too much time to produce results, e.g. it takes around 20 minutes for 10 years of data. So, I kept the parameter ranges within 1-8.

First, I got an initial indication of the parameters from ACF/PACF, and then I used the AIC score criterion to get the top three sets of parameters. Then I performed trial and error on these three sets and found the best parameter set, i.e. 6, 1, 2.
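The AIC-based search described above can be sketched as follows. This is an outline under stated assumptions, not the project's own `aic_score.ipynb` logic: `fit_loglik` is a hypothetical caller-supplied function that fits an ARIMA(p, d, q) model and returns its maximized log-likelihood, and the parameter count (AR terms + MA terms + one noise variance) is a simplifying assumption.

```python
from itertools import product


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def top_orders_by_aic(fit_loglik, p_values, d, q_values, top_n=3):
    """Score every (p, d, q) combination and return the top_n lowest-AIC orders.

    fit_loglik is a hypothetical callable: fit_loglik(p, d, q) -> log-likelihood.
    """
    scored = []
    for p, q in product(p_values, q_values):
        n_params = p + q + 1  # assumed: AR terms + MA terms + noise variance
        scored.append((aic(fit_loglik(p, d, q), n_params), (p, d, q)))
    scored.sort()  # lowest AIC first
    return [order for _, order in scored[:top_n]]
```

With the ranges used in this project, the call would look like `top_orders_by_aic(fit_loglik, range(1, 9), 1, range(1, 9))`. In practice `fit_loglik` could wrap a library fit; for example, fitted ARIMA results in statsmodels expose the log-likelihood as `llf` and a ready-made `aic` attribute.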

I evaluated the model based on two criteria:

  • Root Mean Squared Error/Mean Absolute Percentage Error
  • Price Direction prediction

Below are the results of the best-performing time-series model:

  • The Mean Absolute Error is 109.84
  • The Mean Squared Error is 20378.46
  • The Root Mean Squared Error is 142.75
  • The Mean Absolute Percentage Error is 0.63
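The error metrics listed above can be computed with a few lines of plain Python. This is a minimal sketch; the function name `error_metrics` is an illustrative assumption, and MAPE is expressed in percent here, which is an assumption about how the reported figure is scaled.

```python
import math


def error_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (in percent) for two equal-length lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE assumes no actual value is zero, which holds for index prices.
    mape = 100.0 * sum(abs(e) / abs(a) for a, e in zip(actual, errors)) / n
    return {"mae": mae, "mse": mse, "rmse": rmse, "mape": mape}
```

As a quick consistency check on the reported figures, RMSE is the square root of MSE: the square root of 20378.46 is approximately 142.75.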

Direction Prediction (1 indicates a correct direction prediction and 0 indicates a wrong direction prediction):
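The 1/0 direction flags described above could be derived as in the following sketch. The definition used here is an assumption, since the article does not spell it out: a prediction counts as correct when the predicted move from the previous actual close has the same sign as the realized move. The function names are illustrative.

```python
def direction_flags(actual, predicted):
    """Flag each day 1 if the predicted move from the previous actual close
    has the same sign as the realized move, else 0.

    actual[t] and predicted[t] must refer to the same day; the first day has
    no previous close and is skipped.
    """
    flags = []
    for t in range(1, len(actual)):
        realized = actual[t] - actual[t - 1]
        forecast = predicted[t] - actual[t - 1]
        flags.append(1 if realized * forecast > 0 else 0)
    return flags


def direction_accuracy(flags):
    """Share of days with a correct direction call, in percent."""
    return 100.0 * sum(flags) / len(flags) if flags else 0.0
```

Under this definition, a day on which either the realized or the predicted move is exactly zero is counted as wrong.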

Then I started the analysis of ML models to predict the next day's stock/index price. My mentor provided a lot of useful information and guidance here and suggested exploring different ANN models and ML techniques before finalizing any ANN model. After extensive theoretical study of different ANN models and based on inputs from my mentor, I decided to go ahead with the LSTM model.

Initially I built a univariate LSTM model which took only the past price as input. Then, based on inputs from my mentor, I added some common technical indicators as input features to my LSTM model.

I used a MinMax scaler for normalization of the various input parameters/features. I used an LSTM model with 5 layers. I tried different activation functions, e.g. 'tanh' and 'relu', based on theoretical study as well as trial and error. I got the best performance with 'tanh' as the input/middle-layer activation function and 'linear' as the output activation function. As per the suggestion from my mentor, I also tried different epochs and batch sizes.
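For readers unfamiliar with MinMax scaling, the sketch below shows the arithmetic for a single feature in plain Python. The function names are illustrative assumptions and this is not the project's code. It mirrors what a typical MinMax scaler with a [0, 1] target range does. Fitting the range on the training portion only is a common precaution against leaking test information, not something the article confirms.

```python
def fit_minmax(values):
    """Learn the min and max used for scaling (ideally from training data only)."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Map values to [0, 1] relative to the fitted range."""
    span = hi - lo
    if span == 0:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / span for v in values]


def inverse_scale(scaled, lo, hi):
    """Map scaled values back to the original units, e.g. index points."""
    return [s * (hi - lo) + lo for s in scaled]
```

The inverse step matters for evaluation: error metrics such as RMSE are only comparable with the ARIMA results if the LSTM outputs are converted back to price units first.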

I got the best performance with 100 epochs with early stopping and a batch size of 10. I used 5 years of historical data and the train/test split was 80:20.

The best model performance was as below:
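A minimal sketch of how an 80:20 split and the supervised samples for a sequence model might be prepared is shown below. The split is chronological, which is the usual choice for time series. The `lookback` window length is a hypothetical parameter, since the article does not state how many past days were fed to the LSTM.

```python
def chronological_split(series, train_fraction=0.8):
    """Split without shuffling so the test set is strictly later in time."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_windows(series, lookback):
    """Build (inputs, target) pairs: `lookback` past values -> next value."""
    inputs, targets = [], []
    for end in range(lookback, len(series)):
        inputs.append(series[end - lookback:end])
        targets.append(series[end])
    return inputs, targets
```

For example, `make_windows([1, 2, 3, 4, 5], 3)` yields the inputs `[[1, 2, 3], [2, 3, 4]]` with targets `[4, 5]`.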

  • The Mean Absolute Error is 151.99
  • The Mean Squared Error is 34575.89
  • The Root Mean Squared Error is 185.95
  • The Mean Absolute Percentage Error is 0.82

Direction Prediction (1 indicates a correct direction prediction and 0 indicates a wrong direction prediction):


Key Findings

The key findings from this project are as below:

  1. Both prediction techniques, i.e. ARIMA and LSTM, gave almost similar results in terms of evaluation of the predicted price, but the ARIMA predictions are more accurate, i.e. they have a lower RMSE.
  2. There is a major difference in directional prediction accuracy. LSTM is much better than ARIMA, but its accuracy is still less than 50%, so it is not useful in practice.
  3. A higher number of lags can be used to further improve the performance of the ARIMA model, provided sufficient computing resources are available.
  4. The LSTM model has a large number of parameters, e.g. activation function, number of neurons, number of layers, optimization functions, number of epochs, batch sizes, etc. So the best approach is to limit the range of the various parameters based on theoretical study or available research.
  5. But there is no definite logic, so trial and error over different parameters is necessary to arrive at the best model.
  6. It is important to shift the predicted price before comparing it with the next day's close price; otherwise model performance will look better than it is due to look-ahead bias.
  7. Once the model is finalized, it should be saved on the local file system, e.g. using pickle. It can then be reused for backtesting on different sets of data to avoid long computation time every time.
  8. My analysis was limited to Nifty price series data, but the model can be used for other stocks/indexes and fine-tuned for better performance.
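The look-ahead point in the findings above (shifting the predicted price before comparison) can be illustrated with a small sketch. The prices are hypothetical and the function name `align_next_day` is an illustrative assumption. A naive forecast of "tomorrow equals today" looks perfect when compared against the same day's close and shows its real error only after alignment with the next day's close.

```python
def align_next_day(closes, forecasts_made_on_day):
    """Pair the forecast made on day t with the close of day t+1.

    forecasts_made_on_day[t] is the next-day price predicted using data up to
    and including day t. The last forecast has no realized close yet.
    """
    actual_next = closes[1:]
    forecast_for_next = forecasts_made_on_day[:-1]
    return actual_next, forecast_for_next


closes = [100.0, 102.0, 101.0, 105.0]   # hypothetical prices
naive = list(closes)                     # forecast: "tomorrow equals today"

# Wrong: comparing each forecast with the close of the day it was made on.
unshifted_error = sum(abs(c - f) for c, f in zip(closes, naive)) / len(closes)

# Right: comparing each forecast with the following day's close.
actual, forecast = align_next_day(closes, naive)
shifted_error = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
```

Here `unshifted_error` is zero, which would wrongly suggest a perfect model, while `shifted_error` reflects the actual day-to-day moves.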

Challenges

  1. As time-series models are computationally intensive, it is practically impossible to test higher lag values on a local machine with limited resources.
  2. Python IDEs, e.g. Spyder, give better performance compared to Jupyter Notebook.
  3. The sourcing of high-quality data is challenging and potentially expensive.
  4. Building a trading strategy using the predicted price is not fully explored. I did vectorized backtesting of the LSTM model on 10 years of NIFTY data and it produced a 13% CAGR for a long-only trading strategy.
  5. The strategy assumes buying at today's Close price, which is not possible in practice, so for live trading the program should take the price available 5 minutes before market close.
  6. Event-based backtesting can be used to further refine the model and the entry and exit criteria of the trading strategy, e.g. stop loss, trailing stop loss, profit booking, etc.
  7. I used a simple strategy that generates a buy signal if the predicted price is more than 1% higher than today's close price. Different types of strategies, e.g. using both Buy/Sell signals, taking the current day's Open price as input to predict today's Close price, predicting direction only, etc., can be evaluated for better returns.
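The simple long-only strategy and the CAGR figure mentioned in the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's backtest: positions are entered at the close and held for one day, transaction costs and slippage are ignored, and all function names are hypothetical.

```python
def buy_signals(closes, next_day_forecasts, threshold=0.01):
    """1 when the forecast exceeds today's close by more than threshold, else 0."""
    return [1 if f > c * (1 + threshold) else 0
            for c, f in zip(closes, next_day_forecasts)]


def long_only_equity(closes, signals, start=1.0):
    """Hold close-to-close for one day whenever the previous day's signal is 1."""
    equity = [start]
    for t in range(1, len(closes)):
        daily_return = closes[t] / closes[t - 1] - 1
        equity.append(equity[-1] * (1 + signals[t - 1] * daily_return))
    return equity


def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.13 means 13%)."""
    return (end_value / start_value) ** (1 / years) - 1
```

Using the previous day's signal for each day's return keeps the backtest free of look-ahead: the decision made at the close of day t only earns the return from day t to day t+1.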

Implementation Methodology

  • The project has been tested with Nifty 50 index data from Yahoo Finance. It was not possible to test the project with broker-supplied data due to the limited budget.
  • All programs are developed in Python using Jupyter Notebook and the Spyder IDE.

Conclusion

It is possible for a retail trader to build an effective strategy that uses machine learning or time-series modeling. Careful feature selection and feature engineering are needed before the strategy can be used in a production environment. Extensive trial and error is recommended to test performance for different combinations of model parameters.



Annexure/Codes

The Python files listed below are attached; their functionality in brief is as follows:

  • timeseries_nifty_arima.py – It uses an ARIMA model for time-series-based forecasting.
  • timeseries_nifty_return.py – It uses an ARIMA model for return prediction.
  • aic_score.ipynb – It has logic to identify the best ARIMA model based on the AIC score.
  • Nifty_lstm.py – It uses a univariate LSTM model which takes only the Close price as input for Nifty price prediction.
  • nifty_lstm_model_reuse.py – It takes common technical indicators as input features and uses an LSTM model for Nifty price prediction. It also contains code for a basic strategy based on the predicted price and its backtesting on the last 10 years of data.
  • nifty_18_09.csv – It contains Nifty price data for the last 5 years, downloaded from Yahoo Finance.
  • tsa_functions_quantra.py – It is provided by QuantInsti and contains commonly required utility functions, e.g. for analyzing strategy performance and model evaluation.



Disclaimer: The information in this project is true and complete to the best of our student's knowledge. All recommendations are made without guarantee on the part of the student or QuantInsti. The student and QuantInsti disclaim any liability in connection with the use of this information. All content provided in this project is for informational purposes only and we do not guarantee that by using the guidance you will derive a certain profit.
