Impact of Financial Ratios and Technical Analysis on Stock Price Prediction Using Random Forests

Loke K.S.
Faculty of Engineering, Computing and Science, Swinburne University of Technology Sarawak Campus, Sarawak, Malaysia
[email protected]



978-1-5386-0765-7/17/$31.00 ©2017 IEEE

Abstract— A stock movement prediction method is presented using quarterly financial ratio data from Hong Kong companies over the period 2011-2014. We found the accuracy of price movement prediction using the Random Forest method over multiple quarters to be fairly weak. However, we were able to predict with high accuracy in the last quarter of 2014, but not in other years. We attribute this not to the superiority of the method but to the non-stationary nature of the price signals.

Keywords— Stock Prediction, Stock market, Random Forest, Financial Ratios

I. INTRODUCTION

The use of artificial intelligence and machine learning techniques to determine future trends of the stock market is an active research area. The Efficient Market Hypothesis [1] posits that all relevant information is already reflected in prices, which would make it impossible to outperform the market, but this thesis has its critics. Some studies show that over very short time spans, price movements can be predicted better than chance [2]. Others have found that news can affect price movements as well [3] [4] [5]. In this research we studied the effect of financial ratios on market price prediction using Random Forests, a well-known machine learning method introduced by Breiman [6]. This paper is organized as follows. In Section II, we review previous related work that used machine learning methods. We describe the Random Forest algorithm in Section III. In Section IV, we present our approach and methods. The experimental results are presented in Section V. Finally, the conclusions of our work are summarized in the last section.

II. LITERATURE REVIEW

There has been much empirical research on the predictive power of financial ratios, and many of the results are mixed. In his review, Hjalmarsson [7] found that results using the dividend- and earnings-price ratios as regressors were sensitive to the sample period and the choice of data frequency. His own research showed that traditional valuation ratios such as the dividend- and earnings-price ratios had very limited predictive power. Fama and French [8] found weak evidence of dividend yield predictability for monthly New York Stock Exchange (NYSE) returns. Lewellen [9] also found that dividend yield predicts market returns. However, Goyal and Welch [10] argue that the evidence was too weak. Lau et al. [11] found the relationship between the earnings-price ratio and market returns to be conditional. Many of these studies used statistical analysis, regression and ordinary least squares to find the relationship between price and the financial ratios. The application of artificial intelligence and machine learning was not widespread.

However, in the field of technical analysis, the use of machine learning methods was quite common. Evolutionary methods such as Genetic Algorithms [12], Swarm Optimization [13] and Evolutionary Learning [14] are also popular. Patel et al. compared Artificial Neural Networks, Support Vector Machines, Random Forests and Naïve Bayes for predicting stock movement direction [15], and their data indicated good results with Random Forests. Similarly, Ballings et al. [16] compared a variety of ensemble algorithms against single classifier models, including Support Vector Machines, Neural Networks and Logistic Regression. They also concluded that the Random Forest ensemble method should be used for stock price direction prediction. Ladyzynski et al. [17] also used Random Forests for stock price trend detection. Even though they failed to generate a profitable trading strategy, they concluded that the artificial intelligence approach is promising. Booth et al. [18] used a recency-weighted Random Forest that takes seasonality into consideration, and claimed superior results. A number of recent works have been optimistic about ensemble methods like Random Forest, so we decided to adopt Random Forest in our tests.

III. RANDOM FOREST

Random Forest is an ensemble classification algorithm that combines a collection of decision trees. It was first introduced by Leo Breiman [6], building on the ideas of Amit and Geman [19] and Ho [20]. At each decision tree node, the method randomly selects the features (or attributes) that are considered for the split. This random factor reduces the correlation between the individual trees, which makes the Random Forest robust to noise and resistant to overtraining. At the end of the tree traversal, each tree casts a vote for the class of the input, and the class that receives the majority of the votes is the classification. A single random tree classifier may be only slightly better than random classification, but combining many of them as an ensemble can greatly improve accuracy. A feature of Random Forest is that it does not overfit as more trees are added. Instead, its generalization error converges to a limiting value.
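The following minimal from-scratch Python sketch makes the voting and per-node feature sampling steps concrete. It is not the Weka or Rattle/R implementation used in this paper. It assumes Breiman's bootstrap sampling of training rows, Gini impurity as the split criterion and a fixed depth limit. The function names, the parameters (n_trees, n_features, depth) and the toy data are illustrative only.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return (impurity, feature, threshold) of the best split among feature_ids, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send rows both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, n_features, depth, rng):
    """Grow a depth-limited tree; leaves are class labels, internal nodes are tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    # Random Forest step: only a random subset of the features is considered at this node.
    feature_ids = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, feature_ids)
    if split is None:
        return majority
    _, f, t = split
    go_left = [r[f] <= t for r in rows]
    left = build_tree([r for r, g in zip(rows, go_left) if g],
                      [y for y, g in zip(labels, go_left) if g], n_features, depth - 1, rng)
    right = build_tree([r for r, g in zip(rows, go_left) if not g],
                       [y for y, g in zip(labels, go_left) if not g], n_features, depth - 1, rng)
    return (f, t, left, right)


def predict_tree(node, row):
    """Traverse internal (feature, threshold, left, right) nodes until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=25, n_features=1, depth=3, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: each tree is grown on a bootstrap sample drawn with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 n_features, depth, rng))
    return forest


def predict_forest(forest, row):
    """Each tree casts one vote; the majority class is the forest's prediction."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Toy data: two features, class 1 when the first feature is 5 or more.
rows = [[i, i] for i in range(10)]
labels = [1 if i >= 5 else 0 for i in range(10)]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [9, 9]))  # 0 1
```

If n_features is set to the total number of features, the sketch reduces to plain bagged trees. Keeping n_features small is what decorrelates the trees.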



IV. APPROACH

We used a dataset of 433 companies listed on the Hong Kong Stock Exchange from 2011 to 2014. For each company and each quarter we calculated a set of 63 attributes, most of which were financial ratios. For each quarter, the returns in the next quarter were also calculated from historical data. For example, if the financial data was for quarter 1 (Q1), the returns column holds the difference in price from the start to the end of the next quarter, i.e. Q2. However, not all the financial ratios were available for all companies, so some values were missing. The data format per quarter is as follows: [Companyid, ratio1, ratio2, ratio3, …, ratio62, next-quarter-returns]. So for Q1 data, the returns value is actually the next-quarter (Q2) return. All data were normalized to zero mean and unit variance.

Some of the financial ratios included are: Liquidation Value/Market Cap, Book Asset Value/Market Cap, Sales/Market Cap, EBITDA/Enterprise Value, Earnings/Market Cap, Operating Cash Flow/Market Cap, Dividend/Market Cap, Return on Assets, Return on Equity, Return on Invested Capital, Net Asset Value/Total Assets, Revenue growth (first half), Earnings per share (first half), Net Assets Growth Rate over 5 years, Return on Asset Margin over 5 years, Book to Price previous 5 years average deviation, market capitalization, and Dividend Yield ratio previous 5 years average deviation. The rest are variations of the above, calculated over slightly different periods or with different transformations.

In all the experiments we used Weka [23] to perform the training and classification. In some cases we also ran the same experiment in Rattle/R [22], using its Random Forest [6] classifier to obtain the important attributes. In Weka we used only the Random Forest classifier, varying the number of trees and the number of random attributes to choose from. Unless mentioned otherwise, we typically used 100-400 trees with 10-fold cross-validation. Some of the attributes might not be independent. We did not expect this to be a problem, because the random forest algorithm randomly selects only some of the attributes at each node. Varying the number of trees and attributes resulted in only marginal changes.

We investigated whether the calculated financial ratios have any impact on future price direction. Instead of using the actual return values, we set the class to 1 or 0 depending on whether the return was above or below a threshold (e.g. threshold = 0).

V. EXPERIMENTAL RESULTS

A. Experiments

We performed the following experiments.

1) Experiment A. Used all attributes (columns) but varied the NA values, quarter, year and threshold.
a) Combined all Q4 rows from 2011-2014 and used all columns to create a model that classifies return direction with the threshold set to 0. We used only Q4 instead of Q1 through Q4 because the data may be seasonal [21]. The rows were partitioned for 10-fold cross-validation.
b) Modeled each quarter (Q1, Q2, Q3, Q4) in 2013 and 2014 separately to classify return direction.
c) As in b) above but with a different threshold.
d) As in b) above but with different NA values, to test which NA value should be used.

2) Experiment B. Varied the attributes (columns) and predicted the returns in the Q3 and Q4 datasets. All rows with incomplete data (NA) were removed.
a) Used Q1 2014 data.
b) Used Q1 2013 data.

3) Experiment C. Varied the attributes (columns), trained with Q3 data and predicted the returns in the Q4 dataset. This is called a one-quarter walk-forward test.
a) Used combined 2013-2014 data.
b) Used combined 2011-2012 data.

Experiments A and B used the next-quarter returns to create the model. That makes them unsuitable for actual prediction, because the next-quarter returns must already be known to build the model. Experiment C removed that requirement by training on Q2-Q4 returns and predicting Q5 return direction using Q3-Q4 values. In other words, we treat the data as a time series that moves forward one quarter at a time. We created a model using all Q1 attributes, Q2 returns and Q3 returns to classify Q4 returns. This model was then used to classify Q5 return direction. No Q5 values were used in the training set.

Experiment A(a) used the combined Q4 quarters from 2011 to 2014, with companies that had incomplete information removed. The classes were balanced by manually removing some instances so that the class distribution was less skewed (0=195, 1=173). We also tested classes resampled with the Weka class balancer to obtain an equal class distribution (0=240, 1=240). The same quarter was used in every year because there might be seasonal patterns. The classifier was to predict the direction of the Q4 returns (which are the next-quarter returns). The results are shown in Table I.

TABLE I. 2011-2014 Q4 RETURNS DIRECTION CLASSIFICATION

Balancing Manual Manual Manual Weka
Trees 100 200 400 400
Accuracy % 65.22 66.03 66.30 62.90
Kappa 0.2931 0.3081 0.3147 0.2596
AUC 0.699 0.709

The number of trees made only a marginal difference. Second, reporting accuracy (the percentage of correctly classified instances) was misleading because of the skewed class distribution, so results are reported as Area Under the ROC Curve (AUC) values instead [24]. An AUC of 0.5 corresponds to random classification (values below 0.5 are no better), and a value of 1.0 corresponds to perfect classification. A good AUC value is around 0.8 or above.

A(b)-(c). The per-quarter results are presented in Table II. The number of trees was 400, and the number of companies was 433 in each quarter. All the financial ratios were used as input attributes to classify the next-quarter return direction. The accuracy percentage should not be used for comparison because of the skewed class distribution.
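The argument for reporting AUC rather than accuracy under class skew can be checked on a toy sample. Below is a minimal sketch of the two chance-aware statistics reported in Table I: AUC in its pairwise (Mann-Whitney) form, and Cohen's kappa. It is an illustration under those standard definitions, not Weka's code. The function names and the 80/20 sample are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly chosen
    positive example is scored above a randomly chosen negative one (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cohen_kappa(labels, preds):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(labels)
    observed = sum(y == p for y, p in zip(labels, preds)) / n
    classes = set(labels) | set(preds)
    expected = sum((labels.count(c) / n) * (preds.count(c) / n) for c in classes)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both sequences contain one identical class")
    return (observed - expected) / (1.0 - expected)


# A skewed toy sample: 80% of the labels are 0.
labels = [0] * 8 + [1] * 2
always_zero = [0] * 10  # trivial classifier that always predicts the majority class
accuracy = sum(y == p for y, p in zip(labels, always_zero)) / len(labels)
print(accuracy)                          # 0.8
print(cohen_kappa(labels, always_zero))  # 0.0
print(auc(labels, [0.5] * 10))           # 0.5: a constant score cannot rank examples
```

The majority-class predictor reaches 80% accuracy on this sample, yet its kappa is 0 and its AUC is 0.5. This is the kind of distortion that motivates comparing AUC values instead of accuracy.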



TABLE II. 2013-2014 PER QUARTER RETURNS CLASSIFICATION

Quarter Q1-2013 Q2-2013 Q3-2013 Q4-2013 Q1-2014 Q1-2014 Q2-2014 Q4-2014 Q4-2014
Threshold 0.2 0.2 0.2 0.2 0 0.2 0.2 0 0.2
NA value 99 99 99 99 0 0 0 0 0
Accuracy % 70.43 70.2 67.9 65.5 67.2 83.4 61.4 67.9 66.1
AUC 0.755 0.659 0.712 0.652 0.651 0.616 0.633 0.718 0.718
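The Approach section normalizes the data to zero mean and unit variance. Table II varies the return threshold and the value substituted for missing (NA) entries. A minimal sketch of these three preparation steps follows. The function names are hypothetical. The text does not specify the ordering (normalize first, then substitute the NA value), the use of the population standard deviation, or the strict "above threshold" rule, so all three are assumptions here.

```python
import math


def zscore_columns(rows):
    """Scale every column to zero mean and unit variance. None marks a missing value
    and is left untouched (assumption: population standard deviation)."""
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if r[j] is not None]
        if not present:
            continue
        mean = sum(present) / len(present)
        std = math.sqrt(sum((v - mean) ** 2 for v in present) / len(present))
        for r in out:
            if r[j] is not None:
                r[j] = (r[j] - mean) / std if std > 0 else 0.0
    return out


def fill_missing(rows, na_value):
    """Substitute a constant for missing entries (Table II uses 99 or 0)."""
    return [[na_value if v is None else v for v in r] for r in rows]


def direction_labels(next_quarter_returns, threshold=0.0):
    """Class 1 if the next-quarter return is above the threshold, otherwise 0."""
    return [1 if r > threshold else 0 for r in next_quarter_returns]


# Hypothetical example: three companies, two ratio columns, one missing value.
ratios = [[1.0, None], [2.0, 4.0], [3.0, 8.0]]
X = fill_missing(zscore_columns(ratios), na_value=99)
y = direction_labels([0.05, -0.10, 0.30], threshold=0.2)
print(X[1], X[0][1], y)  # [0.0, -1.0] 99 [0, 0, 1]
```

The text also does not say whether the threshold applies to raw or to normalized returns. The sketch applies it to whatever return values it is given.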



Fig. 1. Hang Seng Index, Jan 2011 - Jan 2015, with SMA and EMA overlays



TABLE III. 2014 Q3 AND Q4 RETURNS DIRECTION PREDICTION WITH DIFFERENT ATTRIBUTES

[Attribute rows All, Return, Return.2, Return.3, RET.2, RET.3 and R3, marking which attributes each model used, are not recoverable]
Class R4 R4 R4 R4 R3 R3 R4 R4 R4
AUC 0.853 0.739 0.645 0.889 0.741 0.819 0.867 0.849 0.864

TABLE IV. 2013 Q3 AND Q4 RETURNS DIRECTION PREDICTION WITH DIFFERENT ATTRIBUTES

[Attribute rows All, Return, Return.2, Return.3, RET.2, RET.3 and R3, marking which attributes each model used, are not recoverable]
Class R4 R4 R4 R4 R4
AUC 0.937 0.972 0.847 0.724 0.761

Table III shows the results for the 2014 quarters using different attributes. Return.2 refers to the third-quarter returns. RET.2 refers to the cumulative value obtained by adding Return and Return.2, and RET.3 is defined similarly. R3 was set to 1 if RET.3 was above one, and to zero otherwise. Table IV shows similar results for 2013.
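The derived columns used in Tables III and IV can be written out explicitly. The sketch below assumes plain addition of the quarterly returns, as the text describes. It keeps the stated "above one" rule as the default R3 threshold. The function name is hypothetical.

```python
def cumulative_features(ret, ret2, ret3, r3_threshold=1.0):
    """Derive RET.2, RET.3 and the R3 class from three consecutive quarterly returns.

    ret, ret2 and ret3 play the role of the columns Return, Return.2 and Return.3.
    RET.2 adds the first two returns and RET.3 adds all three;
    R3 is 1 when RET.3 is above r3_threshold (the text states "above one").
    """
    ret_2 = ret + ret2
    ret_3 = ret_2 + ret3
    return {"RET.2": ret_2, "RET.3": ret_3, "R3": 1 if ret_3 > r3_threshold else 0}


# Hypothetical rows of (Return, Return.2, Return.3).
rows = [(0.5, 0.25, 0.5), (0.25, 0.25, 0.25)]
for r in rows:
    print(cumulative_features(*r))
# {'RET.2': 0.75, 'RET.3': 1.25, 'R3': 1}
# {'RET.2': 0.5, 'RET.3': 0.75, 'R3': 0}
```

RET.2, RET.3 and R3 are deterministic functions of the same return columns. They are therefore strongly related to those inputs by construction.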



The values indicate that the R4 direction was fairly predictable without the financial ratios. In other words, it could be predicted from prices alone. Given the financial ratios calculated for each quarter and the returns for the subsequent quarters, the 4th quarter was quite predictable, even without the financial ratios. However, this form of classification used R4 information to generate the model.
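A one-quarter walk-forward test avoids the leakage noted above. In such a test, each model is trained only on earlier quarters and then evaluated on the next one. The minimal sketch below shows only the split logic. The helper name is hypothetical, and the expanding training window is an assumption.

```python
def walk_forward_splits(quarters, min_train=1):
    """Yield (training quarters, test quarter) pairs for a one-quarter walk-forward test.

    The training window expands over time and always ends strictly before the
    test quarter, so the returns being classified are never seen during training.
    """
    ordered = list(quarters)
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


quarters = ["Q1", "Q2", "Q3", "Q4"]
for train, test in walk_forward_splits(quarters, min_train=3):
    print(train, "->", test)  # ['Q1', 'Q2', 'Q3'] -> Q4
```

With min_train=3 the sketch yields exactly one split: train on data up to Q3 and classify Q4. This matches the setup used for Table V.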






We found that prices were informative in predicting price direction. Table V presents the results of using quarterly returns as a time series prediction with a one-quarter lag, together with the attributes used. We trained the model using data up to Q3 and used it to classify Q4 data.

TABLE V. USING Q3 LAGGED VALUES TO PREDICT PRICE DIRECTION AT Q4 FOR 2011 AND 2014

[Attribute-set columns "All" and "Return, Return.2, RET.2, R3", marking which attributes each row used, are not recoverable]
Quarter Accuracy(%) Kappa AUC
Q1-2014 88.98 0.7761 0.920
Q1-2014 88.04 0.6936 0.880
Q1-2011 38.85 -0.1238 0.414
Q1-2011 34.20 -0.138 0.396

There seems to be an anomaly here: the data was predictive for 2014 but not for 2011. We discuss this further in Section VI.

VI. DISCUSSION

The results reported in Tables I and II were only slightly better than random. This indicates that using financial ratios to predict next-quarter results was not reliable, even though the next-quarter return values were used in creating the model. We did not further test the impact of different threshold or NA values: since the prediction results were poor, a slight increase in accuracy would not have been meaningful.

Tables III and IV show the results for longer-term impact beyond the one-quarter returns. They tested the 1-year (4-quarter) cumulative return direction (R3) and the 5-quarter return direction (R4). The results were more positive here, and there seemed to be a correlation between the inputs and the predicted class. There appeared to be a close relationship between the quarterly returns and R3 and R4, which again indicates that the financial ratios did not play a big role. On closer examination, R3 and R4 were also highly correlated.

However, the same results were not obtained with 2011 data, as shown in Table V. Unlike the previous results, the results in Table V were obtained using a model trained up to R3 but tested on R4. Good results were obtained for 2014. For 2011, the model trained up to R3 could not predict the R4 values that it had not seen.

This difference can be explained by examining the overall Hong Kong Hang Seng Index for the relevant years. Fig. 1 shows the period from January 2011 to January 2015 with the Simple Moving Average (SMA-50) and the Exponential Moving Average (EMA-50), both with a period of 50 days. The difference between 2011 and 2014 can be seen clearly. The period from 2011 to early 2012 showed an early rise and drop, whereas 2014 showed a steady rise until early 2015. This would explain why the 2014 model could predict Q1 2015 (R4) while the 2011 model could not predict 2012: prices in 2011-2012 were fluctuating. The simple model was not powerful enough to take all the causes of the fluctuation into consideration.

These results may also explain why some research reports highly predictive values: high prediction scores can be obtained by selecting an appropriate time period for the experiments. The results imply that a correct predictive model may need to be episodic (essentially non-stationary). That is, the model is only useful within a period of time and may need to be updated when the operating environment changes. Many studies have shown that stock prices are non-stationary [17]. This suggests a research direction: identifying the triggers for a change in the environment (also called a regime change) that should prompt model relearning. Some of these triggers will be external events in the macro environment, conveyed through news events or social media. They can also be conveyed through early price movements. Much of the existing research uses these signals directly for prediction. We believe it will be useful to study them as triggers for model relearning.
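Fig. 1 overlays a 50-day SMA and EMA on the Hang Seng Index. A minimal sketch of both averages follows. The EMA smoothing factor 2/(period + 1) is the usual convention. Seeding the EMA with the first full-window SMA value is an assumption, because the text does not say how EMA-50 was initialized. The index data itself is not reproduced here, so the example uses a short hypothetical series.

```python
def sma(prices, period=50):
    """Simple moving average; None until a full window of `period` prices is available."""
    out = []
    window_sum = 0.0
    for i, p in enumerate(prices):
        window_sum += p
        if i >= period:
            window_sum -= prices[i - period]  # drop the price that left the window
        out.append(window_sum / period if i >= period - 1 else None)
    return out


def ema(prices, period=50):
    """Exponential moving average with alpha = 2 / (period + 1), seeded with the SMA
    of the first `period` prices (assumed seeding convention)."""
    alpha = 2.0 / (period + 1)
    out = [None] * len(prices)
    if len(prices) < period:
        return out
    value = sum(prices[:period]) / period
    out[period - 1] = value
    for i in range(period, len(prices)):
        value = alpha * prices[i] + (1 - alpha) * value
        out[i] = value
    return out


# Short hypothetical series with period=3 instead of the 50 days used in Fig. 1.
prices = [1, 2, 3, 4, 5]
print(sma(prices, period=3))  # [None, None, 2.0, 3.0, 4.0]
print(ema(prices, period=3))  # [None, None, 2.0, 3.0, 4.0]
```

With daily closing prices and period=50, the two functions produce the kind of overlays shown in Fig. 1. The EMA weights recent prices more heavily, so it reacts faster to a turn such as the 2011 rise and drop.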



REFERENCES

[1] E. F. Fama, "Efficient Capital Markets: A Review of Theory and Empirical Work," Journal of Finance, vol. 25, no. 2, pp. 383-417, 1970.
[2] M. Rechenthin and W. Street, "Using conditional probability to identify trends in intra-day high-frequency equity pricing," Physica A: Statistical Mechanics and its Applications, vol. 392, no. 24, pp. 6169-6188, 2013.
[3] G. Gidofalvi and C. Elkan, "Using News Articles to Predict Stock Price Movements," Department of Computer Science and Engineering, University of California, San Diego, 2003.
[4] K. S. Loke and P. Chan, "Prediction of Individual Stock Movements in Bursa Malaysia using Online News," in SME-Entrepreneurship Global Conference, Kuala Lumpur, 2006.
[5] G. P. Fung, J. X. Yu and W. Lam, "News Sensitive Stock Trend Prediction," in Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Taipei, 2002.
[6] L. Breiman, "Random forests," Machine Learning, vol. 45, pp. 5-32, 2001.
[7] E. Hjalmarsson, "On the Predictability of Global Stock Returns," School of Business, Economics and Law, Goteborg University, Gothenburg, 2005.
[8] E. Fama and K. French, "Dividend yields and expected stock returns," Journal of Financial Economics, vol. 22, pp. 3-25, 1988.
[9] J. Lewellen, "Predicting returns with financial ratios," Journal of Financial Economics, vol. 74, pp. 209-235, 2004.
[10] A. Goyal and I. Welch, "A Note on 'Predicting Returns with Financial Ratios'," Yale School of Management, 2003.
[11] S. T. Lau, T. C. Lee and T. H. McInish, "Stock Returns and Beta, Firms Size, E/P, CF/P, Book to Market and Sales Growth: Evidence from Singapore and Malaysia," Journal of Multinational Financial Management, vol. 12, pp. 207-222, 2002.
[12] S. Mabu, K. Hirasawa, M. Obayashi and T. Kuremoto, "Enhanced decision making mechanism of rule-based genetic network programming for creating stock trading signals," Expert Systems with Applications, vol. 40, pp. 6311-6320, 2013.
[13] F. Wang, P. L. Yu and D. W. Cheung, "Combining technical trading rules using particle swarm optimization," Expert Systems with Applications, vol. 41, pp. 3016-3026, 2014.
[14] Y. Hu, B. Feng, X. Zhang, E. Ngai and M. Liu, "Stock trading rule discovery with an evolutionary trend following model," Expert Systems with Applications, vol. 42, pp. 212-222, 2015.
[15] J. Patel, S. Shah, P. Thakkar and K. Kotecha, "Predicting stock and stock price index movement using Trend Deterministic Data Preparation and machine learning techniques," Expert Systems with Applications, vol. 42, pp. 259-268, 2015.
[16] M. Ballings, D. Van den Poel, N. Hespeels and R. Gryp, "Evaluating multiple classifiers for stock price direction prediction," Expert Systems with Applications, vol. 42, pp. 7046-7056, 2015.
[17] P. Ladyzynski, K. Zbikowski and P. Grzegorzewski, "Stock Trading With Random Forests, Trend Detection Tests and Force Index Volume Indicators," Artificial Intelligence and Soft Computing, vol. 1, pp. 441-452, 2013.
[18] A. Booth, E. Gerding and F. McGroarty, "Automated trading with performance weighted random forests," Expert Systems with Applications, pp. 3651-3661, 2014.
[19] Y. Amit and D. Geman, "Shape quantization and recognition with randomized trees," Neural Computation, vol. 9, no. 7, pp. 1545-1588, 1997.
[20] T. Ho, "The Random Subspace Method for Constructing Decision Forests," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 832-844, 1998.
[21] R. Ariel, "A monthly effect in stock returns," Journal of Financial Economics, vol. 18, pp. 161-174, 1987.
[22] G. J. Williams, Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, Springer, 2011.
[23] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann and I. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explorations, vol. 11, no. 1, 2009.
[24] T. Fawcett, "ROC graphs: Notes and practical considerations for researchers," HP Laboratories, Tech. Rep. HPL-2003-4, 2004.
[25] M. Ariff, M. Shamsher and M. N. Annuar, "Stock Pricing in Malaysia: Financial and Investment Management," in Financial Economics Behaviour of an Emerging Capital Market, Serdang: University Putra Malaysia Press, 1998.
[26] X. Jiang and B.-S. Lee, "Do Decomposed Financial Ratios Predict Stock Returns and Fundamentals Better?," Social Science Research Network, 2009.